{"id":19746369,"url":"https://github.com/astrolabsoftware/grafink","last_synced_at":"2026-03-06T08:01:56.778Z","repository":{"id":55985822,"uuid":"263824953","full_name":"astrolabsoftware/grafink","owner":"astrolabsoftware","description":"Grafink is a spark ETL job to load data into Janusgraph [GSoC 2020]","archived":false,"fork":false,"pushed_at":"2020-12-02T21:13:08.000Z","size":1074,"stargazers_count":6,"open_issues_count":4,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-06-25T01:07:10.982Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/astrolabsoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-14T05:37:35.000Z","updated_at":"2023-11-19T21:14:09.000Z","dependencies_parsed_at":"2022-08-15T10:50:34.672Z","dependency_job_id":null,"html_url":"https://github.com/astrolabsoftware/grafink","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/astrolabsoftware/grafink","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fgrafink","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fgrafink/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fgrafink/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fgrafink/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/astrolabsoftware","download_url":"https://codeload.github.com/as
trolabsoftware/grafink/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fgrafink/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30166859,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T07:56:45.623Z","status":"ssl_error","status_checked_at":"2026-03-06T07:55:55.621Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T02:14:22.948Z","updated_at":"2026-03-06T08:01:56.722Z","avatar_url":"https://github.com/astrolabsoftware.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Grafink\n[![Build Status](https://travis-ci.org/astrolabsoftware/grafink.svg?branch=master)](https://travis-ci.org/astrolabsoftware/grafink)\n[![codecov](https://codecov.io/gh/astrolabsoftware/grafink/branch/master/graph/badge.svg?style=platic)](https://codecov.io/gh/astrolabsoftware/grafink)\n\nGrafink is a data loading and analysis tool for loading and analysing data into / from JanusGraph. It was created to load [Fink](https://fink-broker.org/) data into JanusGraph.\n\nGrafink has 3 components in general\n\n1. Api : This module exposes a REST interface on top of the loaded data in JanusGraph. The source is in ```api``` folder\n2. 
Core: This is the Spark job that loads data into JanusGraph. The source is in the ```core``` folder.\nThe architecture of the core job is described [here](docs/Architecture.md).\n3. Shell: This provides a Scala REPL for ad-hoc analysis of the loaded data. It uses the same configuration as the core module.\nThe source is in the ```core/src/scala/com/astrolabsoftware/grafink/shell``` folder.\n\n## Grafink Core\n\nGrafink Core is highly configurable. It has 5 major components:\n\n1. SchemaLoader: Loads the graph schema to the configured storage backend.\n2. IDManager: Maintains an external copy of the data along with the custom ids generated for loading the vertices.\n3. VertexProcessor: Loads the vertex data into JanusGraph.\n4. EdgeProcessor: Creates edges between the loaded vertices based on specified rules.\n5. VertexClassifier: These are classes that describe an algorithm for connecting vertices, thereby\nintroducing a set of edges in the graph.\n\n### Configuration\n\nA number of configuration options are available to customize the grafink core job.\nHere is a sample config file with example options and inline comments for explanation:\n\n```hocon\n// Data Reader options\nreader {\n  // Specify the base path to data\n  basePath = \"/test/base/path\"\n  // Format of the data to read\n  format = \"parquet\"\n  // Columns to keep when reading the data, since we might not be interested in all the data\n  keepCols = [\"objectId\", \"schemavsn\", \"publisher\", \"fink_broker_version\", \"fink_science_version\", \"candidate\", \"cdsxmatch\", \"rfscore\", \"snn_snia_vs_nonia\", \"roid\"]\n  // Renames a few columns after reading, to simplify processing\n  keepColsRenamed =\n    [ { \"f\": \"mulens.class_1\", \"t\": \"mulens_class_1\" },\n      { \"f\": \"mulens.class_2\", \"t\": \"mulens_class_2\" },\n      { \"f\": \"cutoutScience.stampData\", \"t\": \"cutoutScience\" },\n      { \"f\": \"cutoutTemplate.stampData\", \"t\": \"cutoutTemplate\" },\n      { \"f\": 
\"cutoutDifference.stampData\", \"t\": \"cutoutDifference\" },\n      { \"f\": \"candidate.classtar\", \"t\": \"classtar\" },\n      { \"f\": \"candidate.jd\", \"t\": \"jd\" }\n    ]\n  // Adds new columns to the data by applying specified sql expression on existing columns\n  // The columns being passed to the expression can be renamed columns as mentioned above as well\n  // For eg: below adds a 'rowkey' column to the data by concating objectId and jd columns\n  // where jd is a renamed column mentioned above\n  newCols = [\n    { \"name\": \"rowkey\", \"expr\": \"objectId || '_' || jd as rowkey\" }\n  ]\n}\n\n// IDManager options\nidManager {\n  // This is used by IDManagerSparkService to store the data along with generated ids\n  spark {\n    // This introduces a reserved space for custom id generated for loading the data\n    // This is reserved for adding some fixed vertices to the graph, different from the data being loaded,\n    // for example for adding 'similarity' vertices for specific recipes to which the data (alert) vertices\n    // can be connected via edges later.\n    // So in this case, custom ids will be generated from 201 onwards instead of 1\n    reservedIdSpace = 200\n    // The base path where data will be generated. 
Note that partitioning of the original data is maintained\n    dataPath = \"/test/intermediate/base/path\"\n    // Whether to clear IDManager data when running grafink in delete mode to delete data from JanusGraph\n    clearOnDelete = false\n  }\n  // This configuration is used by IDManager backed by HBase\n  hbase {\n    // The table name that stores current id offset\n    tableName = \"IDManagement\"\n    // Column family\n    cf = \"cf1\"\n    // Qualifier / column name\n    qualifier = \"id\"\n  }\n}\n\n// Options specific to data loading job, vertex and edge loaders\njob {\n  // Specifies the schema of the vertices to be loaded\n  schema {\n    // These vertex labels will be created together with the specified properties\n    vertexLabels = [\n          {\n            // Vertex label name\n            name = \"alert\"\n            // Here we can specify any property that we want to create, which is not present in the data\n            properties = []\n            // These columns from data will be converted to vertex properties in the graph\n            propertiesFromData = [\n              \"rfscore\",\n              \"snn_snia_vs_nonia\",\n              \"mulens_class_1\",\n              \"mulens_class_2\",\n              \"cdsxmatch\",\n              \"roid\",\n              \"classtar\",\n              \"objectId\",\n              \"rowkey\",\n              \"candid\",\n              \"jd\",\n              \"magpsf\",\n              \"sigmapsf\"\n            ]\n          },\n          {\n            name = \"similarity\"\n            // So these 2 properties will be created for similarity vertex label\n            properties = [\n              {\n                name = \"recipe\"\n                typ = \"string\"\n              },\n              {\n                name = \"equals\"\n                typ = \"string\"\n              }\n            ]\n            propertiesFromData = []\n          }\n        ]\n    // List of edge labels and their properties 
to be created\n    edgeLabels = [\n      {\n        name = \"similarity\"\n        properties = {\n          key = \"value\"\n          typ = \"int\"\n        }\n      }\n    ]\n    // List of indices to add while loading the schema\n    index {\n      // Adds composite indices, handled by JanusGraph storage backend\n      composite = [\n        {\n          // Name of the index, should be unique\n          name = \"objectIdIndex\"\n          // Vertex property keys to index\n          properties = [\"objectId\"]\n        }\n      ]\n      // Adds mixed indices, handled by JanusGraph indexing backend\n      mixed = [\n        {\n          // Name of the index, should be unique\n          name = \"rfScoreAndcdsx\"\n          // Vertex property keys to index\n          properties = [\"rfscore\", \"cdsxmatch\"]\n        }\n      ]\n      // Adds vertex centric indices, handled by JanusGraph storage backend \n      edge = [\n        {\n          // Name of the index, should be unique\n          name = \"similarityIndex\"\n          // Edge property keys to index, handled as 'RelationType' \n          properties = [\"value\"]\n          // Edge Label for which to create the index\n          label = \"similarity\"\n        }\n      ]\n    }\n  }\n  // VertexLoader batch settings\n  vertexLoader {\n    // Currently not being used, intended to batch the janusgraph vertex loading transactions\n    batchSize = 100\n    // The vertices created from the data will be labelled as this label\n    label = \"alert\"\n    // This config specifies path to a file that describes certain fixed vertices\n    // that will be created before loading the data. 
There is a check in place to make sure these vertices\n    // are not added again on every run.\n    fixedVertices = \"/fixedvertices.csv\"\n  }\n  // EdgeLoader settings\n  edgeLoader = {\n    // Currently not being used, intended to batch the janusgraph edge loading transactions\n    batchSize = 100\n    // Default parallelism for loading edges when the total edges to be loaded per classifier is less than taskSize\n    parallelism = 50\n    taskSize = 25000\n    // This config defines the rules to be applied for creating edges in the graph.\n    // Note that each rule specified here will add a SET of edges to the graph, as described\n    // by the algorithm of the rule (see VertexClassifiers)\n    rulesToApply = [\"twoModeClassifier\", \"sameValueClassifier\"]\n    // Configurations for each of the supported rules that can be applied to generate edges in the graph\n    // See the section on VertexClassifer to read more about these\n    rules {\n      similarityClassifer {\n        similarityExp = \"(rfscore AND snnscore) OR mulens OR cdsxmatch OR objectId OR roid\"\n      }\n      twoModeClassifier {\n        recipes = [\"supernova\", \"microlensing\", \"catalog\", \"asteroids\"]\n      }\n      sameValueClassifier {\n        colsToConnect = [\"objectId\"]\n      }\n    }\n  }\n  // JanusGraph storage settings, currently using hbase as storage backend\n  storage {\n    host: \"127.0.0.1\"\n    port: 8182\n    tableName = \"TestJanusGraph\"\n    // Additional configurations to be passed to hbase when opening connection to it\n    extraConf = [\"zookeeper.recovery.retry=3\", \"hbase.client.retries.number=0\"]\n  }\n  // JanusGraph Indexing Backend settings\n  indexBackend {\n    // We use ElasticSearch here, but other backends like Solr can also be used\n    name = \"elastic\"\n    // Name of the Elasticsearch index to create for mixed indices\n    indexName = \"elastictest\"\n    // Host Port for Elasticsearch\n    host: \"127.0.0.1:9200\"\n  }\n}\n\n// HBase 
client settings, in case we use HBase backed IDManager, not being used currently\nhbase {\n  zookeeper {\n    quoram = \"hbase-1.lal.in2p3.fr\"\n  }\n}\n```\n\n### SchemaLoader\n\nThe schema model is described [here](docs/Schema-Model.md).\nGrafink takes advantage of the bulk-loading feature in JanusGraph, which disables\nthe schema checks normally performed while loading vertices and edges.\nHence it pre-creates the required graph schema.\nThe graph elements supported when creating the schema are:\n- Vertex Labels and Properties\n- Edge Label and Properties\n- Composite Index\n- Mixed Index\n- Vertex-Centric (Edge) Index\n\nGrafink checks whether the schema already exists in the target storage table\nand loads the schema only if needed.\n\n### IDManager\n\nGrafink generates custom ids to load vertices into JanusGraph. ```IDManager``` generates\nthese custom ids and maintains the loaded data along with these ids.\nWhen new data is loaded, it looks up the highest id used so far and assigns new ids starting after\nthat value.\n\n### VertexProcessor\n\nLoads the vertices into JanusGraph using custom ids. Each alert data row is ingested as a vertex.\nThe alert data is processed as a dataframe, and for each partition of the dataframe an embedded\ninstance of JanusGraph is created, so that the partitions are loaded in parallel from the Spark executors.\n\n### EdgeProcessor\n\nThe EdgeProcessor loads edges between the newly generated vertices, as well as between new vertices and vertices already\nin the graph.\nEdgeProcessor can be supplied a list of rules. Each rule generates a ```Dataset``` of ```MakeEdge```,\nwhere each row represents an edge to be added.\nEach of these sets of edges is then converted into JanusGraph edges.\n\nLike the VertexProcessor, EdgeProcessor also loads edge partitions in parallel.\nThe ```edgeLoader.parallelism``` setting controls the number of partitions loaded in parallel\nwhen the number of edges to load is less than ```edgeLoader.taskSize```. 
If there are more edges to load\nthan that, the number of partitions is calculated as ```(number of edges to load / edgeLoader.taskSize) + 1```\n\n### VertexClassifier\n\nVertexClassifiers are rules, each of which creates a set of edges in the graph.\nIn grafink we can configure any number of such rules to add edges to the graph when ingesting data.\n\nSupported classifiers are described in detail in this document: [VertexClassifiers](docs/classifiers/VertexClassifiers.md)\nAny of the supported classifiers can be configured as a rule to be applied to create the corresponding edges in the graph.\n\n### Compiling from source\n\n```\nsbt compile\n```\n\nTo compile against scala 2.11\n\n```\nsbt ++2.11.11 core/compile\n```\n\n### Code format\n\nThis project uses scalafmt to format code. To format the code:\n\n```\nsbt scalafmt        // format sources\nsbt test:scalafmt   // format test sources\nsbt sbt:scalafmt    // format .sbt sources\n```\n\n### Running unit tests\n\n```\nsbt test\n```\n\n### Creating Assembly jar\n\n```\nsbt assembly\n```\n\n### Creating Distribution\n\n```\nsbt core/dist\n```\n\nThe above creates a deployable zip file `grafink-\u003cversion\u003e.zip`. 
The contents of the zip file are:\n\n  - conf/application.conf  // Modify this config file according to the job requirements.\n  - grafink assembly jar   // The main executable jar for running the spark job.\n  - bin/grafink            // The main executable script that the user can invoke to start the job.\n\nFor compiling and packaging against scala 2.11:\n\n```\nsbt ++2.11.11 core/dist\n```\n\n### Running Job\n\nThe Grafink command line accepts the following parameters:\n\n| Parameter | Description | Mandatory | Default value if not specified |\n|-----------|-------------|-----------|--------------------------------|\n|--config|Path to the configuration file|Yes|-|\n|--startdate|Start date for which to run the job in \u003cyyyy-MM-dd\u003e format|No|Yesterday's Date|\n|--duration|Number of days of data for which the job will run, starting from startdate|No|1|\n|--num-executors|Spark config passed along to spark submit|No|-|\n|--driver-memory|Spark config passed along to spark submit|No|-|\n|--executor-memory|Spark config passed along to spark submit|No|-|\n|--executor-cores|Spark config passed along to spark submit|No|-|\n|--total-executor-cores|Spark config passed along to spark submit|No|-|\n|--conf|Spark config|No|-|\n\nTo run locally\n\n```\n./bin/grafink --config conf/application.conf --startdate \u003cyyyy-MM-dd\u003e --duration 1 --num-executors 2 --driver-memory 2g --executor-memory 2g\n```\n\nTo run over Mesos cluster\n\n```\nexport SPARK_MASTER=\"mesos://\u003chost\u003e:\u003cport\u003e\"\n```\n\nThen run\n\n```\n./bin/grafink --config conf/application.conf --startdate \u003cyyyy-MM-dd\u003e --duration \u003c# of days\u003e --driver-memory 2g --executor-memory 3g --conf spark.mesos.principal=\u003cprincipal\u003e --conf spark.mesos.secret=\u003csecret\u003e --conf spark.mesos.role=\u003crole\u003e --conf spark.cores.max=100 --conf spark.executor.cores=2\n```\n\nfor 
eg:\n\n```\n./bin/grafink --config conf/application.conf --startdate 2019-11-01 --duration 1 --driver-memory 2g --executor-memory 3g --conf spark.mesos.principal=lsst --conf spark.mesos.secret=secret --conf spark.mesos.role=lsst --conf spark.cores.max=100 --conf spark.executor.cores=2\n```\n\nNote that by default grafink runs in ```client``` mode, but this is easily modifiable.\n\n## Grafink Shell\n\nGrafink supports querying the loaded data interactively via REPL shell. It is based off Ammonite REPL\nand comes with all the goodness that Ammonite provides out of the box like multi-line editing, syntax\nhighlighting, pretty printing etc.\nNote that Grafink disables autoloading and saving of scripts that is the default mode in Ammonite since\nit is not multi-user friendly, hence all the shell storage is in-memory.\nGrafink adds to Ammonite to provide a preconfigured connection to the desired JanusGraph storage backend\nby simply passing in the same configuration file which was used to load the data into JanusGraph\n\nTo run the shell, simply:\n\n```\n./bin/grafink-shell --config conf/application.conf\n```\n\nHere is a snapshot of the welcome screen\n\n```\n\n  .oooooo.                        .o88o.  o8o              oooo        \n d8P'  `Y8b                       888 `\"  `\"'              `888        \n888           oooo d8b  .oooo.   o888oo  oooo  ooo. .oo.    888  oooo  \n888           `888\"\"8P `P  )88b   888    `888  `888P\"Y88b   888 .8P'   \n888     ooooo  888      .oP\"888   888     888   888   888   888888.    \n`88.    .88'   888     d8(  888   888     888   888   888   888 `88b.  
\n `Y8bood8P'   d888b    `Y888\"\"8o o888o   o888o o888o o888o o888o o888o \n                                                                       \n                                                                       \n                                                                       \nWelcome to Grafink Shell 0.1.0-SNAPSHOT\nJanusGraphConfig available as janusConfig\nJanusGraph available as graph, traversal as g\ngrafink\u003e\n\n```\n\nSome sample command executions\n\n```\ngrafink\u003eval mgmt = graph.openManagement\nmgmt: org.janusgraph.core.schema.JanusGraphManagement = org.janusgraph.graphdb.database.management.ManagementSystem@1c815814\n\ngrafink\u003emgmt.getGraphIndexes(classOf[Vertex]).asScala.toList\nres3: List[org.janusgraph.core.schema.JanusGraphIndex] = List(objectIdIndex, rfScoreAndcdsx)\n\ngrafink\u003eg.V().has(\"objectId\", \"ZTF19acmcetc\").next()\nres4: Vertex = v[256]\n\ngrafink\u003eg.V().count().next()\n632899 [main] WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes\nres0: java.lang.Long = 1046L\n\ngrafink\u003eg.V().outE(\"similarity\").has(\"value\", 2).asScala.toList\n219832 [main] WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. 
For better performance, use indexes\nres11: List[Edge] = List(\n  e[6rzbi8-mbk-6c5-1fk0][28928-similarity-\u003e66816],\n  e[7ghekg-10cg-6c5-2pz4][47104-similarity-\u003e126976],\n  e[5fbzls-1edc-6c5-1rsw][65280-similarity-\u003e82688],\n  e[6rzb40-1fk0-6c5-mbk][66816-similarity-\u003e28928],\n  e[5fbz7k-1rsw-6c5-1edc][82688-similarity-\u003e65280],\n  e[21kfeo-296o-6c5-42kg][105216-similarity-\u003e189952],\n  e[7ghe68-2pz4-6c5-10cg][126976-similarity-\u003e47104],\n  e[21kf0g-42kg-6c5-296o][189952-similarity-\u003e105216]\n)\n\ngrafink\u003eg.V(\"28928\").outE(\"similarity\").has(\"value\", 2).asScala.toList\nres12: List[Edge] = List(e[6rzbi8-mbk-6c5-1fk0][28928-similarity-\u003e66816])\n\ngrafink\u003eval mgmt = graph.openManagement\ngrafink\u003eshow(mgmt.printSchema)\n\"\"\"------------------------------------------------------------------------------------------------\nVertex Label Name              | Partitioned | Static                                             |\n---------------------------------------------------------------------------------------------------\nalert                          | false       | false                                              |\n---------------------------------------------------------------------------------------------------\nEdge Label Name                | Directed    | Unidirected | Multiplicity                         |\n---------------------------------------------------------------------------------------------------\nsimilarity                     | true        | false       | MULTI                                |\n---------------------------------------------------------------------------------------------------\nProperty Key Name              | Cardinality | Data Type                                          |\n---------------------------------------------------------------------------------------------------\nrfscore                        | SINGLE      | class java.lang.Double                             
|\nsnnscore                       | SINGLE      | class java.lang.Double                             |\nmulens_class_1                 | SINGLE      | class java.lang.String                             |\nmulens_class_2                 | SINGLE      | class java.lang.String                             |\ncdsxmatch                      | SINGLE      | class java.lang.String                             |\nroid                           | SINGLE      | class java.lang.Integer                            |\nclasstar                       | SINGLE      | class java.lang.Float                              |\nobjectId                       | SINGLE      | class java.lang.String                             |\nrowkey                         | SINGLE      | class java.lang.String                             |\ncandid                         | SINGLE      | class java.lang.Long                               |\njd                             | SINGLE      | class java.lang.Double                             |\nmagpsf                         | SINGLE      | class java.lang.Float                              |\nsigmapsf                       | SINGLE      | class java.lang.Float                              |\nvalue                          | SINGLE      | class java.lang.Integer                            |\n---------------------------------------------------------------------------------------------------\nVertex Index Name              | Type        | Unique    | Backing        | Key:           Status |\n---------------------------------------------------------------------------------------------------\nobjectIdIndex                  | Composite   | false     | internalindex  | objectId:     ENABLED |\nrowkeyIndex                    | Composite   | false     | internalindex  | rowkey:       ENABLED |\n---------------------------------------------------------------------------------------------------\nEdge Index (VCI) Name          | Type        | Unique    | Backing        | 
Key:           Status |\n---------------------------------------------------------------------------------------------------\n---------------------------------------------------------------------------------------------------\nRelation Index                 | Type        | Direction | Sort Key       | Order    |     Status |\n---------------------------------------------------------------------------------------------------\nsimilarityIndex                | similarity  | BOTH      | value          | asc      |    ENABLED |\n---------------------------------------------------------------------------------------------------\n\"\"\"\n\n```\n\n## Grafink API\nFind detailed information about the API module and supported APIs [here](docs/API.md)\n\n## Credits\nThis project would not have been possible without the outstanding work of the following projects:\n\n- [Ammonite-REPL](http://ammonite.io/): A modernized scala REPL, which provides the base for grafink-shell module.\n- [Apache Spark](https://spark.apache.org/): Unified Analytics Engine for Big Data, which is used for processing the data in grafink.\n- [Banana](https://github.com/yihleego/banana): A FIGlet utility for java, used to render startup logo in grafink-shell.\n- [circe](https://github.com/circe/circe): A JSON parsing library for scala, used by grafink-api module for serialization/deserialization of requests / response.\n- [FastParse](http://www.lihaoyi.com/fastparse/): A scala library for parsing strings into structured data, used by grafink to handle expressions of [SimilarityClassifier](docs/classifiers/VertexClassifiers.md#similarityclassifer).\n- [http4s](https://http4s.org/): A typeful, functional, streaming HTTP library for scala, used by grafink-api module.\n- [JanusGraph](https://janusgraph.org/): Scalable graph database optimized for storing and querying massive amounts of vertices and edges, used as the data sink for grafink jobs.\n- [Pureconfig](https://github.com/pureconfig/pureconfig): A scala 
library for loading configuration files, used by grafink to handle all its configuration.\n- [scala-csv](https://github.com/tototoshi/scala-csv): A scala library for parsing csv files, used by grafink to parse static data for [TwoModeClassifier](docs/classifiers/VertexClassifiers.md#twomodeclassifier).\n- [scopt](https://github.com/scopt/scopt): A scala library for constructing command-line option parsers, used by grafink to handle CLI options for the job.\n- [spark-daria](https://github.com/MrPowers/spark-daria): Spark helper methods, used by grafink for validation checks on dataframe column names.\n- [ZIO](https://zio.dev/): A zero-dependency scala library for asynchronous, concurrent programming, used by grafink for a clean, purely functional codebase.\n\n## Benchmarks\n\nSome benchmarks are documented [here](docs/Benchmarks.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrolabsoftware%2Fgrafink","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrolabsoftware%2Fgrafink","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrolabsoftware%2Fgrafink/lists"}