{"id":23291407,"url":"https://github.com/rothamsted/graphdb-benchmarks","last_synced_at":"2025-08-21T22:32:08.392Z","repository":{"id":49964258,"uuid":"145151215","full_name":"Rothamsted/graphdb-benchmarks","owner":"Rothamsted","description":"Application to benchmark Neo4j+Cypher/Virtuoso+SPARQL/ArcadeDB+Gremlin querying","archived":false,"fork":false,"pushed_at":"2024-06-23T23:05:44.000Z","size":44611,"stargazers_count":7,"open_issues_count":2,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-12-06T19:06:50.939Z","etag":null,"topics":["arcadedb","benchmark","graph-database","gremlin","neo4j","rdf","sparql"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Rothamsted.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-17T17:56:09.000Z","updated_at":"2024-09-26T19:21:29.000Z","dependencies_parsed_at":"2022-09-09T20:30:54.741Z","dependency_job_id":"28902990-3c21-4744-9d7c-9a82d20d0383","html_url":"https://github.com/Rothamsted/graphdb-benchmarks","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rothamsted%2Fgraphdb-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rothamsted%2Fgraphdb-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rothamsted%2Fgraphdb-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/Rothamsted%2Fgraphdb-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Rothamsted","download_url":"https://codeload.github.com/Rothamsted/graphdb-benchmarks/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230537060,"owners_count":18241519,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arcadedb","benchmark","graph-database","gremlin","neo4j","rdf","sparql"],"created_at":"2024-12-20T05:17:02.845Z","updated_at":"2024-12-20T05:17:03.422Z","avatar_url":"https://github.com/Rothamsted.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Managing FAIR Knowledge Graphs as Polyglot Data End Points: A Benchmark based on the rdf2pg Framework and Plant Biology Data   \n \nThis repository contains code to benchmark three different graph databases and graph query languages, against \nplant biology datasets, which are conceptually aligned (based on the same data model) in the different database/language flavours.\n\nThis is work by the [KnetMiner team][I4] and [Carlos Bobed][I5].\n\nThe alignment is produced by means of the [rdf2pg framework][I10], and this work contributes to assess the benefits of managing data in multiple data languages and formats, by means of our rdf2pg tools.\n\nThis work is an extension of [previous work by the KnetMiner team][I20], which we presented at [SWAT4LS 2018][I30] ([old presentation here][I40]).\n\n[I4]: https://knetminer.com\n[I5]: 
https://scholar.google.com/citations?user=ycIA_f4AAAAJ\n[I10]: https://github.com/Rothamsted/rdf2pg\n[I20]: /Rothamsted/graphdb-benchmarks/releases/tag/swat4ls18\n[I30]: https://figshare.com/articles/Getting_the_best_of_Linked_Data_and_Property_Graphs_rdf2neo_and_the_KnetMiner_Use_Case/7314323\n[I40]: https://www.slideshare.net/mbrandizi/swat4l-2018brandizi\n\n## Test settings\n\nWe have tested three combinations of graph database, graph query language, data formats:\n\n1. SPARQL on the Virtuoso triple store, dealing with RDF data (and the corresponding model).\n1. Cypher on the Neo4j graph database, with data directly imported into the database from our rdf2neo tool.\n1. Gremlin on ArcadeDB, with data imported from files in graphML format.\n\nDetails on the test settings used are in the [dataset loading results report][TS10].\n\n## Test datasets\n\nFor each of the graph databases mentioned above, we have tested the loading and the query performance of three datasets:\n\n1. Biopax: a small dataset, mostly containing data about the Arabidopsis model organisms, including \n   pathways from [AraCyc][DS10] and gene annotations from [Gene Ontology][DS20].\n1. Arabidopsis: a medium-size dataset, containing more data about Arabidopsis, including AraCyc, Gene Ontology, gene annotations from [ENSEMBL Plants][DS30] and [TAIR][DS35], protein annotations from [UniProt][DS40], scientific publications from [PubMed][DS45].\n1. Poaceae: a large dataset with integrated data about different cereals (wheat, rice and barley), obtained from a variety of sources, including the ones mentioned above, plus genome-wide study data from [AraGWAS][DS50] and more. 
Partial access to this dataset is available via [KnetMiner programmatic data access endpoints][DS60].\n\n[DS10]: https://academic.oup.com/plphys/article/132/2/453/6111635?login=false\n[DS20]: https://academic.oup.com/nar/article/32/suppl_1/D258/2505186?login=false\n[DS30]: https://link.springer.com/protocol/10.1007/978-1-4939-3167-5_6\n[DS35]: https://doi.org/10.1093/nar/gkm965\n[DS40]: https://academic.oup.com/nar/article/43/D1/D204/2439939?login=false\n[DS45]: https://doi.org/10.1073/pnas.98.2.381\n[DS50]: https://doi.org/10.1093/nar/gkx954\n[DS60]: https://knetminer.com/data\n\n### Data schematisation\n\nThe figure below shows the main types contained in each dataset:\n\n[\u003cimg src = 'results/knet-schema-ex.png' width = '70%' /\u003e](results/knet-schema-ex.png)\n\nThis model was encoded based on [BioKNO, an application ontology][TS20], defined within the KnetMiner platform, to represent the data we deal with in the KnetMiner platform. It models common plant biology entities, some specific patterns used by KnetMiner applications and mappings to existing biology ontologies and life science standards.\n\n[TS10]: results/loading-results.ipynb\n[TS20]: https://github.com/Rothamsted/bioknet-onto\n\n## Test approach\n\nWe have done two types of tests: \n\n### Data Loading tests\n\n[Loading tests][TS10], where we tested the time taken to populate each database with each of the tested datasets. See the linked report for details.\n\n### Querying tests\n\nAfter loading each dataset, we performed [querying tests][TA10], where, for each dataset, we tested all of the chosen databases and query languages, each time timing the same set of queries. 
More precisely, for each of the tested query languages, we wrote conceptually equivalent queries.\n\nWhile \"conceptually equivalent\" is difficult to define precisely, informally, it means the best effort to search for data that have the same semantics and equivalent representations in the different technologies and formats being tested. It also means writing queries that, across different technologies, present similar levels of complexity and search engine challenges.\n\nFor example, where it is easy for Neo4j to return a node property or an empty value (because they are attached to the nodes), we have translated this as OPTIONAL matches in SPARQL (since looking for a resource property is a triple pattern like any other). \n\n[TA10]: results/querying-results.ipynb\n\n\n## Test results\n\nThe (Jupyter-based) reports linked above have more test details and detailed results.\n\n### TODO: Updates about ArcadeDB\n\n* We have started testing ArcadeDB with its SQL dialect, using the same datasets and the same queries. [This is a preliminary result](querying-results-arcade-sql.ipynb), work to be continued.\n\n\n## Query List\n\nLike the data, the queries listed below are based on the already-mentioned [BioKNO ontology][TS20].\nWe have split the benchmark queries into categories that take into account both the query semantics and the kind of challenge they pose to the query engines.  \n\nRegarding the semantic motif queries, these produce patterns that occur often in KnetMiner, when we want to associate genes to relevant other entities (such as encoded proteins, biological processes, publications about genes or processes). In practice, a semantic motif query is a 'chain' pattern: it tries to follow a linear path from a gene to another entity, through a known chain of relations (eg, Gene -\u003e encodes -\u003e Protein -\u003e participates -\u003e Process -\u003e mentioned -\u003e Publication). 
Details in the [KnetMiner Wiki][QL10] and in the [KnetMiner paper][QL20]\n \n\n[QL10]: https://github.com/Rothamsted/knetminer/wiki/Semantic-Motif-Searching-in-Knetminer\n[QL20]: https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583\n[QL100]: https://github.com/Rothamsted/graphdb-benchmarks/blob/master/src/test/java/uk/ac/rothamsted/rdf/benchmarks/QueryListTest.java\n\n**WARNING**: *do not edit what follows! It is automatically generated via [this code][QL100].*\n\n### Category: counts\n\nCommon counts of elements like number of nodes, number of relations, etc.\n\n1. **cnt**: Counts instances, [SPARQL](src/main/assembly/resources/sparql/0010_cnt.sparql), [Cypher](src/main/assembly/resources/cypher/0010_cnt.cypher), [Gremlin](src/main/assembly/resources/gremlin/0010_cnt.gremlin)\n1. **cntType**: Instances of a given type, [SPARQL](src/main/assembly/resources/sparql/0020_cntType.sparql), [Cypher](src/main/assembly/resources/cypher/0020_cntType.cypher), [Gremlin](src/main/assembly/resources/gremlin/0020_cntType.gremlin)\n1. **cntRel**: Count relations, [SPARQL](src/main/assembly/resources/sparql/0030_cntRel.sparql), [Cypher](src/main/assembly/resources/cypher/0030_cntRel.cypher), [Gremlin](src/main/assembly/resources/gremlin/0030_cntRel.gremlin)\n1. **cntRelType**: Count relations of a given type, [SPARQL](src/main/assembly/resources/sparql/0040_cntRelType.sparql), [Cypher](src/main/assembly/resources/cypher/0040_cntRelType.cypher), [Gremlin](src/main/assembly/resources/gremlin/0040_cntRelType.gremlin)\n\n\n### Category: selects\n\nQueries that selects elements, including simple joins.\n\n1. **sel**: Select entity and properties, [SPARQL](src/main/assembly/resources/sparql/0050_sel.sparql), [Cypher](src/main/assembly/resources/cypher/0050_sel.cypher), [Gremlin](src/main/assembly/resources/gremlin/0050_sel.gremlin)\n1. 
**join**: Simple Join, [SPARQL](src/main/assembly/resources/sparql/0060_join.sparql), [Cypher](src/main/assembly/resources/cypher/0060_join.cypher), [Gremlin](src/main/assembly/resources/gremlin/0060_join.gremlin)\n1. **joinRel**: Join literal properties of reified relations, [SPARQL](src/main/assembly/resources/sparql/0070_joinRel.sparql), [Cypher](src/main/assembly/resources/cypher/0070_joinRel.cypher), [Gremlin](src/main/assembly/resources/gremlin/0070_joinRel.gremlin)\n1. **joinFilter**: Simple join + attribute filter, [SPARQL](src/main/assembly/resources/sparql/0080_joinFilter.sparql), [Cypher](src/main/assembly/resources/cypher/0080_joinFilter.cypher), [Gremlin](src/main/assembly/resources/gremlin/0080_joinFilter.gremlin)\n1. **joinRe**: Simple join + regex search, [SPARQL](src/main/assembly/resources/sparql/0090_joinRe.sparql), [Cypher](src/main/assembly/resources/cypher/0090_joinRe.cypher), [Gremlin](src/main/assembly/resources/gremlin/0090_joinRe.gremlin)\n1. **joinReif**: Join through relation property, [SPARQL](src/main/assembly/resources/sparql/0095_joinReif.sparql), [Cypher](src/main/assembly/resources/cypher/0095_joinReif.cypher), [Gremlin](src/main/assembly/resources/gremlin/0095_joinReif.gremlin)\n\n\n### Category: unions\n\nQueries that perform graph pattern and subquery unions.\n\n1. **2union**: 2 unions, no nesting, [SPARQL](src/main/assembly/resources/sparql/0120_2union.sparql), [Cypher](src/main/assembly/resources/cypher/0120_2union.cypher), [Gremlin](src/main/assembly/resources/gremlin/0120_2union.gremlin)\n1. **2union1Nest**: 2 unions, 1 nesting, [SPARQL](src/main/assembly/resources/sparql/0130_2union1Nest.sparql), [Cypher](src/main/assembly/resources/cypher/0130_2union1Nest.cypher), [Gremlin](src/main/assembly/resources/gremlin/0130_2union1Nest.gremlin)\n1. 
**2union1Nest+**: 2 unions, 1 nesting (with Cypher CALL), [SPARQL](src/main/assembly/resources/sparql/0135_2union1Nest+.sparql), [Cypher](src/main/assembly/resources/cypher/0135_2union1Nest+.cypher), [Gremlin](src/main/assembly/resources/gremlin/0135_2union1Nest+.gremlin)\n1. **pway**: Complex union of paths over pathways, [SPARQL](src/main/assembly/resources/sparql/0140_pway.sparql), [Cypher](src/main/assembly/resources/cypher/0140_pway.cypher), [Gremlin](src/main/assembly/resources/gremlin/0140_pway.gremlin)\n1. **exist**: Not exists, [SPARQL](src/main/assembly/resources/sparql/0200_exist.sparql), [Cypher](src/main/assembly/resources/cypher/0200_exist.cypher), [Gremlin](src/main/assembly/resources/gremlin/0200_exist.gremlin)\n1. **existAg**: Not exists + aggregation, [SPARQL](src/main/assembly/resources/sparql/0210_existAg.sparql), [Cypher](src/main/assembly/resources/cypher/0210_existAg.cypher), [Gremlin](src/main/assembly/resources/gremlin/0210_existAg.gremlin)\n\n\n### Category: aggregation\n\nQueries that perform data grouping and aggregations.\n\n1. **grp**: Group by, [SPARQL](src/main/assembly/resources/sparql/0150_grp.sparql), [Cypher](src/main/assembly/resources/cypher/0150_grp.cypher), [Gremlin](src/main/assembly/resources/gremlin/0150_grp.gremlin)\n1. **grpAg**: Group by + 2 aggregation functions, [SPARQL](src/main/assembly/resources/sparql/0170_grpAg.sparql), [Cypher](src/main/assembly/resources/cypher/0170_grpAg.cypher), [Gremlin](src/main/assembly/resources/gremlin/0170_grpAg.gremlin)\n1. **mulGrpAg**: Multiple subqueries having aggregations , [SPARQL](src/main/assembly/resources/sparql/0180_mulGrpAg.sparql), [Cypher](src/main/assembly/resources/cypher/0180_mulGrpAg.cypher), [Gremlin](src/main/assembly/resources/gremlin/0180_mulGrpAg.gremlin)\n1. 
**nestAg**: Nested and outer aggregations (see Q6 from the [Berlin benchmark](https://goo.gl/v4YbQ2)), [SPARQL](src/main/assembly/resources/sparql/0190_nestAg.sparql), [Cypher](src/main/assembly/resources/cypher/0190_nestAg.cypher), [Gremlin](src/main/assembly/resources/gremlin/0190_nestAg.gremlin)\n\n\n### Category: paths\n\nQueries that select and traverse paths.\n\n1. **varPathC**: Variable path query (fixed len), [SPARQL](src/main/assembly/resources/sparql/0100_varPathC.sparql), [Cypher](src/main/assembly/resources/cypher/0100_varPathC.cypher), [Gremlin](src/main/assembly/resources/gremlin/0100_varPathC.gremlin)\n1. **varPath**: Variable path query (unbound len and restricted on top), [SPARQL](src/main/assembly/resources/sparql/0110_varPath.sparql), [Cypher](src/main/assembly/resources/cypher/0110_varPath.cypher), [Gremlin](src/main/assembly/resources/gremlin/0110_varPath.gremlin)\n1. **shrtSmf**: Short Semantic Motif, [SPARQL](src/main/assembly/resources/sparql/250_shrtSmf.sparql), [Cypher](src/main/assembly/resources/cypher/250_shrtSmf.cypher), [Gremlin](src/main/assembly/resources/gremlin/250_shrtSmf.gremlin)\n1. **medSmf**: Medium length Semantic Motif, [SPARQL](src/main/assembly/resources/sparql/260_medSmf.sparql), [Cypher](src/main/assembly/resources/cypher/260_medSmf.cypher), [Gremlin](src/main/assembly/resources/gremlin/260_medSmf.gremlin)\n1. **lngSmf**: Long and Complex Semantic Motif, [SPARQL](src/main/assembly/resources/sparql/270_lngSmf.sparql), [Cypher](src/main/assembly/resources/cypher/270_lngSmf.cypher), [Gremlin](src/main/assembly/resources/gremlin/270_lngSmf.gremlin)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frothamsted%2Fgraphdb-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frothamsted%2Fgraphdb-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frothamsted%2Fgraphdb-benchmarks/lists"}