{"id":13529493,"url":"https://github.com/dbs-leipzig/gradoop","last_synced_at":"2026-02-21T01:02:27.559Z","repository":{"id":22973470,"uuid":"26323526","full_name":"dbs-leipzig/gradoop","owner":"dbs-leipzig","description":"Distributed Temporal Graph Analytics with Apache Flink","archived":false,"fork":false,"pushed_at":"2024-04-07T23:31:01.000Z","size":772122,"stargazers_count":242,"open_issues_count":83,"forks_count":90,"subscribers_count":26,"default_branch":"develop","last_synced_at":"2024-04-08T01:01:49.449Z","etag":null,"topics":["apache-flink","distributed-graph-analytics","graph","graph-mining","pattern-matching","property-graph","temporal-graph"],"latest_commit_sha":null,"homepage":"https://github.com/dbs-leipzig/gradoop","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dbs-leipzig.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2014-11-07T14:42:45.000Z","updated_at":"2024-04-15T01:35:29.580Z","dependencies_parsed_at":"2023-09-25T02:44:00.020Z","dependency_job_id":"81b01add-afe2-468d-8a25-d6a20fb3ac2e","html_url":"https://github.com/dbs-leipzig/gradoop","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbs-leipzig%2Fgradoop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbs-leipzig%2Fgradoop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbs-leipzig%2Fgradoop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbs-leip
zig%2Fgradoop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dbs-leipzig","download_url":"https://codeload.github.com/dbs-leipzig/gradoop/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246670547,"owners_count":20815003,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-flink","distributed-graph-analytics","graph","graph-mining","pattern-matching","property-graph","temporal-graph"],"created_at":"2024-08-01T07:00:36.828Z","updated_at":"2025-10-20T15:50:04.102Z","avatar_url":"https://github.com/dbs-leipzig.png","language":"Java","funding_links":[],"categories":["Infrastructure","Query engines","Projects"],"sub_categories":["Graph Computing Frameworks","Frameworks"],"readme":"[![Apache License, Version 2.0, January 2004](https://img.shields.io/github/license/apache/maven.svg?label=License)](https://www.apache.org/licenses/LICENSE-2.0)\n[![Maven Central](https://img.shields.io/badge/Maven_Central-0.6.0-blue.svg?label=Maven%20Central)](http://search.maven.org/#search%7Cga%7C1%7Cgradoop)\n[![Build Status](https://github.com/dbs-leipzig/gradoop/workflows/Java%20CI/badge.svg)](https://github.com/dbs-leipzig/gradoop/actions?workflow=Java+CI)\n[![Code Quality: Java](https://img.shields.io/lgtm/grade/java/g/dbs-leipzig/gradoop.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/dbs-leipzig/gradoop/context:java)\n[![Total 
Alerts](https://img.shields.io/lgtm/alerts/g/dbs-leipzig/gradoop.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/dbs-leipzig/gradoop/alerts)\n\n## Gradoop: Distributed Graph Analytics on Hadoop\n\nGradoop is an open source (ALv2) research framework for scalable \ngraph analytics built on top of [Apache Flink](http://flink.apache.org/). It offers a graph data model which \nextends the widespread [property graph model](https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model) \nwith the concept of logical graphs and further provides operators that can be applied \nto single logical graphs and collections of logical graphs. The combination of these \noperators allows the flexible, declarative definition of graph analytical workflows.\nGradoop can be easily integrated into a workflow which already uses Flink\u0026reg; operators\nand Flink\u0026reg; libraries (i.e. Gelly, ML and Table).\n\nGradoop is **work in progress**, which means APIs may change. It is currently used\nas a proof-of-concept implementation and is far from production ready.\n\nThe project's documentation can be found in our [Wiki](https://github.com/dbs-leipzig/gradoop/wiki). 
\nThe Wiki also contains a [tutorial](https://github.com/dbs-leipzig/gradoop/wiki/Getting-started) to \nhelp you get started with Gradoop.\n\n##### Further Information (articles and talks)\n\n* [Distributed temporal graph analytics with GRADOOP, VLDB Journal, May 2021](https://dbs.uni-leipzig.de/file/Rost2021_Article_DistributedTemporalGraphAnalyt.pdf)\n* [Exploration and Analysis of Temporal Property Graphs, EDBT Demo, March 2021](https://dbs.uni-leipzig.de/file/EDBT_DEMO_Rost_2021_published.pdf)\n* [Graph Sampling with Distributed In-Memory Dataflow Systems, BTW, March 2021](https://dbs.uni-leipzig.de/file/A3-21.pdf)\n* [Evolution Analysis of Large Graphs with Gradoop, ECML PKDD LEG Workshop, September 2019](https://dbs.uni-leipzig.de/file/LEGECML-PKDD_2019_paper_9.pdf)\n* [Gradoop @Gridka Keynote Distributed Graph Analytics, August 2019](https://indico.scc.kit.edu/event/460/contributions/5772/attachments/2873/4171/gradoop_gridka19.pdf)\n* [Temporal Graph Analysis using Gradoop, BTW 2019-Workshopband, March 2019](https://dl.gi.de/bitstream/handle/20.500.12116/21797/C2-1.pdf)\n* [Declarative and distributed graph analytics with GRADOOP, VLDB Demo, August 2018](http://www.vldb.org/pvldb/vol11/p2006-junghanns.pdf)\n* [Cypher-based Graph Pattern Matching in Apache Flink, FlinkForward, September 2017](https://youtu.be/dZ8_v_P1j98)\n* [Cypher-based Graph Pattern Matching in GRADOOP, SIGMOD GRADES Workshop, May 2017](https://dbs.uni-leipzig.de/file/GRADES17_Cypher_in_Gradoop.pdf)\n* [DIMSpan - Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems, arXiv, March 2017](https://arxiv.org/pdf/1703.01910.pdf)\n* [Distributed Grouping of Property Graphs with GRADOOP, BTW Conf., March 2017](http://dbs.uni-leipzig.de/file/BTW17_Grouping_Research.pdf)\n* [Graph Mining for Complex Data Analytics, ICDM Demo, December 2016](http://dbs.uni-leipzig.de/file/Graph_Mining_for_Complex_Data_Analytics.pdf)\n* [[german] Graph Mining für Business Intelligence, 
data2day, October 2016](http://www.slideshare.net/s1ck/gut-vernetzt-skalierbares-graph-mining-fr-business-intelligence)\n* [[german] Verteilte Graphanalyse mit Gradoop, JavaSPEKTRUM, October 2016](http://www.sigs-datacom.de/uploads/tx_dmjournals/junghans_petermann_JS_05_16_eeNZ.pdf)\n* [Extended Property Graphs with Apache Flink, SIGMOD NDA Workshop, June 2016](http://dbs.uni-leipzig.de/file/EPGM.pdf)\n* [Gradoop @Flink/Neo4j Meetup Berlin, March 2016](http://www.slideshare.net/s1ck/gradoop-scalable-graph-analytics-with-apache-flink-flink-neo4j-meetup-berlin)\n* [Gradoop @FOSDEM GraphDevroom, January 2016](https://fosdem.org/2016/schedule/event/graph_processing_gradoop_flink_analytics)\n* [Gradoop @FlinkForward, September 2015](http://www.slideshare.net/FlinkForward/martin-junghans-gradoop-scalable-graph-analytics-with-apache-flink) ([YouTube](https://youtu.be/WmP9xB_sG2o?list=PLDX4T_cnKjD31JeWR1aMOi9LXPRQ6nyHO))\n\n## Data Model\n\nIn the extended property graph model (EPGM), a database consists of multiple \nproperty graphs which are called logical graphs. These graphs describe\napplication-specific subsets of vertices and edges, i.e. a vertex or an edge can\nbe contained in multiple logical graphs. Additionally, not only vertices and edges \nbut also logical graphs have a type label and can have different properties.\n\nData Model elements (logical graphs, vertices and edges) have a unique identifier, \na single label (e.g. User) and a number of key-value properties (e.g. name = Alice).\nThere is no schema involved, meaning each element can have an arbitrary number of\nproperties even if they have the same label.\n\n### Graph operators\n\nThe EPGM provides operators for both single logical graphs as well as collections \nof logical graphs; operators may also return single graphs or graph collections. 
\nAn overview and detailed descriptions of the implemented operators can be found in the [Gradoop Wiki](https://github.com/dbs-leipzig/gradoop/wiki/List-of-Operators).\n\n## Setup\n\n### Use gradoop via Maven\n\n* Add one of the following dependencies to your maven project\n\nStable:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.gradoop\u003c/groupId\u003e\n    \u003cartifactId\u003egradoop-flink\u003c/artifactId\u003e\n    \u003cversion\u003e0.6.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nLatest weekly build (additional repository is required):\n```xml\n\u003crepositories\u003e\n    \u003crepository\u003e\n        \u003cid\u003eoss.sonatype.org-snapshot\u003c/id\u003e\n        \u003curl\u003ehttps://oss.sonatype.org/content/repositories/snapshots\u003c/url\u003e\n        \u003creleases\u003e\u003cenabled\u003efalse\u003c/enabled\u003e\u003c/releases\u003e\n        \u003csnapshots\u003e\u003cenabled\u003etrue\u003c/enabled\u003e\u003c/snapshots\u003e\n    \u003c/repository\u003e\n\u003c/repositories\u003e\n```\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.gradoop\u003c/groupId\u003e\n    \u003cartifactId\u003egradoop-flink\u003c/artifactId\u003e\n    \u003cversion\u003e0.7.0-SNAPSHOT\u003c/version\u003e\n\u003c/dependency\u003e\n\n```\nIn any case you also need Apache Flink (version 1.9.3):\n```xml\n\u003cdependencies\u003e\n    \u003cdependency\u003e\n        \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n        \u003cartifactId\u003eflink-java\u003c/artifactId\u003e\n        \u003cversion\u003e1.9.3\u003c/version\u003e\n    \u003c/dependency\u003e\n    \u003cdependency\u003e\n        \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n        \u003cartifactId\u003eflink-clients_2.11\u003c/artifactId\u003e\n        \u003cversion\u003e1.9.3\u003c/version\u003e\n    \u003c/dependency\u003e\n\u003c/dependencies\u003e\n```\n\n### Build gradoop from source\n\n* Gradoop requires Java 8\n* Clone Gradoop into your 
local file system\n\n    \u003e git clone https://github.com/dbs-leipzig/gradoop.git\n    \n* Build and execute tests\n\n    \u003e cd gradoop\n    \n    \u003e mvn clean install\n\n* You might want to skip tests for faster builds. Also, some tests fail on Windows due to missing test dependencies\n\n    \u003e mvn clean install -DskipTests\n\n### Windows\n\n* Some operators require the Hadoop winutils\n\n## Gradoop modules\n\n### gradoop-common\n\nThe main contents of this module are the EPGM data model and a corresponding POJO \nimplementation which is used in Flink\u0026reg;. The persistent representation of the EPGM\nis also contained in gradoop-common, together with its mapping to HBase\u0026trade;.\n\n### gradoop-data-integration\n\nProvides functionality to support graph data integration.\nThis includes minimal CSV and JSON importers as well as graph transformation operators\n(e.g. connect neighbors or conversion of edges to vertices and vice versa).\n\n### gradoop-accumulo\n\nInput and output formats for reading graph collections from and writing them to [Apache Accumulo\u0026reg;](https://accumulo.apache.org/).\n\n### gradoop-hbase\n\nInput and output formats for reading graph collections from and writing them to [Apache HBase\u0026trade;](https://hbase.apache.org/).\n\n### gradoop-flink\n\nThis module contains reference implementations of the EPGM operators. The \nEPGM is mapped to Flink\u0026reg; DataSets while the operators are implemented\nusing DataSet transformations. The module also contains implementations of \ngeneral graph algorithms (e.g. Label Propagation, Frequent Subgraph Mining)\nadapted to be used with the EPGM.\n\n### gradoop-temporal\n\nThis module contains a reference implementation of the Temporal Property Graph Model (TPGM) and\nits operators, which are used to perform graph analysis with respect to the additional time dimension in real-world graphs.\n\n### gradoop-examples\n\nContains example pipelines showing use cases for Gradoop. 
\n\n*   Graph grouping example (build structural aggregates of property graphs)\n*   Social network examples (composition of multiple operators to analyze social network graphs)\n*   Input/Output examples (usage of DataSource and DataSink implementations)\n\n### gradoop-checkstyle\n\nUsed to maintain the code style for the whole project.\n\n## Related Repositories\n\n### [Gradoop Tutorial](https://github.com/dbs-leipzig/gradoop-tutorial)\n\nGradoop tutorial that was presented at the [BOSS'20](https://boss-workshop.github.io/boss-2020/) workshop of the VLDB 2020 international conference.\n\n### [Gradoop Benchmarks](https://github.com/dbs-leipzig/gradoop-benchmarks)\n\nThis repository contains sets of Gradoop operator benchmarks designed to run on a cluster to measure\nscalability and speedup of the operators.\n\n### [Gradoop Demo](https://github.com/dbs-leipzig/gradoop_demo)\n\nDemo application to show the functionalities of the grouping and query operators in an interactive web UI.\n\n\n### [Temporal Graph Explorer](https://github.com/dbs-leipzig/temporal_graph_explorer)\n\nGradoop Temporal Graph Explorer demo that showcases some operators of the Temporal Property Graph Model.\n\n### [Gradoop GDL](https://github.com/dbs-leipzig/gdl)\n\nThis repository contains the definition of our Temporal Graph Definition Language (Temporal-GDL).\n\n### Version History\n\nSee the [Changelog](https://github.com/dbs-leipzig/gradoop/wiki/Changelog) in the Wiki. 
\n\n### Disclaimer\n\nApache\u0026reg;, Apache Accumulo\u0026reg;, Apache Flink, Flink\u0026reg;, Apache HBase\u0026trade; and \nHBase\u0026trade; are either registered trademarks or trademarks of the Apache Software Foundation \nin the United States and/or other countries.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbs-leipzig%2Fgradoop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdbs-leipzig%2Fgradoop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbs-leipzig%2Fgradoop/lists"}