{"id":13795122,"url":"https://github.com/graphframes/graphframes","last_synced_at":"2026-04-02T22:05:56.231Z","repository":{"id":41547106,"uuid":"50067430","full_name":"graphframes/graphframes","owner":"graphframes","description":"GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs","archived":false,"fork":false,"pushed_at":"2025-04-11T04:28:54.000Z","size":7967,"stargazers_count":1039,"open_issues_count":170,"forks_count":247,"subscribers_count":55,"default_branch":"master","last_synced_at":"2025-04-11T04:57:44.800Z","etag":null,"topics":["apache-spark","big-data","connected-components","dataframe","dataframes","graphs","network-motif","network-motifs","networks","spark"],"latest_commit_sha":null,"homepage":"http://graphframes.github.io/graphframes","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graphframes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-01-20T23:17:56.000Z","updated_at":"2025-04-11T04:28:58.000Z","dependencies_parsed_at":"2022-08-10T02:45:40.208Z","dependency_job_id":"7144f4d1-5d12-4dc0-8f3a-73f6b4e1c0ef","html_url":"https://github.com/graphframes/graphframes","commit_stats":{"total_commits":364,"total_committers":32,"mean_commits":11.375,"dds":0.7197802197802198,"last_synced_commit":"0b4df70038a4f0ff3b4544223089084fc8742da7"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphframes%2Fgraphframes","tags_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/graphframes%2Fgraphframes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphframes%2Fgraphframes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphframes%2Fgraphframes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graphframes","download_url":"https://codeload.github.com/graphframes/graphframes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248345273,"owners_count":21088244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","big-data","connected-components","dataframe","dataframes","graphs","network-motif","network-motifs","networks","spark"],"created_at":"2024-08-03T23:00:52.378Z","updated_at":"2026-04-02T22:05:56.166Z","avatar_url":"https://github.com/graphframes.png","language":"Scala","readme":"\u003cimg src=\"docs/img/GraphFrames-Logo-Large.png\" alt=\"GraphFrames Logo\" width=\"400\"/\u003e\n\n[![Scala CI](https://github.com/graphframes/graphframes/actions/workflows/scala-ci.yml/badge.svg)](https://github.com/graphframes/graphframes/actions/workflows/scala-ci.yml)\n[![Python CI](https://github.com/graphframes/graphframes/actions/workflows/python-ci.yml/badge.svg)](https://github.com/graphframes/graphframes/actions/workflows/python-ci.yml)\n[![pages-build-deployment](https://github.com/graphframes/graphframes/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/graphframes/graphframes/actions/workflows/pages/pages-build-deployment)\n\n# 
GraphFrames: DataFrame-based Graphs\n\nThis is a package for graph processing and analytics at scale. It is built on top of Apache Spark and relies on the DataFrame abstraction. Users can write highly expressive queries by leveraging the DataFrame API, combined with a new API for network motif finding. The user also benefits from DataFrame performance optimizations within the Spark SQL engine. GraphFrames works in Java, Scala, and Python.\n\nYou can find the user guide and API docs at https://graphframes.github.io/graphframes\n\n## GraphFrames is Back!\n\nThis project was in maintenance mode for some time, but we are happy to announce that it is now back in active development! We are working on a new release with many bug fixes and improvements. We are also working on a new website and documentation.\n\n## Installation and Quick-Start\n\nThe easiest way to start using GraphFrames is through the [Spark Packages system](https://spark-packages.org/package/graphframes/graphframes). Run one of the following commands:\n\n```bash\n# Interactive Scala/Java\n$ spark-shell --packages graphframes:graphframes:0.8.4-spark3.5-s_2.12\n\n# Interactive Python\n$ pyspark --packages graphframes:graphframes:0.8.4-spark3.5-s_2.12\n\n# Submit a script in Scala/Java/Python\n$ spark-submit --packages graphframes:graphframes:0.8.4-spark3.5-s_2.12 script.py\n```\n\nNow you can create a GraphFrame as follows.\n\nIn Python:\n\n```python\nfrom pyspark.sql import SparkSession\nfrom graphframes import GraphFrame\n\nspark = SparkSession.builder.getOrCreate()\n\nnodes = [\n    (1, \"Alice\", 30),\n    (2, \"Bob\", 25),\n    (3, \"Charlie\", 35)\n]\nnodes_df = spark.createDataFrame(nodes, [\"id\", \"name\", \"age\"])\n\nedges = [\n    (1, 2, \"friend\"),\n    (2, 1, \"friend\"),\n    (2, 3, \"friend\"),\n    (3, 2, \"enemy\")  # eek!\n]\nedges_df = spark.createDataFrame(edges, [\"src\", \"dst\", \"relationship\"])\n\ng = GraphFrame(nodes_df, edges_df)\n```\n\nNow let's run some graph algorithms at 
scale!\n\n```python\ng.inDegrees.show()\n\n# +---+--------+\n# | id|inDegree|\n# +---+--------+\n# |  2|       2|\n# |  1|       1|\n# |  3|       1|\n# +---+--------+\n\ng.outDegrees.show()\n\n# +---+---------+\n# | id|outDegree|\n# +---+---------+\n# |  1|        1|\n# |  2|        2|\n# |  3|        1|\n# +---+---------+\n\ng.degrees.show()\n\n# +---+------+\n# | id|degree|\n# +---+------+\n# |  1|     2|\n# |  2|     4|\n# |  3|     2|\n# +---+------+\n\ng2 = g.pageRank(resetProbability=0.15, tol=0.01)\ng2.vertices.show()\n\n# +---+-------+---+------------------+\n# | id|   name|age|          pagerank|\n# +---+-------+---+------------------+\n# |  1|  Alice| 30|0.7758750474847483|\n# |  2|    Bob| 25|1.4482499050305027|\n# |  3|Charlie| 35|0.7758750474847483|\n# +---+-------+---+------------------+\n\n# GraphFrames' most used feature...\n# Connected components can do big data entity resolution on billions or even trillions of records!\n# First connect records with a similarity metric, then run connectedComponents.\n# This gives you groups of identical records, which you then link by same_as edges or merge into list-based master records.\nspark.sparkContext.setCheckpointDir(\"/tmp/graphframes-example-connected-components\")  # required by GraphFrames.connectedComponents\ng.connectedComponents().show()\n\n# +---+-------+---+---------+\n# | id|   name|age|component|\n# +---+-------+---+---------+\n# |  1|  Alice| 30|        1|\n# |  2|    Bob| 25|        1|\n# |  3|Charlie| 35|        1|\n# +---+-------+---+---------+\n\n# Find frenemies with network motif finding! 
See how graph and relational queries are combined?\n(\n    g.find(\"(a)-[e]-\u003e(b); (b)-[e2]-\u003e(a)\")\n    .filter(\"e.relationship = 'friend' and e2.relationship = 'enemy'\")\n    .show()\n)\n\n# These are paths, which you can aggregate and count to find complex patterns.\n# +------------+--------------+----------------+-------------+\n# |           a|             e|               b|           e2|\n# +------------+--------------+----------------+-------------+\n# |{2, Bob, 25}|{2, 3, friend}|{3, Charlie, 35}|{3, 2, enemy}|\n# +------------+--------------+----------------+-------------+\n```\n\n## Learn GraphFrames\n\nTo learn more about GraphFrames, check out these resources:\n* [GraphFrames Documentation](https://graphframes.github.io/graphframes)\n* [GraphFrames Network Motif Finding Tutorial](https://graphframes.github.io/graphframes/docs/_site/motif-tutorial.html)\n* [Introducing GraphFrames](https://databricks.com/blog/2016/03/03/introducing-graphframes.html)\n* [On-Time Flight Performance with GraphFrames for Apache Spark](https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-graphframes-for-apache-spark.html)\n\n## Community Resources\n\n* [GraphFrames Google Group](https://groups.google.com/forum/#!forum/graphframes)\n* [#graphframes Discord Channel on GraphGeeks](https://discord.com/channels/1162999022819225631/1326257052368113674)\n\n## `graphframes-py` is our Official PyPI Package\n\nWe recommend using the Spark Packages system to install the latest version of GraphFrames, but we now also publish a build of our Python package to PyPI as [graphframes-py](https://pypi.org/project/graphframes-py/). It can be used to provide type hints in IDEs, but it does not load the Java side of GraphFrames, so it will not work unless the GraphFrames package is also loaded. 
See [Installation and Quick-Start](#installation-and-quick-start).\n\n```bash\npip install graphframes-py\n```\n\nThis project does not own or control the [graphframes PyPI package](https://pypi.org/project/graphframes/) (installs 0.6.0) or the [graphframes-latest PyPI package](https://pypi.org/project/graphframes-latest/) (installs 0.8.4).\n\n## GraphFrames and sbt\n\nIf you use the sbt-spark-package plugin, add the following to your sbt build file, pulled from [GraphFrames on Spark Packages](https://spark-packages.org/package/graphframes/graphframes):\n\n```\nspDependencies += \"graphframes/graphframes:0.8.4-spark3.5-s_2.12\"\n```\n\nOtherwise,\n\n```\nresolvers += \"Spark Packages Repo\" at \"https://repos.spark-packages.org/\"\n\nlibraryDependencies += \"graphframes\" % \"graphframes\" % \"0.8.4-spark3.5-s_2.12\"\n```\n\n## GraphFrames and Maven\n\nGraphFrames is not currently on the Maven Central Repository, but we plan to restore it soon. For now, use the Spark Packages system to install the package: [https://spark-packages.org/package/graphframes/graphframes](https://spark-packages.org/package/graphframes/graphframes).\n\n```xml\n\u003cdependencies\u003e\n  \u003c!-- list of dependencies --\u003e\n  \u003cdependency\u003e\n    \u003cgroupId\u003egraphframes\u003c/groupId\u003e\n    \u003cartifactId\u003egraphframes\u003c/artifactId\u003e\n    \u003cversion\u003e0.8.4-spark3.5-s_2.12\u003c/version\u003e\n  \u003c/dependency\u003e\n\u003c/dependencies\u003e\n\u003crepositories\u003e\n  \u003c!-- list of other repositories --\u003e\n  \u003crepository\u003e\n    \u003cid\u003eSparkPackagesRepo\u003c/id\u003e\n    \u003curl\u003ehttps://repos.spark-packages.org/\u003c/url\u003e\n  \u003c/repository\u003e\n\u003c/repositories\u003e\n```\n\n## GraphFrames Internals\n\nTo learn how GraphFrames works internally to combine graph and relational queries, check out the paper [GraphFrames: An Integrated API for Mixing Graph and\nRelational Queries, Dave et al. 
2016](https://people.eecs.berkeley.edu/~matei/papers/2016/grades_graphframes.pdf).\n\n## Building and running unit tests\n\nTo compile this project, run `build/sbt assembly` from the project home directory. This will also run the Scala unit tests.\n\nTo run the Python unit tests, run the `run-tests.sh` script from the `python/` directory. You will need to set `SPARK_HOME` to your local Spark installation directory.\n\n## Release new version\n\nPlease see the guide in `dev/release_guide.md`.\n\n## Spark version compatibility\n\nThis project is compatible with Spark 3.4+. Significant speed improvements have been made to DataFrames in recent versions of Spark, so you may see speedups from using the latest Spark version.\n\n## Contributing\n\nGraphFrames is a collaborative effort among UC Berkeley, MIT, Databricks, and the open source community. We welcome open source contributions as well!\n\n## Releases:\n\nSee [release notes](https://github.com/graphframes/graphframes/releases).\n","funding_links":[],"categories":["Packages"],"sub_categories":["Graph Processing"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphframes%2Fgraphframes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraphframes%2Fgraphframes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphframes%2Fgraphframes/lists"}