{"id":13791076,"url":"https://github.com/rstudio/graphframes","last_synced_at":"2025-04-25T19:31:50.800Z","repository":{"id":56937111,"uuid":"126239847","full_name":"rstudio/graphframes","owner":"rstudio","description":"R Interface for GraphFrames","archived":false,"fork":false,"pushed_at":"2021-10-21T06:36:40.000Z","size":167,"stargazers_count":37,"open_issues_count":6,"forks_count":12,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-04-15T09:44:28.200Z","etag":null,"topics":["graphframes","graphs","pagerank","rstats","spark","sparklyr"],"latest_commit_sha":null,"homepage":"https://spark.rstudio.com/graphframes/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rstudio.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-21T20:58:14.000Z","updated_at":"2025-03-22T11:07:58.000Z","dependencies_parsed_at":"2022-08-21T06:50:11.409Z","dependency_job_id":null,"html_url":"https://github.com/rstudio/graphframes","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rstudio%2Fgraphframes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rstudio%2Fgraphframes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rstudio%2Fgraphframes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rstudio%2Fgraphframes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rstudio","download_url":"https://codeload.github.com/rstudio/graphframes/tar.gz/refs/heads/main","host":{"
name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250882637,"owners_count":21502341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graphframes","graphs","pagerank","rstats","spark","sparklyr"],"created_at":"2024-08-03T22:00:55.271Z","updated_at":"2025-04-25T19:31:50.428Z","avatar_url":"https://github.com/rstudio.png","language":"R","funding_links":[],"categories":["Sparklyr Analysis Tools"],"sub_categories":["Graph Mining"],"readme":"---\ntitle: \"R interface for GraphFrames\"\noutput:\n  github_document:\n    fig_width: 9\n    fig_height: 5\n---\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(eval = TRUE)\nknitr::opts_chunk$set(warning = FALSE)\nknitr::opts_chunk$set(fig.path = \"tools/readme/\", dev = \"png\")\n```\n\n[![Build Status](https://travis-ci.org/rstudio/graphframes.svg?branch=master)](https://travis-ci.org/rstudio/graphframes) [![Coverage status](https://codecov.io/gh/rstudio/graphframes/branch/master/graph/badge.svg)](https://codecov.io/github/rstudio/graphframes?branch=master) [![CRAN status](https://www.r-pkg.org/badges/version/graphframes)](https://cran.r-project.org/package=graphframes)\n\n- Support for [GraphFrames](https://graphframes.github.io/) which aims to provide the functionality of [GraphX](http://spark.apache.org/graphx/).\n- Perform graph algorithms like: [PageRank](https://graphframes.github.io/api/scala/index.html#org.graphframes.lib.PageRank), [ShortestPaths](https://graphframes.github.io/api/scala/index.html#org.graphframes.lib.ShortestPaths) and many 
[others](https://graphframes.github.io/api/scala/#package).\n- Designed to work with [sparklyr](https://spark.rstudio.com) and the [sparklyr extensions](http://spark.rstudio.com/extensions.html).\n\n## Installation\n\nIf you already use `sparklyr`, simply run:\n\n```{r eval=FALSE}\ninstall.packages(\"graphframes\")\n# or, for the development version,\n# devtools::install_github(\"rstudio/graphframes\")\n```\n\nOtherwise, first install `sparklyr` from CRAN using:\n\n```{r eval=FALSE}\ninstall.packages(\"sparklyr\")\n```\n\nThe examples make use of the `highschool` dataset from the `ggraph` package.\n\n## Getting Started\n\nWe will calculate [PageRank](https://en.wikipedia.org/wiki/PageRank) over the built-in \"friends\" dataset as follows.\n\n```{r message=FALSE}\nlibrary(graphframes)\nlibrary(sparklyr)\nlibrary(dplyr)\n\n# connect to spark using sparklyr\nsc \u003c- spark_connect(master = \"local\", version = \"2.3.0\")\n\n# obtain the example graph\ng \u003c- gf_friends(sc)\n\n# compute PageRank\nresults \u003c- gf_pagerank(g, tol = 0.01, reset_probability = 0.15)\nresults\n```\n\nWe can then visualize the results by collecting them into R:\n\n```{r, message = FALSE}\nlibrary(tidygraph)\nlibrary(ggraph)\n\nvertices \u003c- results %\u003e%\n  gf_vertices() %\u003e%\n  collect()\n\nedges \u003c- results %\u003e%\n  gf_edges() %\u003e%\n  collect()\n\nedges %\u003e%\n  as_tbl_graph() %\u003e%\n  activate(nodes) %\u003e%\n  left_join(vertices, by = c(name = \"id\")) %\u003e%\n  ggraph(layout = \"nicely\") +\n  geom_node_label(aes(label = name.y, color = pagerank)) +\n  geom_edge_link(\n    aes(\n      alpha = weight,\n      start_cap = label_rect(node1.name.y),\n      end_cap = label_rect(node2.name.y)\n    ),\n    arrow = arrow(length = unit(4, \"mm\"))\n  ) +\n  theme_graph(fg_text_colour = 'white')\n```\n\n## Further Reading\n\nApart from calculating `PageRank` using `gf_pagerank`, many other functions are available, including:\n\n- `gf_bfs()`: 
Breadth-first search (BFS).\n- `gf_connected_components()`: Connected components.\n- `gf_shortest_paths()`: Shortest paths algorithm.\n- `gf_scc()`: Strongly connected components.\n- `gf_triangle_count()`: Computes the number of triangles passing through each vertex.\n- `gf_degrees()`: Degrees of vertices.\n\nFor instance, one can calculate the degrees of vertices using `gf_degrees()` as follows:\n\n```{r message=FALSE}\ngf_friends(sc) %\u003e% gf_degrees()\n```\n\nFinally, we disconnect from Spark:\n\n```{r}\nspark_disconnect(sc)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frstudio%2Fgraphframes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frstudio%2Fgraphframes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frstudio%2Fgraphframes/lists"}