https://github.com/rstudio/graphframes
R Interface for GraphFrames
- Host: GitHub
- URL: https://github.com/rstudio/graphframes
- Owner: rstudio
- License: apache-2.0
- Created: 2018-03-21T20:58:14.000Z (about 7 years ago)
- Default Branch: main
- Last Pushed: 2021-10-21T06:36:40.000Z (over 3 years ago)
- Last Synced: 2024-04-14T22:17:45.046Z (about 1 year ago)
- Topics: graphframes, graphs, pagerank, rstats, spark, sparklyr
- Language: R
- Homepage: https://spark.rstudio.com/graphframes/
- Size: 163 KB
- Stars: 38
- Watchers: 12
- Forks: 12
- Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- awesome-sparklyr - graphframes: R Interface for GraphFrames
README
---
title: "R interface for GraphFrames"
output:
  github_document:
    fig_width: 9
    fig_height: 5
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(fig.path = "tools/readme/", dev = "png")
```

[Build Status](https://travis-ci.org/rstudio/graphframes) [Coverage Status](https://codecov.io/github/rstudio/graphframes?branch=master) [CRAN Status](https://cran.r-project.org/package=graphframes)

- Support for [GraphFrames](https://graphframes.github.io/), which aims to provide the functionality of [GraphX](http://spark.apache.org/graphx/).
- Run graph algorithms such as [PageRank](https://graphframes.github.io/api/scala/index.html#org.graphframes.lib.PageRank), [ShortestPaths](https://graphframes.github.io/api/scala/index.html#org.graphframes.lib.ShortestPaths), and many [others](https://graphframes.github.io/api/scala/#package).
- Designed to work with [sparklyr](https://spark.rstudio.com) and the [sparklyr extensions](http://spark.rstudio.com/extensions.html).

## Installation

If you already use `sparklyr`, simply run:
```{r eval=FALSE}
install.packages("graphframes")
# or, for the development version,
# devtools::install_github("rstudio/graphframes")
```

Otherwise, first install `sparklyr` from CRAN using:

```{r eval=FALSE}
install.packages("sparklyr")
```

The examples make use of the `highschool` dataset from the `ggraph` package.

## Getting Started
We will calculate [PageRank](https://en.wikipedia.org/wiki/PageRank) over the built-in "friends" dataset as follows.
```{r message=FALSE}
library(graphframes)
library(sparklyr)
library(dplyr)

# connect to spark using sparklyr
sc <- spark_connect(master = "local", version = "2.3.0")

# obtain the example graph
g <- gf_friends(sc)

# compute PageRank
results <- gf_pagerank(g, tol = 0.01, reset_probability = 0.15)
results
```

We can then visualize the results by collecting them into R:

```{r, message = FALSE}
library(tidygraph)
library(ggraph)

vertices <- results %>%
  gf_vertices() %>%
  collect()

edges <- results %>%
  gf_edges() %>%
  collect()

edges %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  left_join(vertices, by = c(name = "id")) %>%
  ggraph(layout = "nicely") +
  geom_node_label(aes(label = name.y, color = pagerank)) +
  geom_edge_link(
    aes(
      alpha = weight,
      start_cap = label_rect(node1.name.y),
      end_cap = label_rect(node2.name.y)
    ),
    arrow = arrow(length = unit(4, "mm"))
  ) +
  theme_graph(fg_text_colour = 'white')
```

## Further Reading

Apart from calculating `PageRank` using `gf_pagerank`, many other functions are available, including:

- `gf_bfs()`: Breadth-first search (BFS).
- `gf_connected_components()`: Connected components.
- `gf_shortest_paths()`: Shortest paths algorithm.
- `gf_scc()`: Strongly connected components.
- `gf_triangle_count()`: Computes the number of triangles passing through each vertex.
- `gf_degrees()`: Degrees of vertices.

For instance, one can calculate the degrees of vertices using `gf_degrees` as follows:

```{r message=FALSE}
gf_friends(sc) %>% gf_degrees()
```

Finally, we disconnect from Spark:

```{r}
spark_disconnect(sc)
```
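
For readers curious about what the `tol` and `reset_probability` arguments passed to `gf_pagerank()` in Getting Started control, here is a minimal base-R sketch of PageRank by power iteration. It is not how GraphFrames implements the algorithm. The function name `pagerank_sketch`, its `max_iter` argument, and the small edge list are all hypothetical, chosen only for illustration. Scores in this sketch are normalized to sum to 1, which may differ from the scaling GraphFrames reports. No Spark connection is needed.

```r
# Illustrative PageRank by power iteration (a sketch, not the GraphFrames code).
# `edges` is a data frame with character columns `src` and `dst`.
pagerank_sketch <- function(edges, reset_probability = 0.15, tol = 0.01,
                            max_iter = 100) {
  ids <- sort(unique(c(edges$src, edges$dst)))
  n <- length(ids)
  src <- match(edges$src, ids)
  dst <- match(edges$dst, ids)
  out_degree <- tabulate(src, nbins = n)

  rank <- rep(1 / n, n)
  for (i in seq_len(max_iter)) {
    # Each vertex splits its current rank evenly across its outgoing edges.
    contrib <- rank[src] / out_degree[src]
    incoming <- numeric(n)
    sums <- tapply(contrib, dst, sum)
    incoming[as.integer(names(sums))] <- as.vector(sums)

    # Rank held by vertices without outgoing edges is spread uniformly.
    dangling <- sum(rank[out_degree == 0])

    # With probability `reset_probability` the walk jumps to a random vertex.
    new_rank <- reset_probability / n +
      (1 - reset_probability) * (incoming + dangling / n)

    # Stop once no score changes by more than `tol` between iterations.
    converged <- max(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  data.frame(id = ids, pagerank = rank, stringsAsFactors = FALSE)
}

# Hypothetical edge list; this is not the data returned by gf_friends().
edges <- data.frame(
  src = c("a", "b", "c", "c", "d"),
  dst = c("b", "c", "a", "d", "a"),
  stringsAsFactors = FALSE
)

result <- pagerank_sketch(edges)
result[order(-result$pagerank), ]
```

A smaller `tol` makes the loop run longer before stopping. A larger `reset_probability` pulls all scores toward the uniform value `1 / n`.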