---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  error = TRUE
)
```

# aPPR

[![R-CMD-check](https://github.com/RoheLab/aPPR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/RoheLab/aPPR/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/RoheLab/aPPR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/RoheLab/aPPR?branch=main)

`aPPR` helps you calculate approximate personalized PageRanks on large graphs, including graphs that can only be queried via an API. `aPPR` also performs degree correction and regularization, which lets you recover blocks from stochastic blockmodels.

To learn more about `aPPR` you can:

1. Glance through slides from the [JSM2021](https://github.com/alexpghayes/JSM2021) talk
2. Read the accompanying [paper][chen]
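
For intuition, the local "push" procedure of Andersen, Chung, and Lang (reference 2), together with the degree correction and regularization mentioned above, can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the `aPPR` implementation: the toy `adjacency` list, the `push_ppr()` helper, the choice `tau = mean(degree)`, and the output column names are all hypothetical, and the graph is assumed to be undirected with no isolated nodes.

```r
# Toy undirected graph as a named adjacency list (hypothetical data).
adjacency <- list(
  a = c("b", "c"),
  b = c("a", "c"),
  c = c("a", "b", "d"),
  d = "c"
)

# Hypothetical helper: lazy-walk push approximation of personalized PageRank.
push_ppr <- function(adjacency, seed, alpha = 0.15, epsilon = 1e-4) {
  nodes <- names(adjacency)
  degree <- lengths(adjacency)                  # assumes every degree is >= 1
  p <- setNames(numeric(length(nodes)), nodes)  # PageRank estimates
  r <- p                                        # residual (unprocessed) mass
  r[[seed]] <- 1                                # all mass starts at the seed

  repeat {
    # A node is pushed while its residual is large relative to its degree.
    pushable <- nodes[r >= epsilon * degree]
    if (length(pushable) == 0) break

    u <- pushable[[1]]
    mass <- r[[u]]
    neighbors <- adjacency[[u]]

    p[[u]] <- p[[u]] + alpha * mass             # keep a share at u
    r[[u]] <- (1 - alpha) * mass / 2            # lazy walk: half stays at u
    r[neighbors] <- r[neighbors] +              # the rest is split evenly
      (1 - alpha) * mass / (2 * degree[[u]])
  }

  tau <- mean(degree)  # assumed regularization constant for this sketch
  data.frame(
    name = nodes,
    p = unname(p),
    r = unname(r),
    degree = unname(degree),
    degree_adjusted = unname(p / degree),        # degree correction
    regularized = unname(p / (degree + tau)),    # regularized correction
    stringsAsFactors = FALSE
  )
}

result <- push_ppr(adjacency, seed = "a")
result
```

Each push conserves mass, so `sum(result$p) + sum(result$r)` stays equal to 1 up to floating-point error. A smaller `epsilon` leaves less mass in the residual at the cost of more pushes.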

### Installation

You can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("RoheLab/aPPR")
```

### Find the personalized pagerank of a node in an `igraph` graph

```{r igraph-example, message = FALSE}
library(aPPR)
library(igraph)

set.seed(27)

erdos_renyi_graph <- sample_gnp(n = 100, p = 0.5)

erdos_tracker <- appr(
  erdos_renyi_graph, # the graph to work with
  seeds = "5",       # name of seed node (character)
  epsilon = 0.0005   # desired approximation quality (see ?appr)
)

erdos_tracker
```

You can access the personalized PageRanks themselves via the `stats` field of the returned `Tracker` object.

```{r}
erdos_tracker$stats
```
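
A common next step is to pull out the highest-ranked nodes. The sketch below uses a small stand-in data frame with made-up values rather than a real `Tracker`, and it assumes the table has `name` and `p` columns; check `names(erdos_tracker$stats)` before relying on that. The `top_k()` helper is hypothetical and not part of `aPPR`.

```r
# Stand-in for `erdos_tracker$stats`; the values are made up for illustration.
stats <- data.frame(
  name = c("5", "12", "40", "77"),
  p = c(0.15, 0.02, 0.05, 0.01),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the k rows with the largest PageRank estimate.
top_k <- function(stats, k = 3) {
  ranked <- stats[order(stats$p, decreasing = TRUE), , drop = FALSE]
  head(ranked, k)
}

top_k(stats, k = 2)
```

With a real tracker, the same call would be `top_k(erdos_tracker$stats)`, provided the column assumption above holds.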

Sometimes you may wish to cap computation time by limiting the number of unique nodes the algorithm visits, which you can do as follows:

```{r igraph-example2}
limited_visits_tracker <- appr(
  erdos_renyi_graph,
  seeds = "5",
  epsilon = 1e-10,
  max_visits = 20 # max unique nodes to visit during approximation
)

limited_visits_tracker
```

### Find the personalized pagerank of a Twitter user using `rtweet`

```{r rtweet-example}
ftrevorc_ppr <- appr(
  rtweet_graph(),
  "ftrevorc",
  epsilon = 1e-4,
  max_visits = 5
)

ftrevorc_ppr
```

### Logging

`aPPR` uses [`logger`](https://daroczig.github.io/logger/) for displaying information to the user. By default, `aPPR` is quite verbose. You can control verbosity by loading `logger` and setting the logging threshold.

```{r logging-example-1, eval = FALSE}
library(logger)

# hide basically all messages (not recommended)
log_threshold(FATAL, namespace = "aPPR")

appr(
  erdos_renyi_graph, # the graph to work with
  seeds = "5",       # name of seed node (character)
  epsilon = 0.0005   # desired approximation quality (see ?appr)
)
```

If you submit a bug report, please please please include a log file using the TRACE threshold. You can set up this kind of detailed logging via the following:

```{r log-file-example, eval = FALSE}
library(logger)

set.seed(528491) # be sure to set seed for bug reports

# write all aPPR log messages to a file
log_appender(
  appender_file(
    "/path/to/logfile.log" ## TODO: choose a path to log to
  ),
  namespace = "aPPR"
)

# record everything, down to the most detailed level
log_threshold(TRACE, namespace = "aPPR")

tracker <- appr(
  rtweet_graph(),
  seeds = c("hadleywickham", "gvanrossum"),
  epsilon = 1e-6
)
```

### Ethical considerations

People have a right to choose how public and discoverable their information is. `aPPR` will often lead you to accounts that are interesting, but also small and out of sight. Do not change the public profile of, or direct attention towards, the people running these accounts, or any other accounts, without their permission.

### References

1. Chen, Fan, Yini Zhang, and Karl Rohe. “Targeted Sampling from Massive Block Model Graphs with Personalized PageRank.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, no. 1 (February 2020): 99–126. https://doi.org/10.1111/rssb.12349. [arxiv][chen]

2. Andersen, Reid, Fan Chung, and Kevin Lang. “Local Graph Partitioning Using PageRank Vectors.” In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), 475–86. Berkeley, CA, USA: IEEE, 2006. https://doi.org/10.1109/FOCS.2006.44.

[chen]: https://arxiv.org/abs/1910.12937