https://github.com/mlr-org/rush

Parallel and distributed computing in R.

mlr3 parallel-computing r


Host: GitHub
URL: https://github.com/mlr-org/rush
Owner: mlr-org
Created: 2023-08-10T12:17:33.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-04-28T16:13:00.000Z (about 1 year ago)
Last Synced: 2024-05-01T09:37:23.024Z (about 1 year ago)
Topics: mlr3, parallel-computing, r
Language: R
Homepage: https://rush.mlr-org.com/
Size: 8.75 MB
Stars: 4
Watchers: 5
Forks: 0
Open Issues: 8
Metadata Files:
- Readme: README.Rmd


---
output: github_document
---

# rush

*rush* is a package for parallel and distributed computing in R.

It evaluates an R expression asynchronously on a cluster of workers and provides a shared storage between the workers.

The shared storage is a [Redis](https://redis.io) database.

Rush offers both a centralized and a decentralized network architecture.

The centralized network has a single controller (`Rush`) and multiple workers (`RushWorker`).

Tasks are created centrally and distributed to workers by the controller.

The decentralized network has no controller.

Each worker samples its own tasks and shares the results asynchronously with the other workers.
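The difference between the two architectures can be illustrated with a toy model in base R. This is only a sketch and not how rush is implemented: a plain environment (`store`) stands in for the shared Redis storage, the hypothetical helper `run_task()` plays the role of a worker, and the "workers" run one after another rather than in parallel.

```r
# Toy model only: an environment stands in for the shared Redis storage.
store = new.env()
store$results = list()

fun = function(x1, x2, ...) list(y = x1 + x2)

# Hypothetical helper: evaluate one task and append the result to the store.
run_task = function(worker, xs) {
  res = c(list(worker = worker), xs, do.call(fun, xs))
  store$results[[length(store$results) + 1]] = res
  invisible(res)
}

# Centralized: a controller creates the tasks and hands them to the workers.
queue = list(list(x1 = 3, x2 = 5), list(x1 = 4, x2 = 6))
for (i in seq_along(queue)) {
  run_task(worker = paste0("worker_", i), xs = queue[[i]])
}

# Decentralized: no controller; each worker samples its own task and writes
# the result to the shared store, where other workers could read it.
set.seed(1)
for (worker in c("worker_a", "worker_b")) {
  xs = list(x1 = sample(10, 1), x2 = sample(10, 1))
  run_task(worker, xs)
}

# All four results end up in the same shared store.
ys = vapply(store$results, function(res) res$y, numeric(1))
```

In both cases the results land in one shared place; what changes is who decides which task is evaluated next.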

# Features

* Parallelize arbitrary R expressions.

* Centralized and decentralized network architecture.

* Small overhead of a few milliseconds per task.

* Easy start of local workers with `processx`.

* Start workers on any platform with a batch script.

* Designed to work with [`data.table`](https://CRAN.R-project.org/package=data.table).

* Results are cached in the R session to minimize read and write operations.

* Detect and recover from worker failures.

* Start heartbeats to monitor workers on remote machines.

* Snapshot the in-memory database to disk.

* Store [`lgr`](https://CRAN.R-project.org/package=lgr) messages of the workers in the Redis database.

* Light on dependencies.

## Install

Install the development version from GitHub.

```{r eval = FALSE}

remotes::install_github("mlr-org/rush")

```

Then install [Redis](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/).

## Centralized Rush Network

![](man/figures/README-flow.png)

*Centralized network with a single controller and three workers.*

```{r, include=FALSE}

config = redux::redis_config()

r = redux::hiredis(config)

r$FLUSHDB()

```

The example below shows the evaluation of a simple function in a centralized network.

The `network_id` identifies the instance and workers in the network.

The `config` is a list of parameters for the connection to Redis.
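If Redis does not run with the default settings, the connection parameters can be set explicitly. The sketch below uses `redux::redis_config()` with an assumed host and port; replace both values with those of your own Redis server. Creating the configuration does not open a connection, so the code runs without a Redis instance.

```r
# Host and port are assumptions for illustration; adjust them to your setup.
if (requireNamespace("redux", quietly = TRUE)) {
  config = redux::redis_config(host = "127.0.0.1", port = 6379)

  # The configuration stores the connection parameters.
  print(config$host)
  print(config$port)
}
```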

```{r}

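library(rush)

# Connection parameters for Redis (default settings)
config = redux::redis_config()

# Controller of the network named "test"
rush = Rush$new(network_id = "test", config = config)

rush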

```

Next, we define a function that we want to evaluate on the workers.

```{r}

fun = function(x1, x2, ...) {

  list(y = x1 + x2)

}

```
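Before starting any workers, it can help to check the function locally. The sketch below assumes that a worker passes the named elements of a task as arguments to `fun`, which `do.call()` mimics here; it needs neither Redis nor workers.

```r
fun = function(x1, x2, ...) {
  list(y = x1 + x2)
}

# A single task is a named list of arguments.
xs = list(x1 = 3, x2 = 5)

# Call the function as if the task arrived on a worker (assumption, see above).
res = do.call(fun, xs)
res$y
# → 8
```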

We start two workers.

```{r}

rush$start_local_workers(fun = fun, n_workers = 2)

```

Now we can push tasks to the workers.

```{r}

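# Each task is a named list of arguments for `fun`
xss = list(list(x1 = 3, x2 = 5), list(x1 = 4, x2 = 6))

# Push the tasks to the workers and keep their keys
keys = rush$push_tasks(xss)

# Block until the tasks are finished
rush$wait_for_tasks(keys)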

```
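Writing the task list by hand becomes tedious for more than a few tasks. As a sketch, the list of named lists can also be built from a grid of argument values with base R; the grid values below are made up for illustration, and the resulting `xss` has the same shape as the hand-written one above.

```r
# All combinations of the (illustrative) argument values.
grid = expand.grid(x1 = 1:3, x2 = c(10, 20), KEEP.OUT.ATTRS = FALSE)

# One named list per row, matching the arguments of the worker function.
xss = Map(function(x1, x2) list(x1 = x1, x2 = x2), grid$x1, grid$x2)

length(xss)
# → 6
```

The resulting list can then be passed to `rush$push_tasks()` in place of the hand-written one.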

And retrieve the results.

```{r}

rush$fetch_finished_tasks()

```

## Decentralized Rush Network

![](man/figures/README-flow-2.png)

*Decentralized network with four workers.*
