https://github.com/mihaiconstantin/parabar
An `R` package for parallelizing tasks, tracking their progress, and displaying accurate progress bars.
- Host: GitHub
- URL: https://github.com/mihaiconstantin/parabar
- Owner: mihaiconstantin
- License: other
- Created: 2022-11-10T10:53:45.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-12-16T23:14:43.000Z (over 1 year ago)
- Last Synced: 2025-04-09T20:44:15.996Z (about 1 year ago)
- Topics: parallel-computing, progress-bar, r
- Language: R
- Homepage: https://parabar.mihaiconstantin.com
- Size: 15.5 MB
- Stars: 21
- Watchers: 1
- Forks: 1
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE
Progress Bar for Parallel Tasks
...and more
[`parabar`](https://parabar.mihaiconstantin.com) is a package that provides a
simple interface for executing tasks in parallel, along with functionality for
tracking and displaying the progress of those tasks.
This package is aimed at two audiences: (1) end-users who want to execute a task
in parallel in an interactive `R` session and track the execution progress, and
(2) `R` package developers who want to use
[`parabar`](https://parabar.mihaiconstantin.com) as a solution for parallel
processing in their packages.
## Installation
You can install `parabar` directly from `CRAN` using the following command:
```r
# Install the package from `CRAN`.
install.packages("parabar")
# Load the package.
library(parabar)
```
Alternatively, you can install the latest development version from `GitHub`
via:
```r
# Install the package from `GitHub`.
remotes::install_github("mihaiconstantin/parabar")
# Load the package.
library(parabar)
```
## Usage
Below you can find a few examples of how to use
[`parabar`](https://parabar.mihaiconstantin.com) in your `R` scripts, both for
end-users, and for developers. All examples below assume that you have already
installed and loaded the package.
### Users
In general, the usage of [`parabar`](https://parabar.mihaiconstantin.com)
consists of the following steps:
1. Start a backend for parallel processing.
2. Execute a task in parallel.
3. Stop the backend.
Optionally, you can also configure the progress bar if the backend created
supports progress tracking, or perform additional operations on the backend.
#### Synchronous Backend
The simplest, and perhaps least interesting, way to use
[`parabar`](https://parabar.mihaiconstantin.com) is by requesting a synchronous
backend.
```r
# Start a synchronous backend.
backend <- start_backend(cores = 4, cluster_type = "psock", backend_type = "sync")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
At this point you will notice the following warning message:
```txt
Warning message:
Progress tracking not supported for backend of type 'SyncBackend'.
```
This happens because progress tracking only works for asynchronous backends,
and [`parabar`](https://parabar.mihaiconstantin.com) enables progress tracking
by default at load time. We can disable this option to get rid of the warning
message.
```r
# Disable progress tracking.
set_option("progress_track", FALSE)
```
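To confirm that the change took effect, the option can be read back. The short sketch below relies on `get_option`, the counterpart of `set_option`; it is not shown elsewhere in this README, so check `?get_option` for the exact behavior in your installed version.

```r
library(parabar)

# Disable progress tracking, as in the step above.
set_option("progress_track", FALSE)

# Read the option back (`get_option` is not used elsewhere in this README).
tracking <- get_option("progress_track")

# Print the current value of the option.
print(tracking)
```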
We can verify that the warning message is gone by running the task again,
reusing the backend we created earlier.
```r
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
When we are done with this backend, we can stop it to free up the resources.
```r
# Stop the backend.
stop_backend(backend)
```
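When the start, run, and stop steps live inside a function, it can be convenient to guarantee that the backend is stopped even if the task fails. The sketch below is one possible way to do that with base `on.exit`; the helper name `run_in_parallel` is hypothetical and not part of `parabar`.

```r
library(parabar)

# Synchronous backends do not support progress tracking, so turn it off.
set_option("progress_track", FALSE)

# Hypothetical helper (not part of `parabar`): start, run, and always stop.
run_in_parallel <- function(x, fun, cores = 2) {
    # Start a synchronous backend.
    backend <- start_backend(cores = cores, cluster_type = "psock", backend_type = "sync")

    # Stop the backend when the function exits, even after an error.
    on.exit(stop_backend(backend), add = TRUE)

    # Run the task and return the results.
    par_sapply(backend, x, fun)
}

# Example usage.
squares <- run_in_parallel(1:10, function(x) x^2)
print(squares)
```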
#### Asynchronous Backend
The more interesting way to use [`parabar`](https://parabar.mihaiconstantin.com)
is by requesting an asynchronous backend. This is the default backend type, and
highlights the strengths of the package.
First, let's ensure progress tracking is enabled (we disabled it above).
```r
# Enable progress tracking.
set_option("progress_track", TRUE)
```
Now, we can proceed with creating the backend and running the task.
```r
# Start an asynchronous backend.
backend <- start_backend(cores = 4, cluster_type = "psock", backend_type = "async")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
At this point, we can see that the progress bar is displayed and that the
progress is tracked. The progress bar is updated in real time, after each task
execution, e.g.:
```txt
> completed 928 out of 1000 tasks [ 93%] [ 3s]
```
We can also configure the progress bar. For example, suppose we want to display
an actual progress bar.
```r
# Change the progress bar options.
configure_bar(type = "modern", format = "[:bar] :percent")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
The progress bar will now look like this:
```txt
[====================>-------------------------------------------------] 30%
```
By default, [`parabar`](https://parabar.mihaiconstantin.com) uses the
[`progress`](https://cran.r-project.org/package=progress) package to display the
progress bar. However, we can easily swap it with another progress bar engine.
For example, suppose we want to use the built-in
[`utils::txtProgressBar`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/txtProgressBar.html).
```r
# Change to and adjust the style of the `basic` progress bar.
configure_bar(type = "basic", style = 3)
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
Check out `?configure_bar` for more information on the possible ways of
configuring the progress bar.
We can also disable the progress bar for asynchronous backends altogether, by
adjusting the package options.
```r
# Disable progress tracking.
set_option("progress_track", FALSE)
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
We can stop the backend when we are done.
```r
# Stop the backend.
stop_backend(backend)
```
#### No Backend
Finally, we can also use the `?par_sapply` function without a backend, which
will resort to running the task sequentially by means of
[`base::sapply`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html).
```r
# Run the task sequentially using `base::sapply`.
results <- par_sapply(backend = NULL, 1:300, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
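Because the sequential fallback delegates to `base::sapply`, the two calls should produce the same values. The following small check is a sketch of that expectation, not a statement about the package internals.

```r
library(parabar)

# A trivial task.
task <- function(x) x + 1

# Run without a backend (sequential fallback).
sequential <- par_sapply(backend = NULL, 1:10, task)

# Run the same task with plain `base::sapply`.
reference <- sapply(1:10, task)

# Compare the two results.
print(all.equal(sequential, reference))
```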
#### Additional Operations
As indicated above, the general workflow consists of starting a backend,
executing a task in parallel, and stopping the backend. However, additional
operations can be performed on a backend (see the _**Developers**_ section).
The table below lists all available operations.
| Operation                                          | Description                                   |
| :------------------------------------------------- | :-------------------------------------------- |
| `start_backend(cores, cluster_type, backend_type)` | Start a backend.                              |
| `stop_backend(backend)`                            | Stop a backend.                               |
| `clear(backend)`                                   | Remove all objects from a backend.            |
| `peek(backend)`                                    | List the names of the variables on a backend. |
| `export(backend, variables, environment)`          | Export objects to a backend.                  |
| `evaluate(backend, expression)`                    | Evaluate expressions on a backend.            |
| `par_sapply(backend, x, fun)`                      | Run tasks in parallel on a backend.           |
| `par_lapply(backend, x, fun)`                      | Run tasks in parallel on a backend.           |
| `par_apply(backend, x, margin, fun)`               | Run tasks in parallel on a backend.           |
Check the documentation corresponding to each operation for more information and
examples.
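As a rough illustration of how these operations fit together, the sketch below exports a variable, inspects the backend, evaluates an expression, and runs two of the `apply`-style operations. The variable names (`offset`, `shifted`, `row_totals`) are made up for the example, and the shape of the values returned by `peek` and `evaluate` (assumed here to be one entry per cluster node) should be checked against the package documentation.

```r
library(parabar)

# Synchronous backends do not support progress tracking, so turn it off.
set_option("progress_track", FALSE)

# Start a small synchronous backend.
backend <- start_backend(cores = 2, cluster_type = "psock", backend_type = "sync")

# Create a variable and export it from the current environment.
offset <- 10
export(backend, "offset", environment())

# List the variable names now available on the backend.
exported <- peek(backend)

# Evaluate an expression on the backend, where `offset` is resolved.
doubled <- evaluate(backend, offset * 2)

# Use the exported variable inside a parallel task.
shifted <- par_lapply(backend, 1:4, function(x) x + offset)

# Apply a function over the rows of a matrix.
row_totals <- par_apply(backend, matrix(1:6, nrow = 2), margin = 1, fun = sum)

# Remove everything from the backend and confirm it is empty.
clear(backend)
remaining <- peek(backend)

# Stop the backend.
stop_backend(backend)
```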
### Developers
[`parabar`](https://parabar.mihaiconstantin.com) provides a rich API for
developers who want to use the package in their own projects.
From a high-level perspective, the package consists of **`backends`** and
**`contexts`** in which these backends are employed for executing tasks in
parallel.
#### Backends
A **`backend`** represents a set of operations, defined by the `?BackendService`
interface. Backends can be synchronous (i.e., `?SyncBackend`) or asynchronous
(i.e., `?AsyncBackend`). The former will block the execution of the current `R`
session until the parallel task is completed, while the latter will return
immediately and the task will be executed in a background `R` session.
The `?BackendService` interface defines the following operations:
- `start`: Start the backend.
- `stop`: Stop the backend.
- `clear`: Remove all objects from the backend.
- `peek`: Show the variable names available on the backend.
- `export`: Export variables from a given environment to the backend.
- `evaluate`: Evaluate an arbitrary expression on the backend.
- `sapply`: Run a task on the backend.
- `lapply`: Run a task on the backend.
- `apply`: Run a task on the backend.
- `get_output`: Get the output of the task execution.
Check out the documentation for `BackendService` for more information on each
method.
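To make the interface more concrete, here is a minimal sketch that calls several of these operations on a synchronous backend through the `R6` API. Since a synchronous backend blocks until the task is done, the output is assumed to be available as soon as `sapply` returns. The names `sync_backend` and `multiplier` are chosen only for this example.

```r
library(parabar)

# Describe the cluster to create.
specification <- Specification$new()
specification$set_cores(2)
specification$set_type("psock")

# Obtain and start a synchronous backend.
sync_backend <- BackendFactory$new()$get("sync")
sync_backend$start(specification)

# Export a variable from the current environment and list what is available.
multiplier <- 3
sync_backend$export("multiplier", environment())
available <- sync_backend$peek()

# Run a task (this call blocks until the task completes).
sync_backend$sapply(1:10, function(x) x * multiplier)

# Fetch the output of the task.
sync_results <- sync_backend$get_output()

# Clean up and stop the backend.
sync_backend$clear()
sync_backend$stop()
```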
#### Contexts
A **`context`** represents the specific conditions in which a backend object
operates. The default context class (i.e., `?Context`) simply forwards the call
to the corresponding backend method. However, a more complex context can augment
the operation before forwarding the call to the backend. One example of a
complex context is the `?ProgressTrackingContext` class. This class extends the
regular `?Context` class and decorates, e.g., the backend `sapply` operation to
log the progress after each task execution and display a progress bar.
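The decoration idea can be sketched with plain closures in base `R`. This is only a conceptual illustration: the `make_*` constructors below are hypothetical, run everything in the current session, and do not reproduce how `parabar` implements its contexts or tracks progress across processes.

```r
# Hypothetical stand-in for a backend exposing an `sapply` operation.
make_backend <- function() {
    list(sapply = function(x, fun) base::sapply(x, fun))
}

# Plain context: forwards the call to the backend unchanged.
make_context <- function(backend) {
    list(sapply = function(x, fun) backend$sapply(x, fun))
}

# Tracking context: wraps the task to count completions, then forwards.
make_tracking_context <- function(backend) {
    completed <- 0
    list(
        sapply = function(x, fun) {
            # Decorate the task so each execution is recorded.
            tracked <- function(item) {
                result <- fun(item)
                completed <<- completed + 1
                result
            }
            backend$sapply(x, tracked)
        },
        get_completed = function() completed
    )
}

# Both contexts expose the same operation as the backend.
backend <- make_backend()
plain <- make_context(backend)
tracking <- make_tracking_context(backend)

plain_results <- plain$sapply(1:5, function(x) x + 1)
tracked_results <- tracking$sapply(1:5, function(x) x + 1)

# Number of task executions recorded by the tracking context.
completed_tasks <- tracking$get_completed()
print(completed_tasks)
```

The caller uses the context exactly as it would use the backend, which is what allows a context to be swapped in without changing the task code.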
#### Main Classes
The following are the main classes provided by
[`parabar`](https://parabar.mihaiconstantin.com):
- `BackendService`: Interface for backend operations.
- `SyncBackend`: Synchronous backend extending the abstract `Backend` class and
implementing the `BackendService` interface.
- `AsyncBackend`: Asynchronous backend extending the abstract `Backend` class
and implementing the `BackendService` interface.
- `Specification`: Backend specification used when starting a backend.
- `BackendFactory`: Factory for creating `Backend` objects.
- `Context`: Default context for executing backend operations without
interference.
- `ProgressTrackingContext`: Context for decorating the `sapply` operation to
track and display the progress.
- `ContextFactory`: Factory for creating `Context` objects.
- `UserApiConsumer`: Wrapper around the developer `API`.
Additionally, [`parabar`](https://parabar.mihaiconstantin.com) also provides
several classes for creating and updating different progress bars, namely:
- `BasicBar`: A simple, but robust, bar created via
[`utils::txtProgressBar`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/txtProgressBar.html)
extending the `Bar` abstract class.
- `ModernBar`: A modern bar created via
[`progress::progress_bar`](https://cran.r-project.org/package=progress)
extending the `Bar` abstract class.
- `BarFactory`: Factory for creating `Bar` objects.
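For reference, the base function behind the basic bar can be used on its own as follows. This sketch uses only `utils` functions and says nothing about how the `BasicBar` class wraps them.

```r
# Total number of steps to display.
total <- 20

# Create a text progress bar (style 3 shows a bar with a percentage).
bar <- utils::txtProgressBar(min = 0, max = total, initial = 0, style = 3)

# Update the bar after each step.
for (i in seq_len(total)) {
    Sys.sleep(0.01)
    utils::setTxtProgressBar(bar, i)
}

# Read the final value, then close the bar.
final <- utils::getTxtProgressBar(bar)
close(bar)
```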
#### Examples
Below is an example of how to use the package's
[`R6`](https://adv-r.hadley.nz/r6.html) class API.
We start by creating a `?Specification` object instructing the `?Backend` object
how to create a cluster via the built-in function
[`parallel::makeCluster`](https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/makeCluster.html).
```r
# Create a specification object.
specification <- Specification$new()
specification$set_cores(4)
specification$set_type("psock")
```
We proceed by obtaining an asynchronous backend instance from the
`?BackendFactory` and starting the backend using the `?Specification` instance
above.
```r
# Create a backend factory.
backend_factory <- BackendFactory$new()
# Get an asynchronous backend instance.
backend <- backend_factory$get("async")
# Start the backend.
backend$start(specification)
```
Finally, we can run a task in parallel by calling, e.g., the `sapply` method on
the `backend` instance.
```r
# Run a task in parallel.
backend$sapply(1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
At this point, the task was deployed in a background `R` session, and the caller
process is free to do other things.
Calling `backend$get_output` immediately after the `backend$sapply` call will
throw an error, indicating that the task is still running, i.e.:
```txt
Error: A task is currently running.
```
We can, however, block the caller process and wait for the task to complete
before fetching the results.
```r
results <- backend$get_output(wait = TRUE)
```
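Put together, the non-blocking pattern might look like the compact sketch below: deploy the task, do unrelated work in the caller session, and only then wait for the output. It uses a separate `demo_backend` object (a name chosen for this example) so that it does not interfere with the `backend` instance used in this walkthrough.

```r
library(parabar)

# Describe the cluster to create.
specification <- Specification$new()
specification$set_cores(2)
specification$set_type("psock")

# Obtain and start a separate asynchronous backend.
demo_backend <- BackendFactory$new()$get("async")
demo_backend$start(specification)

# Deploy the task; this call returns without waiting for completion.
demo_backend$sapply(1:50, function(x) {
    Sys.sleep(0.01)
    x + 1
})

# The caller session is free to do unrelated work in the meantime.
local_total <- sum(1:100)

# Block until the task completes, then fetch the results.
demo_results <- demo_backend$get_output(wait = TRUE)

# Stop the backend.
demo_backend$stop()
```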
We can now introduce the `context` concept to decorate the `backend` instance
and, in this example, track the progress of the task. First, we obtain a
`?Context` instance from the `?ContextFactory`. Furthermore, since we are using
an asynchronous backend, we can request a context that facilitates
progress-tracking.
```r
# Create a context factory.
context_factory <- ContextFactory$new()
# Get a progress-tracking context.
context <- context_factory$get("progress")
# Register the backend with the context.
context$set_backend(backend)
```
The `?Context` class (and its subclasses) implements the
`?BackendService` interface, which means that we can use it to execute backend
operations.
Since we are using the `?ProgressTrackingContext` context, we also need to
register a `?Bar` instance with the context. First, let's obtain a `?Bar`
instance from the `?BarFactory`.
```r
# Create a bar factory.
bar_factory <- BarFactory$new()
# Get a `modern` bar (i.e., via `progress::progress_bar`).
bar <- bar_factory$get("modern")
```
We can now register the `bar` instance with the `context` instance.
```r
# Register the `bar` with the `context`.
context$set_bar(bar)
```
We may also configure the `bar`, or change its appearance. For instance, it may
be a good idea to show the progress bar right away.
```r
# Configure the `bar`.
context$configure_bar(
    show_after = 0,
    format = " > completed :current out of :total tasks [:percent] [:elapsed]"
)
```
At this point, the `backend$sapply` operation is decorated with progress
tracking. Finally, we can run the task in parallel and enjoy our progress bar
using the `context` instance.
```r
# Run a task in parallel with progress tracking.
context$sapply(1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
```
All that is left to do is to fetch the results and stop the backend.
```r
# Get the results.
results <- context$get_output()
# Stop the backend.
context$stop()
```
#### Design
Check out the UML diagram below for a quick overview of the package design.
**_Note._** For the sake of clarity, the diagram only displays the `sapply`
operation for running tasks in parallel. However, other operations are supported
as well (see the table in the section *Additional Operations*).
## Contributing
- Any contributions are welcome and greatly appreciated. Please open a [pull
request](https://github.com/mihaiconstantin/parabar/pulls) on `GitHub`.
- To report bugs, or request new features, please open an
[issue](https://github.com/mihaiconstantin/parabar/issues) on `GitHub`.
## License
- The package source code in this repository is licensed under the [MIT
license](https://opensource.org/license/mit).
- The documentation, vignettes, and other website materials by Mihai Constantin
  are licensed under CC BY 4.0.