https://github.com/grailbio/rules_r
R rules for Bazel
- Host: GitHub
- URL: https://github.com/grailbio/rules_r
- Owner: grailbio
- License: apache-2.0
- Created: 2017-09-19T20:24:42.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-10-03T20:42:37.000Z (over 1 year ago)
- Last Synced: 2025-06-02T06:21:17.788Z (about 1 year ago)
- Topics: bazel, bazel-rules, r, rlang
- Language: Starlark
- Size: 730 KB
- Stars: 41
- Watchers: 6
- Forks: 18
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# R Rules for Bazel
#### General Information
- [Overview](#overview)
- [Getting Started](#getting-started)
- [Configuration](#configuration)
- [External Packages](#external-packages)
- [Examples](#examples)
- [Contributing](#contributing)
- [Known Issues](#known-issues)
#### Rules
- [r_pkg](#r_pkg)
- [r_library](#r_library)
- [r_unit_test](#r_unit_test)
- [r_pkg_test](#r_pkg_test)
- [r_binary](#r_binary)
- [r_test](#r_test)
- [r_markdown](#r_markdown)
- [r_toolchain](#r_toolchain)
#### Repository Rules
- [r_repository](#r_repository)
- [r_repository_list](#r_repository_list)
- [r_rules_dependencies](#r_rules_dependencies)
- [r_coverage_dependencies](#r_coverage_dependencies)
- [r_register_toolchains](#r_register_toolchains)
#### Container Rules
- [r_library_image](R/container/README.md#r_library_image)
- [r_binary_image](R/container/README.md#r_binary_image)
## Overview
These rules are used for building [R][r] packages with Bazel. Although R has an
excellent package management system, there is no continuous build and
integration system for entire R package repositories. An advantage of using
Bazel, over a custom solution of tracking the package dependency graph and
triggering builds accordingly on each commit, is that R packages can be built
and tested as part of one build system in multi-language monorepos.
These rules are mature for production use.
## Getting started
The following assumes that you are familiar with how to use Bazel in general.
To begin, you can add the following or equivalent to your WORKSPACE file:
```python
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
# Change master to the git tag you want.
http_archive(
name = "rules_r",
strip_prefix = "rules_r-master",
urls = ["https://github.com/grailbio/rules_r/archive/master.tar.gz"],
)
load("@rules_r//R:dependencies.bzl", "r_register_toolchains", "r_rules_dependencies")
r_rules_dependencies()
r_register_toolchains()
```
You can load the rules in your BUILD file like so:
```python
load("@rules_r//R:defs.bzl",
"r_pkg", "r_library", "r_unit_test", "r_pkg_test")
```
Advanced users can also set up [Gazelle][gazelle] to maintain the BUILD files
for the R packages in their repo automatically.
## Configuration
The following software must be installed on your system:
1. bazel (v5.0.0 or above)
2. R (4.1.2 or above; should be locatable using the `PATH` environment variable)
**NOTE**: After re-installing or upgrading R, please reset the registered
toolchain with `bazel sync --configure` to rebuild your packages with the new
installation.
**NOTE**: It is possible to use R from a bazel package instead of a system
installation. See the toolchain `r-toolchain-nix` in the tests directory as an
example.
For each package, you can also specify a different Makevars file that can be
used to have finer control over native code compilation. The site-wide
Makevars files are configured by default in the toolchains, and these define
the compiler toolchain to use and the flags needed for these toolchains for
reproducible builds.
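As an illustration of the per-package Makevars override described above, a package target might point its `makevars` attribute at a file kept next to the package sources. This is only a sketch: the target name `mypkg` and the file name `Makevars.custom` are hypothetical, and the `glob` pattern is an assumption about how the sources are laid out.

```python
load("@rules_r//R:defs.bzl", "r_pkg")

r_pkg(
    name = "mypkg",  # hypothetical package name
    # Package sources; the override file and BUILD file are kept out of the package.
    srcs = glob(
        ["**"],
        exclude = ["BUILD", "Makevars.custom"],
    ),
    # Passed to the package build as R_MAKEVARS_USER.
    makevars = "Makevars.custom",
)
```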
For _macOS_, this setup will help you cover the requirements for a large number
of packages:
```
brew install gcc pkg-config icu4c openssl
```
For _Ubuntu_, this (or equivalent for other Unix systems) helps:
```
apt-get install pkgconf libssl-dev libxml2-dev libcurl4-openssl-dev
```
#### Note
To avoid interference from other packages during the build (for example, other
versions installed manually by the user), we recommend installing all packages
other than those with recommended priority in the directory pointed to
by `R_LIBS_USER`. The Bazel build process can then hide all of these
packages from R by setting a different value for `R_LIBS_USER`.
When moving to Bazel for installing R packages on your system, we recommend
cleaning up existing machines:
```
sudo Rscript \
-e 'options("repos"="https://cloud.r-project.org")' \
-e 'lib <- c(.Library, .Library.site)' \
-e 'non_base_pkgs <- installed.packages(lib.loc=lib, priority=c("recommended", "NA"))[, "Package"]' \
-e 'remove.packages(non_base_pkgs, lib=lib)'
# If not set up already, create the directory for R_LIBS_USER.
Rscript \
-e 'dir.create(Sys.getenv("R_LIBS_USER"), recursive=TRUE, showWarnings=FALSE)'
```
For more details on how R searches different paths for packages, see
[libPaths][libPaths].
## External packages
To depend on external packages from CRAN and other remote repos, you can define the
packages as a CSV with three columns -- Package, Version, and sha256. Then use the
[r_repository_list](#r_repository_list) rule to define R repositories for each
package. For packages not in a CRAN-like repo (e.g. GitHub), you can use the
[r_repository](#r_repository) rule directly. For packages on your local system
but outside your main repository, you will have to use `local_repository` with
a saved BUILD file. The same applies to VCS repositories.
```
load("@rules_r//R:repositories.bzl", "r_repository", "r_repository_list")
# R packages with non-standard sources.
r_repository(
name = "R_plotly",
sha256 = "24c848fa2cbb6aed6a59fa94f8c9b917de5b777d14919268e88bff6c4562ed29",
strip_prefix = "plotly-a60510e4bbce5c6bed34ef6439d7a48cb54cad0a",
urls = [
"https://github.com/ropensci/plotly/archive/a60510e4bbce5c6bed34ef6439d7a48cb54cad0a.tar.gz",
],
)
# R packages with standard sources.
# See below for an example of how to generate the CSV package_list.
r_repository_list(
name = "r_repositories_bzl",
build_file_overrides = "@myrepo//third-party/R:build_file_overrides.csv",
package_list = "@myrepo//third-party/R:packages.csv",
remote_repos = {
"BioCsoft": "https://bioconductor.org/packages/3.14/bioc",
"BioCann": "https://bioconductor.org/packages/3.14/data/annotation",
"BioCexp": "https://bioconductor.org/packages/3.14/data/experiment",
"CRAN": "https://cloud.r-project.org",
},
)
load("@r_repositories_bzl//:r_repositories.bzl", "r_repositories")
r_repositories()
```
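To make the shape of the package list concrete, the following Python sketch writes a CSV with the three columns named above (Package, Version, sha256) from a set of already-downloaded source archives. It is not part of rules_r, and the function names are hypothetical; it only illustrates the file layout.

```python
import csv
import hashlib


def sha256_of(path):
    """Return the hex sha256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large archives are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_package_list(archives, out_csv):
    """Write a CSV with the header Package,Version,sha256.

    `archives` maps (package, version) tuples to local archive paths.
    """
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Package", "Version", "sha256"])
        for (package, version), path in sorted(archives.items()):
            writer.writerow([package, version, sha256_of(path)])
```

A call such as `write_package_list({("R6", "2.5.1"): "R6_2.5.1.tar.gz"}, "packages.csv")` would produce a header row followed by one row per package; the package, version, and file name here are placeholders.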
The list of all external R packages configured this way can be obtained from
your shell with
```
$ bazel query 'filter(":R_", //external:*)'
```
**NOTE**: Periods ('.') in the package names are replaced with underscores
('\_') because bazel does not allow periods in repository names.
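The renaming can be sketched in a few lines of Python. The `R_` prefix is an assumption inferred from the `R_plotly` example and the `:R_` query filter above, and `repository_name` is a hypothetical helper, not a function provided by rules_r.

```python
def repository_name(package):
    """Map an R package name to the Bazel repository name used for it."""
    # Assumption: repositories are prefixed with "R_", as in the examples above.
    # Periods are replaced because Bazel does not allow them in repository names.
    return "R_" + package.replace(".", "_")


print(repository_name("data.table"))  # R_data_table
print(repository_name("plotly"))      # R_plotly
```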
To generate and maintain a CSV file containing all your external dependencies
for use with `r_repository_list`, you can use the functions in the script
`repo_management.R`.
For example:
```bash
script="/path/to/rules_r/scripts/repo_management.R"
package_list_csv="/path/to/output/csv/file"
packages="comma-separated list of packages you want to add to the local cache"
bioc_version="bioc_version to use, e.g. 3.11"
# This will be the cache directory for a local copy of all the packages.
# The output CSV will always reflect the state of this directory.
local_r_repo="${HOME}/.cache/r-repo"
Rscript - <
```
## Examples
Some examples are available in the tests directory of this repo.
- See [tests/packages/exampleA][exampleA] for a barebones R package.
- See [tests/packages/exampleB][exampleB] for a barebones R package that depends on another package.
- See [tests/packages/exampleC][exampleC] for an R package that depends on external R packages and
user native code libraries.
- See [tests/packages/exampleD][exampleD] for an R package that depends on external R packages which
have been built with bazel provided native code libraries instead of a system requirement.
Also see [Razel scripts][scripts] that provide utility functions to generate `BUILD` files
and `WORKSPACE` rules.
## Contributing
Contributions are most welcome. Please submit a pull request giving the owners
of this GitHub repo access to your branch for minor style-related edits, etc. We recommend
opening an issue first to discuss the nature of your change before beginning work on it.
## Known Issues
Please check the open issues at the GitHub repo.
# Rules
## r_pkg
```python
r_pkg(srcs, pkg_name, deps, cc_deps, build_args, install_args, config_override,
      roclets, roclets_deps, makevars, env_vars, inst_files, tools, build_tools,
      metadata, stamp)
```
Rule to install the package and its transitive dependencies in the Bazel
sandbox, so it can be depended upon by other package builds.
The builds produced from this rule are tested to be byte-for-byte reproducible
with the same R installation. For native code compilation, the compiler flags
for reproducibility are defined in the default site Makevars file in the local
toolchain. If using your own toolchain, ensure that your site Makevars file has
the right flags.
Implicit output targets

| Target | Description |
| --- | --- |
| `name.bin.tar.gz` | Binary archive of the package. |
| `name.tar.gz` | Source archive of the package. |
| `name.so` | Shared archive of package native code; empty file if package does not have native code. |
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `srcs` | List of files; required | Source files to be included for building the package. |
| `pkg_name` | String; optional | Name of the package if different from the target name. |
| `deps` | List of labels; optional | R package dependencies of type `r_pkg` or `r_library`. |
| `cc_deps` | List of labels; optional | `cc_library` dependencies for this package. |
| `build_args` | List of strings; default `["--no-build-vignettes", "--no-manual"]` | Additional arguments to supply to `R CMD build`. Building vignettes is disabled by default so that users do not need a TeX installation. To build vignettes, override this attribute and ensure that the relevant binaries are available in your system default `PATH` (usually `/usr/bin` and `/usr/local/bin`). |
| `install_args` | List of strings; optional | Additional arguments to supply to `R CMD INSTALL`. |
| `config_override` | File; optional | Replace the package configure script with this file. |
| `roclets` | List of strings; optional | Roclets to run before installing the package. If this is non-empty, you must specify `roclets_deps` as the R package you want to use for running roclets. The runtime code checks whether devtools is available and uses `devtools::document`; failing that, it checks whether roxygen2 is available and uses `roxygen2::roxygenize`. |
| `roclets_deps` | List of labels; optional | roxygen2 or devtools dependency for running roclets. |
| `makevars` | File; optional | Additional Makevars file supplied as `R_MAKEVARS_USER`. |
| `env_vars` | Dictionary; optional | Extra environment variables to define for building the package. |
| `inst_files` | Label keyed Dictionary; optional | Files to be bundled with the package through the inst directory. The values of the dictionary specify the package-relative destination path. For example, `''` bundles the files into the top-level directory, and `'mydir'` bundles all files into a directory `mydir`. |
| `tools` | List of labels; optional | Executables that code in this package will try to find in the system. |
| `build_tools` | List of labels; optional | Executables that native code compilation will try to find in the system. |
| `metadata` | String keyed Dictionary; optional | Metadata key-value pairs to add to the DESCRIPTION file before building. When text is enclosed within `{}`, bazel volatile and stable status files will be used to substitute the text. Inclusion of these files in the build has consequences on local and remote caching. Also see `stamp`. |
| `stamp` | Integer; default -1 | Same behavior as the `stamp` attribute in the `cc_binary` rule. |
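A BUILD file for a package that depends on one local and one external package might look like the sketch below. The labels are hypothetical, and the short form `@R_R6` assumes that the generated external repository exposes a default target of the same name.

```python
load("@rules_r//R:defs.bzl", "r_pkg")

r_pkg(
    name = "exampleB",
    # All files of the R package, excluding the BUILD file itself.
    srcs = glob(
        ["**"],
        exclude = ["BUILD"],
    ),
    deps = [
        "//packages/exampleA",  # another r_pkg in this workspace (hypothetical label)
        "@R_R6",  # external package set up through r_repository_list (assumed label form)
    ],
    visibility = ["//visibility:public"],
)
```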
## r_library
```python
r_library(pkgs, library_path)
```
Executable rule to install the given packages and all dependencies to a user
provided or system default R library. Run the target with --help for usage
information.
The rule used to provide a tar archive of the library as an implicit output.
That feature is now its own rule -- `r_library_tar`. See the documentation for
the [r_library_tar rule][r_library_tar] and the [example][docker] usage with
the container_image rule.
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `pkgs` | List of labels; required | Package (and dependencies) to install. |
| `library_path` | String; optional | If different from system default, default library location for installation. For runtime overrides, use `bazel run [target] -- -l [path]`. |
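As a sketch, a library target that installs two packages from the workspace could be declared as follows. The target name and package labels are hypothetical, and the path in the comment is only an example.

```python
load("@rules_r//R:defs.bzl", "r_library")

r_library(
    name = "library",
    pkgs = [
        "//packages/exampleA",
        "//packages/exampleB",
    ],
)

# Install into the default library location:
#   bazel run //:library
# Override the library location at run time (illustrative path):
#   bazel run //:library -- -l /tmp/r-library
```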
## r_unit_test
```python
r_unit_test(pkg, suggested_deps, env_vars, tools, data)
```
Rule to keep all deps in the sandbox, and run the provided R test scripts.
When run with `bazel coverage`, this rule will also produce a coverage report
in Cobertura XML format. The coverage report will contain coverage for R code
in the package, and C/C++ code in the `src` directory of R packages.
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `pkg` | Label; required | R package (of type `r_pkg`) to test. |
| `suggested_deps` | List of labels; optional | R package dependencies of type `r_pkg` or `r_library`. |
| `env_vars` | Dictionary; optional | Extra environment variables to define before running the test. |
| `tools` | List of labels; optional | Executables to be made available to the test. |
| `data` | List of labels; optional | Data to be made available to the test. |
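A unit test target placed next to the package it tests might look like this sketch. The target names, the `@R_testthat` label, and the environment variable are illustrative assumptions rather than requirements of the rule.

```python
load("@rules_r//R:defs.bzl", "r_unit_test")

r_unit_test(
    name = "test",
    pkg = ":exampleA",  # the r_pkg target under test (hypothetical)
    # Packages needed only by the tests, e.g. a test framework.
    suggested_deps = ["@R_testthat"],
    env_vars = {"NOT_CRAN": "true"},
)

# Run the tests, or collect a Cobertura coverage report:
#   bazel test //packages/exampleA:test
#   bazel coverage //packages/exampleA:test
```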
## r_pkg_test
```python
r_pkg_test(pkg, suggested_deps, check_args, env_vars, tools, data)
```
Rule to keep all deps of the package in the sandbox, build a source archive
of this package, and run R CMD check on the package source archive in the
sandbox.
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `pkg` | Label; required | R package (of type `r_pkg`) to test. |
| `suggested_deps` | List of labels; optional | R package dependencies of type `r_pkg` or `r_library`. |
| `check_args` | List of strings; default `["--no-build-vignettes", "--no-manual"]` | Additional arguments to supply to `R CMD check`. Building vignettes is disabled by default so that users do not need a TeX installation. To build vignettes, override this attribute and ensure that the relevant binaries are available in your system default `PATH` (usually `/usr/bin` and `/usr/local/bin`). |
| `env_vars` | Dictionary; optional | Extra environment variables to define before running the test. |
| `tools` | List of labels; optional | Executables to be made available to the test. |
| `data` | List of labels; optional | Data to be made available to the test. |
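A package check target might be declared as in the sketch below. The labels are hypothetical; `--no-examples` is a standard `R CMD check` option added here only to show how the default arguments can be extended.

```python
load("@rules_r//R:defs.bzl", "r_pkg_test")

r_pkg_test(
    name = "check",
    pkg = ":exampleA",  # the r_pkg target to check (hypothetical)
    suggested_deps = ["@R_testthat"],
    # The documented defaults plus one extra option, for illustration.
    check_args = [
        "--no-build-vignettes",
        "--no-manual",
        "--no-examples",
    ],
)
```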
## r_binary
```python
r_binary(name, src, deps, data, env_vars, tools, rscript_args, script_args)
```
Build a wrapper shell script for running an executable which will have all the
specified R packages available.
The target can be executed standalone, with `bazel run`, or called from other
executables if RUNFILES_DIR is exported in the environment with
the runfiles of the root executable.
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `src` | File; required | An Rscript interpreted file, or file with executable permissions. |
| `deps` | List of labels; optional | Dependencies of type `r_binary`, `r_pkg`, or `r_library`. |
| `data` | List of labels; optional | Files needed by this rule at runtime. |
| `env_vars` | Dictionary; optional | Extra environment variables to define before running the binary. |
| `tools` | List of labels; optional | Executables to be made available to the binary. |
| `rscript_args` | List of strings; optional | If the `src` file does not have executable permissions, arguments for the Rscript interpreter. We recommend using the shebang line and giving your script execute permissions instead of using this. |
| `script_args` | List of strings; optional | A list of arguments to pass to the `src` script. |
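The following sketch wraps a script as an executable target. It assumes `r_binary` is exported from the same `defs.bzl` as the other rules; the file names, labels, and argument values are hypothetical.

```python
load("@rules_r//R:defs.bzl", "r_binary")

r_binary(
    name = "summarize",
    # Assumed to start with `#!/usr/bin/env Rscript` and to be executable.
    src = "summarize.R",
    deps = ["//packages/exampleA"],
    data = ["input.csv"],
    env_vars = {"LC_ALL": "C"},
    script_args = ["--verbose"],
)

# Run it through Bazel:
#   bazel run //tools:summarize
```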
## r_test
```python
r_test(name, src, deps, data, env_vars, tools, rscript_args, script_args)
```
This is identical to [r_binary](#r_binary) but is run as a test.
## r_markdown
```python
r_markdown(name, src, deps, data, env_vars, tools, rscript_args, script_args,
render_function="rmarkdown::render", input_argument="input", output_dir_argument="output_dir",
render_args)
```
This rule renders an R Markdown document by generating a stub that calls the
render function. The render function and its argument names default to those
of `rmarkdown::render` but can be customized. Note that
`render_args` will need to be quoted appropriately if set. This rule can be
used wherever an [r_binary](#r_binary) rule can be used.
If arguments are given on the command line when running the target, flags of
the form --arg=value are passed as keyword arguments to the render
function. The values can be arbitrary R expressions, and strings will need to
be quoted. The last argument without the prefix `--` will be the output
directory, else the output directory will be the default output
directory of the render function, typically the same directory as the input
file.
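A sketch of a report target and of the command-line behaviour described above follows. It assumes `r_markdown` is exported from `defs.bzl`; the file name, labels, and output path are hypothetical. `output_format` is an argument of `rmarkdown::render`.

```python
load("@rules_r//R:defs.bzl", "r_markdown")

r_markdown(
    name = "report",
    src = "report.Rmd",
    deps = ["@R_rmarkdown"],  # assumed label for the external rmarkdown package
)

# Render with the defaults of the render function:
#   bazel run //reports:report
#
# Pass a keyword argument and an output directory. The value is an R
# expression, so the string carries its own quotes inside the shell quotes:
#   bazel run //reports:report -- --output_format='"html_document"' /tmp/report-out
```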
## r_toolchain
```python
r_toolchain(r, rscript, version, args, makevars_site, env_vars, tools, files, system_state_file)
```
Toolchain to specify the tools and environment for performing build actions.
Also see [r_register_toolchains](#r_register_toolchains) for how
to configure the default registered toolchains.
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `r` | String; default `R` | Absolute path to R, or name of R executable; the search path will include the directories for the `tools` attribute. |
| `rscript` | String; default `Rscript` | Absolute path to Rscript, or name of Rscript executable; the search path will include the directories for the `tools` attribute. |
| `version` | String; optional | If provided, ensure version of R matches this string in x.y form. This version check is performed in the `r_pkg` and `r_binary` (and by extension, `r_test` and `r_markdown`) rules. For stronger guarantees, perform this version check when generating the `system_state_file` (see attribute below). |
| `args` | List of strings; default `["--no-save", "--no-site-file", "--no-environ"]` | Arguments to R and Rscript, in addition to `--slave --no-restore --no-init-file`. |
| `makevars_site` | Label; optional | Site-wide Makevars file. |
| `env_vars` | Dictionary; optional | Environment variables for BUILD actions. |
| `tools` | List of labels; optional | Additional tools to make available in `PATH`. |
| `files` | List of labels; optional | Additional files available to the BUILD actions. |
| `system_state_file` | Label; optional | A file that captures your system state. Use it to rebuild all R packages whenever the contents of this file change. This is ideally generated by a repository_rule with `configure = True`, so that a call to `bazel sync --configure` resets this file. |
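A custom toolchain definition might look like the sketch below. It assumes `r_toolchain` can be loaded from `defs.bzl`; the installation paths and the target name are hypothetical. Registering it also requires a native `toolchain()` target that references the toolchain type defined by rules_r; that label is not given in this document, so it is left out here.

```python
load("@rules_r//R:defs.bzl", "r_toolchain")

r_toolchain(
    name = "r_toolchain_custom",
    # Absolute paths to a specific installation (illustrative).
    r = "/opt/R/4.1.2/bin/R",
    rscript = "/opt/R/4.1.2/bin/Rscript",
    # Fail if the R found at these paths is not a 4.1.x release.
    version = "4.1",
    args = [
        "--no-save",
        "--no-site-file",
        "--no-environ",
    ],
    env_vars = {"LC_ALL": "C"},
)
```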
# Repository Rules
## r_repository
```python
r_repository(urls, strip_prefix, type, sha256, build_file, razel_args, rscript)
```
Repository rule in place of `new_http_archive` that can run razel to generate
the BUILD file automatically. See section on
[external packages](#external-packages) and [Razel scripts][scripts].
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `urls` | List of strings; required | URLs from which the package source archive can be fetched. |
| `strip_prefix` | String; optional | The prefix to strip from all file paths in the archive. |
| `type` | String; optional | Type of the archive file (zip, tgz, etc.). |
| `sha256` | String; optional | sha256 checksum of the archive to verify. |
| `build_file` | File; optional | Optional BUILD file for this repo. If not provided, one will be generated. |
| `razel_args` | Dictionary; optional | Other arguments to supply to the `buildify` function in razel. |
| `rscript` | String; optional | Name, path or label (must start with `@` or `//`) of the interpreter to use for running the razel script. |
## r_repository_list
```python
r_repository_list(package_list, build_file_overrides, remote_repos, other_args, rscript, r_version)
```
Repository rule that will generate a bzl file containing a macro, to be called
as `r_repositories()`, for `r_repository` definitions for packages in
`package_list` CSV. See section on [external packages](#external-packages).
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `package_list` | File; required | CSV containing packages with name, version and sha256; with a header. |
| `build_file_overrides` | File; optional | CSV containing package name and BUILD file path; with a header. |
| `remote_repos` | Dictionary; optional | Repos to use for fetching the archives. |
| `other_args` | Dictionary; optional | Other arguments to supply to the `generateWorkspaceMacro` function in razel. |
| `rscript` | String; optional | Name, path or label (must start with `@` or `//`) of the interpreter to use for running the razel script. |
| `r_version` | String; optional | If provided, ensure version of R matches this string in x.y form. |
## r_rules_dependencies
```python
load("@rules_r//R:dependencies.bzl", "r_rules_dependencies")
r_rules_dependencies()
```
Repository rule that provides repository definitions for dependencies of the
BUILD system. One such dependency is the site-wide Makevars file.
## r_coverage_dependencies
```python
load("@rules_r//R:dependencies.bzl", "r_coverage_dependencies")
r_coverage_dependencies()
load("@r_coverage_deps_bzl//:r_repositories.bzl", coverage_deps = "r_repositories")
coverage_deps()
```
Repository rule that provides repository definitions for dependencies in
computing code coverage for unit tests. Not needed if users already have
a repository definition for the [covr](https://github.com/r-lib/covr) package.
## r_register_toolchains
```python
load("@rules_r//R:dependencies.bzl", "r_register_toolchains")
r_register_toolchains(r_home, strict, makevars_site, version, args, tools)
```
Repository rule that generates and registers a platform independent toolchain
of type [r_toolchain](#r_toolchain) based on the user's system and
environment. If you want to register your own toolchain for specific platforms,
register them before calling this function in your WORKSPACE file to give them
preference.
**NOTE**: These toolchains read your system state and cache the findings for
future runs. Whenever you install a new R version, or if you want to reset the
toolchain for any reason, run:
```bash
bazel sync --configure
```
Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| `r_home` | String; optional | A path to `R_HOME` (as returned from `R RHOME`). If not specified, the rule looks for R and Rscript in `PATH`. The environment variable `BAZEL_R_HOME` takes precedence over this value. |
| `strict` | Bool; default True | Fail if R is not found on the host system. |
| `makevars_site` | Bool; default True | Generate a site-wide Makevars file. |
| `version` | String; optional | `version` attribute value for `r_toolchain`. |
| `args` | List of strings; default `["--no-save", "--no-site-file", "--no-environ"]` | `args` attribute value for `r_toolchain`. |
| `tools` | List of strings; optional | `tools` attribute value for `r_toolchain`. |
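For example, a WORKSPACE file that pins the default toolchain to a particular installation might contain the following sketch. The `r_home` path is illustrative; use the value printed by `R RHOME` on your system.

```python
load("@rules_r//R:dependencies.bzl", "r_register_toolchains", "r_rules_dependencies")

r_rules_dependencies()

# Register any platform-specific toolchains of your own before this call
# so that they take preference over the generated one.
r_register_toolchains(
    r_home = "/opt/R/4.1.2/lib/R",  # illustrative path
    version = "4.1",  # passed on to r_toolchain's version check
    strict = True,
)
```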
[r]: https://cran.r-project.org
[gazelle]: gazelle
[scripts]: scripts
[libPaths]: https://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html
[r_library_tar]: R/internal/library.bzl
[docker]: R/container/README.md
[exampleA]: tests/packages/exampleA
[exampleB]: tests/packages/exampleB
[exampleC]: tests/packages/exampleC
[exampleD]: tests/packages/exampleD