https://github.com/elilillyco/rfacts

Call FACTS from R on Linux
clinical-trials facts r simulation

Last synced: 9 months ago
Host: GitHub
URL: https://github.com/elilillyco/rfacts
Owner: EliLillyCo
License: other
Created: 2020-07-01T02:20:29.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2022-08-19T13:02:00.000Z (over 3 years ago)
Last Synced: 2025-04-15T22:56:37.055Z (9 months ago)
Topics: clinical-trials, facts, r, simulation
Language: R
Homepage: https://EliLillyCo.github.io/rfacts
Size: 378 KB
Stars: 7
Watchers: 2
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

---
output: github_document
---

```{r setup, include = FALSE}

suppressPackageStartupMessages(library(dplyr))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

```

# rfacts

[![cran](https://www.r-pkg.org/badges/version/rfacts)](https://cran.r-project.org/package=rfacts)

[![active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)

[![check](https://github.com/EliLillyCo/rfacts/workflows/check/badge.svg)](https://github.com/EliLillyCo/rfacts/actions?query=workflow%3Acheck)

[![lint](https://github.com/EliLillyCo/rfacts/workflows/lint/badge.svg)](https://github.com/EliLillyCo/rfacts/actions?query=workflow%3Alint)

The rfacts package is an R interface to the [Fixed and Adaptive Clinical Trial Simulator (FACTS)](https://www.berryconsultants.com/software/) on Unix-like systems. It programmatically invokes [FACTS](https://www.berryconsultants.com/software/) to run clinical trial simulations, and it aggregates simulation output data into tidy data frames. These capabilities provide end-to-end automation for large-scale simulation workflows, and they enhance computational reproducibility. For more information, please visit the [documentation website](https://elilillyco.github.io/rfacts/).

## Disclaimer

`rfacts` is neither a product of nor supported by [Berry Consultants](https://www.berryconsultants.com/). The code base of `rfacts` is completely independent from that of [FACTS](https://www.berryconsultants.com/software/), and the former only invokes the latter through dynamic system calls.

## Limitations

* FACTS files prior to version 6.2.4 are unsupported.

* `rfacts` only works on Unix-like systems.

* `rfacts` requires paths to pre-compiled versions of Mono, FLFLL, and the FACTS Linux engines. See the installation instructions below and the [configuration guide](https://elilillyco.github.io/rfacts/articles/config.html).

## Installation

To install the latest release from CRAN, open R and run the following.

```{r, eval = FALSE}

install.packages("rfacts")

```

To install the latest development version:

```{r, eval = FALSE}

install.packages("remotes")
remotes::install_github("EliLillyCo/rfacts")

```

Next, set the `RFACTS_PATHS` environment variable appropriately. For instructions, please see the [configuration guide](https://elilillyco.github.io/rfacts/articles/config.html).
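
Before running any simulations, it can help to confirm that the R session actually sees the variable. The check below is a minimal sketch in base R. `rfacts_paths_is_set()` is a hypothetical helper, not part of `rfacts`, and it only tests that `RFACTS_PATHS` is non-empty, not that its value is valid.

```{r}
# Hypothetical helper (not part of rfacts): report whether the
# RFACTS_PATHS environment variable is visible to this R session.
rfacts_paths_is_set <- function() {
  nzchar(Sys.getenv("RFACTS_PATHS", unset = ""))
}

if (!rfacts_paths_is_set()) {
  message("RFACTS_PATHS is not set. See the configuration guide.")
}
```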

## Run FACTS simulations

First, create a `*.facts` XML file using the [FACTS](https://www.berryconsultants.com/software/) GUI. The `rfacts` package has several built-in examples, included with permission from Berry Consultants LLC.

```{r}

library(rfacts)
# get_facts_file_example() returns the path to
# an example FACTS file bundled with rfacts.
# For FACTS files you create yourself in the FACTS GUI,
# skip get_facts_file_example() and use the path to your own file.
facts_file <- get_facts_file_example("contin.facts")
basename(facts_file)

```

Then, run trial simulations with `run_facts()`. By default, the results are written to a temporary directory. Set the `output_path` argument to customize the path.

```{r}

out <- run_facts(
  facts_file,
  n_sims = 2,
  verbose = FALSE
)
out
head(get_csv_files(out))

```
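
To keep results beyond the R session, `output_path` can point to a directory of your choice. The following is a sketch only: the directory name `facts_results` is an arbitrary example, and the `can_run` guard is an assumption added here so that the chunk simply skips the call on machines where `rfacts` or `RFACTS_PATHS` is unavailable.

```{r}
# Arbitrary example location; replace with a directory of your choice.
results_dir <- file.path(path.expand("~"), "facts_results")

# Skip the run on machines without rfacts or without RFACTS_PATHS set.
can_run <- requireNamespace("rfacts", quietly = TRUE) &&
  nzchar(Sys.getenv("RFACTS_PATHS"))

if (can_run) {
  facts_file <- rfacts::get_facts_file_example("contin.facts")
  out <- rfacts::run_facts(
    facts_file,
    output_path = results_dir,
    n_sims = 2,
    verbose = FALSE
  )
  head(rfacts::get_csv_files(out))
}
```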

Use `read_patients()` to read and aggregate all the `patients*.csv` files. `rfacts` has several such functions, including `read_weeks()` and `read_mcmc()`.

```{r}

read_patients(out)

```

## The simulation process

`run_facts()` has two sequential stages:

1. `run_flfll()`: generate the `*.param` files and the folder structure for the FACTS Linux engines.

2. `run_engine()`: execute the instructions in the `*.param` files to conduct trial simulations and produce CSV output.

```{r}

out <- run_flfll(facts_file, verbose = FALSE)
run_engine(facts_file, param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)

```

`run_engine()` automatically detects the Linux engine required for your FACTS file. If you know the engine in advance, you can use a specific engine function such as `run_engine_contin()` or `run_engine_dichot()`.

```{r}

out <- run_flfll(facts_file, verbose = FALSE)
run_engine_contin(param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)

```

If you are unsure which engine function to use, call `get_facts_engine()`.

```{r}

get_facts_engine(facts_file)

```

## Run a single scenario

If we take control of the simulation process, we can pick and choose which FACTS simulation scenarios to run and read.

```{r}

# Example FACTS file built into rfacts.
facts_file <- get_facts_file_example("contin.facts")
# Set up the files for the scenarios.
param_files <- run_flfll(facts_file, verbose = FALSE)
# Each scenario has its own folder with internal parameter files.
scenarios <- get_param_dirs(param_files) # not in rfacts <= 1.0.0
scenarios
# Let's pick one of those scenarios and run the simulations.
scenario <- scenarios[1]
run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
read_patients(scenario)

```

## Parallel computing

`rfacts` makes it straightforward to parallelize across simulations. First, use `run_flfll()` to create a directory of param files. The example below stores the param files under a `tempfile()` path (passed as `output_path`). However, for distributed computing on traditional HPC clusters, `output_path` should be a directory that all nodes can access.

```{r}

library(rfacts)
facts_file <- get_facts_file_example("contin.facts")
# On traditional HPC clusters, this should be a shared directory
# instead of a temp directory:
tmp <- fs::dir_create(tempfile())
param_files <- file.path(tmp, "param_files")
run_flfll(facts_file, param_files)

```

Next, write a custom function that accepts the param files, runs a single simulation for each param file, and returns the important data in memory. Be sure to set a unique seed for each simulation iteration.

```{r}

sim_once <- function(iter, param_files) {
  # Copy param files to a temp file in order to
  # (1) Avoid race conditions in parallel processing, and
  # (2) Make things run faster: temp files are on local node storage.
  out <- tempfile()
  fs::dir_copy(path = param_files, new_path = out)

  # Run the engine once per param file.
  run_engine_contin(out, n_sims = 1L, seed = iter)

  # Return aggregated patients files.
  read_patients(out) # Reads fast because `out` is a tempfile().
}

```

At this point, we should test this function locally without parallel computing.

```{r}

library(dplyr)
# All the patients files were named patients00001.csv,
# so do not trust the facts_sim column.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(4), sim_once, param_files = param_files) %>%
  bind_rows()

```
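
To illustrate why `facts_id` is the safer grouping column, here is a small sketch on mock data that runs without FACTS. The data frame only mimics the two columns named above. The `facts_id` values and the `response` column are made up for illustration and do not reflect the real output format of `read_patients()`.

```{r}
# Mock stand-in for the combined output of four sim_once() calls.
# The facts_id values and the response column are hypothetical.
mock <- data.frame(
  facts_sim = rep(1L, 8),
  facts_id = rep(c("id_a", "id_b", "id_c", "id_d"), each = 2),
  response = c(0.1, 0.4, 0.3, 0.2, 0.5, 0.6, 0.0, 0.9)
)

# facts_sim cannot tell the four simulated trials apart...
length(unique(mock$facts_sim))

# ...but facts_id can, so group by it when summarizing.
by_trial <- aggregate(response ~ facts_id, data = mock, FUN = mean)
by_trial
```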

Parallel computing happens when we call `sim_once()` repeatedly across several parallel workers. A powerful and convenient parallel computing solution is [`clustermq`](https://mschubert.github.io/clustermq/). Here is a sketch of how to use it with `rfacts`. For quick experiments on a single machine, `mclapply()` from the `parallel` package is a simpler alternative.

```{r, eval = FALSE}

# Configure clustermq to use your scheduler and your template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
# Run the computation.
library(clustermq)
library(dplyr) # for %>% and bind_rows()
patients <- Q(
  fun = sim_once,
  iter = seq_len(50),
  # The name must match the param_files argument of sim_once().
  const = list(param_files = param_files),
  pkgs = c("fs", "rfacts"),
  n_jobs = 4
) %>%
  bind_rows()
# Show aggregated patient data.
patients

```

Alternatives to `clustermq` include `parallel::mclapply()`, `furrr::future_map()`, and `future.apply::future_lapply()`.
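
As a rough illustration of the `parallel::mclapply()` route, the sketch below uses a stand-in function, `sim_once_stub()`, which is hypothetical and returns a tiny data frame instead of calling FACTS. That keeps the example runnable without a FACTS installation. In a real workflow you would presumably pass `sim_once` and `param_files = param_files` instead.

```{r}
library(parallel)

# Stand-in for sim_once(): it returns a small data frame instead of
# calling FACTS, so the sketch runs without a FACTS installation.
sim_once_stub <- function(iter) {
  set.seed(iter) # unique, reproducible seed per iteration
  data.frame(iter = iter, value = rnorm(1))
}

# Fork two workers (Unix-like systems only) and combine the results.
results <- mclapply(seq_len(4), sim_once_stub, mc.cores = 2)
patients_stub <- do.call(rbind, results)
patients_stub
```

Because each iteration sets its own seed, the combined result does not depend on how the iterations are distributed across workers.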

## Helpers

Various `get_facts_*()` functions interrogate FACTS files.

```{r}

get_facts_scenarios(facts_file)
get_facts_version(facts_file)
get_facts_versions()

```
