https://github.com/wlandau/targets-stan

An example project to validate a Stan model in a targets pipeline

Host: GitHub
URL: https://github.com/wlandau/targets-stan
Owner: wlandau
License: other
Created: 2020-06-20T21:46:01.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2021-04-21T12:42:54.000Z (about 4 years ago)
Last Synced: 2024-02-26T19:41:40.266Z (about 1 year ago)
Topics: bayesian-statistics, data-sceince, high-performance-computing, pipeline, r, reproducibility, reproducible-research, rstats, stan-model, statistics, targets
Language: R
Homepage: https://rstudio.cloud/project/1430719/
Size: 332 KB
Stars: 26
Watchers: 5
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md


README

---
output: github_document
bibliography: README.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# `targets` R package Stan model example

[![Launch RStudio Cloud](https://img.shields.io/badge/RStudio-Cloud-blue)](https://rstudio.cloud/project/1430719/)

The goal of this workflow is to validate a small Bayesian model using an interval-based method similar to simulation-based calibration [SBC; @cook2006; @talts2020]. We simulate multiple datasets from the model and fit the model to each dataset. For each fit, we check whether the 50% credible interval of the regression coefficient `beta` contains the true value of `beta` used to generate the data. If the model is implemented correctly, roughly 50% of these intervals should contain the true `beta`.
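A minimal sketch of the interval check in plain R, under loudly stated simplifying assumptions: it swaps the Stan model for a hypothetical conjugate model with no intercept and a known `sigma = 1`, so the posterior of `beta` is Normal with precision `1 + sum(x^2)` and can be sampled directly instead of with MCMC. The helper name `simulate_and_check()` and the choices of 100 observations, 4000 draws, and 1000 replicates are illustrative, not taken from this repository.

```r
# Hypothetical simplified model (NOT the Stan model in this repo):
#   beta ~ Normal(0, 1), y_i ~ Normal(x_i * beta, 1)
# With this conjugate prior the posterior of beta is Normal, so we can
# draw from it directly in place of running MCMC.
simulate_and_check <- function(n = 100, n_draws = 4000) {
  beta_true <- rnorm(1, mean = 0, sd = 1)      # true beta drawn from the prior
  x <- rnorm(n)                                # hypothetical covariate values
  y <- rnorm(n, mean = x * beta_true, sd = 1)  # simulated response
  post_precision <- 1 + sum(x^2)               # prior precision + data precision
  post_mean <- sum(x * y) / post_precision
  draws <- rnorm(n_draws, post_mean, sqrt(1 / post_precision))
  bounds <- quantile(draws, probs = c(0.25, 0.75))  # central 50% credible interval
  bounds[[1]] < beta_true && beta_true < bounds[[2]]
}

set.seed(1)
covered <- vapply(seq_len(1000), function(i) simulate_and_check(), logical(1))
coverage <- mean(covered)
coverage  # should be close to 0.5 if the posterior is correct
```

Replacing the exact posterior draws with draws from a deliberately wrong posterior (for example, doubling the posterior standard deviation) should push the coverage well away from 0.5, which is what makes this kind of check useful.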

## Consider stantargets

The [`stantargets`](https://wlandau.github.io/stantargets/) R package is an extension to [`targets`](https://docs.ropensci.org/targets/) and [`cmdstanr`](https://github.com/stan-dev/cmdstanr) for Bayesian data analysis, and it makes the latter two packages easier to use together. The pipeline in this repo can be written far more concisely using the [`tar_stan_mcmc_rep_summary()`](https://wlandau.github.io/stantargets/reference/tar_stan_mcmc_rep_summary.html) function (see [this vignette](https://wlandau.github.io/stantargets/articles/mcmc_rep.html)). A version of this example project that uses [`stantargets`](https://wlandau.github.io/stantargets/) is also available, and the [pipeline in its `_targets.R` file](https://github.com/wlandau/stantargets-example-validation/blob/main/_targets.R) is much simpler and easier to define.

## The model

```{r, eval = FALSE}
y_i ~ iid Normal(alpha + x_i * beta, sigma^2)
alpha ~ Normal(0, 1)
beta ~ Normal(0, 1)
sigma ~ HalfCauchy(0, 1)
```
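To make the generative model above concrete, here is a sketch of how one dataset could be simulated from it in base R. The covariate distribution (`x` drawn from a standard normal), the sample size, and the function name `simulate_data()` are assumptions for illustration; the repository's own simulation code in `R/functions.R` may differ. The half-Cauchy draw takes the absolute value of a Cauchy draw, and `sigma` is passed to `rnorm()` as a standard deviation because the model states the variance as `sigma^2`.

```r
# Hypothetical simulation from the model:
#   y_i ~ Normal(alpha + x_i * beta, sigma^2)
#   alpha, beta ~ Normal(0, 1), sigma ~ HalfCauchy(0, 1)
simulate_data <- function(n = 100) {
  alpha <- rnorm(1, mean = 0, sd = 1)
  beta <- rnorm(1, mean = 0, sd = 1)
  sigma <- abs(rcauchy(1, location = 0, scale = 1))   # half-Cauchy(0, 1)
  x <- rnorm(n)                                       # assumed covariate distribution
  y <- rnorm(n, mean = alpha + x * beta, sd = sigma)  # sd = sigma, variance = sigma^2
  list(
    data = data.frame(x = x, y = y),
    truth = c(alpha = alpha, beta = beta, sigma = sigma)
  )
}

set.seed(2)
sim <- simulate_data(n = 50)
head(sim$data)
sim$truth
```

Keeping the true parameter values next to the data is what allows each model fit to be compared against the `beta` that generated it.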

## The `targets` pipeline

The [`targets`](https://github.com/wlandau/targets) R package manages the workflow. It automatically skips steps of the pipeline whose results are already up to date, which is critical for Bayesian data analysis because Markov chain Monte Carlo usually takes a long time to run. It also helps users understand and communicate this work with tools like the interactive dependency graph below.

```{r, eval = FALSE}
library(targets)
tar_visnetwork()
```

![](./images/graph.png)

## How to access

You can try out this example project as long as you have a browser and an internet connection. [Click here](https://rstudio.cloud/project/1430719/) to navigate your browser to an RStudio Cloud instance. Alternatively, you can clone or download this code repository and install the R packages [listed here](https://github.com/wlandau/targets-minimal/blob/03835c2aa4679dcf3f28c623a06d7505b18bee17/DESCRIPTION#L25-L30).

## How to run

In the R console, call the [`tar_make()`](https://wlandau.github.io/targets/reference/tar_make.html) function to run the pipeline. Then, call `tar_read(hist)` to retrieve the histogram. Experiment with [other functions](https://wlandau.github.io/targets/reference/index.html) such as [`tar_visnetwork()`](https://wlandau.github.io/targets/reference/tar_visnetwork.html) to learn how they work.

## File structure

The files in this example are organized as follows.

```{r, eval = FALSE}
├── run.sh
├── run.R
├── _targets.R
├── _targets/
├── sge.tmpl
├── R
│   ├── functions.R
│   └── utils.R
├── stan
│   └── model.stan
└── report.Rmd
```

File | Purpose
---|---
[`run.sh`](https://github.com/wlandau/targets-stan/blob/main/run.sh) | Shell script to run [`run.R`](https://github.com/wlandau/targets-stan/blob/main/run.R) in a persistent background process. Works on Unix-like systems. Helpful for long computations on servers.
[`run.R`](https://github.com/wlandau/targets-stan/blob/main/run.R) | R script to run `tar_make()` or `tar_make_clustermq()` (uncomment the function of your choice).
[`_targets.R`](https://github.com/wlandau/targets-stan/blob/main/_targets.R) | The special R script that declares the [`targets`](https://github.com/wlandau/targets) pipeline. See `tar_script()` for details.
[`sge.tmpl`](https://github.com/wlandau/targets-stan/blob/main/sge.tmpl) | A [`clustermq`](https://github.com/mschubert/clustermq) template file to deploy targets in parallel to a Sun Grid Engine cluster. The comments in this file explain some of the choices behind the pipeline construction and arguments to `tar_target()`.
[`R/functions.R`](https://github.com/wlandau/targets-stan/blob/main/R/functions.R) | A custom R script with the most important user-defined functions.
[`R/utils.R`](https://github.com/wlandau/targets-stan/blob/main/R/utils.R) | A custom R script with helper functions.
[`stan/model.stan`](https://github.com/wlandau/targets-stan/blob/main/stan/model.stan) | The specification of our Stan model.
[`report.Rmd`](https://github.com/wlandau/targets-stan/blob/main/report.Rmd) | An R Markdown report summarizing the results of the analysis. For more information on how to include R Markdown reports as reproducible components of the pipeline, see the `tar_render()` function from the [`tarchetypes`](https://wlandau.github.io/tarchetypes) package and the [literate programming chapter of the manual](https://wlandau.github.io/targets-manual/files.html#literate-programming).

## Scaling out

This computation is currently downsized for pedagogical purposes. To scale it up, open the [`_targets.R`](https://github.com/wlandau/targets-stan/blob/main/_targets.R) script and increase the number of simulations (the number inside `seq_len()` in the `index` target).
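One reason to scale up is precision: the observed coverage is itself a Monte Carlo estimate. Assuming independent simulations and a true coverage of 0.5, its standard error is `sqrt(0.5 * 0.5 / n_sims)`, which is 0.05 for 100 simulations, so an observed coverage anywhere between roughly 0.40 and 0.60 is unremarkable at that size. The hypothetical helper `coverage_se()` below tabulates this for a few simulation counts.

```r
# Binomial standard error of an estimated coverage proportion,
# assuming independent simulations with true coverage p.
coverage_se <- function(n_sims, p = 0.5) {
  sqrt(p * (1 - p) / n_sims)
}

n_sims <- c(10, 100, 1000, 10000)
data.frame(
  n_sims = n_sims,
  se = coverage_se(n_sims),
  approx_lower = 0.5 - 2 * coverage_se(n_sims),  # rough +/- 2 SE band around 0.5
  approx_upper = 0.5 + 2 * coverage_se(n_sims)
)
```

Because the standard error shrinks with the square root of the number of simulations, quadrupling the count only halves it, which is why a serious validation run needs far more simulations than this downsized demo.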

## High-performance computing

You can run this project locally on your laptop or remotely on a cluster. You have several choices, and they each require modifications to [`run.R`](https://github.com/wlandau/targets-stan/blob/main/run.R) and [`_targets.R`](https://github.com/wlandau/targets-stan/blob/main/_targets.R).

Mode | When to use | Instructions for [`run.R`](https://github.com/wlandau/targets-stan/blob/main/run.R) | Instructions for [`_targets.R`](https://github.com/wlandau/targets-stan/blob/main/_targets.R)
---|---|---|---
Sequential | Low-spec local machine or Windows. | Uncomment `tar_make()` | No action required.
Local multicore | Local machine with a Unix-like OS. | Uncomment `tar_make_clustermq()` | Uncomment `options(clustermq.scheduler = "multicore")`
Sun Grid Engine | Sun Grid Engine cluster. | Uncomment `tar_make_clustermq()` | Uncomment `options(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")`
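The `_targets.R` column of the table amounts to setting `clustermq` options before the pipeline runs. As a hypothetical sketch (the helper `scheduler_options()` is not part of this repository, `targets`, or `clustermq`), the three modes could be encoded in one place and applied with base R's `options()`; the option names and values are the ones listed in the table.

```r
# Hypothetical helper mapping each mode in the table to clustermq options.
scheduler_options <- function(mode = c("sequential", "multicore", "sge")) {
  mode <- match.arg(mode)
  switch(
    mode,
    sequential = list(),  # tar_make() runs targets one at a time; no options needed
    multicore = list(clustermq.scheduler = "multicore"),
    sge = list(clustermq.scheduler = "sge", clustermq.template = "sge.tmpl")
  )
}

# Example: configure the local multicore mode from the second table row.
opts <- scheduler_options("multicore")
if (length(opts) > 0) options(opts)
getOption("clustermq.scheduler")
```

Setting options only affects how `tar_make_clustermq()` dispatches work; the matching change in `run.R` (uncommenting the right `tar_make*()` call) is still needed.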

## References
