https://github.com/ropensci/tarchetypes

Archetypes for targets and pipelines

Host: GitHub
URL: https://github.com/ropensci/tarchetypes
Owner: ropensci
License: other
Created: 2020-07-27T02:25:17.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2025-05-05T18:53:35.000Z (about 1 month ago)
Last Synced: 2025-05-05T19:54:36.293Z (about 1 month ago)
Topics: data-science, high-performance-computing, peer-reviewed, pipeline, r, r-package, r-targetopia, reproducibility, rstats, targets, workflow
Language: R
Homepage: https://docs.ropensci.org/tarchetypes
Size: 2.04 MB
Stars: 144
Watchers: 7
Forks: 21
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codemeta: codemeta.json


---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# tarchetypes 

[![ropensci](https://badges.ropensci.org/401_status.svg)](https://github.com/ropensci/software-review/issues/401)

[![zenodo](https://zenodo.org/badge/282774543.svg)](https://zenodo.org/badge/latestdoi/282774543)

[![R Targetopia](https://img.shields.io/badge/R_Targetopia-member-blue?style=flat&labelColor=gray)](https://wlandau.github.io/targetopia/)

[![CRAN](https://www.r-pkg.org/badges/version/tarchetypes)](https://CRAN.R-project.org/package=tarchetypes)

[![status](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)

[![check](https://github.com/ropensci/tarchetypes/actions/workflows/check.yaml/badge.svg)](https://github.com/ropensci/tarchetypes/actions?query=workflow%3Acheck)

[![codecov](https://codecov.io/gh/ropensci/tarchetypes/branch/main/graph/badge.svg?token=3T5DlLwUVl)](https://app.codecov.io/gh/ropensci/tarchetypes)

[![lint](https://github.com/ropensci/tarchetypes/actions/workflows/lint.yaml/badge.svg)](https://github.com/ropensci/tarchetypes/actions?query=workflow%3Alint)

The `tarchetypes` R package is a collection of target and pipeline archetypes for the [`targets`](https://github.com/ropensci/targets) package. These archetypes express complicated pipelines with concise syntax, which enhances readability and thus reproducibility. Archetypes are possible because of the flexible metaprogramming capabilities of [`targets`](https://github.com/ropensci/targets). In [`targets`](https://github.com/ropensci/targets), one can define a target as an object outside the central pipeline, and the [`tar_target_raw()`](https://docs.ropensci.org/targets/reference/tar_target_raw.html) function completely avoids non-standard evaluation. That means anyone can write their own niche interfaces for specialized projects. `tarchetypes` aims to include the most common and versatile archetypes and usage patterns.
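
To make the metaprogramming point concrete, here is a minimal sketch of a custom archetype built on `tar_target_raw()`. The helper name `tar_csv()` and the file path in the usage note are hypothetical and not part of `targets` or `tarchetypes`; the sketch only assumes that `tar_target_raw()` accepts a character name and an already-quoted command.

```{r, eval = FALSE}
library(targets)

# Hypothetical target factory (not part of any package):
# one target tracks a CSV file, a second target reads it.
tar_csv <- function(name, path) {
  name_file <- paste0(name, "_file")
  # Build the command as an expression so that no
  # non-standard evaluation is needed inside the factory.
  command <- substitute(
    utils::read.csv(file),
    env = list(file = as.symbol(name_file))
  )
  list(
    tar_target_raw(name_file, path, format = "file"),
    tar_target_raw(name, command)
  )
}
```

In a `_targets.R` file, a call such as `tar_csv("raw_data", "data/raw_data.csv")` could then sit inside the target list like any built-in archetype.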

## Grouped data frames

`tarchetypes` has functions for easy dynamic branching over subsets of data frames:

* `tar_group_by()`: define row groups using `dplyr::group_by()` semantics.

* `tar_group_select()`: define row groups using `tidyselect` semantics.

* `tar_group_count()`: define a given number of row groups.

* `tar_group_size()`: define row groups of a given size.

If you define a target with one of these functions, all downstream dynamic targets will automatically branch over the row groups.

```{r, echo = FALSE}
targets::tar_script({
  produce_data <- function() {
    expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
  }
  list(
    tarchetypes::tar_group_by(data, produce_data(), var1, var2),
    tar_target(group, data, pattern = map(data))
  )
})
```

```{r, eval = FALSE}
# _targets.R file:
library(targets)
library(tarchetypes)
produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}
list(
  tar_group_by(data, produce_data(), var1, var2),
  tar_target(group, data, pattern = map(data))
)
```

```{r}
# R console:
library(targets)
tar_make()
# First row group:
tar_read(group, branches = 1)
# Second row group:
tar_read(group, branches = 2)
```
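
The other grouping functions listed above follow the same pattern. The sketch below is illustrative rather than authoritative: the target names `data_size`, `data_count`, `data_select`, and `group_size` are made up for this example, and the argument names `size`, `count`, and `by` are taken from the function reference as best understood, so check the help files before relying on them.

```{r, eval = FALSE}
# _targets.R file (sketch):
library(targets)
library(tarchetypes)
produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}
list(
  # Row groups with up to 4 rows each.
  tar_group_size(data_size, produce_data(), size = 4),
  # Up to 2 row groups in total.
  tar_group_count(data_count, produce_data(), count = 2),
  # Row groups defined by columns chosen with tidyselect.
  tar_group_select(data_select, produce_data(), by = starts_with("var")),
  # Downstream dynamic target: one branch per row group.
  tar_target(group_size, data_size, pattern = map(data_size))
)
```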

## Literate programming

Consider the following R Markdown report.

```{r, echo = FALSE, comment = ""}
lines <- c(
  "---",
  "title: report",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "library(targets)",
  "tar_read(dataset)",
  "```"
)
cat(lines, sep = "\n")
```

We want to define a target to render the report. And because the report calls `tar_read(dataset)`, this target needs to depend on `dataset`. Without `tarchetypes`, it is cumbersome to set up the pipeline correctly.

```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_target(
    report, {
      # Explicitly mention the symbol `dataset`.
      list(dataset)
      # Return relative paths to keep the project portable.
      fs::path_rel(
        # Need to return/track all input/output files.
        c(
          rmarkdown::render(
            input = "report.Rmd",
            # Always run from the project root
            # so the report can find _targets/.
            knit_root_dir = getwd(),
            quiet = TRUE
          ),
          "report.Rmd"
        )
      )
    },
    # Track the input and output files.
    format = "file",
    # Avoid building small reports on HPC.
    deployment = "main"
  )
)
```

With `tarchetypes`, we can simplify the pipeline with the `tar_render()` archetype.

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_render(report, "report.Rmd")
)
```

Above, `tar_render()` scans code chunks for mentions of targets in `tar_load()` and `tar_read()`, and it enforces the dependency relationships it finds. In our case, it reads `report.Rmd` and then forces `report` to depend on `dataset`. That way, `tar_make()` always processes `dataset` before `report`, and it automatically reruns `report.Rmd` whenever `dataset` changes.
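
To see which targets a report mentions without building anything, one option is `tar_knitr_deps()` from `tarchetypes`. The sketch below writes a throwaway copy of the report to a temporary file (the temporary path is an assumption of this example, not part of the workflow above) and lists the dependencies detected in its code chunks. It assumes the `knitr` package is installed.

```{r, eval = FALSE}
library(tarchetypes)

# Write a throwaway copy of the report to a temporary file.
path <- tempfile(fileext = ".Rmd")
lines <- c(
  "---",
  "title: report",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "library(targets)",
  "tar_read(dataset)",
  "```"
)
writeLines(lines, path)

# Names of targets mentioned in tar_load() and tar_read() calls.
deps <- tar_knitr_deps(path)
deps
```

For this report, the result should include `dataset`, which is the dependency that `tar_render()` enforces.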

## Alternative pipeline syntax

[`tar_plan()`](https://docs.ropensci.org/tarchetypes/reference/tar_plan.html) is a drop-in replacement for [`drake_plan()`](https://docs.ropensci.org/drake/reference/drake_plan.html) in the [`targets`](https://github.com/ropensci/targets) ecosystem. It lets users write targets as name/command pairs without having to call [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html).

```{r, eval = FALSE}
tar_plan(
  tar_file(raw_data_file, "data/raw_data.csv", format = "file"),
  # Simple drake-like syntax:
  raw_data = read_csv(raw_data_file, col_types = cols()),
  data = raw_data %>%
    mutate(Ozone = replace_na(Ozone, mean(Ozone, na.rm = TRUE))),
  hist = create_plot(data),
  fit = biglm(Ozone ~ Wind + Temp, data),
  # Needs tar_render() because it is a target archetype:
  tar_render(report, "report.Rmd")
)
```
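
As a rough illustration of the name/command shorthand, the two objects below should describe the same two targets. The names `x`, `y`, `with_plan`, and `with_targets` are made up for this sketch.

```{r, eval = FALSE}
library(targets)
library(tarchetypes)

# Name/command pairs with tar_plan():
with_plan <- tar_plan(
  x = 1 + 1,
  y = x * 2
)

# The same targets written with explicit tar_target() calls:
with_targets <- list(
  tar_target(x, 1 + 1),
  tar_target(y, x * 2)
)
```

In a real `_targets.R` file, only one of the two forms would appear, as the final expression of the script.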

## Installation

Type | Source | Command
---|---|---
Release | CRAN | `install.packages("tarchetypes")`
Development | GitHub | `remotes::install_github("ropensci/tarchetypes")`
Development | rOpenSci | `install.packages("tarchetypes", repos = "https://dev.ropensci.org")`

## Documentation

For specific documentation on `tarchetypes`, including the help files of all user-side functions, please visit the [reference website](https://docs.ropensci.org/tarchetypes/). For documentation on [`targets`](https://github.com/ropensci/targets) in general, please visit the [`targets` reference website](https://docs.ropensci.org/targets/). Many of the linked resources use `tarchetypes` functions such as [`tar_render()`](https://docs.ropensci.org/tarchetypes/reference/tar_render.html).

## Help

Please read the [help guide](https://books.ropensci.org/targets/help.html) to learn how best to ask for help using `targets` and `tarchetypes`.

## Code of conduct

Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/).

## Citation

```{r}
citation("tarchetypes")
```

```{r, echo = FALSE}
unlink("_targets.R")
tar_destroy()
```
