Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ropensci/tarchetypes
Archetypes for targets and pipelines
- Host: GitHub
- URL: https://github.com/ropensci/tarchetypes
- Owner: ropensci
- License: other
- Created: 2020-07-27T02:25:17.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-11-18T17:50:11.000Z (25 days ago)
- Last Synced: 2024-11-29T14:20:00.103Z (14 days ago)
- Topics: data-science, high-performance-computing, peer-reviewed, pipeline, r, r-package, r-targetopia, reproducibility, rstats, targets, workflow
- Language: R
- Homepage: https://docs.ropensci.org/tarchetypes
- Size: 1.67 MB
- Stars: 140
- Watchers: 7
- Forks: 18
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codemeta: codemeta.json
Awesome Lists containing this project
- jimsghstars - ropensci/tarchetypes - Archetypes for targets and pipelines (R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# tarchetypes

[![ropensci](https://badges.ropensci.org/401_status.svg)](https://github.com/ropensci/software-review/issues/401)
[![zenodo](https://zenodo.org/badge/282774543.svg)](https://zenodo.org/badge/latestdoi/282774543)
[![R Targetopia](https://img.shields.io/badge/R_Targetopia-member-blue?style=flat&labelColor=gray)](https://wlandau.github.io/targetopia/)
[![CRAN](https://www.r-pkg.org/badges/version/tarchetypes)](https://CRAN.R-project.org/package=tarchetypes)
[![status](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![check](https://github.com/ropensci/tarchetypes/actions/workflows/check.yaml/badge.svg)](https://github.com/ropensci/tarchetypes/actions?query=workflow%3Acheck)
[![codecov](https://codecov.io/gh/ropensci/tarchetypes/branch/main/graph/badge.svg?token=3T5DlLwUVl)](https://app.codecov.io/gh/ropensci/tarchetypes)
[![lint](https://github.com/ropensci/tarchetypes/actions/workflows/lint.yaml/badge.svg)](https://github.com/ropensci/tarchetypes/actions?query=workflow%3Alint)

The `tarchetypes` R package is a collection of target and pipeline archetypes for the [`targets`](https://github.com/ropensci/targets) package. These archetypes express complicated pipelines with concise syntax, which enhances readability and thus reproducibility. Archetypes are possible because of the flexible metaprogramming capabilities of [`targets`](https://github.com/ropensci/targets). In [`targets`](https://github.com/ropensci/targets), one can define a target as an object outside the central pipeline, and the [`tar_target_raw()`](https://docs.ropensci.org/targets/reference/tar_target_raw.html) function completely avoids non-standard evaluation. That means anyone can write their own niche interfaces for specialized projects. `tarchetypes` aims to include the most common and versatile archetypes and usage patterns.

## Grouped data frames
`tarchetypes` has functions for easy dynamic branching over subsets of data frames:
* `tar_group_by()`: define row groups using `dplyr::group_by()` semantics.
* `tar_group_select()`: define row groups using `tidyselect` semantics.
* `tar_group_count()`: define a given number of row groups.
* `tar_group_size()`: define row groups of a given size.

If you define a target with one of these functions, all downstream dynamic targets will automatically branch over the row groups.

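The count- and size-based variants follow the same pattern as the other grouping functions. The block below is a minimal sketch, not taken from the package documentation verbatim: it assumes the `count` and `size` argument names from the `tarchetypes` reference pages, and the helper `produce_data()` and all target names are hypothetical.

```r
library(targets)
library(tarchetypes)

# Hypothetical helper that returns a 12-row data frame.
produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}

pipeline <- list(
  # Split the rows into at most 3 row groups.
  tar_group_count(by_count, produce_data(), count = 3),
  # Split the rows into groups of at most 4 rows each.
  tar_group_size(by_size, produce_data(), size = 4),
  # Downstream dynamic targets branch over the row groups.
  tar_target(rows_per_count_group, nrow(by_count), pattern = map(by_count)),
  tar_target(rows_per_size_group, nrow(by_size), pattern = map(by_size))
)

# In a _targets.R file, the list of targets should be the final value.
pipeline
```
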
```{r, echo = FALSE}
targets::tar_script({
  produce_data <- function() {
    expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
  }
  list(
    tarchetypes::tar_group_by(data, produce_data(), var1, var2),
    tar_target(group, data, pattern = map(data))
  )
})
```

```{r, eval = FALSE}
# _targets.R file:
library(targets)
library(tarchetypes)
produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}
list(
  tar_group_by(data, produce_data(), var1, var2),
  tar_target(group, data, pattern = map(data))
)
```

```{r}
# R console:
library(targets)
tar_make()

# First row group:
tar_read(group, branches = 1)

# Second row group:
tar_read(group, branches = 2)
```

## Literate programming

Consider the following R Markdown report.
```{r, echo = FALSE, comment = ""}
lines <- c(
  "---",
  "title: report",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "library(targets)",
  "tar_read(dataset)",
  "```"
)
cat(lines, sep = "\n")
```

We want to define a target to render the report. And because the report calls `tar_read(dataset)`, this target needs to depend on `dataset`. Without `tarchetypes`, it is cumbersome to set up the pipeline correctly.

```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_target(
    report, {
      # Explicitly mention the symbol `dataset`.
      list(dataset)
      # Return relative paths to keep the project portable.
      fs::path_rel(
        # Need to return/track all input/output files.
        c(
          rmarkdown::render(
            input = "report.Rmd",
            # Always run from the project root
            # so the report can find _targets/.
            knit_root_dir = getwd(),
            quiet = TRUE
          ),
          "report.Rmd"
        )
      )
    },
    # Track the input and output files.
    format = "file",
    # Avoid building small reports on HPC.
    deployment = "main"
  )
)
```

With `tarchetypes`, we can simplify the pipeline with the `tar_render()` archetype.

```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_render(report, "report.Rmd")
)
```

Above, `tar_render()` scans code chunks for mentions of targets in `tar_load()` and `tar_read()`, and it enforces the dependency relationships it finds. In our case, it reads `report.Rmd` and then forces `report` to depend on `dataset`. That way, `tar_make()` always processes `dataset` before `report`, and it automatically reruns `report.Rmd` whenever `dataset` changes.

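To check which targets a report mentions before running the pipeline, one option is the `tar_knitr_deps()` helper exported by `tarchetypes`. The block below is a minimal sketch: it assumes `tarchetypes` and `knitr` are installed, and the temporary file path and the variable names `fence`, `report_path`, and `deps` are illustrative only.

```r
library(tarchetypes)

# Build the chunk fence programmatically so this example can itself
# sit inside a fenced code block.
fence <- strrep("`", 3)

# Write the example report to a temporary file.
report_path <- tempfile(fileext = ".Rmd")
writeLines(
  c(
    "---",
    "title: report",
    "output: html_document",
    "---",
    "",
    paste0(fence, "{r}"),
    "library(targets)",
    "tar_read(dataset)",
    fence
  ),
  report_path
)

# List the target names mentioned in tar_load() and tar_read() calls.
deps <- tar_knitr_deps(report_path)
print(deps)
```

If a target that the report reads is missing from this vector, the corresponding `tar_load()` or `tar_read()` call is probably written in a form the static scan cannot see, for example with a name computed at run time.
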
## Alternative pipeline syntax
[`tar_plan()`](https://docs.ropensci.org/tarchetypes/reference/tar_plan.html) is a drop-in replacement for [`drake_plan()`](https://docs.ropensci.org/drake/reference/drake_plan.html) in the [`targets`](https://github.com/ropensci/targets) ecosystem.
It lets users write targets as name/command pairs without having to call [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html).

```{r, eval = FALSE}
tar_plan(
  tar_file(raw_data_file, "data/raw_data.csv", format = "file"),
  # Simple drake-like syntax:
  raw_data = read_csv(raw_data_file, col_types = cols()),
  data = raw_data %>%
    mutate(Ozone = replace_na(Ozone, mean(Ozone, na.rm = TRUE))),
  hist = create_plot(data),
  fit = biglm(Ozone ~ Wind + Temp, data),
  # Needs tar_render() because it is a target archetype:
  tar_render(report, "report.Rmd")
)
```

## Installation

Type | Source | Command
---|---|---
Release | CRAN | `install.packages("tarchetypes")`
Development | GitHub | `remotes::install_github("ropensci/tarchetypes")`
Development | rOpenSci | `install.packages("tarchetypes", repos = "https://dev.ropensci.org")`

## Documentation

For specific documentation on `tarchetypes`, including the help files of all user-side functions, please visit the [reference website](https://docs.ropensci.org/tarchetypes/). For documentation on [`targets`](https://github.com/ropensci/targets) in general, please visit the [`targets` reference website](https://docs.ropensci.org/targets/). Many of the linked resources use `tarchetypes` functions such as [`tar_render()`](https://docs.ropensci.org/tarchetypes/reference/tar_render.html).
## Help
Please read the [help guide](https://books.ropensci.org/targets/help.html) to learn how best to ask for help using `targets` and `tarchetypes`.
## Code of conduct
Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/).
## Citation
```{r}
citation("tarchetypes")
```

```{r, echo = FALSE}
unlink("_targets.R")
tar_destroy()
```