Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tidymodels/parsnip
A tidy unified interface to models
https://github.com/tidymodels/parsnip
Last synced: 2 days ago
JSON representation
A tidy unified interface to models
- Host: GitHub
- URL: https://github.com/tidymodels/parsnip
- Owner: tidymodels
- License: other
- Created: 2017-12-10T22:48:42.000Z (almost 7 years ago)
- Default Branch: main
- Last Pushed: 2024-11-07T20:38:35.000Z (4 days ago)
- Last Synced: 2024-11-07T21:30:45.446Z (4 days ago)
- Language: R
- Homepage: https://parsnip.tidymodels.org
- Size: 30.2 MB
- Stars: 597
- Watchers: 29
- Forks: 89
- Open Issues: 77
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
Awesome Lists containing this project
- jimsghstars - tidymodels/parsnip - A tidy unified interface to models (R)
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```[![R-CMD-check](https://github.com/tidymodels/parsnip/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/parsnip/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/parsnip/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/parsnip?branch=main)
[![CRAN status](https://www.r-pkg.org/badges/version/parsnip)](https://CRAN.R-project.org/package=parsnip)
[![Downloads](https://cranlogs.r-pkg.org/badges/parsnip)](https://CRAN.R-project.org/package=parsnip)
[![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html)## Introduction
The goal of parsnip is to provide a tidy, unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages.
## Installation
```{r, eval = FALSE}
# The easiest way to get parsnip is to install all of tidymodels:
install.packages("tidymodels")# Alternatively, install just parsnip:
install.packages("parsnip")# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidymodels/parsnip")
```## Getting started
One challenge with different modeling functions available in R _that do the same thing_ is that they can have different interfaces and arguments. For example, to fit a random forest regression model, we might have:
```{r eval = FALSE}
# From randomForest
rf_1 <- randomForest(
y ~ .,
data = dat,
mtry = 10,
ntree = 2000,
importance = TRUE
)# From ranger
rf_2 <- ranger(
y ~ .,
data = dat,
mtry = 10,
num.trees = 2000,
importance = "impurity"
)# From sparklyr
rf_3 <- ml_random_forest(
dat,
intercept = FALSE,
response = "y",
features = names(dat)[names(dat) != "y"],
col.sample.rate = 10,
num.trees = 2000
)
```Note that the model syntax can be very different and that the argument names (and formats) are also different. This is a pain if you switch between implementations.
In this example:
* the **type** of model is "random forest",
* the **mode** of the model is "regression" (as opposed to classification, etc), and
* the computational **engine** is the name of the R package.The goals of parsnip are to:
* Separate the definition of a model from its evaluation.
* Decouple the model specification from the implementation (whether the implementation is in R, spark, or something else). For example, the user would call `rand_forest` instead of `ranger::ranger` or other specific packages.
* Harmonize argument names (e.g. `n.trees`, `ntrees`, `trees`) so that users only need to remember a single name. This will help _across_ model types too so that `trees` will be the same argument across random forest as well as boosting or bagging.Using the example above, the parsnip approach would be:
```{r}
library(parsnip)rand_forest(mtry = 10, trees = 2000) %>%
set_engine("ranger", importance = "impurity") %>%
set_mode("regression")
```The engine can be easily changed. To use Spark, the change is straightforward:
```{r}
rand_forest(mtry = 10, trees = 2000) %>%
set_engine("spark") %>%
set_mode("regression")
```Either one of these model specifications can be fit in the same way:
```{r}
set.seed(192)
rand_forest(mtry = 10, trees = 2000) %>%
set_engine("ranger", importance = "impurity") %>%
set_mode("regression") %>%
fit(mpg ~ ., data = mtcars)
```A list of all parsnip models across different CRAN packages can be found at https://www.tidymodels.org/find/parsnip.
## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/parsnip/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).