Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tidymodels/baguette
parsnip Model Functions for Bagging
https://github.com/tidymodels/baguette
Last synced: 3 days ago
JSON representation
parsnip Model Functions for Bagging
- Host: GitHub
- URL: https://github.com/tidymodels/baguette
- Owner: tidymodels
- License: other
- Created: 2019-03-23T01:59:18.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2024-10-23T12:31:39.000Z (21 days ago)
- Last Synced: 2024-10-31T22:03:14.534Z (12 days ago)
- Language: R
- Homepage: https://baguette.tidymodels.org
- Size: 8.07 MB
- Stars: 24
- Watchers: 8
- Forks: 8
- Open Issues: 7
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```[![R-CMD-check](https://github.com/tidymodels/baguette/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/baguette/actions/workflows/R-CMD-check.yaml)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![CRAN status](https://www.r-pkg.org/badges/version/baguette)](https://cran.r-project.org/package=baguette)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/baguette/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/baguette?branch=main)## Introduction
The goal of baguette is to provide efficient functions for bagging (aka [bootstrap aggregating](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C7&q=bagging+predictors+breiman+1996&oq=Bagging+predictors+)) ensemble models.
The model objects produced by baguette are kept smaller than they would otherwise be through two operations:
- The [butcher](https://butcher.tidymodels.org/) package is used to remove object elements that are not crucial to using the models. For example, some models contain copies of the training set or model residuals when created. These are removed to save space.
- For ensembles whose base models use a formula method, there is a built-in redundancy because each model has an identical terms object. However, each one of these takes up separate space in memory and can be quite large when there are many predictors. The baguette package solves this problem by replacing each terms object with the object from the first model in the ensemble. Since the other terms objects are not modified, we get the same functional capabilities using far less memory to save the ensemble.
## Installation
You can install the released version of baguette from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("baguette")
```Install the development version from GitHub with:
``` r
# install.packages("pak")
pak::pak("tidymodels/baguette")
```## Available Engines
The baguette package provides engines for the models in the following table.
```{r, echo=FALSE, message=FALSE}
library(parsnip)parsnip_models <- get_from_env("models") %>%
setNames(., .) %>%
purrr::map_dfr(get_from_env, .id = "model")library(baguette)
baguette_models <- get_from_env("models") %>%
setNames(., .) %>%
purrr::map_dfr(get_from_env, .id = "model")dplyr::anti_join(
baguette_models, parsnip_models,
by = c("model", "engine", "mode")
) %>%
knitr::kable()
```## Example
Let's build a bagged decision tree model to predict a continuous outcome.
```{r}
library(baguette)bag_tree() %>%
set_engine("rpart") # C5.0 is also available hereset.seed(123)
bag_cars <-
bag_tree() %>%
set_engine("rpart", times = 25) %>% # 25 ensemble members
set_mode("regression") %>%
fit(mpg ~ ., data = mtcars)bag_cars
```The models also return aggregated variable importance scores.
## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/baguette/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).