{"id":13857348,"url":"https://github.com/tidymodels/workflowsets","last_synced_at":"2025-05-15T15:04:10.414Z","repository":{"id":37648231,"uuid":"315496922","full_name":"tidymodels/workflowsets","owner":"tidymodels","description":"Create a collection of modeling workflows","archived":false,"fork":false,"pushed_at":"2025-04-25T20:00:00.000Z","size":124195,"stargazers_count":94,"open_issues_count":13,"forks_count":10,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-14T01:54:57.038Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://workflowsets.tidymodels.org/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tidymodels.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-24T02:30:49.000Z","updated_at":"2025-04-25T19:51:28.000Z","dependencies_parsed_at":"2023-11-29T21:28:17.238Z","dependency_job_id":"9a8575a0-5f61-467b-a905-ec7b3cbf7c5a","html_url":"https://github.com/tidymodels/workflowsets","commit_stats":{"total_commits":223,"total_committers":9,"mean_commits":24.77777777777778,"dds":"0.37668161434977576","last_synced_commit":"b5a26c01b7570c614f34653822dd0c3115f74209"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fworkflowsets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fworkflowsets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fworkflowsets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fworkflowsets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tidymodels","download_url":"https://codeload.github.com/tidymodels/workflowsets/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254364270,"owners_count":22058878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T03:01:33.984Z","updated_at":"2025-05-15T15:04:10.408Z","avatar_url":"https://github.com/tidymodels.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# workflowsets\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/tidymodels/workflowsets/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/workflowsets/actions/workflows/R-CMD-check.yaml)\n[![R-CMD-check-no-suggests](https://github.com/tidymodels/workflowsets/actions/workflows/R-CMD-check-no-suggests.yaml/badge.svg)](https://github.com/tidymodels/workflowsets/actions/workflows/R-CMD-check-no-suggests.yaml)\n[![Codecov test coverage](https://codecov.io/gh/tidymodels/workflowsets/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/workflowsets?branch=main)\n[![Lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)\n\u003c!-- badges: end --\u003e\n\nThe goal of workflowsets is to allow users to create and easily fit a large number of models. workflowsets can create a _workflow set_ that holds multiple workflow objects. These objects can be created by crossing all combinations of preprocessors (e.g., formula, recipe, etc.) and model specifications. The set can then be tuned or resampled with `workflow_map()`, which applies a tuning or resampling function to every workflow in the set.\n\n## Installation\n\nYou can install the released version of workflowsets from [CRAN](https://CRAN.R-project.org) with:\n\n``` r\ninstall.packages(\"workflowsets\")\n```\n\nAnd the development version from [GitHub](https://github.com/) with:\n\n``` r\ninstall.packages(\"pak\")\npak::pak(\"tidymodels/workflowsets\")\n```\n\n## Example\n\nIt is often worthwhile to try several types of models and preprocessing methods on a given data set. 
The tidymodels framework provides tools for this purpose: [recipes](https://recipes.tidymodels.org/) for preprocessing/feature engineering and [parsnip](https://parsnip.tidymodels.org/) for model specifications. The workflowsets package has functions for creating and evaluating combinations of these modeling elements.\n\nFor example, the Chicago train ridership data has many numeric predictors that are highly correlated. There are a few ways to compensate for this during modeling:\n\n 1. Use a feature filter to remove redundant predictors.\n \n 2. Apply principal component analysis (PCA) to decorrelate the data. \n \n 3. Use a regularized model to make the estimation process less sensitive to correlated predictors. \n \nThe first two methods can be used with any model, while the last option is only available for specific models. Let's create a basic recipe that we will build on: \n\n\n```{r sshhh, include = FALSE}\nlibrary(tidymodels)\nlibrary(glmnet)\nlibrary(rpart)\nlibrary(vctrs)\nlibrary(Matrix)\nlibrary(rlang)\ntheme_set(theme_bw())\n```\n```{r recs}\nlibrary(tidymodels)\n# Also loads `stations`, a vector of the station names used as column names\ndata(Chicago)\n# Use a small sample to keep file sizes down:\nChicago \u003c- Chicago |\u003e slice(1:365)\n\nbase_recipe \u003c-\n  recipe(ridership ~ ., data = Chicago) |\u003e\n  # create date features\n  step_date(date) |\u003e\n  step_holiday(date) |\u003e\n  # remove date from the list of predictors\n  update_role(date, new_role = \"id\") |\u003e\n  # create dummy variables from factor columns\n  step_dummy(all_nominal()) |\u003e\n  # remove any columns with a single unique value\n  step_zv(all_predictors()) |\u003e\n  step_normalize(all_predictors())\n```\n\nTo apply a correlation filter to the station columns, we add `step_corr()` with a tunable threshold: \n\n```{r filter}\nfilter_rec \u003c-\n  base_recipe |\u003e\n  step_corr(all_of(stations), threshold = tune())\n```\n\nSimilarly, for PCA: \n\n\n```{r pca}\npca_rec \u003c-\n  base_recipe |\u003e\n  step_pca(all_of(stations), num_comp = tune()) |\u003e\n  
step_normalize(all_predictors())\n```\n\nWe might want to assess a few different models: a regularized linear regression (`glmnet`), a CART decision tree (`rpart`), and K-nearest neighbors (`kknn`):\n\n```{r models}\nregularized_spec \u003c-\n  linear_reg(penalty = tune(), mixture = tune()) |\u003e\n  set_engine(\"glmnet\")\n\ncart_spec \u003c-\n  decision_tree(cost_complexity = tune(), min_n = tune()) |\u003e\n  set_engine(\"rpart\") |\u003e\n  set_mode(\"regression\")\n\nknn_spec \u003c-\n  nearest_neighbor(neighbors = tune(), weight_func = tune()) |\u003e\n  set_engine(\"kknn\") |\u003e\n  set_mode(\"regression\")\n```\n\nRather than creating all 9 combinations of these 3 preprocessors and 3 models one at a time, we can create them together as a _workflow set_. With `cross = TRUE`, every preprocessor is paired with every model: \n\n```{r set}\nchi_models \u003c-\n  workflow_set(\n    preproc = list(\n      simple = base_recipe, filter = filter_rec,\n      pca = pca_rec\n    ),\n    models = list(\n      glmnet = regularized_spec, cart = cart_spec,\n      knn = knn_spec\n    ),\n    cross = TRUE\n  )\nchi_models\n```\n\nThe regularized `glmnet` model already compensates for correlated predictors, so there is little reason to pair it with PCA or a correlation filter. We can remove those two workflows: \n\n```{r rm}\nchi_models \u003c-\n  chi_models |\u003e\n  anti_join(tibble(wflow_id = c(\"pca_glmnet\", \"filter_glmnet\")),\n    by = \"wflow_id\"\n  )\n```\n\n\nThese models all have tuning parameters. To tune them, we need a set of resamples. Since the data are a daily time series, a time-series resampling method is used: \n\n```{r rs}\nsplits \u003c-\n  sliding_period(\n    Chicago,\n    date,\n    \"day\",\n    lookback = 300, # Each resample has 300 days for modeling\n    assess_stop = 7, # One week for performance assessment\n    step = 7 # Ensure non-overlapping weeks for assessment\n  )\nsplits\n```\n\nWe'll use simple grid search for these models by running `workflow_map()`. 
This applies a resampling or tuning function to each workflow in the set: \n\n```{r tune}\nset.seed(123)\nchi_models \u003c-\n  chi_models |\u003e\n  # The first argument is a function name from the {{tune}} package\n  # such as `tune_grid()`, `fit_resamples()`, etc.\n  workflow_map(\"tune_grid\",\n    resamples = splits, grid = 10,\n    metrics = metric_set(mae), verbose = TRUE\n  )\nchi_models\n```\n\nThe `result` column contains the output of the `tune_grid()` call for each workflow. \n\nThe `autoplot()` method shows the ranking of every tuning parameter combination across the workflows: \n\n```{r plot, fig.height = 4, dev = \"svg\"}\nautoplot(chi_models)\n```\n\nor, with `select_best = TRUE`, only the best result from each workflow: \n\n\n```{r plot-best, fig.height = 4, dev = \"svg\"}\nautoplot(chi_models, select_best = TRUE)\n```\n\nWe can determine how well each combination did by looking at the best results per workflow: \n\n```{r best}\nrank_results(chi_models, rank_metric = \"mae\", select_best = TRUE) |\u003e\n  select(rank, mean, model, wflow_id, .config)\n```\n\n```{r save, eval = FALSE, echo = FALSE}\nsave(chi_models, file = \"data/chi_models.rda\", compress = \"bzip2\", version = 2)\n```\n\n\n## Contributing\n\nThis project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). 
By contributing to this project, you agree to abide by its terms.\n\n- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15\u0026tags=tidymodels,question).\n\n- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/workflowsets/issues).\n\n- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.\n\n- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Fworkflowsets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftidymodels%2Fworkflowsets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Fworkflowsets/lists"}