{"id":13857603,"url":"https://github.com/tidymodels/butcher","last_synced_at":"2025-04-08T12:05:28.991Z","repository":{"id":34937748,"uuid":"190640087","full_name":"tidymodels/butcher","owner":"tidymodels","description":"Reduce the size of model objects saved to disk","archived":false,"fork":false,"pushed_at":"2025-03-19T00:49:12.000Z","size":24778,"stargazers_count":135,"open_issues_count":7,"forks_count":13,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-01T10:17:51.732Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://butcher.tidymodels.org/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tidymodels.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-06T19:45:18.000Z","updated_at":"2025-03-31T06:26:00.000Z","dependencies_parsed_at":"2024-06-21T05:48:14.648Z","dependency_job_id":"c824aad3-25f4-46f3-aafe-5c8c89343e9b","html_url":"https://github.com/tidymodels/butcher","commit_stats":{"total_commits":680,"total_committers":13,"mean_commits":52.30769230769231,"dds":"0.23088235294117643","last_synced_commit":"8315ed8818666307c4497fd3d988c82ef82d8e51"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fbutcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fbutcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fbutcher/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Fbutcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tidymodels","download_url":"https://codeload.github.com/tidymodels/butcher/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247838441,"owners_count":21004580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T03:01:41.727Z","updated_at":"2025-04-08T12:05:28.956Z","avatar_url":"https://github.com/tidymodels.png","language":"R","readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n\n# `devtools::build_readme()` evaluates in `callr:r_safe()` which causes issues\n# with butcher's memory profiling. 
Neither RStudio's Knit button nor\n# `rmarkdown::render()` have this issue (#280).\nif (identical(Sys.getenv(\"CALLR_IS_RUNNING\"), \"true\")) {\n  rlang::abort(c(\n    \"Build this README with `rmarkdown::render()` rather than `devtools::build_readme()`.\",\n    \"See tidymodels/butcher#280 for more info.\"\n  ))\n}\n```\n\n# butcher \u003ca href=\"https://butcher.tidymodels.org\"\u003e\u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"138\" alt=\"butcher website\" /\u003e\u003c/a\u003e\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/tidymodels/butcher/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/butcher/actions/workflows/R-CMD-check.yaml)\n[![CRAN status](https://www.r-pkg.org/badges/version/butcher)](https://CRAN.R-project.org/package=butcher)\n[![Codecov test coverage](https://codecov.io/gh/tidymodels/butcher/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/butcher?branch=main)\n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n\u003c!-- badges: end --\u003e\n\n## Overview\n\nModeling or machine learning in R can result in fitted model objects that take up too much memory. There are two main culprits:\n\n1. Heavy usage of formulas and closures that capture the enclosing environment in model training\n2. Lack of selectivity in the construction of the model object itself\n\nAs a result, fitted model objects contain components that are often redundant and not required for post-fit estimation activities. 
The butcher package provides tooling to \"axe\" parts of the fitted output that are no longer needed, without sacrificing prediction functionality from the original model object.\n\n## Installation\n\nInstall the released version from CRAN:\n\n```{r, eval = FALSE}\ninstall.packages(\"butcher\")\n```\n\nOr install the development version from [GitHub](https://github.com/):\n\n```{r, eval = FALSE}\n# install.packages(\"pak\")\npak::pak(\"tidymodels/butcher\")\n```\n\n## Butchering\n\nAs an example, let's wrap an `lm` model so it contains a lot of unnecessary stuff:\n\n```{r example}\nlibrary(butcher)\nour_model \u003c- function() {\n  some_junk_in_the_environment \u003c- runif(1e6) # we didn't know about\n  lm(mpg ~ ., data = mtcars)\n}\n```\n\nThis object is unnecessarily large:\n\n```{r}\nlibrary(lobstr)\nobj_size(our_model())\n```\n\nWhen, in fact, it should only be:\n\n```{r}\nsmall_lm \u003c- lm(mpg ~ ., data = mtcars)\nobj_size(small_lm)\n```\n\nTo understand which part of our original model object is taking up the most memory, we leverage the `weigh()` function:\n\n```{r}\nbig_lm \u003c- our_model()\nweigh(big_lm)\n```\n\nThe problem here is in the `terms` component of our `big_lm`. Because of how `lm()` is implemented in the `stats` package, the environment in which our model was made is carried along in the fitted output. 
To remove the (mostly) extraneous component, we can use `butcher()`:\n\n```{r}\ncleaned_lm \u003c- butcher(big_lm, verbose = TRUE)\n```\n\nComparing it against our `small_lm`, we find:\n\n```{r}\nweigh(cleaned_lm)\n```\n\nAnd now it will take up about the same memory on disk as `small_lm`:\n\n```{r}\nweigh(small_lm)\n```\n\nTo make the most of your available memory, this package provides five S3 generics that you can use to remove parts of a model object:\n\n- `axe_call()`: To remove the call object.\n- `axe_ctrl()`: To remove controls associated with training.\n- `axe_data()`: To remove the original training data.\n- `axe_env()`: To remove environments.\n- `axe_fitted()`: To remove fitted values.\n\nWhen you run `butcher()`, you execute all of these axing functions at once. Any kind of axing on the object will append a butchered class to the current model object class(es) as well as a new attribute named `butcher_disabled` that lists any post-fit estimation functions that are disabled as a result.\n\n## Model Object Coverage\n\nCheck out `vignette(\"available-axe-methods\")` to see butcher's current coverage. If you are working with a new model object that could benefit from any kind of axing, we would love for you to make a pull request! See `vignette(\"adding-models-to-butcher\")` for more guidelines, but in short, to contribute a set of axe methods:\n\n1. Run `new_model_butcher(model_class = \"your_object\", package_name = \"your_package\")`\n2. Use butcher helper functions `weigh()` and `locate()` to decide what to axe\n3. Finalize edits to `R/your_object.R` and `tests/testthat/test-your_object.R`\n4. Make a pull request!\n\n## Contributing\n\nThis project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). 
By contributing to this project, you agree to abide by its terms.\n\n- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15\u0026tags=tidymodels,question).\n\n- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/butcher/issues).\n\n- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.\n\n- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).\n","funding_links":[],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Fbutcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftidymodels%2Fbutcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Fbutcher/lists"}