{"id":20284063,"url":"https://github.com/ccao-data/lightsnip","last_synced_at":"2025-10-15T19:11:25.979Z","repository":{"id":178774559,"uuid":"662249385","full_name":"ccao-data/lightsnip","owner":"ccao-data","description":"Hard fork of curso-r/treesnip specifically for CCAO LightGBM regressions","archived":false,"fork":false,"pushed_at":"2025-10-10T22:19:04.000Z","size":22935,"stargazers_count":2,"open_issues_count":2,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-11T00:03:13.322Z","etag":null,"topics":["lightgbm","machine-learning","r","r-package"],"latest_commit_sha":null,"homepage":"https://ccao-data.github.io/lightsnip/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ccao-data.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-04T17:30:38.000Z","updated_at":"2025-10-10T22:19:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"9505037f-9acd-4590-af60-bab9a43e2ba4","html_url":"https://github.com/ccao-data/lightsnip","commit_stats":null,"previous_names":["ccao-data/lightsnip"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ccao-data/lightsnip","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Flightsnip","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Flightsnip/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Flightsnip/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/ccao-data%2Flightsnip/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ccao-data","download_url":"https://codeload.github.com/ccao-data/lightsnip/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ccao-data%2Flightsnip/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279104582,"owners_count":26104541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-15T02:00:07.814Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lightgbm","machine-learning","r","r-package"],"created_at":"2024-11-14T14:18:12.473Z","updated_at":"2025-10-15T19:11:25.952Z","avatar_url":"https://github.com/ccao-data.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# Lightsnip \u003ca href='https://github.com/ccao-data/lightsnip'\u003e\u003cimg src='man/figures/logo.png' align=\"right\" height=\"139\" /\u003e\u003c/a\u003e\n\n[![R-CMD-check](https://github.com/ccao-data/lightsnip/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/R-CMD-check.yaml)\n[![test-coverage](https://github.com/ccao-data/lightsnip/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/test-coverage.yaml)\n[![pre-commit](https://github.com/ccao-data/lightsnip/actions/workflows/pre-commit.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/pre-commit.yaml)\n[![codecov](https://codecov.io/gh/ccao-data/lightsnip/branch/master/graph/badge.svg)](https://codecov.io/gh/ccao-data/lightsnip)\n\nLightsnip is a hard fork of [curso-r/treesnip](https://github.com/curso-r/treesnip). It adds LightGBM bindings for parsnip and enables more advanced LightGBM features, such as early stopping. It is not intended for general use, only as a dependency for CCAO regression models.\n\nFor detailed documentation on included functions, [**visit the full reference list**](https://ccao-data.github.io/lightsnip/reference/index.html).\n\n## Installation\n\nYou can install the released version of `lightsnip` directly from GitHub with one of the following commands:\n\n```{r, eval=FALSE}\n# Using remotes\nremotes::install_github(\"ccao-data/lightsnip\")\n\n# Using renv\nrenv::install(\"ccao-data/lightsnip\")\n\n# Using pak\npak::pak(\"ccao-data/lightsnip\")\n\n# Append the @ symbol for a specific version\nremotes::install_github(\"ccao-data/lightsnip@0.0.5\")\n```\n\nOnce it is installed, you can use it just like any other package. 
Simply call `library(lightsnip)` at the beginning of your script.\n\n## Differences compared to [treesnip](https://github.com/curso-r/treesnip)\n\n- Removed support for `tree` and `catboost` (LightGBM only)\n- Removed classification support for LightGBM (regression only)\n- Removed treesnip caps and warnings on `max_depth` and other parameters\n- Removed vignettes and samples\n- Remapped parameters to engine args instead of parsnip model args\n- Added LightGBM-specific hyperparameter functions\n- Added LightGBM-specific save/load helpers\n- Added recipe/fit cleaning helpers\n- Forces the user to specify categorical columns by name; does _not_ implicitly convert factors to categoricals\n- Added early stopping from xgboost\n- Added more unit tests\n- Fixed a number of bugs\n\n## Basic usage with Tidymodels\n\nHere is a quick example using `lightsnip` with a Tidymodels cross-validation workflow:\n\n```{r message=FALSE, results='asis'}\nlibrary(dplyr)\nlibrary(lightgbm)\nlibrary(lightsnip)\nlibrary(parsnip)\nlibrary(recipes)\nlibrary(workflows)\n\n# Create a dataset for training\nmtcars_train \u003c- mtcars %\u003e%\n  dplyr::slice(1:28) %\u003e%\n  sample_n(size = 500, replace = TRUE) %\u003e%\n  mutate(cyl = as.factor(cyl), vs = as.factor(vs))\n\n# Create a test set\nmtcars_test \u003c- mtcars %\u003e%\n  dplyr::slice(29:32) %\u003e%\n  mutate(cyl = as.factor(cyl), vs = as.factor(vs))\n\n# Recipe to convert factors to categorical integers\nrec \u003c- recipe(mpg ~ ., mtcars_train) %\u003e%\n  step_integer(all_nominal(), zero_based = TRUE)\n\n# Split data into V-folds\nresamples \u003c- rsample::vfold_cv(mtcars_train, v = 2)\n\n# Create a model specification. 
LightGBM-specific parameters are passed to\n# set_engine, NOT to boost_tree\nmodel \u003c- parsnip::boost_tree(\n  trees = tune::tune()\n) %\u003e%\n  parsnip::set_engine(\n    engine = \"lightgbm\",\n    verbose = -1,\n    learning_rate = tune::tune(),\n    min_gain_to_split = tune::tune(),\n    feature_fraction = tune::tune(),\n    min_data_in_leaf = tune::tune(),\n    max_depth = tune::tune()\n  )\n\n# Run grid search\nsearch \u003c- tune::tune_grid(\n  parsnip::set_mode(model, \"regression\"),\n  preprocessor = rec,\n  resamples = resamples,\n  param_info = model %\u003e%\n    hardhat::extract_parameter_set_dials() %\u003e%\n    stats::update(\n      learning_rate = learning_rate(),\n      min_gain_to_split = min_gain_to_split(),\n      feature_fraction = feature_fraction(),\n      min_data_in_leaf = min_data_in_leaf(c(1L, 2L)),\n      max_depth = max_depth(c(3L, 6L))\n    ),\n  grid = 2,\n  metrics = yardstick::metric_set(yardstick::rmse)\n)\n\n# Finalize model\nfinal \u003c- model %\u003e%\n  tune::finalize_model(tune::select_best(search)) %\u003e%\n  parsnip::set_mode(\"regression\") %\u003e%\n  parsnip::fit(mpg ~ ., bake(prep(rec), mtcars_train))\n\n# Predict on test set\nmtcars_test %\u003e%\n  mutate(pred_mpg = predict(final, bake(prep(rec), .))$.pred) %\u003e%\n  select(actual_mpg = mpg, pred_mpg) %\u003e%\n  knitr::kable(digits = 2)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fccao-data%2Flightsnip","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fccao-data%2Flightsnip","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fccao-data%2Flightsnip/lists"}