---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# Lightsnip

[![R-CMD-check](https://github.com/ccao-data/lightsnip/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/R-CMD-check.yaml)
[![test-coverage](https://github.com/ccao-data/lightsnip/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/test-coverage.yaml)
[![lint](https://github.com/ccao-data/lightsnip/actions/workflows/lint.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/lint.yaml)
[![pre-commit](https://github.com/ccao-data/lightsnip/actions/workflows/pre-commit.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/pre-commit.yaml)
[![codecov](https://codecov.io/gh/ccao-data/lightsnip/branch/master/graph/badge.svg)](https://codecov.io/gh/ccao-data/lightsnip)

Lightsnip is a hard fork of [curso-r/treesnip](https://github.com/curso-r/treesnip). It adds LightGBM bindings for parsnip and enables more advanced LightGBM features, such as early stopping. It is not intended for general use, only as a dependency for CCAO regression models.

For detailed documentation on included functions, [**visit the full reference list**](https://ccao-data.github.io/lightsnip/reference/index.html).
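
One headline feature is early stopping during training, which treesnip did not expose. Here is a minimal, hedged sketch of a model specification that enables it; the `stop_iter` and `validation` argument names follow parsnip's xgboost conventions and are assumptions here, so check the full reference list above for the exact arguments lightsnip exports:

```{r, eval=FALSE}
library(parsnip)
library(lightsnip)

# Hypothetical spec: stop training after 50 rounds without improvement
# on a held-out 10% validation slice. Argument names are assumptions;
# consult the lightsnip reference for the exact engine arguments
spec <- boost_tree(trees = 1000) %>%
  set_engine(
    engine = "lightgbm",
    stop_iter = 50,   # early stopping patience, xgboost-style
    validation = 0.1  # fraction of training data held out
  ) %>%
  set_mode("regression")
```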

## Installation

You can install the released version of `lightsnip` directly from GitHub with one of the following commands:

```{r, eval=FALSE}
# Using remotes
remotes::install_github("ccao-data/lightsnip")

# Using renv
renv::install("ccao-data/lightsnip")

# Using pak
pak::pak("ccao-data/lightsnip")

# Append @ and a version tag to install a specific version
remotes::install_github("ccao-data/lightsnip@<version>")
```

Once it is installed, you can use it just like any other package. Simply call `library(lightsnip)` at the beginning of your script.

## Differences compared to [treesnip](https://github.com/curso-r/treesnip)

- Removed support for `tree` and `catboost` (LightGBM only)
- Removed classification support for LightGBM (regression only)
- Removed treesnip's caps and warnings on `max_depth` and other parameters
- Removed vignettes and samples
- Remapped parameters to engine arguments instead of parsnip model arguments (see the sketch after this list)
- Added LightGBM-specific hyperparameter functions
- Added LightGBM-specific save/load helpers
- Added recipe/fit cleaning helpers
- Requires the user to specify categorical columns by name; factors are _not_ implicitly converted to categorical features
- Added xgboost-style early stopping
- Added more unit tests
- Fixed a number of bugs
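
To make the parameter remapping concrete, here is a hedged sketch of passing LightGBM-native parameters, and categorical columns by name, directly to `set_engine()`. The `categorical_feature` argument name is LightGBM's own and is an assumption about lightsnip's interface; the other arguments mirror the tuning example below:

```{r, eval=FALSE}
library(parsnip)
library(lightsnip)

# LightGBM-native parameters go to set_engine(), not boost_tree().
# categorical_feature (assumed to pass through to LightGBM unchanged)
# names the categorical columns explicitly
spec <- boost_tree(trees = 500) %>%
  set_engine(
    engine = "lightgbm",
    max_depth = 6,
    min_data_in_leaf = 2,
    categorical_feature = c("cyl", "vs")
  ) %>%
  set_mode("regression")
```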

## Basic usage with tidymodels

Here is a quick example using `lightsnip` with a tidymodels cross-validation workflow:

```{r message=FALSE, results='asis'}
library(dplyr)
library(lightgbm)
library(lightsnip)
library(parsnip)
library(recipes)
library(workflows)

# Create a dataset for training
mtcars_train <- mtcars %>%
  dplyr::slice(1:28) %>%
  sample_n(size = 500, replace = TRUE) %>%
  mutate(cyl = as.factor(cyl), vs = as.factor(vs))

# Create a test set
mtcars_test <- mtcars %>%
  dplyr::slice(29:32) %>%
  mutate(cyl = as.factor(cyl), vs = as.factor(vs))

# Recipe to convert factors to categorical integers
rec <- recipe(mpg ~ ., mtcars_train) %>%
  step_integer(all_nominal(), zero_based = TRUE)

# Split data into V-folds
resamples <- rsample::vfold_cv(mtcars_train, v = 2)

# Create a model specification. LightGBM-specific parameters are passed to
# set_engine, NOT to boost_tree
model <- parsnip::boost_tree(
  trees = tune::tune()
) %>%
  parsnip::set_engine(
    engine = "lightgbm",
    verbose = -1,
    learning_rate = tune::tune(),
    min_gain_to_split = tune::tune(),
    feature_fraction = tune::tune(),
    min_data_in_leaf = tune::tune(),
    max_depth = tune::tune()
  )

# Run grid search
search <- tune::tune_grid(
  parsnip::set_mode(model, "regression"),
  preprocessor = rec,
  resamples = resamples,
  param_info = model %>%
    hardhat::extract_parameter_set_dials() %>%
    stats::update(
      learning_rate = learning_rate(),
      min_gain_to_split = min_gain_to_split(),
      feature_fraction = feature_fraction(),
      min_data_in_leaf = min_data_in_leaf(c(1L, 2L)),
      max_depth = max_depth(c(3L, 6L))
    ),
  grid = 2,
  metrics = yardstick::metric_set(yardstick::rmse)
)

# Finalize model
final <- model %>%
  tune::finalize_model(tune::select_best(search)) %>%
  parsnip::set_mode("regression") %>%
  parsnip::fit(mpg ~ ., bake(prep(rec), mtcars_train))

# Predict on test set
mtcars_test %>%
  mutate(pred_mpg = predict(final, bake(prep(rec), .))$.pred) %>%
  select(actual_mpg = mpg, pred_mpg) %>%
  knitr::kable(digits = 2)
```
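
Fitted LightGBM boosters do not reliably survive a plain `saveRDS()`/`readRDS()` round trip, which is why lightsnip includes save/load helpers (see the differences list above). A hedged sketch, assuming the helpers are named `lgbm_save()` and `lgbm_load()`; check the full reference list for the exact names and signatures:

```{r, eval=FALSE}
# Hypothetical helper names and paths; consult the lightsnip
# reference for the exact function signatures
lightsnip::lgbm_save(final, "lightsnip_fit.zip")
restored <- lightsnip::lgbm_load("lightsnip_fit.zip")
```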