https://github.com/dmolitor/bolasso
Model consistent Lasso estimation through the bootstrap.
- Host: GitHub
- URL: https://github.com/dmolitor/bolasso
- Owner: dmolitor
- License: other
- Created: 2021-09-17T17:51:40.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-07-16T19:02:55.000Z (5 months ago)
- Last Synced: 2024-11-01T22:50:31.605Z (about 2 months ago)
- Topics: bolasso, bootstrap, lasso, rstats, variable-selection
- Language: R
- Homepage: https://dmolitor.github.io/bolasso/
- Size: 1.76 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: github_document
---

```{r message=FALSE, warning=FALSE, paged.print=TRUE, echo=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "75%"
)

set.seed(123) # Reproducible results
```

[![R-CMD-check](https://github.com/dmolitor/bolasso/workflows/R-CMD-check/badge.svg)](https://github.com/dmolitor/bolasso/actions)
[![pkgdown](https://github.com/dmolitor/bolasso/workflows/pkgdown/badge.svg)](https://github.com/dmolitor/bolasso/actions)
[![Codecov test coverage](https://codecov.io/gh/dmolitor/bolasso/branch/main/graph/badge.svg)](https://app.codecov.io/gh/dmolitor/bolasso?branch=main)
[![CRAN status](https://www.r-pkg.org/badges/version/bolasso)](https://CRAN.R-project.org/package=bolasso)

The goal of bolasso is to implement model-consistent Lasso estimation via the
bootstrap [[1]](#1).

## Installation

You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("dmolitor/bolasso")
```
## Usage

To illustrate the usage of bolasso, we'll use the
[Pima Indians Diabetes dataset](http://math.furman.edu/~dcs/courses/math47/R/library/mlbench/html/PimaIndiansDiabetes.html)
to determine which factors are important predictors of testing positive
for diabetes. For a full description of the input variables, see the link above.

### Load requisite packages and data

```{r echo=TRUE, message=FALSE, warning=FALSE}
library(bolasso)

data(PimaIndiansDiabetes, package = "mlbench")

# Quick overview of the dataset
str(PimaIndiansDiabetes)
```

First, we run Lasso on 100 bootstrap resamples using the `glmnet` implementation. We
can get a rough estimate of the elapsed time using `system.time()`.

```{r}
system.time({
  model <- bolasso(
    diabetes ~ .,
    data = PimaIndiansDiabetes,
    n.boot = 100,
    implement = "glmnet",
    family = "binomial"
  )
})
```

We can get a quick overview of the model by printing the `bolasso` object.

```{r}
model
```

### Extracting selected variables

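Conceptually, thresholded selection comes down to counting how often each variable receives a nonzero coefficient across the bootstrap fits and keeping those whose share meets the threshold. The following is a minimal base-R sketch of that idea only. It is not the package's implementation; the coefficient matrix is made up, and `select_by_threshold` is a hypothetical helper introduced for illustration.

```r
# Hypothetical coefficients: one row per bootstrap fit, one column per variable.
# The values are invented for illustration and do not come from a fitted model.
coefs <- matrix(
  c(0.8, 0.3, 0.0,
    0.7, 0.0, 0.0,
    0.9, 0.2, 0.1,
    0.6, 0.4, 0.0),
  ncol = 3,
  byrow = TRUE,
  dimnames = list(NULL, c("glucose", "mass", "age"))
)

# Share of bootstrap fits in which each variable has a nonzero coefficient
selection_rate <- colMeans(coefs != 0)

# Hypothetical helper: keep variables selected in at least `threshold` of the fits
select_by_threshold <- function(rate, threshold) {
  names(rate)[rate >= threshold]
}

select_by_threshold(selection_rate, threshold = 0.75)
select_by_threshold(selection_rate, threshold = 1)
```

Raising the threshold toward 1 keeps only variables that every bootstrap fit retained, which is the strictest form of this rule.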
Next, we can extract all variables that were selected in at least 90% and in 100% of the
bootstrapped Lasso models. We can also pass any relevant arguments to `predict`
on the `cv.glmnet` or `cv.gamlr` model objects. In this case we will use the
lambda value that minimizes out-of-sample (OOS) error.

```{r}
selected_vars(model, threshold = 0.9, select = "lambda.min")

selected_vars(model, threshold = 1, select = "lambda.min")
```

### Plotting selected variables

We can also quickly plot the selected variables at the 90% and 100% threshold
values.

```{r}
plot(model, threshold = 0.9)

plot(model, threshold = 1)
```

### Parallelizing bolasso

We can execute `bolasso` in parallel via the
[future](https://CRAN.R-project.org/package=future) package. To
do so, we reuse the code from above with one minor addition, shown below.

```{r}
future::plan("multisession")
```

```{r include=FALSE}
# Include a warm-start, otherwise parallel will be slow first time around
future.apply::future_lapply(1:100, function(i) i)
```

We can now run the code from above, unaltered, and it will execute in parallel.

```{r}
system.time({
  model <- bolasso(
    diabetes ~ .,
    data = PimaIndiansDiabetes,
    n.boot = 100,
    implement = "glmnet",
    family = "binomial"
  )
})
```

## References

[1] Bach, Francis. “Bolasso: Model Consistent Lasso Estimation
through the Bootstrap.” arXiv:0804.1302 [cs, math, stat], April 8, 2008.
https://arxiv.org/abs/0804.1302.