https://github.com/jaydu1/ensemble-cross-validation
Cross-validation methods designed for ensemble learning
- Host: GitHub
- URL: https://github.com/jaydu1/ensemble-cross-validation
- Owner: jaydu1
- License: mit
- Created: 2023-10-11T23:53:16.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-12T22:59:06.000Z (2 months ago)
- Last Synced: 2024-10-13T00:10:59.459Z (about 1 month ago)
- Topics: cross-validation, ensemble-learning
- Language: Jupyter Notebook
- Homepage: https://jaydu1.github.io/overparameterized-ensembling/
- Size: 646 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
[![PyPI](https://img.shields.io/pypi/v/sklearn_ensemble_cv?label=pypi)](https://pypi.org/project/sklearn-ensemble-cv)
[![PyPI-Downloads](https://img.shields.io/pepy/dt/sklearn_ensemble_cv)](https://pepy.tech/project/sklearn_ensemble_cv)

# Ensemble-cross-validation

`sklearn_ensemble_cv` is a Python module that implements accurate and efficient cross-validation methods for ensemble learning, drawn from several related [projects](https://jaydu1.github.io/overparameterized-ensembling/).

## Features
- The module builds on `scikit-learn` (`sklearn`), so a wide range of base predictors can be used.
- The module includes functions for creating ensembles of models, training the ensembles using cross-validation, and making predictions with the ensembles.
- The module also includes utilities for evaluating the performance of the ensembles and the individual models that make up the ensembles.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn_ensemble_cv import ECV

# X_train, y_train: training features and responses, assumed to be defined already

# Hyperparameters for the base regressor
grid_regr = {
    'max_depth': np.array([6, 7], dtype=int),
}
# Hyperparameters for the ensemble
grid_ensemble = {
    'max_features': np.array([0.9, 1.]),
    'max_samples': np.array([0.6, 0.7]),
    'n_jobs': -1,  # use all processors for fitting each ensemble
}

# Build 50 trees and obtain risk estimates for ensembles of up to 100 trees
res_ecv, info_ecv = ECV(
    X_train, y_train, DecisionTreeRegressor, grid_regr, grid_ensemble,
    M=50, M_max=100, return_df=True
)
```

It currently supports bagging- and subagging-type ensembles under squared loss.
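
Under squared loss, the risk of an average of `M` exchangeable predictors is affine in `1/M`, so estimates for small ensembles determine the risk at any larger size. The sketch below illustrates that extrapolation, on the assumption that this is the idea behind the `M` and `M_max` arguments above. `extrapolate_risk` is a hypothetical helper written for this illustration and is not part of `sklearn_ensemble_cv`; the risk values are made up.

```python
def extrapolate_risk(risk_1, risk_2, M):
    """Extrapolate squared-error risk to an ensemble of size M.

    Assumes R_M = R_inf + (R_1 - R_inf) / M, which holds for an average of
    exchangeable predictors under squared loss.
    """
    if M < 1:
        raise ValueError("M must be a positive integer")
    risk_inf = 2.0 * risk_2 - risk_1  # limiting risk as M grows without bound
    return risk_inf + (risk_1 - risk_inf) / M


# Hypothetical risk estimates for ensembles of size 1 and 2
r1, r2 = 1.0, 0.75
risks = {M: extrapolate_risk(r1, r2, M) for M in (1, 2, 4, 100)}
print(risks)
```

The formula reproduces the two inputs at `M=1` and `M=2` and decreases toward `2 * r2 - r1` as `M` grows.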
The hyperparameters of the base predictor are listed at [`sklearn.tree.DecisionTreeRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html) and the hyperparameters of the ensemble are listed at [`sklearn.ensemble.BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html).
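
To make the two grids concrete, the sketch below fits a single configuration (`max_depth=6`, `max_features=0.9`, `max_samples=0.6`) with plain scikit-learn. It does not show how `ECV` builds its ensembles internally, which this README does not describe; the synthetic data and the choice `bootstrap=False` are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

# One grid point: a base-regressor setting plus an ensemble setting
ensemble = BaggingRegressor(
    DecisionTreeRegressor(max_depth=6),  # base predictor and its hyperparameter
    n_estimators=50,
    max_samples=0.6,    # fraction of observations drawn for each tree
    max_features=0.9,   # fraction of features drawn for each tree
    bootstrap=False,    # sample without replacement (subagging); True gives bagging
    random_state=0,
)
ensemble.fit(X, y)
y_hat = ensemble.predict(X)
print(len(ensemble.estimators_), y_hat.shape)
```

Passing the base predictor positionally avoids depending on the keyword name, which changed between scikit-learn releases.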
Other scikit-learn regressors (estimators for which `sklearn.base.is_regressor` returns `True`) can also be used as base predictors.

# Cross-validation methods
This project is currently in development. More CV methods will be added shortly.
- [x] split CV
- [x] K-fold CV
- [x] ECV
- [x] GCV
- [x] CGCV
- [x] CGCV non-square loss
- [ ] ALOCV

# Usage
Check out the Jupyter notebooks in the [tutorials](https://github.com/jaydu1/ensemble-cross-validation/blob/main/tutorials) folder:

Name | Description
---|---
[basics.ipynb](https://github.com/jaydu1/ensemble-cross-validation/blob/main/tutorials/basics.ipynb) | Basics about how to apply ECV/CGCV on risk estimation and hyperparameter tuning for ensemble learning.
[cgcv_l1_huber.ipynb](https://github.com/jaydu1/ensemble-cross-validation/blob/main/tutorials/cgcv_l1_huber.ipynb) | Custom CGCV for M-estimator: l1-regularized Huber ensembles.
[multitask.ipynb](https://github.com/jaydu1/ensemble-cross-validation/blob/main/tutorials/multitask.ipynb) | Apply ECV on risk estimation and hyperparameter tuning for multi-task ensemble learning.
[random_forests.ipynb](https://github.com/jaydu1/ensemble-cross-validation/blob/main/tutorials/random_forests.ipynb) | Apply ECV on model selection of random forests via a simple utility function.

The code is tested with `scikit-learn == 1.3.1`.
The [documentation](https://jaydu1.github.io/overparameterized-ensembling/sklearn-ensemble-cv/docs/index) is available online.
The module can be installed via PyPI:
```cmd
pip install sklearn-ensemble-cv
```
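
After installing, a quick check like the one below can confirm that the module is importable and show which `scikit-learn` version is present, which is useful given that the code is tested against a specific release. `check_install` is a hypothetical helper that uses only the standard library; it is not shipped with the package.

```python
from importlib import metadata, util


def check_install(module="sklearn_ensemble_cv",
                  dists=("sklearn-ensemble-cv", "scikit-learn")):
    """Report whether the module is importable and which versions are installed."""
    report = {"importable": util.find_spec(module) is not None}
    for dist in dists:
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None  # distribution is not installed
    return report


report = check_install()
print(report)
```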