[![Documentation Status](https://readthedocs.org/projects/sklearn-ensemble-cv/badge/?version=latest)](https://sklearn-ensemble-cv.readthedocs.io/en/latest/?badge=latest)
[![PyPI](https://img.shields.io/pypi/v/sklearn_ensemble_cv?label=pypi)](https://pypi.org/project/sklearn-ensemble-cv)
[![PyPI-Downloads](https://img.shields.io/pepy/dt/sklearn_ensemble_cv)](https://pepy.tech/project/sklearn_ensemble_cv)

# Ensemble Cross Validation


`sklearn_ensemble_cv` is a Python module for accurate and efficient cross-validation of ensemble methods, implementing the methods developed in several research [projects](https://jaydu1.github.io/overparameterized-ensembling/).


## Features
- The module builds on `scikit-learn` (`sklearn`), so a wide range of base predictors can be used.
- The module includes functions for creating ensembles of models, training and tuning the ensembles with cross-validation, and making predictions with the ensembles.
- The module also includes utilities for evaluating the performance of the ensembles and of the individual models that make up the ensembles.


```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn_ensemble_cv import ECV

# Toy training data; replace with your own X_train, y_train
X_train, y_train = make_regression(n_samples=500, n_features=20, noise=1.0, random_state=0)

# Hyperparameters for the base regressor
grid_regr = {
    'max_depth': np.array([6, 7], dtype=int),
}
# Hyperparameters for the ensemble
grid_ensemble = {
    'max_features': np.array([0.9, 1.]),
    'max_samples': np.array([0.6, 0.7]),
    'n_jobs': -1,  # use all processors for fitting each ensemble
}

# Build 50 trees and get risk estimates for ensembles of up to 100 trees
res_ecv, info_ecv = ECV(
    X_train, y_train, DecisionTreeRegressor, grid_regr, grid_ensemble,
    M=50, M_max=100, return_df=True
)
```

It currently supports bagging- and subagging-type ensembles under square loss.
The hyperparameters of the base predictor are listed at [`sklearn.tree.DecisionTreeRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html) and the hyperparameters of the ensemble are listed at [`sklearn.ensemble.BaggingRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html).
Other sklearn regressors (estimators for which `sklearn.base.is_regressor` returns `True`) can also be used as base predictors.

## Cross-validation methods

This project is under active development; more CV methods will be added.

- [x] split CV
- [x] K-fold CV
- [x] ECV
- [x] GCV
- [x] CGCV
- [x] CGCV for non-square losses
- [ ] ALOCV

## Usage

The module can be installed from PyPI:
```cmd
pip install sklearn-ensemble-cv
```

The [documentation](https://sklearn-ensemble-cv.readthedocs.io/en/latest/?badge=latest) is available on Read the Docs and includes the following Jupyter Notebook tutorials:

Name | Description
---|---
[basics](https://sklearn-ensemble-cv.readthedocs.io/en/latest/tutorials/basics.html) | Basics of applying ECV/CGCV to risk estimation and hyperparameter tuning for ensemble learning.
[gcv](https://sklearn-ensemble-cv.readthedocs.io/en/latest/tutorials/gcv.html) | GCV for tuning the regularization parameters of non-ensemble ridge, lasso, and elastic net.
[cgcv_l1_huber](https://sklearn-ensemble-cv.readthedocs.io/en/latest/tutorials/cgcv_l1_huber.html) | Custom CGCV for M-estimators: l1-regularized Huber ensembles.
[multitask](https://sklearn-ensemble-cv.readthedocs.io/en/latest/tutorials/multitask.html) | Applying ECV to risk estimation and hyperparameter tuning for multi-task ensemble learning.
[random_forests](https://sklearn-ensemble-cv.readthedocs.io/en/latest/tutorials/random_forests.html) | Applying ECV to model selection of random forests via a simple utility function.

The code is tested with `scikit-learn == 1.3.1`.


## Citation

If you find this package useful for your research, please consider citing the corresponding research papers:

Method|Reference
---|---
ECV|Du, J. H., Patil, P., Roeder, K., & Kuchibhotla, A. K. (2024). Extrapolated cross-validation for randomized ensembles. Journal of Computational and Graphical Statistics, 1-12.
GCV|Du, J. H., Patil, P., & Kuchibhotla, A. K. (2023). Subsample ridge ensembles: equivalences and generalized cross-validation. In Proceedings of the 40th International Conference on Machine Learning (pp. 8585-8631).<br>Patil, P., & Du, J. H. (2024). Generalized equivalences between subsampling and ridge regularization. Advances in Neural Information Processing Systems, 36.
CGCV|Bellec, P. C., Du, J. H., Koriyama, T., Patil, P., & Tan, K. (2024). Corrected generalized cross-validation for finite ensembles of penalized estimators. Journal of the Royal Statistical Society Series B: Statistical Methodology, qkae092.
CGCV (non-square loss)|Koriyama, T., Patil, P., Du, J. H., Tan, K., & Bellec, P. C. (2024). Precise asymptotics of bagging regularized M-estimators. arXiv preprint arXiv:2409.15252.
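
The `ECV` example above fits `M=50` trees but reports risk estimates for ensembles of up to `M_max=100` trees. Under square loss this is possible because the risk of an `M`-bag average is determined by the risks of one-bag and two-bag ensembles, $R_M = -(1 - 2/M)\,R_1 + 2(1 - 1/M)\,R_2$. The sketch below checks this identity numerically on arbitrary toy predictions. It is only an illustration under stated assumptions: the helper `extrapolate_risk` is a hypothetical name, not part of the package, and the package estimates $R_1$ and $R_2$ from held-out (out-of-bag) data, which this sketch does not reproduce.

```python
import itertools

import numpy as np


def extrapolate_risk(r1, r2, M):
    """Hypothetical helper: squared-error risk of an M-bag average, computed
    from the risk of single bags (r1) and of two-bag averages (r2)."""
    return -(1.0 - 2.0 / M) * r1 + 2.0 * (1.0 - 1.0 / M) * r2


# Toy check on fixed test responses y and M arbitrary "bag" predictions.
rng = np.random.default_rng(0)
n_test, M = 200, 10
y = rng.normal(size=n_test)
preds = y + rng.normal(scale=1.0, size=(M, n_test)) + 0.3  # noisy, biased bags


def mse(pred):
    return np.mean((y - pred) ** 2)


# r1: average risk of single bags; r2: average risk over all pairs of bags
r1 = np.mean([mse(p) for p in preds])
r2 = np.mean([mse((preds[i] + preds[j]) / 2)
              for i, j in itertools.combinations(range(M), 2)])

direct = mse(preds.mean(axis=0))            # risk of the full M-bag average
extrapolated = extrapolate_risk(r1, r2, M)  # risk predicted from r1 and r2 only
print(direct, extrapolated)  # equal up to floating-point error
```

Because the identity holds for any `M`, the same two quantities give risk estimates for ensemble sizes larger than the one actually fitted.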
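
The README also allows other scikit-learn regressors as base predictors. Assuming that requirement corresponds to scikit-learn's own `sklearn.base.is_regressor` helper, candidate estimators can be screened beforehand as sketched below. This checks instances, whereas the `ECV` call above passes the estimator class itself.

```python
from sklearn.base import is_regressor
from sklearn.linear_model import ElasticNet, Lasso, LogisticRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Candidate base predictors; the classifier should be filtered out.
candidates = [
    DecisionTreeRegressor(),
    Ridge(alpha=1.0),
    Lasso(alpha=0.1),
    ElasticNet(alpha=0.1),
    LogisticRegression(),
]
usable = [type(est).__name__ for est in candidates if is_regressor(est)]
print(usable)  # ['DecisionTreeRegressor', 'Ridge', 'Lasso', 'ElasticNet']
```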
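
GCV is listed among the supported methods, and the `gcv` tutorial uses it to tune regularization for non-ensemble ridge, lasso, and elastic net. For reference, the classical GCV criterion for a single ridge fit is $\mathrm{GCV}(\lambda) = \frac{n^{-1}\lVert y - S_\lambda y\rVert_2^2}{(1 - \operatorname{tr}(S_\lambda)/n)^2}$ with hat matrix $S_\lambda = X(X^\top X + n\lambda I)^{-1}X^\top$. The NumPy sketch below implements this textbook formula without an intercept; `ridge_gcv` is a hypothetical name, and this is not the package's implementation, which also covers ensembles and the CGCV corrections described in the cited papers.

```python
import numpy as np


def ridge_gcv(X, y, lam):
    """Hypothetical helper: GCV score of ridge regression (no intercept)
    with penalty lam, using S = X (X^T X + n*lam*I)^{-1} X^T."""
    n, p = X.shape
    S = X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    resid = y - S @ y
    return np.mean(resid ** 2) / (1.0 - np.trace(S) / n) ** 2


# Synthetic linear model used only to exercise the formula
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = rng.normal(size=p) / np.sqrt(p)
y = X @ beta + rng.normal(size=n)

# Pick the penalty with the smallest GCV score on a log-spaced grid
lams = np.logspace(-3, 1, 20)
scores = np.array([ridge_gcv(X, y, lam) for lam in lams])
best_lam = lams[np.argmin(scores)]
print(best_lam)
```

With this scaling, `lam` corresponds to `Ridge(alpha=n * lam, fit_intercept=False)` in scikit-learn, since `Ridge` penalizes the unnormalized residual sum of squares.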