https://github.com/erdogant/hgboost
hgboost is a Python package for hyperparameter optimization of xgboost, catboost, or lightboost models using cross-validation, with the results evaluated on an independent validation set. hgboost can be applied to classification and regression tasks.
catboost crossvalidation gridsearch hyperoptimization lightboost machine-learning python xgboost
- Host: GitHub
- URL: https://github.com/erdogant/hgboost
- Owner: erdogant
- License: other
- Created: 2020-04-19T14:48:23.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2026-04-06T09:42:37.000Z (2 months ago)
- Last Synced: 2026-04-06T11:08:58.251Z (2 months ago)
- Topics: catboost, crossvalidation, gridsearch, hyperoptimization, lightboost, machine-learning, python, xgboost
- Language: Python
- Homepage: http://erdogant.github.io/hgboost
- Size: 24.3 MB
- Stars: 66
- Watchers: 2
- Forks: 18
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Citation: CITATION.cff
README
# hgboost - Hyperoptimized Gradient Boosting
--------------------------------------------------------------------
``hgboost`` is short for **Hyperoptimized Gradient Boosting**. It is a Python package for hyperparameter optimization of *xgboost*, *catboost* and *lightboost* models using cross-validation, with the results evaluated on an independent validation set.
``hgboost`` can be applied to classification and regression tasks.
``hgboost`` is fun because it:
* 1. Hyperoptimizes the parameter space using a Bayesian approach.
* 2. Determines the best-scoring model(s) using k-fold cross-validation.
* 3. Evaluates the best model on an independent evaluation set.
* 4. Fits the model on the entire input data using the best parameters.
* 5. Works for classification and regression.
* 6. Creates a super-hyperoptimized model from an ensemble of all individually optimized models.
* 7. Returns the model, the search space, and the test/evaluation results.
* 8. Makes insightful plots.
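
The search, cross-validation, validation, and refit steps in this list can be sketched in plain Python. The block below is an illustrative toy and not hgboost's implementation: it swaps in random search for the Bayesian search, a one-parameter ridge-style slope for the boosting model, and mean squared error as the score. The function names (`fit`, `mse`, `kfold_mse`, `tune`) are hypothetical, and the way the splits are taken is an assumption made for the sketch.

```python
import random


def fit(xs, ys, lam):
    """Toy model: ridge-style slope w for y ~ w * x, with penalty lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the slope w on (xs, ys); lower is better."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def kfold_mse(xs, ys, lam, k):
    """Average held-out MSE over k contiguous folds."""
    n = len(xs)
    scores = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        w = fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], lam)
        scores.append(mse(w, xs[lo:hi], ys[lo:hi]))
    return sum(scores) / k


def tune(xs, ys, max_eval=20, top_cv_evals=5, cv=5, test_size=0.2, val_size=0.2, seed=42):
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    rng.shuffle(pairs)
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]

    # Set aside an independent validation set that the search never sees.
    n_val = int(len(xs) * val_size)
    x_val, y_val, x_dev, y_dev = xs[:n_val], ys[:n_val], xs[n_val:], ys[n_val:]
    # Split the remainder into train and test parts for the search.
    n_test = int(len(x_dev) * test_size)
    x_te, y_te, x_tr, y_tr = x_dev[:n_test], y_dev[:n_test], x_dev[n_test:], y_dev[n_test:]

    # Step 1: search the parameter space (random search here, not Bayesian).
    trials = []
    for _ in range(max_eval):
        lam = 10 ** rng.uniform(-3, 3)
        trials.append((mse(fit(x_tr, y_tr, lam), x_te, y_te), lam))
    trials.sort()

    # Step 2: re-score the top candidates with k-fold cross-validation.
    cv_score, best_lam = min(
        (kfold_mse(x_dev, y_dev, lam, cv), lam) for _, lam in trials[:top_cv_evals]
    )

    # Step 3: evaluate the winner on the independent validation set.
    val_score = mse(fit(x_dev, y_dev, best_lam), x_val, y_val)

    # Step 4: refit on all data with the selected parameter.
    return {"lam": best_lam, "cv_mse": cv_score, "val_mse": val_score, "w": fit(xs, ys, best_lam)}


# Synthetic data: y = 3x + noise.
data_rng = random.Random(0)
X = [data_rng.uniform(-1, 1) for _ in range(200)]
y = [3 * x + data_rng.gauss(0, 0.1) for x in X]
result = tune(X, y)
print(result)
```

The validation rows are split off before any search happens, so the reported `val_mse` is not influenced by the parameter selection.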
--------------------------------------------------------------------
**⭐️ Star this repo if you like it ⭐️**
--------------------------------------------------------------------
### Blogs
Medium Blog 1:
[The Best Boosting Model using Bayesian Hyperparameter Tuning but without Overfitting.](https://erdogant.github.io/hgboost/pages/html/Documentation.html#medium-blog)
Medium Blog 2:
[Create Explainable Gradient Boosting Classification models using Bayesian Hyperparameter Optimization.](https://erdogant.github.io/hgboost/pages/html/Documentation.html#medium-blog)
--------------------------------------------------------------------
### [Documentation pages](https://erdogant.github.io/hgboost/)
The [documentation pages](https://erdogant.github.io/hgboost/) describe in detail how ``hgboost`` works and include many examples.
--------------------------------------------------------------------
## Colab Notebooks
--------------------------------------------------------------------
### Schematic overview of hgboost
### Installation Environment
```bash
conda create -n env_hgboost python=3.8
conda activate env_hgboost
```
### Install from PyPI
```bash
pip install hgboost
pip install -U hgboost # Force update
```
#### Import hgboost package
```python
from hgboost import hgboost
```
#### Examples
* [Example: Fit catboost by hyperoptimization and cross-validation](https://erdogant.github.io/hgboost/pages/html/Examples.html#catboost)
* [Example: Fit lightboost by hyperoptimization and cross-validation](https://erdogant.github.io/hgboost/pages/html/Examples.html#lightboost)
* [Example: Fit xgboost by hyperoptimization and cross-validation](https://erdogant.github.io/hgboost/pages/html/Examples.html#xgboost-two-class)
* [Example: Plot the searched parameter space](https://erdogant.github.io/hgboost/pages/html/Examples.html#plot-params)
* [Example: Plot the summary](https://erdogant.github.io/hgboost/pages/html/Examples.html#plot-summary)
* [Example: Plot a tree](https://erdogant.github.io/hgboost/pages/html/Examples.html#treeplot)
* [Example: Plot the validation results](https://erdogant.github.io/hgboost/pages/html/Examples.html#plot-validation)
* [Example: Plot the cross-validation results](https://erdogant.github.io/hgboost/pages/html/Examples.html#plot-cv)
* [Example: Use the learned model to make new predictions](https://erdogant.github.io/hgboost/pages/html/hgboost.hgboost.html?highlight=predict#hgboost.hgboost.hgboost.predict)
* [Example: Create an ensemble model for classification](https://erdogant.github.io/hgboost/pages/html/Examples.html#ensemble-classification)
* [Example: Create an ensemble model for regression](https://erdogant.github.io/hgboost/pages/html/Examples.html#ensemble-regression)
#### Classification example for xgboost, catboost and lightboost:
```python
# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)

# Load the Titanic example data bundled with hgboost and build X and y
df = hgb.import_example()
y = df['Survived'].values.astype(str)
y[y == '1'] = 'survived'
y[y == '0'] = 'dead'
del df['Survived']
X = hgb.preprocessing(df)

# Fit xgboost by hyperoptimization and cross-validation
# (hgb.catboost and hgb.lightboost are called the same way)
results = hgb.xgboost(X, y, pos_label='survived')

# Example log output (trial counts and scores depend on the settings and the data):
# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100% |----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evalute best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameters settings.
```
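
The log reports `best loss: -0.88...` alongside `auc=0.881198` and `greater_is_better: True`. A plausible reading, stated here as an assumption rather than a confirmed detail of hgboost, is that the optimizer minimizes its objective (as hyperopt's `fmin` does), so a score where higher is better is negated before it is handed over. The sketch below computes AUC from scratch and applies that sign flip; `auc` and `to_loss` are hypothetical helper names.

```python
def auc(y_true, y_score):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive is scored higher; ties count as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def to_loss(score, greater_is_better=True):
    """Turn a score into a quantity to minimize (assumed convention)."""
    return -score if greater_is_better else score


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
score = auc(y_true, y_score)
loss = to_loss(score)
print(score, loss)  # 0.75 -0.75
```

Under this convention, a more negative loss corresponds to a better AUC.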
```python
# Plot the classification results on the independent validation set
hgb.plot_validation()
```
**References**
* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost
**Maintainers**
* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
**Contribute**
* Contributions are welcome.
**Licence**
See [LICENSE](LICENSE) for details.
**Coffee**
* If you wish to buy me a coffee for this work, it is much appreciated :)