https://github.com/elephaint/pgbm

Probabilistic Gradient Boosting Machines

Host: GitHub
URL: https://github.com/elephaint/pgbm
Owner: elephaint
License: apache-2.0
Created: 2021-05-17T11:57:25.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2024-02-08T16:32:48.000Z (about 2 years ago)
Last Synced: 2025-09-25T15:51:30.763Z (7 months ago)
Language: Python
Size: 2.29 MB
Stars: 156
Watchers: 8
Forks: 22
Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
- License: LICENSE
- Support: docs/support.md

Awesome Lists containing this project

awesome-gradient-boosting-machines - PGBM - Probabilistic Gradient Boosting Machines with native GPU acceleration, auto-differentiation, and uncertainty estimates. Built on PyTorch/Numba. (Implementations / Other Frameworks)

README

# PGBM #

[![PyPi version](https://img.shields.io/pypi/v/pgbm)](https://pypi.org/project/pgbm/)

[![Python version](https://img.shields.io/pypi/pyversions/pgbm)](https://docs.conda.io/en/latest/miniconda.html)

[![GitHub license](https://img.shields.io/pypi/l/pgbm)](https://github.com/elephaint/pgbm/blob/main/LICENSE)

_Probabilistic Gradient Boosting Machines_ (PGBM) is a probabilistic gradient boosting framework in Python based on PyTorch/Numba, developed by Airlab in Amsterdam. It provides the following advantages over existing frameworks:

* Probabilistic regression estimates instead of only point estimates. ([example](https://github.com/elephaint/pgbm/blob/main/examples/torch/example01_housing_cpu.py))

* Auto-differentiation of custom loss functions. ([example](https://github.com/elephaint/pgbm/blob/main/examples/torch/example08_housing_autodiff.py), [example](https://github.com/elephaint/pgbm/blob/main/examples/torch/example10_covidhospitaladmissions.py))

* Native GPU acceleration. ([example](https://github.com/elephaint/pgbm/blob/main/examples/torch/example02_housing_gpu.py))

* Distributed training for CPU and GPU, across multiple nodes. ([examples](https://github.com/elephaint/pgbm/blob/main/examples/torch_dist/))

* Ability to optimize probabilistic estimates after training for a set of common distributions, without retraining the model. ([example](https://github.com/elephaint/pgbm/blob/main/examples/torch/example07_optimizeddistribution.py))

* Full integration with scikit-learn through a fork of HistGradientBoostingRegressor. ([examples](https://github.com/elephaint/pgbm/tree/main/examples/sklearn))
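
To give a feel for what choosing a distribution after training can mean, here is a minimal sketch that is independent of PGBM. It assumes a model has already produced a mean and a standard deviation for each validation point. It then scores three candidate distributions, each moment-matched to that mean and standard deviation, by average negative log-likelihood and keeps the best one. The validation numbers and the helper `pick_distribution` are hypothetical. PGBM's own routine is shown in the linked example and may use a different scoring rule and candidate set.

```python
import math

def normal_logpdf(y, mean, std):
    return -0.5 * math.log(2 * math.pi * std**2) - (y - mean) ** 2 / (2 * std**2)

def laplace_logpdf(y, mean, std):
    scale = std / math.sqrt(2)  # Laplace variance is 2 * scale^2
    return -math.log(2 * scale) - abs(y - mean) / scale

def logistic_logpdf(y, mean, std):
    scale = std * math.sqrt(3) / math.pi  # logistic variance is scale^2 * pi^2 / 3
    z = (y - mean) / scale
    return -z - math.log(scale) - 2 * math.log1p(math.exp(-z))

CANDIDATES = {
    "normal": normal_logpdf,
    "laplace": laplace_logpdf,
    "logistic": logistic_logpdf,
}

def pick_distribution(y_val, yhat_val, yhat_std_val):
    """Return the candidate with the lowest mean negative log-likelihood, plus all scores."""
    scores = {}
    for name, logpdf in CANDIDATES.items():
        total = sum(logpdf(y, m, s) for y, m, s in zip(y_val, yhat_val, yhat_std_val))
        scores[name] = -total / len(y_val)
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical validation targets and predicted means / standard deviations
y_val = [1.2, 0.7, 3.1, 2.0, 1.5]
yhat_val = [1.0, 0.9, 2.8, 2.2, 1.4]
yhat_std_val = [0.3, 0.2, 0.5, 0.4, 0.2]

best, scores = pick_distribution(y_val, yhat_val, yhat_std_val)
print(best, scores)
```

The point of the sketch is that the fitted model is never touched: only the mapping from (mean, standard deviation) to a full distribution changes, which is why no retraining is needed.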

It is aimed at users interested in solving large-scale tabular probabilistic regression problems, such as probabilistic time series forecasting. 

For more details, [read the docs](https://pgbm.readthedocs.io/en/latest/index.html) or [our paper](https://arxiv.org/abs/2106.01682) or check out the [examples](https://github.com/elephaint/pgbm/tree/main/examples).

Below is a simple example that generates 1000 estimates for each test point:

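```py
from pgbm.sklearn import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

# Load the California housing data and hold out 10% for testing
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

# Fit PGBM's fork of scikit-learn's HistGradientBoostingRegressor
model = HistGradientBoostingRegressor().fit(X_train, y_train)

# Predict the mean and standard deviation for each test point
yhat_test, yhat_test_std = model.predict(X_test, return_std=True)

# Draw 1000 samples per test point from the predicted distribution
yhat_dist = model.sample(yhat_test, yhat_test_std, n_estimates=1000)
```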

See also [this example](https://github.com/elephaint/pgbm/blob/main/examples/sklearn/example14_probregression.py) where we compare PGBM to standard gradient boosting quantile regression methods, demonstrating that we can achieve comparable or better probabilistic performance whilst only training a single model.
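
Sampled estimates like the ones above are typically summarised as prediction intervals. The sketch below shows one way to do that using only the Python standard library, so it runs without PGBM installed. The means, standard deviations and targets are made-up numbers, the normal draws are a stand-in for the model's own sampling, and `empirical_interval` is a hypothetical helper. The array layout returned by PGBM is not assumed here: the sketch keeps one list of draws per test point.

```python
import random

def empirical_interval(draws, lower=0.05, upper=0.95):
    """Return approximate (lower, upper) empirical quantiles of a list of draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]

# Hypothetical predicted means, standard deviations and true targets
yhat = [2.1, 0.9, 3.4]
yhat_std = [0.3, 0.2, 0.6]
y_true = [2.3, 1.0, 2.9]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_estimates = 1000

# One list of draws per test point (normal distribution assumed for illustration)
samples = [
    [rng.gauss(mean, std) for _ in range(n_estimates)]
    for mean, std in zip(yhat, yhat_std)
]

# 90% prediction interval for each test point
intervals = [empirical_interval(draws) for draws in samples]

# Fraction of true targets that fall inside their interval
hits = [low <= y <= high for (low, high), y in zip(intervals, y_true)]
coverage = sum(hits) / len(hits)

for (low, high), y in zip(intervals, y_true):
    print(f"interval=({low:.2f}, {high:.2f}) target={y}")
print("coverage:", coverage)
```

On a real test set, a coverage close to the nominal level (90% here) suggests the predicted distributions are reasonably calibrated.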

### Installation ###

See the [Installation](https://pgbm.readthedocs.io/en/latest/installation.html) section of our [docs](https://pgbm.readthedocs.io/en/latest/index.html).

### Support ###

In general, PGBM works similarly to existing gradient boosting packages such as LightGBM or XGBoost, and it should be possible to use it more or less as a drop-in replacement.

* Read the docs for an overview of [hyperparameters](https://pgbm.readthedocs.io/en/latest/parameters.html) and a [function reference](https://pgbm.readthedocs.io/en/latest/function_reference.html).

* See the [examples](https://github.com/elephaint/pgbm/tree/main/examples) folder for examples. 

In case further support is required, [open an issue](https://github.com/elephaint/pgbm/issues).

### Reference ###

[Olivier Sprangers](mailto:o.r.sprangers@uva.nl), Sebastian Schelter, Maarten de Rijke. [Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression](https://arxiv.org/abs/2106.01682). Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining ([KDD 21](https://www.kdd.org/kdd2021/)), August 14–18, 2021, Virtual Event, Singapore.

The experiments from our paper can be replicated by running the scripts in the [experiments](https://github.com/elephaint/pgbm/tree/main/paper/experiments) folder. The scripts download datasets as needed, except for Higgs and M5, which must be downloaded in advance: save Higgs to the [datasets](https://github.com/elephaint/pgbm/tree/main/paper/datasets) folder and M5 to datasets/m5.

### License ###

This project is licensed under the terms of the [Apache 2.0 license](https://github.com/elephaint/pgbm/blob/main/LICENSE).

### Acknowledgements ###

This project was developed by [Airlab Amsterdam](https://icai.ai/airlab/).
