awesome-gradient-boosting-machines
A comprehensive collection of gradient boosting libraries, cutting-edge research, practical tutorials, and industry resources—your one-stop guide to mastering GBMs.
https://github.com/jxucoder/awesome-gradient-boosting-machines
Related Awesome Lists
- Awesome Machine Learning - General ML resources
- Awesome AutoML - AutoML papers and resources
- Awesome Decision Tree Research - Decision tree papers
- Awesome Gradient Boosting Papers - Academic papers on gradient boosting
- Awesome Gradient Boosting - Resources for data scientists
- Awesome Tabular Deep Learning - Comprehensive survey on tabular representation learning (specialized, transferable, and foundation models)
- Awesome Tabular LLMs - Large language models for tabular data understanding and reasoning
Implementations
Other Frameworks
- TensorFlow Decision Forests - Decision forest algorithms including gradient boosted trees in TensorFlow.
- Scikit-learn HistGradientBoosting - Fast histogram-based GBM inspired by LightGBM, native to scikit-learn.
- InterpretML / EBM - Microsoft's Explainable Boosting Machine - a glass-box model as accurate as black-box GBMs but fully interpretable. ⭐ 6k+
- RAPIDS cuML - GPU machine learning library including XGBoost/gradient boosting support.
- FLAML - Microsoft's Fast and Lightweight AutoML library with efficient GBM hyperparameter tuning. ⭐ 4k+
- GBDT-PL - Gradient Boosting with Piece-Wise Linear Regression Trees. Accelerates convergence and is optimized for SIMD parallelism. (Now available in LightGBM via `linear_tree=true`)
- AutoGluon-Tabular - Amazon's AutoML that ensembles multiple GBMs (XGBoost, LightGBM, CatBoost) for state-of-the-art tabular performance. ⭐ 8k+
- ThunderGBM - GPU-accelerated gradient boosting decision tree library.
- NGBoost - Natural gradient boosting for probabilistic prediction by Stanford ML Group.
- AutoXGB - XGBoost + Optuna: auto train, tune, and serve XGBoost models directly from CSV. By Kaggle Grandmaster Abhishek Thakur. ⭐ 725+
- Perpetual - Hyperparameter-free gradient boosting that self-generalizes. Just set a `budget` parameter instead of tuning hyperparameters. Written in Rust with Python bindings.
- PyGBM - Experimental gradient boosting machines in Python (archived).
- GBNet - Integrates XGBoost/LightGBM with PyTorch for auto-differentiation of custom loss functions and hybrid neural network + GBM models. [[JOSS Paper](https://joss.theoj.org/papers/10.21105/joss.08047)]
- XGBoost-Distribution - Probabilistic prediction with XGBoost via MLE. Like NGBoost but ~15x faster, with full XGBoost features (monotonic constraints, GPU). ⭐ 120+
- XGBoostLSS - Distributional gradient boosting extending XGBoost to model all parameters of 100+ distributions (location, scale, shape). Supports normalizing flows, mixture densities, and zero-inflated distributions. [[Paper](https://arxiv.org/abs/2304.03271)]
- LightGBMLSS - Distributional gradient boosting extending LightGBM. Model full conditional distributions with 50+ univariate distributions, normalizing flows, and mixture densities.
- gamboostLSS - R package for boosting GAMLSS (Generalized Additive Models for Location, Scale, Shape). High-dimensional distributional regression with gradient boosting. [[JSS Paper](https://www.jstatsoft.org/article/view/v074i01)] ⭐ 70+
- PGBM - Probabilistic Gradient Boosting Machines with native GPU acceleration, auto-differentiation, and uncertainty estimates. Built on PyTorch/Numba.
- GPBoost - Combines gradient boosting with Gaussian process and mixed effects models for spatial/grouped data. Handles random effects and spatial correlations. [[JMLR Paper](https://jmlr.org/papers/v23/20-322.html)] ⭐ 600+
- SGTB - Structured Gradient Tree Boosting for collective entity disambiguation by Bloomberg.
- AugBoost - Gradient boosting with step-wise feature augmentation using neural networks. Creates new features during boosting iterations. [[IJCAI 2019 Paper](https://www.ijcai.org/proceedings/2019/0493.pdf)]
- GrowNet - Gradient boosting with shallow neural networks as weak learners. Corrective step + dynamic learning rate. [[Paper](https://arxiv.org/abs/2002.07971)]
- GRANDE - End-to-end gradient-based training of decision tree ensembles; differentiable soft trees. [[Paper](https://openreview.net/forum?id=XEFWBxi075)]
- H2O GBM - H2O's gradient boosting machine implementation.
- SnapML - IBM's library for training generalized linear models and gradient boosting machines.
LightGBM
- LightGBM - A fast, distributed, high-performance gradient boosting framework by Microsoft. ⭐ 17k+
CatBoost
- CatBoost - A fast, scalable, high-performance gradient boosting on decision trees library by Yandex. ⭐ 8k+
- CatBoost Tutorials - Official tutorials and examples.
XGBoost
- XGBoost - Scalable, portable, and distributed gradient boosting library. ⭐ 26k+
- XGBoost4J-Spark - XGBoost integration with Apache Spark.
Tutorials & Guides
Parameter Tuning
- Hyperparameter Tuning with Optuna - Automatic hyperparameter optimization framework
- LightGBM Parameters Tuning - Official Tuning Guide
- XGBoost Parameters Guide - Official Parameters Documentation
- CatBoost Training Parameters - Official Parameters Reference
Getting Started
- A Gentle Introduction to XGBoost - Machine Learning Mastery
- Introduction to Gradient Boosting - Machine Learning Mastery
- Complete Guide to LightGBM - Official Quick Start
- CatBoost Tutorial - Official Tutorial
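The core idea the introductory guides above explain can be written in a dozen lines: each new tree fits the residuals (the negative gradient of squared error) of the current ensemble. A toy from-scratch sketch on synthetic data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_rounds, lr = 50, 0.1
pred = np.full_like(y, y.mean())   # F_0: constant base prediction
trees = []
for _ in range(n_rounds):
    residual = y - pred            # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += lr * tree.predict(X)   # F_m = F_{m-1} + lr * h_m
    trees.append(tree)

mse = float(np.mean((y - pred) ** 2))  # training error shrinks each round
```

Real libraries add histogram binning, regularization, and second-order (Newton) steps on top of exactly this loop.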
Interpretability & Explainability
- SHAP - SHapley Additive exPlanations for interpreting GBM predictions. The gold standard for feature importance. ⭐ 23k+
- dtreeviz - Beautiful decision tree visualization and model interpretation. Supports XGBoost, LightGBM, sklearn, PySpark, TensorFlow DF. Now with AI-powered explanations! ⭐ 3.1k+
- ELI5 - Debug ML classifiers and explain predictions. Supports XGBoost, LightGBM, CatBoost, sklearn. Includes LIME for text and permutation importance. ⭐ 2.8k+
- PyCEbox - Individual Conditional Expectation (ICE) plots for peeking inside black-box models. Shows how predictions vary per feature for individual instances, complementing PDPs. [[Paper](https://arxiv.org/abs/1309.6392)] ⭐ 164+
- InterpretML EBM Tutorial - Glass-box interpretable boosting
- Permutation Importance - Model-agnostic feature importance
Advanced Topics
- Custom Loss Functions in XGBoost - Custom Objective and Evaluation
- Distributed Training with XGBoost - XGBoost with Dask
- GPU Training with LightGBM - GPU Acceleration Guide
- Handling Missing Values - CatBoost Missing Values
Blog Posts
Practical Applications
- Winning Kaggle Competitions with Gradient Boosting - Kaggle Tutorial
- Feature Engineering for Gradient Boosting - Real Competition Insights
Comparisons & Benchmarks
- When to Use Different Gradient Boosting Libraries - Neptune.ai Blog
Deep Dives
- How Gradient Boosting Works - Visual and Mathematical Explanation
- Understanding LightGBM's Histogram-based Algorithm - Technical Deep Dive
Workshops
- Table Representation Learning Workshop @ NeurIPS 2022
- Table Representation Learning Workshop @ NeurIPS 2023
- Table Representation Learning Workshop @ NeurIPS 2024 - Latest advances in tabular ML
- Table Representation Learning Workshop @ ACL 2025 - 4th TRL workshop (July 31, 2025, Vienna, Austria)
- Foundation Models for Structured Data @ ICML 2025 - TabPFN, TabICL, TabForestPFN for tabular; Chronos, TimesFM for time series (July 18, 2025, Vancouver)
- AI for Tabular Data @ EurIPS 2025 - Representation learning, generative AI, and foundation models for tables (Dec 6, 2025, Copenhagen)
Benchmarks & Comparisons
Benchmark Repositories
- TableShift - Distribution shift benchmark for tabular data
- pytabkit - Pre-tuned defaults for GBDTs and MLPs with benchmarking tools
- TabZilla - Meta-learning analysis comparing neural nets vs boosted trees
- TabReD - Benchmark addressing pitfalls in tabular deep learning evaluation
- Tabular Benchmark - Comprehensive benchmark comparing deep learning and gradient boosting
- TabArena - Living benchmark with continuous updates for tabular ML methods
- LAMDA-TALENT - 300+ tabular datasets benchmark with TALENT-tiny for rapid evaluation
- EncoderBenchmarking - Benchmark of categorical encoders for feature engineering
- OpenML Benchmarking Suites - Standardized machine learning benchmarks
- MLPerf Training - Industry-standard ML benchmarks
Comparison Studies
- AutoML Benchmark - Comparing automated ML systems
Videos & Talks
Conference Talks
- XGBoost: A Scalable Tree Boosting System - Tianqi Chen, KDD 2016
- LightGBM: A Highly Efficient Gradient Boosting Decision Tree - Microsoft Research
- CatBoost - the new generation of gradient boosting - Yandex
Educational Videos
- Gradient Boosting Explained - StatQuest with Josh Starmer
- XGBoost Part 1: Regression - StatQuest
- XGBoost Part 2: Classification - StatQuest
- AdaBoost, Clearly Explained - StatQuest
Practical Tutorials
- XGBoost Python Tutorial - Hands-on Implementation
- LightGBM Complete Guide - Complete Tutorial
- CatBoost Tutorial for Beginners - Getting Started