https://github.com/google/yggdrasil-decision-forests
A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
cart cli cpp decision-forest decision-trees distributed-computing go gradient-boosting interpretability javascript machine-learning ml pypi python random-forest tensorflow
- Host: GitHub
- URL: https://github.com/google/yggdrasil-decision-forests
- Owner: google
- License: apache-2.0
- Created: 2021-04-22T08:21:18.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-05-07T08:15:49.000Z (6 months ago)
- Last Synced: 2025-05-07T09:26:07.633Z (6 months ago)
- Topics: cart, cli, cpp, decision-forest, decision-trees, distributed-computing, go, gradient-boosting, interpretability, javascript, machine-learning, ml, pypi, python, random-forest, tensorflow
- Language: C++
- Homepage: https://ydf.readthedocs.io/
- Size: 39.5 MB
- Stars: 569
- Watchers: 14
- Forks: 60
- Open Issues: 35
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-rainmana - google/yggdrasil-decision-forests - A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees. (C++)
- awesome-production-machine-learning - YDF - YDF (Yggdrasil Decision Forests) is a library to train, evaluate, interpret, and serve Random Forest, Gradient Boosted Decision Trees, CART and Isolation forest models. (Computation and Communication Optimisation)
README
**YDF** (Yggdrasil Decision Forests) is a library to train, evaluate, interpret,
and serve Random Forest, Gradient Boosted Decision Trees, CART and Isolation
forest models.
See the [documentation](https://ydf.readthedocs.io/) for more information on
YDF.
## Installation
To install YDF from [PyPI](https://pypi.org/project/ydf/), run:
```shell
pip install ydf -U
```
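To confirm that the package is visible to the current Python environment, a small standard-library check such as the following can help. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of YDF; it only reads the package metadata that `pip` records.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "ydf"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"ydf {found}" if found else "ydf is not installed")
```

If the check reports that the package is missing, the `pip install` command was probably run against a different interpreter or virtual environment.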
## Usage example
[Open the usage example in Colab](https://colab.research.google.com/github/google/yggdrasil-decision-forests/blob/main/documentation/public/docs/tutorial/usage_example.ipynb)
```python
import ydf
import pandas as pd
# Load dataset with Pandas
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset/"
train_ds = pd.read_csv(ds_path + "adult_train.csv")
test_ds = pd.read_csv(ds_path + "adult_test.csv")
# Train a Gradient Boosted Trees model
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds)
# Look at a model (input features, training logs, structure, etc.)
model.describe()
# Evaluate a model (e.g. roc, accuracy, confusion matrix, confidence intervals)
model.evaluate(test_ds)
# Generate predictions
model.predict(test_ds)
# Analyse a model (e.g. partial dependence plot, variable importance)
model.analyze(test_ds)
# Benchmark the inference speed of a model
model.benchmark(test_ds)
# Save the model
model.save("/tmp/my_model")
```
Example with the C++ API.
```c++
auto dataset_path = "csv:train.csv";
// List columns in training dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);
// Create a training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");
// Train model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);
// Export model
SaveModel("my_model", model.get());
```
(based on [examples/beginner.cc](examples/beginner.cc))
## Next steps
Check the
[Getting Started tutorial](https://ydf.readthedocs.io/en/stable/tutorial/getting_started/).
## Citation
If you use Yggdrasil Decision Forests in a scientific publication, please cite
the following paper:
[Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library](https://doi.org/10.1145/3580305.3599933).
**Bibtex**
```
@inproceedings{GBBSP23,
author = {Mathieu Guillame{-}Bert and
Sebastian Bruch and
Richard Stotz and
Jan Pfeifer},
title = {Yggdrasil Decision Forests: {A} Fast and Extensible Decision Forests
Library},
booktitle = {Proceedings of the 29th {ACM} {SIGKDD} Conference on Knowledge Discovery
and Data Mining, {KDD} 2023, Long Beach, CA, USA, August 6-10, 2023},
pages = {4068--4077},
year = {2023},
url = {https://doi.org/10.1145/3580305.3599933},
doi = {10.1145/3580305.3599933},
}
```
**Raw**
Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library,
Guillame-Bert et al., KDD 2023: 4068-4077. doi:10.1145/3580305.3599933
## Contact
You can contact the core development team at
[decision-forests-contact@google.com](mailto:decision-forests-contact@google.com).
## Credits
Yggdrasil Decision Forests and TensorFlow Decision Forests are developed by:
- Mathieu Guillame-Bert (gbm AT google DOT com)
- Richard Stotz (richardstotz AT google DOT com)
- Jan Pfeifer (janpf AT google DOT com)
- Sebastian Bruch (sebastian AT bruch DOT io)
- Arvind Srinivasan (arvnd AT google DOT com)
## Contributing
Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are
welcome. If you want to contribute, check the
[contribution guidelines](CONTRIBUTING.md).
## License
[Apache License 2.0](LICENSE)