Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/siboehm/lleaves
Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
- Host: GitHub
- URL: https://github.com/siboehm/lleaves
- Owner: siboehm
- License: mit
- Created: 2021-04-27T06:35:23.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2024-12-04T15:36:18.000Z (2 months ago)
- Last Synced: 2025-01-25T09:01:43.296Z (10 days ago)
- Topics: decision-trees, gradient-boosting, lightgbm, llvm, machine-learning, python
- Language: Python
- Homepage: https://lleaves.readthedocs.io/en/latest/
- Size: 4.74 MB
- Stars: 386
- Watchers: 10
- Forks: 33
- Open Issues: 20
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
README
# lleaves 🍃
![CI](https://github.com/siboehm/lleaves/workflows/CI/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/lleaves/badge/?version=latest)](https://lleaves.readthedocs.io/en/latest/?badge=latest)
![Downloads](https://static.pepy.tech/badge/lleaves)

An LLVM-based compiler for LightGBM decision trees.
`lleaves` converts trained LightGBM models to optimized machine code, speeding up prediction by ≥10x.
## Example
```python
import lightgbm
import lleaves

lgbm_model = lightgbm.Booster(model_file="NYC_taxi/model.txt")
%timeit lgbm_model.predict(df)  # df: pandas DataFrame of input features
# 12.77s

llvm_model = lleaves.Model(model_file="NYC_taxi/model.txt")
llvm_model.compile()
%timeit llvm_model.predict(df)
# 0.90s
```

## Why lleaves?
- Speed: Both low-latency single-row prediction and high-throughput batch prediction.
- Drop-in replacement: The interface of `lleaves.Model` is a subset of `LightGBM.Booster` (see the sketch below).
- Dependencies: `llvmlite` and `numpy`. LLVM comes statically linked.
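
A minimal sketch of the drop-in usage, assuming a trained model file at `model.txt` and a pandas DataFrame `X` of feature rows (both placeholders):

```python
import lleaves

# Compile the trained LightGBM model to native code once, up front.
llvm_model = lleaves.Model(model_file="model.txt")
llvm_model.compile()

# predict() mirrors LightGBM.Booster.predict(), so the same call covers
# low-latency single-row prediction and high-throughput batches.
single_row_pred = llvm_model.predict(X[:1])
batch_pred = llvm_model.predict(X)
```

Because the interface matches, swapping in `lleaves.Model` for `lightgbm.Booster` typically only adds the extra `compile()` call.
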
## Installation

`conda install -c conda-forge lleaves` or `pip install lleaves` (Linux and macOS only).

## Benchmarks
Benchmarks were run on a dedicated Intel i7-4770 Haswell, 4 cores.
The stated runtime is the minimum over 20,000 runs.
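
For context, here is a minimal sketch of a minimum-over-repeats measurement using the standard-library `timeit` module. It is not the project's own benchmark harness (that lives under `benchmarks/`), and `llvm_model` and `batch` are placeholders for a compiled model and an input batch:

```python
import timeit

# Time llvm_model.predict() on a fixed batch many times and keep the minimum,
# which filters out scheduling noise. The real benchmark uses 20,000 runs.
runtimes = timeit.repeat(lambda: llvm_model.predict(batch), repeat=1000, number=1)
print(f"min runtime: {min(runtimes) * 1e6:.2f} μs")
```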

### Dataset: [NYC-taxi](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
Mostly numerical features.

|Batch size | 1 | 10 | 100 |
|---|---:|---:|---:|
|LightGBM | 52.31μs | 84.46μs | 441.15μs |
|ONNX Runtime| 11.00μs | 36.74μs | 190.87μs |
|Treelite | 28.03μs | 40.81μs | 94.14μs |
|``lleaves`` | 9.61μs | 14.06μs | 31.88μs |

### Dataset: [MTPL2](https://www.openml.org/d/41214)
Mix of categorical and numerical features.

|Batch size | 10,000 | 100,000 | 678,000 |
|---|---:|---:|---:|
|LightGBM | 95.14ms | 992.47ms | 7034.65ms |
|ONNX Runtime | 38.83ms | 381.40ms | 2849.42ms |
|Treelite | 38.15ms | 414.15ms | 2854.10ms |
|``lleaves`` | 5.90ms | 56.96ms | 388.88ms |

## Advanced Usage
To avoid expensive recompilation, you can call `lleaves.Model.compile()` and pass a `cache=` argument.
This will store an ELF (Linux) / Mach-O (macOS) file at the given path when the method is first called.
Subsequent calls of `compile(cache=)` will skip compilation and load the stored binary file instead.
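
A minimal sketch of the cache workflow, with `model.txt` and `lleaves_cache.o` as placeholder paths and `X` standing in for the input features:

```python
import lleaves

llvm_model = lleaves.Model(model_file="model.txt")

# First call: compiles the model and writes the native binary to the cache path.
# Later calls with the same path skip compilation and load that binary instead.
llvm_model.compile(cache="lleaves_cache.o")

predictions = llvm_model.predict(X)
```
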
For more info, see the [docs](https://lleaves.readthedocs.io/en/latest/).

To eliminate any Python overhead during inference, you can link against this generated binary.
For an example of how to do this, see `benchmarks/c_bench/`.
The function signature might change between major versions.

## Development
High-level explanation of the inner workings of the lleaves compiler: [link](https://siboehm.com/articles/21/lleaves)
```bash
mamba env create
conda activate lleaves
pip install -e .
pre-commit install
./benchmarks/data/setup_data.sh
pytest -k "not benchmark"
```

## Cite
If you're using lleaves for your research, I'd appreciate it if you could cite it:
```bibtex
@software{Boehm_lleaves,
  author  = {Boehm, Simon},
  title   = {lleaves},
  url     = {https://github.com/siboehm/lleaves},
  license = {MIT},
}
```