Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
pure Go implementation of prediction part for GBRT (Gradient Boosting Regression Trees) models from popular frameworks
- Host: GitHub
- URL: https://github.com/dmitryikh/leaves
- Owner: dmitryikh
- License: MIT
- Created: 2018-09-15T09:26:59.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-07-03T17:46:42.000Z (6 months ago)
- Last Synced: 2024-08-03T23:19:28.854Z (5 months ago)
- Topics: boosting, decision-trees, go, golang, lightgbm, machine-learning, xgboost
- Language: Go
- Homepage:
- Size: 1.22 MB
- Stars: 422
- Watchers: 16
- Forks: 72
- Open Issues: 34
- Metadata Files:
  - Readme: README.md
  - License: LICENSE.md
Awesome Lists containing this project
- awesome-list - leaves - Pure Go implementation of prediction part for GBRT (Gradient Boosting Regression Trees) models from popular frameworks. (Machine Learning Framework / General Purpose Framework)
README
# leaves
[![version](https://img.shields.io/badge/version-0.8.0-yellow.svg)](https://semver.org)
[![Build Status](https://travis-ci.org/dmitryikh/leaves.svg?branch=master)](https://travis-ci.org/dmitryikh/leaves)
[![GoDoc](https://godoc.org/github.com/dmitryikh/leaves?status.png)](https://godoc.org/github.com/dmitryikh/leaves)
[![Coverage Status](https://coveralls.io/repos/github/dmitryikh/leaves/badge.svg?branch=master)](https://coveralls.io/github/dmitryikh/leaves?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/dmitryikh/leaves)](https://goreportcard.com/report/github.com/dmitryikh/leaves)

![Logo](logo.png)
## Introduction
_leaves_ is a library implementing prediction code for GBRT (Gradient Boosting Regression Trees) models in **pure Go**. The goal of the project is to make it possible to use models from popular GBRT frameworks in Go programs without C API bindings.
**NOTE**: Before the `1.0.0` release the API is subject to change.
## Features
* General Features:
  * support parallel predictions for batches
  * support sigmoid, softmax transformation functions
  * support getting leaf indices of decision trees
* Support for LightGBM ([repo](https://github.com/Microsoft/LightGBM)) models:
  * read models from `text` and `JSON` formats
  * support `gbdt`, `rf` (random forest) and `dart` models
  * support multiclass predictions
  * additional optimizations for categorical features (for example, the _one hot_ decision rule)
  * additional optimizations exploiting prediction-only usage
* Support for XGBoost ([repo](https://github.com/dmlc/xgboost)) models:
  * read models from binary format
  * support `gbtree`, `gblinear`, `dart` models
  * support multiclass predictions
  * support missing values (`nan`)
* Support for scikit-learn ([repo](https://github.com/scikit-learn/scikit-learn)) tree models (experimental):
  * read models from pickle format (protocol `0`)
  * support `sklearn.ensemble.GradientBoostingClassifier`

## Usage examples
To get started, `go get` this repository:
```sh
go get github.com/dmitryikh/leaves
```

Minimal example:
```go
package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}
```

To use an XGBoost model, just change `leaves.LGEnsembleFromFile` to `leaves.XGEnsembleFromFile`.
## Documentation
Documentation is hosted on godoc ([link](https://godoc.org/github.com/dmitryikh/leaves)). It contains complex usage examples and a full API reference. Additional usage examples can be found in [leaves_test.go](leaves_test.go).
## Compatibility
Most _leaves_ features are tested for compatibility with both old and upcoming versions of GBRT libraries. A detailed report on _leaves_ correctness against different versions of external GBRT libraries can be found in [compatibility.md](compatibility.md).
Some additional information on new features and backward compatibility can be found in [NOTES.md](NOTES.md).
## Benchmark
Below are comparisons of prediction speed on batches (~1000 objects per API
call). Hardware: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 16 GB
2133 MHz LPDDR3. C API implementations were called from Python bindings, but
the large batch size should make the overhead of the Python bindings
negligible. _leaves_ benchmarks were run with the Go test framework: `go test
-bench`. See [benchmark](benchmark) for more details on measurements and
[testdata/README.md](testdata/README.md) for data preparation pipelines.

Single thread:
| Test Case | Features | Trees | Batch size | C API | _leaves_ |
|-----------|----------|-------|------------|---------|----------|
| LightGBM [MS LTR](https://github.com/Microsoft/LightGBM/blob/master/docs/Experiments.rst#comparison-experiment) | 137 | 500 | 1000 | 49ms | 51ms |
| LightGBM [Higgs](https://github.com/Microsoft/LightGBM/blob/master/docs/Experiments.rst#comparison-experiment) | 28 | 500 | 1000 | 50ms | 50ms |
| LightGBM KDD Cup 99* | 41 | 1200 | 1000 | 70ms | 85ms |
| XGBoost Higgs | 28 | 500 | 1000 | 44ms | 50ms |

4 threads:
| Test Case | Features | Trees | Batch size | C API | _leaves_ |
|-----------|----------|-------|------------|---------|----------|
| LightGBM [MS LTR](https://github.com/Microsoft/LightGBM/blob/master/docs/Experiments.rst#comparison-experiment) | 137 | 500 | 1000 | 14ms | 14ms |
| LightGBM [Higgs](https://github.com/Microsoft/LightGBM/blob/master/docs/Experiments.rst#comparison-experiment) | 28 | 500 | 1000 | 14ms | 14ms |
| LightGBM KDD Cup 99* | 41 | 1200 | 1000 | 19ms | 24ms |
| XGBoost Higgs | 28 | 500 | 1000 | ? | 14ms |

(?) - currently I'm unable to utilize multithreading for XGBoost predictions by means of the Python bindings
(*) - KDD Cup 99 problem involves continuous and categorical features simultaneously
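The _leaves_ numbers above come from the standard Go benchmark harness (`go test -bench`), where thread counts can be varied with the `-cpu` flag, e.g. `go test -bench . -cpu 1,4`. As a minimal, self-contained sketch of that mechanism, the snippet below benchmarks an invented stand-in function via `testing.Benchmark` (it does not touch the real library):

```go
package main

import (
	"fmt"
	"testing"
)

// dummyPredict stands in for a model prediction call; it is invented
// for this sketch and has no relation to the real leaves API.
func dummyPredict(fvals []float64) float64 {
	s := 0.0
	for _, v := range fvals {
		s += v * v
	}
	return s
}

func main() {
	fvals := []float64{1.0, 2.0, 3.0}

	// testing.Benchmark runs a benchmark body outside `go test`;
	// in a real package this would be a BenchmarkXxx function picked
	// up by `go test -bench`.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			dummyPredict(fvals)
		}
	})
	fmt.Println(res) // e.g. "500000000	 2.1 ns/op" (timings vary by machine)
}
```

In a benchmark file, the same body would be wrapped in `func BenchmarkPredict(b *testing.B)`, and `b.RunParallel` could be used to exercise the multi-threaded case measured in the second table.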
## Limitations
* LightGBM models:
  * limited support of transformation functions (only sigmoid and softmax are supported)
* XGBoost models:
  * limited support of transformation functions (only sigmoid and softmax are supported)
  * predictions can diverge slightly from the C API because of floating point conversions and comparison tolerances
* scikit-learn tree models:
  * no support for transformation functions; output scores are _raw scores_ (as from `GradientBoostingClassifier.decision_function`)
  * only pickle protocol `0` is supported
  * predictions can diverge slightly from scikit-learn because of floating point conversions and comparison tolerances

## Contacts
If you are interested in the project or have questions, please contact me by
email: khdmitryi ```at``` gmail.com