https://github.com/ajtulloch/sklearn-compiledtrees
Compiled Decision Trees for scikit-learn
- Host: GitHub
- URL: https://github.com/ajtulloch/sklearn-compiledtrees
- Owner: ajtulloch
- License: mit
- Created: 2014-03-18T23:40:08.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2024-05-14T07:44:02.000Z (over 1 year ago)
- Last Synced: 2024-12-16T21:48:30.211Z (11 months ago)
- Language: Python
- Homepage: tullo.ch/articles/decision-tree-evaluation/
- Size: 267 KB
- Stars: 223
- Watchers: 12
- Forks: 37
- Open Issues: 3
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-python-data-science - sklearn-compiledtrees - Compiled Decision Trees for scikit-learn. (Deployment / Ranking/Recommender)
README
Scikit-Learn Compiled Trees
===========================
|Build Status|
|PyPI|
Installation
------------
Released under the MIT License.
.. code:: bash

    pip install sklearn-compiledtrees

Or to get the latest development version:
.. code:: bash

    pip install git+https://github.com/ajtulloch/sklearn-compiledtrees.git

sklearn-compiledtrees has been tested to work on OS X, Linux and Windows.
Installing on Windows requires the GCC compiler and dlfcn-win32_,
setting the ``CXX`` environment variable (``set "CXX=gcc -pthread"`` in CMD),
and installing manually from the source directory. Using the msys2
distribution in conda is strongly recommended.
.. code:: bash

    conda install -c msys2 m2w64-toolchain m2w64-dlfcn pywin32
    python setup.py build_ext --compiler=mingw32 -llibdl
    python setup.py install

Rationale
---------
In some use cases, model prediction is on the hot path, so
speeding up decision tree evaluation is very useful.
An effective way of speeding up evaluation of decision trees can be to
generate code representing the evaluation of the tree, compile that to
optimized object code, and dynamically load that file via dlopen/dlsym
or equivalent.
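
The code-generation idea described above can be sketched in a few lines.
This is only an illustration under stated assumptions, not the package's
implementation: the tuple-based tree encoding and the names ``TREE``,
``generate_source`` and ``compile_tree`` are hypothetical, and the sketch
emits Python source and loads it with ``exec`` instead of emitting native
code, compiling it, and loading it via dlopen/dlsym.

.. code:: python

    # Hypothetical tree encoding, for illustration only: an internal node is
    # (feature_index, threshold, left_subtree, right_subtree); a leaf is a float.
    TREE = (0, 0.5,
            (1, 2.0, 1.0, 3.0),
            10.0)


    def generate_source(node, indent=1):
        """Emit nested if/else source that evaluates ``node`` on a sample ``x``."""
        pad = "    " * indent
        if not isinstance(node, tuple):
            # Leaf: the generated code simply returns the stored value.
            return f"{pad}return {node!r}\n"
        feature, threshold, left, right = node
        return (
            f"{pad}if x[{feature}] <= {threshold!r}:\n"
            + generate_source(left, indent + 1)
            + f"{pad}else:\n"
            + generate_source(right, indent + 1)
        )


    def compile_tree(node):
        """Generate source for the whole tree and turn it into a callable."""
        source = "def evaluate(x):\n" + generate_source(node)
        namespace = {}
        # Stand-in for "compile to object code and dlopen it".
        exec(compile(source, "<generated tree>", "exec"), namespace)
        return namespace["evaluate"]


    evaluate = compile_tree(TREE)

Calling ``evaluate`` on a sample then walks straight-line branches with the
thresholds baked in as constants, rather than following node arrays at
prediction time.
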
See
https://courses.cs.washington.edu/courses/cse501/10au/compile-machlearn.pdf
for a detailed discussion, and
http://tullo.ch/articles/decision-tree-evaluation/ for a more
pedagogical explanation and more benchmarks in C++.
This package implements compiled decision tree evaluation for the simple
case of a single-output regression tree or ensemble.
Usage
-----
.. code:: python

    import compiledtrees
    from sklearn import ensemble

    X_train, y_train, X_test, y_test = ...

    # Fit an ordinary scikit-learn regressor first.
    clf = ensemble.GradientBoostingRegressor()
    clf.fit(X_train, y_train)

    # Wrap the fitted model in a compiled predictor and evaluate with it.
    compiled_predictor = compiledtrees.CompiledRegressionPredictor(clf)
    predictions = compiled_predictor.predict(X_test)

Benchmarks
----------
For random forests, we see a 5x to 8x speedup in evaluation. For gradient
boosted ensembles, the speedup is between 1.5x and 3x. The gain is smaller
there because gradient boosted trees already have an optimized prediction
implementation.
The repository includes a benchmark script
(``benchmarks/bench_compiled_tree.py``) for examining evaluation
performance across a range of ensemble configurations and datasets.
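
As a rough sketch of how a speedup ratio like those quoted above might be
measured by hand, the helper below times two zero-argument callables with
the standard-library ``timeit`` module. The ``speedup`` function is
hypothetical and is not taken from the benchmark script, which may measure
things differently.

.. code:: python

    import timeit


    def speedup(baseline, candidate, repeat=5, number=10):
        """Return how many times faster ``candidate`` runs than ``baseline``.

        Both arguments are zero-argument callables. The minimum over several
        repeats is used to reduce noise from other processes.
        """
        t_base = min(timeit.repeat(baseline, repeat=repeat, number=number))
        t_cand = min(timeit.repeat(candidate, repeat=repeat, number=number))
        return t_base / t_cand

With the objects from the Usage section, one would pass
``lambda: clf.predict(X_test)`` as ``baseline`` and
``lambda: compiled_predictor.predict(X_test)`` as ``candidate``.
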
In the graphs attached, ``GB`` is Gradient Boosted, ``RF`` is Random
Forest, ``D1``, etc. correspond to setting ``max_depth=1``, and ``B10``
corresponds to setting ``max_leaf_nodes=10``.
Graphs
------
.. code:: bash

    for dataset in friedman1 friedman2 friedman3 uniform hastie; do
        python ../benchmarks/bench_compiled_tree.py \
            --iterations=10 \
            --num_examples=1000 \
            --num_features=50 \
            --dataset=$dataset \
            --max_estimators=300 \
            --num_estimator_values=6
    done

|timings3907426606273805268| |timings-1162001441413946416|
|timings5617004024503483042| |timings2681645894201472305|
|timings2070620222460516071|

.. |Build Status| image:: https://travis-ci.org/ajtulloch/sklearn-compiledtrees.png?branch=master
   :target: https://travis-ci.org/ajtulloch/sklearn-compiledtrees
.. |PyPI| image:: https://badge.fury.io/py/sklearn-compiledtrees.png
   :target: http://badge.fury.io/py/sklearn-compiledtrees
.. _dlfcn-win32: https://github.com/dlfcn-win32/dlfcn-win32
.. |timings3907426606273805268| image:: https://f.cloud.github.com/assets/1121581/2453407/c70a64bc-aedd-11e3-94c7-519411ae6276.png
   :width: 500px
.. |timings-1162001441413946416| image:: https://f.cloud.github.com/assets/1121581/2453409/c70ad4ec-aedd-11e3-972d-07a49a6bc610.png
   :width: 500px
.. |timings5617004024503483042| image:: https://f.cloud.github.com/assets/1121581/2453410/c70b48dc-aedd-11e3-9c68-ec3f9d4672b8.png
   :width: 500px
.. |timings2681645894201472305| image:: https://f.cloud.github.com/assets/1121581/2453411/c70b4de6-aedd-11e3-86bd-d534b0ad0618.png
   :width: 500px
.. |timings2070620222460516071| image:: https://f.cloud.github.com/assets/1121581/2453408/c70aa594-aedd-11e3-8b14-1a26eb1f3eba.png
   :width: 500px