https://github.com/iamdecode/sklearn-pmml-model
A library to parse and convert PMML models into Scikit-learn estimators.
- Host: GitHub
- URL: https://github.com/iamdecode/sklearn-pmml-model
- Owner: iamDecode
- License: bsd-2-clause
- Created: 2018-06-08T17:38:21.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-24T15:22:03.000Z (3 months ago)
- Last Synced: 2024-12-18T12:07:08.275Z (8 days ago)
- Topics: machine-learning, pmml, scikit-learn, sklearn
- Language: Python
- Homepage:
- Size: 597 KB
- Stars: 77
- Watchers: 4
- Forks: 15
- Open Issues: 13
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# sklearn-pmml-model
[![PyPI version](https://badge.fury.io/py/sklearn-pmml-model.svg)](https://badge.fury.io/py/sklearn-pmml-model)
[![codecov](https://codecov.io/gh/iamDecode/sklearn-pmml-model/branch/master/graph/badge.svg?token=CGbbgziGwn)](https://codecov.io/gh/iamDecode/sklearn-pmml-model)
[![CircleCI](https://circleci.com/gh/iamDecode/sklearn-pmml-model.svg?style=shield)](https://circleci.com/gh/iamDecode/sklearn-pmml-model)
[![ReadTheDocs](https://readthedocs.org/projects/sklearn-pmml-model/badge/?version=latest&style=flat)](https://sklearn-pmml-model.readthedocs.io/en/latest/)

A library to effortlessly import models trained on different platforms and in different programming languages into scikit-learn in Python. First, export your model to [PMML](http://dmg.org/pmml/v4-3/GeneralStructure.html) (widely supported). Next, load the exported PMML file with this library and use the resulting class like any other scikit-learn estimator.
## Installation
The easiest way is to use pip:
```
$ pip install sklearn-pmml-model
```

## Status
The library currently supports the following models:

| Model                                                  | Classification | Regression | Categorical features |
|--------------------------------------------------------|----------------|------------|----------------------|
| [Decision Trees](sklearn_pmml_model/tree)              | ✅             | ✅         | ✅<sup>1</sup>       |
| [Random Forests](sklearn_pmml_model/ensemble)          | ✅             | ✅         | ✅<sup>1</sup>       |
| [Gradient Boosting](sklearn_pmml_model/ensemble)       | ✅             | ✅         | ✅<sup>1</sup>       |
| [Linear Regression](sklearn_pmml_model/linear_model)   | ✅             | ✅         | ✅<sup>3</sup>       |
| [Ridge](sklearn_pmml_model/linear_model)               | ✅<sup>2</sup> | ✅         | ✅<sup>3</sup>       |
| [Lasso](sklearn_pmml_model/linear_model)               | ✅<sup>2</sup> | ✅         | ✅<sup>3</sup>       |
| [ElasticNet](sklearn_pmml_model/linear_model)          | ✅<sup>2</sup> | ✅         | ✅<sup>3</sup>       |
| [Gaussian Naive Bayes](sklearn_pmml_model/naive_bayes) | ✅             |            | ✅<sup>3</sup>       |
| [Support Vector Machines](sklearn_pmml_model/svm)      | ✅             | ✅         | ✅<sup>3</sup>       |
| [Nearest Neighbors](sklearn_pmml_model/neighbors)      | ✅             | ✅         |                      |
| [Neural Networks](sklearn_pmml_model/neural_network)   | ✅             | ✅         |                      |

<sup>1</sup> Categorical feature support using slightly modified internals, based on [scikit-learn#12866](https://github.com/scikit-learn/scikit-learn/pull/12866).

<sup>2</sup> These models differ only in training characteristics; the resulting model is of the same form. Classification is supported using `PMMLLogisticRegression` for regression models and `PMMLRidgeClassifier` for general regression models.

<sup>3</sup> Categorical features are supported by automatically one-hot encoding them.
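For readers unfamiliar with the term, one-hot encoding replaces a categorical column with one 0/1 indicator column per category. The snippet below is a minimal, plain-Python sketch of the idea only; the helper name `one_hot` is hypothetical and this is not the library's internal code.

```python
def one_hot(values):
    """Return (categories, rows); each row has a 1 in the column of its category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Made-up categorical feature with two distinct values.
colors = ["red", "green", "red"]
categories, encoded = one_hot(colors)

print(categories)  # ['green', 'red']
print(encoded)     # [[0, 1], [1, 0], [0, 1]]
```

In practice the same transformation is usually done with `pandas.get_dummies` or scikit-learn's `OneHotEncoder`.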
## Example
A minimal working example (using [this PMML file](https://github.com/iamDecode/sklearn-pmml-model/blob/master/models/randomForest.pmml)) is shown below:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn_pmml_model.ensemble import PMMLForestClassifier
from sklearn_pmml_model.auto_detect import auto_detect_estimator

# Prepare the data
iris = load_iris()
X = pd.DataFrame(iris.data)
X.columns = np.array(iris.feature_names)
y = pd.Series(np.array(iris.target_names)[iris.target])
y.name = "Class"
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=123)

# Specify the model type for the least overhead...
#clf = PMMLForestClassifier(pmml="models/randomForest.pmml")

# ...or simply let the library auto-detect the model type
clf = auto_detect_estimator(pmml="models/randomForest.pmml")

# Use the model as any other scikit-learn model
clf.predict(Xte)
clf.score(Xte, yte)
```

More examples can be found in the following packages: [tree](sklearn_pmml_model/tree), [ensemble](sklearn_pmml_model/ensemble), [linear_model](sklearn_pmml_model/linear_model), [naive_bayes](sklearn_pmml_model/naive_bayes), [svm](sklearn_pmml_model/svm), [neighbors](sklearn_pmml_model/neighbors) and [neural_network](sklearn_pmml_model/neural_network).
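To give a feel for what detecting a model type from a PMML file can involve, here is a rough, standard-library-only sketch that looks for a known model element directly under the `<PMML>` root. It is an illustration under stated assumptions, not how `auto_detect_estimator` is implemented: the helper `find_model_tag` and the tiny inline document are hypothetical, and only the element names come from the PMML specification.

```python
import xml.etree.ElementTree as ET

# Top-level model element names defined by the PMML specification.
MODEL_TAGS = {
    "TreeModel", "MiningModel", "RegressionModel", "GeneralRegressionModel",
    "NaiveBayesModel", "SupportVectorMachineModel", "NearestNeighborModel",
    "NeuralNetwork",
}


def find_model_tag(pmml_text):
    """Return the first known model element name under <PMML>, or None."""
    root = ET.fromstring(pmml_text)
    for child in root:
        # Strip the XML namespace, e.g. '{http://www.dmg.org/PMML-4_3}TreeModel'.
        tag = child.tag.rsplit("}", 1)[-1]
        if tag in MODEL_TAGS:
            return tag
    return None


# A deliberately tiny, incomplete PMML document used only for this illustration.
example = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary/>
  <TreeModel functionName="classification"/>
</PMML>"""

print(find_model_tag(example))  # TreeModel
```

Knowing the model element up front is what lets you pick a specific class such as `PMMLForestClassifier` yourself instead of relying on auto-detection.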
## Benchmark
Depending on the data set and model, `sklearn-pmml-model` is roughly 1 to 10 times faster than competing libraries, because it leverages the optimization and industry-tested robustness of `sklearn`. In the measurements below, which compare against `PyPMML`, the speedup ranges from 0.91× (marginally slower) to 8.47×. Source code for this benchmark can be found in the corresponding [Jupyter notebook](benchmark.ipynb).
### Running times (load + predict, in seconds)
| Data set      | Library              | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |
|---------------|----------------------|--------------|-------------|---------------|---------------|-------------------|
| Wine          | `PyPMML`             | 0.013038     | 0.005674    | 0.005587      | 0.032734      | 0.034649          |
|               | `sklearn-pmml-model` | 0.00404      | 0.004059    | 0.000964      | 0.030008      | 0.032949          |
| Breast cancer | `PyPMML`             | 0.009838     | 0.01153     | 0.009367      | 0.058941      | 0.031196          |
|               | `sklearn-pmml-model` | 0.010749     | 0.008481    | 0.001106      | 0.044021      | 0.013411          |

### Improvement
| Data set      |                      | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |
|---------------|----------------------|--------------|-------------|---------------|---------------|-------------------|
| Wine          | Improvement          | 3.23×        | 1.40×       | 5.80×         | 1.09×         | 1.05×             |
| Breast cancer | Improvement          | 0.91×        | 1.36×       | **8.47×**     | 1.34×         | 2.33×             |

*Benchmark run on: 24 September 2024, 17:19*
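The improvement figures appear to be the `PyPMML` running time divided by the `sklearn-pmml-model` running time. That formula is an assumption inferred from the numbers rather than something the benchmark text states; the short sketch below checks it against the Wine row.

```python
# Running times in seconds (load + predict) for the Wine data set,
# copied from the running times table.
pypmml = {
    "Linear model": 0.013038,
    "Naive Bayes": 0.005674,
    "Decision tree": 0.005587,
    "Random Forest": 0.032734,
    "Gradient boosting": 0.034649,
}
sklearn_pmml_model = {
    "Linear model": 0.00404,
    "Naive Bayes": 0.004059,
    "Decision tree": 0.000964,
    "Random Forest": 0.030008,
    "Gradient boosting": 0.032949,
}

# Assumed definition: improvement = PyPMML time / sklearn-pmml-model time.
improvement = {
    model: pypmml[model] / sklearn_pmml_model[model] for model in pypmml
}

for model, ratio in improvement.items():
    print(f"{model}: {ratio:.2f}x")
```

A ratio above 1 means `sklearn-pmml-model` was faster for that model; a ratio below 1 means it was slower.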
## Development
### Prerequisites
Tests are run using pytest. First, grab a local copy of the source:
```
$ git clone http://github.com/iamDecode/sklearn-pmml-model
$ cd sklearn-pmml-model
```

Then create a virtual environment and activate it:
```
$ python3 -m venv venv
$ source venv/bin/activate
```

Install the dependencies:
```
$ pip install -r requirements.txt
```

The final step is to build the Cython extensions:
```
$ python setup.py build_ext --inplace
```

### Testing
You can execute the tests with pytest by running:
```
$ python setup.py pytest
```

## Contributing
Feel free to make a contribution. Please read [CONTRIBUTING.md](CONTRIBUTING.md) for more details.
## License
This project is licensed under the BSD 2-Clause License - see the [LICENSE](LICENSE) file for details.