https://github.com/jpmml/jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML

Host: GitHub
URL: https://github.com/jpmml/jpmml-sklearn
Owner: jpmml
License: agpl-3.0
Created: 2015-09-20T17:15:13.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2026-04-04T10:10:08.000Z (4 months ago)
Last Synced: 2026-04-04T10:38:01.574Z (4 months ago)
Language: Java
Homepage:
Size: 275 MB
Stars: 540
Watchers: 18
Forks: 117
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Notice: NOTICE.txt

Awesome Lists containing this project

awesome-production-machine-learning - sklearn2pmml
awesome-machine-learning-engineering - jpmml-sklearn - line application for converting Scikit-Learn pipelines to PMML (Software / Serialising and transpiling models)
awesome-java - JPMML SkLearn - Java library and command-line application for converting …Learn pipelines to PMML. (Artificial Intelligence / Machine Learning)

README

JPMML-SkLearn [![Build Status](https://github.com/jpmml/jpmml-sklearn/workflows/maven/badge.svg)](https://github.com/jpmml/jpmml-sklearn/actions?query=workflow%3A%22maven%22)
=============

Java library and command-line application for converting [Scikit-Learn](https://scikit-learn.org/) pipelines to PMML.

# Table of Contents #

* [Features](#features)

  * [Overview](#overview)

  * [Supported packages](#supported-packages)

* [Prerequisites](#prerequisites)

  * [The Python side of operations](#the-python-side-of-operations)

  * [The JPMML-SkLearn side of operations](#the-jpmml-sklearn-side-of-operations)

* [Installation](#installation)

* [Usage](#usage)

  * [The Python side of operations](#the-python-side-of-operations-1)

  * [The JPMML-SkLearn side of operations](#the-jpmml-sklearn-side-of-operations-1)

* [Documentation](#documentation)

* [License](#license)

* [Additional information](#additional-information)

# Features #

### Overview

* Functionality:

  * Three times more supported Python packages, transformers and estimators than all the competitors combined!

  * Thorough collection, analysis and encoding of feature information:

    * Names.

    * Data and operational types.

    * Valid, invalid and missing value spaces.

    * Descriptive statistics.

  * Pipeline extensions:

    * Pruning.

    * Decision engineering (prediction post-processing).

    * Model verification.

  * Conversion options.

* Extensibility:

  * Rich Java APIs for developing custom converters.

  * Automatic discovery and registration of custom converters based on `META-INF/sklearn2pmml.properties` resource files.

  * Direct interfacing with other JPMML conversion libraries such as [JPMML-H2O](https://github.com/jpmml/jpmml-h2o), [JPMML-LightGBM](https://github.com/jpmml/jpmml-lightgbm), [JPMML-StatsModels](https://github.com/jpmml/jpmml-statsmodels) and [JPMML-XGBoost](https://github.com/jpmml/jpmml-xgboost).

* Production quality:

  * Complete test coverage.

  * Fully compliant with the [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) library.

### Supported packages

For a full list of supported transformer and estimator classes, see the [`features.md`](features.md) file.
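To compare your own pipeline against that list, it can help to print the module-qualified name of every estimator it contains, since that is how `pickle` refers to classes. This is an illustrative sketch, not part of JPMML-SkLearn: `qualified_name` and `estimator_classes` are hypothetical helper names, and Scikit-Learn's module paths (for example private `_`-prefixed submodules) differ between versions, so match on the class name when in doubt.

```python
from sklearn.base import BaseEstimator
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def qualified_name(obj):
    """Module-qualified class name of an object, e.g. "sklearn.pipeline.Pipeline"."""
    cls = type(obj)
    return cls.__module__ + "." + cls.__qualname__


def estimator_classes(estimator):
    """Sorted, de-duplicated class names of an estimator and all estimators nested in it."""
    names = {qualified_name(estimator)}
    # get_params(deep=True) exposes nested steps, e.g. "transformer__scaler"
    for value in estimator.get_params(deep=True).values():
        if isinstance(value, BaseEstimator):
            names.add(qualified_name(value))
    return sorted(names)


if __name__ == "__main__":
    # Same structure as the pipeline in the Usage section (fitting is not required)
    pipeline = Pipeline([
        ("transformer", ColumnTransformer([
            ("scaler", StandardScaler(), [0, 1, 2, 3])
        ], remainder = "drop")),
        ("classifier", LogisticRegression())
    ])
    for name in estimator_classes(pipeline):
        print(name)
```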

# Prerequisites #

### The Python side of operations

* Python 2.7, 3.4 or newer.

* Scikit-Learn 0.16.0 or newer. This is not a typo: all Scikit-Learn versions from the past 10 years (2015 or newer) should work equally well.

### The JPMML-SkLearn side of operations

* Java 11 or newer.
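The minimum versions above can be checked from Python before building or converting anything. This is a minimal sketch, not a tool shipped with JPMML-SkLearn: the helper names are hypothetical, the script itself needs Python 3.7+ (for `subprocess.run(..., capture_output=True)`), and it assumes that `java -version` prints a line containing `version "..."` to standard error, as common JDK builds do.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions as listed in the Prerequisites section
MIN_SKLEARN = (0, 16, 0)
MIN_JAVA = 11


def parse_version(version):
    """Leading numeric components of a version string, e.g. "1.4.dev0" -> (1, 4)."""
    parts = []
    for token in version.split("."):
        match = re.match(r"\d+", token)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(token):
            break  # stop after suffixed components such as "0rc1"
    return tuple(parts)


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum`; missing components count as zero."""
    parsed = parse_version(version)
    parsed += (0,) * (len(minimum) - len(parsed))
    return parsed >= minimum


def java_major(version):
    """Java major version: "1.8.0_381" -> 8 (legacy scheme), "11.0.20" -> 11."""
    parts = version.split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(re.match(r"\d+", parts[1]).group())
    return int(re.match(r"\d+", parts[0]).group())


def installed_java_version():
    """Version string reported by `java -version` (written to stderr), or None."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    match = re.search(r'version "([^"]+)"', result.stderr)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    try:
        import sklearn
        ok = meets_minimum(sklearn.__version__, MIN_SKLEARN)
        print("Scikit-Learn", sklearn.__version__, "OK" if ok else "too old")
    except ImportError:
        print("Scikit-Learn is not installed")
    java = installed_java_version()
    if java is None:
        print("Java not found on PATH")
    else:
        print("Java", java, "OK" if java_major(java) >= MIN_JAVA else "too old")
```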

# Installation #

Enter the project root directory and build using [Apache Maven](https://maven.apache.org/):

```bash
mvn clean install
```

The build produces a library JAR file `pmml-sklearn/target/pmml-sklearn-1.9-SNAPSHOT.jar`, and an executable uber-JAR file `pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar`.

# Usage #

A typical workflow can be summarized as follows:

1. Use Scikit-Learn to assemble and fit a pipeline.

2. Serialize this pipeline in `pickle` data format to a file in a local filesystem.

3. Use the JPMML-SkLearn command-line application to convert this pickle file to a PMML file.

### The Python side of operations

Assembling and fitting a pipeline:

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
#from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris_X, iris_y = load_iris(return_X_y = True, as_frame = True)
# Drop the " (cm)" unit suffix (str.rstrip would strip a set of characters, not a suffix)
iris_X.columns = [col.replace(" (cm)", "") for col in iris_X.columns]

pipeline = Pipeline([
    # Column-oriented feature engineering
    ("transformer", ColumnTransformer([
        ("scaler", StandardScaler(), [0, 1, 2, 3])
    ], remainder = "drop")),
    # Table-oriented feature engineering
    #("pca", PCA(n_components = 3)),
    # Final model
    ("classifier", LogisticRegression())
])
pipeline.fit(iris_X, iris_y)
```

Serializing the pipeline in Joblib-flavoured `pickle` data format:

```python
import joblib

joblib.dump(pipeline, "pipeline.pkl")
```
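Before handing the pickle file to the converter, it may be worth confirming that it loads back into an equivalent pipeline in the same Python environment. A minimal sketch, assuming NumPy is available (it is a Scikit-Learn dependency); `pickle_roundtrips` is a hypothetical helper, and the pipeline here is a simplified stand-in for the one above (plain NumPy arrays, no `ColumnTransformer`).

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def pickle_roundtrips(pipeline, X, path):
    """Dump `pipeline` with joblib, load it back, and compare predicted probabilities."""
    joblib.dump(pipeline, path)
    restored = joblib.load(path)
    return bool(np.allclose(pipeline.predict_proba(X), restored.predict_proba(X)))


# Simplified stand-in for the pipeline above
X, y = load_iris(return_X_y = True)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression())
]).fit(X, y)  # Pipeline.fit returns the fitted pipeline itself

# Use a temporary directory so the check leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    roundtrip_ok = pickle_roundtrips(pipeline, X, os.path.join(tmp_dir, "pipeline.pkl"))

print("round-trip OK" if roundtrip_ok else "round-trip MISMATCH")
```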

Please see the test script file [main.py](https://github.com/jpmml/jpmml-sklearn/blob/master/pmml-sklearn/src/test/resources/main.py) for more classification (binary and multi-class) and regression workflows.

### The JPMML-SkLearn side of operations

Converting a pickle file to a PMML file:

```bash
java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar --pkl-input pipeline.pkl --pmml-output pipeline.pmml
```
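When the conversion is one step of a larger Python workflow, the same command can be launched with `subprocess`. A sketch, assuming `java` is on the `PATH` and the executable uber-JAR was built at the path given in the Installation section; it uses only the `--pkl-input` and `--pmml-output` options shown above, and `build_command` and `convert` are hypothetical helper names.

```python
import shutil
import subprocess

# Uber-JAR path produced by `mvn clean install`; adjust to your checkout and version
EXECUTABLE_JAR = "pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar"


def build_command(pkl_input, pmml_output, jar = EXECUTABLE_JAR):
    """Argument list for the JPMML-SkLearn command-line application."""
    return ["java", "-jar", jar, "--pkl-input", pkl_input, "--pmml-output", pmml_output]


def convert(pkl_input, pmml_output, jar = EXECUTABLE_JAR):
    """Run the converter; raises CalledProcessError on a non-zero exit status."""
    if shutil.which("java") is None:
        raise RuntimeError("java executable not found on PATH")
    subprocess.run(build_command(pkl_input, pmml_output, jar), check = True)
```

For example, `convert("pipeline.pkl", "pipeline.pmml")` runs the same command as above. With `check = True`, a failed conversion surfaces as `subprocess.CalledProcessError` instead of passing silently.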

Getting help:

```bash
java -jar pmml-sklearn-example/target/pmml-sklearn-example-executable-1.9-SNAPSHOT.jar --help
```
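The resulting PMML file is plain XML, so the feature information it records (field names, operational types and data types) can be inspected with the Python standard library. This sketch reads the `DataField` elements of the PMML document; it ignores the XML namespace because the namespace URI differs between PMML schema versions, and `local_name` and `data_fields` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the "{namespace}" prefix that ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def data_fields(pmml_source):
    """Return (name, optype, dataType) for every DataField in a PMML document.

    `pmml_source` may be a file name or a file-like object.
    """
    root = ET.parse(pmml_source).getroot()
    return [
        (element.get("name"), element.get("optype"), element.get("dataType"))
        for element in root.iter()
        if local_name(element.tag) == "DataField"
    ]
```

For example, `data_fields("pipeline.pmml")` lists the input and target fields of the converted pipeline; which fields appear depends on the pipeline being converted.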

# Documentation #

Integrations:

* [Training Scikit-Learn GridSearchCV StatsModels pipelines](https://openscoring.io/blog/2023/10/15/sklearn_statsmodels_gridsearchcv_pipeline/)

* [Converting Scikit-Learn H2O.ai pipelines to PMML](https://openscoring.io/blog/2023/07/17/converting_sklearn_h2o_pipeline_pmml/)

* [Converting customized Scikit-Learn estimators to PMML](https://openscoring.io/blog/2023/05/03/converting_sklearn_subclass_pmml/)

* [Training Scikit-Learn StatsModels pipelines](https://openscoring.io/blog/2023/03/28/sklearn_statsmodels_pipeline/)

* [Upgrading Scikit-Learn XGBoost pipelines](https://openscoring.io/blog/2023/02/06/upgrading_sklearn_xgboost_pipeline_pmml/)

* [Training Python-based XGBoost accelerated failure time models](https://openscoring.io/blog/2023/01/28/python_xgboost_aft_pmml/)

* [Converting Scikit-Learn PyCaret 3 pipelines to PMML](https://openscoring.io/blog/2023/01/12/converting_sklearn_pycaret3_pipeline_pmml/)

* [Training Scikit-Learn H2O.ai pipelines](https://openscoring.io/blog/2022/11/11/sklearn_h2o_pipeline/)

* [One-hot encoding categorical features in Scikit-Learn XGBoost pipelines](https://openscoring.io/blog/2022/04/12/onehot_encoding_sklearn_xgboost_pipeline/)

* [Training Scikit-Learn TF(-IDF) plus XGBoost pipelines](https://openscoring.io/blog/2021/02/27/sklearn_tf_tfidf_xgboost_pipeline/)

* [Converting Scikit-Learn TF(-IDF) pipelines to PMML](https://openscoring.io/blog/2021/01/17/converting_sklearn_tf_tfidf_pipeline_pmml/)

* [Converting Scikit-Learn Imbalanced-Learn pipelines to PMML](https://openscoring.io/blog/2020/10/24/converting_sklearn_imblearn_pipeline_pmml/)

* [Converting logistic regression models to PMML](https://openscoring.io/blog/2020/01/19/converting_logistic_regression_pmml/#scikit-learn)

* [Stacking Scikit-Learn, LightGBM and XGBoost models](https://openscoring.io/blog/2020/01/02/stacking_sklearn_lightgbm_xgboost/)

* [Converting Scikit-Learn GridSearchCV pipelines to PMML](https://openscoring.io/blog/2019/12/25/converting_sklearn_gridsearchcv_pipeline_pmml/)

* [Converting Scikit-Learn TPOT pipelines to PMML](https://openscoring.io/blog/2019/06/10/converting_sklearn_tpot_pipeline_pmml/)

* [Converting Scikit-Learn LightGBM pipelines to PMML](https://openscoring.io/blog/2019/04/07/converting_sklearn_lightgbm_pipeline_pmml/)

Extensions:

* [Extending Scikit-Learn with feature cross-references](https://openscoring.io/blog/2023/11/25/sklearn_feature_cross_references/)

* [Extending Scikit-Learn with UDF expression transformer](https://openscoring.io/blog/2023/03/09/sklearn_udf_expression_transformer/)

* [Extending Scikit-Learn with CHAID models](https://openscoring.io/blog/2022/07/14/sklearn_chaid_pmml/)

* [Extending Scikit-Learn with prediction post-processing](https://openscoring.io/blog/2022/05/06/sklearn_prediction_postprocessing/)

* [Extending Scikit-Learn with outlier detector transformer](https://openscoring.io/blog/2021/07/16/sklearn_outlier_detector_transformer/)

* [Extending Scikit-Learn with date and datetime features](https://openscoring.io/blog/2020/03/08/sklearn_date_datetime_pmml/)

* [Extending Scikit-Learn with feature specifications](https://openscoring.io/blog/2020/02/23/sklearn_feature_specification_pmml/)

* [Extending Scikit-Learn with GBDT+LR ensemble models](https://openscoring.io/blog/2019/06/19/sklearn_gbdt_lr_ensemble/)

* [Extending Scikit-Learn with business rules model](https://openscoring.io/blog/2018/09/17/sklearn_business_rules/)

Miscellaneous:

* [Upgrading Scikit-Learn decision tree models](https://openscoring.io/blog/2023/12/29/upgrading_sklearn_decision_tree/)

* [Measuring the memory consumption of Scikit-Learn models](https://openscoring.io/blog/2022/11/09/measuring_memory_sklearn/)

* [Benchmarking Scikit-Learn against JPMML-Evaluator](https://openscoring.io/blog/2021/08/04/benchmarking_sklearn_jpmml_evaluator/)

* [Analyzing Scikit-Learn feature importances via PMML](https://openscoring.io/blog/2021/07/11/analyzing_sklearn_feature_importances_pmml/)

Archived:

* [Converting Scikit-Learn to PMML](https://www.slideshare.net/VilluRuusmann/converting-scikitlearn-to-pmml)

# License #

JPMML-SkLearn is licensed under the terms and conditions of the [GNU Affero General Public License, Version 3.0](https://www.gnu.org/licenses/agpl-3.0.html).

If you would like to use JPMML-SkLearn in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-SkLearn available under the terms and conditions of the [BSD 3-Clause License](https://opensource.org/licenses/BSD-3-Clause) instead.

# Additional information #

JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.

Interested in using [Java PMML API](https://github.com/jpmml) software in your company? Please contact [info@openscoring.io](mailto:info@openscoring.io).
