{"id":13468285,"url":"https://github.com/jpmml/sklearn2pmml","last_synced_at":"2025-12-29T22:31:20.649Z","repository":{"id":2254121,"uuid":"45906846","full_name":"jpmml/sklearn2pmml","owner":"jpmml","description":"Python library for converting Scikit-Learn pipelines to PMML","archived":false,"fork":false,"pushed_at":"2024-10-28T21:01:03.000Z","size":142172,"stargazers_count":686,"open_issues_count":14,"forks_count":113,"subscribers_count":25,"default_branch":"master","last_synced_at":"2024-10-29T22:56:08.377Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jpmml.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-10T11:12:55.000Z","updated_at":"2024-10-29T06:41:13.000Z","dependencies_parsed_at":"2023-07-05T18:02:03.642Z","dependency_job_id":"eea711c3-a71c-431f-9617-e11b7ded30e5","html_url":"https://github.com/jpmml/sklearn2pmml","commit_stats":{"total_commits":541,"total_committers":2,"mean_commits":270.5,"dds":0.001848428835489857,"last_synced_commit":"eb63c40d05876fa7f4c1ab8f500caff235458213"},"previous_names":[],"tags_count":221,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpmml%2Fsklearn2pmml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpmml%2Fsklearn2pmml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jpmml%2Fsklearn2pmml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/jpmml%2Fsklearn2pmml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jpmml","download_url":"https://codeload.github.com/jpmml/sklearn2pmml/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245597206,"owners_count":20641859,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T15:01:08.219Z","updated_at":"2025-12-29T22:31:20.644Z","avatar_url":"https://github.com/jpmml.png","language":"Python","funding_links":[],"categories":["Python","Software"],"sub_categories":["Serialising and transpiling models"],"readme":"SkLearn2PMML [![Build Status](https://github.com/jpmml/sklearn2pmml/workflows/pytest/badge.svg)](https://github.com/jpmml/sklearn2pmml/actions?query=workflow%3A%22pytest%22)\n============\n\nPython package for converting [Scikit-Learn](https://scikit-learn.org/) pipelines to PMML.\n\n# Features #\n\nThis package is a thin Python wrapper around the [JPMML-SkLearn](https://github.com/jpmml/jpmml-sklearn#features) library.\n\n# News and Updates #\n\nThe current version is **0.125.0** (26 December, 2025):\n\n```\npip install sklearn2pmml==0.125.0\n```\n\nSee the [NEWS.md](https://github.com/jpmml/sklearn2pmml/blob/master/NEWS.md#01250) file.\n\n# Prerequisites #\n\n* Java 11 or newer. 
The Java executable must be available on the system path.\n* Python 3.8 or newer.\n\n# Installation #\n\nInstalling a release version from PyPI:\n\n```\npip install sklearn2pmml\n```\n\nAlternatively, installing the latest snapshot version from GitHub:\n\n```\npip install --upgrade git+https://github.com/jpmml/sklearn2pmml.git\n```\n\n# Usage #\n\n## Command-line application ##\n\nThe `sklearn2pmml` module is executable.\nThe main application loads the estimator object from the Pickle file (`-i` or `--input`; supports `joblib`, `pickle` or `dill` variants), performs the conversion, and saves the result to a PMML file (`-o` or `--output`):\n\n```\npython -m sklearn2pmml --input pipeline.pkl --output pipeline.pmml\n```\n\nGetting help:\n\n```\npython -m sklearn2pmml --help\n```\n\nOn some platforms, the [Pip](https://pypi.org/project/pip/) package installer additionally makes the main application available as a top-level command:\n\n```\nsklearn2pmml --input pipeline.pkl --output pipeline.pmml\n```\n\n## Library ##\n\nA typical workflow can be summarized as follows:\n\n1. Create a `PMMLPipeline` object, and populate it with pipeline steps as usual. The `sklearn2pmml.pipeline.PMMLPipeline` class extends the `sklearn.pipeline.Pipeline` class with the following functionality:\n  * If the `PMMLPipeline.fit(X, y)` method is invoked with a `pandas.DataFrame` or `pandas.Series` object as the `X` argument, then its column names are used as feature names. Otherwise, feature names default to \"x1\", \"x2\", ..., \"x{number_of_features}\".\n  * If the `PMMLPipeline.fit(X, y)` method is invoked with a `pandas.Series` object as the `y` argument, then its name is used as the target name (for supervised models). Otherwise, the target name defaults to \"y\".\n2. Fit and validate the pipeline as usual.\n3. Optionally, compute and embed verification data into the `PMMLPipeline` object by invoking the `PMMLPipeline.verify(X)` method with a small but representative subset of the training data.\n4. 
Convert the `PMMLPipeline` object to a PMML file in the local filesystem by invoking the `sklearn2pmml.sklearn2pmml(estimator, pmml_path)` utility method.\n\nDeveloping a simple decision tree model for the classification of iris species:\n\n```python\nimport pandas\n\niris_df = pandas.read_csv(\"Iris.csv\")\n\niris_X = iris_df[iris_df.columns.difference([\"Species\"])]\niris_y = iris_df[\"Species\"]\n\nfrom sklearn.tree import DecisionTreeClassifier\nfrom sklearn2pmml.pipeline import PMMLPipeline\n\npipeline = PMMLPipeline([\n\t(\"classifier\", DecisionTreeClassifier())\n])\npipeline.fit(iris_X, iris_y)\n\nfrom sklearn2pmml import sklearn2pmml\n\nsklearn2pmml(pipeline, \"DecisionTreeIris.pmml\", with_repr = True)\n```\n\nDeveloping a more elaborate logistic regression model for the same task:\n\n```python\nimport pandas\n\niris_df = pandas.read_csv(\"Iris.csv\")\n\niris_X = iris_df[iris_df.columns.difference([\"Species\"])]\niris_y = iris_df[\"Species\"]\n\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn.decomposition import PCA\nfrom sklearn.feature_selection import SelectKBest\nfrom sklearn.impute import SimpleImputer\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn2pmml.decoration import ContinuousDomain\nfrom sklearn2pmml.pipeline import PMMLPipeline\n\npipeline = PMMLPipeline([\n\t(\"mapper\", DataFrameMapper([\n\t\t([\"Sepal.Length\", \"Sepal.Width\", \"Petal.Length\", \"Petal.Width\"], [ContinuousDomain(), SimpleImputer()])\n\t])),\n\t(\"pca\", PCA(n_components = 3)),\n\t(\"selector\", SelectKBest(k = 2)),\n\t(\"classifier\", LogisticRegression(multi_class = \"ovr\"))\n])\npipeline.fit(iris_X, iris_y)\npipeline.verify(iris_X.sample(n = 15))\n\nfrom sklearn2pmml import sklearn2pmml\n\nsklearn2pmml(pipeline, \"LogisticRegressionIris.pmml\", with_repr = True)\n```\n\n# Documentation #\n\nIntegrations:\n\n* [Training Scikit-Learn GridSearchCV StatsModels 
pipelines](https://openscoring.io/blog/2023/10/15/sklearn_statsmodels_gridsearchcv_pipeline/)\n* [Converting Scikit-Learn H2O.ai pipelines to PMML](https://openscoring.io/blog/2023/07/17/converting_sklearn_h2o_pipeline_pmml/)\n* [Converting customized Scikit-Learn estimators to PMML](https://openscoring.io/blog/2023/05/03/converting_sklearn_subclass_pmml/)\n* [Training Scikit-Learn StatsModels pipelines](https://openscoring.io/blog/2023/03/28/sklearn_statsmodels_pipeline/)\n* [Upgrading Scikit-Learn XGBoost pipelines](https://openscoring.io/blog/2023/02/06/upgrading_sklearn_xgboost_pipeline_pmml/)\n* [Training Python-based XGBoost accelerated failure time models](https://openscoring.io/blog/2023/01/28/python_xgboost_aft_pmml/)\n* [Converting Scikit-Learn PyCaret 3 pipelines to PMML](https://openscoring.io/blog/2023/01/12/converting_sklearn_pycaret3_pipeline_pmml/)\n* [Training Scikit-Learn H2O.ai pipelines](https://openscoring.io/blog/2022/11/11/sklearn_h2o_pipeline/)\n* [One-hot encoding categorical features in Scikit-Learn XGBoost pipelines](https://openscoring.io/blog/2022/04/12/onehot_encoding_sklearn_xgboost_pipeline/)\n* [Training Scikit-Learn TF(-IDF) plus XGBoost pipelines](https://openscoring.io/blog/2021/02/27/sklearn_tf_tfidf_xgboost_pipeline/)\n* [Converting Scikit-Learn TF(-IDF) pipelines to PMML](https://openscoring.io/blog/2021/01/17/converting_sklearn_tf_tfidf_pipeline_pmml/)\n* [Converting Scikit-Learn Imbalanced-Learn pipelines to PMML](https://openscoring.io/blog/2020/10/24/converting_sklearn_imblearn_pipeline_pmml/)\n* [Converting logistic regression models to PMML](https://openscoring.io/blog/2020/01/19/converting_logistic_regression_pmml/#scikit-learn)\n* [Stacking Scikit-Learn, LightGBM and XGBoost models](https://openscoring.io/blog/2020/01/02/stacking_sklearn_lightgbm_xgboost/)\n* [Converting Scikit-Learn GridSearchCV pipelines to PMML](https://openscoring.io/blog/2019/12/25/converting_sklearn_gridsearchcv_pipeline_pmml/)\n* [Converting 
Scikit-Learn TPOT pipelines to PMML](https://openscoring.io/blog/2019/06/10/converting_sklearn_tpot_pipeline_pmml/)\n* [Converting Scikit-Learn LightGBM pipelines to PMML](https://openscoring.io/blog/2019/04/07/converting_sklearn_lightgbm_pipeline_pmml/)\n\nExtensions:\n\n* [Extending Scikit-Learn with feature cross-references](https://openscoring.io/blog/2023/11/25/sklearn_feature_cross_references/)\n* [Extending Scikit-Learn with UDF expression transformer](https://openscoring.io/blog/2023/03/09/sklearn_udf_expression_transformer/)\n* [Extending Scikit-Learn with CHAID models](https://openscoring.io/blog/2022/07/14/sklearn_chaid_pmml/)\n* [Extending Scikit-Learn with prediction post-processing](https://openscoring.io/blog/2022/05/06/sklearn_prediction_postprocessing/)\n* [Extending Scikit-Learn with outlier detector transformer](https://openscoring.io/blog/2021/07/16/sklearn_outlier_detector_transformer/)\n* [Extending Scikit-Learn with date and datetime features](https://openscoring.io/blog/2020/03/08/sklearn_date_datetime_pmml/)\n* [Extending Scikit-Learn with feature specifications](https://openscoring.io/blog/2020/02/23/sklearn_feature_specification_pmml/)\n* [Extending Scikit-Learn with GBDT+LR ensemble models](https://openscoring.io/blog/2019/06/19/sklearn_gbdt_lr_ensemble/)\n* [Extending Scikit-Learn with business rules model](https://openscoring.io/blog/2018/09/17/sklearn_business_rules/)\n\nMiscellaneous:\n\n* [Upgrading Scikit-Learn decision tree models](https://openscoring.io/blog/2023/12/29/upgrading_sklearn_decision_tree/)\n* [Measuring the memory consumption of Scikit-Learn models](https://openscoring.io/blog/2022/11/09/measuring_memory_sklearn/)\n* [Benchmarking Scikit-Learn against JPMML-Evaluator](https://openscoring.io/blog/2021/08/04/benchmarking_sklearn_jpmml_evaluator/)\n* [Analyzing Scikit-Learn feature importances via PMML](https://openscoring.io/blog/2021/07/11/analyzing_sklearn_feature_importances_pmml/)\n\nArchived:\n\n* [Converting 
Scikit-Learn to PMML](https://www.slideshare.net/VilluRuusmann/converting-scikitlearn-to-pmml)\n\n# De-installation #\n\nUninstalling:\n\n```\npip uninstall sklearn2pmml\n```\n\n# License #\n\nSkLearn2PMML is licensed under the terms and conditions of the [GNU Affero General Public License, Version 3.0](https://www.gnu.org/licenses/agpl-3.0.html).\n\nIf you would like to use SkLearn2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes SkLearn2PMML available under the terms and conditions of the [BSD 3-Clause License](https://opensource.org/licenses/BSD-3-Clause) instead.\n\n# Additional information #\n\nSkLearn2PMML is developed and maintained by Openscoring Ltd, Estonia.\n\nInterested in using [Java PMML API](https://github.com/jpmml) software in your company? Please contact [info@openscoring.io](mailto:info@openscoring.io)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpmml%2Fsklearn2pmml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjpmml%2Fsklearn2pmml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpmml%2Fsklearn2pmml/lists"}