{"id":18144854,"url":"https://github.com/iamdecode/sklearn-pmml-model","last_synced_at":"2025-04-04T09:09:59.410Z","repository":{"id":40063921,"uuid":"136648763","full_name":"iamDecode/sklearn-pmml-model","owner":"iamDecode","description":"A library to parse and convert PMML models into Scikit-learn estimators.","archived":false,"fork":false,"pushed_at":"2025-02-03T13:23:06.000Z","size":617,"stargazers_count":77,"open_issues_count":9,"forks_count":15,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-04T09:09:39.770Z","etag":null,"topics":["machine-learning","pmml","scikit-learn","sklearn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iamDecode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-08T17:38:21.000Z","updated_at":"2025-02-02T11:21:07.000Z","dependencies_parsed_at":"2023-01-25T16:00:07.169Z","dependency_job_id":"625481b3-6502-414b-b88c-6afd8c84cdb7","html_url":"https://github.com/iamDecode/sklearn-pmml-model","commit_stats":{"total_commits":222,"total_committers":5,"mean_commits":44.4,"dds":"0.018018018018018056","last_synced_commit":"62182ae751decbf38577e469b6ef7992f398350e"},"previous_names":[],"tags_count":31,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamDecode%2Fsklearn-pmml-model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamDecode%2Fsklearn-pmml-model/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamDecode%2Fsklearn-pmml-model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamDecode%2Fsklearn-pmml-model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iamDecode","download_url":"https://codeload.github.com/iamDecode/sklearn-pmml-model/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247149505,"owners_count":20891954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","pmml","scikit-learn","sklearn"],"created_at":"2024-11-01T20:06:28.292Z","updated_at":"2025-04-04T09:09:59.391Z","avatar_url":"https://github.com/iamDecode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"https://user-images.githubusercontent.com/1223300/41346080-c2c910a0-6f05-11e8-89e9-71a72bb9543f.png\" width=\"300\"\u003e\n\n# sklearn-pmml-model\n\n[![PyPI version](https://badge.fury.io/py/sklearn-pmml-model.svg)](https://badge.fury.io/py/sklearn-pmml-model)\n[![codecov](https://codecov.io/gh/iamDecode/sklearn-pmml-model/branch/master/graph/badge.svg?token=CGbbgziGwn)](https://codecov.io/gh/iamDecode/sklearn-pmml-model)\n[![CircleCI](https://circleci.com/gh/iamDecode/sklearn-pmml-model.svg?style=shield)](https://circleci.com/gh/iamDecode/sklearn-pmml-model)\n[![ReadTheDocs](https://readthedocs.org/projects/sklearn-pmml-model/badge/?version=latest\u0026style=flat)](https://sklearn-pmml-model.readthedocs.io/en/latest/)\n\nA library to effortlessly import 
models trained on different platforms and with programming languages into scikit-learn in Python. First export your model to [PMML](http://dmg.org/pmml/v4-3/GeneralStructure.html) (widely supported). Next, load the exported PMML file with this library, and use the class as any other scikit-learn estimator.\n\n\n## Installation\n\nThe easiest way is to use pip:\n\n```\n$ pip install sklearn-pmml-model\n```\n\n## Status\nThe library currently supports the following models:\n\n| Model                                                  | Classification | Regression | Categorical features |\n|--------------------------------------------------------|----------------|------------|----------------------|\n| [Decision Trees](sklearn_pmml_model/tree)              | ✅             | ✅         | ✅\u003csup\u003e1\u003c/sup\u003e        |\n| [Random Forests](sklearn_pmml_model/ensemble)          | ✅             | ✅         | ✅\u003csup\u003e1\u003c/sup\u003e        |\n| [Gradient Boosting](sklearn_pmml_model/ensemble)       | ✅             | ✅         | ✅\u003csup\u003e1\u003c/sup\u003e        |\n| [Linear Regression](sklearn_pmml_model/linear_model)   | ✅             | ✅         | ✅\u003csup\u003e3\u003c/sup\u003e        |\n| [Ridge](sklearn_pmml_model/linear_model)               | ✅\u003csup\u003e2\u003c/sup\u003e | ✅         | ✅\u003csup\u003e3\u003c/sup\u003e        |\n| [Lasso](sklearn_pmml_model/linear_model)               | ✅\u003csup\u003e2\u003c/sup\u003e | ✅         | ✅\u003csup\u003e3\u003c/sup\u003e        |\n| [ElasticNet](sklearn_pmml_model/linear_model)          | ✅\u003csup\u003e2\u003c/sup\u003e | ✅         | ✅\u003csup\u003e3\u003c/sup\u003e        |\n| [Gaussian Naive Bayes](sklearn_pmml_model/naive_bayes) | ✅             |            | ✅\u003csup\u003e3\u003c/sup\u003e        |\n| [Support Vector Machines](sklearn_pmml_model/svm)      | ✅             | ✅         | ✅\u003csup\u003e3\u003c/sup\u003e        |\n| [Nearest Neighbors](sklearn_pmml_model/neighbors)    
  | ✅             | ✅         |                      |\n| [Neural Networks](sklearn_pmml_model/neural_network)   | ✅             | ✅         |                      |\n\n\u003csub\u003e\u003csup\u003e1\u003c/sup\u003e Categorical feature support using slightly modified internals, based on [scikit-learn#12866](https://github.com/scikit-learn/scikit-learn/pull/12866).\u003c/sub\u003e\n\n\u003csub\u003e\u003csup\u003e2\u003c/sup\u003e These models differ only in training characteristics, the resulting model is of the same form. Classification is supported using `PMMLLogisticRegression` for regression models and `PMMLRidgeClassifier` for general regression models.\u003c/sub\u003e\n\n\u003csub\u003e\u003csup\u003e3\u003c/sup\u003e By one-hot encoding categorical features automatically.\u003c/sub\u003e\n  \n## Example\nA minimal working example (using [this PMML file](https://github.com/iamDecode/sklearn-pmml-model/blob/master/models/randomForest.pmml)) is shown below:\n\n```python\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nimport pandas as pd\nimport numpy as np\nfrom sklearn_pmml_model.ensemble import PMMLForestClassifier\nfrom sklearn_pmml_model.auto_detect import auto_detect_estimator\n\n# Prepare the data\niris = load_iris()\nX = pd.DataFrame(iris.data)\nX.columns = np.array(iris.feature_names)\ny = pd.Series(np.array(iris.target_names)[iris.target])\ny.name = \"Class\"\nXtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=123)\n\n# Specify the model type for the least overhead...\n#clf = PMMLForestClassifier(pmml=\"models/randomForest.pmml\")\n\n# ...or simply let the library auto-detect the model type\nclf = auto_detect_estimator(pmml=\"models/randomForest.pmml\")\n\n# Use the model as any other scikit-learn model\nclf.predict(Xte)\nclf.score(Xte, yte)\n```\n\nMore examples can be found in the subsequent packages: [tree](sklearn_pmml_model/tree), [ensemble](sklearn_pmml_model/ensemble), 
[linear_model](sklearn_pmml_model/linear_model), [naive_bayes](sklearn_pmml_model/naive_bayes), [svm](sklearn_pmml_model/svm), [neighbors](sklearn_pmml_model/neighbors) and [neural_network](sklearn_pmml_model/neural_network).\n\n## Benchmark\n\nDepending on the data set and model, `sklearn-pmml-model` is between 1 and 10 times faster than competing libraries, by leveraging the optimization and industry-tested robustness of `sklearn`. Source code for this benchmark can be found in the corresponding [jupyter notebook](benchmark.ipynb). \n\n\n### Running times (load + predict, in seconds)\n\n|               |                     | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |\n|---------------|---------------------|--------------|-------------|---------------|---------------|-------------------|\n| Wine          | `PyPMML`            | 0.013038     | 0.005674    | 0.005587      | 0.032734      | 0.034649          |\n|               | `sklearn-pmml-model`| 0.00404      | 0.004059    | 0.000964      | 0.030008      | 0.032949          |\n| Breast cancer | `PyPMML`            | 0.009838     | 0.01153     | 0.009367      | 0.058941      | 0.031196          |\n|               | `sklearn-pmml-model`| 0.010749     | 0.008481    | 0.001106      | 0.044021      | 0.013411          |\n\n### Improvement\n\n|               |                    | Linear model | Naive Bayes | Decision tree | Random Forest | Gradient boosting |\n|---------------|--------------------|--------------|-------------|---------------|---------------|-------------------|\n| Wine          | Improvement        | 3.23×        | 1.40×       | 5.80×         | 1.09×         | 1.05×             |\n| Breast cancer | Improvement        | 0.91×        | 1.36×       | **8.47×**     | 1.34×         | 2.33×             |\n\n*Benchmark ran on: 24 september 2024 17:19*\n\n## Development\n\n### Prerequisites\n\nTests can be run using Py.test. 
Grab a local copy of the source:\n\n```\n$ git clone https://github.com/iamDecode/sklearn-pmml-model\n$ cd sklearn-pmml-model\n```\n\ncreate a virtual environment and activate it:\n```\n$ python3 -m venv venv\n$ source venv/bin/activate\n```\n\nand install the dependencies:\n\n```\n$ pip install -r requirements.txt\n```\n\nThe final step is to build the Cython extensions:\n\n```\n$ python setup.py build_ext --inplace\n```\n\n### Testing\n\nYou can execute tests with py.test by running:\n```\n$ python setup.py pytest\n```\n\n## Contributing\n\nFeel free to make a contribution. Please read [CONTRIBUTING.md](CONTRIBUTING.md) for more details.\n\n## License\n\nThis project is licensed under the BSD 2-Clause License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamdecode%2Fsklearn-pmml-model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiamdecode%2Fsklearn-pmml-model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamdecode%2Fsklearn-pmml-model/lists"}