{"id":15063987,"url":"https://github.com/softwareag/nyoka","last_synced_at":"2025-04-04T14:06:35.572Z","repository":{"id":40491835,"uuid":"145867505","full_name":"SoftwareAG/nyoka","owner":"SoftwareAG","description":"Nyoka is a Python library that helps to export ML models into PMML (PMML 4.4.1 Standard).","archived":false,"fork":false,"pushed_at":"2024-01-31T16:37:52.000Z","size":36040,"stargazers_count":185,"open_issues_count":6,"forks_count":44,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-03-28T13:08:20.960Z","etag":null,"topics":["lightgbm","machine-learning","nyoka","pmml","pmml-exporter","python","python-library","scikit-learn","statsmodels","xgboost"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SoftwareAG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-23T14:47:04.000Z","updated_at":"2025-02-20T10:43:43.000Z","dependencies_parsed_at":"2024-06-18T17:11:33.688Z","dependency_job_id":null,"html_url":"https://github.com/SoftwareAG/nyoka","commit_stats":{"total_commits":765,"total_committers":26,"mean_commits":"29.423076923076923","dds":0.3294117647058824,"last_synced_commit":"0e7fb677cf6668b0ac33a058a34ccb540dc8e4c1"},"previous_names":[],"tags_count":46,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftwareAG%2Fnyoka","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftwareAG%2Fnyoka/tags","releases_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftwareAG%2Fnyoka/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftwareAG%2Fnyoka/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SoftwareAG","download_url":"https://codeload.github.com/SoftwareAG/nyoka/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247190250,"owners_count":20898702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lightgbm","machine-learning","nyoka","pmml","pmml-exporter","python","python-library","scikit-learn","statsmodels","xgboost"],"created_at":"2024-09-25T00:09:54.206Z","updated_at":"2025-04-04T14:06:35.552Z","avatar_url":"https://github.com/SoftwareAG.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Nyoka\n\n[![Test Master Branch](https://github.com/SoftwareAG/nyoka/actions/workflows/test-master.yml/badge.svg?branch=master\u0026event=push)](https://github.com/SoftwareAG/nyoka/actions/workflows/test-master.yml)\n[![PyPI version](https://badge.fury.io/py/nyoka.svg)](https://pypi.org/project/nyoka/)\n[![codecov](https://codecov.io/gh/SoftwareAG/nyoka/branch/master/graph/badge.svg)](https://codecov.io/gh/SoftwareAG/nyoka)\n[![license](https://img.shields.io/github/license/softwareag/nyoka.svg)](https://github.com/softwareag/nyoka/blob/master/LICENSE)\n[![Python](https://img.shields.io/badge/python-3.6%2B-blue)](https://pypi.org/project/nyoka/)\n\u003cimg  
src=\"https://raw.githubusercontent.com/softwareag/nyoka/master/docs/nyoka_logo.PNG\"  alt=\"nyoka_logo\"  height=\"200\"  style=\"float:right\"/\u003e\n\n## Overview\n\nNyoka is a Python library for comprehensive support of the latest PMML (PMML 4.4) standard. Using Nyoka, Data Scientists can export a large number of Machine Learning models from popular Python frameworks into PMML by either using any of the numerous included ready-to-use exporters or by creating their own exporter for specialized/individual model types by simply calling a sequence of constructors.\n\nBesides about 500 Python classes which each cover a PMML tag and all constructor parameters/attributes as defined in the standard, Nyoka also provides an increasing number of convenience classes and functions that make the Data Scientist’s life easier for example by reading or writing any PMML file in one line of code from within your favorite Python environment.\n\nNyoka comes to you with the complete source code in Python, extended HTML documentation for the classes/functions, and a growing number of Jupyter Notebook tutorials that help you familiarize yourself with the way Nyoka supports you in using PMML as your favorite Data Science transport file format.\n\nRead the documentation at **[Nyoka Documentation](https://softwareag.github.io/nyoka/)**.\n\n## List of libraries and models supported by Nyoka :\n\n### Scikit-Learn (version \u003c= 1.3.0):\n\n#### Models -\n\n*  [`linear_model.LinearRegression`](https://scikit-learn.org/0.20/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)\n*  [`linear_model.LogisticRegression`](https://scikit-learn.org/0.20/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)\n*  [`linear_model.RidgeClassifier`](https://scikit-learn.org/0.20/modules/generated/sklearn.linear_model.RidgeClassifier.html#sklearn.linear_model.RidgeClassifier)\n*  
[`linear_model.SGDClassifier`](https://scikit-learn.org/0.20/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier)\n*  [`discriminant_analysis.LinearDiscriminantAnalysis`](https://scikit-learn.org/0.20/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html#sklearn.discriminant_analysis.LinearDiscriminantAnalysis)\n*  [`tree.DecisionTreeClassifier`](https://scikit-learn.org/0.20/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)\n*  [`tree.DecisionTreeRegressor`](https://scikit-learn.org/0.20/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor)\n*  [`svm.SVC`](https://scikit-learn.org/0.20/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)\n*  [`svm.SVR`](https://scikit-learn.org/0.20/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR)\n*  [`svm.LinearSVC`](https://scikit-learn.org/0.20/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC)\n*  [`svm.LinearSVR`](https://scikit-learn.org/0.20/modules/generated/sklearn.svm.LinearSVR.html#sklearn.svm.LinearSVR)\n*  [`svm.OneClassSVM`](https://scikit-learn.org/0.20/modules/generated/sklearn.svm.OneClassSVM.html#sklearn.svm.OneClassSVM)\n*  [`naive_bayes.GaussianNB`](https://scikit-learn.org/0.20/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB)\n*  [`ensemble.RandomForestRegressor`](https://scikit-learn.org/0.20/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor)\n*  [`ensemble.RandomForestClassifier`](https://scikit-learn.org/0.20/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)\n*  [`ensemble.GradientBoostingRegressor`](https://scikit-learn.org/0.20/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#sklearn.ensemble.GradientBoostingRegressor)\n*  
[`ensemble.GradientBoostingClassifier`](https://scikit-learn.org/0.20/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier)\n*  [`ensemble.IsolationForest`](https://scikit-learn.org/0.20/modules/generated/sklearn.ensemble.IsolationForest.html#sklearn.ensemble.IsolationForest)\n*  [`neural_network.MLPClassifier`](https://scikit-learn.org/0.20/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)\n*  [`neural_network.MLPRegressor`](https://scikit-learn.org/0.20/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor)\n*  [`neighbors.KNeighborsClassifier`](https://scikit-learn.org/0.20/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier)\n*  [`neighbors.KNeighborsRegressor` ](https://scikit-learn.org/0.20/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor)\n*  [`cluster.KMeans`](https://scikit-learn.org/0.20/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)\n\n\n#### Pre-Processing -\n\n\n*  [`preprocessing.StandardScaler`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)\n*  [`preprocessing.MinMaxScaler`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler)\n*  [`preprocessing.RobustScaler`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler)\n*  [`preprocessing.MaxAbsScaler`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler)\n*  [`preprocessing.LabelEncoder`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder)\n*  
[`preprocessing.Imputer`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.Imputer.html#sklearn.preprocessing.Imputer)\n*  [`preprocessing.Binarizer`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.Binarizer.html#sklearn.preprocessing.Binarizer)\n*  [`preprocessing.PolynomialFeatures`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures)\n*  [`preprocessing.LabelBinarizer`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.LabelBinarizer.html#sklearn.preprocessing.LabelBinarizer)\n*  [`preprocessing.OneHotEncoder`](https://scikit-learn.org/0.20/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder)\n*  [`feature_extraction.text.TfidfVectorizer`](https://scikit-learn.org/0.20/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer)\n*  [`feature_extraction.text.CountVectorizer`](https://scikit-learn.org/0.20/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#sklearn.feature_extraction.text.CountVectorizer)\n*  [`decomposition.PCA`](https://scikit-learn.org/0.20/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA)\n*  [`sklearn_pandas.CategoricalImputer`](https://github.com/scikit-learn-contrib/sklearn-pandas/blob/master/sklearn_pandas/categorical_imputer.py#L21) ( From _[sklearn_pandas](https://github.com/scikit-learn-contrib/sklearn-pandas)_ library )\n  \n\n### LightGBM:\n\n  \n*  [`LGBMClassifier`](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html)\n*  [`LGBMRegressor`](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html)\n\n\n### XGBoost (version \u003c= 1.7.6):\n\n\n*  [`XGBClassifier`](https://xgboost.readthedocs.io/en/release_1.5.0/python/python_api.html#module-xgboost.sklearn)\n*  
[`XGBRegressor`](https://xgboost.readthedocs.io/en/release_1.5.0/python/python_api.html#module-xgboost.sklearn)\n\n\n### Statsmodels (version \u003c= 0.14.0):\n\n\n*  [`tsa.arima_model.ARIMA`](https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/arima_model.py#L1026)\n*  [`tsa.arima.model.ARIMA`](https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/arima/model.py#L26) _(Extension of SARIMAX)_\n*  [`tsa.statespace.SARIMAX`](https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/statespace/sarimax.py#L31)\n*  [`tsa.statespace.VARMAX`](https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/statespace/varmax.py#L33)\n*  [`tsa.statespace.ExponentialSmoothing`](https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/statespace/exponential_smoothing.py#L31)\n  \n\n## Prerequisites\n\n* Python \u003e= 3.6\n\n## Dependencies\n\nnyoka requires:\n\n* lxml\n \n## Installation\n\nYou can install nyoka using: \n\n```\npip install --upgrade nyoka\n```\n## Usage\n\n\nNyoka contains separate exporters for each library, e.g., scikit-learn, keras, xgboost, etc.\n\n\n| library | exporter |\n|--|--|\n| **scikit-learn** | _skl_to_pmml_ |\n| **xgboost** | _xgboost_to_pmml_ |\n| **lightgbm** | _lgb_to_pmml_ |\n| **statsmodels** | _StatsmodelsToPmml \u0026 ExponentialSmoothingToPmml_ |\n\n#### Note - Keras is supported only up to the 4.4.0 release of Nyoka.\n\nThe main module of __Nyoka__ is `nyoka`. To use it for your model, import the specific exporter from nyoka as -\n\n```python\nfrom nyoka import skl_to_pmml, lgb_to_pmml # ... and so on\n```\n\n#### Note - A scikit-learn, xgboost or lightgbm model must be used inside sklearn's Pipeline.\n\nThe workflow is as follows (for example, a Decision Tree Classifier with StandardScaler) -\n\n* Create scikit-learn's `Pipeline` object and populate it with any pre-processing steps and the model object. 
\n\t```python\n\tfrom sklearn.pipeline import Pipeline\n\tfrom sklearn.tree import DecisionTreeClassifier\n\tfrom sklearn.preprocessing import StandardScaler\n\tpipeline_obj = Pipeline([\n\t\t\t(\"scaler\",StandardScaler()),\n\t\t\t(\"model\",DecisionTreeClassifier())\n\t])\n\t```\n\n* Call the `Pipeline.fit(X,y)` method to train the model.\n\t```python\n\tfrom sklearn.datasets import load_iris\n\tiris_data = load_iris()\n\tX = iris_data.data\n\ty = iris_data.target\n\tfeatures = iris_data.feature_names\n\tpipeline_obj.fit(X,y)\n\t```\n  \n* Use the specific exporter and pass the pipeline object, the feature names of the training dataset, the target name and the expected name of the PMML file to the exporter function. If the target name is not given, the default value `target` is used. Similarly, the PMML file name defaults to `from_sklearn.pmml`/`from_xgboost.pmml`/`from_lighgbm.pmml`.\n\t```python\n\tfrom nyoka import skl_to_pmml\n\tskl_to_pmml(pipeline=pipeline_obj,col_names=features,target_name=\"species\",pmml_f_name=\"decision_tree.pmml\")\n\t```\n\n\n#### For Statsmodels, a pipeline is not required. The fitted model needs to be passed to the exporter.\n\n```python\nimport pandas as pd\nfrom statsmodels.tsa.arima_model import ARIMA\nfrom nyoka import StatsmodelsToPmml\nsales_data = pd.read_csv('sales-cars.csv', index_col=0, parse_dates = True)\nmodel = ARIMA(sales_data, order = (4, 1, 2))\nresult = model.fit()\nStatsmodelsToPmml(result,\"Sales_cars_ARIMA.pmml\")\n```\n\n## Examples \n\nExample Jupyter notebooks can be found in [`nyoka/examples`](https://github.com/softwareag/nyoka/tree/master/examples). 
These files contain code to showcase how to use different exporters.\n\n* Exporting `scikit-learn` models into PMML\n\t* [SVM](https://github.com/softwareag/nyoka/blob/master/examples/skl/1_SVM.ipynb)\n\t* [KNeighbors](https://github.com/softwareag/nyoka/blob/master/examples/skl/2_K-NN_With_Scaling.ipynb)\n\t* [Random Forest](https://github.com/softwareag/nyoka/blob/master/examples/skl/3_RF_With_pre-processing.ipynb)\n\t* [Gradient Boosting](https://github.com/softwareag/nyoka/blob/master/examples/skl/4_GB_With_pre-processing.ipynb)\n\t* [Decision Tree](https://github.com/softwareag/nyoka/blob/master/examples/skl/5_Decision_Tree_With_Tf-Idf.ipynb)\n\t* [Isolation Forest](https://github.com/softwareag/nyoka/blob/master/examples/skl/6_IsolationForest_model_to_PMML.ipynb)\n\t* [OneClassSVM](https://github.com/softwareag/nyoka/blob/master/examples/skl/7_OneClassSVM_Model_to_PMML.ipynb)\n\t* [LinearSVC](https://github.com/softwareag/nyoka/blob/master/examples/skl/8_LinearSVC_with_TfidfVectorizer.ipynb)\n\n* Exporting `XGBoost` model into PMML\n\t* [XGBoost 1](https://github.com/softwareag/nyoka/blob/master/examples/xgboost/1_xgboost.ipynb)\n\t* [XGBoost 2](https://github.com/softwareag/nyoka/blob/master/examples/xgboost/2_xgboost_With_Scaling.ipynb)\n\t* [XGBoost 3](https://github.com/softwareag/nyoka/blob/master/examples/xgboost/3_xgboost_With_PreProcess%20.ipynb)\n\n* Exporting `LightGBM` model into PMML\n\t* [LightGBM 1](https://github.com/softwareag/nyoka/blob/master/examples/lgbm/1_lgbm.ipynb)\n\t* [LightGBM 2](https://github.com/softwareag/nyoka/blob/master/examples/lgbm/2_lgbm_With_Scaling.ipynb)\n\t* [LightGBM 3](https://github.com/softwareag/nyoka/blob/master/examples/lgbm/3_lgbm_With_PreProcess%20.ipynb)\n\n* Exporting `statsmodels` model into PMML\n\t* [Non-Seasonal ARIMA](https://github.com/softwareag/nyoka/blob/master/examples/statsmodels/arima/Non-Seasonal%20ARIMA.ipynb)\n\t* [Seasonal 
ARIMA](https://github.com/softwareag/nyoka/blob/master/examples/statsmodels/arima/Seasonal%20ARIMA.ipynb)\n\t* [Vector ARMA (for multi-variate time series)](https://github.com/softwareag/nyoka/blob/master/examples/statsmodels/arima/VARMAX.ipynb)\n\t* [Exponential Smoothing](https://github.com/softwareag/nyoka/blob/master/examples/statsmodels/exponential_smoothing/exponential_smoothing.ipynb)\n  \n## Nyoka Submodules\n\nNyoka contains one submodule called `preprocessing`. This module contains preprocessing classes implemented by Nyoka. Currently there is only one preprocessing class, which is `Lag`.\n\n#### What is Lag? When to use it?\n\n\n\u003eLag is a preprocessing class implemented by Nyoka. When used inside scikit-learn's pipeline, it applies an `aggregation` function to the given features of the dataset by combining the previous `value` records. It takes two arguments: `aggregation` and `value`.\n\n\u003e\n\n\u003e The valid `aggregation` functions are -\n\u003e \"min\", \"max\", \"sum\", \"avg\", \"median\", \"product\" and \"stddev\".\n\n\nTo use __Lag__ -\n\n* Import it from nyoka -\n  ```python\n\tfrom nyoka.preprocessing import Lag\n  ```\n* Create an instance of Lag - \n  ```python\n\tlag_obj = Lag(aggregation=\"sum\", value=5)\n\t'''\n\tThis takes the previous 5 values and applies `sum`. 
When used inside a pipeline, this will be applied to all the columns.\n\tIf used inside DataFrameMapper, then it will be applied to only those columns which are inside DataFrameMapper.\n\t'''\n  ```\n* Use this object inside scikit-learn's pipeline to train.\n  ```python\n\tfrom sklearn.pipeline import Pipeline\n\tfrom sklearn.tree import DecisionTreeClassifier\n\tfrom nyoka.preprocessing import Lag\n\tpipeline_obj = Pipeline([\n\t\t(\"lag\",Lag(aggregation=\"sum\",value=5)),\n\t\t(\"model\",DecisionTreeClassifier())\n\t])\n  ```\n\n## Uninstallation\n\n```\npip uninstall nyoka\n```\n\n## Support\n\nYou can ask questions at:\n\n*  [Stack Overflow](https://stackoverflow.com) by tagging your questions with #pmml, #nyoka\n* You can also post bug reports in [GitHub issues](https://github.com/softwareag/nyoka/issues)\n\n-----\n\nPlease note that this project is released with a [Contributor Code of\nConduct](https://github.com/SoftwareAG/nyoka/blob/master/.github/CODE_OF_CONDUCT.md).\nBy contributing to this project, you agree to abide by its terms.\n\nThese tools are provided as-is and without warranty or support. They do\nnot constitute part of the Software AG product suite. Users are free to\nuse, fork and modify them, subject to the license agreement. While\nSoftware AG welcomes contributions, we cannot guarantee to include every\ncontribution in the master project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoftwareag%2Fnyoka","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoftwareag%2Fnyoka","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoftwareag%2Fnyoka/lists"}