https://github.com/ruivieira/pmml-zoo

A REST server to generate test PMML models

Host: GitHub
URL: https://github.com/ruivieira/pmml-zoo
Owner: ruivieira
License: agpl-3.0
Created: 2020-11-04T21:24:22.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2020-11-09T14:57:12.000Z (almost 5 years ago)
Last Synced: 2024-12-06T19:38:33.911Z (10 months ago)
Topics: model-generator, pmml, python, rest, server, simulation
Language: Python
Homepage:
Size: 124 KB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


# pmml-zoo ![Tests](https://github.com/ruivieira/pmml-zoo/workflows/Tests/badge.svg)

![logo](docs/logo.png)

A REST server to quickly create test PMML models.

## About

`pmml-zoo` allows you to quickly generate simulated data and create **test** PMML models by sending a JSON payload to a REST server, getting back the trained model.

*What it is not*

`pmml-zoo` **doesn't** aim at creating production models; it is intended to create models for smoke, integration and unit tests.

## Usage

The best way to get started is to use the `pmml-zoo` container image.

```shell
$ docker pull ruivieira/pmml-zoo:0.0.1
$ docker run -i --rm -p 5000:5000 ruivieira/pmml-zoo:0.0.1
```

Assuming the server is running locally, the full REST API will be available at [http://0.0.0.0:5000/apidocs](http://0.0.0.0:5000/apidocs).

As an example, let's create a linear regression.

We can send the following JSON payload to `0.0.0.0:5000/model/linear-regression`:

```shell
curl --request POST \
  --url http://0.0.0.0:5000/model/linear-regression \
  --header 'content-type: application/json' \
  --data '
{"data": {
    "size": 1000,
    "inputs": [
        {"name": "feature-1",
         "type": "continuous",
         "points": [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]]
        },
        {"name": "feature-2",
         "type": "discrete",
         "points": [[0, 3.9], [2, 4.3], [8, 2.9], [9, 7.0]]
        },
        {"name": "feature-3",
         "type": "categorical",
         "points": [["low", 2.0], ["medium", 4.0], ["high", 1.0]]
        }
    ],
    "outputs": [
        {"name": "feature-4",
         "type": "continuous",
         "points": [[1.0, 2.0], [4.0, 7.3], [7.0, 1.0], [100, 16.0]]
        }
    ]
}
}' \
  -o model.pmml
```
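For test suites written in Python, the same request can be sketched with the standard library alone. This is only an illustration, not part of `pmml-zoo`: the helper names `build_request` and `fetch_model` are hypothetical, and the base URL and endpoint are simply copied from the `curl` example above.

```python
import json
import urllib.request

# Assumed base URL, copied from the curl example; adjust to your setup.
BASE_URL = "http://0.0.0.0:5000"


def build_request(payload, endpoint="/model/linear-regression", base_url=BASE_URL):
    """Build the POST request for a model endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_model(payload, out_path="model.pmml", **kwargs):
    """Send the payload and write the returned PMML to out_path.

    Requires a running pmml-zoo server; hypothetical helper.
    """
    with urllib.request.urlopen(build_request(payload, **kwargs)) as response:
        pmml = response.read()
    with open(out_path, "wb") as fh:
        fh.write(pmml)
    return out_path


# Same payload as in the curl example.
payload = {
    "data": {
        "size": 1000,
        "inputs": [
            {"name": "feature-1", "type": "continuous",
             "points": [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]]},
            {"name": "feature-2", "type": "discrete",
             "points": [[0, 3.9], [2, 4.3], [8, 2.9], [9, 7.0]]},
            {"name": "feature-3", "type": "categorical",
             "points": [["low", 2.0], ["medium", 4.0], ["high", 1.0]]},
        ],
        "outputs": [
            {"name": "feature-4", "type": "continuous",
             "points": [[1.0, 2.0], [4.0, 7.3], [7.0, 1.0], [100, 16.0]]},
        ],
    }
}

# Building the request needs no server; fetch_model(payload) would send it.
request = build_request(payload)
```

Keeping request construction separate from sending makes the payload easy to check in a unit test before any server is started.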

### What is happening?

Data is simulated by first creating an empirical distribution by interpolating the provided `points`.

This empirical distribution is then sampled `size` times, and the resulting samples become the data for that variable.

![plot](docs/plot.png)

An important note is that all variables are independent (although spurious correlation may occur).
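To make the idea concrete, here is a minimal sketch of one way such a simulation could work for a `continuous` variable. It is not the `pmml-zoo` implementation: it assumes linear interpolation between the `points`, approximates the distribution on a fixed grid, and uses hypothetical function names (`interpolate_weight`, `sample_continuous`). It also assumes that all values in `points` are distinct.

```python
import random


def interpolate_weight(points, x):
    """Linearly interpolate the weight at x from (value, weight) points."""
    pts = sorted(points)
    for (x0, w0), (x1, w1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return w0 + t * (w1 - w0)
    raise ValueError("x lies outside the range covered by the points")


def sample_continuous(points, size, grid_size=500, seed=None):
    """Draw `size` samples from a grid approximation of the interpolated weights."""
    rng = random.Random(seed)
    lo = min(v for v, _ in points)
    hi = max(v for v, _ in points)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    grid[-1] = hi  # guard against floating-point overshoot at the upper end
    weights = [interpolate_weight(points, x) for x in grid]
    # Each grid value is drawn with probability proportional to its weight.
    return rng.choices(grid, weights=weights, k=size)


points = [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]]
samples = sample_continuous(points, size=1000, seed=0)
```

Sampling each variable with its own call like this is what makes the variables independent: nothing in one draw depends on another.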

A complete explanation is provided in the documentation.

- `size` is the size of the dataset.

- `points` is a list of data points used to construct the interpolation, in the format `(value, weight)`. For instance, a list of `[(1.0, 2.0), (2.0, 4.0)]` means that value `2.0` will be more frequent than value `1.0`.

- `name` is the feature name, which will be used in the PMML model.

- `type` can be one of `continuous`, `discrete` or `categorical`

- `inputs` and `outputs` have the same format, with the obvious difference implied in the name.
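The effect of the weights is easiest to see with a `categorical` variable. The sketch below is an illustration only, assuming the weights act directly as relative probabilities with no interpolation between categories; `sample_categorical` is a hypothetical helper, not a `pmml-zoo` function.

```python
import random
from collections import Counter


def sample_categorical(points, size, seed=None):
    """Draw `size` labels with probability proportional to each weight."""
    rng = random.Random(seed)
    values = [value for value, _ in points]
    weights = [weight for _, weight in points]
    return rng.choices(values, weights=weights, k=size)


# Same points as "feature-3" in the example payload.
points = [["low", 2.0], ["medium", 4.0], ["high", 1.0]]
counts = Counter(sample_categorical(points, size=1000, seed=42))
```

Under this assumption, `medium` (weight `4.0`) should appear roughly twice as often as `low` (weight `2.0`) and four times as often as `high` (weight `1.0`).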

After sending the above payload, the server returns the PMML model as XML, which in this example is saved to the `model.pmml` file.
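A quick sanity check on the returned file is to parse it and list the fields declared in its `DataDictionary`, a standard PMML element. The following sketch uses only the standard library; `list_data_fields` is a hypothetical helper, and the embedded `SAMPLE` document is a hand-written stand-in, not actual `pmml-zoo` output.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}PMML' becomes 'PMML'."""
    return tag.rsplit("}", 1)[-1]


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) for every DataField in a PMML document."""
    root = ET.fromstring(pmml_text)
    if local_name(root.tag) != "PMML":
        raise ValueError("not a PMML document")
    return [
        (el.get("name"), el.get("optype"), el.get("dataType"))
        for el in root.iter()
        if local_name(el.tag) == "DataField"
    ]


# Hand-written stand-in document, used only to demonstrate the helper.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="feature-1" optype="continuous" dataType="double"/>
    <DataField name="feature-4" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""

fields = list_data_fields(SAMPLE)
```

For a real model, pass the bytes read from `model.pmml` to `list_data_fields` and compare the field names with the `name` values in the payload.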

## Supported models

For now, these are the supported models:

- Linear regression (`/model/linearregression`)

- Random forest classification (`/model/randomforest`)

## Contributing

Please use the [issues](https://github.com/ruivieira/pmml-zoo/issues) for any suggestions, feedback, PRs or bugs.

Thank you!
