https://github.com/ruivieira/pmml-zoo
A REST server to generate test PMML models
- Host: GitHub
- URL: https://github.com/ruivieira/pmml-zoo
- Owner: ruivieira
- License: agpl-3.0
- Created: 2020-11-04T21:24:22.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2020-11-09T14:57:12.000Z (almost 5 years ago)
- Last Synced: 2024-12-06T19:38:33.911Z (10 months ago)
- Topics: model-generator, pmml, python, rest, server, simulation
- Language: Python
- Homepage:
- Size: 124 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# pmml-zoo 
A REST server to quickly create test PMML models.
## About
`pmml-zoo` allows you to quickly generate simulated data and create **test** PMML models by sending a JSON payload to a REST server, getting back the trained model.
*What it is not*
`pmml-zoo` **doesn't** aim to create production models; it is intended for creating models for smoke, integration and unit tests.
## Usage
The best way to get started is to use the `pmml-zoo` container image.
```shell
$ docker pull ruivieira/pmml-zoo:0.0.1
$ docker run -i --rm -p 5000:5000 ruivieira/pmml-zoo:0.0.1
```
Assuming the server is running locally, the full REST API will be available at [http://0.0.0.0:5000/apidocs](http://0.0.0.0:5000/apidocs).
As an example, let's create a linear regression.
We can send the following JSON payload to `0.0.0.0:5000/model/linear-regression`:
```shell
curl --request POST \
  --url http://0.0.0.0:5000/model/linear-regression \
  --header 'content-type: application/json' \
  --data '
{"data": {
  "size": 1000,
  "inputs": [
    {"name": "feature-1",
     "type": "continuous",
     "points": [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]]
    },
    {"name": "feature-2",
     "type": "discrete",
     "points": [[0, 3.9], [2, 4.3], [8, 2.9], [9, 7.0]]
    },
    {"name": "feature-3",
     "type": "categorical",
     "points": [["low", 2.0], ["medium", 4.0], ["high", 1.0]]
    }
  ],
  "outputs": [
    {"name": "feature-4",
     "type": "continuous",
     "points": [[1.0, 2.0], [4.0, 7.3], [7.0, 1.0], [100, 16.0]]
    }]
  }
}' \
  -o model.pmml
```
### What is happening?
Data is simulated by first creating an empirical distribution by interpolating the provided `points`.
This empirical distribution is then sampled `size` times, and the resulting samples become that variable's data.
Note that all variables are sampled independently, although spurious correlations may still appear in the generated data.
A complete explanation is provided in the documentation.
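As a rough illustration of the simulation step described above, the following Python sketch builds a distribution from `(value, weight)` points and samples from it. It is not the server's actual code: the linear interpolation of weights, the grid-based sampling, and the function names `sample_continuous` and `sample_categorical` are all assumptions made for this example. It expects at least two distinct values and non-negative weights.

```python
import bisect
import random


def sample_continuous(points, size, rng, grid=1000):
    """Draw `size` samples from a density built by linearly interpolating
    the weights between the given values (an assumed interpolation scheme)."""
    points = sorted(points)
    xs = [p[0] for p in points]
    ws = [p[1] for p in points]
    lo, hi = xs[0], xs[-1]
    step = (hi - lo) / grid
    # Evaluate the interpolated weight at the centre of each grid cell.
    density = []
    for i in range(grid):
        x = lo + (i + 0.5) * step
        j = min(bisect.bisect_right(xs, x) - 1, len(xs) - 2)
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        density.append(ws[j] + t * (ws[j + 1] - ws[j]))
    # Pick cells in proportion to their weight, then jitter inside the cell.
    cells = rng.choices(range(grid), weights=density, k=size)
    return [lo + (c + rng.random()) * step for c in cells]


def sample_categorical(points, size, rng):
    """Draw `size` labels with probability proportional to their weight."""
    labels = [p[0] for p in points]
    weights = [p[1] for p in points]
    return rng.choices(labels, weights=weights, k=size)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
feature_1 = sample_continuous(
    [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]], 1000, rng
)
feature_3 = sample_categorical(
    [["low", 2.0], ["medium", 4.0], ["high", 1.0]], 1000, rng
)
```

Each variable is sampled with its own call, which mirrors the independence between variables noted above. A `discrete` variable is not covered here; it could plausibly be handled by rounding or by weighting integer values directly, but that is left open.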
- `size` is the size of the dataset, i.e. the number of samples drawn for each variable.
- `points` is a list of data points used to construct the interpolation, in the format `(value, weight)`. For instance, the list `[(1.0, 2.0), (2.0, 4.0)]` means that the value `2.0` will be more frequent than `1.0`.
- `name` is the feature name, which will be used in the PMML model.
- `type` can be one of `continuous`, `discrete` or `categorical`.
- `inputs` and `outputs` share the same format and differ only in their role, as the names imply.

After sending the above payload, the server returns a response containing the PMML model's XML, which in this example is saved to the `model.pmml` file.
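The same request can also be issued from Python. The sketch below uses only the standard library and mirrors the `curl` example; the helper names `variable`, `build_request` and `fetch_model` are hypothetical, and the base URL assumes the server is running locally as in the container example.

```python
import json
import urllib.request

# Assumption: the server from the container example is listening here.
BASE_URL = "http://0.0.0.0:5000"


def variable(name, kind, points):
    """Describe one simulated variable in the payload format shown above."""
    return {"name": name, "type": kind, "points": points}


def build_request(payload, endpoint="/model/linear-regression"):
    """Prepare (but do not send) the POST request carrying the JSON payload."""
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_model(payload, path="model.pmml"):
    """Send the request and write the returned PMML XML to `path`."""
    with urllib.request.urlopen(build_request(payload)) as response:
        pmml = response.read()
    with open(path, "wb") as handle:
        handle.write(pmml)
    return path


payload = {
    "data": {
        "size": 1000,
        "inputs": [
            variable("feature-1", "continuous",
                     [[10.0, 20.0], [20.0, 40.0], [50, 35.0], [100, 16.0]]),
            variable("feature-2", "discrete",
                     [[0, 3.9], [2, 4.3], [8, 2.9], [9, 7.0]]),
            variable("feature-3", "categorical",
                     [["low", 2.0], ["medium", 4.0], ["high", 1.0]]),
        ],
        "outputs": [
            variable("feature-4", "continuous",
                     [[1.0, 2.0], [4.0, 7.3], [7.0, 1.0], [100, 16.0]]),
        ],
    }
}

# fetch_model(payload)  # uncomment once the server is running
```

Keeping `build_request` separate from `fetch_model` makes it possible to inspect the payload and headers in a test without needing a live server.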
## Supported models
For now, these are the supported models:
- Linear regression (`/model/linearregression`)
- Random forest classification (`/model/randomforest`)
## Contributing
Please use the [issues](https://github.com/ruivieira/pmml-zoo/issues) for any suggestions, feedback, PRs or bugs.
Thank you!