# Serving Runtime

Exposes a serialized machine learning model through an HTTP API, written in Java.

![TUs & TIs](https://github.com/ovh/serving-runtime/workflows/TUs%20&%20TIs/badge.svg?branch=master) [![Maintenance](https://img.shields.io/maintenance/yes/2020.svg)]() [![Chat on gitter](https://img.shields.io/gitter/room/ovh/ai.svg)](https://gitter.im/ovh/ai)

**This project is under active development**

## Description

The purpose of this project is to expose a generic HTTP API from a serialized machine learning model.

Supported serialized model formats are:
* [ONNX][ONNX] `1.5`
* TensorFlow `<=1.15` SavedModel or HDF5
* [HuggingFace Tokenizer](https://github.com/huggingface/tokenizers)

## Prerequisites

* Maven for compiling the project
* `Java 11` for running the project

The `HDF5` serialization format is supported through a conversion into the `SavedModel` format.
That conversion relies on the following dependencies:

* Python `3.7`
* TensorFlow `<=1.15` (`pip install tensorflow`)

For the HuggingFace tokenizer:
* Cargo (Rust stable)

### HDF5 support (Optional)
<aside class="notice">
<p>If you use the API from the docker image this step is not necessary as it will be built within the image.</p>
</aside>

The TensorFlow module requires support for HDF5 files through the creation of an executable `h5_converter`, which exports the model from an HDF5 file to a TensorFlow SavedModel (`.pb`).

To generate the converter simply use the `initialize_tensorflow` goal of the `Makefile`:
```bash
make initialize_tensorflow
```
The generated executable can be found at: `evaluator-tensorflow/h5_converter/dist/h5_converter`

### HuggingFace (Optional)

To build the Java binding use the `initialize_huggingface` goal of the `Makefile`:
```bash
make initialize_huggingface
```

### Torch (Optional)

To install libtorch use the `initialize_torch` goal of the `Makefile`:
```bash
make initialize_torch
```

[Convert PyTorch models and more](evaluator-torch/README.md)

## Build & Launch the project locally

Several profiles are available depending on the support you require for the built project:
- `full`, which includes both TensorFlow and ONNX, requires the [ONNX support](#onnx-support-optional), [HDF5 support](#hdf5-support-optional) and [Torch support](#torch-optional).
- `tensorflow`, which only includes TensorFlow, requires the [HDF5 support](#hdf5-support-optional).
- `onnx`, which only includes ONNX, requires the [ONNX support](#onnx-support-optional).
- `torch`, which only includes Torch, requires the [Torch support](#torch-optional).
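The profile list above maps each build profile to the optional supports it needs before building. A tiny hypothetical helper (for illustration only, not part of the project) makes that mapping explicit:

```python
# Hypothetical mapping of Maven build profiles to the optional supports
# they require, mirroring the profile list above. Not part of the project.
PROFILE_PREREQUISITES = {
    "full": ["onnx", "hdf5", "torch"],
    "tensorflow": ["hdf5"],
    "onnx": ["onnx"],
    "torch": ["torch"],
}


def required_supports(profile: str = "full") -> list[str]:
    """Return the optional supports to initialize before building `profile`.

    `full` is the default profile when MAVEN_PROFILE is unset.
    """
    return PROFILE_PREREQUISITES[profile]
```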

Set your desired profile:
```bash
export MAVEN_PROFILE=<your-profile>
```
If not specified, the default profile is `full`.

### Launch tests

```bash
make test MAVEN_PROFILE=$MAVEN_PROFILE
```

### Building the JAR

```bash
make build MAVEN_PROFILE=$MAVEN_PROFILE
```

The JAR can then be found in `api/target/api-*.jar`

### Launching the JAR

In the following command, replace `<jar-path>` with the path to your compiled JAR and `<model-path>` with the directory containing your serialized model.

```bash
java -Dfiles.path=<model-path> -jar <jar-path>
```

If you wish to load a model from an HDF5 file you will need to specify the path to the executable generated in [HDF5 support](#hdf5-support-optional).

```bash
java -Dfiles.path=<model-path> -Devaluator.tensorflow.h5_converter.path=<path-to-h5-converter> -jar <jar-path>
```

Inside `<model-path>`, the runtime will look for the first file ending in:
* `.onnx` for an ONNX model
* `.pb` for a TensorFlow SavedModel
* `.h5` for an HDF5 model

#### Available parameters

On the launch command you can also specify the following parameters:

* `-Dserver.port`: the port on which the HTTP server listens
* `-Dswagger.title`: the title displayed on the Swagger UI
* `-Dswagger.description`: the description displayed on the Swagger UI

## Build & Launch the project using docker

### Building the docker container

```bash
make docker-build-api MAVEN_PROFILE=$MAVEN_PROFILE
```

It will build the docker image `serving-runtime-$MAVEN_PROFILE:latest`

### Running the docker container

In the following command, replace `<model-path>` with the absolute path to the directory containing your serialized model.

```bash
docker run --rm -it -p 8080:8080 -v <model-path>:/deployments/models serving-runtime-$MAVEN_PROFILE:latest
```

## Using the API

By default the API runs on `http://localhost:8080`. Reaching this URL in your browser will display the Swagger UI describing the API for your model.

There are two routes available for each model:

* `/describe`: describes your model (its inputs, outputs and transformations)
* `/eval`: sends the expected inputs to the model and returns the resulting outputs

### Describe the model's inputs and outputs

Each serialized model takes a list of named tensors as **inputs** and also returns a list of named tensors as **outputs**.

A **named tensor** is an **N-dimensional array** with:

* An identifier name. Example: `my-tensor-name`
* A data type. Example: `integer`, `double` or `string`
* A shape. Example: `(5)` for a vector of length **5**, `(3, 2)` for a matrix whose first dimension is of size **3** and second dimension is of size **2**. Etc.

You can access the model's inputs and outputs by calling the HTTP `GET` method on the `/describe` path of the model.

#### Example of a describe query with curl

```bash
curl \
    -X GET \
    http://<your-model-url>/describe
```

#### Example of a describe response

You will get a **JSON** object describing the list of **input tensors** that are needed to query your model as well as the list of **output tensors** that will be returned.

```json
{
    "inputs": [
        {
            "name": "sepal_length",
            "type": "float",
            "shape": [-1]
        },
        {
            "name": "sepal_width",
            "type": "float",
            "shape": [-1]
        },
        {
            "name": "petal_length",
            "type": "float",
            "shape": [-1]
        },
        {
            "name": "petal_width",
            "type": "float",
            "shape": [-1]
        }
    ],
    "outputs": [
        {
            "name": "output_label",
            "type": "long",
            "shape": [-1]
        },
        {
            "name": "output_probability",
            "type": "float",
            "shape": [-1, 2]
        }
    ]
}
```

In this example, the deployed model expects 4 tensors as inputs:

* `sepal_length` of shape `(-1)` (i.e. a vector of any size)
* `sepal_width` of shape `(-1)` (i.e. a vector of any size)
* `petal_length` of shape `(-1)` (i.e. a vector of any size)
* `petal_width` of shape `(-1)` (i.e. a vector of any size)

It will answer with 2 tensors as outputs:

* `output_label` of shape `(-1)` (i.e. a vector of any size)
* `output_probability` of shape `(-1, 2)` (i.e. a matrix whose first dimension is of any size and whose second dimension is of size 2)

### Query the model

Once you know what kind of **input tensors** are needed by the model, fill in a correct **body** on your **HTTP query** with your chosen representation of the tensors (see below) and send it to the model with a `POST` method on the `/eval` path.

Two headers are available for your query:

* The [Content-Type][Content Type Header] header indicating the [media type][Media Type] of the input tensor data contained in your body message.
* The (optional) [Accept][Accept Header] header indicating what kind of [media type][Media Type] you want to receive for the output tensors in the response body. If you don't provide one, the default `Accept` header is `application/json`.

### Supported Content-Type headers

* `application/json`: a JSON document whose **keys** are the **input tensor** names and whose **values** are the n-dimensional JSON arrays matching your tensors.

* `image/png`: a byte content representing a **png**-encoded image.
* `image/jpeg`: a byte content representing a **jpeg**-encoded image.

>
> `image/png` and `image/jpeg` are only available for models taking a single tensor as input. That tensor's shape should also be compatible with an image representation.
>

* `multipart/form-data`: a multipart body, each part of which is named after an **input tensor**.

>
> Each part (i.e. tensor) in the **multipart** should have its own **Content-Type**
>

### Supported Accept headers

* `application/json`: a JSON document whose **keys** are the **output tensor** names and whose **values** are the n-dimensional JSON arrays matching your tensors.

* `image/png`: a byte content representing a **png**-encoded image.
* `image/jpeg`: a byte content representing a **jpeg**-encoded image.

>
> `image/png` and `image/jpeg` are only available for models returning a single tensor as output. That tensor's shape should also be compatible with an image representation.
>

* `text/html`: an HTML document displaying the **output tensor** representations.
* `multipart/form-data`: a multipart body, each part of which is named after an **output tensor** and whose content is the tensor's JSON representation.

>
> If you want some of the output tensors in `multipart/form-data` and `text/html` responses to be interpreted as an image, you can specify it as a parameter in the header.
>
> **Example**: the header `text/html; tensor_1=image/png; tensor_2=image/png` returns the global response as HTML content. Inside the HTML page, `tensor_1` and `tensor_2` are displayed as **png** images.
>

### Tensors interpretable as images

For a tensor to be interpretable as raw image data, it should have a compatible shape in your exported model. The supported shapes are:

* `(x, y, z, 1)`: batch of **x** grayscale images of height **y** pixels and width **z** pixels
* `(x, 1, y, z)`: batch of **x** grayscale images of height **y** pixels and width **z** pixels
* `(x, y, z, 3)`: batch of **x** RGB images of height **y** pixels and width **z** pixels. The last dimension holds the `(red, green, blue)` components.
* `(x, 3, y, z)`: batch of **x** RGB images of height **y** pixels and width **z** pixels. The second dimension holds the `(red, green, blue)` components.
* `(y, z, 1)`: single grayscale image of height **y** pixels and width **z** pixels
* `(1, y, z)`: single grayscale image of height **y** pixels and width **z** pixels
* `(y, z, 3)`: single RGB image of height **y** pixels and width **z** pixels. The last dimension holds the `(red, green, blue)` components.
* `(3, y, z)`: single RGB image of height **y** pixels and width **z** pixels. The first dimension holds the `(red, green, blue)` components.

## Examples

### Example of a query with curl for a single prediction

In the following example, we want to receive a prediction from our model for the following item:

* `sepal_length`: 0.1
* `sepal_width`: 0.2
* `petal_length`: 0.3
* `petal_width`: 0.4

```bash
curl \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -X POST \
    -d '{
        "sepal_length": 0.1,
        "sepal_width": 0.2,
        "petal_length": 0.3,
        "petal_width": 0.4
    }' \
    http://<your-model-url>/eval
```

### Example of a response for a single prediction

* HTTP status code: `200`
* Header: `Content-Type: application/json`

```json
{
    "output_label": 0,
    "output_probability": [0.88, 0.12]
}
```

In this example, our model predicts the **output_label** for our **input item** to be `0` with the following probabilities:

* an 88% chance of being `0`
* a 12% chance of being `1`

### Example of a query with curl for several predictions in one call

In the following example, we want to receive predictions from our model for the two following items:

**First item**

* `sepal_length`: 0.1
* `sepal_width`: 0.2
* `petal_length`: 0.3
* `petal_width`: 0.4

**Second item**

* `sepal_length`: 0.2
* `sepal_width`: 0.3
* `petal_length`: 0.4
* `petal_width`: 0.5

**Query**

```bash
curl \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -X POST \
    -d '{
        "sepal_length": [0.1, 0.2],
        "sepal_width": [0.2, 0.3],
        "petal_length": [0.3, 0.4],
        "petal_width": [0.4, 0.5]
    }' \
    http://<your-model-url>/eval
```

### Example of a response for several predictions in one call

* HTTP status code: `200`
* Header: `Content-Type: application/json`

```json
{
    "output_label": [0, 1],
    "output_probability": [
        [0.88, 0.12],
        [0.01, 0.99]
    ]
}
```

In this example, our model predicts the **output_label** for our **first input item** to be `0` with the following probabilities:

* an 88% chance of being `0`
* a 12% chance of being `1`

It also predicts the **output_label** for our **second input item** to be `1` with the following probabilities:

* a 1% chance of being `0`
* a 99% chance of being `1`

# Related links

* Contribute: https://github.com/ovh/serving-runtime/blob/master/CONTRIBUTING.md
* Report bugs: https://github.com/ovh/serving-runtime/issues

# License

See https://github.com/ovh/serving-runtime/blob/master/LICENSE

[Content Type Header]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
[Accept Header]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept
[Media Type]: https://developer.mozilla.org/en-US/docs/Glossary/MIME_type
[ONNX]: https://onnx.ai/
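The `application/json` queries shown in the examples above can also be issued from Python. The sketch below uses only the standard library; `build_eval_payload` and `eval_model` are hypothetical helpers (not part of this project) that batch row-wise items into the column-wise JSON body `/eval` expects, and the model URL remains a placeholder:

```python
import json
import urllib.request


def build_eval_payload(items: list[dict]) -> dict:
    """Batch row-wise feature dicts into the column-wise body /eval expects.

    [{"sepal_length": 0.1, ...}, {"sepal_length": 0.2, ...}]
    becomes {"sepal_length": [0.1, 0.2], ...}.
    """
    payload: dict = {}
    for item in items:
        for name, value in item.items():
            payload.setdefault(name, []).append(value)
    return payload


def eval_model(base_url: str, items: list[dict]) -> dict:
    """POST the batched payload to <base_url>/eval and decode the JSON response."""
    body = json.dumps(build_eval_payload(items)).encode()
    request = urllib.request.Request(
        base_url + "/eval",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a model deployed locally, `eval_model("http://localhost:8080", items)` would return a dict of output tensors such as `output_label` and `output_probability`.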