{"id":13488765,"url":"https://github.com/autodeployai/ai-serving","last_synced_at":"2026-02-21T05:06:17.755Z","repository":{"id":192763449,"uuid":"256194101","full_name":"autodeployai/ai-serving","owner":"autodeployai","description":"Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints","archived":false,"fork":false,"pushed_at":"2024-10-20T15:30:14.000Z","size":292,"stargazers_count":147,"open_issues_count":3,"forks_count":31,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-10-20T17:39:46.726Z","etag":null,"topics":["ai-serving","inference","inference-server","onnx","onnx-grpc","onnx-inference","onnx-models","onnx-realtime","onnx-rest","pmml","pmml-deployment","pmml-grpc","pmml-inference","pmml-model","pmml-realtime","pmml-rest"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/autodeployai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-16T11:24:35.000Z","updated_at":"2024-10-20T15:30:19.000Z","dependencies_parsed_at":"2024-01-16T09:02:57.582Z","dependency_job_id":"c681fbfb-a6fa-4bda-939c-768596239a7f","html_url":"https://github.com/autodeployai/ai-serving","commit_stats":null,"previous_names":["autodeployai/ai-serving"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autodeployai%2Fai-serving","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autodeployai%2Fai-serving/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autodeployai%2Fai-serving/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autodeployai%2Fai-serving/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/autodeployai","download_url":"https://codeload.github.com/autodeployai/ai-serving/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-serving","inference","inference-server","onnx","onnx-grpc","onnx-inference","onnx-models","onnx-realtime","onnx-rest","pmml","pmml-deployment","pmml-grpc","pmml-inference","pmml-model","pmml-realtime","pmml-rest"],"created_at":"2024-07-31T18:01:21.428Z","updated_at":"2026-02-21T05:06:17.748Z","avatar_url":"https://github.com/autodeployai.png","language":"Scala","funding_links":[],"categories":["Scala","人工智能"],"sub_categories":[],"readme":"# AI-Serving\n\nServing AI/ML models in the open standard formats [PMML](http://dmg.org/pmml/v4-4-1/GeneralStructure.html) and 
[ONNX](https://onnx.ai/) with both HTTP (REST API) and gRPC endpoints.\n\n## Table of Contents\n\n- [Features](#features)\n- [Prerequisites](#prerequisites) \n- [Installation](#installation)\n    - [Install using Docker](#install-using-docker)\n    - [Install from Source](#install-from-source)\n        - [Install SBT](#install-sbt)\n        - [Build Assembly](#build-assembly)\n        - [Start Server](#start-server)\n        - [Server Configurations](#server-configurations)\n- [PMML](#pmml)\n- [ONNX](#onnx)\n    - [Advanced ONNX Runtime Configuration](#advanced-onnx-runtime-configuration)\n        - [Build ONNX Runtime](#build-onnx-runtime)\n        - [Load ONNX Runtime](#load-onnx-runtime)\n- [How to deploy and undeploy a PMML or ONNX model in AI-Serving](#how-to-deploy-and-undeploy-a-pmml-or-onnx-model-in-ai-serving)\n  - [Manual Deployment](#manual-deployment)\n  - [Deployment via API](#deployment-via-api)\n- [REST APIs](#rest-apis)\n  - [v2 REST APIs](#v2-rest-apis)\n  - [v1 REST APIs](#v1-rest-apis)\n      - [Validate API](#1-validate-api)\n      - [Deploy API](#2-deploy-api)\n      - [Model Metadata API](#3-model-metadata-api)\n      - [Predict API](#4-predict-api)\n- [gRPC APIs](#grpc-apis)\n  - [gRPC predict v2 proto specification](#grpc-predict-v2-proto-specification)\n  - [gRPC predict v1 proto specification](#grpc-predict-v1-proto-specification)\n- [Examples](#examples)\n    - [Python](#python)\n      - [Serving PMML models with AI-Serving](examples/AIServingIrisXGBoostPMMLModel.ipynb) \n      - [Serving ONNX models with AI-Serving](examples/AIServingMnistOnnxModel.ipynb)\n    - [Curl](#curl)\n      - [Scoring PMML models](#scoring-pmml-models)\n      - [Scoring ONNX models](#scoring-onnx-models)\n- [Support](#support)\n- [License](#license)\n\n## Features\n\nAI-Serving is a flexible, high-performance inference system for machine learning and deep learning models, designed for production environments.\n\n- Out-of-the-box support for PMML and ONNX models.\n- HTTP and gRPC APIs for seamless integration.\n- Support for both v1 and v2 APIs, with the v2 API fully compatible with the [Open Inference Protocol](https://github.com/kserve/open-inference-protocol).\n- Automatic batching to improve GPU utilization and increase throughput, applicable only to ONNX models.\n- Configurable request timeouts for better control in production.\n- Automatic model warm-up before handling inference requests.\n\n## Prerequisites\n\n* Java \u003e= 1.8\n\n## Installation\n\n### Install using Docker\nThe easiest and most straightforward way of using AI-Serving is with [Docker images](dockerfiles).\n\n### Install from Source\n\n#### Install SBT\n\nThe [`sbt`](https://www.scala-sbt.org/) build system is required. 
After sbt is installed, clone this repository and change into its root directory:\n```bash\ncd REPO_ROOT\n```\n\n#### Build Assembly\n\nAI-Serving depends on [ONNX Runtime](https://github.com/microsoft/onnxruntime) to support ONNX models, and the default CPU accelerator (OpenMP) is used for ONNX Runtime:\n```bash\nsbt clean assembly\n```\n\nSet the property `-Dgpu=true` to use the GPU accelerator (CUDA) for [ONNX Runtime](https://github.com/microsoft/onnxruntime):\n```bash\nsbt -Dgpu=true clean assembly\n```\n\nAdd `set test in assembly := {}` to skip unit tests when generating an assembly jar:\n```bash\nsbt -Dgpu=true 'set test in assembly := {}' clean assembly\n```\n\nAn assembly jar will be generated:\n```bash\n$REPO_ROOT/target/scala-2.13/ai-serving-assembly-\u003cversion\u003e.jar or ai-serving-gpu-assembly-\u003cversion\u003e.jar\n```\n\n#### Start Server\n\nSimply run with the default CPU backend for ONNX models:\n```bash\njava -jar ai-serving-assembly-\u003cversion\u003e.jar\n```\n\nRun with the GPU backend for ONNX models:\n```bash\njava -Donnxruntime.backend=cuda -jar ai-serving-gpu-assembly-\u003cversion\u003e.jar\n```\nSeveral other execution backends are available: TensorRT, DirectML, Dnnl, and so on. See [Advanced ONNX Runtime Configuration](#advanced-onnx-runtime-configuration) for details.\n\n#### Server Configurations\n\nBy default, the HTTP endpoint listens on `http://0.0.0.0:9090/`, and the gRPC port is `9091`. You can customize the options defined in [`application.conf`](src/main/resources/application.conf). There are several ways to override the default options. One is to create a new config file based on the default one, then run:\n\n```bash\njava -Dconfig.file=/path/to/config-file -jar ai-serving-assembly-\u003cversion\u003e.jar\n```\n\nAnother is to override individual options with Java system properties, for example:\n\n```bash\njava -Dservice.http.port=9000 -Dservice.grpc.port=9001 -Dservice.home=\"/path/to/writable-directory\" -jar ai-serving-assembly-\u003cversion\u003e.jar\n```\n\nAI-Serving is designed to be persistent and recoverable, so it needs a place to save all served models. That location is specified by the `service.home` property, which defaults to `/opt/ai-serving`; the directory must be writable.\n\n## PMML\n\nAI-Serving integrates [PMML4S](https://github.com/autodeployai/pmml4s) to score PMML models. PMML4S is a lightweight, clean and efficient implementation based on the [PMML](http://dmg.org/) specification from 2.0 through the latest 4.4.1. \n\nPMML4S is written in pure Scala and runs on the JVM, so AI-Serving needs no special configuration to support PMML models.\n\n## ONNX\n\nAI-Serving leverages [ONNX Runtime](https://github.com/microsoft/onnxruntime) to make predictions for ONNX models. ONNX Runtime is a performance-focused inference engine for ONNX models.\n\nONNX Runtime is implemented in C/C++, and AI-Serving calls the ONNX Runtime Java API to support ONNX models. ONNX Runtime supports various architectures with multiple hardware accelerators; refer to the table on [aka.ms/onnxruntime](https://microsoft.github.io/onnxruntime/) for details.\n\nSince both CPU and GPU x64 binaries are distributed to [Maven Central](https://mvnrepository.com/artifact/com.microsoft.onnxruntime), AI-Serving depends on them directly. The CPU binaries are used by default, and you can switch to GPU via the `-Dgpu=true` property described above. 
\n\nIf you need other OS/architectures or hardware accelerators, refer to the next topic, [Advanced ONNX Runtime Configuration](#advanced-onnx-runtime-configuration); otherwise, skip it.\n\n### Advanced ONNX Runtime Configuration\n\n#### Build ONNX Runtime\n   \nYou need to build both native libraries, the `JNI shared library` and the `onnxruntime shared library`, for your OS/architecture.\n   \nRefer to the [onnxruntime build instructions](https://github.com/microsoft/onnxruntime/blob/master/BUILD.md); the `--build_java` option must always be specified.\n\n#### Load ONNX Runtime\n    \nSee [Build Output](https://github.com/microsoft/onnxruntime/tree/master/java#build-output), which lists all generated outputs. Explicitly specify the paths to the shared libraries when starting AI-Serving:\n  ```bash\n  java -Donnxruntime.backend=execution_backend -Donnxruntime.native.onnxruntime4j_jni.path=/path/to/onnxruntime4j_jni -Donnxruntime.native.onnxruntime.path=/path/to/onnxruntime -jar ai-serving-assembly-\u003cversion\u003e.jar\n  ```\n\n## How to deploy and undeploy a PMML or ONNX model in AI-Serving\n\nThere are two ways to deploy a PMML or ONNX model into AI-Serving: manual deployment and deployment via API.\n\n### Manual Deployment\n  To deploy a model manually, follow these steps:\n  1. Locate the directory specified by the `service.home` property.\n  2. Create a subdirectory named `models` if it does not already exist, and within it create a directory named after the model.\n  3. Inside the model name directory, create a subdirectory for the model version (for example, 1, 2, etc.).\n  4. Place the model file into the version directory:\n     - Use the fixed filename `model.pmml` for PMML models.\n     - Use the fixed filename `model.onnx` for ONNX models.\n  5. Customized inference behavior can be configured in `model.conf`, which can be placed either in the model directory or within a specific version directory. The following parameters are supported:\n```\nmax-batch-size=8\nmax-batch-delay-ms=10\nrequest-timeout-ms=20\nwarmup-count=100\nwarmup-data-type=zero   // either zero or random\n```\n  6. All models placed in this directory structure will be automatically loaded when AI-Serving starts.\n\nRefer to the following example for guidance:\n```shell\n/opt/ai-serving\n└── models\n    ├── iris\n    │   └── 1\n    │       └── model.pmml\n    ├── mnist\n    │   ├── 1\n    │   │   ├── model.conf\n    │   │   └── model.onnx\n    │   └── 2\n    │       └── model.onnx\n    └── mobilenet\n        ├── 1\n        │   └── model.onnx\n        └── model.conf\n```\nUndeploying a model is straightforward: simply remove the corresponding model directory or the specific version directory.\n\n### Deployment via API\nTo deploy and undeploy models programmatically, refer to the REST and gRPC APIs described below.\n\n## REST APIs\n\n### v2 REST APIs\nThese APIs are fully compatible with [Predict Protocol - Version 2](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md).\n\n### v1 REST APIs\n\n- [Validate API](#1-validate-api)\n- [Deploy API](#2-deploy-api)\n- [Model Metadata API](#3-model-metadata-api)\n- [Predict API](#4-predict-api)\n\nWhen an error occurs, all APIs will return a JSON object as follows:\n```\n{\n  \"error\": \u003can error message string\u003e\n}\n```
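\n\nBefore the per-endpoint details, here is a minimal Python sketch of a client helper that calls a v1 endpoint and surfaces this error object. It assumes the third-party `requests` package and a server at `http://localhost:9090`; the helper name `call_v1` is purely illustrative and not part of AI-Serving:\n```python\nimport requests\n\nBASE_URL = 'http://localhost:9090'  # assumed server address; adjust to your deployment\n\ndef call_v1(method, path, **kwargs):\n    # Send a request to a v1 endpoint and raise if the documented error object comes back.\n    response = requests.request(method, BASE_URL + path, **kwargs)\n    if response.headers.get('Content-Type', '').startswith('application/json'):\n        body = response.json()\n        if isinstance(body, dict) and 'error' in body:\n            raise RuntimeError('AI-Serving error: ' + body['error'])\n        return body\n    return response.content\n\n# For example, validate a local PMML file (see the Validate API below):\n# with open('single_iris_dectree.xml', 'rb') as f:\n#     print(call_v1('PUT', '/v1/validate', data=f, headers={'Content-Type': 'application/xml'}))\n```\n\n### 1. 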
Validate API\n\n#### URL:\n```\nPUT http://host:port/v1/validate\n```\n\n#### Request:\nThe request body is the model itself, with a `Content-Type` header that tells the server which format to handle:\n * `Content-Type: application/xml` or `Content-Type: text/xml`: the input is treated as a PMML model.\n * `Content-Type: application/octet-stream`, `application/vnd.google.protobuf` or `application/x-protobuf`: the input is processed as an ONNX model.\n \nIf no `Content-Type` is specified, the server can probe the content type from the input entity, but it could fail.\n\n#### Response:\nModel metadata includes the model type, input list, output list, and so on.\n```\n{\n  \"type\": \u003cmodel_type\u003e\n  \"inputs\": [\n    {\n      \"name\": \u003cinput_name1\u003e,\n      \"type\": \u003cfield_type\u003e,\n      ...\n    },\n    {\n      \"name\": \u003cinput_name2\u003e,\n      \"type\": \u003cfield_type\u003e,\n      ...\n    },\n    ...\n  ],\n  \"outputs\": [\n    {\n      \"name\": \u003coutput_name1\u003e,\n      \"type\": \u003cfield_type\u003e,\n      ...\n    },\n    {\n      \"name\": \u003coutput_name2\u003e,\n      \"type\": \u003cfield_type\u003e,\n      ...\n    },\n    ...\n  ],\n  ...\n}\n```\n\n### 2. Deploy API\n\n#### Deployment URL:\n```\nPUT http://host:port/v1/models/${MODEL_NAME}\n```\n\n#### Request:\nThe model with its `Content-Type`; see the validation request above for details.\n\n#### Response:\n```\n{\n  // The specified servable name\n  \"name\": \u003cmodel_name\u003e\n  \n  // The deployed version starts from 1\n  \"version\": \u003cmodel_version\u003e\n}\n```\n\n#### Undeployment URL:\n```\nDELETE http://host:port/v1/models/${MODEL_NAME}\n```\n\n#### Response:\n```\n204 No Content\n```\n\n### 3. Model Metadata API\n\n#### URL:\n```\nGET http://host:port/v1/models[/${MODEL_NAME}[/versions/${MODEL_VERSION}]]\n```\n* If `/${MODEL_NAME}/versions/${MODEL_VERSION}` is missing, all models are returned.\n* If `/versions/${MODEL_VERSION}` is missing, all versions of the specified model are returned.\n* Otherwise, only the specified version of the model is returned.\n\n#### Response:\n\n```\n// All models are returned from GET http://host:port/v1/models\n[\n  {\n    \"name\": \u003cmodel_name1\u003e,\n    \"versions\": [\n      {\n        \"version\": 1,\n        ...\n      },\n      {\n        \"version\": 2,\n        ...\n      },\n      ...\n    ]\n  },\n  {\n    \"name\": \u003cmodel_name2\u003e,\n    \"versions\": [\n      {\n        \"version\": 1,\n        ...\n      },\n      {\n        \"version\": 2,\n        ...\n      },\n      ...\n    ]\n  },\n  ...\n]\n```\n\n```\n// All versions of the specified model are returned from GET http://host:port/v1/models/${MODEL_NAME}\n{\n  \"name\": \u003cmodel_name\u003e,\n  \"versions\": [\n    {\n      \"version\": 1,\n      ...\n    },\n    {\n      \"version\": 2,\n      ...\n    },\n    ...\n  ]\n}\n```\n\n```\n// The specified version of the model is returned from GET http://host:port/v1/models/${MODEL_NAME}/versions/${MODEL_VERSION}\n{\n  \"name\": \u003cmodel_name\u003e,\n  \"versions\": [\n    {\n      \"version\": \u003cmodel_version\u003e,\n      ...\n    }\n  ]\n}\n```\n\n\n### 4. Predict API\n\n#### URL:\n```\nPOST http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]\n```\n/versions/${MODEL_VERSION} is optional. 
If omitted, the latest version is used.\n\n#### Request:\nThe request body can take two formats, JSON and binary; the HTTP header `Content-Type` tells the server which format to handle, and thus it is required for all requests.\n\n* `Content-Type: application/json`. The request body must be a JSON object formatted as follows:\n  ```\n  {\n    \"X\": {\n      \"records\": [\n        {\n          \"input_name1\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n          \"input_name2\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n          ...\n        },\n        {\n          \"input_name1\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n          \"input_name2\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n          ...\n        },\n        ...\n      ],\n      \"columns\": [ \"input_name1\", \"input_name2\", ... ],\n      \"data\": [ \n        [ \u003cvalue\u003e|\u003c(nested)list\u003e, \u003cvalue\u003e|\u003c(nested)list\u003e, ... ], \n        [ \u003cvalue\u003e|\u003c(nested)list\u003e, \u003cvalue\u003e|\u003c(nested)list\u003e, ... ], \n        ... \n      ]\n    },\n    // Output filters to specify which output fields need to be returned.\n    // If the list is empty, all outputs will be included.\n    \"filter\": \u003clist\u003e\n  }\n  ```\n  The `X` object can carry more than one record, and as shown above, two layouts are supported. You can use either one; the `split` layout is usually smaller for multiple records.\n  - `records` : list like [{column -\u003e value}, … , {column -\u003e value}]\n  - `split` : dict like {columns -\u003e [columns], data -\u003e [values]}\n\n* `Content-Type: application/octet-stream`, `application/vnd.google.protobuf` or `application/x-protobuf`. \n  \n  The request body must be the protobuf message [`PredictRequest`](https://github.com/autodeployai/ai-serving/blob/master/src/main/protobuf/ai-serving.proto#L152) of the gRPC API; besides the common scalar values, it can use the standard [`onnx.TensorProto`](https://github.com/autodeployai/ai-serving/blob/master/src/main/protobuf/onnx-ml.proto#L304) value directly.\n\n* Otherwise, an error will be returned.\n\n#### Response:\nThe server always returns the same format as your request.\n\n* For the JSON format, the response body is a JSON object formatted as follows:\n```\n{\n  \"result\": {\n    \"records\": [\n      {\n        \"output_name1\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n        \"output_name2\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n        ...\n      },\n      {\n        \"output_name1\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n        \"output_name2\": \u003cvalue\u003e|\u003c(nested)list\u003e,\n        ...\n      },\n      ...\n    ],\n    \"columns\": [ \"output_name1\", \"output_name2\", ... ],\n    \"data\": [ \n      [ \u003cvalue\u003e|\u003c(nested)list\u003e, \u003cvalue\u003e|\u003c(nested)list\u003e, ... ], \n      [ \u003cvalue\u003e|\u003c(nested)list\u003e, \u003cvalue\u003e|\u003c(nested)list\u003e, ... ], \n      ... \n    ]\n  }\n}\n```\n* For the binary format, the response body is the protobuf message [`PredictResponse`](https://github.com/autodeployai/ai-serving/blob/master/src/main/protobuf/ai-serving.proto#L164) of the gRPC API.
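\n\nTo tie the request and response formats together, here is a hedged Python sketch that scores a deployed model with the JSON payload in the `split` layout. It assumes the third-party `requests` package, a server at `http://localhost:9090`, and a deployed PMML model named `iris` with the Iris input fields used later in this README:\n```python\nimport requests\n\n# Assumed server address and model name; adjust to your deployment.\nURL = 'http://localhost:9090/v1/models/iris'\n\npayload = {\n    'X': {\n        'columns': ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],\n        'data': [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4]],\n    },\n    # Return only this output field; an empty list would include all outputs.\n    'filter': ['predicted_class'],\n}\n\n# requests sets Content-Type: application/json automatically for the json= argument.\nresponse = requests.post(URL, json=payload)\nresponse.raise_for_status()\nprint(response.json()['result'])\n```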
\n\nGenerally speaking, the binary payload has better latency, especially for the large tensor values of ONNX models, while the JSON format is easier for humans to read.\n\n## gRPC APIs\n\n### gRPC predict v2 proto specification\nRefer to the [grpc_predict_v2.proto](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/grpc_predict_v2.proto) specification.\n\n### gRPC predict v1 proto specification\nRefer to the protobuf file [`ai-serving.proto`](src/main/protobuf/ai-serving.proto) for details. You can generate a client and make gRPC calls to the server in your favorite language. To learn more about how to generate client code and call the server, refer to [the gRPC tutorials](https://grpc.io/docs/tutorials/).\n\n## Examples\n\n### Python\n- [Serving ONNX models with AI-Serving](examples/AIServingMnistOnnxModel.ipynb)\n- [Serving PMML models with AI-Serving](examples/AIServingIrisXGBoostPMMLModel.ipynb) \n\n### Curl\n- [Scoring PMML models](#scoring-pmml-models)\n- [Scoring ONNX models](#scoring-onnx-models)\n\n#### Start AI-Serving\nWe will use Docker to run AI-Serving:\n```bash\ndocker pull autodeployai/ai-serving:latest\ndocker run --rm -it -v /opt/ai-serving:/opt/ai-serving -p 9090:9090 -p 9091:9091 autodeployai/ai-serving:latest\n\n16:06:47.722 INFO  AI-Serving-akka.actor.default-dispatcher-5 akka.event.slf4j.Slf4jLogger             Slf4jLogger started\n16:06:47.833 INFO  main            ai.autodeploy.serving.AIServer$          Predicting thread pool size: 16\n16:06:48.305 INFO  main            a.autodeploy.serving.protobuf.GrpcServer AI-Serving grpc server started, listening on 9091\n16:06:49.433 INFO  main            ai.autodeploy.serving.AIServer$          AI-Serving http server started, listening on http://0.0.0.0:9090/\n```\n\n#### Make REST API calls to AI-Serving\nIn a different terminal, run `cd $REPO_ROOT/src/test/resources`, then use the `curl` tool to make REST API calls.\n\n#### Scoring PMML models\nWe will use the `Iris` decision tree model [single_iris_dectree.xml](http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml) to see the REST APIs in action.\n\n* ##### Validate the PMML model\n```bash\ncurl -X PUT --data-binary @single_iris_dectree.xml -H \"Content-Type: application/xml\"  http://localhost:9090/v1/validate\n{\n  \"algorithm\": \"TreeModel\",\n  \"app\": \"KNIME\",\n  \"appVersion\": \"2.8.0\",\n  \"copyright\": \"KNIME\",\n  \"formatVersion\": \"4.1\",\n  \"functionName\": \"classification\",\n  \"outputs\": [\n    {\n      \"name\": \"predicted_class\",\n      \"optype\": \"nominal\",\n      \"type\": \"string\"\n    },\n    {\n      \"name\": \"probability\",\n      \"optype\": \"continuous\",\n      \"type\": \"real\"\n    },\n    {\n      \"name\": \"probability_Iris-setosa\",\n      \"optype\": \"continuous\",\n      \"type\": \"real\"\n    },\n    {\n      \"name\": \"probability_Iris-versicolor\",\n      \"optype\": \"continuous\",\n      \"type\": \"real\"\n    },\n    {\n      \"name\": \"probability_Iris-virginica\",\n      \"optype\": \"continuous\",\n      \"type\": \"real\"\n    },\n    {\n      \"name\": \"node_id\",\n      \"optype\": \"nominal\",\n      \"type\": \"string\"\n    }\n  ],\n  \"inputs\": [\n    {\n      \"name\": \"sepal_length\",\n      \"optype\": \"continuous\",\n      \"type\": \"double\",\n      
\"values\": \"[4.3,7.9]\"\n    },\n    {\n      \"name\": \"sepal_width\",\n      \"optype\": \"continuous\",\n      \"type\": \"double\",\n      \"values\": \"[2.0,4.4]\"\n    },\n    {\n      \"name\": \"petal_length\",\n      \"optype\": \"continuous\",\n      \"type\": \"double\",\n      \"values\": \"[1.0,6.9]\"\n    },\n    {\n      \"name\": \"petal_width\",\n      \"optype\": \"continuous\",\n      \"type\": \"double\",\n      \"values\": \"[0.1,2.5]\"\n    }\n  ],\n  \"runtime\": \"PMML4S\",\n  \"serialization\": \"pmml\",\n  \"targets\": [\n    {\n      \"name\": \"class\",\n      \"optype\": \"nominal\",\n      \"type\": \"string\",\n      \"values\": \"Iris-setosa,Iris-versicolor,Iris-virginica\"\n    }\n  ],\n  \"type\": \"PMML\"\n}\n```\n\n* ##### Deploy the PMML model without configuration\n```bash\ncurl -X PUT --data-binary @single_iris_dectree.xml -H \"Content-Type: application/xml\"  http://localhost:9090/v1/models/iris\n{\n  \"name\": \"iris\",\n  \"version\": \"1\"\n}\n```\n\n* ##### Deploy the PMML model with specified configurations\n```bash\ncurl -X PUT -F \"model=@single_iris_dectree.xml\" -F \"config=@conf.json\"   http://localhost:9090/v1/models/iris\n{\n  \"name\": \"iris\",\n  \"version\": \"2\"\n}\n```\n\n* ##### Get metadata of the PMML model\n```bash\ncurl -X GET http://localhost:9090/v1/models/iris\n{\n  \"createdAt\": \"2020-04-23T06:29:32\",\n  \"id\": \"56fa4917-e904-4364-9c15-4c87d84ec2c4\",\n  \"latestVersion\": 1,\n  \"name\": \"iris\",\n  \"updateAt\": \"2020-04-23T06:29:32\",\n  \"versions\": [\n    {\n      \"algorithm\": \"TreeModel\",\n      \"app\": \"KNIME\",\n      \"appVersion\": \"2.8.0\",\n      \"copyright\": \"KNIME\",\n      \"createdAt\": \"2020-04-23T06:29:32\",\n      \"formatVersion\": \"4.1\",\n      \"functionName\": \"classification\",\n      \"hash\": \"fc44c33123836be368d3f24829360020\",\n      \"outputs\": [\n        {\n          \"name\": \"predicted_class\",\n          \"optype\": \"nominal\",\n          \"type\": \"string\"\n        },\n        {\n          \"name\": \"probability\",\n          \"optype\": \"continuous\",\n          \"type\": \"real\"\n        },\n        {\n          \"name\": \"probability_Iris-setosa\",\n          \"optype\": \"continuous\",\n          \"type\": \"real\"\n        },\n        {\n          \"name\": \"probability_Iris-versicolor\",\n          \"optype\": \"continuous\",\n          \"type\": \"real\"\n        },\n        {\n          \"name\": \"probability_Iris-virginica\",\n          \"optype\": \"continuous\",\n          \"type\": \"real\"\n        },\n        {\n          \"name\": \"node_id\",\n          \"optype\": \"nominal\",\n          \"type\": \"string\"\n        }\n      ],\n      \"inputs\": [\n        {\n          \"name\": \"sepal_length\",\n          \"optype\": \"continuous\",\n          \"type\": \"double\",\n          \"values\": \"[4.3,7.9]\"\n        },\n        {\n          \"name\": \"sepal_width\",\n          \"optype\": \"continuous\",\n          \"type\": \"double\",\n          \"values\": \"[2.0,4.4]\"\n        },\n        {\n          \"name\": \"petal_length\",\n          \"optype\": \"continuous\",\n          \"type\": \"double\",\n          \"values\": \"[1.0,6.9]\"\n        },\n        {\n          \"name\": \"petal_width\",\n          \"optype\": \"continuous\",\n          \"type\": \"double\",\n          \"values\": \"[0.1,2.5]\"\n        }\n      ],\n      \"runtime\": \"PMML4S\",\n      \"serialization\": \"pmml\",\n      \"size\": 3497,\n      \"targets\": 
[\n        {\n          \"name\": \"class\",\n          \"optype\": \"nominal\",\n          \"type\": \"string\",\n          \"values\": \"Iris-setosa,Iris-versicolor,Iris-virginica\"\n        }\n      ],\n      \"type\": \"PMML\",\n      \"version\": 1\n    }\n  ]\n}\n```\n\n* ##### Predict the PMML model using the JSON payload in `records`\n```bash\ncurl -X POST -d '{\"X\": [{\"sepal_length\": 5.1, \"sepal_width\": 3.5, \"petal_length\": 1.4, \"petal_width\": 0.2}]}' -H \"Content-Type: application/json\"  http://localhost:9090/v1/models/iris\n{\n  \"result\": [\n    {\n      \"node_id\": \"1\",\n      \"probability_Iris-setosa\": 1.0,\n      \"predicted_class\": \"Iris-setosa\",\n      \"probability_Iris-virginica\": 0.0,\n      \"probability_Iris-versicolor\": 0.0,\n      \"probability\": 1.0\n    }\n  ]\n}\n```\n\n* ##### Predict the PMML model using the JSON payload in `split` with filters\n```bash\ncurl -X POST -d '{\"X\": {\"columns\": [\"sepal_length\", \"sepal_width\", \"petal_length\", \"petal_width\"],\"data\":[[5.1, 3.5, 1.4, 0.2], [7, 3.2, 4.7, 1.4]]}, \"filter\": [\"predicted_class\"]}' -H \"Content-Type: application/json\"  http://localhost:9090/v1/models/iris\n{\n  \"result\": {\n    \"columns\": [\n      \"predicted_class\"\n    ],\n    \"data\": [\n      [\n        \"Iris-setosa\"\n      ],\n      [\n        \"Iris-versicolor\"\n      ]\n    ]\n  }\n}\n```\n\n#### Scoring ONNX models\nWe will use the pre-trained [MNIST Handwritten Digit Recognition ONNX Model](https://github.com/onnx/models/tree/master/vision/classification/mnist) to see REST APIs in action. \n\n* ##### Validate the ONNX model\n```bash\ncurl -X PUT --data-binary @mnist.onnx -H \"Content-Type: application/octet-stream\"  http://localhost:9090/v1/validate\n{\n  \"outputs\": [\n    {\n      \"name\": \"Plus214_Output_0\",\n      \"shape\": [\n        1,\n        10\n      ],\n      \"type\": \"tensor(float)\"\n    }\n  ],\n  \"inputs\": [\n    {\n      \"name\": \"Input3\",\n      \"shape\": [\n        1,\n        1,\n        28,\n        28\n      ],\n      \"type\": \"tensor(float)\"\n    }\n  ],\n  \"runtime\": \"ONNX Runtime\",\n  \"serialization\": \"onnx\",\n  \"type\": \"ONNX\"\n}\n```\n\n* ##### Deploy the ONNX model\n```bash\ncurl -X PUT --data-binary @mnist.onnx -H \"Content-Type: application/octet-stream\"  http://localhost:9090/v1/models/mnist\n{\n  \"name\": \"mnist\",\n  \"version\": 1\n}\n```\n\n* ##### Get metadata of the ONNX model\n```bash\ncurl -X GET http://localhost:9090/v1/models/mnist\n{\n  \"createdAt\": \"2020-04-16T15:18:18\",\n  \"id\": \"850bf345-5c4c-4312-96c8-6ee715113961\",\n  \"latestVersion\": 1,\n  \"name\": \"mnist\",\n  \"updateAt\": \"2020-04-16T15:18:18\",\n  \"versions\": [\n    {\n      \"createdAt\": \"2020-04-16T15:18:18\",\n      \"hash\": \"104617a683b4e62469478e07e1518aaa\",\n      \"outputs\": [\n        {\n          \"name\": \"Plus214_Output_0\",\n          \"shape\": [\n            1,\n            10\n          ],\n          \"type\": \"tensor(float)\"\n        }\n      ],\n      \"inputs\": [\n        {\n          \"name\": \"Input3\",\n          \"shape\": [\n            1,\n            1,\n            28,\n            28\n          ],\n          \"type\": \"tensor(float)\"\n        }\n      ],\n      \"runtime\": \"ONNX Runtime\",\n      \"serialization\": \"onnx\",\n      \"size\": 26454,\n      \"type\": \"ONNX\",\n      \"version\": 1\n    }\n  ]\n}\n```\n\n* ##### Predict the ONNX model using the REST payload in `records`\n```bash\ncurl -X POST -d 
@mnist_request_0.json -H \"Content-Type: application/json\" http://localhost:9090/v1/models/mnist\n{\n  \"result\": [\n    {\n      \"Plus214_Output_0\": [\n        [\n          975.6703491210938,\n          -618.7241821289062,\n          6574.5654296875,\n          668.0283203125,\n          -917.2710571289062,\n          -1671.6361083984375,\n          -1952.7598876953125,\n          -61.54957580566406,\n          -777.1764526367188,\n          -1439.5316162109375\n        ]\n      ]\n    }\n  ]\n}\n```\n\n* ##### Predict the ONNX model using the binary payload\n```bash\ncurl -X POST --data-binary @mnist_request_0.pb -o response_0.pb -H \"Content-Type: application/octet-stream\" http://localhost:9090/v1/models/mnist\n```\n\nThe binary response saved to `response_0.pb` is in `protobuf` format, an instance of the `PredictResponse` message; you can read it with a client generated from `ai-serving.proto`.\n\nNote that the content type of a `predict` request must be specified explicitly and must be one of the four supported candidates. An incorrect request URL or body returns an HTTP error status.\n```bash\ncurl -i -X POST -d @mnist_request_0.json  http://localhost:9090/v1/models/mnist\nHTTP/1.1 400 Bad Request\nServer: akka-http/10.1.11\nDate: Sun, 19 Apr 2020 06:25:25 GMT\nConnection: close\nContent-Type: application/json\nContent-Length: 92\n\n{\"error\":\"Prediction request takes unknown content type: application/x-www-form-urlencoded\"}\n```\n\n## Support\nIf you have any questions about the _AI-Serving_ library, please open an issue on this repository.\n\nFeedback and contributions to the project, no matter what kind, are always very welcome. \n\n## License\n_AI-Serving_ is licensed under the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautodeployai%2Fai-serving","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fautodeployai%2Fai-serving","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautodeployai%2Fai-serving/lists"}