{"id":21300919,"url":"https://github.com/kibae/onnxruntime-server","last_synced_at":"2025-04-05T06:06:28.917Z","repository":{"id":192653660,"uuid":"687136184","full_name":"kibae/onnxruntime-server","owner":"kibae","description":"ONNX Runtime Server: The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.","archived":false,"fork":false,"pushed_at":"2025-03-11T01:25:13.000Z","size":971,"stargazers_count":153,"open_issues_count":9,"forks_count":11,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T05:09:43.458Z","etag":null,"topics":["ai","contributions-welcome","cuda","deep-learning","inference-server","machine-learning","nueral-networks","onnx","onnxruntime"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kibae.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-04T17:42:13.000Z","updated_at":"2025-03-26T02:42:16.000Z","dependencies_parsed_at":"2024-03-17T14:30:24.847Z","dependency_job_id":"31fb89f2-e425-41bd-b635-4c056b798bed","html_url":"https://github.com/kibae/onnxruntime-server","commit_stats":{"total_commits":110,"total_committers":1,"mean_commits":110.0,"dds":0.0,"last_synced_commit":"61741093d83c6fa34dc8cf797fef84f0875dcfe4"},"previous_names":["kibae/onnx-runtime-server","kibae/onnxruntime-server"],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kibae%2Fonnxruntime-server","tags_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/kibae%2Fonnxruntime-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kibae%2Fonnxruntime-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kibae%2Fonnxruntime-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kibae","download_url":"https://codeload.github.com/kibae/onnxruntime-server/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294536,"owners_count":20915340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","contributions-welcome","cuda","deep-learning","inference-server","machine-learning","nueral-networks","onnx","onnxruntime"],"created_at":"2024-11-21T15:42:17.758Z","updated_at":"2025-04-05T06:06:28.881Z","avatar_url":"https://github.com/kibae.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ONNX Runtime Server\n\n[![ONNX Runtime](https://img.shields.io/github/v/release/microsoft/onnxruntime?filter=v1.21.0\u0026label=ONNX%20Runtime)](https://github.com/microsoft/onnxruntime)\n[![CMake on Linux](https://github.com/kibae/onnxruntime-server/actions/workflows/cmake-linux.yml/badge.svg)](https://github.com/kibae/onnxruntime-server/actions/workflows/cmake-linux.yml)\n[![CMake on MacOS](https://github.com/kibae/onnxruntime-server/actions/workflows/cmake-macos.yml/badge.svg)](https://github.com/kibae/onnxruntime-server/actions/workflows/cmake-macos.yml)\n[![CMake on 
Windows](https://github.com/kibae/onnxruntime-server/actions/workflows/cmake-windows.yml/badge.svg)](https://github.com/kibae/onnxruntime-server/actions/workflows/cmake-windows.yml)\n[![CodeQL](https://github.com/kibae/onnxruntime-server/actions/workflows/codeql.yml/badge.svg)](https://github.com/kibae/onnxruntime-server/actions/workflows/codeql.yml)\n[![License](https://img.shields.io/github/license/kibae/onnxruntime-server)](https://github.com/kibae/onnxruntime-server/blob/main/LICENSE)\n\n- [ONNX: Open Neural Network Exchange](https://onnx.ai/)\n- **The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.**\n- ONNX Runtime Server aims to provide simple, high-performance ML inference and a good developer experience.\n    - If you have exported ML models trained in various environments as ONNX files, you can provide inference APIs\n      without writing additional code or\n      metadata. [Just place the ONNX files into the directory structure.](#run-the-server)\n    - For each ONNX session, you can choose to use CPU or CUDA.\n    - The server analyzes the inputs and outputs of ONNX models to provide type/shape information for your collaborators.\n    - Built-in Swagger API documentation makes it easy for collaborators to test ML models through the\n      API. 
([API example](https://kibae.github.io/onnxruntime-server/swagger/))\n    - [Ready-to-run Docker images.](#docker) No build required.\n\n----\n\n\u003c!-- TOC --\u003e\n\n- [Build ONNX Runtime Server](#build-onnx-runtime-server)\n    - [Requirements](#requirements)\n        - [Install ONNX Runtime](#install-onnx-runtime)\n        - [Install dependencies](#install-dependencies)\n    - [Compile and Install](#compile-and-install)\n- [Install via a package manager](#install-via-a-package-manager)\n- [Run the server](#run-the-server)\n- [Docker](#docker)\n- [API](#api)\n- [How to use](#how-to-use)\n\n----\n\n# Build ONNX Runtime Server\n\n## Requirements\n\n- [ONNX Runtime](https://onnxruntime.ai/)\n- [Boost](https://www.boost.org/)\n- [CMake](https://cmake.org/), pkg-config\n- CUDA(*optional, for Nvidia GPU support*)\n- OpenSSL(*optional, for HTTPS*)\n\n----\n\n## Install ONNX Runtime\n\n#### Linux\n\n- Use the `download-onnxruntime-linux.sh` script\n    - This script downloads the latest version of the binary and installs it to `/usr/local/onnxruntime`.\n    - Also, add `/usr/local/onnxruntime/lib` to `/etc/ld.so.conf.d/onnxruntime.conf` and run `ldconfig`.\n- Or manually download the binary from [ONNX Runtime Releases](https://github.com/microsoft/onnxruntime/releases).\n\n#### Mac OS\n\n```shell\nbrew install onnxruntime\n```\n\n----\n\n## Install dependencies\n\n#### Ubuntu/Debian\n\n```shell\nsudo apt install cmake pkg-config libboost-all-dev libssl-dev\n```\n\n##### (optional) CUDA support (CUDA 12.x, cuDNN 9.x)\n\n- Follow the instructions below to install the CUDA Toolkit and cuDNN.\n    - [CUDA Toolkit Installation Guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)\n    - [CUDA Download for Ubuntu](https://developer.nvidia.com/cuda-downloads?target_os=Linux\u0026target_arch=x86_64\u0026Distribution=Ubuntu\u0026target_version=22.04\u0026target_type=deb_network)\n\n```shell\nsudo apt install cuda-toolkit-12 libcudnn9-dev-cuda-12\n# optional, for 
Nvidia GPU support with Docker \nsudo apt install nvidia-container-toolkit \n```\n\n#### Mac OS\n\n```shell\nbrew install cmake boost openssl\n```\n\n----\n\n## Compile and Install\n\n```shell\ncmake -B build -S . -DCMAKE_BUILD_TYPE=Release\ncmake --build build --parallel\nsudo cmake --install build --prefix /usr/local/onnxruntime-server\n```\n\n----\n\n# Install via a package manager\n\n| OS         | Method | Command                     |\n|------------|--------|-----------------------------|\n| Arch Linux | AUR    | `yay -S onnxruntime-server` |\n\n----\n\n# Run the server\n\n- **You must enter the path option(`--model-dir`) where the models are located.**\n    - The onnx model files must be located in the following path:\n      `${model_dir}/${model_name}/${model_version}/model.onnx` or\n      `${model_dir}/${model_name}/${model_version}.onnx`\n\n| Files in `--model-dir`                                                   | Create session request body                         | Get/Execute session API URL path\u003cbr /\u003e(after created) |\n|--------------------------------------------------------------------------|-----------------------------------------------------|-------------------------------------------------------|\n| `model_name/model_version/model.onnx` or `model_name/model_version.onnx` | `{\"model\":\"model_name\", \"version\":\"model_version\"}` | `/api/sessions/model_name/model_version`              |\n| `sample/v1/model.onnx` or `sample/v1.onnx`                               | `{\"model\":\"sample\", \"version\":\"v1\"}`                | `/api/sessions/sample/v1`                             |\n| `sample/v2/model.onnx` or `sample/v2.onnx`                               | `{\"model\":\"sample\", \"version\":\"v2\"}`                | `/api/sessions/sample/v2`                             |\n| `other/20200101/model.onnx` or `other/20200101.onnx`                     | `{\"model\":\"other\", \"version\":\"20200101\"}`           | 
`/api/sessions/other/20200101`                        |\n\n- **You need to enable one of the following backends: TCP, HTTP, or HTTPS.**\n    - If you want to use TCP, you must specify the `--tcp-port` option.\n    - If you want to use HTTP, you must specify the `--http-port` option.\n    - If you want to use HTTPS, you must specify the `--https-port`, `--https-cert` and `--https-key` options.\n    - If you want to use Swagger, you must specify the `--swagger-url-path` option.\n- Use the `-h`, `--help` option to see a full list of options.\n- **All options can be set as environment variables.** This can be useful when operating in a container like Docker.\n    - Normally, command-line options are prioritized over environment variables, but if\n      the `ONNX_SERVER_CONFIG_PRIORITY=env` environment variable exists, environment variables have higher priority.\n      Within a Docker image, environment variables have higher priority.\n\n## Options\n\n| Option                    | Environment                         | Description                                                                                                                                                                                                                                                                                                                                     |\n|---------------------------|-------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `--workers`               | `ONNX_SERVER_WORKERS`               | Worker thread pool size.\u003cbr/\u003eDefault: `4`                                                                                                          
                                                                                                                                                                                             |\n| `--request-payload-limit` | `ONNX_SERVER_REQUEST_PAYLOAD_LIMIT` | HTTP/HTTPS request payload size limit.\u003cbr /\u003eDefault: `1024 * 1024 * 10`(10MB)                                                                                                                                                                                                                                                                    |\n| `--model-dir`             | `ONNX_SERVER_MODEL_DIR`             | Model directory path\u003cbr/\u003eThe onnx model files must be located in the following path:\u003cbr/\u003e`${model_dir}/${model_name}/${model_version}/model.onnx` or\u003cbr/\u003e`${model_dir}/${model_name}/${model_version}.onnx`\u003cbr/\u003eDefault: `models`                                                                                                               |\n| `--prepare-model`         | `ONNX_SERVER_PREPARE_MODEL`         | Pre-create some model sessions at server startup.\u003cbr/\u003e\u003cbr/\u003eFormat as a space-separated list of `model_name:model_version` or `model_name:model_version(session_options, ...)`.\u003cbr/\u003e\u003cbr/\u003eAvailable session_options are\u003cbr/\u003e- cuda=device_id`[ or true or false]`\u003cbr/\u003e\u003cbr/\u003eeg) `model1:v1 model2:v9`\u003cbr/\u003e`model1:v1(cuda=true) model2:v9(cuda=1)` |\n\n### Backend options\n\n| Option               | Environment                    | Description                                                                                                                                                                                     
|\n|----------------------|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `--tcp-port`         | `ONNX_SERVER_TCP_PORT`         | Enable the TCP backend and set the port number to use.                                                                                                                                          |\n| `--http-port`        | `ONNX_SERVER_HTTP_PORT`        | Enable the HTTP backend and set the port number to use.                                                                                                                                         |\n| `--https-port`       | `ONNX_SERVER_HTTPS_PORT`       | Enable the HTTPS backend and set the port number to use.                                                                                                                                        |\n| `--https-cert`       | `ONNX_SERVER_HTTPS_CERT`       | SSL certificate file path for HTTPS                                                                                                                                                             |\n| `--https-key`        | `ONNX_SERVER_HTTPS_KEY`        | SSL private key file path for HTTPS                                                                                                                                                             |\n| `--swagger-url-path` | `ONNX_SERVER_SWAGGER_URL_PATH` | Enable the Swagger API documentation for the HTTP/HTTPS backend.\u003cbr/\u003eThis value cannot start with \"/api/\" or \"/health\".\u003cbr /\u003eIf not specified, the Swagger documentation is not served.\u003cbr /\u003eeg) /swagger or /api-docs |\n\n### Log options\n\n| Option              | Environment                   | Description                                                                 
|\n|---------------------|-------------------------------|-----------------------------------------------------------------------------|\n| `--log-level`       | `ONNX_SERVER_LOG_LEVEL`       | Log level(debug, info, warn, error, fatal)                                  |\n| `--log-file`        | `ONNX_SERVER_LOG_FILE`        | Log file path.\u003cbr/\u003eIf not specified, logs will be printed to stdout.        |\n| `--access-log-file` | `ONNX_SERVER_ACCESS_LOG_FILE` | Access log file path.\u003cbr/\u003eIf not specified, logs will be printed to stdout. |\n\n----\n\n# Docker\n\n- Docker hub: [kibaes/onnxruntime-server](https://hub.docker.com/r/kibaes/onnxruntime-server)\n    - [\n      `1.21.0-linux-cuda12`](https://github.com/kibae/onnxruntime-server/blob/main/deploy/build-docker/linux-cuda12.dockerfile)\n      amd64(CUDA 12.x, cuDNN 9.x)\n    - [\n      `1.21.0-linux-cpu`](https://github.com/kibae/onnxruntime-server/blob/main/deploy/build-docker/linux-cpu.dockerfile)\n      amd64, arm64\n\n```shell\nDOCKER_IMAGE=kibaes/onnxruntime-server:1.21.0-linux-cuda12 # or kibaes/onnxruntime-server:1.21.0-linux-cpu\n\ndocker pull ${DOCKER_IMAGE}\n\n# simple http backend\ndocker run --name onnxruntime_server_container -d --rm --gpus all \\\n  -p 80:80 \\\n  -v \"/your_model_dir:/app/models\" \\\n  -v \"/your_log_dir:/app/logs\" \\\n  -e \"ONNX_SERVER_SWAGGER_URL_PATH=/api-docs\" \\\n  ${DOCKER_IMAGE}\n```\n\n- More information on using Docker images can be found here.\n    - https://hub.docker.com/r/kibaes/onnxruntime-server\n- [docker-compose.yml](https://github.com/kibae/onnxruntime-server/blob/main/deploy/build-docker/docker-compose.yaml)\n  example is available in the repository.\n\n----\n\n# API\n\n- [HTTP/HTTPS REST API](https://github.com/kibae/onnxruntime-server/wiki/REST-API(HTTP-HTTPS))\n    - API documentation (Swagger) is built in. If you want the server to serve swagger, add\n      the `--swagger-url-path=/swagger/` option at launch. 
This must be used with the `--http-port` or `--https-port`\n      option.\n      ```shell\n      ./onnxruntime_server --model-dir=YOUR_MODEL_DIR --http-port=8080 --swagger-url-path=/api-docs/\n      ```\n        - After running the server as above, you can access the Swagger UI\n          at `http://localhost:8080/api-docs/`.\n    - \u003cpicture\u003e\u003cimg src=\"https://cdn.simpleicons.org/swagger/green\" height=\"16\" align=\"center\" /\u003e\u003c/picture\u003e [Swagger Sample](https://kibae.github.io/onnxruntime-server/swagger/)\n- [TCP API](https://github.com/kibae/onnxruntime-server/wiki/TCP-API)\n\n----\n\n# How to use\n\n- A few things have been left out to help you get a rough idea of the usage flow.\n\n## Simple usage examples\n\n#### Example of creating ONNX sessions at server startup\n\n```mermaid\n%%{init: {\n    'sequence': {'noteAlign': 'left', 'mirrorActors': true}\n}}%%\nsequenceDiagram\n    actor A as Administrator\n    box rgb(0, 0, 0, 0.1) \"ONNX Runtime Server\"\n        participant SD as Disk\n        participant SP as Process\n    end\n    actor C as Client\n    Note right of A: You have 3 models to serve.\n    A -\u003e\u003e SD: copy model files to disk.\u003cbr /\u003e\"/var/models/model_A/v1/model.onnx\"\u003cbr /\u003e\"/var/models/model_A/v2/model.onnx\"\u003cbr /\u003e\"/var/models/model_B/20201101/model.onnx\"\n    A -\u003e\u003e SP: Start server with --prepare-model option\n    activate SP\n    Note right of A: onnxruntime_server\u003cbr /\u003e--http-port=8080\u003cbr /\u003e--model-dir=/var/models\u003cbr /\u003e--prepare-model=\"model_A:v1(cuda=0) model_A:v2(cuda=0)\"\n    SP --\u003e\u003e SD: Load model\n    Note over SD, SP: Load model from\u003cbr /\u003e\"/var/models/model_A/v1/model.onnx\"\n    SD --\u003e\u003e SP: Model binary\n    activate SP\n    SP --\u003e\u003e SP: Create\u003cbr /\u003eonnxruntime\u003cbr /\u003esession\n    deactivate SP\n    deactivate SP\n    rect rgb(100, 100, 100, 
0.3)\n        Note over SD, C: Execute Session\n        C -\u003e\u003e SP: Execute session request\n        activate SP\n        Note over SP, C: POST /api/sessions/model_A/v1\u003cbr /\u003e{\u003cbr /\u003e\"x\": [[1], [2], [3]],\u003cbr /\u003e\"y\": [[2], [3], [4]],\u003cbr /\u003e\"z\": [[3], [4], [5]]\u003cbr /\u003e}\n        activate SP\n        SP --\u003e\u003e SP: Execute\u003cbr /\u003eonnxruntime\u003cbr /\u003esession\n        deactivate SP\n        SP -\u003e\u003e C: Execute session response\n        deactivate SP\n        Note over SP, C: {\u003cbr /\u003e\"output\": [\u003cbr /\u003e[0.6492120623588562],\u003cbr /\u003e[0.7610487341880798],\u003cbr /\u003e[0.8728854656219482]\u003cbr /\u003e]\u003cbr /\u003e}\n    end\n```\n\n#### Example of the client creating and running ONNX sessions\n\n```mermaid\n%%{init: {\n    'sequence': {'noteAlign': 'left', 'mirrorActors': true}\n}}%%\nsequenceDiagram\n    actor A as Administrator\n    box rgb(0, 0, 0, 0.1) \"ONNX Runtime Server\"\n        participant SD as Disk\n        participant SP as Process\n    end\n    actor C as Client\n    Note right of A: You have 3 models to serve.\n    A -\u003e\u003e SD: copy model files to disk.\u003cbr /\u003e\"/var/models/model_A/v1/model.onnx\"\u003cbr /\u003e\"/var/models/model_A/v2/model.onnx\"\u003cbr /\u003e\"/var/models/model_B/20201101/model.onnx\"\n    A -\u003e\u003e SP: Start server\n    Note right of A: onnxruntime_server\u003cbr /\u003e--http-port=8080\u003cbr /\u003e--model-dir=/var/models\n    rect rgb(100, 100, 100, 0.3)\n        Note over SD, C: Create Session\n        C -\u003e\u003e SP: Create session request\n        activate SP\n        Note over SP, C: POST /api/sessions\u003cbr /\u003e{\"model\": \"model_A\", \"version\": \"v1\"}\n        SP --\u003e\u003e SD: Load model\n        Note over SD, SP: Load model from\u003cbr /\u003e\"/var/models/model_A/v1/model.onnx\"\n        SD --\u003e\u003e SP: Model binary\n        activate SP\n        SP 
--\u003e\u003e SP: Create\u003cbr /\u003eonnxruntime\u003cbr /\u003esession\n        deactivate SP\n        SP -\u003e\u003e C: Create session response\n        deactivate SP\n        Note over SP, C: {\u003cbr /\u003e\"model\": \"model_A\",\u003cbr /\u003e\"version\": \"v1\",\u003cbr /\u003e\"created_at\": 1694228106,\u003cbr /\u003e\"execution_count\": 0,\u003cbr /\u003e\"last_executed_at\": 0,\u003cbr /\u003e\"inputs\": {\u003cbr /\u003e\"x\": \"float32[-1,1]\",\u003cbr /\u003e\"y\": \"float32[-1,1]\",\u003cbr /\u003e\"z\": \"float32[-1,1]\"\u003cbr /\u003e},\u003cbr /\u003e\"outputs\": {\u003cbr /\u003e\"output\": \"float32[-1,1]\"\u003cbr /\u003e}\u003cbr /\u003e}\n        Note right of C: 👌 You can know the type and shape\u003cbr /\u003eof the input and output.\n    end\n    rect rgb(100, 100, 100, 0.3)\n        Note over SD, C: Execute Session\n        C -\u003e\u003e SP: Execute session request\n        activate SP\n        Note over SP, C: POST /api/sessions/model_A/v1\u003cbr /\u003e{\u003cbr /\u003e\"x\": [[1], [2], [3]],\u003cbr /\u003e\"y\": [[2], [3], [4]],\u003cbr /\u003e\"z\": [[3], [4], [5]]\u003cbr /\u003e}\n        activate SP\n        SP --\u003e\u003e SP: Execute\u003cbr /\u003eonnxruntime\u003cbr /\u003esession\n        deactivate SP\n        SP -\u003e\u003e C: Execute session response\n        deactivate SP\n        Note over SP, C: {\u003cbr /\u003e\"output\": [\u003cbr /\u003e[0.6492120623588562],\u003cbr /\u003e[0.7610487341880798],\u003cbr /\u003e[0.8728854656219482]\u003cbr /\u003e]\u003cbr /\u003e}\n    end\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkibae%2Fonnxruntime-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkibae%2Fonnxruntime-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkibae%2Fonnxruntime-server/lists"}