https://github.com/ramirezramiro/onnx-fastapi

Starter kit for ONNX models: predictable CPU performance, simple REST, fast to demo
computer-vision cpu docker fastapi inference onnx python

Host: GitHub
URL: https://github.com/ramirezramiro/onnx-fastapi
Owner: ramirezramiro
Created: 2025-09-26T12:18:43.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-10-01T12:30:27.000Z (9 months ago)
Last Synced: 2025-11-22T23:03:10.308Z (7 months ago)
Topics: computer-vision, cpu, docker, fastapi, inference, onnx, python
Language: Python
Homepage:
Size: 8.77 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

**Prerequisites**

WSL 2 (Windows Subsystem for Linux) → install from Microsoft Store:

https://apps.microsoft.com/detail/9p9tqf7mrm4r?hl=en-US&gl=TW

**CPU-optimized inference (ONNX Runtime)**

Model: MobileNetV3-Small (ImageNet)

Hardware: Colab CPU

Latency (avg): ~4.23 ms/img

Throughput: ~236.27 FPS

Export: PyTorch → ONNX (opset 13, dynamic batch), ORT graph optimizations ON
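The latency and throughput figures above are two views of the same measurement: at batch size 1, 1000 / 4.23 ms is about 236.4 images per second, which matches the reported ~236.27 FPS up to rounding. The following is a minimal sketch of how such numbers could be measured, not the script that produced them. The helper names `benchmark` and `fps_from_latency`, the warm-up count, and the iteration count are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 10, iters: int = 100) -> Dict[str, float]:
    """Time `run_once` and report average latency (ms) and throughput (calls/s)."""
    # Warm-up calls are excluded so one-time setup costs do not skew the average.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start

    avg_ms = elapsed / iters * 1000.0
    fps = iters / elapsed if elapsed > 0 else float("inf")
    return {"avg_ms": avg_ms, "fps": fps}


def fps_from_latency(avg_ms: float, batch_size: int = 1) -> float:
    """Throughput implied by an average per-call latency in milliseconds."""
    return batch_size * 1000.0 / avg_ms


if __name__ == "__main__":
    # Stand-in workload (assumption); swap in a real inference call,
    # e.g. a closure around an ONNX Runtime session's run() method.
    print(benchmark(lambda: sum(range(1000))))
    print(round(fps_from_latency(4.23), 1))
```

Results depend heavily on the CPU, thread settings, and batch size, so numbers from another machine should not be expected to match the Colab figures.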

# onnx-fastapi (CPU)

Minimal FastAPI service for ONNX inference on CPU (MobileNetV3-Small example).

## Build and Run (Docker)

```bash
docker build -t onnx-fastapi:cpu .
docker run --rm -p 8000:8000 onnx-fastapi:cpu
# open http://localhost:8000/docs
```

## API Health check

```bash
curl http://localhost:8000/health
# {"status":"ok"}
```

## Prediction

```bash
# Replace with your image path
curl -X POST "http://localhost:8000/predict" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@samples/cat.jpg"
```
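For callers that are applications rather than a shell, the same request can be sent from Python. This is a sketch using only the standard library. It assumes the endpoint accepts a single multipart field named `file`, as the cURL command does; the helper names `build_multipart` and `predict` are hypothetical and not part of the repository.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path: str, url: str = "http://localhost:8000/predict") -> dict:
    """POST an image to /predict and return the decoded JSON response."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the container from the Docker section to be running.
    print(predict("samples/cat.jpg"))
```

A third-party HTTP client would shorten this considerably; the standard library is used here only to keep the sketch dependency-free.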

## Example response

```json
{
  "top1": {"label": "tabby_cat", "prob": 0.87},
  "top5": [
    {"label": "tabby_cat", "prob": 0.87},
    {"label": "tiger_cat", "prob": 0.07},
    {"label": "Egyptian_cat", "prob": 0.03},
    {"label": "lynx", "prob": 0.02},
    {"label": "cougar", "prob": 0.01}
  ]
}
```
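A response of this shape can be derived from the model's raw output scores by applying a softmax and keeping the highest-probability classes. The sketch below illustrates that step in plain Python. Whether the service applies a softmax itself, how it rounds, and the helper names `softmax` and `build_response` are assumptions, not details confirmed by the repository.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(logits: Sequence[float], labels: Sequence[str], k: int = 5) -> Dict[str, object]:
    """Return the best class and the k best classes in the response format above."""
    probs = softmax(logits)
    # Indices of the k largest probabilities, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    topk = [{"label": labels[i], "prob": round(probs[i], 2)} for i in order]
    return {"top1": topk[0], "top5": topk}


if __name__ == "__main__":
    # Toy three-class example; a real ImageNet model emits 1000 scores.
    print(build_response([2.0, 1.0, 0.0], ["tabby_cat", "tiger_cat", "lynx"]))
```

In a real service the same computation would typically be done with NumPy on the array returned by ONNX Runtime; plain lists are used here to keep the example self-contained.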

## Architecture Diagram

```mermaid
flowchart LR
  A[Client / cURL / App] -->|HTTP: /predict| B[FastAPI Service]
  B -->|NumPy tensors| C[ONNX Runtime]
  C -->|CPU| D[(model.onnx)]
  subgraph DockerContainer["Docker Container"]
    B
    C
    D
  end
  E[(Host FS)]
  D -->|"bind-mount (optional)"| E
```

## Final takeaway 

FastAPI exposes /health and /predict.

ONNX Runtime (CPU) runs inference; no GPU is required.

Model can be baked into the image or bind-mounted at runtime for quick swaps.
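One way to support both the baked-in and the bind-mounted model is to let an environment variable override a default path at startup. The sketch below shows that pattern. The variable name `MODEL_PATH`, the default `model.onnx`, and the function `resolve_model_path` are hypothetical and may not match how the repository loads its model.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical default: the model file copied into the image at build time.
DEFAULT_MODEL_PATH = "model.onnx"


def resolve_model_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the model file: MODEL_PATH (assumed name) if set, else the baked-in default."""
    env = os.environ if env is None else env
    path = Path(env.get("MODEL_PATH", DEFAULT_MODEL_PATH))
    # Fail fast with a clear message rather than at the first request.
    if not path.is_file():
        raise FileNotFoundError(f"ONNX model not found at {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Loading model from {resolve_model_path()}")
    except FileNotFoundError as err:
        print(err)
```

Under these assumptions, swapping models would mean starting the container with a read-only volume (`-v`) that exposes the new file and `-e MODEL_PATH=...` pointing at it, with no image rebuild.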
