https://github.com/ramirezramiro/onnx-fastapi
Starter kit for ONNX models: predictable CPU performance, simple REST, fast to demo
computer-vision cpu docker fastapi inference onnx python
- Host: GitHub
- URL: https://github.com/ramirezramiro/onnx-fastapi
- Owner: ramirezramiro
- Created: 2025-09-26T12:18:43.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-10-01T12:30:27.000Z (9 months ago)
- Last Synced: 2025-11-22T23:03:10.308Z (7 months ago)
- Topics: computer-vision, cpu, docker, fastapi, inference, onnx, python
- Language: Python
- Homepage:
- Size: 8.77 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
**Prerequisites**
WSL 2 (Windows Subsystem for Linux), needed on Windows hosts only. Install it from the Microsoft Store:
https://apps.microsoft.com/detail/9p9tqf7mrm4r?hl=en-US&gl=TW
**CPU-optimized inference (ONNX Runtime)**
- Model: MobileNetV3-Small (ImageNet)
- Hardware: Colab CPU
- Latency (avg): ~4.23 ms/img
- Throughput: ~236.27 FPS
- Export: PyTorch → ONNX (opset 13, dynamic batch), ORT graph optimizations ON
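
The latency and throughput figures above are related by simple arithmetic: at batch size 1, throughput is roughly 1000 divided by the average latency in milliseconds (1000 / 4.23 is about 236 img/s). The sketch below shows one way such numbers could be measured. It is not the script used for this README; `benchmark`, `fps_from_latency`, and the warmup and iteration counts are hypothetical choices made for illustration.

```python
import time
from statistics import mean


def fps_from_latency(avg_ms, batch_size=1):
    """Convert an average per-call latency (ms) into images per second."""
    return batch_size * 1000.0 / avg_ms


def benchmark(run_once, warmup=10, iters=100):
    """Time a zero-argument callable; return (avg latency in ms, calls per second).

    `run_once` stands in for a single inference call, for example a lambda
    wrapping an ONNX Runtime session. Warmup calls are excluded from timing.
    """
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    avg_ms = mean(samples_ms)
    return avg_ms, fps_from_latency(avg_ms)


if __name__ == "__main__":
    # Dummy workload in place of real inference, so the sketch runs anywhere.
    avg_ms, fps = benchmark(lambda: sum(range(10_000)), warmup=5, iters=50)
    print(f"avg latency: {avg_ms:.3f} ms, throughput: {fps:.1f} calls/s")
```

Averaging per-call timings and deriving FPS from the mean keeps the two numbers consistent with each other; measured results will vary with the host CPU and thread settings.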
# onnx-fastapi (CPU)
Minimal FastAPI service for ONNX inference on CPU (MobileNetV3-Small example).
## Build and Run (Docker)
```bash
docker build -t onnx-fastapi:cpu .
docker run --rm -p 8000:8000 onnx-fastapi:cpu
# open http://localhost:8000/docs
```
## API Health check
```bash
curl http://localhost:8000/health
# {"status":"ok"}
```
## Prediction
```bash
# Replace samples/cat.jpg with the path to your own image
curl -X POST "http://localhost:8000/predict" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@samples/cat.jpg"
```
## Example response
```json
{
  "top1": {"label": "tabby_cat", "prob": 0.87},
  "top5": [
    {"label": "tabby_cat", "prob": 0.87},
    {"label": "tiger_cat", "prob": 0.07},
    {"label": "Egyptian_cat", "prob": 0.03},
    {"label": "lynx", "prob": 0.02},
    {"label": "cougar", "prob": 0.01}
  ]
}
```
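
A response of this shape can be produced by applying a softmax to the model's raw scores and keeping the five most probable classes. The following is a minimal sketch of that step, not the service's actual code: `softmax`, `build_response`, and the two-decimal rounding are assumptions chosen to mirror the JSON above.

```python
import math

TOP_K = 5  # matches the "top5" key in the example response


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(logits, labels):
    """Rank classes by probability and return a {"top1", "top5"} dict."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    top = [{"label": label, "prob": round(prob, 2)} for label, prob in ranked[:TOP_K]]
    return {"top1": top[0], "top5": top}


if __name__ == "__main__":
    # Made-up scores and labels, purely to exercise the function.
    demo_labels = ["tabby_cat", "tiger_cat", "Egyptian_cat", "lynx", "cougar", "dog"]
    demo_logits = [5.0, 2.5, 1.6, 1.2, 0.5, -1.0]
    print(build_response(demo_logits, demo_labels))
```

Subtracting the maximum score before exponentiating avoids overflow for large logits without changing the resulting probabilities.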
## Architecture Diagram
```mermaid
flowchart LR
A[Client / cURL / App] -->|HTTP: /predict| B[FastAPI Service]
B -->|NumPy tensors| C[ONNX Runtime]
C -->|CPU| D[(model.onnx)]
subgraph DockerContainer["Docker Container"]
B
C
D
end
E[(Host FS)]
D -->|"bind-mount (optional)"| E
```
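
The "NumPy tensors" edge in the diagram is where a decoded image becomes a model input. The sketch below illustrates that conversion under stated assumptions: the image has already been decoded and resized to an H×W×3 `uint8` RGB array, the model expects NCHW `float32`, and the standard ImageNet mean and standard deviation apply. The repository's own preprocessing may differ, and `to_input_tensor` is a hypothetical helper name.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order); assumed, not taken from the repo.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_input_tensor(image_hwc_uint8):
    """Convert an HxWx3 uint8 RGB image into a 1x3xHxW float32 tensor."""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD           # per-channel normalization
    x = np.transpose(x, (2, 0, 1))                   # HWC -> CHW
    return np.ascontiguousarray(x[np.newaxis, ...])  # add batch dimension


if __name__ == "__main__":
    dummy = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a decoded image
    tensor = to_input_tensor(dummy)
    print(tensor.shape, tensor.dtype)
```

An array like this could then be handed to an ONNX Runtime `InferenceSession` through its `run` method, keyed by the model's input name.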
## Final takeaway
- FastAPI exposes `/health` and `/predict`.
- ONNX Runtime (CPU) runs inference; no GPU is required.
- The model can be baked into the image or bind-mounted at runtime for quick swaps.
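
One way to support both the baked-in and the bind-mounted model is to read the model location from an environment variable with a default. This is a sketch of that idea only: the `MODEL_PATH` variable, the `/app/model.onnx` default, and `resolve_model_path` are hypothetical names, and the repository may handle this differently.

```python
import os
from pathlib import Path

# Hypothetical default: where a model baked into the image might live.
DEFAULT_MODEL_PATH = "/app/model.onnx"


def resolve_model_path(env=None):
    """Return the model path, preferring the (hypothetical) MODEL_PATH variable.

    Passing `env` explicitly makes the function easy to test; by default it
    reads from the process environment.
    """
    env = os.environ if env is None else env
    return Path(env.get("MODEL_PATH", DEFAULT_MODEL_PATH))


if __name__ == "__main__":
    print(resolve_model_path({}))                                   # baked-in default
    print(resolve_model_path({"MODEL_PATH": "/models/other.onnx"}))  # bind-mounted override
```

With a setup like this, swapping models would mean mounting a host directory into the container with `docker run -v` and pointing the variable at the mounted file with `-e`, without rebuilding the image.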