An open API service indexing awesome lists of open source software.

https://github.com/tameronline/repo-fastapi

GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)
https://github.com/tameronline/repo-fastapi

ai-server cuda deep-learning fastapi inference mps nlp plugins pytorch rocm

Last synced: about 2 months ago
JSON representation

GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)

Awesome Lists containing this project

README

          

# πŸš€ NeuroServe β€” GPU-Ready FastAPI AI Server

## πŸ“Š Project Status

| Category | Badges |
|---------------|--------|
| **Languages** | ![Python](https://img.shields.io/badge/Python-3.12%2B-blue) ![HTML](https://img.shields.io/badge/HTML-5-orange) ![CSS](https://img.shields.io/badge/CSS-3-blueviolet) |
| **Framework** | ![FastAPI](https://img.shields.io/badge/FastAPI-0.116.x-009688) |
| **ML / GPU** | ![PyTorch](https://img.shields.io/badge/PyTorch-2.6.x-ee4c2c) ![CUDA Ready](https://img.shields.io/badge/CUDA-Ready-76B900?logo=nvidia&logoColor=white) |
| **CI** | [![Ubuntu CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-ubuntu.yml/badge.svg)](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-ubuntu.yml) [![Windows CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-windows.yml/badge.svg)](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-windows.yml) [![Windows GPU CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-gpu.yml/badge.svg)](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-gpu.yml) [![macOS CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-macos.yml/badge.svg)](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-macos.yml) |
| **License** | ![License](https://img.shields.io/badge/License-MIT-green) |
| **Support** | [![Sponsor](https://img.shields.io/badge/Sponsor-πŸ’–-pink)](https://paypal.me/tameronline) |

---

## πŸ“– Overview

**NeuroServe** is an **AI Inference Server** built on **FastAPI**, designed to run seamlessly on **GPU (CUDA/ROCm)**, **CPU**, and **macOS MPS**.
It provides ready-to-use REST APIs, a modular **plugin system**, runtime utilities, and a consistent unified response format β€” making it the perfect foundation for AI-powered services.

---

## ✨ Key Features

- 🌐 **REST APIs out-of-the-box** with Swagger UI (`/docs`) & ReDoc (`/redoc`).
- ⚑ **PyTorch integration** with automatic device selection (`cuda`, `cpu`, `mps`, `rocm`).
- πŸ”Œ **Plugin system** to extend functionality with custom AI models or services.
- πŸ“Š **Runtime tools** for GPU info, warm-up routines, and environment inspection.
- 🧠 **Built-in utilities** like a toy model and model size calculator.
- 🧱 **Unified JSON responses** for predictable API behavior.
- πŸ§ͺ **Cross-platform CI/CD** (Ubuntu, Windows, macOS, Self-hosted GPU).

---

## πŸ“‚ Project Structure

```text
gpu-server/
β”œβ”€β”€ app/ # Main application code
β”‚ β”œβ”€β”€ core/ # Config, logging, error handling
β”‚ β”œβ”€β”€ routes/ # API routes (auth, inference, plugins, uploads)
β”‚ β”œβ”€β”€ plugins/ # Plugin system (dummy, neu_server, base, loader)
β”‚ β”œβ”€β”€ utils/ # Unified responses
β”‚ β”œβ”€β”€ static/ # Static assets (CSS, favicon)
β”‚ β”œβ”€β”€ templates/ # HTML templates
β”‚ β”œβ”€β”€ main.py # FastAPI entrypoint
β”‚ β”œβ”€β”€ runtime.py # Device/GPU management
β”‚ └── toy_model.py # Example PyTorch model
β”œβ”€β”€ scripts/ # Install torch, prefetch models, test API
β”œβ”€β”€ tests/ # Unit & integration tests
β”œβ”€β”€ models_cache/ # Model cache (HuggingFace / Torch)
β”œβ”€β”€ docs/ # Documentation & model licenses
β”œβ”€β”€ logs/ # Errors & plugin logs
└── ...
```

---

## βš™οΈ Installation

### 1. Clone the repository
```bash
git clone https://github.com/USERNAME/gpu-server.git
cd gpu-server
```

### 2. Create a virtual environment
```bash
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate
```

### 3. Install dependencies
```bash
pip install -r requirements.txt
```

### 4. (Optional) Auto-install PyTorch
```bash
python -m scripts.install_torch --gpu # or --cpu / --rocm
```

---

## πŸš€ Running the Server

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Available endpoints:
- 🏠 **Home** β†’ [http://localhost:8000/](http://localhost:8000/)
- ❀️ **Health** β†’ [http://localhost:8000/health](http://localhost:8000/health)
- πŸ“š **Swagger UI** β†’ [http://localhost:8000/docs](http://localhost:8000/docs)
- πŸ“˜ **ReDoc** β†’ [http://localhost:8000/redoc](http://localhost:8000/redoc)
- 🧭 **Env Summary** β†’ [http://localhost:8000/env](http://localhost:8000/env)
- πŸ”Œ **Plugins** β†’ [http://localhost:8000/plugins](http://localhost:8000/plugins)

Quick test:
```bash
curl http://localhost:8000/health
# {"status": "ok"}
```

---

## πŸ”Œ Plugin System

Each plugin lives in `app/plugins//` and typically includes:

```
manifest.json
plugin.py # Defines Plugin class inheriting AIPlugin
README.md # Documentation
```

API Endpoints:
- `GET /plugins` β€” list all plugins with metadata.
- `POST /plugins/{name}/{task}` β€” execute a task inside a plugin.

Example:
```python
from app.plugins.base import AIPlugin

class Plugin(AIPlugin):
name = "my_plugin"
tasks = ["infer"]

def load(self):
# Load models/resources once
...

def infer(self, payload: dict) -> dict:
return {"message": "ok", "payload": payload}
```

---

## πŸ§ͺ Development

Install dev dependencies:
```bash
pip install -r requirements-dev.txt
pre-commit install
```

Run tests:
```bash
pytest
```

Ruff (lint + format check) runs automatically via pre-commit hooks.

---

## πŸ“¦ Model Management

Download models in advance:
```bash
python -m scripts.prefetch_models
```

Models are cached in `models_cache/` (see `docs/LICENSES.md` for licenses).

---

## 🏭 Deployment Notes

- Use `uvicorn`/`hypercorn` behind a reverse proxy (e.g., Nginx).
- Configure environment with `APP_*` variables instead of hardcoding.
- Enable HTTPS and configure CORS carefully in production.

---

## πŸ—ΊοΈ Roadmap

- [ ] Add `/cuda` endpoint β†’ return detailed CUDA info.
- [ ] Add `/warmup` endpoint for GPU readiness.
- [ ] Provide a **plugin generator CLI**.
- [ ] Implement API Key / JWT authentication.
- [ ] Example plugins: translation, summarization, image classification.
- [ ] Docker support for one-click deployment.
- [ ] Benchmark suite for model inference speed.

---

## 🀝 Contributing

Contributions are welcome!
- Open **Issues** for bugs or ideas.
- Submit **Pull Requests** for improvements.
- Follow style guidelines (Ruff + pre-commit).

---

## πŸ“œ License

Licensed under the **MIT License** β€” see [LICENSE](./LICENSE).
⚠️ AI models may have their own licenses (see `docs/LICENSES.md`).