https://github.com/tameronline/repo-fastapi
GPU-Ready FastAPI AI Inference Server with plugin system (CUDA/CPU/MPS/ROCm)
- Host: GitHub
- URL: https://github.com/tameronline/repo-fastapi
- Owner: TamerOnLine
- License: mit
- Created: 2025-09-12T19:37:21.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-15T02:31:03.000Z (9 months ago)
- Last Synced: 2025-10-21T14:50:27.734Z (8 months ago)
- Topics: ai-server, cuda, deep-learning, fastapi, inference, mps, nlp, plugins, pytorch, rocm
- Language: Python
- Homepage:
- Size: 137 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NeuroServe - GPU-Ready FastAPI AI Server
## Project Status
| Category | Badges |
|---------------|--------|
| **CI** | [Ubuntu CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-ubuntu.yml) [Windows CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-windows.yml) [GPU CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-gpu.yml) [macOS CI](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-macos.yml) |
| **Support** | [PayPal](https://paypal.me/tameronline) |
---
## Overview
**NeuroServe** is an **AI Inference Server** built on **FastAPI**, designed to run on **GPU (CUDA/ROCm)**, **CPU**, and **macOS MPS**.
It provides ready-to-use REST APIs, a modular **plugin system**, runtime utilities, and a unified response format, making it a practical foundation for AI-powered services.
---
## Key Features
- **REST APIs out-of-the-box** with Swagger UI (`/docs`) & ReDoc (`/redoc`).
- **PyTorch integration** with automatic device selection (`cuda`, `cpu`, `mps`, `rocm`).
- **Plugin system** to extend functionality with custom AI models or services.
- **Runtime tools** for GPU info, warm-up routines, and environment inspection.
- **Built-in utilities** like a toy model and model size calculator.
- **Unified JSON responses** for predictable API behavior.
- **Cross-platform CI/CD** (Ubuntu, Windows, macOS, Self-hosted GPU).
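The automatic device selection listed above could look roughly like the following. This is a minimal sketch, not the code in `app/runtime.py`: the function names `choose_device` and `detect_device` are hypothetical, and the priority order (CUDA, then MPS, then CPU) is an assumption. ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, so they fall under the `cuda` branch here.

```python
def choose_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Pick a device string from availability flags (assumed priority order)."""
    if cuda_ok:
        return "cuda"  # NVIDIA CUDA, or an AMD ROCm build of PyTorch
    if mps_ok:
        return "mps"  # Apple Silicon GPU
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent on older PyTorch releases.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a pure function (`choose_device`) makes the priority order easy to unit-test without a GPU.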
---
## Project Structure
```text
gpu-server/
├── app/                 # Main application code
│   ├── core/            # Config, logging, error handling
│   ├── routes/          # API routes (auth, inference, plugins, uploads)
│   ├── plugins/         # Plugin system (dummy, neu_server, base, loader)
│   ├── utils/           # Unified responses
│   ├── static/          # Static assets (CSS, favicon)
│   ├── templates/       # HTML templates
│   ├── main.py          # FastAPI entrypoint
│   ├── runtime.py       # Device/GPU management
│   └── toy_model.py     # Example PyTorch model
├── scripts/             # Install torch, prefetch models, test API
├── tests/               # Unit & integration tests
├── models_cache/        # Model cache (HuggingFace / Torch)
├── docs/                # Documentation & model licenses
├── logs/                # Errors & plugin logs
└── ...
```
---
## Installation
### 1. Clone the repository
```bash
git clone https://github.com/TamerOnLine/repo-fastapi.git
cd repo-fastapi
```
### 2. Create a virtual environment
```bash
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate
```
### 3. Install dependencies
```bash
pip install -r requirements.txt
```
### 4. (Optional) Auto-install PyTorch
```bash
python -m scripts.install_torch --gpu # or --cpu / --rocm
```
---
## Running the Server
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
Available endpoints:
- **Home**: [http://localhost:8000/](http://localhost:8000/)
- **Health**: [http://localhost:8000/health](http://localhost:8000/health)
- **Swagger UI**: [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc**: [http://localhost:8000/redoc](http://localhost:8000/redoc)
- **Env Summary**: [http://localhost:8000/env](http://localhost:8000/env)
- **Plugins**: [http://localhost:8000/plugins](http://localhost:8000/plugins)
Quick test:
```bash
curl http://localhost:8000/health
# {"status": "ok"}
```
---
## Plugin System
Each plugin lives in its own directory under `app/plugins/` and typically includes:
```
manifest.json
plugin.py # Defines Plugin class inheriting AIPlugin
README.md # Documentation
```
API Endpoints:
- `GET /plugins`: list all plugins with metadata.
- `POST /plugins/{name}/{task}`: execute a task inside a plugin.
Example:
```python
from app.plugins.base import AIPlugin


class Plugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def load(self):
        # Load models/resources once
        ...

    def infer(self, payload: dict) -> dict:
        return {"message": "ok", "payload": payload}
```
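To illustrate how `POST /plugins/{name}/{task}` could reach a method such as `infer`, here is a self-contained sketch. It does not import the real `AIPlugin`: the stand-in base class, the `REGISTRY` dict, the `register` and `run_task` helpers, and the `status`/`data`/`error` envelope are all assumptions made for illustration, not the project's actual loader or response format.

```python
from typing import Any, Dict, List


class AIPlugin:
    """Stand-in for app.plugins.base.AIPlugin (hypothetical, illustration only)."""

    name: str = ""
    tasks: List[str] = []

    def load(self) -> None:
        """Load models/resources once; a no-op by default."""


class EchoPlugin(AIPlugin):
    name = "echo"
    tasks = ["infer"]

    def infer(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {"message": "ok", "payload": payload}


# Hypothetical registry mapping plugin names to loaded instances.
REGISTRY: Dict[str, AIPlugin] = {}


def register(plugin: AIPlugin) -> None:
    """Run the plugin's one-time setup and make it addressable by name."""
    plugin.load()
    REGISTRY[plugin.name] = plugin


def run_task(name: str, task: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a task to a plugin and wrap the result in one response shape."""
    plugin = REGISTRY.get(name)
    if plugin is None:
        return {"status": "error", "data": None, "error": f"unknown plugin: {name}"}
    # Only methods declared in `tasks` may be called from the URL.
    if task not in plugin.tasks:
        return {"status": "error", "data": None, "error": f"unknown task: {task}"}
    result = getattr(plugin, task)(payload)
    return {"status": "ok", "data": result, "error": None}


register(EchoPlugin())
print(run_task("echo", "infer", {"text": "hello"}))
```

Checking `task` against the declared `tasks` list before calling `getattr` keeps other attributes, such as `load`, from being invoked through the URL.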
---
## Development
Install dev dependencies:
```bash
pip install -r requirements-dev.txt
pre-commit install
```
Run tests:
```bash
pytest
```
Ruff (lint + format check) runs automatically via pre-commit hooks.
---
## Model Management
Download models in advance:
```bash
python -m scripts.prefetch_models
```
Models are cached in `models_cache/` (see `docs/LICENSES.md` for licenses).
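
If you want HuggingFace and PyTorch downloads to land in `models_cache/`, one common approach is to set the `HF_HOME` and `TORCH_HOME` environment variables. Whether `scripts.prefetch_models` does this is not stated here, so treat the helper below (`use_local_cache`, a hypothetical name) and the subdirectory layout as assumptions.

```python
import os
from pathlib import Path


def use_local_cache(root: str = "models_cache") -> Path:
    """Point HuggingFace and torch.hub caches at a project-local directory."""
    cache = Path(root).resolve()
    cache.mkdir(parents=True, exist_ok=True)
    # Set these before importing the libraries so they are picked up reliably.
    os.environ["HF_HOME"] = str(cache / "huggingface")
    os.environ["TORCH_HOME"] = str(cache / "torch")
    return cache


if __name__ == "__main__":
    print(use_local_cache())
```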
---
## Deployment Notes
- Use `uvicorn`/`hypercorn` behind a reverse proxy (e.g., Nginx).
- Configure the server through `APP_*` environment variables instead of hardcoding values.
- Enable HTTPS and configure CORS carefully in production.
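
As one way to follow the `APP_*` advice, the sketch below reads settings from the environment using only the standard library. The specific variable names (`APP_HOST`, `APP_PORT`, `APP_CORS_ORIGINS`) and their defaults are hypothetical; the project's own configuration under `app/core/` may use different names.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    cors_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from APP_* variables, falling back to local-dev defaults."""
    origins = env.get("APP_CORS_ORIGINS", "")
    return Settings(
        host=env.get("APP_HOST", "127.0.0.1"),
        port=int(env.get("APP_PORT", "8000")),
        # Comma-separated list; empty means no cross-origin access is granted.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment in as a mapping keeps the loader testable: a plain dict can stand in for `os.environ`.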
---
## Roadmap
- [ ] Add `/cuda` endpoint that returns detailed CUDA info.
- [ ] Add `/warmup` endpoint for GPU readiness.
- [ ] Provide a **plugin generator CLI**.
- [ ] Implement API Key / JWT authentication.
- [ ] Example plugins: translation, summarization, image classification.
- [ ] Docker support for one-click deployment.
- [ ] Benchmark suite for model inference speed.
---
## Contributing
Contributions are welcome!
- Open **Issues** for bugs or ideas.
- Submit **Pull Requests** for improvements.
- Follow style guidelines (Ruff + pre-commit).
---
## License
Licensed under the **MIT License**; see [LICENSE](./LICENSE).
Note: AI models may have their own licenses (see `docs/LICENSES.md`).