https://github.com/liebemama/repo-fastapi
GPU-ready FastAPI AI inference server with plugin system, supporting CUDA, ROCm, CPU, and macOS MPS.
- Host: GitHub
- URL: https://github.com/liebemama/repo-fastapi
- Owner: liebemama
- License: mit
- Created: 2025-09-13T21:01:34.000Z
- Default Branch: main
- Last Pushed: 2025-09-13T21:25:06.000Z
- Last Synced: 2025-09-13T23:31:36.332Z
- Topics: ai-server, cuda, fastapi, gpu, inference, mps, plugins, pytorch, rocm
- Language: Python
- Homepage:
- Size: 77.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# NeuroServe: GPU-Ready FastAPI AI Server
## Project Status
| Category | Links |
|---------------|--------|
| **CI** | [Ubuntu](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-ubuntu.yml), [Windows](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-windows.yml), [GPU](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-gpu.yml), [macOS](https://github.com/TamerOnLine/repo-fastapi/actions/workflows/ci-macos.yml) |
| **Docs** | [API reference](docs/API.md) |
| **Version** | [Releases](https://github.com/TamerOnLine/repo-fastapi/releases) |
| **Support** | [PayPal](https://paypal.me/tameronline) |
| **GitHub** | [Stargazers](https://github.com/TamerOnLine/repo-fastapi/stargazers) [](https://github.co) |
---
## Overview
**NeuroServe** is an **AI inference server** built on **FastAPI**, designed to run on **GPU (CUDA/ROCm)**, **CPU**, and **macOS MPS**.
It provides ready-to-use REST APIs, a modular **plugin system**, runtime utilities, and a unified JSON response format, making it a solid foundation for AI-powered services.
---
## Quick Setup
Virtualenv quick guide: see **[docs/README_venv.md](docs/README_venv.md)**.
---
## API Documentation
Detailed API reference and usage examples are available here:
➡️ [API Documentation](docs/API.md)
---
## Key Features
- **REST APIs out of the box**, with Swagger UI (`/docs`) and ReDoc (`/redoc`).
- **PyTorch integration** with automatic device selection (CUDA, ROCm, MPS, or CPU).
- **Plugin system** for extending the server with custom AI models or services.
- **Runtime tools** for GPU info, warm-up routines, and environment inspection.
- **Built-in utilities**, such as a toy model and a model size calculator.
- **Unified JSON responses** for predictable API behavior.
- **Cross-platform CI/CD** (Ubuntu, Windows, macOS, and self-hosted GPU).
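Automatic device selection usually boils down to a fixed priority order. The sketch below assumes the order GPU (CUDA or ROCm), then MPS, then CPU; `select_device` and `detect` are hypothetical names, not functions from this repository. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the `cuda` device type and set `torch.version.hip`, so "ROCm" is a backend label rather than a separate device string.

```python
"""Sketch: pick a PyTorch device (assumed priority order, not NeuroServe's actual code)."""
from __future__ import annotations

try:
    import torch  # optional: without PyTorch the sketch falls back to CPU
except ImportError:
    torch = None


def select_device(cuda_ok: bool, mps_ok: bool, hip_version: str | None) -> tuple[str, str]:
    """Return (torch device string, human-readable backend label)."""
    if cuda_ok:
        # ROCm builds reuse the "cuda" device type; torch.version.hip is non-None there.
        return "cuda", "rocm" if hip_version else "cuda"
    if mps_ok:
        return "mps", "mps"
    return "cpu", "cpu"


def detect() -> tuple[str, str]:
    """Query the local PyTorch install, or report CPU if PyTorch is missing."""
    if torch is None:
        return select_device(False, False, None)
    mps_backend = getattr(torch.backends, "mps", None)  # absent on old PyTorch versions
    mps_ok = bool(mps_backend and mps_backend.is_available())
    return select_device(torch.cuda.is_available(), mps_ok, getattr(torch.version, "hip", None))


if __name__ == "__main__":
    device, backend = detect()
    print(f"device={device} backend={backend}")
```

Keeping the decision in a pure function (`select_device`) separate from the hardware query makes the priority order easy to test on machines without a GPU.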
---
## Project Structure
```text
repo-fastapi/
├─ app/                          # application package
│  ├─ core/                      # settings & configuration
│  │  └─ config.py               # app settings (Pydantic v2)
│  ├─ routes/                    # HTTP API routes
│  ├─ plugins/                   # extensions / integrations
│  ├─ workflows/                 # workflow definitions & orchestrators
│  └─ templates/                 # Jinja templates (if used)
├─ docs/                         # documentation & generated diagrams
│  ├─ ARCHITECTURE.md            # main architecture report
│  ├─ architecture.mmd           # Mermaid source (no fences)
│  ├─ architecture.html          # browser-friendly diagram
│  ├─ architecture.png           # exported PNG (if mmdc installed)
│  ├─ runtime.mmd                # runtime/infra diagram
│  ├─ imports.mmd                # Python import graph (if generated)
│  ├─ endpoints.md               # discovered API endpoints (if generated)
│  └─ README_venv.md             # virtualenv quick guide
├─ tools/                        # project tooling & scripts
│  └─ build_workflows_index.py   # builds docs/workflows-overview.md
├─ tests/                        # test suite
│  └─ test_run.py                # smoke test for app startup
├─ gen_arch.py                   # architecture generator script
├─ requirements.txt              # runtime dependencies
├─ requirements-dev.txt          # dev dependencies (ruff, pre-commit, pytest, ...)
├─ .pre-commit-config.yaml       # pre-commit hooks configuration
├─ README.md                     # project overview & usage
└─ LICENSE                       # project license
```
---
## Architecture
For a deeper look into the internal design, modules, and flow of the system, see:
➡️ [Architecture Guide](docs/ARCHITECTURE.md)
---
## Installation
### 1. Clone the repository
```bash
# replace USERNAME with the repository owner
git clone https://github.com/USERNAME/repo-fastapi.git
cd repo-fastapi
```
### 2. Create a virtual environment
```bash
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows
.venv\Scripts\activate
```
### 3. Install dependencies
```bash
pip install -r requirements.txt
```
### 4. (Optional) Auto-install PyTorch
```bash
python -m scripts.install_torch --gpu # or --cpu / --rocm
```
---
## Running the Server
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
Available endpoints:
- **Home**: [http://localhost:8000/](http://localhost:8000/)
- **Health**: [http://localhost:8000/health](http://localhost:8000/health)
- **Swagger UI**: [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc**: [http://localhost:8000/redoc](http://localhost:8000/redoc)
- **Env Summary**: [http://localhost:8000/env](http://localhost:8000/env)
- **Plugins**: [http://localhost:8000/plugins](http://localhost:8000/plugins)
Quick test:
```bash
curl http://localhost:8000/health
# {"status": "ok"}
```
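The same health check can be scripted, for example to wait for the server in CI before running other requests. This is a minimal standard-library sketch; `wait_until_healthy` is a hypothetical helper, and it assumes the body is `{"status": "ok"}` as in the `curl` output above.

```python
"""Sketch: poll /health until the server reports {"status": "ok"} (stdlib only)."""
import json
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def fetch_json(url: str, timeout: float = 2.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    base_url: str = BASE_URL,
    attempts: int = 10,
    delay: float = 0.5,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Return True once /health reports status "ok", or False after `attempts` tries."""
    for _ in range(attempts):
        try:
            if fetch(f"{base_url}/health").get("status") == "ok":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not accepting connections yet, or the body was not JSON
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("healthy" if wait_until_healthy() else "server did not become healthy")
```

The `fetch` parameter exists only so the polling loop can be exercised without a running server.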
---
## Plugin System
Each plugin lives in `app/plugins//` and typically includes:
```
manifest.json
plugin.py # Defines Plugin class inheriting AIPlugin
README.md # Documentation
```
API endpoints:
- `GET /plugins`: list all plugins with their metadata.
- `POST /plugins/{name}/{task}`: execute a task inside a plugin.
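A client call to the task endpoint can be built with the standard library alone. This sketch assumes the endpoint accepts the task payload as a plain JSON object in the request body; check [docs/API.md](docs/API.md) for the actual request schema. `plugin_request` and `call_plugin` are hypothetical helpers, and `my_plugin`/`infer` refer to the example plugin in this section.

```python
"""Sketch: call POST /plugins/{name}/{task} with a JSON body (body shape is an assumption)."""
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def plugin_request(name: str, task: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one plugin task."""
    path = f"/plugins/{urllib.parse.quote(name)}/{urllib.parse.quote(task)}"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_plugin(name: str, task: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(plugin_request(name, task, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(call_plugin("my_plugin", "infer", {"text": "hello"}))
```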
Example plugin:
```python
from app.plugins.base import AIPlugin


class Plugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def load(self):
        # Load models/resources once
        ...

    def infer(self, payload: dict) -> dict:
        # Handles the "infer" task declared in `tasks`
        return {"message": "ok", "payload": payload}
```
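One way a server could route `POST /plugins/{name}/{task}` to a plugin method is a name-keyed registry that only exposes tasks a plugin declares. The sketch below is illustrative only: `AIPlugin` is a hypothetical stand-in for `app.plugins.base.AIPlugin`, `PluginRegistry` is an invented name, and the real loader (which reads `manifest.json` and discovers `plugin.py` files) is skipped entirely.

```python
"""Sketch: in-memory plugin registry and task dispatch (not the project's actual loader)."""
from typing import Any, Dict, List


class AIPlugin:
    """Hypothetical stand-in for app.plugins.base.AIPlugin."""

    name: str = ""
    tasks: tuple = ()

    def load(self) -> None:
        """Load models/resources once; the default does nothing."""


class MyPlugin(AIPlugin):
    name = "my_plugin"
    tasks = ["infer"]

    def infer(self, payload: dict) -> dict:
        return {"message": "ok", "payload": payload}


class PluginRegistry:
    """Maps plugin names to loaded plugin instances."""

    def __init__(self) -> None:
        self._plugins: Dict[str, AIPlugin] = {}

    def register(self, plugin: AIPlugin) -> None:
        plugin.load()  # load once at startup, not on every request
        self._plugins[plugin.name] = plugin

    def describe(self) -> List[Dict[str, Any]]:
        """Roughly the kind of data GET /plugins could return."""
        return [{"name": p.name, "tasks": list(p.tasks)} for p in self._plugins.values()]

    def run(self, name: str, task: str, payload: dict) -> Any:
        """Dispatch a (name, task) pair to plugin.<task>(payload)."""
        plugin = self._plugins.get(name)
        if plugin is None:
            raise KeyError(f"unknown plugin: {name!r}")
        # Only declared tasks are callable, so helper methods such as load() stay private.
        if task not in plugin.tasks:
            raise ValueError(f"plugin {name!r} has no task {task!r}")
        return getattr(plugin, task)(payload)


registry = PluginRegistry()
registry.register(MyPlugin())
print(registry.run("my_plugin", "infer", {"x": 1}))  # → {'message': 'ok', 'payload': {'x': 1}}
```

Checking `task` against the declared `tasks` list before calling `getattr` is the important design choice: without it, any public method name in a URL would become callable.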
---
## Workflow System
A lightweight orchestration layer that chains plugins into reproducible pipelines, where each step specifies a plugin, a task, and a payload.
All endpoints are exposed under `/workflow`.
- **Endpoints:** `GET /workflow/ping`, `GET /workflow/presets`, `POST /workflow/run`
- **System Guide (EN):** [app/workflows/README.md](app/workflows/README.md)
- **Workflows Index:** [docs/workflows-overview.md](docs/workflows-overview.md)
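Conceptually, a workflow run executes its steps in order and reports a result per step. This is a minimal sketch under stated assumptions: the `Step` fields, the result dictionaries, and the stop-on-first-failure policy are hypothetical choices for illustration, not the schema of `POST /workflow/run` (see [app/workflows/README.md](app/workflows/README.md) for that).

```python
"""Sketch: run a workflow as an ordered list of (plugin, task, payload) steps."""
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Anything that executes one plugin task, e.g. a function wrapping plugin dispatch.
TaskRunner = Callable[[str, str, dict], Any]


@dataclass
class Step:
    plugin: str
    task: str
    payload: Dict[str, Any] = field(default_factory=dict)


def run_workflow(steps: List[Step], run_task: TaskRunner) -> List[Dict[str, Any]]:
    """Execute steps in order; stop at the first failure so later steps never run on bad input."""
    results: List[Dict[str, Any]] = []
    for index, step in enumerate(steps):
        try:
            output = run_task(step.plugin, step.task, step.payload)
        except Exception as exc:  # record the failing step instead of crashing the request
            results.append({"step": index, "ok": False, "error": str(exc)})
            break
        results.append({"step": index, "ok": True, "output": output})
    return results


def fake_runner(plugin: str, task: str, payload: dict) -> dict:
    """Hypothetical stand-in for real plugin dispatch."""
    if plugin == "broken":
        raise RuntimeError("plugin failed")
    return {"plugin": plugin, "task": task, "echo": payload}


steps = [Step("my_plugin", "infer", {"text": "hi"}), Step("broken", "infer"), Step("my_plugin", "infer")]
for row in run_workflow(steps, fake_runner):
    print(row)
```

Here the third step never runs because the second one fails; a real orchestrator might instead offer a "continue on error" option.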
---
## Available Workflows
A full list of available workflows with their versions, tags, and step counts is maintained in the **Workflows Index**.
➡️ [View Workflows Index](docs/workflows-overview.md)
---
## Available Plugins
A full list of available plugins with their providers, tasks, and source files is maintained in the **Plugins Index**.
➡️ [View Plugins Index](docs/plugins-overview.md)
---
## Development
Install dev dependencies:
```bash
pip install -r requirements-dev.txt
pre-commit install
```
Run tests:
```bash
pytest
```
Ruff (lint + format check) runs automatically via pre-commit hooks.
---
## Code Style
We enforce a clean and consistent code style using **Ruff** (linter, import sorter, and formatter).
For full details on configuration, commands, helper scripts, and CI integration, see:
➡️ [Code Style & Linting Guide](docs/CODE_STYLE_GUIDE.md)
---
## Model Management
Download models in advance:
```bash
python -m scripts.prefetch_models
```
Models are cached in `models_cache/` (see `docs/LICENSES.md` for licenses).
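Before prefetching, it can help to estimate how much space a model's weights need. The arithmetic is parameters × bytes per parameter. This sketch only illustrates that idea; `model_size_mib` and the dtype table are assumptions and not the project's built-in model size calculator.

```python
"""Sketch: estimate weight size from parameter count and dtype (weights only)."""
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def model_size_mib(num_params: int, dtype: str = "float32") -> float:
    """Parameters x bytes per parameter, in MiB; excludes activations and optimizer state."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# A 125M-parameter model: about 476.8 MiB in float32 and 238.4 MiB in float16.
for dtype in ("float32", "float16"):
    print(dtype, round(model_size_mib(125_000_000, dtype), 1))
```

For a loaded PyTorch model, the parameter count can be obtained with `sum(p.numel() for p in model.parameters())`.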
---
## Deployment Notes
- Use `uvicorn`/`hypercorn` behind a reverse proxy (e.g., Nginx).
- Configure environment with `APP_*` variables instead of hardcoding.
- Enable HTTPS and configure CORS carefully in production.
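The `APP_*` convention means each setting is read from an environment variable with a fixed prefix. The project's `app/core/config.py` uses Pydantic v2; this standard-library sketch only illustrates the prefix idea, and the `Settings` fields and `load_settings` helper are hypothetical.

```python
"""Sketch: read APP_* environment variables into typed settings (stdlib only)."""
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings; field names are illustrative only."""

    host: str = "0.0.0.0"  # same default as the uvicorn command above
    port: int = 8000
    cors_origins: str = ""  # comma-separated origins; in this sketch, empty means none


def load_settings(env: Optional[Mapping[str, str]] = None, prefix: str = "APP_") -> Settings:
    """Build Settings from variables such as APP_HOST and APP_PORT, keeping defaults otherwise."""
    source = os.environ if env is None else env
    values = {}
    for f in fields(Settings):
        raw = source.get(prefix + f.name.upper())
        if raw is not None:
            values[f.name] = int(raw) if f.type is int else raw
    return Settings(**values)


if __name__ == "__main__":
    print(load_settings())
```

If the settings class is built on `pydantic-settings`, the equivalent is `model_config = SettingsConfigDict(env_prefix="APP_")` on a `BaseSettings` subclass, which also handles type conversion and validation.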
---
## Changelog
A complete history of changes and improvements:
➡️ [CHANGELOG](docs/CHANGELOG.md)
## Release Notes
Details about the initial release v0.1.0:
➡️ [Release Notes v0.1.0](docs/RELEASE_NOTES_v0.1.0.md)
---
## Roadmap
- [ ] Add `/cuda` endpoint: return detailed CUDA info.
- [ ] Add `/warmup` endpoint for GPU readiness.
- [ ] Provide a **plugin generator CLI**.
- [ ] Implement API Key / JWT authentication.
- [ ] Example plugins: translation, summarization, image classification.
- [ ] Docker support for one-click deployment.
- [ ] Benchmark suite for model inference speed.
---
## Contributing
Contributions are welcome!
- Open **Issues** for bugs or ideas.
- Submit **Pull Requests** for improvements.
- Follow style guidelines (Ruff + pre-commit).
---
## License
Licensed under the **MIT License**. See [LICENSE](./LICENSE).
### Model Licenses
Some AI/ML models are licensed separately. See [Model Licenses](docs/LICENSES.md).
---