https://github.com/manascb1344/zonos-api
Production-ready FastAPI wrapper for Zonos TTS models with GPU acceleration, voice cloning, and emotion control. Supports both Transformer and Hybrid variants. ⚠️ UNSTABLE API - INITIAL RELEASE
https://github.com/manascb1344/zonos-api
ai api docker fastapi python text-to-speech tts zonos zyphra
Last synced: 8 months ago
JSON representation
Production-ready FastAPI wrapper for Zonos TTS models with GPU acceleration, voice cloning, and emotion control. Supports both Transformer and Hybrid variants. ⚠️ UNSTABLE API - INITIAL RELEASE
- Host: GitHub
- URL: https://github.com/manascb1344/zonos-api
- Owner: manascb1344
- License: apache-2.0
- Created: 2025-02-10T20:44:52.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-22T19:08:03.000Z (over 1 year ago)
- Last Synced: 2025-02-22T19:20:19.988Z (over 1 year ago)
- Topics: ai, api, docker, fastapi, python, text-to-speech, tts, zonos, zyphra
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 26
- Watchers: 2
- Forks: 4
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# Zonos API
> ⚠️ **WARNING: UNSTABLE API - INITIAL RELEASE** ⚠️
>
> This API is currently in its initial release phase (v1.0.0) and is considered unstable.
> Breaking changes may occur without notice. Use in production at your own risk.
> For development and testing purposes only.
A production-grade FastAPI implementation of the Zonos Text-to-Speech model.
## Credits
This API is built on top of the [Zonos-v0.1-hybrid](https://huggingface.co/Zyphra/Zonos-v0.1-hybrid) and [Zonos-v0.1-transformer](https://huggingface.co/Zyphra/Zonos-v0.1-transformer) models created by [Zyphra](https://huggingface.co/Zyphra). The models feature:
- Zero-shot TTS with voice cloning capabilities
- Support for multiple languages (100+ languages via eSpeak-ng)
- High-quality 44kHz audio output
- Fine-grained control over speaking rate, pitch, audio quality, and emotions
- Real-time performance (~2x real-time on RTX 4090)
For more information, visit the model cards on Hugging Face: [Hybrid](https://huggingface.co/Zyphra/Zonos-v0.1-hybrid) | [Transformer](https://huggingface.co/Zyphra/Zonos-v0.1-transformer).
## Features
- FastAPI-based REST API for Zonos Text-to-Speech model
- Support for both Transformer and Hybrid model variants
- Docker and docker-compose support with NVIDIA GPU acceleration
- Production-ready with Gunicorn workers and optimizations
- Prometheus and Grafana monitoring integration
- Health checks and comprehensive logging
- CORS support and Swagger documentation
- Voice cloning and audio continuation support
- Fine-grained emotion and audio quality control
## Quick Start
### Using Pre-built Image
The fastest way to get started is using our pre-built Docker image:
```bash
docker pull ghcr.io/manascb1344/zonos-api-gpu:v1.0.0
docker run -d \
--name zonos-api-gpu \
--gpus all \
-p 8000:8000 \
-e CUDA_VISIBLE_DEVICES=0 \
zonos-api-gpu
```
### Manual Installation
1. Clone the repository with submodules:
```bash
git clone --recursive https://github.com/manascb1344/zonos-api
cd zonos-api
```
The API will be available at `http://localhost:8000`
## Running with Docker
1. Build the container:
```bash
docker build -t zonos-api .
```
2. Run the container:
```bash
docker run -d \
--name zonos-api \
--gpus all \
-p 8000:8000 \
-e CUDA_VISIBLE_DEVICES=0 \
zonos-api
```
## Environment Variables
- `CUDA_VISIBLE_DEVICES`: Specify which GPU(s) to use (default: 0)
- `USE_GPU`: Enable/disable GPU usage (default: true)
## Requirements
- Docker with NVIDIA Container Toolkit installed
- NVIDIA GPU with CUDA support
- At least 8GB of GPU memory recommended
## Verifying the Installation
Check if the API is running:
```bash
curl http://localhost:8000/health
```
## API Endpoints
### GET /
Root endpoint that returns basic API information
### GET /models
Returns a list of available TTS models
### GET /languages
Returns a list of supported languages
### GET /model/{model_name}/conditioners
Returns available conditioners for a specific model
### POST /synthesize
Generate speech from text. Example request:
```json
{
"model_choice": "Zyphra/Zonos-v0.1-transformer",
"text": "Hello, this is a test.",
"language": "en-us",
"emotion_values": [1.0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.1, 0.2],
"vq_score": 0.78,
"cfg_scale": 2.0,
"min_p": 0.15
}
```
## Environment Variables
- `USE_GPU`: Set to "true" to enable GPU acceleration (default: true)
- `PYTHONPATH`: Set to the application root directory
## GPU Support
The API uses NVIDIA GPU acceleration by default. Make sure you have:
1. NVIDIA GPU with CUDA support
2. NVIDIA drivers installed
3. NVIDIA Container Toolkit installed and configured
## Development
### Prerequisites
- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- Docker and docker-compose (for containerized deployment)
### Local Development
```bash
# Start in development mode
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Or with docker-compose
docker-compose up --build
```
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.