https://github.com/dwain-barnes/kyutai-tts-openai-api
A Docker-based OpenAI-compatible Text-to-Speech API server powered by Kyutai's TTS models with GPU acceleration support.
https://github.com/dwain-barnes/kyutai-tts-openai-api
docker fastapi kyutai openai-api openai-compatible speech-ai text-to-speech tts tts-engines voice-generation
Last synced: 3 months ago
JSON representation
A Docker-based OpenAI-compatible Text-to-Speech API server powered by Kyutai's TTS models with GPU acceleration support.
- Host: GitHub
- URL: https://github.com/dwain-barnes/kyutai-tts-openai-api
- Owner: dwain-barnes
- Created: 2025-07-12T09:06:33.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-12T09:54:47.000Z (3 months ago)
- Last Synced: 2025-07-12T11:34:20.370Z (3 months ago)
- Topics: docker, fastapi, kyutai, openai-api, openai-compatible, speech-ai, text-to-speech, tts, tts-engines, voice-generation
- Language: Python
- Homepage:
- Size: 383 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Kyutai TTS OpenAI-Compatible API
A Docker-based OpenAI-compatible Text-to-Speech API server powered by Kyutai's TTS models with GPU acceleration support.
## Features
- OpenAI-Compatible API endpoints
- GPU acceleration with CUDA support
- Multiple audio formats (MP3, WAV, FLAC, AAC, Opus, PCM)
- Voice selection and speed control
- Complete Docker solution## Quick Start
### Prerequisites
- Docker with Docker Compose
- NVIDIA GPU with CUDA support (recommended)
- 8GB+ GPU memory, 16GB+ system RAM### Installation
```bash
# Clone repository
git clone https://github.com/dwain-barnes/kyutai-tts-openai-api.git
cd kyutai-tts-openai-apimkdir input output cache scripts
# Build and start
docker compose build --no-cache
docker compose up -d# Test the API
python api_test_script.py
```## Usage
### Python (OpenAI SDK)
```python
from openai import OpenAIclient = OpenAI(
api_key="dummy-key",
base_url="http://localhost:8000/v1"
)response = client.audio.speech.create(
model="tts-1",
voice="alloy",
input="Hello! This is Kyutai TTS speaking.",
response_format="wav"
)with open("speech.wav", "wb") as f:
f.write(response.content)
```### cURL
```bash
curl -X POST "http://localhost:8000/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello world!",
"voice": "alloy",
"response_format": "mp3"
}' \
--output speech.mp3
```## API Reference
### POST /v1/audio/speech
Generate audio from text input.
**Request:**
```json
{
"model": "tts-1",
"input": "Text to synthesize",
"voice": "alloy",
"response_format": "mp3",
"speed": 1.0
}
```**Parameters:**
- `model`: "tts-1" or "tts-1-hd" (required)
- `input`: Text 1-4096 characters (required)
- `voice`: alloy, echo, fable, onyx, nova, shimmer (optional)
- `response_format`: mp3, wav, flac, aac, opus, pcm (optional)
- `speed`: 0.25-4.0 (optional)### GET /v1/models
List available models.
### GET /health
Health check endpoint.
## Project Structure
```
kyutai-tts-openai-api/
├── README.md
├── Dockerfile
├── docker-compose.yml
├── api_server.py
├── dependency_check.py
├── api_test_script.py
├── input/
├── output/
├── cache/
└── scripts/
```## Configuration
### Environment Variables
- `CUDA_VISIBLE_DEVICES`: GPU device selection
- `HF_HOME`: Hugging Face cache directory
- `TRANSFORMERS_CACHE`: Model cache directory## Troubleshooting
### Common Issues
**Container won't start:**
```bash
docker logs kyutai-tts-gpu
```**Model download fails:**
- Check internet connection
- Ensure sufficient disk space (2GB+ needed)**Slow generation:**
```bash
# Verify GPU usage
docker exec kyutai-tts-gpu python -c "import torch; print('CUDA:', torch.cuda.is_available())"
```**Dependencies missing:**
```bash
docker exec kyutai-tts-gpu python dependency_check.py
```## Security Notes
- Designed for local/internal use
- No authentication implemented
- Add proper auth for production use
- No rate limiting## Kyutai License
The present code is provided under the MIT license for the Python parts, and Apache license for the Rust backend. The web client code is provided under the MIT license. Note that parts of this code is based on AudioCraft, released under the MIT license.
The weights for the speech-to-text models are released under the CC-BY 4.0 license.## Acknowledgments
- [Kyutai Labs](https://github.com/kyutai-labs) for TTS models