https://github.com/fakehec/coqui-tts-local-server

High-performance Coqui TTS API server with a hybrid "Hot/Cold" worker architecture

concurrency coqui-tts fastapi local-ai open-webui openai-api openclaw personality-tuning privacy-focused python self-hosted text-to-speech tts voice-cloning xtts-v2

Last synced: 2 months ago


Host: GitHub
URL: https://github.com/fakehec/coqui-tts-local-server
Owner: fakehec
Created: 2026-02-27T03:53:14.000Z (3 months ago)
Default Branch: master
Last Pushed: 2026-04-04T03:14:05.000Z (2 months ago)
Last Synced: 2026-04-04T03:25:53.087Z (2 months ago)
Topics: concurrency, coqui-tts, fastapi, local-ai, open-webui, openai-api, openclaw, personality-tuning, privacy-focused, python, self-hosted, text-to-speech, tts, voice-cloning, xtts-v2
Language: Python
Homepage:
Size: 1.58 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md


# coqui-tts-local-server

High-performance Coqui TTS API server with a hybrid "Hot/Cold" worker architecture.

**Ideal for locally running installations of agents like OpenClaw or Open-WebUI, where the media should not leave the private local domain.**

## ⚖️ License & Terms of Service (IMPORTANT)

This server uses **Coqui TTS**, which is released under various licenses depending on the model.
- **XTTS v2** and several other models are released under the **Coqui Public Model License (CPML)**.
- **NON-COMMERCIAL USE ONLY**: Usage is free for personal and non-commercial projects.
- **COMMERCIAL USE**: If you intend to use this for commercial purposes, you **must** purchase a commercial license from Coqui (licensing@coqui.ai).
- By running the installation scripts, you acknowledge and agree to these terms.

## 🚀 Key Features

- **Hybrid Concurrency:**
  - **Hot Worker:** Primary model resident in VRAM for low-latency inference (~1.0s for a short XTTSv2 response).
  - **Cold Workers:** On-demand subprocesses spawned on the GPU when the hot lane is busy.
- **GPU Accelerated:** Native support for NVIDIA CUDA via `torch`, ensuring ultra-fast inference and high-quality synthesis.
- **OpenAI Compatible:** Native support for OpenAI parameters (`model`, `voice`, `speed`, `response_format`). Includes `GET /v1/models` for client autodiscovery.
- **Streaming:** `POST /v1/audio/speech/stream` delivers chunked WAV audio in real time via XTTS-v2's `inference_stream()` (Hot Lane only).
- **Personality Tuning:** Full control over synthesis expressiveness via parameters such as `temperature`, `top_p`, `top_k`, and the penalty settings.
- **Multilingual Excellence:** Native support for 16 languages: `en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, hu, ko, ja` (English by default).
- **Intelligent Caching:** MD5-based caching serves repeated requests almost instantly (<0.02s). Configurable TTL via `CACHE_TTL_MINUTES`.
- **Health Endpoint:** `GET /health` exposes server version, model name, and hot worker status for proxies and Docker healthchecks.
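
The caching bullet above can be illustrated with a minimal sketch. It assumes the cache key is an MD5 digest over the text, the voice, and the synthesis parameters, and that entries expire after `CACHE_TTL_MINUTES`. The function names and the default of 60 minutes are hypothetical; the server's actual key layout and default TTL are not documented here.

```python
import hashlib
import json

# Assumed default for illustration only; the real default is not stated in this README.
CACHE_TTL_MINUTES = 60


def cache_key(text: str, voice: str, params: dict) -> str:
    """Derive a deterministic MD5 key from everything that affects the audio."""
    payload = json.dumps(
        {"text": text, "voice": voice, "params": params},
        sort_keys=True,        # same parameters in any order give the same key
        ensure_ascii=False,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def is_fresh(created_at: float, now: float, ttl_minutes: int = CACHE_TTL_MINUTES) -> bool:
    """Return True while a cache entry is younger than the TTL (times in seconds)."""
    return (now - created_at) < ttl_minutes * 60
```

Sorting the keys before hashing keeps two requests with identical parameters in a different order from producing two cache entries.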

## 📦 Installation & Setup

### 1. Prerequisites (Debian/Ubuntu)
Install the following system dependencies first:
```bash
sudo apt update && sudo apt install -y espeak-ng curl file ffmpeg python3.12 python3.12-venv
```

> **Python version:** `setup.sh` requires **Python 3.12**. Python 3.13+ has no prebuilt wheels for `torch==2.9.0` or `torchcodec==0.8.1`. On systems where Python 3.12 is not the default (e.g. a release whose system `python3` is 3.13 or newer), the `python3.12` packages above install it alongside the system Python.

### 2. Unified Installation
```bash
git clone https://github.com/fakehec/coqui-tts-local-server.git
cd coqui-tts-local-server
chmod +x setup.sh
./setup.sh
```

### 3. User Permissions & Hardware Acceleration
To run the server without `sudo` privileges and enable GPU acceleration, the user must belong to the `video` and `render` groups:
```bash
sudo usermod -aG video $USER
sudo usermod -aG render $USER
```
*Note: Restart your session for changes to take effect.*

### 4. Network Permissions
The server listens on port `5100` by default. Unprivileged users can normally bind ports above 1023, so no extra permissions should be needed for this port.

### 5. Voice Provisioning
- **Standard Voices**: The server automatically provisions the 6 standard OpenAI identities (Alloy, Echo, Fable, Onyx, Nova, Shimmer) during setup.
- **Elite/Custom Voices**: Reference voice files (.wav) for custom cloning are **not provided** due to copyright. Place your samples in `assets/voices/elite/` within the project directory.
- Refer to [CLONE_VOICES.md](./CLONE_VOICES.md) for instructions on creating high-quality reference files.

## 🎭 Personality Tuning & Parameters

The server supports advanced personality parameters to tune the output voice. These can be sent via the **API (JSON or Form-data)** or set as system-wide defaults via **environment variables** (or the `.env` file).
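
One plausible reading of the precedence described above is: a value sent in the request wins, then the matching `DEFAULT_*` environment variable, then a built-in fallback. The sketch below illustrates that order only. The `resolve_param` helper, the `DEFAULT_TEMPERATURE`-style variable names, and the fallback values are assumptions, not the server's confirmed behavior.

```python
import os
from typing import Mapping, Optional

# Hypothetical built-in fallbacks; the server's real defaults are not listed here.
FALLBACKS = {"temperature": 0.75, "top_p": 0.85, "top_k": 50}


def resolve_param(
    name: str,
    request: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> float:
    """Resolve one tuning parameter: request value, then DEFAULT_* env var, then fallback."""
    env = os.environ if env is None else env
    if request.get(name) is not None:
        return float(request[name])
    env_name = f"DEFAULT_{name.upper()}"   # e.g. DEFAULT_TEMPERATURE (assumed naming)
    if env.get(env_name):
        return float(env[env_name])
    return float(FALLBACKS[name])
```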

### 🌐 Supported Languages
The following language codes are supported: `en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, hu, ko, ja`.
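
A client can validate the language code before sending a request. The sketch below uses the 16 codes listed above and falls back to English when no code is given. The `normalize_language` helper is hypothetical, and how the server itself treats an unsupported code is not documented here, so raising an error is an assumption.

```python
from typing import Optional

# The 16 codes listed in this README.
SUPPORTED_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja",
)
DEFAULT_LANGUAGE = "en"


def normalize_language(code: Optional[str]) -> str:
    """Return a supported lowercase code, defaulting to English when none is given."""
    if code is None or not code.strip():
        return DEFAULT_LANGUAGE
    normalized = code.strip().lower().replace("_", "-")  # accept "ZH_CN" as "zh-cn"
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```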

## 📡 API Endpoints
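
The endpoints named under Key Features are `GET /v1/models`, `POST /v1/audio/speech/stream`, and `GET /health`. As a rough client sketch using only the standard library, the code below builds a speech request. It assumes the non-streaming route follows the OpenAI convention (`POST /v1/audio/speech` with an `input` field) and that `xtts_v2` is a valid model id; neither is confirmed here, so query `GET /v1/models` for the real ids.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5100"  # default bind address and port from this README


def build_speech_request(
    text: str,
    voice: str = "alloy",
    model: str = "xtts_v2",          # assumed model id
    response_format: str = "wav",
    speed: float = 1.0,
    stream: bool = False,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for the speech endpoint."""
    # "/v1/audio/speech" is assumed from the OpenAI API; the stream path is from Key Features.
    path = "/v1/audio/speech/stream" if stream else "/v1/audio/speech"
    body = {
        "model": model,
        "input": text,               # field name assumed from the OpenAI API
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, out_path: str, **kwargs) -> None:
    """Send the request and write the returned audio to disk in chunks."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response, open(out_path, "wb") as out:
        while chunk := response.read(8192):
            out.write(chunk)
```

With the server running, `synthesize_to_file("Hello from the local server.", "hello.wav")` would save the result. Passing `stream=True` targets the streaming route, which writes audio as chunks arrive.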

## 🔧 Troubleshooting

### Transformers Compatibility Error
The `isin_mps_friendly` compatibility fix is applied automatically as a Python monkey-patch in `main_tts.py` before any model import, and also by `setup.sh` as a fallback. No manual action is required.

## 🛠 Execution

The server uses direct **Uvicorn** execution for maximum ASGI performance.

### Manual Execution (Console)
```bash
source venv/bin/activate

# Localhost only (Default: 127.0.0.1:5100)
uvicorn main_tts:app --host 127.0.0.1 --port 5100

# Expose to Local Network (0.0.0.0)
uvicorn main_tts:app --host 0.0.0.0 --port 5100
```

### ⚙️ Environment Variables & .env

The server includes a `.env.example` file. You can create a **`.env`** file in the root directory to override default behaviors without changing the code.

*Note: All personality parameters listed in the section above can also be set via their respective `DEFAULT_*` environment variables.*

### User Service (systemd --user)
1. Create directory if it doesn't exist: `mkdir -p ~/.config/systemd/user`
2. Create: `~/.config/systemd/user/coqui-tts.service`
3. Configuration (all environment variables are loaded from your `.env` file):

```ini
[Unit]
Description=Coqui TTS Local Server
After=network.target

[Service]
Type=simple
WorkingDirectory=%h/coqui-tts-local-server
ExecStart=%h/coqui-tts-local-server/venv/bin/uvicorn main_tts:app --host 127.0.0.1 --port 5100
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
```

4. Enable and start:
```bash
systemctl --user daemon-reload
systemctl --user enable --now coqui-tts.service
```

## 🐳 Docker

### Host Prerequisites (one-time setup)

Before running `docker compose up` for the first time, complete the following one-time configuration steps on the host to enable GPU passthrough via the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) CDI mode.

> These steps are required because Docker's default legacy GPU mode relies on BPF cgroup device filters, which are not available in cgroup v2 environments (Ubuntu 22.04+). CDI solves this cleanly.

**1. Add the NVIDIA package repository:**
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```

**2. Install the toolkit:**
```bash
sudo apt update && sudo apt install -y nvidia-container-toolkit
```

**3. Generate the CDI spec** (exposes the GPU to containers via a stable device descriptor):
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

**4. Enable CDI in the Docker daemon:**
```bash
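# Caution: this overwrites any existing /etc/docker/daemon.json.
# If you already have one, merge the "features" key into it by hand instead.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "features": {
    "cdi": true
  }
}
EOF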
sudo systemctl restart docker
```

**5. Verify it works:**
```bash
docker run --rm --device nvidia.com/gpu=all nvidia/cuda:12.6.3-runtime-ubuntu24.04 nvidia-smi
```

> **Note:** Step 3 must be re-run if the NVIDIA driver is updated (`sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`).

### Running with Docker Compose

```bash
# Build and start (downloads model and standard voices on first run)
docker compose up -d

# Check server is ready
curl http://localhost:5100/health

# View logs (including first-run provisioning progress)
docker compose logs -f

# Stop
docker compose down
```

On first run, `entrypoint.sh` automatically:
- Downloads the `xtts_v2` model (~1.7GB) into `assets/models/`
- Downloads the 6 standard voices (alloy, echo, fable, onyx, nova, shimmer) into `assets/voices/standard/`

Both are persisted in host volumes and skipped on subsequent starts.
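
Because the first run downloads the model before the API comes up, a script that depends on the server may want to poll `GET /health` rather than sleep for a fixed time. The sketch below is one way to do that. The `wait_until_healthy` helper, its 15-minute default timeout, and the rule "HTTP 200 means ready" are assumptions, not documented behavior.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or HTTP error: not ready yet


def wait_until_healthy(
    url: str = "http://localhost:5100/health",
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the health endpoint until it answers or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `probe`, `sleep`, and `clock` arguments exist only so the loop can be exercised without a running server.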

### Elite Voices in Docker

Elite/custom voices are not provisioned automatically. Mount them into the container by placing your `.wav` files in `assets/voices/elite/` on the host — the volume mapping `./assets/voices:/app/assets/voices` picks them up automatically without rebuilding the image.

## 🔒 Security & Network Note
By default, the server binds to **`127.0.0.1`** on port **`5100`**.
- To allow external network access, modify the `--host` parameter to `0.0.0.0` in the execution command or systemd unit.
- **WARNING**: This API **does not have authentication**. Exposing it to the network via `0.0.0.0` represents a security risk. Ensure the server is protected by a firewall or operating within a secure VPN/Local Network.
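
A launcher script can check the `--host` value before starting the server. The sketch below is a hypothetical helper, not part of this project. It treats anything other than a loopback address as exposed, which includes `0.0.0.0`.

```python
import ipaddress


def is_loopback_only(host: str) -> bool:
    """Return True when `host` limits the server to the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (an arbitrary hostname): treat it as exposed to be safe.
        return False


def exposure_warning(host: str) -> str:
    """Return a warning string for non-loopback hosts, or an empty string otherwise."""
    if is_loopback_only(host):
        return ""
    return f"WARNING: binding to {host} exposes an unauthenticated API to the network."
```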

## 📊 Performance (NVIDIA RTX 5090)
| Task | Latency (Hot Lane) | Latency (Cold Lane) |
| :--- | :--- | :--- |
| Short Response (XTTSv2) | **~1.0s** | ~19s (Cold load) |
| Cached Response | **<0.02s** | <0.02s |
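
The gap between the two latency columns follows from the dispatch rule described under Key Features: a request takes the hot lane when it is free and otherwise falls back to a cold worker that must load the model first. The sketch below shows that rule with a non-blocking lock. The `HybridDispatcher` class is hypothetical, and the real server spawns a subprocess for the cold lane, where this sketch simply calls a function.

```python
import threading
from typing import Callable


class HybridDispatcher:
    """Route a request to the hot lane if it is free, else to a cold worker."""

    def __init__(self, hot: Callable[[str], bytes], cold: Callable[[str], bytes]) -> None:
        self._hot = hot                      # model already resident in VRAM
        self._cold = cold                    # stands in for an on-demand subprocess
        self._hot_lock = threading.Lock()    # held while the hot lane is busy

    def synthesize(self, text: str) -> tuple[str, bytes]:
        # Non-blocking acquire: never queue behind a busy hot lane.
        if self._hot_lock.acquire(blocking=False):
            try:
                return "hot", self._hot(text)
            finally:
                self._hot_lock.release()
        return "cold", self._cold(text)
```

Trying the lock without blocking keeps a second concurrent request from waiting on the first. It pays the cold-load cost instead.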

## 🛡 License
GNU GPL v3. Maintainers: Hugo L. Espuny & J.A.R.V.I.S. A.I.

## ☕ Support

If this project is useful to you, consider supporting its development:

- **Bitcoin (BTC):** `38jJyMomtUqhCjuNJ9VxKpgEyMyx37Zqix`
- **Monero (XMR):** `82bbUZdkMXUPAma4ioTuZNcJgTh8YTv4XNUwPy6T28kYJWCfeGgV79AZb7amCszFXeBaa5u595cQBVjFS4PkBGim56ap7Ej`
