https://github.com/mlang/llm-tts
Streaming playback for the OpenAI Text-to-Speech API
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/mlang/llm-tts
- Owner: mlang
- License: apache-2.0
- Created: 2025-06-24T15:44:59.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2025-08-09T17:19:02.000Z (10 months ago)
- Last Synced: 2025-08-09T19:13:22.817Z (10 months ago)
- Language: Python
- Size: 12.7 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# llm-tts
> Streaming playback for Text-to-Speech APIs and local models
Provides a subcommand `llm tts` as well as a `tts` tool.
Uses FFmpeg or GStreamer for real-time playback.
## Installation
If you have FFmpeg installed:
```bash
llm install git+https://github.com/mlang/llm-tts
```
If you prefer to use GStreamer:
```bash
sudo apt install cairo-1.0 gstreamer-1.0 girepository-2.0
llm install 'git+https://github.com/mlang/llm-tts#[gstreamer]'
```
## Usage
```bash
llm tts 'Hello, World!'
llm "A short poem" | llm tts --instructions poetic
```
By default, `llm-tts` will try to use FFmpeg for real-time playback.
If FFmpeg doesn’t work for you, try GStreamer instead:
```bash
llm tts --play gstreamer:alsasink "Advanced Linux Sound Architecture"
```
To let the chat model pass speaking instructions along to the TTS model, save a schema as a template and pipe the resulting JSON into `llm tts --json`:
```bash
llm --schema "text: Text to speak, instructions: How to speak the given text" --save tts
llm -t tts "A lovely poem" | llm tts --json
```
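For orientation, the schema above should make the chat model emit a JSON object with a `text` key and an `instructions` key. The Python sketch below shows that assumed shape and how a consumer might split it. The helper name `split_tts_payload` is hypothetical and is not taken from the plugin's source, and treating `instructions` as optional is an assumption.

```python
import json
from typing import Optional, Tuple


def split_tts_payload(raw: str) -> Tuple[str, Optional[str]]:
    """Split a JSON object into (text, instructions).

    Assumes the payload follows the schema saved above: a "text" key
    and an "instructions" key (treated here as optional).
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict) or "text" not in payload:
        raise ValueError('expected a JSON object with a "text" key')
    return payload["text"], payload.get("instructions")


# Illustrative payload, not real model output.
example = '{"text": "Roses are red.", "instructions": "Warm and slow"}'
text, instructions = split_tts_payload(example)
print(text)
print(instructions)
```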
## Supported TTS back-ends
### OpenAI
* Model names: `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`
* Requires `OPENAI_API_KEY`.
### ElevenLabs
* Model name: `eleven_multilingual_v2`
* `pip install elevenlabs` and set `ELEVENLABS_API_KEY`.
### Hugging Face / transformers (local)
* Model names: `facebook/mms-tts-eng`, `suno/bark`, `suno/bark-small`
* `pip install transformers sentencepiece`.
### Piper / Mimic3 (fully offline)
* Model names start with `piper/`, e.g. `piper/en_US-amy-low`
* `pip install piper-tts` – the first run auto-downloads the voice file.
### OuteTTS
* Model names: `OuteTTS-1.0-0.6B`, `Llama-OuteTTS-1.0-1B`
* `pip install outetts`.
### Silero (offline)
* Model names such as `silero/v3_en`, `silero/v4_ru`, …
* `pip install torch` (models are fetched via `torch.hub`).
Run
```bash
llm tts --list-models
```
to see every identifier currently available.
If you omit `--model`, the default is **gpt-4o-mini-tts**.
---
## Playback back-ends
### FFmpeg (default on Linux, macOS, Windows)
```bash
--play ffmpeg:FORMAT:DEVICE # e.g. --play ffmpeg:alsa:hw:0
```
### GStreamer (default on other systems)
```bash
--play gstreamer[:SINK] # list sinks with --list-sinks
```
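The two `--play` forms above share a colon-separated layout, but an FFmpeg device such as `hw:0` contains a colon itself. The Python sketch below shows one way such a spec could be split without cutting the device name. `PlaySpec` and `parse_play_spec` are hypothetical names, and this is not necessarily how the plugin parses the option.

```python
from typing import NamedTuple, Optional


class PlaySpec(NamedTuple):
    backend: str
    target: Optional[str]  # FFmpeg FORMAT or GStreamer SINK
    device: Optional[str]  # FFmpeg DEVICE; always None for GStreamer


def parse_play_spec(spec: str) -> PlaySpec:
    """Split a --play value such as 'ffmpeg:alsa:hw:0' (illustrative only)."""
    backend, _, rest = spec.partition(":")
    if backend == "ffmpeg":
        # Split only once more so device names like "hw:0" keep their colon.
        fmt, _, device = rest.partition(":")
        return PlaySpec(backend, fmt or None, device or None)
    if backend == "gstreamer":
        # The sink is optional: plain "gstreamer" is accepted too.
        return PlaySpec(backend, rest or None, None)
    raise ValueError(f"unknown playback back-end: {backend!r}")


print(parse_play_spec("ffmpeg:alsa:hw:0"))
print(parse_play_spec("gstreamer:alsasink"))
```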
You can also skip playback and write the result to a file instead:
```bash
llm tts --output-file out.mp3 "Save to disk instead of playing"
```
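If you drive `llm tts` from a script, it helps to assemble the argument list in one place. The Python sketch below uses only flags that appear in this README (`--model`, `--instructions`, `--output-file`). The helper `build_tts_command` is hypothetical, and it assumes that `--model` takes the model name as its value.

```python
import shlex
from typing import List, Optional


def build_tts_command(
    text: str,
    model: Optional[str] = None,
    instructions: Optional[str] = None,
    output_file: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `llm tts` (illustrative helper)."""
    cmd = ["llm", "tts"]
    if model:
        # Assumption: --model is followed by the model identifier.
        cmd += ["--model", model]
    if instructions:
        cmd += ["--instructions", instructions]
    if output_file:
        cmd += ["--output-file", output_file]
    cmd.append(text)
    return cmd


cmd = build_tts_command("Save to disk instead of playing", output_file="out.mp3")
# Print the shell-quoted command; nothing is executed here.
print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with the text to speak.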