https://github.com/resemble-ai/chatterbox
SoTA open-source TTS
- Host: GitHub
- URL: https://github.com/resemble-ai/chatterbox
- Owner: resemble-ai
- License: MIT
- Created: 2025-04-23T08:16:38.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-06-04T13:54:02.000Z (6 months ago)
- Last Synced: 2025-06-07T05:09:00.063Z (6 months ago)
- Language: Python
- Homepage: https://resemble-ai.github.io/chatterbox_demopage/
- Size: 43.9 KB
- Stars: 5,924
- Watchers: 53
- Forks: 653
- Open Issues: 65
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- AiTreasureBox - resemble-ai/chatterbox - SoTA open-source TTS (Repos)
- StarryDivineSky - resemble-ai/chatterbox - resemble-ai/chatterbox is an open-source text-to-speech (TTS) system built on recent techniques, aiming to provide high-quality, natural-sounding speech synthesis. Its highlighted features include multilingual speech generation, real-time synthesis, and accurate handling of complex text, converting input into audio that approaches human pronunciation. It works on deep-learning models that combine waveform generation with acoustic-model optimization, trained on large amounts of speech data to improve naturalness and accuracy. The project offers a simple API that developers can quickly integrate into applications, along with support for custom voice styles and timbre adjustment. It also includes an efficient audio-processing module that converts text in real time and produces output in multiple formats, making it suitable for virtual assistants, audiobook production, and voice-interaction systems. The code is clearly structured and well documented, which makes secondary development and model tuning straightforward. The team continues to update model parameters and training data to keep the synthesized speech state-of-the-art and stable, and gathers user feedback through the open-source community to improve features. To use it, install the dependencies via pip and follow the example code to call the API for text-to-speech conversion. (Speech synthesis / resource transfer and download)
- awesome-tts-colab - GitHub Link
- ai-game-devtools - Chatterbox - Production-grade open-source TTS model. (Speech / LLM & Tool)
README

# Chatterbox TTS
[Demo page](https://resemble-ai.github.io/chatterbox_demopage/)
[Hugging Face Space](https://huggingface.co/spaces/ResembleAI/Chatterbox)
[Evaluation results](https://podonos.com/resembleai/chatterbox)
[Discord](https://discord.gg/rJq9cRJBJ6)
We're excited to introduce Chatterbox, [Resemble AI's](https://resemble.ai) first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support **emotion exaggeration control**, a powerful feature that makes your voices stand out. Try it now on our [Hugging Face Gradio app.](https://huggingface.co/spaces/ResembleAI/Chatterbox)
If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low, sub-200 ms latency, ideal for production use in agents, applications, or interactive media.
# Key Details
- SoTA zero-shot TTS
- 0.5B Llama backbone
- Unique exaggeration/intensity control
- Ultra-stable with alignment-informed inference
- Trained on 0.5M hours of cleaned data
- Watermarked outputs
- Easy voice conversion script
- [Outperforms ElevenLabs](https://podonos.com/resembleai/chatterbox)
# Tips
- **General Use (TTS and Voice Agents):**
- The default settings (`exaggeration=0.5`, `cfg_weight=0.5`) work well for most prompts; a code sketch passing these settings to `generate()` follows this list.
- If the reference speaker has a fast speaking style, lowering `cfg_weight` to around `0.3` can improve pacing.
- **Expressive or Dramatic Speech:**
- Try lower `cfg_weight` values (e.g. `~0.3`) and increase `exaggeration` to around `0.7` or higher.
- Higher `exaggeration` tends to speed up speech; reducing `cfg_weight` helps compensate with slower, more deliberate pacing.
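Both knobs are passed directly to `generate()`. Below is a minimal sketch contrasting the default and the more dramatic values suggested above; the `exaggeration` and `cfg_weight` keyword arguments are taken from these tips, so check `example_tts.py` if your version's signature differs.
```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Hold the line! This is our last stand, and we will not falter."

# Default, balanced delivery (the settings suggested above).
wav = model.generate(text, exaggeration=0.5, cfg_weight=0.5)
ta.save("neutral.wav", wav, model.sr)

# More dramatic read: higher exaggeration, lower cfg_weight to slow the pacing.
wav = model.generate(text, exaggeration=0.7, cfg_weight=0.3)
ta.save("dramatic.wav", wav, model.sr)
```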
# Installation
```shell
pip install chatterbox-tts
```
Alternatively, you can install from source:
```shell
# conda create -yn chatterbox python=3.11
# conda activate chatterbox
git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .
```
We developed and tested Chatterbox on Python 3.11 on Debian 11; dependency versions are pinned in `pyproject.toml` to ensure consistency. Installing from source in editable mode also lets you modify the code or dependencies.
# Usage
```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
model = ChatterboxTTS.from_pretrained(device="cuda")
text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)
# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
```
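The example above assumes a CUDA GPU. If one isn't available, a simple fallback is to pick the device at runtime; this is a small convenience sketch (assuming the model also loads on CPU, where inference will be noticeably slower), not part of the documented API.
```python
import torch
from chatterbox.tts import ChatterboxTTS

# Fall back to CPU when CUDA is unavailable (assumed to work, just slower).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = ChatterboxTTS.from_pretrained(device=device)
```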
See `example_tts.py` and `example_vc.py` for more examples.
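The voice conversion path mentioned in the key details is covered by `example_vc.py`. As a rough sketch of what that script does, the snippet below converts an input recording to a target voice; the `chatterbox.vc` module path, the `ChatterboxVC` class, and the `audio`/`target_voice_path` argument names are assumptions here, so treat `example_vc.py` as the authoritative reference.
```python
import torchaudio as ta
from chatterbox.vc import ChatterboxVC  # assumed module path; see example_vc.py

model = ChatterboxVC.from_pretrained(device="cuda")

# Convert the speech in YOUR_INPUT.wav so it sounds like the speaker in
# YOUR_TARGET_VOICE.wav (argument names are assumptions; see example_vc.py).
wav = model.generate(
    audio="YOUR_INPUT.wav",
    target_voice_path="YOUR_TARGET_VOICE.wav",
)
ta.save("converted.wav", wav, model.sr)
```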
# Supported Languages
Currently, only English is supported.
# Acknowledgements
- [Cosyvoice](https://github.com/FunAudioLLM/CosyVoice)
- [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)
- [HiFT-GAN](https://github.com/yl4579/HiFTNet)
- [Llama 3](https://github.com/meta-llama/llama3)
- [S3Tokenizer](https://github.com/xingchensong/S3Tokenizer)
# Built-in PerTh Watermarking for Responsible AI
Every audio file generated by Chatterbox includes [Resemble AI's Perth (Perceptual Threshold) Watermarker](https://github.com/resemble-ai/perth) - imperceptible neural watermarks that survive MP3 compression, audio editing, and common manipulations while maintaining nearly 100% detection accuracy.
## Watermark extraction
You can look for the watermark using the following script.
```python
import perth
import librosa
AUDIO_PATH = "YOUR_FILE.wav"
# Load the watermarked audio
watermarked_audio, sr = librosa.load(AUDIO_PATH, sr=None)
# Initialize watermarker (same as used for embedding)
watermarker = perth.PerthImplicitWatermarker()
# Extract watermark
watermark = watermarker.get_watermark(watermarked_audio, sample_rate=sr)
print(f"Extracted watermark: {watermark}")
# Output: 0.0 (no watermark) or 1.0 (watermarked)
```
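For completeness, here is a hedged sketch of embedding the same kind of watermark into an arbitrary audio file with the Perth watermarker. The `apply_watermark` method and its `watermark`/`sample_rate` arguments are assumptions based on the Perth project rather than this README, so verify against the perth documentation before relying on them.
```python
import perth
import librosa
import soundfile as sf

IN_PATH = "YOUR_FILE.wav"      # hypothetical input path
OUT_PATH = "YOUR_FILE_wm.wav"  # hypothetical output path

# Load the audio to be watermarked
audio, sr = librosa.load(IN_PATH, sr=None)

# Same watermarker class as in the extraction example above
watermarker = perth.PerthImplicitWatermarker()

# Assumed API: embed the default (implicit) watermark into the signal
watermarked = watermarker.apply_watermark(audio, watermark=None, sample_rate=sr)

# Write the watermarked audio back to disk
sf.write(OUT_PATH, watermarked, sr)
```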
# Official Discord
👋 Join us on [Discord](https://discord.gg/rJq9cRJBJ6) and let's build something awesome together!
# Disclaimer
Don't use this model to do bad things. Prompts are sourced from freely available data on the internet.