https://github.com/resemble-ai/chatterbox

SoTA open-source TTS

Last synced: 9 months ago

Host: GitHub
URL: https://github.com/resemble-ai/chatterbox
Owner: resemble-ai
License: mit
Created: 2025-04-23T08:16:38.000Z (11 months ago)
Default Branch: master
Last Pushed: 2025-06-04T13:54:02.000Z (9 months ago)
Last Synced: 2025-06-07T05:09:00.063Z (9 months ago)
Language: Python
Homepage: https://resemble-ai.github.io/chatterbox_demopage/
Size: 43.9 KB
Stars: 5,924
Watchers: 53
Forks: 653
Open Issues: 65
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

AiTreasureBox - resemble-ai/chatterbox - SoTA open-source TTS (Repos)
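StarryDivineSky - resemble-ai/chatterbox - ai/chatterbox is an open-source text-to-speech (TTS) system built on current techniques that aims to provide high-quality, natural, fluent speech synthesis. Its core features include multilingual speech generation, real-time synthesis, and accurate handling of complex text, converting input text into audio files that sound close to human speech. It relies on deep learning models, combining speech waveform generation with acoustic model optimization, and is trained on large amounts of speech data to improve the naturalness and accuracy of the synthesized speech. The project provides a simple API that developers can quickly integrate into applications, and it supports custom voice styles and timbre adjustment. chatterbox also includes an efficient audio processing module that converts text in real time and produces output files in several formats, which suits virtual assistants, audiobook production, voice interaction systems, and similar scenarios. The code is clearly structured and fully documented, which makes secondary development and model tuning easier. The project team continues to update model parameters and training data to keep the synthesized speech state of the art and stable, and gathers user feedback through the open-source community to improve features. To use it, install the dependencies with pip and call the API as shown in the example code to convert text to speech. (Speech synthesis / Resource transfer and download)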
awesome-tts-colab - GitHub Link
ai-game-devtools - Chatterbox - grade open-source TTS model. (Speech / LLM & Tool)
awesome-repositories - resemble-ai/chatterbox - SoTA open-source TTS (Python)

README

          


# Chatterbox TTS

[![Alt Text](https://img.shields.io/badge/listen-demo_samples-blue)](https://resemble-ai.github.io/chatterbox_demopage/)

[![Alt Text](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/ResembleAI/Chatterbox)

[![Alt Text](https://static-public.podonos.com/badges/insight-on-pdns-sm-dark.svg)](https://podonos.com/resembleai/chatterbox)

[![Discord](https://img.shields.io/discord/1377773249798344776?label=join%20discord&logo=discord&style=flat)](https://discord.gg/rJq9cRJBJ6)

_Made with ♥️ by [Resemble AI](https://resemble.ai)_

We're excited to introduce Chatterbox, [Resemble AI's](https://resemble.ai) first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support **emotion exaggeration control**, a powerful feature that makes your voices stand out. Try it now on our [Hugging Face Gradio app.](https://huggingface.co/spaces/ResembleAI/Chatterbox)

If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low latency (under 200 ms), which makes it well suited to production use in agents, applications, or interactive media.

# Key Details

- SoTA zero-shot TTS

- 0.5B Llama backbone

- Unique exaggeration/intensity control

- Ultra-stable with alignment-informed inference

- Trained on 0.5M hours of cleaned data

- Watermarked outputs

- Easy voice conversion script

- [Outperforms ElevenLabs](https://podonos.com/resembleai/chatterbox)

# Tips

- **General Use (TTS and Voice Agents):**

  - The default settings (`exaggeration=0.5`, `cfg_weight=0.5`) work well for most prompts.

  - If the reference speaker has a fast speaking style, lowering `cfg_weight` to around `0.3` can improve pacing.

- **Expressive or Dramatic Speech:**

  - Try lower `cfg_weight` values (e.g. `~0.3`) and increase `exaggeration` to around `0.7` or higher.

  - Higher `exaggeration` tends to speed up speech; reducing `cfg_weight` helps compensate with slower, more deliberate pacing.
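The tips above can be sketched as a small helper that returns keyword arguments for `model.generate`. This is an illustration only: the helper names `generation_settings` and `synthesize` are hypothetical, the values are taken directly from the tips, and it assumes `generate` accepts `exaggeration` and `cfg_weight` as keyword arguments, as the default settings quoted above suggest.

```python
# Sketch only: `generation_settings` and `synthesize` are illustrative helpers,
# not part of the Chatterbox API.

DEFAULTS = {"exaggeration": 0.5, "cfg_weight": 0.5}


def generation_settings(expressive=False, fast_reference=False):
    """Return keyword arguments for `model.generate` based on the tips above."""
    settings = dict(DEFAULTS)
    if expressive:
        # Higher exaggeration tends to speed speech up; a lower cfg_weight
        # compensates with slower, more deliberate pacing.
        settings["exaggeration"] = 0.7
        settings["cfg_weight"] = 0.3
    elif fast_reference:
        # Fast-talking reference speaker: lower cfg_weight to improve pacing.
        settings["cfg_weight"] = 0.3
    return settings


def synthesize(text, out_path, audio_prompt_path=None, **flags):
    """Generate speech with the chosen settings (needs chatterbox-tts and a CUDA GPU)."""
    # Imported lazily so the settings helper works without the model installed.
    import torchaudio as ta
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device="cuda")
    wav = model.generate(
        text,
        audio_prompt_path=audio_prompt_path,
        **generation_settings(**flags),
    )
    ta.save(out_path, wav, model.sr)
```

For example, `synthesize("I can't believe it!", "dramatic.wav", expressive=True)` would apply the expressive settings. Treat `0.7` and `0.3` as starting points and adjust them by ear.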

# Installation

```shell
pip install chatterbox-tts
```

Alternatively, you can install from source:

```shell
# conda create -yn chatterbox python=3.11
# conda activate chatterbox

git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .
```

We developed and tested Chatterbox on Python 3.11 on Debian 11; the dependency versions are pinned in `pyproject.toml` to ensure consistency. In this installation mode you can modify the code or the dependencies.

# Usage

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
```

See `example_tts.py` and `example_vc.py` for more examples.
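The usage snippet hardcodes `device="cuda"`. A minimal sketch of choosing the device at run time follows. The helper names `pick_device` and `detect_device` are hypothetical, and whether Chatterbox runs acceptably on `"mps"` or `"cpu"` is an assumption that this README does not confirm.

```python
# Sketch only: `pick_device` and `detect_device` are illustrative helpers,
# not part of the Chatterbox API.


def pick_device(cuda_available, mps_available=False):
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Query PyTorch for the available backends (requires torch)."""
    # Imported lazily so `pick_device` can be used without torch installed.
    import torch

    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)
```

With these helpers, the model could be loaded as `ChatterboxTTS.from_pretrained(device=detect_device())`.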

# Supported Language

Currently only English.

# Acknowledgements

- [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)

- [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)

- [HiFT-GAN](https://github.com/yl4579/HiFTNet)

- [Llama 3](https://github.com/meta-llama/llama3)

- [S3Tokenizer](https://github.com/xingchensong/S3Tokenizer)

# Built-in PerTh Watermarking for Responsible AI

Every audio file generated by Chatterbox includes [Resemble AI's Perth (Perceptual Threshold) Watermarker](https://github.com/resemble-ai/perth) - imperceptible neural watermarks that survive MP3 compression, audio editing, and common manipulations while maintaining nearly 100% detection accuracy.

## Watermark extraction

You can look for the watermark using the following script.

```python
import perth
import librosa

AUDIO_PATH = "YOUR_FILE.wav"

# Load the watermarked audio
watermarked_audio, sr = librosa.load(AUDIO_PATH, sr=None)

# Initialize watermarker (same as used for embedding)
watermarker = perth.PerthImplicitWatermarker()

# Extract watermark
watermark = watermarker.get_watermark(watermarked_audio, sample_rate=sr)
print(f"Extracted watermark: {watermark}")
# Output: 0.0 (no watermark) or 1.0 (watermarked)
```
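If you need a yes/no answer for a file, the extraction script above can be wrapped as in the sketch below. The helper names `is_watermarked` and `check_file` are hypothetical, and the `0.5` threshold is an assumption chosen only because the script reports `0.0` or `1.0`.

```python
# Sketch only: `is_watermarked` and `check_file` are illustrative helpers,
# not part of the perth package.


def is_watermarked(score, threshold=0.5):
    """Interpret the extracted value: 1.0 means watermarked, 0.0 means not."""
    # The 0.5 threshold is an assumption; the script above reports 0.0 or 1.0.
    return float(score) >= threshold


def check_file(path):
    """Load an audio file and report whether a Perth watermark was detected."""
    # Imported lazily so `is_watermarked` can be used without these packages.
    import librosa
    import perth

    audio, sr = librosa.load(path, sr=None)
    watermarker = perth.PerthImplicitWatermarker()
    return is_watermarked(watermarker.get_watermark(audio, sample_rate=sr))
```

Calling `check_file("test-1.wav")` on a file produced by the usage example should then return `True` if the watermark survived.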

# Official Discord

👋 Join us on [Discord](https://discord.gg/rJq9cRJBJ6) and let's build something awesome together!

# Disclaimer

Don't use this model to do bad things. Prompts are sourced from freely available data on the internet.
