https://github.com/solveditnpc/zonos-audiobook

Zonos-v0.1 text-to-speech (TTS) model trained on more than 200k hours of varied multilingual speech, with additional support for converting PDFs into audiobooks. The original Zonos release repository is linked under Homepage below.


Last synced: 7 months ago

Host: GitHub
URL: https://github.com/solveditnpc/zonos-audiobook
Owner: solveditnpc
License: apache-2.0
Created: 2025-02-15T03:46:28.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-03-10T13:44:59.000Z (7 months ago)
Last Synced: 2025-03-10T14:24:47.356Z (7 months ago)
Topics: 2025, audio-book, audio-book-converter, deep-learning, extension-repository, natural-language-processing, no-context-length-limit, offline-tts, pdf-to-speech, python, text-to-speech, zonos
Language: Python
Homepage: https://github.com/Zyphra/Zonos.git
Size: 1.65 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Zonos-v0.1

---

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.

Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.

##### For more details and speech samples, check out our blog [here](https://www.zyphra.com/post/beta-release-of-zonos-v0-1)

##### We also have a hosted version available at [maia.zyphra.com/audio](https://maia.zyphra.com/audio)

---

Zonos follows a straightforward architecture: text normalization and phonemization via eSpeak, followed by DAC token prediction through a transformer or hybrid backbone. An overview of the architecture can be seen below.

---

## Usage

### PDF to Audio Conversion

The repository includes a script to convert PDF documents to audio files. To use it:

0. Optionally, add an audio file of the voice you want the model to speak in to the [assets](assets/exampleaudio.mp3) folder. The file must be named exactly `exampleaudio.mp3`.
1. Place your PDF files in the `input` folder
2. Run the conversion script:
```bash
uv run audio_book.py
```
3. Follow the interactive prompts to:
- Select a PDF file (if multiple files are present)
- Choose the page range to convert
- Wait for the conversion to complete

The script will create audio files in the `output` folder, with filenames indicating the page range (e.g., `document_pages_1-5.wav`).
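As an illustration of the naming scheme, the sketch below builds an output path from a PDF name and a page range. The helper name `output_path_for` and the exact pattern are assumptions inferred from the example filename above; they are not taken from `audio_book.py`.

```python
from pathlib import Path


def output_path_for(pdf_path: str, first_page: int, last_page: int,
                    output_dir: str = "output") -> Path:
    """Build an output path such as output/document_pages_1-5.wav.

    Hypothetical helper: the pattern is inferred from the README example,
    not copied from audio_book.py.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("page range must satisfy 1 <= first_page <= last_page")
    stem = Path(pdf_path).stem  # "input/document.pdf" -> "document"
    return Path(output_dir) / f"{stem}_pages_{first_page}-{last_page}.wav"


print(output_path_for("input/document.pdf", 1, 5).as_posix())
# output/document_pages_1-5.wav
```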

Features:
- Interactive PDF selection
- Page range selection
- Intelligent text chunking for natural speech
- Progress tracking
- Uses the same high-quality voice cloning as the base model
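
The text chunking mentioned in the list above can be approximated by grouping whole sentences into size-limited chunks, so that each chunk passed to the model ends on a sentence boundary. This is a minimal sketch under that assumption; the function name `chunk_text`, the `max_chars` limit, and the sentence-splitting rule are hypothetical and may differ from what `audio_book.py` actually does.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical sketch: a single sentence longer than max_chars is kept
    whole rather than split mid-sentence.
    """
    # Collapse line breaks and repeated spaces left over from PDF extraction.
    cleaned = " ".join(text.split())
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))
# ['One. Two.', 'Three.']
```

Splitting on sentence boundaries rather than at a fixed character offset avoids cutting a sentence in half, which would otherwise produce an audible break in the generated speech.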

### Gradio interface (does not include PDF to audio conversion)

```bash
uv run gradio_interface.py
# python gradio_interface.py
```

Running the minimal example (`uv run sample.py`, described under Installation) should produce a `sample.wav` file in your project root directory.

_For repeated sampling we highly recommend using the gradio interface instead, as the minimal example needs to load the model every time it is run._

## Features

- Zero-shot TTS with voice cloning: Input desired text and a 10-30s speaker sample to generate high quality TTS output
- Audio prefix inputs: Add text plus an audio prefix for even richer speaker matching. Audio prefixes can be used to elicit behaviours such as whispering which can otherwise be challenging to replicate when cloning from speaker embeddings
- Multilingual support: Zonos-v0.1 supports English, Japanese, Chinese, French, and German
- Audio quality and emotion control: Zonos offers fine-grained control of many aspects of the generated audio. These include speaking rate, pitch, maximum frequency, audio quality, and various emotions such as happiness, anger, sadness, and fear.
- Fast: our model runs with a real-time factor of ~2x on an RTX 4090 (i.e. generates 2 seconds of audio per 1 second of compute time)
- PDF to Audio conversion: Convert PDF documents to natural-sounding audio files with support for page range selection and intelligent text chunking
- Gradio WebUI: Zonos comes packaged with an easy to use gradio interface to generate speech
- Simple installation and deployment: Zonos can be installed and deployed using the Dockerfile packaged with the repository.
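
To make the real-time factor in the list above concrete, the sketch below estimates compute time for a given audio duration. It assumes the definition given there (seconds of audio generated per second of compute); the function name is hypothetical, and actual throughput depends on hardware and does not include model loading time.

```python
def estimated_compute_seconds(audio_seconds: float,
                              real_time_factor: float = 2.0) -> float:
    """Estimate compute time needed to generate audio_seconds of speech.

    real_time_factor is seconds of audio produced per second of compute,
    e.g. ~2.0 as quoted for an RTX 4090.
    """
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return audio_seconds / real_time_factor


# One hour of audio at ~2x real time needs roughly half an hour of compute.
print(estimated_compute_seconds(3600))  # 1800.0
```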

## Installation

**At the moment this repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 4GB+ VRAM).**

See also [Docker Installation](#docker-installation)

#### System dependencies

**Ensure that the Python version is 3.11.x or 3.12.x for an error-free installation.**

Zonos requires the following system dependencies:

```bash
# For phonemization
apt install -y espeak-ng
```
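
Before installing the Python dependencies, the two prerequisites above can be checked with a short script. This is a sketch using only the standard library; the function names are hypothetical, and it assumes the `espeak-ng` package places an `espeak-ng` executable on the `PATH`.

```python
import shutil
import sys


def python_version_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True for Python 3.11.x or 3.12.x, as recommended above."""
    return version in {(3, 11), (3, 12)}


def espeak_available() -> bool:
    """Return True if an `espeak-ng` executable is found on PATH (assumed name)."""
    return shutil.which("espeak-ng") is not None


if __name__ == "__main__":
    print("Python version OK:", python_version_supported())
    print("espeak-ng found:", espeak_available())
```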

#### Python dependencies

We highly recommend using a recent version of [uv](https://docs.astral.sh/uv/#installation) for installation. If you don't have uv installed, you can install it via pip: `pip install -U uv`.

##### Installing into a new uv virtual environment (recommended)

```bash
uv sync
uv sync --extra compile
```

##### Installing into the system/activated environment using uv

```bash
uv pip install -e .
uv pip install -e .[compile]
```

##### Confirm that it's working

For convenience we provide a minimal example to check that the installation works:

```bash
uv run sample.py
# python sample.py
```

##### Convert PDFs into audiobooks

To use a custom voice, follow the PDF to Audio Conversion steps above.

```bash
uv run audio_book.py
```

## Docker installation

```bash
git clone https://github.com/solveditnpc/zonos-audiobook.git
cd zonos-audiobook

# For gradio
docker compose up

# Or for development you can do
docker build -t zonos .
docker run -it --gpus=all --net=host -v /path/to/Zonos:/Zonos -t zonos
cd /Zonos
python sample.py # this will generate a sample.wav in /Zonos
```
