https://github.com/tcsenpai/audiocoqui
A multilingual tool to convert PDF ebooks into audiobooks using the XTTS v2 TTS model by cloning a speaker voice.
- Host: GitHub
- URL: https://github.com/tcsenpai/audiocoqui
- Owner: tcsenpai
- License: other
- Created: 2025-01-22T18:09:51.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2025-01-22T19:23:30.000Z (9 months ago)
- Last Synced: 2025-04-15T11:05:16.084Z (6 months ago)
- Topics: ai, audiobooks, conversion, ebook, pdf, tts, utility
- Language: Python
- Homepage:
- Size: 3.07 MB
- Stars: 13
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# audiocoqui
A multilingual tool to convert PDF ebooks into audiobooks using the XTTS v2 TTS model by cloning a speaker voice.

## Features
- [x] Clone speaker voice
- [x] Keep sentences together
- [x] Clean text input (remove line breaks, etc.)
- [x] Split PDF into pages
- [x] Convert pages to audio
- [x] Add silence between sections
- [x] Concatenate audio files automatically
- [x] Progress tracking and journaling with crash recovery
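
Most of the features above reduce to three stages: split the PDF into per-page text, clean and chunk that text without breaking sentences, then synthesize and join the audio. Below is a rough sketch of the text stage, assuming pypdf and simple regex cleanup rather than the project's actual code:

```python
# Rough sketch of page extraction and text cleanup (illustrative only;
# the library choice and chunking rules are assumptions, not the project's code).
import re
from pypdf import PdfReader

def extract_clean_pages(pdf_path: str) -> list[str]:
    """Return one cleaned text string per PDF page."""
    pages = []
    for page in PdfReader(pdf_path).pages:
        text = page.extract_text() or ""
        # Remove hard line breaks and collapse whitespace so sentences stay together.
        text = re.sub(r"\s+", " ", text).strip()
        pages.append(text)
    return pages

def chunk_sentences(text: str, max_chars: int = 250) -> list[str]:
    """Group whole sentences into chunks short enough for the TTS model."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```
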
## Requirements
- Python 3.10+
- `pip install -r requirements.txt`

## Setup
- Copy .env.example to .env and fill in the missing values
- Download a speaker voice sample and put it in the `source_audio` folder (any `.wav` file longer than 10 seconds should work)
- Put your PDF in the folder specified in the `.env` file
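
The speaker sample from the setup above is what XTTS v2 clones. A minimal sketch of synthesizing one text chunk with a cloned voice, assuming the standard Coqui TTS Python API (paths are placeholders; the script's actual internals may differ):

```python
# Voice-cloning sketch using the Coqui TTS Python API (assumption: this mirrors,
# but is not copied from, what src/main.py does internally).
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# XTTS v2 multilingual model; roughly a 2 GB download on first use.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

tts.tts_to_file(
    text="Hello, this sentence will be spoken in the cloned voice.",
    speaker_wav="source_audio/speaker.wav",   # your >10 s reference sample
    language="en",                            # XTTS v2 is multilingual
    file_path="audio_pages/page_001_chunk_000.wav",
)
```
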
## Usage
NOTE: After you're done, you can and should clean up the output by removing the `audio_pages` folder (see the `clean_output` file for an example)
- `python src/main.py`
## Expected output
- A folder with all the audio pages of the PDF, and their chunks if split.
- A final audiobook file as specified in the .env file.
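
That final file is just the per-page .wav files joined with short silences in between. A minimal sketch of such a concatenation step, assuming pydub (the project's own concatenation code may use a different library, and the real paths come from `.env`):

```python
# Illustrative concatenation step using pydub (an assumption, not necessarily
# what this project uses). Joins all page .wav files with silence in between.
from pathlib import Path
from pydub import AudioSegment

PAGES_DIR = Path("audio_pages")       # placeholder folder name
OUTPUT_FILE = "audiobook.wav"         # placeholder output name
silence = AudioSegment.silent(duration=700)  # ~0.7 s pause between sections

audiobook = AudioSegment.empty()
for wav in sorted(PAGES_DIR.glob("*.wav")):
    audiobook += AudioSegment.from_wav(str(wav)) + silence

audiobook.export(OUTPUT_FILE, format="wav")
print(f"Wrote {OUTPUT_FILE} ({len(audiobook) / 1000:.1f} s)")
```
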
## FAQ
### GPU or CPU?
While the model can run on a CPU, a GPU is recommended for faster processing.
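
You can verify up front whether the faster GPU path is available before starting a long conversion:

```python
import torch

# True means XTTS v2 can run on your GPU; otherwise you are limited to the slower CPU path.
print("CUDA available:", torch.cuda.is_available())
```
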
### Model Size
The model is slightly smaller than 2 GB, so it's recommended to use a GPU with at least 4 GB of VRAM, or to make sure your RAM is large enough to hold the model.
### Why we use a lot of small .wav files
We use many small .wav files to enable crash recovery, avoid corruption, and make progress tracking more reliable.
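
In practice, that kind of recovery can be as simple as skipping any chunk whose .wav file already exists on disk. A rough sketch of the pattern, with placeholder names rather than the project's actual identifiers:

```python
# Illustrative crash-recovery loop: chunks already written are skipped, so a
# restarted run resumes where the previous one stopped. Names are placeholders.
from pathlib import Path

def synthesize_all(chunks: list[str], out_dir: Path, render_chunk) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, chunk in enumerate(chunks):
        wav_path = out_dir / f"chunk_{i:04d}.wav"
        if wav_path.exists():          # already rendered before a crash/restart
            continue
        tmp_path = wav_path.with_suffix(".wav.part")
        render_chunk(chunk, tmp_path)  # e.g. a tts_to_file(...) call
        tmp_path.rename(wav_path)      # never leaves a half-written .wav behind
```
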
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.
## Credits
- [Coqui XTTS v2](https://huggingface.co/coqui/XTTS-v2)
- [PyTorch](https://pytorch.org/)
- All the authors of the libraries used