https://github.com/tcsenpai/audiocoqui
A multilingual tool to convert PDF ebooks into audiobooks using the XTTS v2 TTS model by cloning a speaker voice.
- Host: GitHub
- URL: https://github.com/tcsenpai/audiocoqui
- Owner: tcsenpai
- License: other
- Created: 2025-01-22T18:09:51.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2025-01-22T19:23:30.000Z (9 months ago)
- Last Synced: 2025-04-15T11:05:16.084Z (6 months ago)
- Topics: ai, audiobooks, conversion, ebook, pdf, tts, utility
- Language: Python
- Homepage:
- Size: 3.07 MB
- Stars: 13
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# audiocoqui
A multilingual tool to convert PDF ebooks into audiobooks using the XTTS v2 TTS model by cloning a speaker voice.

## Features
- [x] Clone speaker voice
- [x] Keep sentences together
- [x] Clean text input (remove line breaks, etc.)
- [x] Split PDF into pages
- [x] Convert pages to audio
- [x] Add silence between sections
- [x] Concatenate audio files automatically
- [x] Progress tracking and journaling with crash recovery
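
Most of the features above reduce to three stages: split the PDF into per-page text, clean and chunk that text without breaking sentences, then synthesize and join the audio. Below is a rough sketch of the text stage, assuming pypdf and simple regex cleanup rather than the project's actual code:

```python
# Rough sketch of page extraction and text cleanup (illustrative only;
# the library choice and chunking rules are assumptions, not the project's code).
import re
from pypdf import PdfReader

def extract_clean_pages(pdf_path: str) -> list[str]:
    """Return one cleaned text string per PDF page."""
    pages = []
    for page in PdfReader(pdf_path).pages:
        text = page.extract_text() or ""
        # Remove hard line breaks and collapse whitespace so sentences stay together.
        text = re.sub(r"\s+", " ", text).strip()
        pages.append(text)
    return pages

def chunk_sentences(text: str, max_chars: int = 250) -> list[str]:
    """Group whole sentences into chunks short enough for the TTS model."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```
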
## Requirements
- Python 3.10+
- `pip install -r requirements.txt`

## Setup
- Copy .env.example to .env and fill in the missing values
- Download a speaker voice sample and put it in the `source_audio` folder (any `.wav` file longer than 10 seconds should work)
- Put your PDF in the folder specified in the `.env` file
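
The speaker sample from the setup above is what XTTS v2 clones. A minimal sketch of synthesizing one text chunk with a cloned voice, assuming the standard Coqui TTS Python API (paths are placeholders; the script's actual internals may differ):

```python
# Voice-cloning sketch using the Coqui TTS Python API (assumption: this mirrors,
# but is not copied from, what src/main.py does internally).
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# XTTS v2 multilingual model; roughly a 2 GB download on first use.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

tts.tts_to_file(
    text="Hello, this sentence will be spoken in the cloned voice.",
    speaker_wav="source_audio/speaker.wav",   # your >10 s reference sample
    language="en",                            # XTTS v2 is multilingual
    file_path="audio_pages/page_001_chunk_000.wav",
)
```
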
## Usage
NOTE: After you're done, you can and should clean up the output by removing the `audio_pages` folder (see the `clean_output` file for an example)
- `python src/main.py`
## Expected output
- A folder with all the audio pages of the PDF, and their chunks if split.
- A final audiobook file as specified in the .env file.
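
That final file is just the per-page .wav files joined with short silences in between. A minimal sketch of such a concatenation step, assuming pydub (the project's own concatenation code may use a different library, and the real paths come from `.env`):

```python
# Illustrative concatenation step using pydub (an assumption, not necessarily
# what this project uses). Joins all page .wav files with silence in between.
from pathlib import Path
from pydub import AudioSegment

PAGES_DIR = Path("audio_pages")       # placeholder folder name
OUTPUT_FILE = "audiobook.wav"         # placeholder output name
silence = AudioSegment.silent(duration=700)  # ~0.7 s pause between sections

audiobook = AudioSegment.empty()
for wav in sorted(PAGES_DIR.glob("*.wav")):
    audiobook += AudioSegment.from_wav(str(wav)) + silence

audiobook.export(OUTPUT_FILE, format="wav")
print(f"Wrote {OUTPUT_FILE} ({len(audiobook) / 1000:.1f} s)")
```
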
## FAQ
### GPU or CPU?
While the model can run on a CPU, a GPU is recommended for faster processing.
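
You can verify up front whether the faster GPU path is available before starting a long conversion:

```python
import torch

# True means XTTS v2 can run on your GPU; otherwise you are limited to the slower CPU path.
print("CUDA available:", torch.cuda.is_available())
```
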
### Model Size
The model is slightly smaller than 2 GB, so it's recommended to use a GPU with at least 4 GB of VRAM, or to make sure your RAM is large enough to hold the model.
### Why we use a lot of small .wav files
We use many small .wav files to enable crash recovery, avoid corruption, and make progress tracking more reliable.
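
In practice, that kind of recovery can be as simple as skipping any chunk whose .wav file already exists on disk. A rough sketch of the pattern, with placeholder names rather than the project's actual identifiers:

```python
# Illustrative crash-recovery loop: chunks already written are skipped, so a
# restarted run resumes where the previous one stopped. Names are placeholders.
from pathlib import Path

def synthesize_all(chunks: list[str], out_dir: Path, render_chunk) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, chunk in enumerate(chunks):
        wav_path = out_dir / f"chunk_{i:04d}.wav"
        if wav_path.exists():          # already rendered before a crash/restart
            continue
        tmp_path = wav_path.with_suffix(".wav.part")
        render_chunk(chunk, tmp_path)  # e.g. a tts_to_file(...) call
        tmp_path.rename(wav_path)      # never leaves a half-written .wav behind
```
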
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.
## Credits
- [Coqui XTTS v2](https://huggingface.co/coqui/XTTS-v2)
- [PyTorch](https://pytorch.org/)
- All the authors of the libraries used