Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zelosleone/audiobook-generator
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
https://github.com/zelosleone/audiobook-generator
ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech
Last synced: 9 days ago
JSON representation
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
- Host: GitHub
- URL: https://github.com/zelosleone/audiobook-generator
- Owner: zelosleone
- Created: 2024-12-08T01:53:41.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-12-08T02:04:27.000Z (2 months ago)
- Last Synced: 2024-12-08T02:29:03.239Z (2 months ago)
- Topics: ai-audio, audiobook, cuda, gpu-acceleration, machine-learning, pdf-converter, python, pytorch, speech-synthesis, text-processing, text-to-speech
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Audiobook Generator
A high-quality text-to-speech converter that transforms PDF and TXT files into MP4 audio files. Currently using WhisperSpeech technology, but adaptable to future better models.
## Example Output
https://github.com/user-attachments/assets/637660f8-7cc8-492f-b4f4-764cbbb3d9bd
## Features
- Supports PDF and TXT input files
- GPU acceleration with CUDA support
- High-quality audio output (44.1kHz, 320kbps AAC)
- Efficient memory management and batch processing
- Multi-threaded CPU processing## Requirements
- Python 3.x
- NVIDIA GPU with CUDA support (optional)
- Minimum 4GB RAM
- Required packages listed in `requirements.txt`## Installation
1. Clone the repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```## Usage
1. Place PDF files in `PDF` directory
2. Place TXT files in `TXT` directory
3. Run:
```bash
python main.py
```
4. Find generated audio in `Audio` directory## Technical Details
### Performance Optimizations
- CUDA-aware processing with automatic GPU detection
- Dynamic batch sizing based on available VRAM/RAM
- Multi-threaded CPU processing for non-GPU operations
- Memory-efficient chunking for large documents### Audio Processing
- 44.1kHz sampling rate
- 320kbps AAC encoding
- Stereo output
- Zero-quality loss audio settings### System Architecture
- Modular pipeline design for easy model swapping
- Buffered I/O operations (1MB buffer)
- Automatic memory management with CUDA cache clearing
- Fault-tolerant processing with error handling### Resource Management
- Dynamic worker allocation based on system specs
- Configurable chunk sizes (default: 2000 tokens)
- Adaptive batch processing
- Progressive audio concatenation## Contributing
Feel free to suggest optimizations or improvements through issues or pull requests. The system is designed to be modular, allowing for easy integration of new TTS models.