https://github.com/zelosleone/audiobook-generator

A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech

Last synced: 3 months ago
Host: GitHub
URL: https://github.com/zelosleone/audiobook-generator
Owner: zelosleone
Created: 2024-12-08T01:53:41.000Z (7 months ago)
Default Branch: main
Last Pushed: 2024-12-08T02:04:27.000Z (7 months ago)
Last Synced: 2025-03-29T07:44:57.355Z (3 months ago)
Topics: ai-audio, audiobook, cuda, gpu-acceleration, machine-learning, pdf-converter, python, pytorch, speech-synthesis, text-processing, text-to-speech
Language: Python
Homepage:
Size: 587 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Audiobook Generator

A high-quality text-to-speech converter that transforms PDF and TXT files into MP4 audio files. It currently uses WhisperSpeech, and the pipeline is designed so that better models can be swapped in as they become available.

## Example Output

https://github.com/user-attachments/assets/637660f8-7cc8-492f-b4f4-764cbbb3d9bd

## Features

- Supports PDF and TXT input files
- GPU acceleration with CUDA support
- High-quality audio output (44.1kHz, 320kbps AAC)
- Efficient memory management and batch processing
- Multi-threaded CPU processing

## Requirements

- Python 3.x
- NVIDIA GPU with CUDA support (optional)
- Minimum 4GB RAM
- Required packages listed in `requirements.txt`

## Installation

1. Clone the repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

1. Place PDF files in the `PDF` directory
2. Place TXT files in the `TXT` directory
3. Run:
```bash
python main.py
```
4. Find the generated audio in the `Audio` directory
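
The directory layout above can be sketched as a small job planner. This is only an illustration, not the project's actual code: the function name `plan_jobs`, the lowercase `*.pdf`/`*.txt` patterns, and the `.mp4` output extension are assumptions.

```python
from pathlib import Path


def plan_jobs(root: Path) -> list[tuple[Path, Path]]:
    """Pair each input document with a hypothetical output path under Audio/."""
    jobs = []
    # Assumed layout: <root>/PDF/*.pdf and <root>/TXT/*.txt feed <root>/Audio/
    for folder, pattern in (("PDF", "*.pdf"), ("TXT", "*.txt")):
        for src in sorted((root / folder).glob(pattern)):
            jobs.append((src, root / "Audio" / (src.stem + ".mp4")))
    return jobs
```

Calling `plan_jobs(Path("."))` from the repository root would list every document that a run is expected to pick up, along with where its audio would land.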

## Technical Details

### Performance Optimizations

- CUDA-aware processing with automatic GPU detection
- Dynamic batch sizing based on available VRAM/RAM
- Multi-threaded CPU processing for non-GPU operations
- Memory-efficient chunking for large documents
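
A minimal sketch of what GPU detection and memory-based batch sizing could look like. The function names, the 80% headroom factor, and the batch cap of 32 are assumptions chosen for illustration; the README does not state how the project computes its batch size.

```python
def detect_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # treated as optional in this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def pick_batch_size(free_bytes: int, bytes_per_item: int,
                    max_batch: int = 32, headroom: float = 0.8) -> int:
    """Largest batch that fits in a fraction of free memory, clamped to [1, max_batch]."""
    if bytes_per_item <= 0:
        raise ValueError("bytes_per_item must be positive")
    usable = int(free_bytes * headroom)  # leave some memory unclaimed
    return max(1, min(max_batch, usable // bytes_per_item))
```

On a CUDA device, `free_bytes` could be taken from the first element of `torch.cuda.mem_get_info()`; on CPU, a system memory query would play the same role.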

### Audio Processing

- 44.1kHz sampling rate
- 320kbps AAC encoding
- Stereo output
- High-bitrate settings chosen to minimize audible quality loss (AAC itself is a lossy codec)
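
The settings above map directly onto encoder options. As a sketch, assuming the `ffmpeg` command-line tool is used for encoding (the README does not say which encoder the project calls), the argument list could be built like this; `aac_encode_command` is a hypothetical helper name.

```python
def aac_encode_command(src_wav: str, dst_mp4: str) -> list[str]:
    """Build an ffmpeg argument list for 44.1 kHz, 320 kbps, stereo AAC output."""
    return [
        "ffmpeg", "-y",   # overwrite the output file if it exists
        "-i", src_wav,    # input audio file
        "-vn",            # no video stream
        "-c:a", "aac",    # AAC audio codec
        "-b:a", "320k",   # 320 kbps audio bitrate
        "-ar", "44100",   # 44.1 kHz sampling rate
        "-ac", "2",       # stereo
        dst_mp4,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is installed.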

### System Architecture

- Modular pipeline design for easy model swapping
- Buffered I/O operations (1MB buffer)
- Automatic memory management with CUDA cache clearing
- Fault-tolerant processing with error handling
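
One way to picture the modular, fault-tolerant design is a small backend interface with per-chunk error handling. Everything here is hypothetical: `TTSBackend`, `SilenceBackend`, and `synthesize_chunks` are illustrative names, and the placeholder backend stands in for a real model such as WhisperSpeech.

```python
from typing import Protocol


class TTSBackend(Protocol):
    """Anything with a synthesize(text) -> bytes method can act as a backend."""

    def synthesize(self, text: str) -> bytes: ...


class SilenceBackend:
    """Placeholder backend: returns two zero bytes per input character."""

    def synthesize(self, text: str) -> bytes:
        return b"\x00\x00" * len(text)


def synthesize_chunks(backend: TTSBackend,
                      chunks: list[str]) -> tuple[list[bytes], list[int]]:
    """Synthesize each chunk; record the indices of failed chunks instead of aborting."""
    audio: list[bytes] = []
    failed: list[int] = []
    for i, chunk in enumerate(chunks):
        try:
            audio.append(backend.synthesize(chunk))
        except Exception:
            failed.append(i)  # skip this chunk and continue with the rest
    return audio, failed
```

Swapping models then amounts to passing a different object with the same `synthesize` method, and a failure in one chunk does not discard the audio already produced.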

### Resource Management

- Dynamic worker allocation based on system specs
- Configurable chunk sizes (default: 2000 tokens)
- Adaptive batch processing
- Progressive audio concatenation
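
A rough sketch of chunking and worker allocation follows. It approximates tokens with whitespace-separated words, which is an assumption; the project may count tokens differently. The function names and the choice to reserve one CPU core are also illustrative.

```python
import os


def chunk_text(text: str, max_tokens: int = 2000) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


def worker_count(reserve: int = 1) -> int:
    """Use all reported CPU cores except `reserve`, but never fewer than one worker."""
    return max(1, (os.cpu_count() or 1) - reserve)
```

Each chunk can then be synthesized independently and its audio appended to the output as it completes, which keeps memory use bounded for large documents.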

## Contributing

Feel free to suggest optimizations or improvements through issues or pull requests. The system is designed to be modular, allowing for easy integration of new TTS models.
