{"id":22656524,"url":"https://github.com/zelosleone/audiobook-generator","last_synced_at":"2026-05-05T12:33:34.003Z","repository":{"id":267057982,"uuid":"900150426","full_name":"zelosleone/Audiobook-Generator","owner":"zelosleone","description":"A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.","archived":false,"fork":false,"pushed_at":"2024-12-08T02:04:27.000Z","size":601,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T07:44:57.355Z","etag":null,"topics":["ai-audio","audiobook","cuda","gpu-acceleration","machine-learning","pdf-converter","python","pytorch","speech-synthesis","text-processing","text-to-speech"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zelosleone.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-08T01:53:41.000Z","updated_at":"2024-12-08T02:06:25.000Z","dependencies_parsed_at":"2024-12-08T02:29:08.713Z","dependency_job_id":"0293d671-0b69-4a8d-ad96-4a9d10caaf9f","html_url":"https://github.com/zelosleone/Audiobook-Generator","commit_stats":null,"previous_names":["zelosleone/audiobook-generator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zelosleone%2FAudiobook-Generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zelosleone%2FAudiobook-Generator/tags","releases_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zelosleone%2FAudiobook-Generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zelosleone%2FAudiobook-Generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zelosleone","download_url":"https://codeload.github.com/zelosleone/Audiobook-Generator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246156029,"owners_count":20732359,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-audio","audiobook","cuda","gpu-acceleration","machine-learning","pdf-converter","python","pytorch","speech-synthesis","text-processing","text-to-speech"],"created_at":"2024-12-09T10:14:44.989Z","updated_at":"2026-05-05T12:33:33.962Z","avatar_url":"https://github.com/zelosleone.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audiobook Generator\n\nA high-quality text-to-speech converter that transforms PDF and TXT files into MP4 audio files. 
It currently uses WhisperSpeech and can be adapted to better TTS models as they become available.\n\n## Example Output\n\nhttps://github.com/user-attachments/assets/637660f8-7cc8-492f-b4f4-764cbbb3d9bd\n\n## Features\n\n- Supports PDF and TXT input files\n- GPU acceleration with CUDA support\n- High-quality audio output (44.1 kHz, 320 kbps AAC)\n- Efficient memory management and batch processing\n- Multi-threaded CPU processing\n\n## Requirements\n\n- Python 3.x\n- NVIDIA GPU with CUDA support (optional)\n- At least 4 GB of RAM\n- Required packages listed in `requirements.txt`\n\n## Installation\n\n1. Clone the repository: `git clone https://github.com/zelosleone/Audiobook-Generator.git`\n2. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n## Usage\n\n1. Place PDF files in the `PDF` directory\n2. Place TXT files in the `TXT` directory\n3. Run:\n   ```bash\n   python main.py\n   ```\n4. Find the generated audio in the `Audio` directory\n\n## Technical Details\n\n### Performance Optimizations\n\n- CUDA-aware processing with automatic GPU detection\n- Dynamic batch sizing based on available VRAM/RAM\n- Multi-threaded CPU processing for non-GPU operations\n- Memory-efficient chunking for large documents\n\n### Audio Processing\n\n- 44.1 kHz sampling rate\n- 320 kbps AAC encoding\n- Stereo output\n- High-bitrate encoder settings to minimize compression loss (AAC is a lossy codec)\n\n### System Architecture\n\n- Modular pipeline design for easy model swapping\n- Buffered I/O operations (1 MB buffer)\n- Automatic memory management with CUDA cache clearing\n- Fault-tolerant processing with error handling\n\n### Resource Management\n\n- Dynamic worker allocation based on available system resources\n- Configurable chunk sizes (default: 2000 tokens)\n- Adaptive batch processing\n- Progressive audio concatenation\n\n## Contributing\n\nFeel free to suggest optimizations or improvements through issues or pull requests. 
The system is designed to be modular, allowing for easy integration of new TTS models.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzelosleone%2Faudiobook-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzelosleone%2Faudiobook-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzelosleone%2Faudiobook-generator/lists"}