{"id":42259552,"url":"https://github.com/jamiepine/voicebox","last_synced_at":"2026-04-21T02:14:00.534Z","repository":{"id":334810511,"uuid":"1141782198","full_name":"jamiepine/voicebox","owner":"jamiepine","description":"The open-source voice synthesis studio powered by Qwen3-TTS.","archived":false,"fork":false,"pushed_at":"2026-01-31T15:44:31.000Z","size":93437,"stargazers_count":144,"open_issues_count":11,"forks_count":16,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-01-31T17:13:19.724Z","etag":null,"topics":["ai","qwen3-tts","voice-ai","voice-clone","whisper"],"latest_commit_sha":null,"homepage":"https://voicebox.sh","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jamiepine.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-25T12:27:03.000Z","updated_at":"2026-01-31T16:25:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"9bf6de60-573f-4c85-8778-a02be9f62d7e","html_url":"https://github.com/jamiepine/voicebox","commit_stats":null,"previous_names":["jamiepine/voicebox"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/jamiepine/voicebox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamiepine%2Fvoicebox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamiepine%2Fvoicebox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamiepi
ne%2Fvoicebox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamiepine%2Fvoicebox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jamiepine","download_url":"https://codeload.github.com/jamiepine/voicebox/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamiepine%2Fvoicebox/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28968718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T04:44:20.970Z","status":"ssl_error","status_checked_at":"2026-02-01T04:44:19.994Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","qwen3-tts","voice-ai","voice-clone","whisper"],"created_at":"2026-01-27T06:00:24.459Z","updated_at":"2026-04-21T02:14:00.528Z","avatar_url":"https://github.com/jamiepine.png","language":"TypeScript","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\".github/assets/icon-dark.webp\" alt=\"Voicebox\" width=\"120\" height=\"120\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eVoicebox\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eThe open-source voice synthesis studio.\u003c/strong\u003e\u003cbr/\u003e\n  Clone voices. Generate speech. Apply effects. 
Build voice-powered apps.\u003cbr/\u003e\n  All running locally on your machine.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/jamiepine/voicebox/releases\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/downloads/jamiepine/voicebox/total?style=flat\u0026color=blue\" alt=\"Downloads\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/jamiepine/voicebox/releases/latest\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/v/release/jamiepine/voicebox?style=flat\" alt=\"Release\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/jamiepine/voicebox/stargazers\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/stars/jamiepine/voicebox?style=flat\" alt=\"Stars\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/jamiepine/voicebox/blob/main/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/jamiepine/voicebox?style=flat\" alt=\"License\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://deepwiki.com/jamiepine/voicebox\"\u003e\n    \u003cimg src=\"https://img.shields.io/static/v1?label=Ask\u0026message=DeepWiki\u0026color=5B6EF7\" alt=\"Ask DeepWiki\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://voicebox.sh\"\u003evoicebox.sh\u003c/a\u003e •\n  \u003ca href=\"https://docs.voicebox.sh\"\u003eDocs\u003c/a\u003e •\n  \u003ca href=\"#download\"\u003eDownload\u003c/a\u003e •\n  \u003ca href=\"#features\"\u003eFeatures\u003c/a\u003e •\n  \u003ca href=\"#api\"\u003eAPI\u003c/a\u003e •\n  \u003ca href=\"docs/content/docs/overview/troubleshooting.mdx\"\u003eTroubleshooting\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cbr/\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://voicebox.sh\"\u003e\n    \u003cimg src=\"landing/public/assets/app-screenshot-1.webp\" alt=\"Voicebox App Screenshot\" width=\"800\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  
\u003cem\u003eClick the image above to watch the demo video on \u003ca href=\"https://voicebox.sh\"\u003evoicebox.sh\u003c/a\u003e\u003c/em\u003e\n\u003c/p\u003e\n\n\u003cbr/\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"landing/public/assets/app-screenshot-2.webp\" alt=\"Voicebox Screenshot 2\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"landing/public/assets/app-screenshot-3.webp\" alt=\"Voicebox Screenshot 3\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n\u003cbr/\u003e\n\n## What is Voicebox?\n\nVoicebox is a **local-first voice cloning studio** — a free and open-source alternative to ElevenLabs. Clone voices from a few seconds of audio or pick from 50+ preset voices, generate speech in 23 languages across 7 TTS engines, apply post-processing effects, and compose multi-voice projects with a timeline editor.\n\n- **Complete privacy** — models and voice data stay on your machine\n- **7 TTS engines** — Qwen3-TTS, Qwen CustomVoice, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, HumeAI TADA, and Kokoro\n- **Cloning and preset voices** — zero-shot cloning from a reference sample, or curated preset voices via Kokoro (50 voices) and Qwen CustomVoice (9 voices)\n- **23 languages** — from English to Arabic, Japanese, Hindi, Swahili, and more\n- **Post-processing effects** — pitch shift, reverb, delay, chorus, compression, and filters\n- **Expressive speech** — paralinguistic tags like `[laugh]`, `[sigh]`, `[gasp]` via Chatterbox Turbo; natural-language delivery control via Qwen CustomVoice\n- **Unlimited length** — auto-chunking with crossfade for scripts, articles, and chapters\n- **Stories editor** — multi-track timeline for conversations, podcasts, and narratives\n- **API-first** — REST API for integrating voice synthesis into your own projects\n- **Native performance** — built with Tauri (Rust), not Electron\n- **Runs everywhere** — macOS (MLX/Metal), Windows (CUDA), Linux, AMD ROCm, Intel Arc, 
Docker\n\n---\n\n## Download\n\n| Platform              | Download                                               |\n| --------------------- | ------------------------------------------------------ |\n| macOS (Apple Silicon) | [Download DMG](https://voicebox.sh/download/mac-arm)   |\n| macOS (Intel)         | [Download DMG](https://voicebox.sh/download/mac-intel) |\n| Windows               | [Download MSI](https://voicebox.sh/download/windows)   |\n| Docker                | `docker compose up`                                    |\n\n\u003e **[View all binaries →](https://github.com/jamiepine/voicebox/releases/latest)**\n\n\u003e **Linux** — Pre-built binaries are not yet available. See [voicebox.sh/linux-install](https://voicebox.sh/linux-install) for build-from-source instructions.\n\n\u003e **Having trouble?** See the [Troubleshooting Guide](docs/content/docs/overview/troubleshooting.mdx) for common install, generation, model-download, and GPU issues.\n\n---\n\n## Features\n\n### Multi-Engine Voice Cloning\n\nSeven TTS engines with different strengths, switchable per-generation:\n\n| Engine                      | Languages | Strengths                                                                                                                                |\n| --------------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------- |\n| **Qwen3-TTS** (0.6B / 1.7B) | 10        | High-quality multilingual cloning, delivery instructions (\"speak slowly\", \"whisper\")                                                     |\n| **Qwen CustomVoice**        | 10        | 9 curated preset voices with natural-language delivery control — no reference audio required                                             |\n| **LuxTTS**                  | English   | Lightweight (~1GB VRAM), 48kHz output, 150x realtime on CPU                                                         
                     |\n| **Chatterbox Multilingual** | 23        | Broadest language coverage — Arabic, Danish, Finnish, Greek, Hebrew, Hindi, Malay, Norwegian, Polish, Swahili, Swedish, Turkish and more |\n| **Chatterbox Turbo**        | English   | Fast 350M model with paralinguistic emotion/sound tags                                                                                   |\n| **TADA** (1B / 3B)          | 10        | HumeAI speech-language model — 700s+ coherent audio, text-acoustic dual alignment                                                        |\n| **Kokoro**                  | 8         | 50 curated preset voices, tiny 82M model, fast CPU inference                                                                             |\n\n### Emotions \u0026 Paralinguistic Tags\n\nOnly **Chatterbox Turbo** interprets paralinguistic tags like `[laugh]` and\n`[sigh]`. Qwen3-TTS, LuxTTS, Chatterbox Multilingual, and HumeAI TADA read them\nliterally as text.\n\nWith **Chatterbox Turbo** selected, type `/` in the text input to open the tag\ninserter and add expressive tags inline with speech:\n\n`[laugh]` `[chuckle]` `[gasp]` `[cough]` `[sigh]` `[groan]` `[sniff]` `[shush]` `[clear throat]`\n\n### Post-Processing Effects\n\n8 audio effects powered by Spotify's `pedalboard` library. 
Apply after generation, preview in real time, build reusable presets.\n\n| Effect           | Description                                   |\n| ---------------- | --------------------------------------------- |\n| Pitch Shift      | Up or down by up to 12 semitones              |\n| Reverb           | Configurable room size, damping, wet/dry mix  |\n| Delay            | Echo with adjustable time, feedback, and mix  |\n| Chorus / Flanger | Modulated delay for metallic or lush textures |\n| Compressor       | Dynamic range compression                     |\n| Gain             | Volume adjustment (-40 to +40 dB)             |\n| High-Pass Filter | Remove low frequencies                        |\n| Low-Pass Filter  | Remove high frequencies                       |\n\nShips with 4 built-in presets (Robotic, Radio, Echo Chamber, Deep Voice) and supports custom presets. Effects can be assigned per-profile as defaults.\n\n### Unlimited Generation Length\n\nText is automatically split at sentence boundaries and each chunk is generated independently, then crossfaded together. Works with all engines.\n\n- Configurable auto-chunking limit (100–5,000 chars)\n- Crossfade slider (0–200ms) for smooth transitions\n- Max text length: 50,000 characters\n- Smart splitting respects abbreviations, CJK punctuation, and `[tags]`\n\n### Generation Versions\n\nEvery generation supports multiple versions with provenance tracking:\n\n- **Original** — clean TTS output, always preserved\n- **Effects versions** — apply different effects chains from any source version\n- **Takes** — regenerate with a new seed for variation\n- **Source tracking** — each version records its lineage\n- **Favorites** — star generations for quick access\n\n### Async Generation Queue\n\nGeneration is non-blocking. 
Submit a generation and immediately start typing the next one.\n\n- Serial execution queue prevents GPU contention\n- Real-time SSE status streaming\n- Failed generations can be retried\n- Stale generations from crashes auto-recover on startup\n\n### Voice Profile Management\n\n- Create profiles from audio files or record directly in-app\n- Import/export profiles to share or back up\n- Multi-sample support for higher-quality cloning\n- Per-profile default effects chains\n- Organize with descriptions and language tags\n\n### Stories Editor\n\nMulti-voice timeline editor for conversations, podcasts, and narratives.\n\n- Multi-track composition with drag-and-drop\n- Inline audio trimming and splitting\n- Auto-playback with synchronized playhead\n- Version pinning per track clip\n\n### Recording \u0026 Transcription\n\n- In-app recording with waveform visualization\n- System audio capture (macOS and Windows)\n- Automatic transcription powered by Whisper (including Whisper Turbo)\n- Export recordings in multiple formats\n\n### Model Management\n\n- Per-model unload to free GPU memory without deleting downloads\n- Custom models directory via `VOICEBOX_MODELS_DIR`\n- Model folder migration with progress tracking\n- Download cancel/clear UI\n\n### GPU Support\n\n| Platform                 | Backend        | Notes                                          |\n| ------------------------ | -------------- | ---------------------------------------------- |\n| macOS (Apple Silicon)    | MLX (Metal)    | 4-5x faster via Metal GPU acceleration         |\n| Windows / Linux (NVIDIA) | PyTorch (CUDA) | Auto-downloads CUDA binary from within the app |\n| Linux (AMD)              | PyTorch (ROCm) | Auto-configures `HSA_OVERRIDE_GFX_VERSION`     |\n| Windows (any GPU)        | DirectML       | Universal Windows GPU support                  |\n| Intel Arc                | IPEX/XPU       | Intel discrete GPU acceleration                |\n| Any                      | CPU            | Works everywhere, 
just slower                  |\n\n---\n\n## API\n\nVoicebox exposes a full REST API for integrating voice synthesis into your own apps.\n\n```bash\n# Generate speech\ncurl -X POST http://localhost:17493/generate \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Hello world\", \"profile_id\": \"abc123\", \"language\": \"en\"}'\n\n# List voice profiles\ncurl http://localhost:17493/profiles\n\n# Create a profile\ncurl -X POST http://localhost:17493/profiles \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"name\": \"My Voice\", \"language\": \"en\"}'\n```\n\n**Use cases:** game dialogue, podcast production, accessibility tools, voice assistants, content automation.\n\nFull API documentation is available at `http://localhost:17493/docs`.\n\n---\n\n## Tech Stack\n\n| Layer         | Technology                                        |\n| ------------- | ------------------------------------------------- |\n| Desktop App   | Tauri (Rust)                                      |\n| Frontend      | React, TypeScript, Tailwind CSS                   |\n| State         | Zustand, React Query                              |\n| Backend       | FastAPI (Python)                                  |\n| TTS Engines   | Qwen3-TTS, Qwen CustomVoice, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, TADA, Kokoro |\n| Effects       | Pedalboard (Spotify)                              |\n| Transcription | Whisper / Whisper Turbo (PyTorch or MLX)          |\n| Inference     | MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU) |\n| Database      | SQLite                                            |\n| Audio         | WaveSurfer.js, librosa                            |\n\n---\n\n## Roadmap\n\n| Feature                 | Description                                    |\n| ----------------------- | ---------------------------------------------- |\n| **Real-time Streaming** | Stream audio as it generates, word by word     |\n| **Voice Design**        | Create new voices from text descriptions 
      |\n| **More Models**         | XTTS, Bark, and other open-source voice models  |\n| **Plugin Architecture** | Extend with custom models and effects          |\n| **Mobile Companion**    | Control Voicebox from your phone               |\n\nFor the **full engineering status, open-issue triage, and prioritized work queue**, see [`docs/PROJECT_STATUS.md`](docs/PROJECT_STATUS.md) — a living document that tracks what's shipped, what's in-flight, candidate TTS engines under evaluation, and why we've accepted or backlogged specific integrations.\n\n---\n\n## Development\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for detailed setup and contribution guidelines.\n\n### Quick Start\n\n```bash\ngit clone https://github.com/jamiepine/voicebox.git\ncd voicebox\n\njust setup   # creates Python venv, installs all deps\njust dev     # starts backend + desktop app\n```\n\nInstall [just](https://github.com/casey/just): `brew install just` or `cargo install just`. Run `just --list` to see all commands.\n\n**Prerequisites:** [Bun](https://bun.sh), [Rust](https://rustup.rs), [Python 3.11+](https://python.org), [Tauri Prerequisites](https://v2.tauri.app/start/prerequisites/), and [Xcode](https://developer.apple.com/xcode/) on macOS.\n\n### Building Locally\n\n```bash\njust build          # Build CPU server binary + Tauri app\njust build-local    # (Windows) Build CPU + CUDA server binaries + Tauri app\n```\n\n### Adding New Voice Models\n\nThe multi-engine architecture makes adding new TTS engines straightforward. A [step-by-step guide](docs/content/docs/developer/tts-engines.mdx) covers the full process: dependency research, backend protocol implementation, frontend wiring, and PyInstaller bundling.\n\nThe guide is optimized for AI coding agents. 
An [agent skill](.agents/skills/add-tts-engine/SKILL.md) can pick up a model name and handle the entire integration autonomously — you just test the build locally.\n\n### Project Structure\n\n```\nvoicebox/\n├── app/              # Shared React frontend\n├── tauri/            # Desktop app (Tauri + Rust)\n├── web/              # Web deployment\n├── backend/          # Python FastAPI server\n├── landing/          # Marketing website\n└── scripts/          # Build \u0026 release scripts\n```\n\n---\n\n## Contributing\n\nContributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n1. Fork the repo\n2. Create a feature branch\n3. Make your changes\n4. Submit a PR\n\n## Security\n\nFound a security vulnerability? Please report it responsibly. See [SECURITY.md](SECURITY.md) for details.\n\n---\n\n## License\n\nMIT License — see [LICENSE](LICENSE) for details.\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://voicebox.sh\"\u003evoicebox.sh\u003c/a\u003e\n\u003c/p\u003e\n","funding_links":[],"categories":["Projects","TTS (Text-to-Speech) | 文本转语音","🤖 AI \u0026 Machine Learning","语音合成","TypeScript","Audio \u0026 Speech","Uncategorized","Repos"],"sub_categories":["Industry Specific","Open Source TTS Models | 开源 TTS 模型","资源传输下载","Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamiepine%2Fvoicebox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjamiepine%2Fvoicebox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamiepine%2Fvoicebox/lists"}