{"id":49303618,"url":"https://github.com/tianqbu/doppelvoice","last_synced_at":"2026-05-02T14:01:09.730Z","repository":{"id":353895323,"uuid":"1221146351","full_name":"TianqBu/Doppelvoice","owner":"TianqBu","description":"Real-time Chinese↔English speech translation with zero-shot voice cloning · 端到端实时语音翻译 + 0样本音色克隆 · Powered by Doubao Seed LiveInterpret 2.0","archived":false,"fork":false,"pushed_at":"2026-04-26T17:01:47.000Z","size":372,"stargazers_count":2,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-28T10:35:10.953Z","etag":null,"topics":["chinese-english","doubao","protobuf","pyside6","real-time","simultaneous-interpretation","speech-translation","voice-cloning","websocket","windows"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TianqBu.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-25T20:14:53.000Z","updated_at":"2026-04-26T22:08:37.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/TianqBu/Doppelvoice","commit_stats":null,"previous_names":["tianqbu/doppelvoice"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/TianqBu/Doppelvoice","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TianqBu%2FDoppelvoice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/TianqBu%2FDoppelvoice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TianqBu%2FDoppelvoice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TianqBu%2FDoppelvoice/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TianqBu","download_url":"https://codeload.github.com/TianqBu/Doppelvoice/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TianqBu%2FDoppelvoice/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32463896,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"online","status_checked_at":"2026-04-30T02:00:05.929Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese-english","doubao","protobuf","pyside6","real-time","simultaneous-interpretation","speech-translation","voice-cloning","websocket","windows"],"created_at":"2026-04-26T08:08:21.665Z","updated_at":"2026-05-01T13:01:09.575Z","avatar_url":"https://github.com/TianqBu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Doppelvoice\n\n\u003e **Your voice, in any language.**\n\u003e Real-time speech-to-speech translation with zero-shot voice cloning across **9 languages**\n\u003e (Chinese / English / Japanese / Indonesian / Spanish / Portuguese / German / French + bilingual ZH⇄EN 
auto).\n\u003e The other party hears **the target language in your own voice** through any meeting app —\n\u003e Zoom, Teams, WeChat, Google Meet, OBS, anything that takes a microphone.\n\u003e\n\u003e _Powered by ByteDance Doubao Seed LiveInterpret 2.0._\n\n[中文](README.zh-CN.md) · [Architecture](docs/en/ARCHITECTURE.md) · [Setup](docs/en/SETUP.md) · [Troubleshooting](docs/en/TROUBLESHOOTING.md)\n\n[![tests](https://github.com/TianqBu/Doppelvoice/actions/workflows/tests.yml/badge.svg)](https://github.com/TianqBu/Doppelvoice/actions/workflows/tests.yml)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n[![Platform](https://img.shields.io/badge/platform-Windows-lightgrey.svg)]()\n[![Release](https://img.shields.io/github/v/release/TianqBu/Doppelvoice)](https://github.com/TianqBu/Doppelvoice/releases/latest)\n\n---\n\n## What it does\n\n```\nYou speak \u003csource lang\u003e  ─►  Doppelvoice  ─►  Peer hears \u003ctarget lang\u003e (in your voice)\n   ┌──────────────────┐       ┌─────────┐       ┌──────────────────────────────┐\n   │     your mic     │ ────► │ Doubao  │ ────► │ virtual mic → Zoom / Teams … │\n   └──────────────────┘       │ AST 2.0 │       └──────────────────────────────┘\n                              └─────────┘\n```\n\nPick any of the 8 single-language codes (`zh / en / ja / id / es / pt / de / fr`) as source or target,\nor use `zhen` on both sides for bilingual ZH⇄EN auto-detection.\n\nEnd-to-end latency ≈ 2.5–3 s. 
Subtitles stream token-by-token; voice is cloned zero-shot from your speech as you talk.\n\n## Features\n\n- 🎙 **End-to-end speech-to-speech** — no separate STT / MT / TTS plumbing\n- 🗣 **Zero-shot voice cloning** — model captures your voice on the fly; explicit\n  `denoise=false` to retain breath / resonance details\n- 🌐 **9 languages** — `zh / en / ja / id / es / pt / de / fr / zhen` (the\n  last one is the bilingual ZH⇄EN auto mode)\n- ⚡ **~2.5 s latency** — production-grade real-time\n- 🪟 **Native Windows GUI** (PySide6) with live bilingual subtitles\n- 🔌 **Universal compatibility** — anything that accepts a microphone works\n- 🔁 **Automatic reconnect** with exponential backoff and fatal-error classification\n- 🔒 **Privacy-first defaults** — translated audio and subtitles never persist\n  to disk unless you opt in; logs auto-redact API keys and bearer tokens\n- 🧹 **Clean device picker** — one entry per physical device (host-API\n  duplicates collapsed; MME 31-char name truncation handled)\n- 🛠 **Configurable** — sample rate, jitter buffer, RMS gate, denoise toggle,\n  speaker_id, all tweakable\n\n## Demo\n\n![Doppelvoice GUI](docs/images/screenshot.png)\n\n## Quick start\n\nTwo ways to install. **Option A** is the fastest (no Python needed).\n\n### Option A — Pre-built Windows binary (recommended)\n\n1. Install [VB-Audio Virtual Cable](https://vb-audio.com/Cable/) → run installer as admin → reboot.\n2. Download the latest **`Doppelvoice-vX.Y.Z-win64.zip`** from the [Releases page](https://github.com/TianqBu/Doppelvoice/releases/latest).\n3. Unzip anywhere, then inside the folder: copy `.env.example` → `.env`, fill in `DOUBAO_APP_KEY` / `DOUBAO_ACCESS_KEY` (get them from the [Volcengine Console](https://console.volcengine.com/speech/app)).\n4. Double-click `Doppelvoice.exe`. The GUI opens.\n5. 
In your meeting app, set the microphone to **`CABLE Output (VB-Audio Virtual Cable)`**.\n\n### Option B — From source (for developers)\n\n```cmd\ngit clone https://github.com/TianqBu/Doppelvoice.git\ncd Doppelvoice\npython -m venv .venv\n.venv\\Scripts\\pip install -e .       :: installs from pyproject.toml\n:: or: .venv\\Scripts\\pip install -r requirements.txt\n\ncopy .env.example .env\nnotepad .env       :: fill in DOUBAO_APP_KEY / DOUBAO_ACCESS_KEY\n\ncheck.bat          :: verifies devices + API connectivity + StartSession\ngui.bat            :: launches the GUI\nrun.bat            :: CLI mode\n```\n\nIn your meeting app: pick **`CABLE Output (VB-Audio Virtual Cable)`** as the microphone.\n\n## CLI\n\n```cmd\nrun.bat                              :: start translation (CLI)\nrun.bat --gui                        :: launch GUI\nrun.bat --check                      :: self-check\nrun.bat --list-devices               :: list audio devices\nrun.bat --source en --target zh      :: reverse direction\nrun.bat --jitter-ms 80               :: lower latency (more underrun risk)\nrun.bat --log-level DEBUG            :: verbose logs\n```\n\n## Configuration\n\nAll settings have sensible defaults. Override via `.env` or CLI flags.\n\n| Variable | Default | Notes |\n|---|---|---|\n| `DOUBAO_APP_KEY` / `DOUBAO_ACCESS_KEY` | _required_ | from Volcengine console |\n| `DOUBAO_RESOURCE_ID` | `volc.service_type.10053` | AST 2.0 resource ID |\n| `SOURCE_LANG` / `TARGET_LANG` | `zh` / `en` | one of `zh / en / ja / id / es / pt / de / fr / zhen`. Use `zhen` on **both** sides for bilingual ZH⇄EN auto mode. |\n| `MODE` | `s2s` | `s2s` (speech→speech) or `s2t` (speech→text) |\n| `DENOISE` | `0` | `1` = server-side denoise on (cleaner input but flatter voice clone). `0` keeps breath / resonance for better cloning. 
|\n| `SPEAKER_ID` | _empty_ | Doubao `ReqParams.speaker_id` — empty = clone the speaker; set to a preset like `zh_female_vv_uranus_bigtts` to use a stock voice instead |\n| `INPUT_DEVICE` / `OUTPUT_DEVICE` | _auto_ | substring of device name (host API hidden; one entry per physical device) |\n| `LOG_LEVEL` | `INFO` | `DEBUG` for verbose |\n| `DUMP_AUDIO` | `false` | persist per-sentence ogg blobs (debug only) |\n| `LOG_SUBTITLE` | `false` | persist subtitle text in logs (debug only) |\n\n## Architecture\n\n```\nsrc/doppelvoice/\n├── engine/        # Doubao AST 2.0 protobuf WebSocket client\n├── audio/         # PortAudio (sounddevice) capture + playback + ogg/opus decoder\n├── pipeline/      # asyncio orchestration: capture → ws → decode → playback\n├── gui/           # PySide6 + qasync\n├── cli.py\n└── config.py\n```\n\nSee [docs/en/ARCHITECTURE.md](docs/en/ARCHITECTURE.md) for the full protocol details.\n\n## Tested with\n\n- Windows 10 / 11 x64\n- Python 3.10–3.12\n- VB-Audio Virtual Cable 1.0.4 (Driver Pack 43)\n- Zoom, 腾讯会议, 微信电话, Google Meet (Chrome), OBS\n\n## Known limitations\n\n1. **Voice cloning quality varies** with mic and clarity. AirPods over Bluetooth\n   HFP (16 kHz narrowband phone mode) gives mediocre results — a wired/USB mic\n   or laptop built-in mic is recommended. The default `denoise=false` already\n   tells the server to keep your voice's unique characteristics; toggling it\n   on in Settings would flatten the clone further.\n2. **End-to-end latency floor ≈ 2.5 s** is the model's hard limit per the\n   [Seed LiveInterpret 2.0 paper](https://arxiv.org/abs/2507.17527); local\n   processing adds \u003c500 ms.\n3. **Voice expressiveness** of the public AST API is good but not as lively\n   as the Volcengine Console demo (which goes through a different BFF endpoint).\n4. **Per-sentence audio decoding** (ogg_opus) adds ~500 ms latency vs raw\n   PCM (which the API does not currently honor).\n5. 
**Use headphones, not speakers.** With external speakers the meeting\n   audio gets re-captured by your mic, re-translated, and sent back to the\n   peer as their own translated voice — a textbook acoustic feedback loop.\n   See [Troubleshooting](docs/en/TROUBLESHOOTING.md#feedback-loop-when-using-speakers).\n\n## Privacy\n\n- API keys live only in `.env` (gitignored).\n- Translated audio and subtitle text are **not persisted** to disk by default.\n- Set `DUMP_AUDIO=1` / `LOG_SUBTITLE=1` for debugging only.\n- All audio is sent through ByteDance's Doubao API. Review their [Terms of Service](https://www.volcengine.com/docs/82379/1394617) before use with sensitive content.\n\n## Contributing\n\nPRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\n[MIT](LICENSE).\n\n## Acknowledgements\n\n- [ByteDance Seed LiveInterpret 2.0](https://seed.bytedance.com/en/seed_liveinterpret) — the underlying translation model\n- [kizuna-ai-lab/sokuji](https://github.com/kizuna-ai-lab/sokuji) — protobuf reverse-engineering reference\n- [VB-Audio Virtual Cable](https://vb-audio.com/Cable/) — virtual audio routing on Windows\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftianqbu%2Fdoppelvoice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftianqbu%2Fdoppelvoice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftianqbu%2Fdoppelvoice/lists"}