{"id":30261582,"url":"https://github.com/techlm77/live-caption","last_synced_at":"2025-08-24T13:32:48.481Z","repository":{"id":309025707,"uuid":"1034930162","full_name":"Techlm77/Live-Caption","owner":"Techlm77","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-09T09:45:45.000Z","size":19,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-09T11:35:27.005Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Techlm77.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-09T09:33:44.000Z","updated_at":"2025-08-09T09:45:48.000Z","dependencies_parsed_at":"2025-08-09T11:52:24.932Z","dependency_job_id":null,"html_url":"https://github.com/Techlm77/Live-Caption","commit_stats":null,"previous_names":["techlm77/live-caption"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Techlm77/Live-Caption","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Techlm77%2FLive-Caption","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Techlm77%2FLive-Caption/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Techlm77%2FLive-Caption/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Techlm77%2FLive-Caption/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Techlm77","download_url":"https://codeload.github.com/Techlm77/Live-Caption/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Techlm77%2FLive-Caption/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270624822,"owners_count":24618265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-15T02:00:12.559Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-15T20:07:55.159Z","updated_at":"2025-08-15T20:07:56.454Z","avatar_url":"https://github.com/Techlm77.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Live Captions (Windows, AprilASR)\n\nFast, simple, and accurate **live captions** on Windows using WASAPI loopback + [AprilASR]. Also supports **microphone mode**. Minimal UI, resizable window, and an optional history log.\n\n\u003e Press **Esc** or close the window to stop.\n\n---\n\n## Features\n\n- 🔊 **Default: output loopback** – captions your current Windows output while you keep listening normally  \n- 🎙️ **Mic mode** – caption from any microphone or capture device (`--mic`)  \n- 🪟 **Clean overlay** – bottom-aligned single-label window, **width \u0026 height resizable**, auto-wrap  \n- 🧠 **Low-latency streaming** – AprilASR async session; feeds ~10–40 ms audio blocks  \n- 🗂️ **History log** – final lines saved to `history.txt` for later reading  \n- 🧱 **Robust loopback** – retries on device changes, handles default output switches  \n- 🖥️ **No VB-Cable needed** – uses native WASAPI loopback for system audio  \n- ⚙️ Sensible CPU thread defaults (via env vars), overridable by users if needed\n\n---\n\n## Requirements\n\n- **OS:** Windows 10/11 for output loopback (Mic mode also works on Linux/macOS, but loopback is Windows-only for now)\n- **Python:** 3.10+\n- **Model:** an AprilASR `.april` model file (e.g. `april-english-dev-01110_en.april`)\n\n### Python packages\n\nCreate `requirements.txt`:\n\n```\napril-asr\nonnxruntime\nnumpy\nsounddevice\nsoundcard\ncffi\ncomtypes ; platform_system == \"Windows\"\n```\n\nInstall:\n\n```bash\npip install -r requirements.txt\n```\n\n\u003e `tkinter` ships with the standard Python installer on Windows.  \n\u003e On Linux you’d install `python3-tk` via your package manager.\n\n---\n\n## Get a Model\n\nDownload an AprilASR **`.april`** model and put it next to `start.py`, or pass the path with `--model`.  \n(See the AprilASR docs/releases for available English or multilingual models.)\n\n---\n\n## Usage\n\n### 1) Caption “what you hear” (default, Windows)\n\n```bash\npython start.py\n```\n\n- Captures the **current default Windows output** (speakers/headset) via WASAPI loopback.\n- You keep hearing audio normally; this just taps it for captions.\n\n### 2) Microphone mode\n\n```bash\npython start.py --mic\n```\n\nOptionally target a specific input device index:\n\n```bash\npython -c \"import sounddevice as sd; print(sd.query_devices())\"\npython start.py --mic --device 1\n```\n\n### 3) Common options\n\n```bash\n--model PATH         # .april model path (default: april-english-dev-01110_en.april)\n--font \"Segoe UI\"    # font family\n--font-size 24       # size in points\n--opacity 0.96       # 0..1 window opacity\n--raw-case           # don’t auto-normalize ALL-CAPS lines\n--history-file FILE  # final lines log (default: history.txt)\n```\n\nExamples:\n\n```bash\n# Larger font \u0026 higher opacity\npython start.py --font-size 28 --opacity 0.98\n\n# Mic mode with a specific device index\npython start.py --mic --device 7\n\n# Disable any capitalization normalization entirely\npython start.py --raw-case\n```\n\n---\n\n## How it Works (very short)\n\n- Uses `soundcard` to open a **loopback recorder** on the **current default** Windows speaker (no routing hacks).\n- Feeds **float32 → PCM16** mono audio frames (~10–40 ms) to an **asynchronous AprilASR session**.\n- The session invokes your **handler** for partial \u0026 final results.  \n  We display **partials immediately** and **append finals** to the window and `history.txt`.\n\n---\n\n## Tips \u0026 Troubleshooting\n\n- **No captions / device changed?** Try switching your default output device in Windows once; the loopback will reconnect automatically, or just restart the app.\n- **Mic mode silence?** Verify the right input index (`sd.query_devices()`), and that the device has input channels.\n- **Soundcard warnings (MediaFoundation)** are filtered by default; they’re harmless data discontinuities.\n- **Performance**  \n  - The app sets sensible CPU thread env vars automatically.  \n  - If you want to override: set `OMP_NUM_THREADS`, `ORT_NUM_THREADS`, etc. before running.\n- **GPU?** AprilASR typically runs on ONNX Runtime CPU for stability. If GPU builds are available for your setup, you can experiment by switching the ORT provider—but this project targets CPU by default.\n\n---\n\n## Roadmap (nice-to-haves)\n\n- Optional **per-app capture** (via Windows Audio Session APIs)  \n- Toggleable **always-on-top** and quick **position presets**  \n- **Latency/CPU** HUD for quick tuning\n\n---\n\n## Acknowledgments\n\n- [AprilASR] for the streaming speech recognition engine  \n- `soundcard` and `sounddevice` for straightforward audio I/O\n\n---\n\n**Enjoy!** If you hit any odd Windows audio edge cases, open an issue with your log output and `sd.query_devices()` dump.\n\n[AprilASR]: https://abb128.github.io/april-asr/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechlm77%2Flive-caption","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftechlm77%2Flive-caption","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechlm77%2Flive-caption/lists"}