{"id":50434122,"url":"https://github.com/aimer1124/local-voice-input","last_synced_at":"2026-06-09T05:01:53.076Z","repository":{"id":360457779,"uuid":"1250245682","full_name":"aimer1124/local-voice-input","owner":"aimer1124","description":"🎙️ Fully offline local AI voice prompt input: Whisper + Ollama + Raycast, zero cloud dependencies, privacy 100% under your control","archived":false,"fork":false,"pushed_at":"2026-06-08T07:20:28.000Z","size":355,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-08T08:12:15.100Z","etag":null,"topics":["bash","llm","local-ai","macos","offline-ai","ollama","privacy","prompt-engineering","raycast","swift","voice-input","whisper","whisper-cpp"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aimer1124.png","metadata":{"files":{"readme":"README.en.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-26T12:51:11.000Z","updated_at":"2026-06-08T07:20:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/aimer1124/local-voice-input","commit_stats":null,"previous_names":["aimer1124/local-voice-input"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/aimer1124/local-voice-input","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimer1124%2Flocal-voice-input","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimer1124%2Fl
ocal-voice-input/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimer1124%2Flocal-voice-input/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimer1124%2Flocal-voice-input/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aimer1124","download_url":"https://codeload.github.com/aimer1124/local-voice-input/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimer1124%2Flocal-voice-input/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34092262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","llm","local-ai","macos","offline-ai","ollama","privacy","prompt-engineering","raycast","swift","voice-input","whisper","whisper-cpp"],"created_at":"2026-05-31T16:02:39.890Z","updated_at":"2026-06-09T05:01:53.008Z","avatar_url":"https://github.com/aimer1124.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎙️ local-voice-input — Local AI Voice Prompt Input\n\n[![CI](https://github.com/aimer1124/local-voice-input/actions/workflows/release.yml/badge.svg)](https://github.com/aimer1124/local-voice-input/actions/workflows/release.yml)\n[![Latest 
release](https://img.shields.io/github/v/release/aimer1124/local-voice-input?color=brightgreen)](https://github.com/aimer1124/local-voice-input/releases/latest)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)\n[![Platform](https://img.shields.io/badge/platform-macOS%2013%2B%20%7C%20Apple%20Silicon-lightgrey)](#requirements)\n[![Roadmap](https://img.shields.io/badge/roadmap-v1.1%20%E2%86%92%20v2.0-blue)](./ROADMAP.md)\n\n\u003e **Fully offline** voice-to-text tool for programmers: press a hotkey, speak, and get cleaned-up text pasted at your cursor.\n\u003e\n\u003e _CLI tool name: `vinput`_\n\n**100% local. Audio never leaves your machine. No cloud, no accounts.**\n\n[Chinese README](./README.md) · [CHANGELOG](./CHANGELOG.md) · [ROADMAP](./ROADMAP.md) · [Contributing](./CONTRIBUTING.md)\n\n![demo](./assets/demo.gif)\n\n---\n\n## ✨ Features\n\n- 🔒 **Fully offline**: Whisper.cpp + Ollama; your voice stays on-device\n- 🧠 **AI intent refinement**: a local LLM strips filler words and self-corrections and turns rambling speech into a structured prompt\n- ⚡ **Fast**: 10s of Chinese audio → text in 2–4s\n- 🎛️ **Toggle hotkey**: press once to start recording, press again to stop\n- 🖥️ **Multi-monitor HUD**: frosted-glass overlay near the bottom-center of whichever screen your cursor is on\n- 🎯 **Hotword-aware**: customizable hotwords improve recognition of technical terms\n- 🔌 **Zero-config**: works out of the box; every setting can be overridden in `~/.config/vinput.conf`\n\n---\n\n## 🎬 Flow\n\n```\nPress ⌘⇧Space (Raycast hotkey)\n │\n ▼\n┌────────────────────────────┐\n│ 🎙️ Recording... │ ← Pop sound\n└────────────────────────────┘\n │ Speak your prompt\n ▼\nPress ⌘⇧Space again ← Tink sound\n │\n ▼\n┌────────────────────────────┐\n│ 💭 Transcribing... │\n└────────────────────────────┘\n │ ↓ Whisper.cpp\n┌────────────────────────────┐\n│ 🤖 Polishing with AI... 
│\n└────────────────────────────┘\n │ ↓ Ollama (qwen2.5:3b)\n │ ↓ pbcopy + ⌘V\n ▼\n┌────────────────────────────┐\n│ ✓ Done │\n└────────────────────────────┘\n │\n ▼\n Cleaned text appears at cursor\n```\n\n---\n\n## 🧠 How It Works\n\n### Architecture\n\n```\nRaycast Global Hotkey\n └─ voice-input.sh (Raycast Script Command)\n     └─ vinput_bg.sh (main logic)\n         │\n         ├─ Lock dir (atomic mkdir) = mutex + state machine\n         │   ├─ First press → record mode\n         │   └─ Second press → toggle mode (SIGINT to rec)\n         │\n         ├─ SoX rec → whisper-cli → Ollama → pbcopy + ⌘V\n         │\n         ├─ 30s guard timeout (safety)\n         │\n         └─ HUD (Swift binary) → bottom-center overlay\n```\n\n### Key Mechanisms\n\n#### Toggle Lock (mkdir + PID file)\n\n`/tmp/vinput.lock.d` doubles as both a mutex and a state machine:\n\n| State | `mkdir` | PID file | Behavior |\n|---|---|---|---|\n| Fresh session | succeeds | created | enters record mode |\n| Press during recording | fails | exists | toggle mode: sends SIGINT to `rec` |\n| Press during transcription | fails | gone | shows a \"still processing previous\" HUD |\n| Stale lock (after a crash) | fails | dead PID | cleans up the lock and starts over |\n\n`mkdir` either creates the directory or fails in a single atomic step, so it works as a lock with no extra tooling (macOS does not ship the `flock` command).\n\n#### Recording Control (USE_VAD)\n\n**Default USE_VAD=0** (recommended):\n- `rec` starts capturing immediately, regardless of volume\n- A second hotkey press stops it instantly with SIGINT\n- A 30s hard timeout (`MAX_REC_SECONDS`) acts as a safety net\n\n**Optional USE_VAD=1**:\n- SoX silence filter: recording stops after 1.5s of silence\n- Only reliable in quiet environments; background noise can keep the silence from being detected\n\n#### Whisper Transcription\n\n- Model: `ggml-large-v3-turbo-q5_0` (547MB, Metal backend)\n- Hotwords are passed via `--prompt` to improve recognition of technical terms\n- Language is forced to Chinese with `-l zh`; English technical terms are guided by the same prompt\n\n#### LLM Refinement (with short-text skip)\n\n- Text shorter than `SHORT_TEXT_THRESHOLD` (default 15 characters) skips the LLM, saving 1–2s\n- Longer text goes to Ollama with `keep_alive=30m`, which keeps the model loaded for 30 minutes after each request\n- If the LLM call fails, the raw Whisper output is used instead\n\n#### Screen HUD\n\n- ~90 lines of 
Swift, compiled to a 92KB binary\n- `NSVisualEffectView` with the `.hudWindow` material (same look as the system volume HUD)\n- `/tmp/vinput_hud.pid` enforces a singleton: each new HUD kills the previous one\n- Click-through (ignores mouse events), shown on every Space, width fits the message\n- Multi-screen aware: `NSEvent.mouseLocation` picks the screen under the cursor\n\n#### UTF-8 Encoding\n\nProcesses spawned by Raycast don't inherit the `LANG` from your Terminal session, so `pbcopy` misinterprets the UTF-8 bytes it receives and non-ASCII text (e.g., Chinese) comes out garbled. The script falls back to a UTF-8 locale when none is set:\n```bash\nexport LANG=\"${LANG:-en_US.UTF-8}\"\nexport LC_ALL=\"${LC_ALL:-en_US.UTF-8}\"\n```\n\n---\n\n## 📦 Installation\n\n### Homebrew Tap (recommended · v1.1.1+)\n\n```bash\nbrew tap aimer1124/tap\nbrew install local-voice-input\nvinput setup\n```\n\n`brew install` only deploys the scripts and the HUD binary to `libexec/`. The follow-up `vinput setup` is the bootstrap step that:\n\n1. Installs `sox`, `jq`, `whisper-cpp`, and `ollama` via brew (skipping any already present)\n2. Downloads the Whisper `large-v3-turbo-q5_0` model (~547MB)\n3. Symlinks `vinput`, `vinput.sh`, `vinput_bg.sh`, and `hud` into `~/.whisper_models/`\n4. Writes the default config `~/.config/vinput.conf` and the hotwords list\n5. Copies the Raycast command script to `~/.config/raycast-scripts/`\n6. Starts the Ollama service, pulls `qwen2.5:3b` (~2GB), and pre-warms it\n\nUpgrades: `brew upgrade local-voice-input` (the installed scripts are symlinks, so a new version takes effect immediately).\n\n### From source (developer / no brew tap)\n\n```bash\ngit clone https://github.com/aimer1124/local-voice-input.git\ncd local-voice-input\n./install.sh\n```\n\n`install.sh` covers the same ground as `vinput setup` without the brew tap: it installs dependencies with `brew install` directly, downloads the Whisper model, compiles or downloads the HUD binary, copies the scripts to `~/.whisper_models/`, and warms up Ollama.\n\n### Manual steps (cannot be automated)\n\n1. **System Settings → Privacy & Security → Microphone**: enable Raycast (macOS prompts the first time sox records)\n2. **System Settings → Privacy & Security → Accessibility**: enable Raycast (required for the automatic ⌘V paste)\n3. 
**Raycast Settings → Extensions → Script Commands**:\n - Add Script Directory: `~/.config/raycast-scripts`\n - Bind a hotkey to the `🎙️ 语音输入` (\"Voice Input\") command (recommended: `⌘⇧Space`)\n\n### Requirements\n\n- macOS 13+\n- Apple Silicon (M1/M2/M3/M4)\n- 16GB RAM recommended (8GB works)\n- 3GB free disk space\n- A working microphone (⚠️ headphones with a 3-pole TRS plug have no mic contact, so macOS may route input to a phantom port)\n\n---\n\n## 🚀 Usage\n\n1. Click into any text field\n2. Press your hotkey (e.g., ⌘⇧Space)\n3. Speak after the Pop sound\n4. Press the hotkey again to stop (Tink sound)\n5. Wait 2–4s; the cleaned text is pasted at the cursor\n\n### Performance budget (10s Chinese audio)\n\n| Stage | Time |\n|---|---|\n| Recording | however long you speak |\n| Whisper | ~2s |\n| Ollama | ~1s (short skip) / ~2s (long) |\n| Paste | \u003c0.1s |\n| **Overhead after recording** | **2–4s** |\n\n---\n\n## ⚙️ Configuration\n\nAll settings live in `~/.config/vinput.conf`:\n\n```bash\n# Whisper ASR\nMODEL_PATH=\"$HOME/.whisper_models/ggml-large-v3-turbo-q5_0.bin\"\nWHISPER_LANG=\"zh\"\nWHISPER_THREADS=8\n\n# Ollama\nOLLAMA_MODEL=\"qwen2.5:3b\"\nSHORT_TEXT_THRESHOLD=15\n\n# Behavior\nAUTO_PASTE=1\nUSE_VAD=0\nMAX_REC_SECONDS=30\n```\n\nHotwords go in `~/.config/vinput_hotwords.txt` (one per line).\n\n---\n\n## 🆚 vs Commercial IMEs\n\n| | vinput | Commercial IMEs |\n|---|---|---|\n| Privacy | ✅ Fully local | ❌ Cloud |\n| Offline | ✅ | ❌ |\n| Technical terms | ✅ Custom hotwords + LLM | ⚠️ Generic |\n| Intent refinement | ✅ AI prompt distillation | ❌ Transcription only |\n| Streaming output | ❌ Wait until done | ✅ Real-time |\n| Direct insertion | ⚠️ Via clipboard | ✅ System IME |\n| Dialects | ⚠️ Mandarin only | ✅ Many |\n\n**Best positioning**: use vinput as an **\"AI prompt dictation button\"** alongside your system IME: vinput for long prompts to Claude/Cursor/ChatGPT, the system IME for chat, passwords, and short replies.\n\n---\n\n## 📄 License\n\nMIT — see [LICENSE](./LICENSE)\n\n---\n\n## 🙏 Acknowledgements\n\n- [whisper.cpp](https://github.com/ggerganov/whisper.cpp)\n- 
[Ollama](https://ollama.com/)\n- [Qwen](https://github.com/QwenLM/Qwen)\n- [Raycast](https://www.raycast.com/)\n- [SoX](http://sox.sourceforge.net/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faimer1124%2Flocal-voice-input","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faimer1124%2Flocal-voice-input","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faimer1124%2Flocal-voice-input/lists"}