https://github.com/aimer1124/local-voice-input
🎙️ A fully offline, local AI voice prompt input tool: Whisper + Ollama + Raycast, zero cloud dependencies, privacy 100% under your control
- Host: GitHub
- URL: https://github.com/aimer1124/local-voice-input
- Owner: aimer1124
- License: mit
- Created: 2026-05-26T12:51:11.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-08T07:20:28.000Z (23 days ago)
- Last Synced: 2026-06-08T08:12:15.100Z (23 days ago)
- Topics: bash, llm, local-ai, macos, offline-ai, ollama, privacy, prompt-engineering, raycast, swift, voice-input, whisper, whisper-cpp
- Language: Shell
- Size: 347 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.en.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Roadmap: ROADMAP.md
README
# 🎙️ local-voice-input: Local AI Voice Prompt Input
> **Fully offline** voice-to-text tool for programmers: press a hotkey, speak, get cleaned-up text pasted at your cursor.
>
> _CLI tool name: `vinput`_
**100% local. Audio never leaves your machine. No cloud, no accounts.**
[中文 README (Chinese)](./README.md) · [CHANGELOG](./CHANGELOG.md) · [ROADMAP](./ROADMAP.md) · [Contributing](./CONTRIBUTING.md)

---
## ✨ Features
- **Fully offline**: Whisper.cpp + Ollama, your voice stays on-device
- 🧠 **AI intent refinement**: local LLM filters fillers and self-corrections, turns rambling into a structured prompt
- ⚡ **Fast**: 10s of Chinese audio → text in 2–4s
- **Toggle hotkey**: press to start, press again to stop (intuitive)
- 🖥️ **Multi-monitor HUD**: frosted-glass overlay near the bottom-center of whichever screen your cursor is on
- 🎯 **Hotword-aware**: customizable hotwords boost recognition for technical terms
- **Zero-config**: works out of the box; every knob is configurable
---
## 🎬 Flow
```
Press ⌘⇧Space (Raycast hotkey)
        │
        ▼
┌────────────────────────────┐
│ 🎙️ Recording...            │ ← Pop sound
└────────────────────────────┘
        │ Speak your prompt
        ▼
Press ⌘⇧Space again ← Tink sound
        │
        ▼
┌────────────────────────────┐
│ 💭 Transcribing...         │
└────────────────────────────┘
        │ ← Whisper.cpp
┌────────────────────────────┐
│ 🤖 Polishing with AI...    │
└────────────────────────────┘
        │ ← Ollama (qwen2.5:3b)
        │ ← pbcopy + ⌘V
        ▼
┌────────────────────────────┐
│ ✓ Done                     │
└────────────────────────────┘
        │
        ▼
Cleaned text appears at cursor
```
---
## 🔧 How It Works
### Architecture
```
Raycast Global Hotkey
 └─ voice-input.sh (Raycast Script Command)
     └─ vinput_bg.sh (main logic)
         │
         ├─ Lock dir (atomic mkdir) = mutex + state machine
         │   ├─ First press → record mode
         │   └─ Second press → toggle mode (SIGINT to rec)
         │
         ├─ SoX rec → Whisper-cli → Ollama → pbcopy + ⌘V
         │
         ├─ 30s guard timeout (safety)
         │
         └─ HUD (Swift binary) → screen-center overlay
```
### Key Mechanisms
#### Toggle Lock (mkdir + PID file)
`/tmp/vinput.lock.d` doubles as a mutex AND state machine:
| State | mkdir | PID file | Behavior |
|---|---|---|---|
| Fresh session | succeeds | created | enter record mode |
| Mid-recording press | fails | exists | toggle mode, SIGINT to rec |
| Mid-transcription press | fails | gone | "still processing previous" HUD |
| Stale lock (crash) | fails | dead PID | auto-clean, restart |
`mkdir` is atomic and portable, which matters because macOS does not ship the `flock` command-line utility.
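The state table above can be sketched in a few lines of shell. This is a minimal illustration, not the project's actual code: the function name `lock_state` and the PID file name `rec.pid` are assumptions, and only the lock directory path comes from this README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a mkdir-based toggle lock (names are illustrative).
LOCK_DIR="${LOCK_DIR:-/tmp/vinput.lock.d}"   # lock path named in this README
PID_FILE="$LOCK_DIR/rec.pid"                 # assumed PID file name

# Prints one of: record | toggle | busy | stale
lock_state() {
  if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo record          # mkdir succeeded: fresh session, caller starts recording
  elif [ ! -f "$PID_FILE" ]; then
    echo busy            # lock held but no recorder PID: still transcribing
  elif kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo toggle          # recorder is alive: caller should send it SIGINT
  else
    rm -rf "$LOCK_DIR"   # recorder died without cleanup: clear the stale lock
    echo stale
  fi
}
```

A caller would branch on the printed state, for example writing the recorder's PID into `$PID_FILE` after `record` and running `kill -INT "$(cat "$PID_FILE")"` after `toggle`. One caveat of this sketch: a second press that lands between `mkdir` and the PID file being written is reported as `busy`.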
#### Recording Control (USE_VAD)
**Default USE_VAD=0** (recommended):
- `rec` starts capturing immediately regardless of volume
- Second hotkey press → instant SIGINT stop
- 30s hard timeout as safety net
**Optional USE_VAD=1**:
- SoX silence filter: stop after 1.5s of silence
- Only works in quiet environments
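The two recording modes and the hard timeout might look roughly like the sketch below. The helper names `rec_effects` and `guard_timeout` are hypothetical, and the `0.1` second onset and `1%` thresholds are assumed values; only the 1.5s silence window and the 30s limit come from this README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; variable names follow the config shown in this README.
USE_VAD="${USE_VAD:-0}"
MAX_REC_SECONDS="${MAX_REC_SECONDS:-30}"

# Print the SoX effect chain for the current mode (empty when VAD is off).
rec_effects() {
  if [ "$USE_VAD" = "1" ]; then
    # silence effect: start on sound above 1%, stop after 1.5s below 1%
    echo "silence 1 0.1 1% 1 1.5 1%"
  fi
}

# After SECS seconds, send a signal (SIGINT by default) to PID if it still exists.
guard_timeout() {
  local pid="$1" secs="$2" sig="${3:-INT}"
  ( sleep "$secs"; kill -"$sig" "$pid" 2>/dev/null ) &
}

# Usage (not executed here):
#   rec -q -c 1 -r 16000 -b 16 /tmp/vinput.wav $(rec_effects) &
#   guard_timeout $! "$MAX_REC_SECONDS"
```

The signal is a parameter because ordinary background commands started from a non-interactive shell ignore SIGINT unless they install their own handler. A production script would presumably also cancel the watchdog once recording ends, so that it cannot signal an unrelated process later.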
#### Whisper Transcription
- Model: `ggml-large-v3-turbo-q5_0` (547MB, Metal backend)
- Hotwords injected via `--prompt` for better technical term recognition
- Forced `-l zh` for Chinese, English terms guided by prompt
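As an illustration of hotword injection, the sketch below joins the hotword file into a single prompt string and passes it to `whisper-cli`. The function names and the `HOTWORDS_FILE` variable are assumptions, and the comma-joined prompt format is one plausible choice rather than the project's confirmed behavior.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build a --prompt string from the hotword list.
HOTWORDS_FILE="${HOTWORDS_FILE:-$HOME/.config/vinput_hotwords.txt}"

# Join the non-blank lines of a hotword file with commas; prints nothing if the file is missing.
hotword_prompt() {
  [ -f "$1" ] || return 0
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Transcribe a WAV file; MODEL_PATH, WHISPER_LANG and WHISPER_THREADS come from the config.
transcribe() {
  whisper-cli -m "$MODEL_PATH" -l "${WHISPER_LANG:-zh}" -t "${WHISPER_THREADS:-8}" \
    --prompt "$(hotword_prompt "$HOTWORDS_FILE")" \
    -nt -np -f "$1"    # -nt: no timestamps, -np: suppress progress prints
}
```

A hotword file containing `Raycast`, `Ollama`, and `whisper.cpp` on separate lines yields the prompt `Raycast,Ollama,whisper.cpp`.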
#### LLM Refinement (with short-text skip)
- Text shorter than `SHORT_TEXT_THRESHOLD` (15 chars default) → skip LLM, save 1–2s
- Longer text → Ollama with `keep_alive=30m` for warm model
- Fallback to raw Whisper output on failure
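The skip-and-fallback logic above could be sketched as follows, assuming Ollama's default local endpoint and its `/api/generate` route. The function names, the `OLLAMA_URL` variable, the 20s curl timeout, and the instruction text in the prompt are placeholders, not the project's real prompt or code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of short-text skip plus fallback to the raw transcript.
SHORT_TEXT_THRESHOLD="${SHORT_TEXT_THRESHOLD:-15}"
OLLAMA_MODEL="${OLLAMA_MODEL:-qwen2.5:3b}"
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434/api/generate}"   # assumed default endpoint

# Ask Ollama once; prints the reply and returns non-zero on any failure.
ollama_refine() {
  jq -n --arg m "$OLLAMA_MODEL" --arg p "Clean up this dictation: $1" \
     '{model: $m, prompt: $p, stream: false, keep_alive: "30m"}' |
    curl -sf --max-time 20 "$OLLAMA_URL" -d @- |
    jq -er '.response'
}

# Prints refined text, or the raw transcript when it is short or the LLM call fails.
refine() {
  local raw="$1" out
  # ${#raw} counts characters in a UTF-8 locale in bash, bytes in the C locale
  if [ "${#raw}" -lt "$SHORT_TEXT_THRESHOLD" ]; then
    printf '%s\n' "$raw"
    return 0
  fi
  out="$(ollama_refine "$raw")" && [ -n "$out" ] || out="$raw"
  printf '%s\n' "$out"
}
```

Because the length check depends on the locale, the UTF-8 exports described in the UTF-8 Encoding section also matter here: without them a 15-character Chinese phrase would be measured in bytes and sent to the LLM.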
#### Screen HUD
- ~90 lines of Swift, compiled to a 92 KB binary
- `NSVisualEffectView` with `.hudWindow` material (matches system volume HUD)
- `/tmp/vinput_hud.pid` for singleton pattern: new HUD kills previous
- Mouse-passthrough, cross-Space, auto-width
- Multi-screen aware via `NSEvent.mouseLocation`
#### UTF-8 Encoding
Raycast-spawned processes don't inherit the Terminal's `LANG`, which causes `pbcopy` to misinterpret UTF-8 bytes. The script sets UTF-8 defaults whenever these variables are unset:
```bash
export LANG="${LANG:-en_US.UTF-8}"
export LC_ALL="${LC_ALL:-en_US.UTF-8}"
```
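For context, here is a rough sketch of how the locale defaults might sit in front of the `pbcopy` + ⌘V step. The function name `paste_text` is hypothetical, and sending the keystroke through `osascript` and System Events is an assumed mechanism (it is the step that needs the Accessibility permission); the README does not say how the paste is triggered.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the copy-and-paste step (macOS only).
paste_text() {
  # Same UTF-8 defaults as above, applied before pbcopy reads the text
  export LANG="${LANG:-en_US.UTF-8}"
  export LC_ALL="${LC_ALL:-en_US.UTF-8}"
  printf '%s' "$1" | pbcopy
  if [ "${AUTO_PASTE:-1}" = "1" ]; then
    # Simulate Cmd+V in the frontmost application
    osascript -e 'tell application "System Events" to keystroke "v" using command down'
  fi
}
```

Note that this approach overwrites whatever was on the clipboard, which matches the "via clipboard" caveat in the comparison table.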
---
## 📦 Installation
### Homebrew Tap (recommended · v1.1.1+)
```bash
brew tap aimer1124/tap
brew install local-voice-input
vinput setup
```
`brew install` only deploys scripts and the HUD binary to `libexec/`. The follow-up `vinput setup` is the bootstrap step that:
1. Installs `sox`, `jq`, `whisper-cpp`, `ollama` via brew (skips if present)
2. Downloads Whisper `large-v3-turbo-q5_0` (~547MB)
3. Symlinks `vinput`, `vinput.sh`, `vinput_bg.sh`, and `hud` into `~/.whisper_models/`
4. Writes default config `~/.config/vinput.conf` + hotwords list
5. Copies the Raycast command script to `~/.config/raycast-scripts/`
6. Starts the Ollama service, pulls `qwen2.5:3b` (~2GB), pre-warms it
Upgrades: `brew upgrade local-voice-input` (the scripts are symlinks, so version bumps land instantly).
### From source (developer / no brew tap)
```bash
git clone https://github.com/aimer1124/local-voice-input.git
cd local-voice-input
./install.sh
```
`install.sh` covers the same ground as `vinput setup` but without going through the brew tap: it installs deps with `brew install` directly, downloads the Whisper model, compiles or downloads the HUD binary, copies scripts to `~/.whisper_models/`, and warms up Ollama.
### Manual steps (cannot be automated)
1. **Privacy → Microphone**: enable Raycast (sox will prompt on first use)
2. **Privacy → Accessibility**: enable Raycast (for auto ⌘V paste)
3. **Raycast Settings → Extensions → Script Commands**:
- Add Script Directory: `~/.config/raycast-scripts`
- Bind a hotkey to `🎙️ 语音输入` ("Voice Input"; recommended: `⌘⇧Space`)
### Requirements
- macOS 13+
- Apple Silicon (M1/M2/M3/M4)
- 16GB RAM recommended (8GB works)
- 3GB disk space
- A working microphone (⚠️ 3-pole TRS music headphones may route input to a phantom port)
---
## Usage
1. Click into any text field
2. Press your hotkey (e.g., ⌘⇧Space)
3. Speak after the Pop sound
4. Press the hotkey again to stop
5. Wait 2–4s for the text to appear at the cursor
### Performance budget (10s Chinese audio)
| Stage | Time |
|---|---|
| Recording | however long you speak |
| Whisper | ~2s |
| Ollama | ~1s (short skip) / ~2s (long) |
| Paste | <0.1s |
| **Overhead** | **2–4s** |
---
## ⚙️ Configuration
All settings live in `~/.config/vinput.conf`:
```bash
# Whisper ASR
MODEL_PATH="$HOME/.whisper_models/ggml-large-v3-turbo-q5_0.bin"
WHISPER_LANG="zh"
WHISPER_THREADS=8
# Ollama
OLLAMA_MODEL="qwen2.5:3b"
SHORT_TEXT_THRESHOLD=15
# Behavior
AUTO_PASTE=1
USE_VAD=0
MAX_REC_SECONDS=30
```
Hotwords go in `~/.config/vinput_hotwords.txt` (one per line).
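Since the config file is plain shell assignments, one simple way to consume it is to set defaults and then source the file. The sketch below assumes that approach; the function name `load_config` is hypothetical and the defaults simply mirror the sample values above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: defaults first, then let the config file override them.
load_config() {
  # Built-in defaults (same values as the sample config above)
  MODEL_PATH="$HOME/.whisper_models/ggml-large-v3-turbo-q5_0.bin"
  WHISPER_LANG="zh"
  WHISPER_THREADS=8
  OLLAMA_MODEL="qwen2.5:3b"
  SHORT_TEXT_THRESHOLD=15
  AUTO_PASTE=1
  USE_VAD=0
  MAX_REC_SECONDS=30
  # Sourcing runs the file as shell code, so it should only be user-owned
  if [ -f "$1" ]; then
    . "$1"
  fi
}

# Usage (not executed here):
#   load_config "$HOME/.config/vinput.conf"
```

With this layout a config file only needs to list the settings that differ from the defaults, for example a single `USE_VAD=1` line.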
---
## vs Commercial IMEs
| | vinput | Commercial IMEs |
|---|---|---|
| Privacy | ✅ Fully local | ❌ Cloud |
| Offline | ✅ | ❌ |
| Technical terms | ✅ Custom hotwords + LLM | ⚠️ Generic |
| Intent refinement | ✅ AI prompt distillation | ❌ Transcription only |
| Streaming output | ❌ Wait until done | ✅ Real-time |
| Direct insertion | ⚠️ via clipboard | ✅ System IME |
| Dialects | ⚠️ Mandarin only | ✅ Many |
**Best positioning**: use vinput as an **"AI Prompt dictation button"** alongside your system IME: vinput for long prompts to Claude/Cursor/ChatGPT, system IME for chat/passwords/short replies.
---
## License
MIT, see [LICENSE](./LICENSE)
---
## Acknowledgements
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
- [Ollama](https://ollama.com/)
- [Qwen](https://github.com/QwenLM/Qwen)
- [Raycast](https://www.raycast.com/)
- [SoX](http://sox.sourceforge.net/)