{"id":48380428,"url":"https://github.com/gabrimatic/local-whisper","last_synced_at":"2026-04-05T19:30:49.085Z","repository":{"id":329716258,"uuid":"1120491200","full_name":"gabrimatic/local-whisper","owner":"gabrimatic","description":"On-device voice transcription, grammar correction, and text-to-speech for macOS. Runs on MLX.","archived":false,"fork":false,"pushed_at":"2026-03-08T14:31:51.000Z","size":1514,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-08T18:06:41.511Z","etag":null,"topics":["apple-silicon","kokoro","macos","menu-bar","mlx","offline","privacy","python","qwen3-asr","speech-to-text","swift","text-to-speech","transcription"],"latest_commit_sha":null,"homepage":"https://github.com/gabrimatic/local-whisper","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gabrimatic.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"buy_me_a_coffee":"gabrimatic"}},"created_at":"2025-12-21T10:35:54.000Z","updated_at":"2026-03-08T14:31:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gabrimatic/local-whisper","commit_stats":null,"previous_names":["gabrimatic/local-whisper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gabrimatic/local-whisper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrimatic%2Flocal-whisper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrimatic%2Flocal-whisper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrimatic%2Flocal-whisper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrimatic%2Flocal-whisper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gabrimatic","download_url":"https://codeload.github.com/gabrimatic/local-whisper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gabrimatic%2Flocal-whisper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31448215,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T15:22:31.103Z","status":"ssl_error","status_checked_at":"2026-04-05T15:22:00.205Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","kokoro","macos","menu-bar","mlx","offline","privacy","python","qwen3-asr","speech-to-text","swift","text-to-speech","transcription"],"created_at":"2026-04-05T19:30:43.614Z","updated_at":"2026-04-05T19:30:49.076Z","avatar_url":"https://github.com/gabrimatic.png","language":"Python","funding_links":["https://buymeacoffee.com/gabrimatic","https://www.buymeacoffee.com/gabrimatic"],"categories":[],"sub_categories":[],"readme":"# Local Whisper\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![Platform: macOS](https://img.shields.io/badge/platform-macOS-lightgrey.svg)]()\n[![Apple Silicon](https://img.shields.io/badge/Apple_Silicon-required-blue.svg)]()\n[![Python 3.11+](https://img.shields.io/badge/Python-3.11+-blue.svg)]()\n\n**On-device voice transcription, grammar correction, and text-to-speech for macOS. Private, fast, runs on MLX.**\n\nDouble-tap, speak, tap to stop. Text is ready. Multiple engines, pluggable grammar, all MLX-native on Apple Silicon. Nothing leaves your Mac.\nSelect text, hit ⌥T, hear it read aloud. Multiple voices, streaming playback, same deal.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/hero.png\" width=\"600\" alt=\"Local Whisper recording in Notes\"\u003e\n\u003c/p\u003e\n\n---\n\n## Quick Start\n\n**Apple Silicon required.** Microphone and Accessibility permissions needed.\n\n```bash\ngit clone https://github.com/gabrimatic/local-whisper.git\ncd local-whisper\n./setup.sh\n```\n\nOne command. Installs deps, downloads core local models, builds the UI, sets up auto-start, creates the `wh` alias.\n\n| Action | Key |\n|--------|-----|\n| Start recording | Double-tap **Right Option** |\n| Hold to record | Hold **Right Option** past double-tap threshold |\n| Stop and transcribe | Tap **Right Option** or **Space** |\n| Cancel | **Esc** |\n| Read selected text aloud | **⌥T** |\n| Stop speech | **⌥T** again or **Esc** |\n\n---\n\n## What It Does\n\n- **On-device transcription** via MLX. Multiple engines, up to 20 minutes per recording.\n- **Grammar correction** with pluggable backends: Apple Intelligence, Ollama, LM Studio. Or disable it.\n- **Text-to-speech** reads any selected text aloud. Works in any app, multiple voices, streaming playback, fully offline via Kokoro MLX.\n- **Text replacements** for custom spoken-to-correct mappings.\n- **Audio processing**: VAD, silence trimming, noise reduction, normalization.\n- **Keyboard shortcuts** for proofreading, rewriting, prompt engineering on selected text.\n- **CLI**: `wh whisper`, `wh listen`, `wh transcribe` for scripting and automation.\n- **Native macOS UI**: menu bar, Liquid Glass overlay, settings window.\n- **Auto-backup** of every recording and transcription.\n\n### Keyboard Shortcuts\n\n| Shortcut | Action |\n|----------|--------|\n| **⌥T** | Read selected text aloud (again or Esc to stop) |\n| **Ctrl+Shift+G** | Proofread selected text |\n| **Ctrl+Shift+R** | Rewrite selected text |\n| **Ctrl+Shift+P** | Optimize selected text as an LLM prompt |\n\nResults go to clipboard. TTS plays through speakers.\n\n### Feedback\n\n- **Sounds**: Pop on start, Glass on success, Basso on failure\n- **Menu bar**: animated waveform (recording), speaker icon (speech)\n- **Overlay**: `0.0` recording · `···` processing · `Copied` done · `Failed` error · `Speaking...`\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/overlay-recording.png\" width=\"280\" alt=\"Floating overlay during recording\"\u003e\n\u003c/p\u003e\n\n---\n\n## Transcription Engines\n\nSwitch via Settings, `wh engine \u003cname\u003e`, or config.\n\n### Qwen3-ASR (default)\n\nIn-process via [qwen3-asr-mlx](https://github.com/gabrimatic/qwen3-asr-mlx). No server, no network. Long audio native.\n\n| Setting | Default | Notes |\n|---------|---------|-------|\n| `model` | `mlx-community/Qwen3-ASR-1.7B-bf16` | Downloaded by `setup.sh` |\n| `language` | `auto` | Force with `en`, `fa`, etc. |\n| `timeout` | `0` | No limit |\n| `prefill_step_size` | `4096` | Higher = faster on Apple Silicon |\n\n### WhisperKit (alternative)\n\nWhisper on Apple Neural Engine via [Argmax](https://github.com/argmaxinc/WhisperKit). Install with `brew install whisperkit-cli`, switch with `wh engine whisperkit`.\n\n| Model | Notes |\n|-------|-------|\n| `tiny` / `tiny.en` | Fastest, lowest accuracy |\n| `base` / `base.en` | |\n| `small` / `small.en` | |\n| `whisper-large-v3-v20240930` | Best accuracy (default) |\n\n---\n\n## Text-to-Speech\n\nKokoro-82M via [kokoro-mlx](https://github.com/gabrimatic/kokoro-mlx). Runs in-process, no server, no network. Streaming playback starts before full synthesis completes.\n\n**Usage:**\n- **⌥T** on selected text in any app. Press ⌥T again, Esc, or start a recording to stop.\n- **CLI:** `wh whisper \"text\"`, `wh whisper --voice af_bella \"text\"`, or pipe stdin with `echo \"hello\" | wh whisper`.\n\nThe overlay shows \"Generating speech...\" during synthesis, then \"Speaking...\" during playback. The shortcut is configurable via `tts.speak_shortcut` in config.\n\n### Voices\n\nMultiple presets available. Default is Sky (`af_sky`).\n\n| Voice | ID | Type |\n|-------|-----|------|\n| Heart | `af_heart` | American female |\n| Bella | `af_bella` | American female |\n| Nova | `af_nova` | American female |\n| Sky | `af_sky` (default) | American female |\n| Sarah | `af_sarah` | American female |\n| Nicole | `af_nicole` | American female |\n| Alice | `bf_alice` | British female |\n| Emma | `bf_emma` | British female |\n| Adam | `am_adam` | American male |\n| Echo | `am_echo` | American male |\n| Eric | `am_eric` | American male |\n| Liam | `am_liam` | American male |\n| Daniel | `bm_daniel` | British male |\n| George | `bm_george` | British male |\n\n---\n\n## Grammar Backends\n\nOptional. Pick a grammar backend or disable it:\n\n| Backend | Requirements | Notes |\n|---------|-------------|-------|\n| **Apple Intelligence** | macOS 15+, Apple Silicon, Apple Intelligence enabled | Fastest, best quality |\n| **Ollama** | [Ollama](https://ollama.com) installed and running | Works on any Mac |\n| **LM Studio** | [LM Studio](https://lmstudio.ai) with a model loaded and the local server started | Works on any Mac |\n| **Disabled** | None | Transcription only |\n\nSwitch from menu bar (instant), `wh backend \u003cname\u003e` (restarts), or Settings.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eOllama setup\u003c/strong\u003e\u003c/summary\u003e\n\n1. Download from [ollama.com](https://ollama.com)\n2. Pull a model and start the server:\n\n```bash\nollama pull gemma3:4b-it-qat\nollama serve\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eLM Studio setup\u003c/strong\u003e\u003c/summary\u003e\n\n1. Download from [lmstudio.ai](https://lmstudio.ai)\n2. Download and load a model (e.g., `google/gemma-3-4b`)\n3. **Start the local server**: Developer tab \u003e Start Server\n\n\u003e Loading a model does **not** start the server. Start it from Developer tab.\n\n\u003c/details\u003e\n\n---\n\n## Usage\n\n### CLI\n\n`wh` controls everything:\n\n```bash\nwh                  # Status and help\nwh status           # Service status, PID, grammar backend\nwh start            # Launch the service\nwh stop             # Stop the service\nwh restart          # Restart (rebuilds Swift UI if sources changed)\nwh build            # Rebuild Swift UI app\n\nwh engine           # Show current engine and list available\nwh engine whisperkit  # Switch transcription engine\nwh backend          # Show current grammar backend and list available\nwh backend ollama   # Switch grammar backend\n\nwh replace          # Show text replacement rules\nwh replace add \"gonna\" \"going to\"\nwh replace remove \"gonna\"\nwh replace on|off   # Enable or disable replacements\n\nwh whisper \"text\"   # Speak text aloud via Kokoro TTS\nwh whisper --voice af_bella \"text\"\necho \"hello\" | wh whisper\n\nwh listen           # Record until silence, output transcription\nwh listen 30        # Record up to 30 seconds\nwh listen --raw     # Raw transcription, no grammar\n\nwh transcribe recording.wav\nwh transcribe --raw audio.wav\n\nwh config           # Interactive config editor (static summary when piped)\nwh config edit      # Open config.toml in $EDITOR\nwh config path      # Print config file path\nwh doctor           # Check system health\nwh doctor --fix     # Auto-repair issues\nwh log              # Tail service log\nwh update           # Pull, upgrade deps, warm up models, rebuild, restart\nwh version          # Show version\nwh uninstall        # Completely remove Local Whisper\n```\n\n### Menu Bar\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/menu-bar.png\" width=\"380\" alt=\"Local Whisper menu bar\"\u003e\n\u003c/p\u003e\n\n| Item | What it does |\n|------|-------------|\n| Status | Current state |\n| Grammar | Switch grammar backend in-place |\n| Replacements | Toggle, shows rule count |\n| Retry Last / Copy Last | Re-transcribe or re-copy |\n| Transcriptions | Last 20, click to copy |\n| Recordings | Audio files, click to reveal in Finder |\n| Settings... | Full GUI |\n| Restart Service | Restart background service |\n| Check for Updates | Pull, rebuild, restart |\n| Quit | Exit |\n\n### Settings\n\nThree tabs: General (engine, grammar, TTS, shortcuts, UI), Advanced (audio, params, backends), About.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/settings.png\" width=\"480\" alt=\"Settings window\"\u003e\n\u003c/p\u003e\n\nSaves to `~/.whisper/config.toml`. Restart-required fields warn and offer immediate restart.\n\n---\n\n## Configuration\n\n`~/.whisper/config.toml`. Edit via Settings, `wh config`, or directly.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eFull config reference\u003c/strong\u003e\u003c/summary\u003e\n\n```toml\n[hotkey]\nkey = \"alt_r\"              # alt_r, alt_l, ctrl_r, ctrl_l, cmd_r, cmd_l,\n                           # shift_r, shift_l, caps_lock, f1-f12\ndouble_tap_threshold = 0.4 # seconds\n\n[transcription]\nengine = \"qwen3_asr\"      # \"qwen3_asr\" (default) or \"whisperkit\"\n\n[qwen3_asr]\nmodel = \"mlx-community/Qwen3-ASR-1.7B-bf16\"\nlanguage = \"auto\"          # \"en\", \"fa\", etc. or \"auto\"\ntimeout = 0                # 0 = no limit\nprefill_step_size = 4096   # higher = faster on Apple Silicon\ntemperature = 0.0\ntop_p = 1.0\ntop_k = 0\nrepetition_context_size = 100\nrepetition_penalty = 1.2\nchunk_duration = 1200.0    # max chunk length in seconds\n\n[whisper]\nmodel = \"whisper-large-v3-v20240930\"\nlanguage = \"auto\"\nurl = \"http://localhost:50060/v1/audio/transcriptions\"\ncheck_url = \"http://localhost:50060/\"\ntimeout = 0\ntemperature = 0.0\ncompression_ratio_threshold = 2.4\nno_speech_threshold = 0.6\nlogprob_threshold = -1.0\ntemperature_fallback_count = 5\nprompt_preset = \"none\"     # \"none\", \"technical\", \"dictation\", or \"custom\"\nprompt = \"\"                # used only when prompt_preset = \"custom\"\n\n[grammar]\nbackend = \"apple_intelligence\"  # \"apple_intelligence\", \"ollama\", or \"lm_studio\"\nenabled = false\n\n[ollama]\nurl = \"http://localhost:11434/api/generate\"\ncheck_url = \"http://localhost:11434/\"\nmodel = \"gemma3:4b-it-qat\"\nkeep_alive = \"60m\"\ntimeout = 0\nmax_chars = 0\nmax_predict = 0\nnum_ctx = 0\nunload_on_exit = false\n\n[apple_intelligence]\nmax_chars = 0\ntimeout = 0\n\n[lm_studio]\nurl = \"http://localhost:1234/v1/chat/completions\"\ncheck_url = \"http://localhost:1234/\"\nmodel = \"google/gemma-3-4b\"\nmax_chars = 0\nmax_tokens = 0\ntimeout = 0\n\n[replacements]\nenabled = false\n\n[replacements.rules]\n# \"gonna\" = \"going to\"\n# \"wanna\" = \"want to\"\n\n[audio]\nsample_rate = 16000\nmin_duration = 0\nmax_duration = 0           # 0 = no limit\nmin_rms = 0.005            # silence threshold (0.0-1.0)\nvad_enabled = true\nnoise_reduction = true\nnormalize_audio = true\npre_buffer = 0.0           # seconds before hotkey (0.0 = disabled)\n\n[backup]\ndirectory = \"~/.whisper\"\nhistory_limit = 100        # max entries for text and audio history (1-1000)\n\n[ui]\nshow_overlay = true\noverlay_opacity = 0.92\nsounds_enabled = true\nnotifications_enabled = false\nauto_paste = false         # paste at cursor, preserving clipboard\n\n[shortcuts]\nenabled = true\nproofread = \"ctrl+shift+g\"\nrewrite = \"ctrl+shift+r\"\nprompt_engineer = \"ctrl+shift+p\"\n\n[tts]\nenabled = true\nprovider = \"kokoro\"\nspeak_shortcut = \"alt+t\"\n\n[kokoro_tts]\nmodel = \"mlx-community/Kokoro-82M-bf16\"\nvoice = \"af_sky\"           # See voice table in README for all available presets\n```\n\n\u003c/details\u003e\n\n---\n\n## Privacy\n\nZero network calls. Every component runs on-device or localhost.\n\n| Component | Runs at |\n|-----------|---------|\n| Qwen3-ASR | In-process MLX |\n| Kokoro TTS | In-process MLX |\n| WhisperKit | localhost:50060 |\n| Apple Intelligence | On-device |\n| Ollama | localhost:11434 |\n| LM Studio | localhost:1234 |\n\nModels cached at `~/.whisper/models/`. Config and backups at `~/.whisper/`.\n\n---\n\n## Architecture\n\nPython headless service (LaunchAgent). Swift owns all UI.\n\n```\nPython (LaunchAgent, headless)\n  ├── Recording, transcription, grammar, replacements, clipboard, hotkeys\n  ├── Text-to-Speech (Kokoro-82M, in-process)\n  ├── IPC server at ~/.whisper/ipc.sock (Swift UI communication)\n  └── Command server at ~/.whisper/cmd.sock (CLI commands)\n\nSwift (subprocess, all UI)\n  ├── Menu bar with grammar submenus and transcription history\n  ├── Floating overlay pill (recording, processing, speaking states)\n  └── Settings window (General, Advanced, About)\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eData flow\u003c/strong\u003e\u003c/summary\u003e\n\n```\n┌───────────────────────────────────────────────────────────┐\n│  Microphone → pre-buffer (ring) + live capture            │\n└──────────────────────────┬────────────────────────────────┘\n                           ▼\n┌───────────────────────────────────────────────────────────┐\n│  Audio Processing                                         │\n│  VAD → silence trim → noise reduction → normalize         │\n└──────────────────────────┬────────────────────────────────┘\n                           ▼\n┌───────────────────────────────────────────────────────────┐\n│  Transcription Engine                                     │\n│                                                           │\n│  Qwen3-ASR (default)       │  WhisperKit (alternative)   │\n│  In-process MLX            │  localhost:50060             │\n│  Long audio native         │  Split at 28s gaps          │\n└──────────────────────────┬────────────────────────────────┘\n                           ▼\n┌───────────────────────────────────────────────────────────┐\n│  Grammar Correction                                       │\n│                                                           │\n│  Apple Intelligence  │  Ollama        │  LM Studio        │\n│  On-device           │  localhost LLM │  OpenAI-compatible │\n└──────────────────────────┬────────────────────────────────┘\n                           ▼\n┌───────────────────────────────────────────────────────────┐\n│  Text Replacements                                        │\n│  Case-insensitive, word-boundary-aware regex              │\n└──────────────────────────┬────────────────────────────────┘\n                           ▼\n┌───────────────────────────────────────────────────────────┐\n│  Clipboard · Saved to ~/.whisper/                         │\n│  (auto_paste: pasted at cursor, clipboard preserved)      │\n└───────────────────────────────────────────────────────────┘\n```\n\n\u003c/details\u003e\n\n---\n\n## Troubleshooting\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003e\"This process is not trusted\"\u003c/strong\u003e\u003c/summary\u003e\n\nGrant Accessibility to the `wh` process, **not** your terminal app. System Settings opens automatically on first run.\n\nIf it didn't:\n```bash\nopen x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility\n```\n\nEnable `wh`, then `wh restart`.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eDouble-tap not working\u003c/strong\u003e\u003c/summary\u003e\n\nTap twice within 0.4s (default). Adjust `double_tap_threshold` in config.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eApple Intelligence not working\u003c/strong\u003e\u003c/summary\u003e\n\nVerify:\n1. **macOS 15** (Sequoia) or later\n2. **Apple Silicon** (M1/M2/M3/M4)\n3. **Apple Intelligence** enabled in System Settings \u003e Apple Intelligence \u0026 Siri\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eOllama not working\u003c/strong\u003e\u003c/summary\u003e\n\nVerify:\n1. Ollama installed: [ollama.com](https://ollama.com)\n2. Model pulled: `ollama pull gemma3:4b-it-qat`\n3. Server running: `ollama serve`\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eLM Studio not working\u003c/strong\u003e\u003c/summary\u003e\n\nVerify:\n1. LM Studio installed: [lmstudio.ai](https://lmstudio.ai)\n2. A model is downloaded and loaded\n3. **Local server is running** (most common issue): Developer tab \u003e Start Server\n4. Confirm with: `curl http://localhost:1234/v1/models`\n\nLoading a model does **not** start the server.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eSlow first transcription\u003c/strong\u003e\u003c/summary\u003e\n\n`setup.sh` pre-downloads and warms the built-in local models used by transcription and TTS. It does not pull Ollama or LM Studio models for you. Skip setup and the first transcription loads them on demand. After that, loaded from disk.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eEmpty transcription\u003c/strong\u003e\u003c/summary\u003e\n\n- Speak clearly, close to the microphone\n- Check microphone permissions in System Settings\n- Confirm the correct input device is selected\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eOverlay not showing\u003c/strong\u003e\u003c/summary\u003e\n\nCheck `show_overlay = true` in `~/.whisper/config.toml`.\n\n\u003c/details\u003e\n\n---\n\n## Development\n\n```bash\npython3 -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -e .\n\nwh build              # Build Swift UI (one-time)\nwh                    # Run the service\npython tests/test_flow.py  # Run tests (requires a grammar backend)\n```\n\n### Adding an Engine or Grammar Backend\n\nEngines: implement `TranscriptionEngine` in `engines/`, register in `ENGINE_REGISTRY`.\nGrammar backends: implement `GrammarBackend` in `backends/`, register in `BACKEND_REGISTRY`.\n\nMenu, CLI, and Settings auto-generate from the registries.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eProject structure\u003c/strong\u003e\u003c/summary\u003e\n\n```\nlocal-whisper/\n├── pyproject.toml\n├── setup.sh\n├── tests/\n│   ├── test_flow.py\n│   └── fixtures/\n├── LocalWhisperUI/                  # Swift UI app\n│   ├── Package.swift\n│   └── Sources/LocalWhisperUI/\n│       ├── AppMain.swift            # @main entry point\n│       ├── AppState.swift           # Observable state, IPC handler\n│       ├── IPCClient.swift          # Unix socket client\n│       ├── IPCMessages.swift        # Codable message types\n│       ├── MenuBarView.swift        # Menu bar dropdown\n│       ├── OverlayWindowController.swift\n│       ├── OverlayView.swift        # Floating pill overlay\n│       ├── GeneralSettingsView.swift\n│       ├── AdvancedSettingsView.swift       # struct shell + @State + body\n│       ├── AdvancedSettingsView+Audio.swift\n│       ├── AdvancedSettingsView+Transcription.swift\n│       ├── AdvancedSettingsView+Grammar.swift\n│       ├── AdvancedSettingsView+IO.swift\n│       ├── SettingsView.swift\n│       ├── SharedViews.swift\n│       ├── AboutView.swift\n│       └── Constants.swift\n└── src/whisper_voice/\n    ├── app.py              # App class + service_main (imports mixins)\n    ├── app_ipc.py          # IPCMixin: IPC send/receive\n    ├── app_recording.py    # RecordingMixin: keyboard + recording lifecycle\n    ├── app_pipeline.py     # PipelineMixin: transcription pipeline\n    ├── app_commands.py     # CommandsMixin: CLI command handlers\n    ├── app_switching.py    # SwitchingMixin: engine/backend switching\n    ├── cli/                # CLI package (wh)\n    │   ├── constants.py    # Colors, path constants\n    │   ├── lifecycle.py    # start/stop/status\n    │   ├── build.py        # Swift UI build, restart\n    │   ├── settings.py     # engine/backend/replace commands\n    │   ├── editor.py       # Interactive config TUI\n    │   ├── client.py       # whisper/listen/transcribe socket client\n    │   ├── doctor.py       # wh doctor + wh update\n    │   └── main.py         # help, version, cli_main dispatcher\n    ├── config/             # Config package\n    │   ├── schema.py       # Dataclasses + DEFAULT_CONFIG\n    │   ├── loader.py       # load_config, get_config, singleton\n    │   ├── toml_helpers.py # _find/_replace_in_section, _serialize_toml_value\n    │   └── mutations.py    # add/remove_replacement, update_config_field\n    ├── ipc_server.py       # IPC server (Swift UI)\n    ├── cmd_server.py       # Command server (CLI)\n    ├── audio.py            # Recording and pre-buffer\n    ├── audio_processor.py  # VAD, noise reduction, normalization\n    ├── backup.py           # History persistence\n    ├── grammar.py          # Grammar backend factory\n    ├── transcriber.py      # Engine routing\n    ├── utils.py            # Helpers\n    ├── shortcuts.py        # Text transformation shortcuts\n    ├── key_interceptor.py  # CGEvent tap\n    ├── tts_processor.py    # TTS shortcut handler\n    ├── tts/\n    │   ├── base.py         # TTSProvider base\n    │   └── kokoro_tts.py   # Kokoro provider (MLX)\n    ├── engines/\n    │   ├── base.py         # TranscriptionEngine base\n    │   ├── qwen3_asr.py    # Qwen3-ASR (MLX)\n    │   └── whisperkit.py   # WhisperKit (localhost)\n    └── backends/\n        ├── base.py         # Backend base\n        ├── modes.py        # Transformation modes\n        ├── ollama/\n        ├── lm_studio/\n        └── apple_intelligence/\n```\n\nData stored in `~/.whisper/`:\n```\n~/.whisper/\n├── config.toml             # Settings\n├── ipc.sock                # Python/Swift IPC\n├── cmd.sock                # CLI commands\n├── LocalWhisperUI.app      # Swift UI (built by setup.sh)\n├── last_recording.wav\n├── last_raw.txt            # Before grammar\n├── last_transcription.txt  # Final text\n├── audio_history/\n├── history/                # Last 100 transcriptions\n└── models/                 # Qwen3-ASR, Kokoro TTS\n```\n\n\u003c/details\u003e\n\n---\n\n## Credits\n\n[qwen3-asr-mlx](https://github.com/gabrimatic/qwen3-asr-mlx) (MLX port of Qwen3-ASR) · [kokoro-mlx](https://github.com/gabrimatic/kokoro-mlx) (MLX port of Kokoro-82M) · [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) by [Qwen Team](https://qwen.ai) · [Kokoro-82M](https://github.com/remsky/Kokoro-FastAPI) · [WhisperKit](https://github.com/argmaxinc/WhisperKit) by [Argmax](https://www.argmaxinc.com) · [Apple Intelligence](https://www.apple.com/apple-intelligence/) · [Apple FM SDK](https://github.com/apple/python-apple-fm-sdk) · [Ollama](https://ollama.com) · [LM Studio](https://lmstudio.ai) · [SwiftUI](https://developer.apple.com/swiftui/)\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eLegal notices\u003c/strong\u003e\u003c/summary\u003e\n\n### Trademarks\n\n\"Whisper\" is a trademark of OpenAI. \"Apple Intelligence\" is a trademark of Apple Inc. \"WhisperKit\" is a trademark of Argmax, Inc. \"Qwen\" is a trademark of Alibaba Cloud. \"Ollama\" and \"LM Studio\" are trademarks of their respective owners.\n\nThis project is not affiliated with, endorsed by, or sponsored by OpenAI, Apple, Argmax, Alibaba Cloud, or any other trademark holder. All trademark names are used solely to describe compatibility with their respective technologies.\n\n### Third-Party Licenses\n\nThis project depends on [pynput](https://github.com/moses-palmer/pynput), licensed under LGPL-3.0. When installed via pip (the default), pynput is dynamically linked and fully compatible with this project's MIT license.\n\nAll other dependencies use MIT, BSD, or Apache 2.0 licenses. See each package for details.\n\n\u003c/details\u003e\n\n## License\n\nMIT License. See [LICENSE](LICENSE) for details.\n\n---\n\nCreated by [Soroush Yousefpour](https://gabrimatic.info)\n\n[![\"Buy Me A Coffee\"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/gabrimatic)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabrimatic%2Flocal-whisper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgabrimatic%2Flocal-whisper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabrimatic%2Flocal-whisper/lists"}