{"id":50376955,"url":"https://github.com/sergekruf/voicevoice","last_synced_at":"2026-05-30T10:01:50.303Z","repository":{"id":359499499,"uuid":"1245702828","full_name":"sergekruf/voicevoice","owner":"sergekruf","description":"Local Whisper-based voice dictation for macOS. Hold Fn, talk, release — text appears in any field. ANE inference, no cloud, adaptive dictionary.","archived":false,"fork":false,"pushed_at":"2026-05-22T07:35:39.000Z","size":496,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T14:31:02.970Z","etag":null,"topics":["accessibility","apple-silicon","dictation","local-first","macos","open-source","privacy","speech-to-text","swift","swiftui","whisper","whisperkit"],"latest_commit_sha":null,"homepage":"https://voicevoice.vectrolab.ru","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sergekruf.png","metadata":{"files":{"readme":"README.en.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-21T13:25:26.000Z","updated_at":"2026-05-22T07:35:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sergekruf/voicevoice","commit_stats":null,"previous_names":["sergekruf/voicevoice"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/sergekruf/voicevoice","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergekruf%2Fvoicevoice","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/sergekruf%2Fvoicevoice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergekruf%2Fvoicevoice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergekruf%2Fvoicevoice/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sergekruf","download_url":"https://codeload.github.com/sergekruf/voicevoice/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sergekruf%2Fvoicevoice/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33687722,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","apple-silicon","dictation","local-first","macos","open-source","privacy","speech-to-text","swift","swiftui","whisper","whisperkit"],"created_at":"2026-05-30T10:01:49.317Z","updated_at":"2026-05-30T10:01:50.297Z","avatar_url":"https://github.com/sergekruf.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VoiceVoice\n\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![Release](https://img.shields.io/github/v/release/sergekruf/voicevoice)](https://github.com/sergekruf/voicevoice/releases/latest)\n[![Platform: macOS 13+](https://img.shields.io/badge/macOS-13%2B-black?logo=apple)](https://www.apple.com/macos/)\n[![Apple Silicon](https://img.shields.io/badge/Apple%20Silicon-required-orange?logo=apple)](#requirements)\n[![Downloads](https://img.shields.io/github/downloads/sergekruf/voicevoice/total?label=downloads)](https://github.com/sergekruf/voicevoice/releases)\n\n🇷🇺 [Читать на русском](README.md)\n\n**Voice dictation for macOS with local Whisper.** Hold `Fn`, talk, release — the text appears in any active input field. Recognition runs entirely on your machine via the Apple Neural Engine — not a single phrase leaves your computer.\n\nLanding: [voicevoice.vectrolab.ru](https://voicevoice.vectrolab.ru) · Pre-built `.dmg` available\n\n## Features\n\n- **Hotkey-driven dictation** — `Fn` (default), right `⌥ Option`, or `Caps Lock`. Hold → talk → release → text in your field.\n- **Local Whisper** (`large-v3-turbo`, 4-bit quantized, ~632 MB) via [WhisperKit](https://github.com/argmaxinc/WhisperKit). Inference on the Apple Neural Engine, ~10× faster than real-time on M4.\n- **Adaptive dictionary** — for ~5 minutes after a successful paste, VoiceVoice watches the focused field. If you correct the recognized text, it remembers `wrong → right` pairs and auto-applies them on subsequent dictations.\n- **Fuzzy matching** with configurable threshold — a `клод код → Claude Code` rule also fires on `клот кот`, `клоуд код`, etc.\n- **Edit \u0026 Learn** for apps where Accessibility can't read field contents (Bitrix24, Max, Slack, Termius…) — one-click manual correction from the HUD.\n- **Three-tier paste**: CGEvent ⌘V → AppleScript → AXUIElement direct write. 
Text reaches anywhere — Notes, Safari, Telegram, Termius, Slack, VS Code, Cursor, Claude Desktop, Max, Bitrix24…\n- **TransientType marker** for clipboard managers (Maccy / Paste / PasteNow / Raycast) — our temporary clipboard writes don't pollute your history.\n- **Number normalization** — `«один миллион четыреста двадцать пять тысяч шестьсот восемьдесят девять»` → `1 425 689`, extra spaces and periods stripped.\n- **Auto-emoji** (optional) — appends one contextual emoji on trigger words: «спасибо» → 🙏, «поздравляю» → 🎉, «хаха» → 😄, etc.\n- **Result HUD** + history of the last 200 transcriptions + searchable dictionary.\n- **Quiet mode** — hide all popups / toasts while keeping the recording indicator visible. Great for screencasts.\n- **Privacy-by-default** — zero telemetry, zero cloud, sandbox-compatible, ad-hoc signed with a stable identity (TCC permissions survive rebuilds).\n\n## Requirements\n\n- macOS **13 Ventura** or newer (14+ recommended)\n- Apple Silicon (M1 / M2 / M3 / M4 / M5) — on Intel Macs Whisper falls back to CPU and runs 5–10× slower, making interactive dictation impractical\n- Xcode 15+ (only if building from source)\n- Microphone + Accessibility permissions (requested on first launch)\n\n## Installation\n\n### Pre-built .dmg\n\nEasiest path — download from the landing page: [voicevoice.vectrolab.ru](https://voicevoice.vectrolab.ru) or the [latest GitHub release](https://github.com/sergekruf/voicevoice/releases/latest).\n\n### Build from source\n\n```bash\ngit clone https://github.com/sergekruf/voicevoice.git\ncd voicevoice\n./setup-signing.sh    # one-time: creates a stable self-signed identity so TCC permissions persist across rebuilds\n./build-app.sh        # builds the SwiftPM target → .app bundle → signs\nopen build/VoiceVoice.app\n```\n\nOr via Xcode: `open Package.swift`, wait for WhisperKit + GRDB resolution, hit ▶︎ Run.\n\n## First launch\n\n1. Onboarding window appears. 
Grant:\n   - **Microphone** — click \"Request access\".\n   - **Accessibility** — needed to globally hear `Fn` and emulate `⌘V`. macOS opens System Settings → Privacy \u0026 Security → Accessibility; manually toggle VoiceVoice on.\n2. **Disable system dictation:** System Settings → Keyboard → Dictation → off. Otherwise macOS's overlay intercepts `Fn` on top of ours.\n3. On first launch WhisperKit downloads the `large-v3-turbo` model (~632 MB) to `~/Library/Application Support/VoiceVoice/models/`. Progress shows in the menu bar.\n\n## Usage\n\n1. Put the cursor in any text field.\n2. **Hold Fn** → the \"Recording…\" indicator appears.\n3. Speak. You can dictate punctuation explicitly («запятая», «точка», «вопросительный знак») — Whisper places them reasonably well on its own.\n4. **Release Fn** → after ~0.5–1 s (on M4) the text appears in the field.\n5. If something was misrecognized — the auto-dictionary picks up your manual fix if you correct it within 5 minutes. For apps without AX support — click \"Edit \u0026 Learn\" in the HUD.\n\n## Where data lives\n\n```\n~/Library/Application Support/VoiceVoice/\n├── data.db           # SQLite (GRDB): dictionary + history\n└── models/           # WhisperKit CoreML models\n```\n\nWipe everything:\n```bash\nrm -rf \"$HOME/Library/Application Support/VoiceVoice\"\n```\n\n## Project layout\n\n```\nvoicevoice/\n├── Package.swift                 # SwiftPM manifest (WhisperKit, GRDB)\n├── build-app.sh                  # build .app bundle from CLI\n├── make-dmg.sh                   # build installer .dmg\n├── setup-signing.sh              # create self-signed identity\n└── Sources/VoiceVoice/\n    ├── VoiceVoiceApp.swift       # @main, MenuBarExtra\n    ├── Resources/                # Info.plist, entitlements\n    ├── Models/                   # AppSettings, CorrectionEntry, TranscriptionRecord\n    ├── Storage/                  # GRDB Database, CorrectionStore, HistoryStore\n    ├── Services/\n    │   ├── AudioRecorder.swift   # 
AVAudioEngine 16 kHz mono\n    │   ├── Transcriber.swift     # WhisperKit wrapper\n    │   ├── HotkeyMonitor.swift   # global CGEvent tap\n    │   ├── TextInserter.swift    # three-tier paste + TransientType marker\n    │   ├── TextChangeWatcher.swift # auto-dictionary via AX polling + BFS\n    │   ├── ClipboardSnapshot.swift # NSPasteboard snapshot / restore\n    │   ├── NumberNormalizer.swift\n    │   ├── EmojiEnhancer.swift   # auto-emoji\n    │   ├── Tokenizer.swift       # Unicode word/non-word tokens\n    │   ├── DiffEngine.swift      # token-level LCS diff\n    │   ├── CorrectionApplier.swift # apply dictionary (exact + fuzzy)\n    │   └── AppController.swift   # orchestrator\n    └── Views/\n        ├── SettingsView.swift    # settings + HelpHint (`?` tooltips)\n        ├── ResultHUD.swift       # post-recognition HUD\n        ├── EditAndLearnWindow.swift\n        ├── MenuBarContent.swift\n        ├── HistoryView.swift\n        ├── DictionaryView.swift\n        ├── OnboardingView.swift\n        └── WindowOpener.swift\n```\n\n## Known limitations\n\n- In apps with empty / incomplete AX trees (Bitrix24 as a CEF app without AX, Max on Qt) **auto-learn is unavailable** — there's nothing to poll. Paste via ⌘V still works, and the HUD shows an Edit \u0026 Learn button for manual corrections.\n- On external USB keyboards, `Fn` sometimes doesn't generate a modifier event — switch to right `⌥ Option` or `Caps Lock` in settings.\n- When macOS system dictation is active, its overlay intercepts `Fn` — disable it (see onboarding).\n\n## Stack\n\n- **Swift 6** / **SwiftUI** / **AppKit** (MenuBarExtra, NSPanel, AXUIElement)\n- [**WhisperKit**](https://github.com/argmaxinc/WhisperKit) (CoreML + ANE)\n- [**GRDB**](https://github.com/groue/GRDB.swift) (SQLite wrapper)\n\n## Contributing\n\nIssues and PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md). Code style: Swift API Design Guidelines.\n\n## License\n\nMIT — see [LICENSE](LICENSE). 
WhisperKit and GRDB are also MIT.\n\n---\n\nBuilt at [VectroLab](https://vectrolab.ru) · Yekaterinburg\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsergekruf%2Fvoicevoice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsergekruf%2Fvoicevoice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsergekruf%2Fvoicevoice/lists"}