https://github.com/sergekruf/voicevoice
Local Whisper-based voice dictation for macOS. Hold Fn, talk, release β text appears in any field. ANE inference, no cloud, adaptive dictionary.
https://github.com/sergekruf/voicevoice
accessibility apple-silicon dictation local-first macos open-source privacy speech-to-text swift swiftui whisper whisperkit
Last synced: about 1 month ago
JSON representation
Local Whisper-based voice dictation for macOS. Hold Fn, talk, release β text appears in any field. ANE inference, no cloud, adaptive dictionary.
- Host: GitHub
- URL: https://github.com/sergekruf/voicevoice
- Owner: sergekruf
- License: mit
- Created: 2026-05-21T13:25:26.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-22T07:35:39.000Z (about 1 month ago)
- Last Synced: 2026-05-22T14:31:02.970Z (about 1 month ago)
- Topics: accessibility, apple-silicon, dictation, local-first, macos, open-source, privacy, speech-to-text, swift, swiftui, whisper, whisperkit
- Language: Swift
- Homepage: https://voicevoice.vectrolab.ru
- Size: 484 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.en.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# VoiceVoice
[](LICENSE)
[](https://github.com/sergekruf/voicevoice/releases/latest)
[](https://www.apple.com/macos/)
[](#requirements)
[](https://github.com/sergekruf/voicevoice/releases)
π·πΊ [Π§ΠΈΡΠ°ΡΡ Π½Π° ΡΡΡΡΠΊΠΎΠΌ](README.md)
**Voice dictation for macOS with local Whisper.** Hold `Fn`, talk, release β the text appears in any active input field. Recognition runs entirely on your machine via the Apple Neural Engine β not a single phrase leaves your computer.
Landing: [voicevoice.vectrolab.ru](https://voicevoice.vectrolab.ru) Β· Pre-built `.dmg` available
## Features
- **Hotkey-driven dictation** β `Fn` (default), right `β₯ Option`, or `Caps Lock`. Hold β talk β release β text in your field.
- **Local Whisper** (`large-v3-turbo`, 4-bit quantized, ~632 MB) via [WhisperKit](https://github.com/argmaxinc/WhisperKit). Inference on the Apple Neural Engine, ~10Γ faster than real-time on M4.
- **Adaptive dictionary** β for ~5 minutes after a successful paste, VoiceVoice watches the focused field. If you correct the recognized text, it remembers `wrong β right` pairs and auto-applies them on subsequent dictations.
- **Fuzzy matching** with configurable threshold β a `ΠΊΠ»ΠΎΠ΄ ΠΊΠΎΠ΄ β Claude Code` rule also fires on `ΠΊΠ»ΠΎΡ ΠΊΠΎΡ`, `ΠΊΠ»ΠΎΡΠ΄ ΠΊΠΎΠ΄`, etc.
- **Edit & Learn** for apps where Accessibility can't read field contents (Bitrix24, Max, Slack, Termiusβ¦) β one-click manual correction from the HUD.
- **Three-tier paste**: CGEvent βV β AppleScript β AXUIElement direct write. Text reaches anywhere β Notes, Safari, Telegram, Termius, Slack, VS Code, Cursor, Claude Desktop, Max, Bitrix24β¦
- **TransientType marker** for clipboard managers (Maccy / Paste / PasteNow / Raycast) β our temporary clipboard writes don't pollute your history.
- **Number normalization** β `Β«ΠΎΠ΄ΠΈΠ½ ΠΌΠΈΠ»Π»ΠΈΠΎΠ½ ΡΠ΅ΡΡΡΠ΅ΡΡΠ° Π΄Π²Π°Π΄ΡΠ°ΡΡ ΠΏΡΡΡΒ»` β `1 425 689`, extra spaces and periods stripped.
- **Auto-emoji** (optional) β appends one contextual emoji on trigger words: Β«ΡΠΏΠ°ΡΠΈΠ±ΠΎΒ» β π, Β«ΠΏΠΎΠ·Π΄ΡΠ°Π²Π»ΡΡΒ» β π, Β«Ρ
Π°Ρ
Π°Β» β π, etc.
- **Result HUD** + history of last 200 transcriptions + searchable dictionary.
- **Quiet mode** β hide all popups / toasts while keeping the recording indicator visible. Great for screencasts.
- **Privacy-by-default** β zero telemetry, zero cloud, sandbox-compatible, ad-hoc signed with a stable identity (TCC permissions survive rebuilds).
## Requirements
- macOS **13 Ventura** or newer (14+ recommended)
- Apple Silicon (M1 / M2 / M3 / M4 / M5) β on Intel Macs Whisper falls back to CPU and runs 5β10Γ slower, making interactive dictation impractical
- Xcode 15+ (only if building from source)
- Microphone + Accessibility permissions (requested on first launch)
## Installation
### Pre-built .dmg
Easiest path β download from the landing: [voicevoice.vectrolab.ru](https://voicevoice.vectrolab.ru) or [latest GitHub release](https://github.com/sergekruf/voicevoice/releases/latest).
### Build from source
```bash
git clone https://github.com/sergekruf/voicevoice.git
cd voicevoice
./setup-signing.sh # one-time: creates a stable self-signed identity so TCC permissions persist across rebuilds
./build-app.sh # builds the SwiftPM target β .app bundle β signs
open build/VoiceVoice.app
```
Or via Xcode: `open Package.swift`, wait for WhisperKit + GRDB resolution, hit βΆοΈ Run.
## First launch
1. Onboarding window appears. Grant:
- **Microphone** β click "Request access".
- **Accessibility** β needed to globally hear `Fn` and emulate `βV`. macOS opens System Settings β Privacy & Security β Accessibility; manually toggle VoiceVoice on.
2. **Disable system dictation:** System Settings β Keyboard β Dictation β off. Otherwise macOS's overlay intercepts `Fn` on top of ours.
3. On first launch WhisperKit downloads the `large-v3-turbo` model (~632 MB) to `~/Library/Application Support/VoiceVoice/models/`. Progress shows in the menu bar.
## Usage
1. Put the cursor in any text field.
2. **Hold Fn** β the "Recordingβ¦" indicator appears.
3. Speak. You can dictate punctuation explicitly (Β«Π·Π°ΠΏΡΡΠ°ΡΒ», Β«ΡΠΎΡΠΊΠ°Β», Β«Π²ΠΎΠΏΡΠΎΡΠΈΡΠ΅Π»ΡΠ½ΡΠΉ Π·Π½Π°ΠΊΒ») β Whisper places them reasonably well on its own.
4. **Release Fn** β after ~0.5β1 s (on M4) the text appears in the field.
5. If something was misrecognized β the auto-dictionary picks up your manual fix if you correct it within 5 minutes. For apps without AX support β click "Edit & Learn" in the HUD.
## Where data lives
```
~/Library/Application Support/VoiceVoice/
βββ data.db # SQLite (GRDB): dictionary + history
βββ models/ # WhisperKit CoreML models
```
Wipe everything:
```bash
rm -rf "$HOME/Library/Application Support/VoiceVoice"
```
## Project layout
```
voicevoice/
βββ Package.swift # SwiftPM manifest (WhisperKit, GRDB)
βββ build-app.sh # build .app bundle from CLI
βββ make-dmg.sh # build installer .dmg
βββ setup-signing.sh # create self-signed identity
βββ Sources/VoiceVoice/
βββ VoiceVoiceApp.swift # @main, MenuBarExtra
βββ Resources/ # Info.plist, entitlements
βββ Models/ # AppSettings, CorrectionEntry, TranscriptionRecord
βββ Storage/ # GRDB Database, CorrectionStore, HistoryStore
βββ Services/
β βββ AudioRecorder.swift # AVAudioEngine 16 kHz mono
β βββ Transcriber.swift # WhisperKit wrapper
β βββ HotkeyMonitor.swift # global CGEvent tap
β βββ TextInserter.swift # three-tier paste + TransientType marker
β βββ TextChangeWatcher.swift # auto-dictionary via AX polling + BFS
β βββ ClipboardSnapshot.swift # NSPasteboard snapshot / restore
β βββ NumberNormalizer.swift
β βββ EmojiEnhancer.swift # auto-emoji
β βββ Tokenizer.swift # Unicode word/non-word tokens
β βββ DiffEngine.swift # token-level LCS diff
β βββ CorrectionApplier.swift # apply dictionary (exact + fuzzy)
β βββ AppController.swift # orchestrator
βββ Views/
βββ SettingsView.swift # settings + HelpHint (`?` tooltips)
βββ ResultHUD.swift # post-recognition HUD
βββ EditAndLearnWindow.swift
βββ MenuBarContent.swift
βββ HistoryView.swift
βββ DictionaryView.swift
βββ OnboardingView.swift
βββ WindowOpener.swift
```
## Known limitations
- In apps with empty / incomplete AX trees (Bitrix24 as a CEF app without AX, Max on Qt) **auto-learn is unavailable** β there's nothing to poll. Paste via βV still works, and the HUD shows an Edit & Learn button for manual corrections.
- On external USB keyboards, `Fn` sometimes doesn't generate a modifier event β switch to right `β₯ Option` or `Caps Lock` in settings.
- When macOS system dictation is active, its overlay intercepts `Fn` β disable it (see onboarding).
## Stack
- **Swift 6** / **SwiftUI** / **AppKit** (MenuBarExtra, NSPanel, AXUIElement)
- [**WhisperKit**](https://github.com/argmaxinc/WhisperKit) (CoreML + ANE)
- [**GRDB**](https://github.com/groue/GRDB.swift) (SQLite wrapper)
## Contributing
Issues and PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md). Code style: Swift API Design Guidelines.
## License
MIT β see [LICENSE](LICENSE). WhisperKit and GRDB are also MIT.
---
Built at [VectroLab](https://vectrolab.ru) Β· Yekaterinburg