An open API service indexing awesome lists of open source software.

https://github.com/sergekruf/voicevoice

Local Whisper-based voice dictation for macOS. Hold Fn, talk, release β€” text appears in any field. ANE inference, no cloud, adaptive dictionary.
https://github.com/sergekruf/voicevoice

accessibility apple-silicon dictation local-first macos open-source privacy speech-to-text swift swiftui whisper whisperkit

Last synced: about 1 month ago
JSON representation

Local Whisper-based voice dictation for macOS. Hold Fn, talk, release β€” text appears in any field. ANE inference, no cloud, adaptive dictionary.

Awesome Lists containing this project

README

          

# VoiceVoice

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Release](https://img.shields.io/github/v/release/sergekruf/voicevoice)](https://github.com/sergekruf/voicevoice/releases/latest)
[![Platform: macOS 13+](https://img.shields.io/badge/macOS-13%2B-black?logo=apple)](https://www.apple.com/macos/)
[![Apple Silicon](https://img.shields.io/badge/Apple%20Silicon-required-orange?logo=apple)](#requirements)
[![Downloads](https://img.shields.io/github/downloads/sergekruf/voicevoice/total?label=downloads)](https://github.com/sergekruf/voicevoice/releases)

πŸ‡·πŸ‡Ί [Π§ΠΈΡ‚Π°Ρ‚ΡŒ Π½Π° русском](README.md)

**Voice dictation for macOS with local Whisper.** Hold `Fn`, talk, release β€” the text appears in any active input field. Recognition runs entirely on your machine via the Apple Neural Engine β€” not a single phrase leaves your computer.

Landing: [voicevoice.vectrolab.ru](https://voicevoice.vectrolab.ru) Β· Pre-built `.dmg` available

## Features

- **Hotkey-driven dictation** β€” `Fn` (default), right `βŒ₯ Option`, or `Caps Lock`. Hold β†’ talk β†’ release β†’ text in your field.
- **Local Whisper** (`large-v3-turbo`, 4-bit quantized, ~632 MB) via [WhisperKit](https://github.com/argmaxinc/WhisperKit). Inference on the Apple Neural Engine, ~10Γ— faster than real-time on M4.
- **Adaptive dictionary** β€” for ~5 minutes after a successful paste, VoiceVoice watches the focused field. If you correct the recognized text, it remembers `wrong β†’ right` pairs and auto-applies them on subsequent dictations.
- **Fuzzy matching** with configurable threshold β€” a `ΠΊΠ»ΠΎΠ΄ ΠΊΠΎΠ΄ β†’ Claude Code` rule also fires on `ΠΊΠ»ΠΎΡ‚ ΠΊΠΎΡ‚`, `ΠΊΠ»ΠΎΡƒΠ΄ ΠΊΠΎΠ΄`, etc.
- **Edit & Learn** for apps where Accessibility can't read field contents (Bitrix24, Max, Slack, Termius…) β€” one-click manual correction from the HUD.
- **Three-tier paste**: CGEvent ⌘V β†’ AppleScript β†’ AXUIElement direct write. Text reaches anywhere β€” Notes, Safari, Telegram, Termius, Slack, VS Code, Cursor, Claude Desktop, Max, Bitrix24…
- **TransientType marker** for clipboard managers (Maccy / Paste / PasteNow / Raycast) β€” our temporary clipboard writes don't pollute your history.
- **Number normalization** β€” `Β«ΠΎΠ΄ΠΈΠ½ ΠΌΠΈΠ»Π»ΠΈΠΎΠ½ чСтырСста Π΄Π²Π°Π΄Ρ†Π°Ρ‚ΡŒ ΠΏΡΡ‚ΡŒΒ»` β†’ `1 425 689`, extra spaces and periods stripped.
- **Auto-emoji** (optional) β€” appends one contextual emoji on trigger words: «спасибо» β†’ πŸ™, Β«ΠΏΠΎΠ·Π΄Ρ€Π°Π²Π»ΡΡŽΒ» β†’ πŸŽ‰, Β«Ρ…Π°Ρ…Π°Β» β†’ πŸ˜„, etc.
- **Result HUD** + history of last 200 transcriptions + searchable dictionary.
- **Quiet mode** β€” hide all popups / toasts while keeping the recording indicator visible. Great for screencasts.
- **Privacy-by-default** β€” zero telemetry, zero cloud, sandbox-compatible, ad-hoc signed with a stable identity (TCC permissions survive rebuilds).

## Requirements

- macOS **13 Ventura** or newer (14+ recommended)
- Apple Silicon (M1 / M2 / M3 / M4 / M5) β€” on Intel Macs Whisper falls back to CPU and runs 5–10Γ— slower, making interactive dictation impractical
- Xcode 15+ (only if building from source)
- Microphone + Accessibility permissions (requested on first launch)

## Installation

### Pre-built .dmg

Easiest path β€” download from the landing: [voicevoice.vectrolab.ru](https://voicevoice.vectrolab.ru) or [latest GitHub release](https://github.com/sergekruf/voicevoice/releases/latest).

### Build from source

```bash
git clone https://github.com/sergekruf/voicevoice.git
cd voicevoice
./setup-signing.sh # one-time: creates a stable self-signed identity so TCC permissions persist across rebuilds
./build-app.sh # builds the SwiftPM target β†’ .app bundle β†’ signs
open build/VoiceVoice.app
```

Or via Xcode: `open Package.swift`, wait for WhisperKit + GRDB resolution, hit β–ΆοΈŽ Run.

## First launch

1. Onboarding window appears. Grant:
- **Microphone** β€” click "Request access".
- **Accessibility** β€” needed to globally hear `Fn` and emulate `⌘V`. macOS opens System Settings β†’ Privacy & Security β†’ Accessibility; manually toggle VoiceVoice on.
2. **Disable system dictation:** System Settings β†’ Keyboard β†’ Dictation β†’ off. Otherwise macOS's overlay intercepts `Fn` on top of ours.
3. On first launch WhisperKit downloads the `large-v3-turbo` model (~632 MB) to `~/Library/Application Support/VoiceVoice/models/`. Progress shows in the menu bar.

## Usage

1. Put the cursor in any text field.
2. **Hold Fn** β†’ the "Recording…" indicator appears.
3. Speak. You can dictate punctuation explicitly («запятая», Β«Ρ‚ΠΎΡ‡ΠΊΠ°Β», Β«Π²ΠΎΠΏΡ€ΠΎΡΠΈΡ‚Π΅Π»ΡŒΠ½Ρ‹ΠΉ Π·Π½Π°ΠΊΒ») β€” Whisper places them reasonably well on its own.
4. **Release Fn** β†’ after ~0.5–1 s (on M4) the text appears in the field.
5. If something was misrecognized β€” the auto-dictionary picks up your manual fix if you correct it within 5 minutes. For apps without AX support β€” click "Edit & Learn" in the HUD.

## Where data lives

```
~/Library/Application Support/VoiceVoice/
β”œβ”€β”€ data.db # SQLite (GRDB): dictionary + history
└── models/ # WhisperKit CoreML models
```

Wipe everything:
```bash
rm -rf "$HOME/Library/Application Support/VoiceVoice"
```

## Project layout

```
voicevoice/
β”œβ”€β”€ Package.swift # SwiftPM manifest (WhisperKit, GRDB)
β”œβ”€β”€ build-app.sh # build .app bundle from CLI
β”œβ”€β”€ make-dmg.sh # build installer .dmg
β”œβ”€β”€ setup-signing.sh # create self-signed identity
└── Sources/VoiceVoice/
β”œβ”€β”€ VoiceVoiceApp.swift # @main, MenuBarExtra
β”œβ”€β”€ Resources/ # Info.plist, entitlements
β”œβ”€β”€ Models/ # AppSettings, CorrectionEntry, TranscriptionRecord
β”œβ”€β”€ Storage/ # GRDB Database, CorrectionStore, HistoryStore
β”œβ”€β”€ Services/
β”‚ β”œβ”€β”€ AudioRecorder.swift # AVAudioEngine 16 kHz mono
β”‚ β”œβ”€β”€ Transcriber.swift # WhisperKit wrapper
β”‚ β”œβ”€β”€ HotkeyMonitor.swift # global CGEvent tap
β”‚ β”œβ”€β”€ TextInserter.swift # three-tier paste + TransientType marker
β”‚ β”œβ”€β”€ TextChangeWatcher.swift # auto-dictionary via AX polling + BFS
β”‚ β”œβ”€β”€ ClipboardSnapshot.swift # NSPasteboard snapshot / restore
β”‚ β”œβ”€β”€ NumberNormalizer.swift
β”‚ β”œβ”€β”€ EmojiEnhancer.swift # auto-emoji
β”‚ β”œβ”€β”€ Tokenizer.swift # Unicode word/non-word tokens
β”‚ β”œβ”€β”€ DiffEngine.swift # token-level LCS diff
β”‚ β”œβ”€β”€ CorrectionApplier.swift # apply dictionary (exact + fuzzy)
β”‚ └── AppController.swift # orchestrator
└── Views/
β”œβ”€β”€ SettingsView.swift # settings + HelpHint (`?` tooltips)
β”œβ”€β”€ ResultHUD.swift # post-recognition HUD
β”œβ”€β”€ EditAndLearnWindow.swift
β”œβ”€β”€ MenuBarContent.swift
β”œβ”€β”€ HistoryView.swift
β”œβ”€β”€ DictionaryView.swift
β”œβ”€β”€ OnboardingView.swift
└── WindowOpener.swift
```

## Known limitations

- In apps with empty / incomplete AX trees (Bitrix24 as a CEF app without AX, Max on Qt) **auto-learn is unavailable** β€” there's nothing to poll. Paste via ⌘V still works, and the HUD shows an Edit & Learn button for manual corrections.
- On external USB keyboards, `Fn` sometimes doesn't generate a modifier event β€” switch to right `βŒ₯ Option` or `Caps Lock` in settings.
- When macOS system dictation is active, its overlay intercepts `Fn` β€” disable it (see onboarding).

## Stack

- **Swift 6** / **SwiftUI** / **AppKit** (MenuBarExtra, NSPanel, AXUIElement)
- [**WhisperKit**](https://github.com/argmaxinc/WhisperKit) (CoreML + ANE)
- [**GRDB**](https://github.com/groue/GRDB.swift) (SQLite wrapper)

## Contributing

Issues and PRs welcome. See [CONTRIBUTING.md](CONTRIBUTING.md). Code style: Swift API Design Guidelines.

## License

MIT β€” see [LICENSE](LICENSE). WhisperKit and GRDB are also MIT.

---

Built at [VectroLab](https://vectrolab.ru) Β· Yekaterinburg