https://github.com/jacopone/whisper-dictation
🎤 Privacy-first local speech-to-text dictation for NixOS - Whisper.cpp powered push-to-talk with real-time feedback
- Host: GitHub
- URL: https://github.com/jacopone/whisper-dictation
- Owner: jacopone
- License: mit
- Created: 2025-10-01T20:36:33.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2026-02-15T23:35:26.000Z (5 months ago)
- Last Synced: 2026-02-16T04:27:37.244Z (4 months ago)
- Topics: dictation, gnome, nixos, privacy, speech-to-text, voice-input, wayland, whisper
- Language: Python
- Size: 64.5 KB
- Stars: 6
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Whisper Dictation
Privacy-first local speech-to-text for NixOS -- whisper.cpp powered, push-to-talk, paste anywhere.
## Features
- **100% local and private** -- no cloud, no telemetry, works fully offline
- **Push-to-talk** -- hold Super+Period, speak, release to paste text
- **Real-time feedback** -- floating GTK4 window shows transcription status
- **Multilingual** -- supports 99 languages with auto-detection
- **Wayland native** -- built for GNOME on Wayland, works in any application
- **Optimized for technical speech** -- tuned for developer and AI workflows
## Requirements
- NixOS or any Linux distribution with Nix
- Wayland compositor (GNOME recommended)
- PulseAudio or PipeWire
- User must be in the `input` group for keyboard monitoring
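The `input` group requirement can be checked before starting the daemon. The following is a minimal sketch using only the Python standard library; the helper name `user_in_group` is hypothetical and not part of this project.

```python
import grp
import os
import pwd


def user_in_group(username, group_name="input"):
    """Return True if `username` belongs to `group_name` (primary or supplementary)."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False  # the group does not exist on this system
    if username in group.gr_mem:
        return True
    try:
        # The group may also be the user's primary group.
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False  # unknown user


if __name__ == "__main__":
    me = pwd.getpwuid(os.getuid()).pw_name
    print(f"{me} in 'input' group: {user_in_group(me)}")
```

This reads the group database, so it can report membership before the change applies to the running session; a fresh login is still needed after being added to the group.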
## Installation
### NixOS (recommended)
Add to your `flake.nix`:
```nix
{
  # In flake.nix: add the input
  inputs.whisper-dictation.url = "github:jacopone/whisper-dictation";

  # In your NixOS configuration module (with `inputs` and `system` in scope)
  environment.systemPackages = [
    inputs.whisper-dictation.packages.${system}.default
  ];

  # Enable auto-start
  systemd.user.services.whisper-dictation = {
    enable = true;
    wantedBy = [ "graphical-session.target" ];
  };
}
```
### Manual
```bash
git clone https://github.com/jacopone/whisper-dictation.git
cd whisper-dictation
nix develop
python -m whisper_dictation.daemon
```
**First-time setup:** ensure your user is in the `input` group (`sudo usermod -aG input $USER`, then log out and back in), download a Whisper model to `~/.local/share/whisper-models/`, and start the `ydotoold` daemon. See the [first-time setup section in DEVELOPMENT.md](DEVELOPMENT.md) for details.
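As a quick check for the model step, the sketch below lists model files found in `~/.local/share/whisper-models/`. It assumes models are stored as `*.bin` files (the whisper.cpp naming convention); the function name `find_models` is hypothetical, and DEVELOPMENT.md remains the authority on the expected file names.

```python
from pathlib import Path

# Directory named in the first-time setup instructions.
MODEL_DIR = Path.home() / ".local" / "share" / "whisper-models"


def find_models(model_dir=MODEL_DIR):
    """Return the sorted names of *.bin files in model_dir (empty list if it is missing)."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return []
    # Assumption: model files use the .bin extension.
    return sorted(p.name for p in model_dir.glob("*.bin"))


if __name__ == "__main__":
    models = find_models()
    print("Models found:", models if models else "none, download one first")
```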
## Usage
Start the daemon and dictate:
```bash
run-daemon # use config file settings
run-daemon-en # English only (fastest)
run-daemon-it # Italian only
run-daemon-auto # auto-detect language (adds ~1-2s)
```
Then in any application:
1. Click in a text field
2. Hold **Super+Period**
3. Speak naturally
4. Release the key -- the text is pasted as soon as transcription finishes
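The hold-and-release behaviour in the steps above can be modelled as a small state machine. This is an illustrative sketch, not the daemon's actual code: the class `PushToTalk` is hypothetical, the key names follow the Linux input event naming (`KEY_LEFTMETA` for Super, `KEY_DOT` for Period), and the event values follow the evdev convention of 0 for release, 1 for press and 2 for auto-repeat.

```python
from dataclasses import dataclass, field

KEY_UP, KEY_DOWN, KEY_REPEAT = 0, 1, 2  # evdev EV_KEY value convention


@dataclass
class PushToTalk:
    modifier: str = "KEY_LEFTMETA"  # Super
    key: str = "KEY_DOT"            # Period
    held: set = field(default_factory=set)
    recording: bool = False

    def feed(self, key_name, value):
        """Process one key event; return "start", "stop", or None."""
        if value == KEY_DOWN:
            self.held.add(key_name)
        elif value == KEY_UP:
            self.held.discard(key_name)
        # KEY_REPEAT leaves the set of held keys unchanged.
        combo_down = {self.modifier, self.key} <= self.held
        if combo_down and not self.recording:
            self.recording = True
            return "start"   # begin capturing audio
        if not combo_down and self.recording:
            self.recording = False
            return "stop"    # stop capturing, then transcribe and paste
        return None
```

Feeding it Super down, Period down, Period up yields `None`, `"start"`, `"stop"`; auto-repeat events while the keys are held produce no transition.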
Override settings per-session with command-line flags:
```bash
python -m whisper_dictation.daemon --verbose --language auto --model base
```
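One way such per-session overrides can work is sketched below with `argparse`. The flag names come from the command above; the functions `parse_overrides` and `effective_settings` and the flat settings dictionary are assumptions made for illustration, not the daemon's real parser.

```python
import argparse


def parse_overrides(argv):
    """Parse the flags shown above and return only those the user passed."""
    parser = argparse.ArgumentParser(prog="whisper_dictation.daemon")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--language", default=None)
    parser.add_argument("--model", default=None)
    args = parser.parse_args(argv)
    # Unset options stay None (or False for --verbose) and are dropped,
    # so they cannot mask values from the config file.
    return {k: v for k, v in vars(args).items() if v not in (None, False)}


def effective_settings(config, overrides):
    """Command-line values win over config-file values."""
    return {**config, **overrides}


if __name__ == "__main__":
    file_settings = {"language": "en", "model": "small"}  # hypothetical config values
    flags = parse_overrides(["--verbose", "--language", "auto", "--model", "base"])
    print(effective_settings(file_settings, flags))
```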
## Configuration
Edit `~/.config/whisper-dictation/config.yaml`. Key settings:
- `whisper.model` -- model size: `tiny`, `base` (recommended), `small`, `medium`, `large`
- `whisper.language` -- language code (`en`, `it`, `auto`, etc.)
- `hotkey.key` / `hotkey.modifiers` -- push-to-talk keybinding
See `config.yaml` in the repository for all available options.
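The sketch below shows how the settings listed above might be merged with defaults and validated once the YAML file has been parsed into a dictionary. The default values, in particular the `hotkey` entries, are placeholders rather than the project's real defaults, and `merge` and `validate` are hypothetical helpers.

```python
import copy

VALID_MODELS = {"tiny", "base", "small", "medium", "large"}

# Placeholder defaults for illustration only; see config.yaml for real values.
DEFAULTS = {
    "whisper": {"model": "base", "language": "en"},
    "hotkey": {"key": "period", "modifiers": ["super"]},
}


def merge(defaults, user):
    """Overlay user settings on the defaults, one section at a time."""
    result = copy.deepcopy(defaults)
    for section, values in user.items():
        if isinstance(values, dict) and isinstance(result.get(section), dict):
            result[section].update(values)
        else:
            result[section] = values
    return result


def validate(config):
    """Raise ValueError if whisper.model is not a known model size."""
    model = config["whisper"]["model"]
    if model not in VALID_MODELS:
        raise ValueError(f"unknown whisper.model {model!r}; expected one of {sorted(VALID_MODELS)}")
    return config
```

For example, `merge(DEFAULTS, {"whisper": {"language": "it"}})` keeps the default model and switches only the language.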
### Model selection guide
| Model | Size | Speed | Accuracy | Use Case |
|--------|--------|----------|----------|---------------------------|
| tiny | 75 MB | ~1-2s | 60% | Quick notes, testing |
| base | 142 MB | ~4-6s | 70% | Recommended for speed |
| small | 466 MB | ~10-15s | 80% | Balanced performance |
| medium | 1.5 GB | ~20-30s | 85% | High accuracy |
| large | 2.9 GB | ~40-60s | 90% | Maximum accuracy |
Times measured on CPU (4 threads). GPU acceleration can reduce times by 5-10x.
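To make the trade-off concrete, the sketch below picks the most accurate model whose worst-case time from the table fits a latency budget. The numbers are copied from the table; the function `pick_model` and its `speedup` parameter (a rough stand-in for GPU acceleration) are illustrative assumptions.

```python
# (name, worst-case CPU seconds from the table, approximate accuracy in %)
MODELS = [
    ("tiny", 2, 60),
    ("base", 6, 70),
    ("small", 15, 80),
    ("medium", 30, 85),
    ("large", 60, 90),
]


def pick_model(max_seconds, speedup=1.0):
    """Return the most accurate model that fits the budget, or None if none does."""
    fitting = [m for m in MODELS if m[1] / speedup <= max_seconds]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m[2])[0]


if __name__ == "__main__":
    print(pick_model(6))             # CPU only, 6 second budget
    print(pick_model(6, speedup=5))  # same budget, assuming a 5x GPU speedup
```

With a 6 second budget this selects `base` on CPU and `medium` under an assumed 5x speedup.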
## How It Works
1. **Keyboard monitoring** -- `evdev` captures low-level key events
2. **Audio recording** -- `ffmpeg` records microphone input while the key is held
3. **Transcription** -- `whisper.cpp` processes audio locally on your machine
4. **Text insertion** -- `ydotool` pastes transcribed text into the active window
5. **UI feedback** -- GTK4 floating window shows real-time status
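The record, transcribe and insert steps can be sketched as three external commands. This is a rough illustration under several assumptions, not the daemon's actual invocation: the whisper.cpp binary is assumed to be `whisper-cli` (older builds name it `main`), audio is captured through the PulseAudio interface at 16 kHz mono (the WAV format whisper.cpp expects), and `ydotool type` stands in for the paste step, which the project may implement differently. All function names are hypothetical.

```python
import subprocess


def record_cmd(wav_path, source="default"):
    """ffmpeg command that records 16 kHz mono audio until it is interrupted."""
    return ["ffmpeg", "-y", "-f", "pulse", "-i", source,
            "-ar", "16000", "-ac", "1", wav_path]


def transcribe_cmd(wav_path, model_path, language="en", binary="whisper-cli"):
    """whisper.cpp command; -nt suppresses timestamps in the output."""
    return [binary, "-m", model_path, "-f", wav_path, "-l", language, "-nt"]


def insert_cmd(text):
    """ydotool command that types the text into the focused window."""
    return ["ydotool", "type", text]


def clean_transcript(raw):
    """Collapse the whitespace and line breaks in the transcription output."""
    return " ".join(raw.split())


def transcribe_and_insert(wav_path, model_path, language="en"):
    """Run steps 3 and 4 on an already recorded file (needs the tools installed)."""
    result = subprocess.run(transcribe_cmd(wav_path, model_path, language),
                            capture_output=True, text=True, check=True)
    text = clean_transcript(result.stdout)
    if text:
        subprocess.run(insert_cmd(text), check=True)
    return text
```

In a push-to-talk flow, the recording command would be started with `subprocess.Popen` when the hotkey goes down and terminated when it is released, after which `transcribe_and_insert` handles the rest.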
## Comparison
| Feature | Whisper Dictation | Aqua Voice | Talon Voice |
|-------------------|-------------------|-------------|--------------|
| Privacy | Local | Cloud | Local |
| Cost | Free | $8/mo | $15/mo |
| NixOS support | Native | No | Manual |
| Technical-term accuracy | 65-85% | 97% | 95% |
| Wayland | Yes | Limited | X11 only |
| Real-time | Yes | Yes | Yes |
## Development
See [DEVELOPMENT.md](DEVELOPMENT.md) for the full development guide.
## Troubleshooting
See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for solutions to common issues (audio, keyboard detection, ydotool, hotkeys, performance).
## Contributing
Contributions welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## License
MIT License -- see [LICENSE](LICENSE).