https://github.com/jacopone/whisper-dictation

🎤 Privacy-first local speech-to-text dictation for NixOS - Whisper.cpp powered push-to-talk with real-time feedback
dictation gnome nixos privacy speech-to-text voice-input wayland whisper

Host: GitHub
URL: https://github.com/jacopone/whisper-dictation
Owner: jacopone
License: mit
Created: 2025-10-01T20:36:33.000Z
Default Branch: master
Last Pushed: 2026-02-15T23:35:26.000Z
Last Synced: 2026-02-16T04:27:37.244Z
Topics: dictation, gnome, nixos, privacy, speech-to-text, voice-input, wayland, whisper
Language: Python
Size: 64.5 KB
Stars: 6
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# Whisper Dictation

Privacy-first local speech-to-text for NixOS -- whisper.cpp powered, push-to-talk, paste anywhere.

## Features

- **100% local and private** -- no cloud, no telemetry, works fully offline

- **Push-to-talk** -- hold Super+Period, speak, release to paste text

- **Real-time feedback** -- floating GTK4 window shows transcription status

- **Multilingual** -- supports 99 languages with auto-detection

- **Wayland native** -- built for GNOME on Wayland, works in any application

- **Optimized for technical speech** -- tuned for developer and AI workflows

## Requirements

- NixOS or any Linux distribution with Nix

- Wayland compositor (GNOME recommended)

- PulseAudio or PipeWire

- User must be in the `input` group for keyboard monitoring

## Installation

### NixOS (recommended)

Add to your `flake.nix`:

```nix
{
  inputs.whisper-dictation.url = "github:jacopone/whisper-dictation";

  # In your configuration
  environment.systemPackages = [
    inputs.whisper-dictation.packages.${system}.default
  ];

  # Enable auto-start
  systemd.user.services.whisper-dictation = {
    enable = true;
    wantedBy = [ "graphical-session.target" ];
  };
}
```

### Manual

```bash
git clone https://github.com/jacopone/whisper-dictation.git
cd whisper-dictation
nix develop
python -m whisper_dictation.daemon
```

**First-time setup:** ensure your user is in the `input` group (`sudo usermod -aG input $USER`, then log out and back in), download a Whisper model to `~/.local/share/whisper-models/`, and start the `ydotoold` daemon. See the [first-time setup section in DEVELOPMENT.md](DEVELOPMENT.md) for details.
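
The prerequisites above can be checked with a short script before starting the daemon. This is a minimal sketch, not part of the project: the `ggml-<model>.bin` file name follows the usual whisper.cpp naming and is an assumption here, and the helper names are hypothetical. It does not check whether `ydotoold` is running.

```python
import grp
import os
from pathlib import Path

# Model directory named in the setup instructions above.
MODEL_DIR = Path.home() / ".local" / "share" / "whisper-models"


def model_path(model: str, model_dir: Path = MODEL_DIR) -> Path:
    """Where a model file is expected (ggml-<model>.bin naming is an assumption)."""
    return model_dir / f"ggml-{model}.bin"


def in_input_group() -> bool:
    """True if the current process belongs to the `input` group."""
    try:
        gid = grp.getgrnam("input").gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    # os.getgroups() reflects the running session, so a fresh `usermod`
    # only shows up after logging out and back in.
    return gid in os.getgroups() or gid == os.getegid()


def setup_problems(model: str = "base") -> list[str]:
    """Collect human-readable descriptions of missing prerequisites."""
    problems = []
    if not in_input_group():
        problems.append("user is not in the 'input' group")
    if not model_path(model).is_file():
        problems.append(f"model file missing: {model_path(model)}")
    return problems


if __name__ == "__main__":
    for problem in setup_problems():
        print("setup issue:", problem)
```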

## Usage

Start the daemon and dictate:

```bash
run-daemon          # use config file settings
run-daemon-en       # English only (fastest)
run-daemon-it       # Italian only
run-daemon-auto     # auto-detect language (adds ~1-2s)
```

Then in any application:

1. Click in a text field

2. Hold **Super+Period**

3. Speak naturally

4. Release the key -- the text is pasted as soon as transcription finishes
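
The hold-and-release behaviour in these steps can be modelled as a small state machine over key events. The sketch below is illustrative and not taken from the project's source: the `PushToTalk` class is hypothetical, and stopping as soon as either key of the chord is released is an assumption. The key codes and event values are the standard Linux input constants that `evdev` reports.

```python
from dataclasses import dataclass, field

# Key codes from <linux/input-event-codes.h>
KEY_DOT = 52
KEY_LEFTMETA = 125
KEY_RIGHTMETA = 126

# Key event values as reported by evdev
KEY_UP, KEY_DOWN, KEY_REPEAT = 0, 1, 2


@dataclass
class PushToTalk:
    """Tracks held keys and reports when recording should start or stop."""

    held: set = field(default_factory=set)
    recording: bool = False

    def feed(self, code: int, value: int):
        """Process one key event; return "start", "stop", or None."""
        if value == KEY_DOWN:
            self.held.add(code)
        elif value == KEY_UP:
            self.held.discard(code)
        # KEY_REPEAT leaves the set of held keys unchanged.

        super_held = bool(self.held & {KEY_LEFTMETA, KEY_RIGHTMETA})
        chord = super_held and KEY_DOT in self.held

        if chord and not self.recording:
            self.recording = True
            return "start"
        if self.recording and not chord:
            self.recording = False
            return "stop"
        return None


ptt = PushToTalk()
events = [
    (KEY_LEFTMETA, KEY_DOWN),
    (KEY_DOT, KEY_DOWN),
    (KEY_DOT, KEY_REPEAT),
    (KEY_DOT, KEY_UP),
    (KEY_LEFTMETA, KEY_UP),
]
actions = [ptt.feed(code, value) for code, value in events]
print(actions)
```

Auto-repeat events arrive continuously while a key is held, which is why the state machine ignores them instead of treating each one as a new press.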

Override settings per-session with command-line flags:

```bash
python -m whisper_dictation.daemon --verbose --language auto --model base
```
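
For readers curious how such flags are typically handled, here is a minimal `argparse` sketch that accepts the three flags shown above. It is not the daemon's actual parser: the defaults, help strings, and the `overrides_from_args` helper are assumptions made for illustration.

```python
import argparse

# Model names listed in the Configuration section.
MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="whisper_dictation.daemon")
    parser.add_argument("--verbose", action="store_true",
                        help="print extra diagnostic output")
    parser.add_argument("--language", default=None,
                        help="language code such as en, it, or auto")
    parser.add_argument("--model", choices=MODELS, default=None,
                        help="whisper model size")
    return parser


def overrides_from_args(args: argparse.Namespace) -> dict:
    """Return only the settings that were given on the command line."""
    whisper = {}
    if args.language is not None:
        whisper["language"] = args.language
    if args.model is not None:
        whisper["model"] = args.model
    return {"whisper": whisper} if whisper else {}


args = build_parser().parse_args(
    ["--verbose", "--language", "auto", "--model", "base"]
)
print(args.verbose, overrides_from_args(args))
```

Leaving the defaults as `None` makes it possible to tell "flag not given" apart from "flag set to the default value", so values from the config file are only replaced when a flag is actually passed.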

## Configuration

Edit `~/.config/whisper-dictation/config.yaml`. Key settings:

- `whisper.model` -- model size: `tiny`, `base` (recommended), `small`, `medium`, `large`

- `whisper.language` -- language code (`en`, `it`, `auto`, etc.)

- `hotkey.key` / `hotkey.modifiers` -- push-to-talk keybinding

See `config.yaml` in the repository for all available options.
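
The dotted names above suggest a nested layout with `whisper` and `hotkey` sections. The following sketch shows one way user settings could be layered over defaults and validated. It is an illustration under stated assumptions: the default language, the spelling of the hotkey values, and the `merge` and `validate` helpers are hypothetical, and the authoritative schema is the `config.yaml` in the repository.

```python
from copy import deepcopy

# Assumed defaults: `base` is the recommended model and Super+Period the
# documented hotkey; the exact value spellings are guesses.
DEFAULTS = {
    "whisper": {"model": "base", "language": "en"},
    "hotkey": {"key": "period", "modifiers": ["super"]},
}

VALID_MODELS = {"tiny", "base", "small", "medium", "large"}


def merge(base: dict, override: dict) -> dict:
    """Recursively layer `override` on top of `base` without mutating either."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def validate(config: dict) -> dict:
    """Reject model names that are not in the documented list."""
    model = config["whisper"]["model"]
    if model not in VALID_MODELS:
        raise ValueError(
            f"unknown whisper.model {model!r}; expected one of {sorted(VALID_MODELS)}"
        )
    return config


# What a parsed config.yaml that only sets the language might look like.
user_config = {"whisper": {"language": "it"}}
config = validate(merge(DEFAULTS, user_config))
print(config["whisper"])  # {'model': 'base', 'language': 'it'}
```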

### Model selection guide

| Model  | Size   | Speed    | Accuracy | Use Case                  |
|--------|--------|----------|----------|---------------------------|
| tiny   | 75 MB  | ~1-2s    | 60%      | Quick notes, testing      |
| base   | 142 MB | ~4-6s    | 70%      | Recommended for speed     |
| small  | 466 MB | ~10-15s  | 80%      | Balanced performance      |
| medium | 1.5 GB | ~20-30s  | 85%      | High accuracy             |
| large  | 2.9 GB | ~40-60s  | 90%      | Maximum accuracy          |

Times measured on CPU (4 threads). GPU acceleration can reduce times by 5-10x.

## How It Works

1. **Keyboard monitoring** -- `evdev` captures low-level key events

2. **Audio recording** -- `ffmpeg` records microphone input while the key is held

3. **Transcription** -- `whisper.cpp` processes audio locally on your machine

4. **Text insertion** -- `ydotool` pastes transcribed text into the active window

5. **UI feedback** -- GTK4 floating window shows real-time status
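
Steps 2 to 4 each wrap an external command-line tool. The sketch below shows how those commands might be assembled in Python. It is not the project's implementation: the `whisper-cli` binary name, the PulseAudio `default` source, the use of `ydotool type` instead of a clipboard paste, and all function names are assumptions. The flags themselves are standard `ffmpeg` and whisper.cpp options.

```python
import subprocess


def ffmpeg_record_cmd(wav_path, source="default"):
    """Record mono 16 kHz 16-bit PCM, the input format whisper.cpp expects."""
    return ["ffmpeg", "-y", "-f", "pulse", "-i", source,
            "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", str(wav_path)]


def whisper_cmd(model_file, wav_path, language="en", threads=4,
                binary="whisper-cli"):
    """Transcribe a WAV file; -nt suppresses timestamps in the output."""
    return [binary, "-m", str(model_file), "-f", str(wav_path),
            "-l", language, "-t", str(threads), "-nt"]


def ydotool_type_cmd(text):
    """Type text into the focused window (requires a running ydotoold)."""
    return ["ydotool", "type", text]


def clean_transcript(stdout: str) -> str:
    """Collapse line breaks and runs of whitespace into single spaces."""
    return " ".join(stdout.split())


def transcribe(model_file, wav_path, language="en") -> str:
    """Run whisper.cpp on a finished recording and return the cleaned text."""
    result = subprocess.run(
        whisper_cmd(model_file, wav_path, language),
        capture_output=True, text=True, check=True,
    )
    return clean_transcript(result.stdout)


print(ffmpeg_record_cmd("/tmp/dictation.wav"))
print(clean_transcript("  hello\n world \n"))
```

In a push-to-talk flow, the recording process would be started with `subprocess.Popen` when the key goes down and stopped when it is released, after which `transcribe` and the `ydotool` command run in sequence.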

## Comparison

| Feature           | Whisper Dictation | Aqua Voice  | Talon Voice  |
|-------------------|-------------------|-------------|--------------|
| Privacy           | Local             | Cloud       | Local        |
| Cost              | Free              | $8/mo       | $15/mo       |
| NixOS support     | Native            | No          | Manual       |
| Technical terms   | 65-85%            | 97%         | 95%          |
| Wayland           | Yes               | Limited     | X11 only     |
| Real-time         | Yes               | Yes         | Yes          |

## Development

See [DEVELOPMENT.md](DEVELOPMENT.md) for the full development guide.

## Troubleshooting

See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for solutions to common issues (audio, keyboard detection, ydotool, hotkeys, performance).

## Contributing

Contributions welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT License -- see [LICENSE](LICENSE).
