https://github.com/mwmdev/dictr
Push-to-talk voice dictation for Linux / X11
cli dictation dictation-tool linux local push-to-talk rust speech-to-text stt whisper x11
- Host: GitHub
- URL: https://github.com/mwmdev/dictr
- Owner: mwmdev
- License: apache-2.0
- Created: 2026-02-15T21:50:07.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-06-10T21:11:05.000Z (about 19 hours ago)
- Last Synced: 2026-06-10T22:09:00.472Z (about 18 hours ago)
- Topics: cli, dictation, dictation-tool, linux, local, push-to-talk, rust, speech-to-text, stt, whisper, x11
- Language: Rust
- Homepage:
- Size: 153 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE-APACHE
# dictr
Push-to-talk voice dictation for Linux.
Single binary - Private - Fast - Customizable
## Features
- **Push-to-talk** — hold a hotkey to record, release to transcribe and paste
- **Local inference** — runs [Whisper](https://github.com/ggerganov/whisper.cpp) locally, your audio never leaves your machine
- **CUDA GPU acceleration** — optional NVIDIA GPU support for sub-second transcription
- **OpenAI API fallback** — use the OpenAI Whisper API as an alternative backend
- **Text replacements** — custom post-processing rules for text replacement
- **File transcription** — transcribe audio files directly via `--file` (any format ffmpeg supports)
## Usage
```sh
dictr # Default: AltGr hotkey, local whisper, safe clipboard paste
dictr --hotkey F9 # Use F9 instead of AltGr
dictr --backend api # Use OpenAI Whisper API (requires OPENAI_API_KEY)
dictr --api-url http://... # Custom API endpoint
dictr --model /path/to/model # Specific model file
dictr --paste # Force clipboard paste output
dictr --type # Use xdotool typing instead of paste
dictr --device AT2020 # Select mic by name substring
dictr --list-devices # List available input devices
dictr --language fr # Transcribe in French
dictr --initial-prompt '...' # Guide transcription with context
dictr --min-duration 500 # Min recording duration in ms (default: 300)
dictr --file recording.ogg # Transcribe an audio file (requires ffmpeg)
dictr --verbose # Debug output
```
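To illustrate what a `--min-duration` threshold means in terms of audio samples, here is a minimal sketch. It assumes that recordings shorter than the threshold are skipped (for example, accidental key taps), which this README does not spell out. The function names and the 16 kHz sample rate in the usage note are hypothetical and not taken from dictr's source.

```rust
/// Duration in milliseconds of a mono recording.
/// `sample_rate_hz` must be non-zero.
fn duration_ms(num_samples: usize, sample_rate_hz: u32) -> u64 {
    (num_samples as u64 * 1000) / sample_rate_hz as u64
}

/// True if the recording lasts at least `min_duration_ms`.
/// Assumption: shorter recordings would be dropped before transcription.
fn long_enough(num_samples: usize, sample_rate_hz: u32, min_duration_ms: u64) -> bool {
    duration_ms(num_samples, sample_rate_hz) >= min_duration_ms
}
```

With a 16 kHz mono recording, `long_enough(4800, 16_000, 300)` holds because 4800 samples last exactly 300 ms.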
## Install
### Interactive installer
```sh
curl -fsSL https://raw.githubusercontent.com/mwmdev/dictr/main/install.sh | sh
```
### Cargo
```sh
cargo install dictr
```
Then download a [Whisper model](https://huggingface.co/ggerganov/whisper.cpp/tree/main) to `~/.local/share/dictr/models/`.
### NixOS
From a source checkout:
```sh
nix profile add .#dictr-cpu # CPU build
# Print the GPU's compute capability (for example, 12.0) to pick the matching CUDA package
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
nix profile add .#dictr-cuda-sm120 # CUDA build for compute capability 12.0
```
CUDA packages are named `dictr-cuda-smXX`, where `XX` is the compute capability
without the dot. For example, compute capability `8.9` uses `dictr-cuda-sm89`.
Run `nix flake show` to list supported CUDA targets. The CUDA package keeps the
CUDA runtime libraries in the Nix closure, so it does not need `LD_LIBRARY_PATH`
wrappers.
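The naming rule above can be sketched as a small helper that turns the `nvidia-smi` output into a package name. This is an illustration only; `cuda_package_name` is a hypothetical function, not part of dictr, and a returned name is only usable if `nix flake show` lists it.

```rust
/// Map a compute capability such as "8.9" to a package name such as
/// "dictr-cuda-sm89". Returns None unless the input looks like "<digits>.<digits>".
fn cuda_package_name(compute_cap: &str) -> Option<String> {
    // nvidia-smi output ends with a newline, so trim surrounding whitespace first.
    let (major, minor) = compute_cap.trim().split_once('.')?;
    let all_digits = |s: &str| !s.is_empty() && s.chars().all(|c| c.is_ascii_digit());
    if all_digits(major) && all_digits(minor) {
        Some(format!("dictr-cuda-sm{}{}", major, minor))
    } else {
        None
    }
}
```

For example, `cuda_package_name("12.0\n")` yields `Some("dictr-cuda-sm120")`, matching the package used in the commands above.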
### Build from source
Requires Linux with X11.
- Runtime: `xdotool`, `xclip`, and ALSA or PipeWire
- Optional: `ffmpeg` (for `--file`)
- Build dependencies: `cmake`, `clang`, `pkg-config`, `libasound2-dev`, `libx11-dev`, `libxi-dev`, `libxtst-dev`, `libxrandr-dev`, `libssl-dev`
- CUDA builds: NVIDIA CUDA toolkit
```sh
cargo build --release # CPU only
cargo build --release --features cuda # With GPU
```
On NixOS, use `nix-shell --run "cargo build --release"` for CPU builds, or
`nix-shell --argstr cudaArch 120 --run "cargo build --release --features cuda"`
for CUDA builds.
## Configuration
`~/.config/dictr/config.toml`:
```toml
hotkey = "AltGr" # Supported hotkeys: AltGr, Alt, Ctrl, RCtrl, Shift, RShift, Super, CapsLock, Space, Escape, F1-F12
backend = "local" # "local" or "api"
model_path = "~/.local/share/dictr/models/ggml-base.bin"
api_key = "" # or set OPENAI_API_KEY env var
api_url = "https://api.openai.com/v1/audio/transcriptions"
output_mode = "paste" # "paste" or "type"; paste is layout-safe
typing_delay_ms = 2
min_duration_ms = 300
device = "AT2020USB+"
language = "en"
initial_prompt = "commit, readme, build, test, deploy, refactor" # Guide transcription with context (e.g. expected words, domain-specific terms)
[replacements]
"slash " = "/"
"new line" = "\n"
```
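The list of supported hotkeys in the comment above can be read as the following check. This is a sketch for validating a config value by hand. `is_supported_hotkey` is a hypothetical helper, and exact, case-sensitive matching is an assumption, since the README does not say how dictr parses hotkey names.

```rust
/// True if `name` is one of the hotkey names listed in the config comment.
/// Assumption: names must match exactly, including case.
fn is_supported_hotkey(name: &str) -> bool {
    const NAMED: &[&str] = &[
        "AltGr", "Alt", "Ctrl", "RCtrl", "Shift", "RShift",
        "Super", "CapsLock", "Space", "Escape",
    ];
    if NAMED.contains(&name) {
        return true;
    }
    // Function keys: "F" followed by a plain number from 1 to 12.
    match name.strip_prefix('F') {
        Some(digits) => {
            let plain = !digits.is_empty()
                && !digits.starts_with('0')
                && digits.chars().all(|c| c.is_ascii_digit());
            plain && digits.parse::<u8>().map_or(false, |n| (1..=12).contains(&n))
        }
        None => false,
    }
}
```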
### Text replacements
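The `[replacements]` table defines substitutions that are applied to the transcription output: each key found in the transcribed text is replaced with its value. This is useful for dictating symbols and formatting, such as "slash " → "/" or "new line" → "\n". In the example above, the `"slash "` key includes a trailing space, so that space is removed along with the word.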
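As a rough model of how such rules behave, the sketch below applies each rule as a literal, case-sensitive substring replacement, in the order given. The matching details and rule order are assumptions, and `apply_replacements` is a hypothetical name; dictr's actual implementation may differ.

```rust
/// Apply each (key, value) rule in order as a literal substring replacement.
/// Assumption: matching is case-sensitive and rules run in the listed order.
fn apply_replacements(text: &str, rules: &[(&str, &str)]) -> String {
    rules
        .iter()
        .fold(text.to_string(), |acc, &(from, to)| acc.replace(from, to))
}
```

With the two rules from the config example, `apply_replacements("open slash etc", &[("slash ", "/"), ("new line", "\n")])` produces `"open /etc"`.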
### Text insertion
The default `output_mode = "paste"` inserts text through the clipboard and then
restores the previous clipboard contents. This avoids keyboard layout issues
with simulated typing, such as QWERTY/AZERTY `a`/`q` and `w`/`z` swaps. Use
`--type` or `output_mode = "type"` only if you specifically need the older
`xdotool type` behavior.
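The save, paste, restore sequence described above can be sketched against a small clipboard abstraction. The `Clipboard` trait and `paste_and_restore` function are hypothetical and only model the order of operations. On X11, an implementation could shell out to `xclip -selection clipboard` and `xdotool key ctrl+v`, but whether dictr does exactly that is an assumption.

```rust
/// Minimal clipboard interface (hypothetical, for illustration only).
trait Clipboard {
    /// Current clipboard text, or None if the clipboard holds no text.
    fn read(&mut self) -> Option<String>;
    /// Replace the clipboard contents with `text`.
    fn write(&mut self, text: &str);
    /// Trigger a paste in the focused window (for example, Ctrl+V).
    fn send_paste(&mut self);
}

/// Insert `text` through the clipboard, then restore what was there before.
fn paste_and_restore<C: Clipboard>(clipboard: &mut C, text: &str) {
    let previous = clipboard.read(); // 1. remember the current contents
    clipboard.write(text);           // 2. put the transcription on the clipboard
    clipboard.send_paste();          // 3. paste it into the focused window
    if let Some(saved) = previous {
        clipboard.write(&saved);     // 4. restore the previous contents, if any
    }
}
```

Because the text travels as clipboard data rather than as simulated key presses, the active keyboard layout does not affect the result. A real implementation would likely need a short delay before step 4, so that the target application has read the clipboard before it is restored.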
## License
Licensed under either of [MIT](LICENSE-MIT) or [Apache-2.0](LICENSE-APACHE) at your option.