https://github.com/mizrael/keryxis
Local, privacy-first speech-to-text tool that injects dictated text into any application.
ai local privacy rust speech-to-text whisper
- Host: GitHub
- URL: https://github.com/mizrael/keryxis
- Owner: mizrael
- Created: 2026-03-21T22:05:01.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-22T15:44:34.000Z (3 months ago)
- Last Synced: 2026-03-22T19:33:50.760Z (3 months ago)
- Topics: ai, local, privacy, rust, speech-to-text, whisper
- Language: Rust
- Homepage:
- Size: 252 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Keryxis
**The herald who proclaims your words** — a local, privacy-first speech-to-text tool that injects dictated text into any application.
Keryxis runs as a background daemon with a floating overlay, using OpenAI's Whisper model locally on your machine. No cloud APIs, no data leaves your computer.
## Features
- **Three activation modes:**
- **Press to talk** — press a hotkey to start/stop recording
- **Auto-stop** — press a hotkey, recording stops automatically when you go silent
- **Hands-free** — always listening for a configurable wake word (e.g., "hey terminal")
- **Works with any application** — injects text into whatever window is focused (terminal, editor, browser, etc.)
- **Multi-language** — supports 99 languages with priority-based detection (try English first, then Italian, etc.)
- **Background daemon** with floating overlay showing recording state and target application
- **Settings panel** — change mode, hotkey, wake word, model size, and languages from the overlay UI
- **Local & private** — Whisper runs entirely on your machine with Metal GPU acceleration on macOS
- **Cross-platform** — runs on macOS, Linux, and Windows with pre-built binaries for all three
- **Microphone selection** — choose from available input devices in the settings panel
## Requirements
- **macOS**, **Linux**, or **Windows**
- **Microphone** access
- **macOS Accessibility permission** — required for text injection (System Settings → Privacy & Security → Accessibility → add Keryxis or your terminal)
- ~75MB disk space for the Whisper tiny model (more for larger models)
## Installation
### Pre-built release
Download the latest release for your platform from the [Releases](https://github.com/mizrael/keryxis/releases) page:
| Platform | File | Notes |
|----------|------|-------|
| macOS (Apple Silicon) | `keryxis-macos-arm64.tar.gz` | M1/M2/M3/M4 Macs |
| macOS (Intel) | `keryxis-macos-x86_64.tar.gz` | Older Intel Macs |
| Linux (x86_64) | `keryxis-linux-x86_64.tar.gz` | |
| Windows (x86_64) | `keryxis-windows-x86_64.zip` | |
No Rust toolchain needed.
### Build from source
Requires the [Rust toolchain](https://rustup.rs/).
**macOS** (default features include Metal GPU acceleration + GUI):
```bash
cargo build --release
```
**Linux** (no Metal, GUI enabled):
```bash
cargo build --release --no-default-features --features gui
```
**Windows** (no Metal, GUI enabled):
```bash
cargo build --release --no-default-features --features gui
```
**Build without GUI** (headless/server, no overlay window):
```bash
cargo build --release --no-default-features --features metal # macOS
cargo build --release --no-default-features # Linux/Windows
```
**Build with CUDA** (Linux with NVIDIA GPU):
```bash
cargo build --release --no-default-features --features cuda,gui
```
## Setup
### 1. Platform-specific setup
**macOS:** Grant Accessibility permission: go to **System Settings → Privacy & Security → Accessibility** and add Keryxis or the terminal app you launch it from (Terminal.app, iTerm2, or VS Code).
**Windows:** No additional setup required. You may need to allow microphone access in Windows Settings → Privacy → Microphone.
**Linux:** Ensure your user has access to audio devices (usually automatic).
### 2. Start Keryxis
Simply run the binary with no arguments:
```bash
keryxis
```
This starts the background daemon and opens the floating overlay. On first launch, the Whisper model (`tiny`, ~75MB) is downloaded automatically. On Windows, the console window hides automatically.
You can also start the daemon explicitly:
```bash
keryxis daemon start
```
To pre-download a specific model size (optional):
```bash
keryxis download-model --size small
```
Available sizes: `tiny` (75MB, fastest), `base` (150MB), `small` (500MB), `medium` (1.5GB), `large` (3GB, most accurate).
## Usage
### Daemon commands
```bash
# Start daemon + overlay
keryxis daemon start
# Check status
keryxis daemon status
# Stop daemon + overlay
keryxis daemon stop
# Open overlay separately
keryxis overlay
```
### Foreground mode (no daemon)
```bash
# Run in foreground with default settings
keryxis start
# Run with specific mode
keryxis start --mode vad
keryxis start --mode wake-word
# Run with a different hotkey
keryxis start --hotkey "Ctrl+Shift+R"
```
### Configuration via CLI
```bash
# Show current config
keryxis config --show
# Change settings
keryxis config --mode vad
keryxis config --hotkey "Alt+R"
keryxis config --wake-word "hey computer"
keryxis config --model small
keryxis config --language it
```
## The Overlay
The floating overlay shows:
```
● RDY > Terminal [Press to talk] ≡ ⚙
```
- **Status indicator**: green (ready), red pulsing (recording), yellow (processing/disconnected), gray (paused — overlay focused)
- **Target app**: which application will receive the text
- **Mode label**: current activation mode
- **≡** — toggle live daemon log viewer
- **⚙** — open settings panel
### Settings panel
Click ⚙ to configure:
- **Mode** — Press to talk / Auto-stop / Hands-free
- **Hotkey** — click the field and press your desired key combination (e.g., Alt+Space, Cmd+R)
- **Wake word** — the phrase that activates hands-free mode
- **Microphone** — select from available input devices, or use system default (click ⟳ to refresh the list)
- **Model** — Whisper model size (Tiny through Large)
- **Languages** — ordered priority list; click `+ Language` to add, click a language to remove, `^` to reorder
Clicking Save applies your changes and restarts the daemon automatically.
## Configuration
Config file location:
- **macOS/Linux:** `~/.config/keryxis/config.toml`
- **Windows:** `%APPDATA%\keryxis\config.toml`
```toml
[activation]
mode = "toggle" # toggle (press to talk), vad (auto-stop), or wake_word (hands-free)
hotkey = "Alt+Space"
wake_word = "hey terminal"
[whisper]
model_size = "tiny" # tiny, base, small, medium, large
language = "" # single language override (legacy)
languages = ["en", "it"] # priority list — tried in order
[vad]
energy_threshold = 0.01 # speech detection sensitivity (0.0 - 1.0)
silence_duration_ms = 1500
min_speech_duration_ms = 500
[audio]
sample_rate = 16000
channels = 1
# device = "Headset Microphone (Jabra)" # optional — omit for system default
[daemon]
auto_start_overlay = true
[overlay]
position = "top-right" # top-right, top-left, bottom-right, bottom-left
opacity = 0.85 # overlay background opacity (0.0 - 1.0)
```
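To show how the three `[vad]` settings interact, here is a minimal sketch of an energy-based detector. It is not Keryxis's actual code: the `EnergyVad` type, the `VadEvent` enum, the `frame_ms` parameter, and the choice of RMS as the energy measure are all assumptions made for illustration.

```rust
/// Hypothetical energy-based voice activity detector (illustrative names only).
pub struct EnergyVad {
    energy_threshold: f32,
    silence_frames_needed: usize,
    min_speech_frames: usize,
    speech_frames: usize,
    silent_frames: usize,
}

#[derive(Debug, PartialEq)]
pub enum VadEvent {
    /// Nothing to report yet.
    None,
    /// Enough speech was followed by enough silence: the utterance is complete.
    UtteranceEnd,
}

/// Root-mean-square energy of a frame of samples in the range [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

impl EnergyVad {
    /// `frame_ms` is the duration of each frame later passed to `push`.
    pub fn new(
        energy_threshold: f32,
        silence_duration_ms: usize,
        min_speech_duration_ms: usize,
        frame_ms: usize,
    ) -> Self {
        let frame_ms = frame_ms.max(1); // avoid division by zero
        let frames = |ms: usize| (ms + frame_ms - 1) / frame_ms; // round up
        Self {
            energy_threshold,
            silence_frames_needed: frames(silence_duration_ms).max(1),
            min_speech_frames: frames(min_speech_duration_ms),
            speech_frames: 0,
            silent_frames: 0,
        }
    }

    /// Feeds one frame and reports whether an utterance just ended.
    pub fn push(&mut self, frame: &[f32]) -> VadEvent {
        if rms(frame) >= self.energy_threshold {
            // Speech frame: count it and reset the silence counter.
            self.speech_frames += 1;
            self.silent_frames = 0;
            return VadEvent::None;
        }
        if self.speech_frames == 0 {
            // Silence before any speech: nothing to end.
            return VadEvent::None;
        }
        self.silent_frames += 1;
        if self.silent_frames < self.silence_frames_needed {
            return VadEvent::None;
        }
        // Silence lasted long enough; bursts shorter than the minimum are discarded.
        let long_enough = self.speech_frames >= self.min_speech_frames;
        self.speech_frames = 0;
        self.silent_frames = 0;
        if long_enough { VadEvent::UtteranceEnd } else { VadEvent::None }
    }
}
```

With the defaults above and 100 ms frames (1600 samples at 16 kHz), `EnergyVad::new(0.01, 1500, 500, 100)` would report `UtteranceEnd` after at least 5 speech frames followed by 15 consecutive quiet frames.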
## How It Works
1. **Audio capture** — records from your microphone via `cpal` (cross-platform)
2. **Voice activity detection** — energy-based VAD detects speech onset and silence
3. **Speech recognition** — local Whisper model (via `whisper-rs` / `whisper.cpp`) transcribes audio
4. **Text injection** — `enigo` simulates keyboard input into the focused application
5. **Daemon** — background process communicates state to the overlay via IPC (Unix socket on macOS/Linux, TCP on Windows) using newline-delimited JSON
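The newline-delimited JSON framing in step 5 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the daemon's real protocol: the function names and the `{"state": ...}` payloads used in the usage note are hypothetical, and the actual message schema is not documented here.

```rust
use std::io::{self, BufRead, Write};

/// Writes one message per line. `json` must be a single-line JSON document,
/// because the newline is the only frame delimiter.
pub fn send_message<W: Write>(out: &mut W, json: &str) -> io::Result<()> {
    if json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "message must not contain a newline",
        ));
    }
    out.write_all(json.as_bytes())?;
    out.write_all(b"\n")?;
    out.flush()
}

/// Reads messages until EOF, one per line, skipping blank lines.
pub fn read_messages<R: BufRead>(input: R) -> io::Result<Vec<String>> {
    let mut messages = Vec::new();
    for line in input.lines() {
        let line = line?;
        let trimmed = line.trim();
        if !trimmed.is_empty() {
            messages.push(trimmed.to_string());
        }
    }
    Ok(messages)
}
```

Because both functions are generic over `Write` and `BufRead`, the same framing would work over a `std::os::unix::net::UnixStream` on macOS/Linux or a `std::net::TcpStream` on Windows (wrapped in a `BufReader` for reading). For example, sending `{"state":"ready"}` and then `{"state":"recording"}` produces two lines that `read_messages` returns as two separate strings.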
### Multi-language priority
When multiple languages are configured (e.g., `["en", "it"]`), Keryxis tries each in order. With that example, English is tried first; if the transcription is non-empty, it is used, and otherwise Italian is tried. When an early language succeeds, this is faster than auto-detect, because an attempt with a specific language skips Whisper's language detection step. Each fallback costs another transcription pass, so list your most-used language first.
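The fallback logic can be sketched as a small function. This is a simplified illustration, not the project's implementation: `transcribe_with_priority` is a hypothetical name, and the `transcribe` closure stands in for a Whisper call with a forced language.

```rust
/// Tries each language in priority order and returns the first non-empty
/// transcription together with the language that produced it.
/// `transcribe` is a stand-in for running Whisper with a fixed language.
pub fn transcribe_with_priority<F>(
    languages: &[&str],
    mut transcribe: F,
) -> Option<(String, String)>
where
    F: FnMut(&str) -> String,
{
    for &lang in languages {
        let text = transcribe(lang);
        let text = text.trim();
        if !text.is_empty() {
            // First language that yields text wins; later ones are never tried.
            return Some((lang.to_string(), text.to_string()));
        }
    }
    // Every language produced an empty transcription.
    None
}
```

Calling it with `&["en", "it"]` invokes the closure once if English yields text and twice if it does not, which is why the order of the list matters for latency.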
## Files & Paths
### macOS / Linux
| Path | Purpose |
|------|---------|
| `~/.config/keryxis/config.toml` | Configuration |
| `~/.local/share/keryxis/models/` | Whisper model files |
| `~/.local/state/keryxis/daemon.pid` | Daemon PID file |
| `~/.local/state/keryxis/daemon.log` | Daemon log file |
| `~/.local/state/keryxis/keryxis.sock` | Unix socket for IPC |
### Windows
| Path | Purpose |
|------|---------|
| `%APPDATA%\keryxis\config.toml` | Configuration |
| `%APPDATA%\keryxis\models\` | Whisper model files |
| `%LOCALAPPDATA%\keryxis\daemon.pid` | Daemon PID file |
| `%LOCALAPPDATA%\keryxis\daemon.log` | Daemon log file |
| TCP `127.0.0.1:19457` | IPC (replaces Unix socket) |
## Supported Hotkeys
Modifiers: `Alt` / `Option`, `Ctrl` / `Control`, `Shift`, `Cmd` / `Meta` / `Super`
Keys: `A`-`Z`, `F1`-`F12`, `Space`, `Tab`, `Return`, `Escape`, `Backspace`
Examples: `Alt+Space`, `Ctrl+Shift+R`, `Cmd+E`, `F5`
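As an illustration of the hotkey syntax, the sketch below parses a string such as `Ctrl+Shift+R` into modifiers and a key. The `Hotkey` struct and `parse_hotkey` function are hypothetical, and case-insensitive matching is an assumption; Keryxis's own parser may differ.

```rust
/// Hypothetical parsed hotkey: modifier flags plus one key name in upper case.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Hotkey {
    pub alt: bool,
    pub ctrl: bool,
    pub shift: bool,
    pub meta: bool,
    pub key: String,
}

/// Returns true for the keys listed above; `key` must already be lower case.
fn is_supported_key(key: &str) -> bool {
    if key.len() == 1 {
        return key.chars().all(|c| c.is_ascii_lowercase());
    }
    if let Some(n) = key.strip_prefix('f') {
        if let Ok(n) = n.parse::<u8>() {
            return (1..=12).contains(&n);
        }
    }
    matches!(key, "space" | "tab" | "return" | "escape" | "backspace")
}

/// Parses specs like "Alt+Space" or "F5" (case-insensitive, an assumption).
pub fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let mut hotkey = Hotkey::default();
    for part in spec.split('+').map(str::trim) {
        match part.to_ascii_lowercase().as_str() {
            "alt" | "option" => hotkey.alt = true,
            "ctrl" | "control" => hotkey.ctrl = true,
            "shift" => hotkey.shift = true,
            "cmd" | "meta" | "super" => hotkey.meta = true,
            "" => return Err(format!("empty component in hotkey '{spec}'")),
            key if is_supported_key(key) => {
                if !hotkey.key.is_empty() {
                    return Err(format!("more than one key in hotkey '{spec}'"));
                }
                hotkey.key = key.to_ascii_uppercase();
            }
            other => return Err(format!("unsupported key '{other}'")),
        }
    }
    if hotkey.key.is_empty() {
        return Err(format!("hotkey '{spec}' has modifiers but no key"));
    }
    Ok(hotkey)
}
```

Under these assumptions, `Option+Space` and `alt+space` parse to the same value, while `Ctrl+F13` and `Ctrl+Shift` are rejected.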
## Troubleshooting
**"Accessibility permission required"** (macOS) — Add your terminal to System Settings → Privacy & Security → Accessibility. Toggle it off and on if already listed.
**Overlay shows "OFF"** — Daemon isn't running. Run `keryxis daemon start` or just `keryxis`.
**Overlay shows "PAUSED"** — The overlay window is focused. Click on another application to resume.
**Wrong microphone** — Open settings (⚙) and select the correct microphone from the dropdown. Click ⟳ to refresh the device list.
**Wake word not detected** — Whisper may transcribe your wake word differently. Check the daemon log (`≡` button in overlay) to see what Whisper hears. Punctuation is stripped before matching.
**Transcription is slow** — Use the `tiny` model. Ensure Metal GPU acceleration is working on macOS (check daemon log for "using device Metal").
**No audio captured** — Check that your microphone is working and not muted. On macOS, ensure microphone permission is granted. On Windows, check Settings → Privacy → Microphone.
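For the wake-word entry above, it can help to see what "punctuation is stripped before matching" might look like in practice. The sketch below is an assumption about the approach, not the daemon's code: `normalize` and `contains_wake_word` are hypothetical names, and lowercasing plus whole-word matching are illustrative choices.

```rust
/// Lowercases the text, drops everything except letters, digits, and
/// whitespace, and collapses runs of whitespace into single spaces.
pub fn normalize(text: &str) -> String {
    text.chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .to_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns true if the normalized transcript contains the normalized wake
/// word as a sequence of whole words.
pub fn contains_wake_word(transcript: &str, wake_word: &str) -> bool {
    let transcript = normalize(transcript);
    let wake_word = normalize(wake_word);
    if wake_word.is_empty() {
        return false;
    }
    // Padding with spaces prevents matches inside longer words.
    format!(" {transcript} ").contains(&format!(" {wake_word} "))
}
```

With this normalization, a transcript like `Hey, Terminal! Open the file.` matches the wake word `hey terminal`. If Whisper hears different words altogether (the daemon log shows what it hears), picking a wake word that matches its usual output is the practical fix.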
## License
MIT