https://github.com/0xPD33/sonori
Sonori is a fully local STT app for Linux (Wayland).
https://github.com/0xPD33/sonori
asr automatic-speech-recognition ctranslate2 linux onnxruntime speech-recognition speech-to-text stt voice-activity-detection voice-recognition vulkan wayland wgpu whisper whisper-cpp
Last synced: 3 days ago
JSON representation
Sonori is a fully local STT app for Linux (Wayland).
- Host: GitHub
- URL: https://github.com/0xPD33/sonori
- Owner: 0xPD33
- License: mit
- Created: 2025-03-18T00:56:25.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2026-03-08T17:42:10.000Z (3 months ago)
- Last Synced: 2026-03-08T21:33:13.618Z (3 months ago)
- Topics: asr, automatic-speech-recognition, ctranslate2, linux, onnxruntime, speech-recognition, speech-to-text, stt, voice-activity-detection, voice-recognition, vulkan, wayland, wgpu, whisper, whisper-cpp
- Language: Rust
- Homepage:
- Size: 2.73 MB
- Stars: 17
- Watchers: 1
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md
Awesome Lists containing this project
- Awesome-Whisper-Apps - sonori - square) - Voice input system (By Platform / Linux)
README
# Sonori
**Local AI speech transcription with a transparent overlay for Linux**
Real-time or on-demand transcription, entirely on your device.
[](LICENSE)
[](#system-requirements)
[](#compositor-wayland)

---
> **Note:** Active development. You may encounter bugs or instability as new features are added.
## Features
### Core
- **Local AI Processing** - All transcription happens on your device, no cloud services required
- **Multi-Backend Support** - Choose between CTranslate2, Whisper.cpp, Moonshine, or Parakeet TDT backends
- **Dual Transcription Modes** - Real-time continuous transcription or manual on-demand sessions
- **Voice Activity Detection** - Uses Silero VAD for accurate speech detection
- **Automatic Model Download** - Models are downloaded automatically on first run
### Interface
- **Transparent Overlay** - Non-intrusive overlay at the bottom of your screen
- **CLI Mode** - Run without GUI using `--cli` flag for headless/terminal usage
- **Audio Visualization** - Spectrogram display shows audio input in real-time
- **System Tray Integration** - Quick access with window control and status display
- **Typewriter Effect** - Character-by-character text reveal animation when transcription completes
### Optional Features
- **GPU Acceleration** - Vulkan-based rendering; ONNX Runtime GPU acceleration for Moonshine and Parakeet TDT backends
- **Global Shortcuts** - System-wide hotkeys via XDG Desktop Portal (e.g., Super+\ to toggle recording)
- **Auto-Paste** - Automatic text injection via XDG Desktop Portal, with wtype/dotool fallback for compositors without portal support
- **Sound Feedback** - Audio cues for recording state changes
- **Magic Mode** - Post-process transcriptions through a local LLM to clean up grammar, remove filler words, and improve readability
### Roadmap
**Planned:**
- Better error handling and UI improvements
- CUDA support for GPU acceleration
- Additional local AI backends
- Optional cloud API support (Deepgram, OpenAI)
**Not Planned:**
- GUI framework (custom wgpu/wgsl implementation by design)
- Windows/macOS support (contributions welcome)
## System Requirements
**Platform:** Linux x86_64 only
**Tested on:** NixOS with KDE Plasma/KWin and niri (Wayland)
### Compositor (Wayland)
| Protocol | Required | Purpose |
|----------|----------|---------|
| `zwlr_layer_shell_v1` | **Yes** | Transparent overlay rendering |
| XDG Portal: GlobalShortcuts | No | System-wide hotkeys |
| XDG Portal: RemoteDesktop | No | Auto-paste via portal (fallback: wtype/dotool) |
**Compositor Compatibility:**
| Compositor | Status |
|------------|--------|
| KDE Plasma (KWin) | ✅ Full support |
| niri | ✅ Full support (use IPC for keybindings) |
| Hyprland | ✅ Should work |
| Sway | ✅ Should work |
| GNOME (Mutter) | ❌ No layer shell (use CLI mode) |
### Hardware
- **GPU:** Vulkan-capable with appropriate drivers
- **Audio:** Working microphone, PipeWire or PulseAudio
## Installation
### AppImage (Recommended)
```bash
# Download from GitHub Releases
chmod +x Sonori-*-x86_64.AppImage
./Sonori-*-x86_64.AppImage
```
### Release Tarball
```bash
tar -xzf sonori-*-x86_64-linux.tar.gz
./sonori-*/sonori
```
### NixOS
```bash
# Try without installing
nix run github:0xPD33/sonori
# Install to profile
nix profile install github:0xPD33/sonori
```
Or add to your flake:
```nix
{
inputs.sonori.url = "github:0xPD33/sonori";
# Then add: inputs.sonori.packages.${system}.default
}
```
### Building from Source
**Prerequisites:** [Rust](https://rustup.rs/) and distribution-specific dependencies.
Ubuntu/Debian 24.04+
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install -y build-essential portaudio19-dev libclang-dev pkg-config \
libxkbcommon-dev libwayland-dev libx11-dev libxcursor-dev libxi-dev libxrandr-dev \
libasound2-dev libssl-dev libfftw3-dev curl cmake libvulkan-dev libopenblas-dev glslc
# Install ONNX Runtime (not in repos)
ONNX_VERSION=1.22.0
wget https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_VERSION}/onnxruntime-linux-x64-${ONNX_VERSION}.tgz
tar -xzf onnxruntime-linux-x64-${ONNX_VERSION}.tgz
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/include/* /usr/local/include/
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib/
sudo mkdir -p /usr/local/lib64
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib64/
echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/onnxruntime.conf
echo "/usr/local/lib64" | sudo tee -a /etc/ld.so.conf.d/onnxruntime.conf
sudo ldconfig
```
Set environment variables before building:
```bash
export BLAS_INCLUDE_DIRS=/usr/include/x86_64-linux-gnu
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system
export ORT_LIB_LOCATION=/usr/local/lib
```
Fedora/RHEL
```bash
sudo dnf install gcc gcc-c++ portaudio-devel clang-devel pkg-config \
libxkbcommon-devel wayland-devel libX11-devel libXcursor-devel libXi-devel libXrandr-devel \
alsa-lib-devel openssl-devel fftw-devel curl cmake vulkan-loader-devel vulkan-headers \
openblas-devel shaderc onnxruntime-devel
```
Set environment variables before building:
```bash
export BLAS_INCLUDE_DIRS=/usr/include/openblas
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system
```
Arch/Manjaro
```bash
sudo pacman -S base-devel portaudio clang pkgconf \
libxkbcommon wayland libx11 libxcursor libxi libxrandr alsa-lib openssl fftw curl cmake \
vulkan-headers vulkan-tools openblas shaderc
# Install onnxruntime from AUR (e.g., yay -S onnxruntime)
```
Set environment variables before building:
```bash
export BLAS_INCLUDE_DIRS=/usr/include/openblas
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system
```
NixOS
```bash
nix develop # All dependencies included
```
**Build:**
```bash
git clone https://github.com/0xPD33/sonori
cd sonori
# Ensure environment variables are set (see distro-specific instructions above)
cargo build --release
./target/release/sonori
```
### Desktop Integration
**NixOS:** Automatic via Nix flake.
**Other distributions:**
```bash
./install-desktop.sh --user # User installation (recommended)
sudo ./install-desktop.sh --system # System-wide installation
```
See [desktop/README.md](desktop/README.md) for details.
## Usage
### GUI Mode (Default)
```bash
sonori
```
1. A transparent overlay appears at the bottom of your screen
2. **Real-time mode:** Recording starts automatically
3. **Manual mode:** Press Record to start/stop sessions
4. Use overlay buttons to copy text, clear history, switch modes, or exit
### CLI Mode
```bash
sonori --cli
```
- Transcription appears directly in terminal
- Real-time mode: auto-starts recording
- Manual mode: use spacebar to start/stop
- `Ctrl+C` to exit
### Command Line Options
| Option | Description |
|--------|-------------|
| `--cli` | Run in CLI mode without GUI |
| `--mode ` | Set transcription mode (default: manual) |
| `--manual` | Shorthand for `--mode manual` |
| `--help` | Show help information |
| `--version` | Display version |
### IPC Commands (External Control)
Control a running Sonori instance via CLI subcommands. Useful for compositor keybindings on niri, sway, etc. where XDG GlobalShortcuts portal isn't available.
```bash
sonori toggle # Toggle recording on/off
sonori start # Start recording session
sonori stop # Stop recording session
sonori cancel # Cancel session without processing
sonori status # Get current status (JSON)
sonori switch-mode manual|realtime
```
**Example niri keybinding** (`~/.config/niri/config.kdl`):
```kdl
binds {
Mod+backslash { spawn "sonori" "toggle"; }
}
```
## Configuration
Sonori uses `config.toml` for configuration. Defaults work well for most users.
**Quick Setup:** Choose a preset from the [Configuration Guide](./CONFIGURATION.md):
- **Fast & Lightweight** - Good for older computers
- **Balanced Performance** - Recommended for most users
- **High Quality** - For powerful computers with GPU
- **Real-Time** - Live transcription as you speak
- **Multilingual** - For non-English languages
- **Moonshine** - ONNX-based backend with fast real-time performance
- **Parakeet TDT** - NVIDIA NeMo model via sherpa-onnx, multilingual or English-only
## Troubleshooting
### Wayland / Layer Shell
Sonori uses `zwlr_layer_shell_v1` for the transparent overlay.
- Verify Wayland session: `echo $XDG_SESSION_TYPE` should return `wayland`
- Check [Compositor Compatibility](#compositor-wayland) table above
- GNOME/Mutter doesn't support layer shell - use CLI mode (`--cli`)
### Vulkan / GPU
Required for UI rendering and optional GPU-accelerated transcription.
- Install Vulkan libraries: `vulkan-loader`, `vulkan-headers`
- Vendor-specific packages may be needed (e.g., `mesa-vulkan-drivers` on Ubuntu)
- Test with: `vulkaninfo` or `vkcube`
- For GPU transcription: enable `gpu_enabled = true` in `[backend_config]`
### XDG Desktop Portal Features
**Global Shortcuts** (`global_shortcuts_enabled`):
- Requires KDE Plasma 6+ or GNOME 45+
- Accept permission dialog on first run
- Check portal is running: `systemctl --user status xdg-desktop-portal`
**Auto-Paste** (`portal_input_enabled`):
- Uses XDG RemoteDesktop portal for keyboard injection (KDE Plasma)
- Falls back to `wtype` when portal is unavailable (sway, Hyprland, niri, river, labwc, COSMIC)
- Falls back to `dotool` if wtype also fails (works on all compositors via uinput — requires `input` group membership)
- Copies text to clipboard via `wl-copy`, then simulates the configured paste shortcut
### Model Issues
**Automatic conversion fails:**
```bash
# NixOS
nix-shell model-conversion/shell.nix
ct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json
# Other distros
pip install -U ctranslate2 huggingface_hub torch transformers
ct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json
```
**30-second truncation:** Whisper's 30-second window with 448 token limit can truncate dense speech. Solutions:
1. Keep recordings under 25 seconds
2. Adjust `chunk_duration_seconds` (15-25) in `[manual_mode_config]`
3. Try CTranslate2 backend
**Moonshine model layout:** Moonshine uses ONNX merged models (auto-downloaded) and expects a model name like `tiny` or `base`. If you see decoder input errors, set `[moonshine_options].enable_cache = false` and retry.
**Parakeet model layout:** Parakeet uses INT8 split ONNX models via sherpa-onnx (auto-downloaded from HuggingFace). Models are stored in `~/.cache/sonori/models/parakeet-tdt-v3-int8/` (v3, multilingual) or `parakeet-tdt-v2-int8/` (v2, English-only).
## Known Issues
- Not all Wayland compositors supported (tested primarily on KDE Plasma/KWin)
- Transcription accuracy depends on Whisper model quality
- CPU usage can be high when idle (buffer size related)
## Contributing
Contributions welcome! Whether fixing bugs, adding features, improving docs, or testing on different distributions.
**Getting Started:**
- See [ARCHITECTURE.md](./ARCHITECTURE.md) to understand the codebase
- Check planned features and known issues above
- Test on your distribution
- Open an issue or PR
## Credits
- [Rust](https://www.rust-lang.org/)
- [CTranslate2](https://github.com/OpenNMT/CTranslate2) / [Faster Whisper](https://github.com/SYSTRAN/faster-whisper)
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) / [whisper-rs](https://codeberg.org/tazz4843/whisper-rs)
- [ONNX Runtime](https://github.com/microsoft/onnxruntime)
- [OpenAI Whisper](https://github.com/openai/whisper)
- [Moonshine](https://github.com/moonshine-ai/moonshine)
- [NVIDIA NeMo / Parakeet TDT](https://github.com/NVIDIA/NeMo)
- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
- [Silero VAD](https://github.com/snakers4/silero-vad)
- [CPAL](https://github.com/RustAudio/cpal)
- [Winit Fork](https://github.com/SergioRibera/winit)
- [WGPU](https://github.com/gfx-rs/wgpu)
## License
MIT License - see [LICENSE](LICENSE) for details.