{"id":50493452,"url":"https://github.com/0xPD33/sonori","last_synced_at":"2026-06-18T22:00:48.199Z","repository":{"id":282986091,"uuid":"950311731","full_name":"0xPD33/sonori","owner":"0xPD33","description":"Sonori is a fully local STT app for Linux (Wayland).","archived":false,"fork":false,"pushed_at":"2026-03-08T17:42:10.000Z","size":2858,"stargazers_count":17,"open_issues_count":2,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-03-08T21:33:13.618Z","etag":null,"topics":["asr","automatic-speech-recognition","ctranslate2","linux","onnxruntime","speech-recognition","speech-to-text","stt","voice-activity-detection","voice-recognition","vulkan","wayland","wgpu","whisper","whisper-cpp"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0xPD33.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-03-18T00:56:25.000Z","updated_at":"2026-03-08T18:08:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"09c88c97-a816-4d6e-a45a-6cd68360087c","html_url":"https://github.com/0xPD33/sonori","commit_stats":null,"previous_names":["0xpd33/sonori"],"tags_count":28,"template":false,"template_full_name":null,"purl":"pkg:github/0xPD33/sonori","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xPD33%2Fsonori","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xPD33%2Fsonori/tags","releases_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xPD33%2Fsonori/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xPD33%2Fsonori/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0xPD33","download_url":"https://codeload.github.com/0xPD33/sonori/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xPD33%2Fsonori/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34508867,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","automatic-speech-recognition","ctranslate2","linux","onnxruntime","speech-recognition","speech-to-text","stt","voice-activity-detection","voice-recognition","vulkan","wayland","wgpu","whisper","whisper-cpp"],"created_at":"2026-06-02T05:00:36.702Z","updated_at":"2026-06-18T22:00:48.193Z","avatar_url":"https://github.com/0xPD33.png","language":"Rust","funding_links":[],"categories":["By Platform"],"sub_categories":["Linux"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Sonori\n\n**Local AI speech transcription with a transparent overlay for Linux**\n\nReal-time or on-demand transcription, entirely on your device.\n\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![Platform: Linux](https://img.shields.io/badge/Platform-Linux%20x86__64-orange.svg)](#system-requirements)\n[![Wayland](https://img.shields.io/badge/Wayland-Layer%20Shell-blueviolet.svg)](#compositor-wayland)\n\n\u003cbr\u003e\n\n![Sonori Demo](assets/sonori-demo.gif)\n\n\u003c/div\u003e\n\n---\n\n\u003e **Note:** Active development. You may encounter bugs or instability as new features are added.\n\n## Features\n\n### Core\n- **Local AI Processing** - All transcription happens on your device, no cloud services required\n- **Multi-Backend Support** - Choose between CTranslate2, Whisper.cpp, Moonshine, or Parakeet TDT backends\n- **Dual Transcription Modes** - Real-time continuous transcription or manual on-demand sessions\n- **Voice Activity Detection** - Uses Silero VAD for accurate speech detection\n- **Automatic Model Download** - Models are downloaded automatically on first run\n\n### Interface\n- **Transparent Overlay** - Non-intrusive overlay at the bottom of your screen\n- **CLI Mode** - Run without GUI using `--cli` flag for headless/terminal usage\n- **Audio Visualization** - Spectrogram display shows audio input in real-time\n- **System Tray Integration** - Quick access with window control and status display\n- **Typewriter Effect** - Character-by-character text reveal animation when transcription completes\n\n### Optional Features\n- **GPU Acceleration** - Vulkan-based rendering; ONNX Runtime GPU acceleration for Moonshine and Parakeet TDT backends\n- **Global Shortcuts** - System-wide hotkeys via XDG Desktop Portal (e.g., Super+\\ to toggle recording)\n- **Auto-Paste** - Automatic text injection via XDG Desktop Portal, with wtype/dotool fallback for compositors without portal support\n- **Sound Feedback** - Audio cues for recording state changes\n- **Magic Mode** - Post-process transcriptions through a local LLM to clean up grammar, remove filler words, and improve readability\n\n### 
Roadmap\n\n**Planned:**\n- Better error handling and UI improvements\n- CUDA support for GPU acceleration\n- Additional local AI backends\n- Optional cloud API support (Deepgram, OpenAI)\n\n**Not Planned:**\n- Adopting a GUI framework (the custom wgpu/WGSL implementation is by design)\n- Windows/macOS support (contributions welcome)\n\n## System Requirements\n\n**Platform:** Linux x86_64 only\n\n**Tested on:** NixOS with KDE Plasma/KWin and niri (Wayland)\n\n### Compositor (Wayland)\n\n| Protocol | Required | Purpose |\n|----------|----------|---------|\n| `zwlr_layer_shell_v1` | **Yes** | Transparent overlay rendering |\n| XDG Portal: GlobalShortcuts | No | System-wide hotkeys |\n| XDG Portal: RemoteDesktop | No | Auto-paste via portal (fallback: wtype/dotool) |\n\n**Compositor Compatibility:**\n| Compositor | Status |\n|------------|--------|\n| KDE Plasma (KWin) | ✅ Full support |\n| niri | ✅ Full support (use IPC for keybindings) |\n| Hyprland | ✅ Should work |\n| Sway | ✅ Should work |\n| GNOME (Mutter) | ❌ No layer shell (use CLI mode) |\n\n### Hardware\n- **GPU:** Vulkan-capable with appropriate drivers\n- **Audio:** Working microphone, PipeWire or PulseAudio\n\n## Installation\n\n### AppImage (Recommended)\n\n```bash\n# Download from GitHub Releases\nchmod +x Sonori-*-x86_64.AppImage\n./Sonori-*-x86_64.AppImage\n```\n\n### Release Tarball\n\n```bash\ntar -xzf sonori-*-x86_64-linux.tar.gz\n./sonori-*/sonori\n```\n\n### NixOS\n\n```bash\n# Try without installing\nnix run github:0xPD33/sonori\n\n# Install to profile\nnix profile install github:0xPD33/sonori\n```\n\nOr add to your flake:\n```nix\n{\n  inputs.sonori.url = \"github:0xPD33/sonori\";\n  # Then add: inputs.sonori.packages.${system}.default\n}\n```\n\n### Building from Source\n\n**Prerequisites:** [Rust](https://rustup.rs/) and distribution-specific dependencies.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eUbuntu 24.04+/Debian\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\n# Install system 
dependencies\nsudo apt-get update\nsudo apt-get install -y build-essential portaudio19-dev libclang-dev pkg-config \\\n  libxkbcommon-dev libwayland-dev libx11-dev libxcursor-dev libxi-dev libxrandr-dev \\\n  libasound2-dev libssl-dev libfftw3-dev curl cmake libvulkan-dev libopenblas-dev glslc\n\n# Install ONNX Runtime (not in repos)\nONNX_VERSION=1.22.0\nwget https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_VERSION}/onnxruntime-linux-x64-${ONNX_VERSION}.tgz\ntar -xzf onnxruntime-linux-x64-${ONNX_VERSION}.tgz\nsudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/include/* /usr/local/include/\nsudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib/\nsudo mkdir -p /usr/local/lib64\nsudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib64/\necho \"/usr/local/lib\" | sudo tee /etc/ld.so.conf.d/onnxruntime.conf\necho \"/usr/local/lib64\" | sudo tee -a /etc/ld.so.conf.d/onnxruntime.conf\nsudo ldconfig\n```\n\nSet environment variables before building:\n```bash\nexport BLAS_INCLUDE_DIRS=/usr/include/x86_64-linux-gnu\nexport OPENBLAS_PATH=/usr\nexport ORT_STRATEGY=system\nexport ORT_LIB_LOCATION=/usr/local/lib\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eFedora/RHEL\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\nsudo dnf install gcc gcc-c++ portaudio-devel clang-devel pkg-config \\\n  libxkbcommon-devel wayland-devel libX11-devel libXcursor-devel libXi-devel libXrandr-devel \\\n  alsa-lib-devel openssl-devel fftw-devel curl cmake vulkan-loader-devel vulkan-headers \\\n  openblas-devel shaderc onnxruntime-devel\n```\n\nSet environment variables before building:\n```bash\nexport BLAS_INCLUDE_DIRS=/usr/include/openblas\nexport OPENBLAS_PATH=/usr\nexport ORT_STRATEGY=system\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eArch/Manjaro\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\nsudo pacman -S base-devel portaudio clang pkgconf \\\n  
libxkbcommon wayland libx11 libxcursor libxi libxrandr alsa-lib openssl fftw curl cmake \\\n  vulkan-headers vulkan-tools openblas shaderc\n# Install onnxruntime from AUR (e.g., yay -S onnxruntime)\n```\n\nSet environment variables before building:\n```bash\nexport BLAS_INCLUDE_DIRS=/usr/include/openblas\nexport OPENBLAS_PATH=/usr\nexport ORT_STRATEGY=system\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eNixOS\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\nnix develop  # All dependencies included\n```\n\u003c/details\u003e\n\n**Build:**\n```bash\ngit clone https://github.com/0xPD33/sonori\ncd sonori\n# Ensure environment variables are set (see distro-specific instructions above)\ncargo build --release\n./target/release/sonori\n```\n\n### Desktop Integration\n\n**NixOS:** Automatic via Nix flake.\n\n**Other distributions:**\n```bash\n./install-desktop.sh --user        # User installation (recommended)\nsudo ./install-desktop.sh --system # System-wide installation\n```\n\nSee [desktop/README.md](desktop/README.md) for details.\n\n## Usage\n\n### GUI Mode (Default)\n\n```bash\nsonori\n```\n\n1. A transparent overlay appears at the bottom of your screen\n2. **Real-time mode:** Recording starts automatically\n3. **Manual mode:** Press Record to start/stop sessions\n4. 
Use overlay buttons to copy text, clear history, switch modes, or exit\n\n### CLI Mode\n\n```bash\nsonori --cli\n```\n\n- Transcription appears directly in terminal\n- Real-time mode: auto-starts recording\n- Manual mode: use spacebar to start/stop\n- `Ctrl+C` to exit\n\n### Command Line Options\n\n| Option | Description |\n|--------|-------------|\n| `--cli` | Run in CLI mode without GUI |\n| `--mode \u003crealtime\\|manual\u003e` | Set transcription mode (default: manual) |\n| `--manual` | Shorthand for `--mode manual` |\n| `--help` | Show help information |\n| `--version` | Display version |\n\n### IPC Commands (External Control)\n\nControl a running Sonori instance via CLI subcommands. Useful for compositor keybindings on niri, sway, etc. where XDG GlobalShortcuts portal isn't available.\n\n```bash\nsonori toggle      # Toggle recording on/off\nsonori start       # Start recording session\nsonori stop        # Stop recording session\nsonori cancel      # Cancel session without processing\nsonori status      # Get current status (JSON)\nsonori switch-mode manual|realtime\n```\n\n**Example niri keybinding** (`~/.config/niri/config.kdl`):\n```kdl\nbinds {\n    Mod+backslash { spawn \"sonori\" \"toggle\"; }\n}\n```\n\n## Configuration\n\nSonori uses `config.toml` for configuration. 
Defaults work well for most users.\n\n**Quick Setup:** Choose a preset from the [Configuration Guide](./CONFIGURATION.md):\n- **Fast \u0026 Lightweight** - Good for older computers\n- **Balanced Performance** - Recommended for most users\n- **High Quality** - For powerful computers with GPU\n- **Real-Time** - Live transcription as you speak\n- **Multilingual** - For non-English languages\n- **Moonshine** - ONNX-based backend with fast real-time performance\n- **Parakeet TDT** - NVIDIA NeMo model via sherpa-onnx, multilingual or English-only\n\n## Troubleshooting\n\n### Wayland / Layer Shell\n\nSonori uses `zwlr_layer_shell_v1` for the transparent overlay.\n\n- Verify you are in a Wayland session: `echo $XDG_SESSION_TYPE` should print `wayland`\n- Check the [Compositor Compatibility](#compositor-wayland) table above\n- GNOME/Mutter doesn't support layer shell; use CLI mode (`--cli`)\n\n### Vulkan / GPU\n\nRequired for UI rendering and optional GPU-accelerated transcription.\n\n- Install Vulkan libraries: `vulkan-loader`, `vulkan-headers`\n- Vendor-specific packages may be needed (e.g., `mesa-vulkan-drivers` on Ubuntu)\n- Test with: `vulkaninfo` or `vkcube`\n- For GPU transcription: set `gpu_enabled = true` in `[backend_config]`\n\n### XDG Desktop Portal Features\n\n**Global Shortcuts** (`global_shortcuts_enabled`):\n- Requires KDE Plasma 6+ or GNOME 45+\n- Accept the permission dialog on first run\n- Check that the portal is running: `systemctl --user status xdg-desktop-portal`\n\n**Auto-Paste** (`portal_input_enabled`):\n- Uses the XDG RemoteDesktop portal for keyboard injection (KDE Plasma)\n- Falls back to `wtype` when the portal is unavailable (sway, Hyprland, niri, river, labwc, COSMIC)\n- Falls back to `dotool` if `wtype` also fails (works on all compositors via uinput; requires `input` group membership)\n- Copies text to the clipboard via `wl-copy`, then simulates the configured paste shortcut\n\n### Model Issues\n\n**Automatic conversion fails:**\n```bash\n# NixOS\nnix-shell 
model-conversion/shell.nix\nct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json\n\n# Other distros\npip install -U ctranslate2 huggingface_hub torch transformers\nct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json\n```\n\n**30-second truncation:** Whisper's 30-second window and 448-token limit can truncate dense speech. Workarounds:\n1. Keep recordings under 25 seconds\n2. Set `chunk_duration_seconds` to a value between 15 and 25 in `[manual_mode_config]`\n3. Try the CTranslate2 backend\n\n**Moonshine model layout:** Moonshine uses merged ONNX models (auto-downloaded) and expects a model name like `tiny` or `base`. If you see decoder input errors, set `[moonshine_options].enable_cache = false` and retry.\n\n**Parakeet model layout:** Parakeet uses INT8 split ONNX models via sherpa-onnx (auto-downloaded from Hugging Face). Models are stored in `~/.cache/sonori/models/parakeet-tdt-v3-int8/` (v3, multilingual) or `parakeet-tdt-v2-int8/` (v2, English-only).\n\n## Known Issues\n\n- Not all Wayland compositors are supported (tested primarily on KDE Plasma/KWin)\n- Transcription accuracy depends on the quality of the selected model\n- CPU usage can be high while idle (related to buffer size)\n\n## Contributing\n\nContributions welcome! 
Help with fixing bugs, adding features, improving docs, or testing on different distributions is appreciated.\n\n**Getting Started:**\n- See [ARCHITECTURE.md](./ARCHITECTURE.md) to understand the codebase\n- Check the planned features and known issues above\n- Test on your distribution\n- Open an issue or PR\n\n## Credits\n\n- [Rust](https://www.rust-lang.org/)\n- [CTranslate2](https://github.com/OpenNMT/CTranslate2) / [Faster Whisper](https://github.com/SYSTRAN/faster-whisper)\n- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) / [whisper-rs](https://codeberg.org/tazz4843/whisper-rs)\n- [ONNX Runtime](https://github.com/microsoft/onnxruntime)\n- [OpenAI Whisper](https://github.com/openai/whisper)\n- [Moonshine](https://github.com/moonshine-ai/moonshine)\n- [NVIDIA NeMo / Parakeet TDT](https://github.com/NVIDIA/NeMo)\n- [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)\n- [Silero VAD](https://github.com/snakers4/silero-vad)\n- [CPAL](https://github.com/RustAudio/cpal)\n- [Winit Fork](https://github.com/SergioRibera/winit)\n- [WGPU](https://github.com/gfx-rs/wgpu)\n\n## License\n\nMIT License - see [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xPD33%2Fsonori","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0xPD33%2Fsonori","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xPD33%2Fsonori/lists"}