{"id":51327677,"url":"https://github.com/novafabric/yazses","last_synced_at":"2026-07-01T20:00:52.314Z","repository":{"id":361432400,"uuid":"1254421991","full_name":"novafabric/yazses","owner":"novafabric","description":"Local, offline hold-to-talk voice dictation for Linux, macOS \u0026 Windows — hold a key, speak, release, and it types into any app. On-device faster-whisper STT plus voice commands \u0026 macros. No cloud, no API key, fully private.","archived":false,"fork":false,"pushed_at":"2026-07-01T11:51:43.000Z","size":5333,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-07-01T13:24:15.794Z","etag":null,"topics":["accessibility","cross-platform","developer-tools","dictation","faster-whisper","hold-to-talk","linux","macos","offline","on-device","privacy","python","speech-recognition","speech-to-text","voice","voice-commands","voice-dictation","voice-typing","whisper","windows"],"latest_commit_sha":null,"homepage":"https://novafabric.github.io/yazses/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/novafabric.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":".github/SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-30T14:45:11.000Z","updated_at":"2026-07-01T11:50:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/novafabric/yazses","commit_stats":null,"previous_names":["novafabric/yazses"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/novafabric/yazses","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/novafabric%2Fyazses","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/novafabric%2Fyazses/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/novafabric%2Fyazses/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/novafabric%2Fyazses/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/novafabric","download_url":"https://codeload.github.com/novafabric/yazses/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/novafabric%2Fyazses/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35020872,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-01T02:00:05.325Z","response_time":130,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","cross-platform","developer-tools","dictation","faster-whisper","hold-to-talk","linux","macos","offline","on-device","privacy","python","speech-recognition","speech-to-text","voice","voice-commands","voice-dictation","voice-typing","whisper","windows"],"created_at":"2026-07-01T20:00:36.599Z","updated_at":"2026-07-01T20:00:52.297Z","avatar_url":"https://github.com/novafabric.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# YazSes\n\n[![Tests](https://github.com/MSKazemi/yazses/actions/workflows/test.yml/badge.svg)](https://github.com/MSKazemi/yazses/actions/workflows/test.yml)\n[![PyPI](https://img.shields.io/pypi/v/yazses)](https://pypi.org/project/yazses/)\n[![Snap Store](https://img.shields.io/snapcraft/v/yazses?logo=snapcraft\u0026label=snap\u0026color=82BEA0)](https://snapcraft.io/yazses)\n[![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n\n**Hold a key → speak → release.** On-device voice dictation that types into any app, plus voice commands and macros — entirely offline. No cloud. No API key. No subscription.\n\nYazSes is an open-source, offline voice-dictation daemon for Linux, macOS, and Windows. It transcribes your speech locally with [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and types the result into whatever window has focus. Use it when you want hands-free dictation and editor/terminal voice commands without sending audio to Google, Apple, or Microsoft.\n\n![yazses doctor — all green, fully offline](docs/screenshots/yazses-doctor.png)\n\n---\n\n## Two versions of YazSes\n\nThis repo holds **one product** with **two implementations** — not two separate apps, but two generations of the same idea. The one you install and run is **Part 1 (Python)**, on this `main` branch.\n\n| | **Part 1 — Python** · `main` | **Rust HCI exploration** · `archive/rust-hci-v1` |\n|---|---|---|\n| What it is | The shipping app — voice dictation, commands, macros | An early-stage rewrite exploring deeper **human–computer interaction**: an on-device *agent* (LLM tool-use, personal memory, editor awareness) |\n| Status | ✅ **Active — current product** (v1.3.0, installed \u0026 maintained) | ⏸️ **Paused / archived** — not shipped, not installable |\n| Hold-to-talk dictation | ✅ | ✅ |\n| Offline STT | ✅ faster-whisper (CPU int8) | ✅ Whisper + Moonshine v2 (~9 ms) |\n| Voice commands | ✅ regex grammar (+ optional SLM router) → key sequences | ✅ via LLM tool-calls |\n| Voice macros · Mid-Thought Undo · Punch-In · Prosody Ink · Ghost Ahead | ✅ | ❌ |\n| Dysfluency-Friendly Mode · learning corpus + `yazses tune` | ✅ | ❌ |\n| Friendly CLI (`-h`, examples, `yazses update`) | ✅ | ❌ |\n| On-device **LLM agent** (OS tools: git commit, media, notes, screenshots…) | ❌ (optional offline text *cleanup* only) | ✅ |\n| **Personal memory** (encrypted on-device vector store) | ❌ | ✅ |\n| Editor context (Neovim / VS Code) | ✅ LSP context, opt-in | ✅ 5-tier window detection + bridges |\n| Screen-reader integration (AT-SPI / NVDA) | ❌ | ✅ |\n| Packaged \u0026 distributed (PyPI, snap, APT) | ✅ | ❌ |\n\n**Bottom line:** if you want YazSes, use **Part 1** (this branch) — an offline dictation + voice-command daemon. The Rust branch is kept only for reference; nothing on `main` builds, installs, or depends on it. The Rust effort aimed at a more ambitious agentic HCI layer but was left in early stages — revisiting it is a deliberate future decision, not part of day-to-day work here.\n\n---\n\n## Quick Start\n\n**Step 1 — Install** (see [all install options](#all-install-options) for every platform)\n\n| Platform | Command |\n|---|---|\n| **Linux** (Debian/Ubuntu) | `bash \u003c(curl -fsSL https://raw.githubusercontent.com/MSKazemi/yazses/main/install-apt.sh)` |\n| **Linux** (any distro) | `sudo snap install yazses` |\n| **Any OS** (Python ≥ 3.11) | `pipx install yazses` |\n\n**Step 2 — Provision the system** *(Linux — one command; the APT install does it automatically)*\n\n```sh\nyazses setup        # installs audio + injection deps, joins the input group, sets up ydotoold\n# then log out and back in (the input-group change needs a fresh login)\n```\n\n`yazses setup` fixes everything dictation needs and is safe to re-run — it only does what's missing:\n- **`libportaudio2`** — audio capture (without it the daemon crashes on start with `OSError: PortAudio library not found`).\n- **injection backends** — `xdotool`/`xclip` (X11) and `wtype`/`ydotool`/`wl-clipboard` (Wayland).\n- **`input` group** — required to read the hold-to-talk hotkey from the kernel.\n- **`ydotoold`** — the virtual-input daemon. On **GNOME/KDE Wayland** this is the *only* way to inject keystrokes (`wtype` is blocked there), so `setup` installs and enables it.\n\n\u003e Prefer to do it by hand? `sudo apt install libportaudio2 xdotool ydotool wtype xclip wl-clipboard pipx \u0026\u0026 sudo usermod -aG input \"$USER\"`, then enable `ydotoold` (see [install-linux](docs/install-linux.md)). Verify anytime with `yazses doctor` — you want `[OK] Keyboard capture`, `[OK] Microphone`, and `[OK] Injection`. macOS/Windows skip this step (grant Accessibility/permissions when prompted — see below).\n\n**Step 3 — Set up**\n\n```sh\nyazses doctor               # check mic, injection backend, permissions (want all [OK])\nyazses enroll               # calibrate your microphone (~30 seconds)\nyazses start                # start the dictation daemon\n```\n\n**Step 4 — Use it** — hold the hotkey, speak, release. The text is typed into the focused app.\n\n| OS | Hold this key | Say… |\n|---|---|---|\n| Linux | `Space` | *\"the quick brown fox\"* (types it) · *\"go to line 42\"* · *\"run the tests\"* |\n| macOS | `Right Option` | *\"delete the last word\"* · *\"save file\"* · *\"new function parse config\"* |\n| Windows | `Right Ctrl` | *\"undo that\"* · *\"select all\"* · *\"comment this line\"* |\n\nRelease the key — YazSes transcribes and acts within about a second.\n\n\u003e **First time on macOS?** v0 builds are unsigned: right-click the app → Open (Gatekeeper), then grant Accessibility + Microphone when prompted.\n\u003e\n\u003e **First time on Windows?** If SmartScreen warns you, click **More info → Run anyway**.\n\n---\n\n## What you can say\n\nHold the key and just **talk** — by default everything you say is typed at the cursor. YazSes also recognises a set of **voice commands** (a fast regex grammar; an optional ~0.5B SLM router catches phrasings the grammar misses) that map to editor/terminal **key sequences** instead of being typed:\n\n| Say something like… | What happens |\n|---|---|\n| *\"the quick brown fox\"* | Types the text at the cursor (dictation) |\n| *\"delete the last three words\"* | Deletes the last 3 words |\n| *\"undo that\"* / *\"undo five times\"* | Sends undo |\n| *\"save file\"* · *\"copy\"* · *\"paste\"* | Save / copy / paste |\n| *\"select all\"* · *\"select to end\"* | Selection commands |\n| *\"comment this line\"* | Toggles a comment |\n| *\"go to line 42\"* | Jumps to line 42 |\n| *\"go to function parse_config\"* | Jumps to the symbol (via LSP, opt-in) |\n| *\"run the tests\"* / *\"run the build\"* | Runs the editor/terminal action |\n| *\"rename this to user_id\"* | Renames the symbol |\n\nYou can also define multi-step **macros** and a personal **vocabulary** of mis-heard words — see the [CLI reference](docs/cli-reference.md).\n\n---\n\n## How it works\n\n```\nHold hotkey → record audio → VAD gate → faster-whisper (CPU) → clean + disfluency filter\n            → command grammar (Tier 1 regex, optional Tier 2 SLM router)\n            → dictate? type the text   ·   command? send the key sequence\n```\n\nEverything runs on your CPU — no GPU, no network. Transcription uses **faster-whisper** (int8). A fast regex grammar classifies each utterance as dictation or a command; when its confidence is low, an optional ~0.5B SLM router takes a second look. The result appears in the focused window within about a second on a modern laptop.\n\n**Models:**\n- **Speech-to-text:** faster-whisper — `tiny.en` (fast) / `base.en` / `small.en` (more accurate), int8 on CPU\n- **Command routing (optional):** Qwen2.5-0.5B SLM for Tier 2 intent classification — *not* required for dictation, fetched with `yazses model download`\n- **Dictation cleanup (optional, off by default):** a small offline LLM can tidy grammar/punctuation; length- and token-preservation guards stop it rewriting meaning\n\n---\n\n## Requirements\n\n| | |\n|---|---|\n| **OS** | Linux (primary) · macOS 11+ · Windows 10 (21H2)+ |\n| **RAM** | 4 GB minimum · 8 GB comfortable |\n| **Disk** | ~250 MB–1 GB for the faster-whisper model (downloaded on first run) |\n| **CPU** | 2+ cores · no GPU required |\n| **Mic** | Any USB or built-in microphone |\n\n---\n\n## Key features\n\n- **Fully offline** — no audio, no text, nothing leaves the machine by default; no cloud, API key, or subscription\n- **Hold-to-talk dictation** — type into any focused app on Linux, macOS, or Windows\n- **Voice commands** — editor/terminal actions (undo, save, go-to-line, run tests, rename…) via regex grammar + an optional SLM router\n- **Macros \u0026 personal vocabulary** — define multi-step commands and teach YazSes your mis-heard words\n- **Dysfluency-Friendly Mode** — opt-in collapse of stutters/repeats (`b-b-because` → `because`) for stuttered or dysarthric speech\n- **Self-improving** — opt-in, encrypted on-device learning corpus; `yazses tune` proposes accuracy fixes from your own corrections (nothing leaves the machine)\n- **Editor context** — optional Neovim / VS Code LSP context improves accuracy on code identifiers\n- **Accessibility** — VAD calibration wizard, mic-level tuning, and EMG (muscle-sensor) trigger support for motor-disability use\n- **Voice-activity overlay** — optional sonar rings near the cursor while you speak\n\n---\n\n## Limitations / when *not* to use YazSes\n\n- **Not an LLM agent.** YazSes dictates text and runs editor/terminal commands. It does **not** browse, reason over your files, set timers, or hold a conversation — that was the paused Rust exploration (see *Two versions* above).\n- **CPU faster-whisper, not a cloud service.** For the absolute lowest word-error rate on a noisy mic, a cloud STT may still beat it; the trade-off is that nothing leaves your machine.\n- **English-tuned by default.** It ships with `*.en` Whisper models; other languages need a different model.\n- **Desktop only.** No mobile or web build.\n\n---\n\n## CLI commands\n\n| Command | Description |\n|---|---|\n| `yazses start` | Start the YazSes daemon in the background (restarts cleanly if one is already running) |\n| `yazses restart` | Stop all daemons (including detached) and start exactly one |\n| `yazses stop` | Stop the running daemon |\n| `yazses status` | Show daemon status — queries the daemon over IPC when reachable |\n| `yazses doctor` | Check prerequisites (version, daemon, model, mic, injection backend, permissions) |\n| `yazses enroll` | Calibrate your microphone — tunes `vad_threshold` for your voice and room |\n| `yazses mic-level` | Measure mic speech level and recommend (or `--set`) the VAD threshold |\n| `yazses features` | List capabilities and toggle them (`enable`/`disable \u003cname\u003e`) |\n| `yazses vocab` | Personal dictionary of mis-heard words (`add`/`list`/`remove`) |\n| `yazses hotkey` | Show or change the hold-to-talk key (`set`) and the dedicated command key (`command`) |\n| `yazses overlay` | Launch the sonar voice-activity overlay (requires the `overlay` extra) |\n| `yazses inject TEXT` | Type arbitrary text into the focused window — test injection without speaking |\n| `yazses say TEXT` | Speak text aloud (offline TTS) |\n| `yazses test` | End-to-end self-test: focuses a window and types `YazSes OK` |\n| `yazses logs` | Show the daemon diagnostic log (metadata only — no dictated text is stored) |\n| `yazses mark-wrong` | Flag the last dictation as a misrecognition (feeds the learning corpus) |\n| `yazses tune` | Analyse the learning corpus and propose accuracy improvements; `--apply` to write changes |\n| `yazses corpus` | Manage the local learning corpus (`status`, `forget`, `destroy`) |\n| `yazses model` | List or download the optional SLM intent-routing model |\n| `yazses remote HOST` | Forward voice typing to a remote host over SSH |\n\n---\n\n## Configuration\n\nConfig file location:\n\n| OS | Path |\n|---|---|\n| Linux | `~/.config/yazses/config.toml` |\n| macOS | `~/Library/Application Support/yazses/config.toml` |\n| Windows | `%APPDATA%\\yazses\\config.toml` |\n\nPrefer `yazses features` / `yazses hotkey` / `yazses vocab` to edit config safely (they preserve comments). Essential settings:\n\n```toml\n[stt]\nmodel = \"small.en\"          # tiny.en (fast) | base.en | small.en (accurate); CPU int8\ninitial_prompt = \"\"         # vocabulary/context primed into Whisper\n\n[hotkey]\nkey = \"space\"               # hold-to-talk key (yazses hotkey set \u003ckey\u003e)\ncommand_key = \"\"            # optional dedicated key that forces command mode\nhold_threshold_ms = 500     # how long to hold before recording starts\n\n[audio]\nsample_rate = 16000\nmax_record_seconds = 90\n\n[injection]\nbackend = \"auto\"            # auto | xdotool | ydotool | wtype | clipboard\n\n[accessibility]\nvad_threshold = 0.0008      # lower for quiet speech, raise if room noise triggers (yazses mic-level --set)\n```\n\nSee the [CLI reference](docs/cli-reference.md) and [`examples/config.example.toml`](examples/config.example.toml) for all options.\n\n### Microphone not working?\n\nIf YazSes does nothing and the log shows `Silent audio -- discarding`, your speech is below the VAD threshold:\n\n```sh\nyazses mic-level --set   # measure your voice and set the right threshold\nyazses restart\n```\n\n---\n\n## All install options\n\n### Linux\n\n```bash\n# APT script — Debian / Ubuntu (recommended)\nbash \u003c(curl -fsSL https://raw.githubusercontent.com/MSKazemi/yazses/main/install-apt.sh)\n\n# Snap — any distro (strict confinement; keystroke injection works on X11.\n# On Wayland, prefer pipx below for full input access.)\nsudo snap install yazses\n\n# pipx — any distro with Python ≥ 3.11\n# Debian/Ubuntu runtime deps. libportaudio2 = audio capture (required);\n# xdotool/xclip = X11 injection+clipboard; wtype/ydotool/wl-clipboard = Wayland.\n# Installing all of them makes YazSes work on either session type.\nsudo apt install libportaudio2 xdotool ydotool wtype xclip wl-clipboard pipx\nsudo usermod -aG input \"$USER\"   # hotkey access — then log out and back in\npipx install yazses\n```\n\n### macOS\n\n```sh\n# pipx (Python ≥ 3.11)\npipx install yazses\n\n# App bundle (.dmg) — unsigned developer preview\n# https://github.com/MSKazemi/yazses/releases/latest\n```\n\n### Windows\n\n```powershell\n# pipx (Python ≥ 3.11)\npipx install yazses\n\n# Installer (.exe) — unsigned developer preview\n# https://github.com/MSKazemi/yazses/releases/latest\n```\n\n---\n\n## Documentation\n\n| | |\n|---|---|\n| [Install on Linux](docs/install-linux.md) | Detailed Linux guide — permissions, injection backends, service setup |\n| [Install on macOS](docs/macos-install.md) | Gatekeeper, Accessibility, Microphone permissions |\n| [Install on Windows](docs/windows-install.md) | SmartScreen, antivirus exceptions, privacy settings |\n| [CLI reference](docs/cli-reference.md) | All commands and flags (incl. macros \u0026 vocabulary for custom voice commands) |\n| [Privacy statement](docs/privacy-statement.md) | What stays on-device, what is never collected |\n\n---\n\n## Development\n\nYazSes (Part 1) is a Python project managed with `uv`:\n\n```bash\ngit clone https://github.com/MSKazemi/yazses\ncd yazses\nuv sync\nuv run python -m pytest tests/ -v\nbash scripts/install-local.sh        # install locally + run as a user service\n```\n\n### Rust HCI exploration (archived)\n\nThe early-stage Rust rewrite lives on the **`archive/rust-hci-v1`** branch, not on\n`main`. It is not built or installed by anything here — see *Two versions of\nYazSes* above for what it does and doesn't have. To look at it:\n\n```bash\ngit checkout archive/rust-hci-v1\ncargo build \u0026\u0026 cargo test --workspace   # optional backends: whisper, moonshine, llama-cpp, ollama, silero\n```\n\n---\n\n## License\n\nApache 2.0 — see [LICENSE](LICENSE).\n\nIf YazSes is useful to you, a ⭐ on GitHub and a mention in your project, blog, or talk is the best way to support continued development.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnovafabric%2Fyazses","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnovafabric%2Fyazses","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnovafabric%2Fyazses/lists"}