{"id":50626944,"url":"https://github.com/matthewjhunter/dicta","last_synced_at":"2026-06-27T07:00:25.069Z","repository":{"id":358378102,"uuid":"1241174598","full_name":"matthewjhunter/dicta","owner":"matthewjhunter","description":"Linux/Wayland voice dictation daemon in pure Go. Single-key activation, Wyoming/whisper.cpp/OpenAI backends, no PTT, no wakeword.","archived":false,"fork":false,"pushed_at":"2026-06-04T09:25:21.000Z","size":350,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-06T16:22:45.206Z","etag":null,"topics":["accessibility","golang","linux","pipewire","speech-to-text","voice-dictation","wayland","wyoming-protocol"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matthewjhunter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-17T03:40:29.000Z","updated_at":"2026-06-04T09:25:24.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/matthewjhunter/dicta","commit_stats":null,"previous_names":["matthewjhunter/dicta"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/matthewjhunter/dicta","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matthewjhunter%2Fdicta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matthewjhunter%2Fdicta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matthewjhunter%2Fdicta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matthewjhunter%2Fdicta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matthewjhunter","download_url":"https://codeload.github.com/matthewjhunter/dicta/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matthewjhunter%2Fdicta/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34844346,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-27T02:00:06.362Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","golang","linux","pipewire","speech-to-text","voice-dictation","wayland","wyoming-protocol"],"created_at":"2026-06-06T16:01:53.369Z","updated_at":"2026-06-27T07:00:25.063Z","avatar_url":"https://github.com/matthewjhunter.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dicta\n\n[![CI](https://github.com/matthewjhunter/dicta/actions/workflows/ci.yml/badge.svg)](https://github.com/matthewjhunter/dicta/actions/workflows/ci.yml)\n[![Go Reference](https://pkg.go.dev/badge/github.com/matthewjhunter/dicta.svg)](https://pkg.go.dev/github.com/matthewjhunter/dicta)\n[![Go Report Card](https://goreportcard.com/badge/github.com/matthewjhunter/dicta)](https://goreportcard.com/report/github.com/matthewjhunter/dicta)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n\nA Linux/Wayland-first voice dictation daemon written in pure Go.\n\ndicta is two things:\n\n1. **Type-mode** -- press **Pause**, talk, and the daemon types the\n   transcribed text into whatever window has focus, committing each\n   utterance on VAD silence. Press **Pause** again to stop.\n2. **Clip-mode** -- press **Scroll Lock**, talk, and a small editable\n   panel appears with the cleaned transcript. Press **Enter** to copy\n   the buffer to the clipboard, **Shift+Enter** to insert a newline,\n   **Esc** to cancel.\n\nThere is no PTT, no wakeword, no always-on listening. Capture starts\nwhen you press a key and stops when the session ends.\n\n## Status\n\nPre-1.0; latest tag is `v0.1.1`. The full v1 build (phases 1-13 of the\ndesign) is functional. Use it, but expect rough edges and please file\nissues.\n\n## Why\n\nSpeech-to-text is one of the few accessibility tools where Linux still\nhas gaps. Existing options either depend on commercial cloud APIs,\nrequire Python toolchains and GPU model files, or assume X11. dicta is\na single static Go binary that:\n\n- Runs anywhere Wayland and PipeWire run.\n- Talks to any Wyoming-protocol ASR server (faster-whisper et al.) by\n  default -- no model download in v1.\n- Optionally talks to a local `whisper-server` (subprocess-managed),\n  or any OpenAI-compatible transcription endpoint.\n- Optionally cleans transcripts with any OpenAI-compatible LLM\n  (llama.cpp's server, vLLM, OpenAI itself).\n\n## Architecture in one diagram\n\n```\n   ┌──────────────┐    ┌──────────────┐    ┌────────────────┐    ┌──────────────┐\n   │   Pause /    │ →  │    dictad    │ →  │   asrclient    │ →  │  Wyoming /   │\n   │ Scroll Lock  │    │              │    │  (Go module)   │    │  whispercpp/ │\n   │ (compositor) │    │  audio + VAD │ ←  │                │ ←  │    OpenAI    │\n   └──────────────┘    │  state mach. │    └────────────────┘    └──────────────┘\n                       │  control sock│\n                       │              │    ┌──────────────┐\n                       │              │ →  │   ydotool    │  (type-mode)\n                       │              │    └──────────────┘\n                       │              │    ┌──────────────┐    ┌──────────────┐\n                       │              │ ↔  │ dicta-preview│ →  │   wl-copy    │  (clip-mode)\n                       │              │    │   (Gio UI)   │    └──────────────┘\n                       └──────────────┘    └──────────────┘\n```\n\n`dictad` is the daemon (long-lived). `dicta` is a thin CLI that talks\nto the daemon over a Unix socket. `dicta-preview` is the clip-mode\npanel, spawned on demand. ydotoold and the ASR backend are external.\n\n## Quick start\n\n### 1. Install build deps\n\n```sh\n# Ubuntu / Debian\nsudo ./scripts/install-deps-ubuntu.sh\n\n# Fedora\nsudo ./scripts/install-deps-fedora.sh\n\n# Arch\nsudo ./scripts/install-deps-arch.sh\n```\n\nThese install: Go 1.25+, the Gio system libraries (Wayland, xkbcommon,\nGLES, EGL, libvulkan, libXcursor) for the preview panel, ydotool, and\nwl-clipboard.\n\n### 2. Build everything\n\n```sh\ntask build:all\n```\n\nProduces `bin/dictad`, `bin/dicta`, and `bin/dicta-preview`.\n\n### 3. Install into your home directory\n\n```sh\ntask install:user\n```\n\nInstalls to `~/.local/bin` and drops the systemd user unit into\n`~/.config/systemd/user/`.\n\n### 4. Bring up an ASR backend\n\nThe default backend is Wyoming. You can run any Wyoming-compatible\nservice -- most users want\n[wyoming-faster-whisper](https://github.com/rhasspy/wyoming-faster-whisper).\nA common setup is its Docker image listening on `tcp://localhost:10300`.\n\nOther backends:\n\n- `--asr-backend whispercpp` -- dicta supervises a local\n  `whisper-server` subprocess. Requires you to install\n  `whisper.cpp/whisper-server` and a model.\n- `--asr-backend openai` -- point at any OpenAI-compatible\n  `/v1/audio/transcriptions` endpoint. Requires an API key.\n\nSee [CONFIGURATION.md](CONFIGURATION.md) for every flag.\n\n### 5. Configure flags\n\n```sh\nsystemctl --user edit dictad.service\n```\n\n```ini\n[Service]\nExecStart=\nExecStart=%h/.local/bin/dictad \\\n    --asr-backend wyoming \\\n    --asr-wyoming-addr tcp://localhost:10300 \\\n    --preview-binary %h/.local/bin/dicta-preview\n```\n\n### 6. Enable and start\n\n```sh\nsystemctl --user enable --now dictad.service\njournalctl --user -u dictad.service -f\n```\n\n### 7. Bind compositor shortcuts\n\n| Key | What it does | Command |\n|-----|--------------|---------|\n| Pause | Toggle type-mode session | `dicta toggle_talk --mode type` |\n| Scroll Lock | Toggle clip-mode panel | `dicta toggle_talk --mode clip` |\n\nOn GNOME, the bindings are scripted -- `task install:keybindings` (or\n`scripts/setup-keybindings-gnome.sh` directly) sets both via `gsettings`,\nidempotently, preserving any other custom keybindings. It bypasses the\nSettings GUI, which nudges you toward chord shortcuts, so you keep the\nunmodified single keys; each binding also wraps `systemctl --user start\ndictad` so the daemon auto-launches on first press. Re-run with\n`--uninstall` to remove. For Sway/Hyprland/KDE, bind in the compositor\nconfig (see [CONFIGURATION.md](CONFIGURATION.md)).\n\n## Heads-up: ydotoold needs a tweak for type-mode\n\nType-mode drives `ydotool`, which talks to a long-running `ydotoold`\nuser daemon. Out of the box, `ydotoold` leaks accept'd client sockets\nand wedges in roughly a week of normal use -- typing silently stops\nworking (audio still captures, transcripts still land in the audit log\nif enabled). Tracked upstream; the workaround is two example unit\nfiles plus a daily restart timer.\n\nSee [packaging/systemd/README.md](packaging/systemd/README.md#ydotoold-fd-leak-workaround).\nA one-time `systemctl --user restart ydotoold.service` unsticks an\nalready-wedged daemon; the timer prevents recurrence.\n\n## Optional: LLM cleanup\n\nOff by default. To enable in clip-mode (the preview panel will display\ncleaned text the user can still edit before pressing Enter):\n\n```ini\nExecStart=%h/.local/bin/dictad \\\n    ... \\\n    --cleanup-enabled \\\n    --cleanup-endpoint http://my-llama-server.lan:8080/v1 \\\n    --cleanup-model qwen3-7b-instruct\n```\n\nThe mechanical system prompt is a code constant (cannot be templated\nby user input). Cleanup is **only** invoked in clip-mode; type-mode\nalways sends the raw transcript to ydotool.\n\n## Optional: audit log (debug mode)\n\nOff by default. JSONL transcripts (and optionally WAV captures) under\n`$XDG_DATA_HOME/dicta/YYYY-MM-DD/`:\n\n```ini\nExecStart=%h/.local/bin/dictad \\\n    ... \\\n    --audit-enabled \\\n    --audit-keep-audio \\\n    --audit-retention-days 7\n```\n\nBoth `--audit-enabled` and `--audit-keep-audio` are required to capture\naudio. Both default off because both are sensitive by definition.\n\n## Hotkey philosophy\n\nv1 ships exactly two compositor bindings (D17 in the design doc): Pause\nfor type-mode, Scroll Lock for clip-mode. There is no global commit or\ncancel hotkey -- clip-mode commits via panel-local Enter and type-mode\ncommits per-utterance via VAD silence. PTT (push-to-talk) and wakeword\nare **out of scope for v1** and are tracked in §14 of the design doc.\n\n## Documentation\n\n- [dicta-design.md](dicta-design.md) -- the design spec (v0.2). Read\n  this before opening a non-trivial PR.\n- [CONFIGURATION.md](CONFIGURATION.md) -- every flag.\n- [SECURITY.md](SECURITY.md) -- security model and the code paths that\n  enforce it.\n- [packaging/systemd/README.md](packaging/systemd/README.md) -- systemd\n  unit install and override patterns.\n\n## Building from source (no Taskfile)\n\n```sh\n# Daemon + CLI (pure Go, static)\nCGO_ENABLED=0 go build -o bin/dictad ./cmd/dictad\nCGO_ENABLED=0 go build -o bin/dicta ./cmd/dicta\n\n# Preview panel (CGo, Wayland)\ngo build -tags nox11 -o bin/dicta-preview ./cmd/dicta-preview\n```\n\nThe daemon and CLI MUST build with `CGO_ENABLED=0` (D13). The\n`MemoryDenyWriteExecute=true` flag in the systemd unit relies on this.\n\n## Testing\n\n```sh\ntask test       # unit tests\ntask test:race  # with race detector + goleak\ntask vet        # go vet\ntask check      # all of the above\n```\n\n`internal/control` ships a fuzz target for the wire-protocol parser:\n\n```sh\ngo test -fuzz=FuzzCommandUnmarshal -fuzztime=1m ./internal/control\n```\n\n## Contributing\n\nThe design doc's §13 lists the open decision points; everything else\nis locked. If you want to change a locked decision, file an issue\nexplaining why before writing code -- these were deliberate.\n\nBugs, typos, packaging contributions: PRs welcome.\n\n## License\n\nApache-2.0 -- see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatthewjhunter%2Fdicta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatthewjhunter%2Fdicta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatthewjhunter%2Fdicta/lists"}