{"id":31646480,"url":"https://github.com/michabbb/omarchy-speech-to-text","last_synced_at":"2026-05-14T21:37:03.973Z","repository":{"id":315252537,"uuid":"1058599467","full_name":"michabbb/omarchy-speech-to-text","owner":"michabbb","description":"Add Speech to Text to your Omarchy (Arch Linux) System","archived":false,"fork":false,"pushed_at":"2025-09-17T13:45:23.000Z","size":12,"stargazers_count":23,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-07T05:57:48.454Z","etag":null,"topics":["ai","archlinux","omarchy","speech2text"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michabbb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-17T09:53:27.000Z","updated_at":"2025-10-07T01:35:55.000Z","dependencies_parsed_at":"2025-09-17T15:45:53.436Z","dependency_job_id":null,"html_url":"https://github.com/michabbb/omarchy-speech-to-text","commit_stats":null,"previous_names":["michabbb/omarchy-speech-to-text"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/michabbb/omarchy-speech-to-text","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michabbb%2Fomarchy-speech-to-text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michabbb%2Fomarchy-speech-to-text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michabbb%
2Fomarchy-speech-to-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michabbb%2Fomarchy-speech-to-text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michabbb","download_url":"https://codeload.github.com/michabbb/omarchy-speech-to-text/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michabbb%2Fomarchy-speech-to-text/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33044364,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","archlinux","omarchy","speech2text"],"created_at":"2025-10-07T05:51:09.386Z","updated_at":"2026-05-14T21:37:03.967Z","avatar_url":"https://github.com/michabbb.png","language":"Python","funding_links":[],"categories":["Development Tools"],"sub_categories":[],"readme":"# Offline Speech-to-Text for Arch Linux (Wayland)\n\n\u003e **⚠️ Heads-up**: this project was vibe-coded together with AI helpers (Claude Code and Codex). I am not a Python developer. If you hit issues, please debug with AI or your own expertise, fix them, and send a PR. Treat this repo as a “here’s how it *can* work” manual rather than a guaranteed turnkey solution. 
It works on my Omarchy (Arch Linux) setup, and I’m sharing the path that got me there.\n\nThis project reproduces the hands-free dictation setup used on Arch Linux with a Wayland compositor. A dedicated key listener records audio while you hold a hotkey and forwards the audio to [Faster Whisper](https://github.com/guillaumekln/faster-whisper). Once transcription finishes, the recognised text is typed into the focused window via [`ydotool`](https://github.com/ReimuNotMoe/ydotool).\n\nThe repository contains ready-to-use Python scripts, configuration templates, and systemd unit files so you can replicate the complete workflow on your own machine.\n\n---\n\n## Features\n\n- **Hold-to-talk workflow** – press and hold a configurable key (e.g., Right Ctrl) to record; release to transcribe and type the text.\n- **Wayland-compatible typing** – uses `ydotool` instead of `xdotool`, so it works on Sway, Hyprland, GNOME, KDE, etc.\n- **Offline transcription** – powered by Faster Whisper running locally on CPU (can be upgraded to GPU if desired).\n- **Systemd integration** – both the key listener and `ydotoold` daemon are managed as services and start automatically after boot.\n\n---\n\n## Repository Layout\n\n```\n.\n├── config.example.py        # Template with all tunable settings\n├── key_listener.py          # Root hotkey listener (records audio, launches STT)\n├── requirements.txt         # Python dependencies\n├── speech_to_text.py        # Faster Whisper transcription + ydotool typing\n├── systemd/\n│   ├── speech-to-text-listener.service  # Service for key_listener.py\n│   └── ydotoold.service                  # ydotool daemon with boot sequencing fix\n└── LICENSE\n```\n\nCopy `config.example.py` to `config.py` and adjust it for your environment before starting the services.\n\n---\n\n## Prerequisites (Arch Linux)\n\n1. **Audio \u0026 input utilities**\n   ```bash\n   sudo pacman -S alsa-utils python-evdev\n   ```\n2. 
**Wayland automation tools**\n   ```bash\n   sudo pacman -S ydotool\n   ```\n   \u003e `ydotool` is in the `extra` repository (formerly `community`). If you are using another distribution, install it from your package manager or build from source.\n3. **Optional key remapping** – if you plan to trigger dictation with a mouse button or unusual key, install a remapper such as `input-remapper` or use Sway/Hyprland keybinds.\n4. **Python 3.10+** – required for the virtual environment and Faster Whisper.\n\n\u003e **GPU acceleration (optional):** install CUDA / ROCm drivers and replace the Python dependencies with the GPU build of PyTorch plus `faster-whisper` configured for your accelerator. This README covers only the CPU setup for reliability.\n\n---\n\n## Installation\n\n### 1. Clone the repository\n\n```bash\nsudo mkdir -p /opt\nsudo chown \"$USER\" /opt\ncd /opt\ngit clone https://github.com/michabbb/omarchy-speech-to-text.git speech-to-text\ncd speech-to-text\n```\n\nFeel free to adjust the target path, but remember to update the systemd unit files accordingly.\n\n### 2. Configure Python environment\n\n```bash\npython -m venv venv\nsource venv/bin/activate\npip install --upgrade pip wheel\npip install -r requirements.txt\n```\n\nThe default `requirements.txt` installs a CPU version of Faster Whisper (`faster-whisper`, `numpy`, `soundfile`, `evdev`).\n\n### 3. Prepare `config.py`\n\n```bash\ncp config.example.py config.py\n```\n\nEdit `config.py` and review every option:\n\n- `TARGET_USER` – the desktop user that owns the Wayland session (receives typed text).\n- `DEVICE_PATH` – the `/dev/input/event*` device that should trigger recording. Use `sudo evtest` to discover the correct device and key codes.\n- `TRIGGER_KEYCODE` – the key code reported by `evtest` while you press the hotkey (default: `KEY_RIGHTCTRL`).\n- `AUDIO_FILE` – temporary WAV file location (default `/tmp/recorded_audio.wav`).\n- `PYTHON_VENV` \u0026 `SPEECH_TO_TEXT_SCRIPT` – paths to the interpreter and transcription script. 
Defaults assume the project lives in `/opt/speech-to-text`.\n- `WHISPER_MODEL_SIZE` / `WHISPER_COMPUTE_TYPE` – pick another model (e.g. `tiny`, `medium`) or precision if desired.\n- `YDOTOOL_SOCKET` – matches the socket path created by the systemd unit (`/run/user/\u003cuid\u003e/.ydotool_socket`).\n\n### 4. Install systemd units\n\nCopy the service files and adjust them for your UID/GID and project path.\n\n```bash\nsudo install -m 0644 systemd/ydotoold.service /etc/systemd/system/ydotoold.service\nsudo install -m 0644 systemd/speech-to-text-listener.service /etc/systemd/system/speech-to-text-listener.service\n```\n\nEdit `/etc/systemd/system/ydotoold.service`:\n\n- Replace every occurrence of `1000` with your user’s numeric UID and GID (see `id -u`, `id -g`).\n- Update the socket path if you changed it in `config.py`.\n\nEdit `/etc/systemd/system/speech-to-text-listener.service`:\n\n- Update `WorkingDirectory` and `ExecStart` so they match the absolute project path and Python interpreter inside your virtual environment.\n\nReload systemd and enable the services:\n\n```bash\nsudo systemctl daemon-reload\nsudo systemctl enable --now ydotoold.service\nsudo systemctl enable --now speech-to-text-listener.service\n```\n\n### 5. 
Verify services\n\n- Ensure `ydotoold` created the socket:\n  ```bash\n  ls -l /run/user/\u003cuid\u003e/.ydotool_socket\n  ```\n- Monitor logs:\n  ```bash\n  journalctl -u ydotoold.service -b\n  journalctl -u speech-to-text-listener.service -b\n  ```\n\nThe key listener should log that it is watching `KEY_RIGHTCTRL` (or whichever key you configure) and transitions through recording and transcription when you test it.\n\n---\n\n## How It Works\n\n```\n┌──────────────────────────┐\n│ key_listener.py (root)   │\n│  • watches DEVICE_PATH   │\n│  • starts/stops arecord  │\n│  • calls speech_to_text  │\n└────────────┬─────────────┘\n             │ WAV file\n             ▼\n┌──────────────────────────┐\n│ speech_to_text.py (root) │\n│  • loads Faster Whisper  │\n│  • transcribes segments  │\n│  • uses ydotool type     │\n└────────────┬─────────────┘\n             │ text events via ydotool\n             ▼\n      Active application\n```\n\nKey points:\n\n- `key_listener.py` must run as root to read `/dev/input` and to interact with `sudo -u \u003cuser\u003e arecord`. The actual audio capture happens as the unprivileged desktop user, so PulseAudio/PipeWire routing behaves normally.\n- `speech_to_text.py` runs as root but inherits the user’s runtime environment (`XDG_RUNTIME_DIR`, Wayland display) so `ydotool` can access the compositor socket. The service fixes a boot timing race by ensuring the user runtime directory exists before `ydotoold` starts.\n\n---\n\n## Testing Without systemd\n\nYou can run everything manually before enabling the units:\n\n```bash\nsudo ./venv/bin/python key_listener.py\n```\n\nThen hold the configured hotkey. 
You should see logs similar to:\n\n```\nINFO: Starting audio recording\nINFO: Recording started with PID ...\nINFO: Stopping audio recording\nINFO: Running speech-to-text\nINFO: Recognised: ...\nINFO: Typed text successfully\n```\n\nIf typing fails, check that `ydotoold` is running and the socket path matches `config.py`.\n\n---\n\n## Troubleshooting\n\n- **`Error: [Errno 19] No such device`** – `DEVICE_PATH` in `config.py` is wrong or the device id changes between boots. Re-run `sudo evtest` and update the path.\n- **`failed to connect socket '/run/user/1000/.ydotool_socket'`** – `ydotoold` did not start or the runtime directory was re-created after boot. Confirm the service uses the modified unit provided here.\n- **`arecord` command fails** – install `alsa-utils` and confirm the microphone works (`arecord -f S16_LE -r 16000 test.wav`).\n- **Whisper model loads slowly** – larger models can take several seconds. Consider the `tiny` or `base` model for faster start, or configure GPU acceleration.\n- **Typing lag** – `ydotool` sends events sequentially. If performance is an issue, experiment with the `ydotool type --delay` flag by modifying `speech_to_text.py`.\n\n---\n\n## Security Notes\n\n- Both services run as root. Restrict access to the repository directory and review the scripts before installing on production machines.\n- `key_listener.py` invokes `sudo -u \u003cTARGET_USER\u003e arecord ...`. Ensure the root account can run `sudo` without prompting (the default for root).\n- The scripts type whatever Faster Whisper recognises. 
Consider adding keyword filtering if you plan to use it in sensitive contexts.\n\n---\n\n## Extending / Customising\n\n- Change the trigger key by editing the `KEY_RIGHTCTRL` check in `key_listener.py` or remap your preferred key/button to Right Ctrl using `input-remapper` or compositor keybinds.\n- To support multiple hotkeys or languages, extend `speech_to_text.py` to pick models dynamically or to send the text to other applications (e.g., copy to clipboard instead of typing).\n- GPU users can install `torch` + `faster-whisper` with `device=\"cuda\"` in `speech_to_text.py` and adjust `WHISPER_COMPUTE_TYPE` to `float16` for a large speed boost.\n\n---\n\n## License \u0026 Credits\n\nDistributed under the MIT License (see `LICENSE`). The original idea and much of the inspiration comes from [CDNsun’s “Speech-to-Text for Ubuntu” article](https://blog.cdnsun.com/speech-to-text-for-ubuntu/); this repository adapts that work for Arch Linux + Wayland with additional boot-order fixes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichabbb%2Fomarchy-speech-to-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichabbb%2Fomarchy-speech-to-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichabbb%2Fomarchy-speech-to-text/lists"}