{"id":49553795,"url":"https://github.com/mka-codelake/wispy","last_synced_at":"2026-05-06T21:01:34.074Z","repository":{"id":354999258,"uuid":"1207914294","full_name":"mka-codelake/wispy","owner":"mka-codelake","description":"Minimalist push-to-talk dictation tool for Windows. Faster Whisper, local, offline.","archived":false,"fork":false,"pushed_at":"2026-05-01T12:00:26.000Z","size":74,"stargazers_count":0,"open_issues_count":10,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-01T12:07:52.497Z","etag":null,"topics":["cuda","dictation","faster-whisper","local","offline","portable","push-to-talk","python","speech-to-text","stt","transcription","voice-input","whisper","windows"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mka-codelake.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-11T15:17:53.000Z","updated_at":"2026-05-01T12:00:30.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mka-codelake/wispy","commit_stats":null,"previous_names":["mka-codelake/wispy"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/mka-codelake/wispy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mka-codelake%2Fwispy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mka-codelake%2Fwis
py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mka-codelake%2Fwispy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mka-codelake%2Fwispy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mka-codelake","download_url":"https://codeload.github.com/mka-codelake/wispy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mka-codelake%2Fwispy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32711965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T19:35:05.142Z","status":"ssl_error","status_checked_at":"2026-05-06T19:35:03.996Z","response_time":117,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","dictation","faster-whisper","local","offline","portable","push-to-talk","python","speech-to-text","stt","transcription","voice-input","whisper","windows"],"created_at":"2026-05-03T01:03:48.530Z","updated_at":"2026-05-06T21:01:34.068Z","avatar_url":"https://github.com/mka-codelake.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"./etc/logo.svg\" width=\"400\" align=\"right\" alt=\"wispy\"/\u003e\n\n# wispy\n\n[![License: GPL 
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](LICENSE)\n[![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue)](https://www.python.org/downloads/)\n\n\u003e [!NOTE]\n\u003e **wispy** is in Beta. Configuration format and command-line options may change between minor versions.\n\nMinimalist push-to-talk dictation tool for Windows. Press a hotkey, speak, release -- the text appears wherever your cursor is (Notepad, browser, VS Code, anywhere). Fully local, no cloud, no subscription.\n\n- **Backend:** [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2/CUDA), model `large-v3-turbo`\n- **Language:** German (configurable via config)\n- **Footprint:** ~250 LOC, 6 direct dependencies, no GUI\n\n## Overview\n\nwispy solves a specific problem: speech input without cloud dependency, without privacy concerns, and without latency from network round-trips. If you dictate a lot and have an NVIDIA GPU, wispy provides an offline solution that transcribes faster and more accurately than most online services -- and not a single syllable ever leaves your local machine.\n\nwispy is designed as a personal productivity tool. There is no GUI, no tray app, no cloud integration. 
It runs as a console process in the background and waits for a hotkey.\n\n**Who is it for?** Windows users with an NVIDIA GPU who want to dictate offline and without subscription costs -- in any application that accepts keyboard input.\n\n## Features\n\n- **Push-to-Talk or Toggle** -- Start and stop recording via hotkey, freely configurable (`hold` or `toggle` mode)\n- **Fully offline** -- Transcription runs entirely locally via `faster-whisper` (CTranslate2/CUDA), no network access after the initial model download\n- **Works everywhere** -- Text output via clipboard paste (Ctrl+V simulation), compatible with any Windows application including umlauts and special characters\n- **Clipboard protection** -- Previous clipboard content is automatically restored after pasting\n- **Multilingual** -- Language configurable via ISO code in `config.yaml` (`de`, `en`, `fr`, ...)\n- **Audio feedback** -- Beep tones signal recording start (800 Hz) and end (400 Hz) without screen distraction\n- **Portable build** -- PyInstaller bundle (`build/build.ps1`) produces a self-contained `dist/wispy/` directory including CUDA DLLs; no Python required on the target machine\n- **Flexible model path** -- Model is stored by default next to the source code in `models/`; freely configurable via `model_path` in `config.yaml`\n\n---\n\n## Requirements\n\n| | |\n|---|---|\n| **Operating System** | Windows 10/11 **native** -- not WSL2 (due to microphone, hotkey, and keyboard simulation requirements) |\n| **Python** | 3.10, 3.11, or 3.12 |\n| **GPU** | NVIDIA GPU with ~3 GB free VRAM (for `large-v3-turbo` + `float16`) |\n| **CUDA Toolkit** | **Version 12.x -- NOT 13.x.** `faster-whisper` uses `CTranslate2`, which currently only supports CUDA 12 (with cuDNN 9 -\u003e CUDA \u003e= 12.3). Recommended: **CUDA 12.9.1** (latest 12.x series, June 2025) or 12.6/12.8. Install manually -- it includes `cudart`, `cuBLAS`, and `cuDNN`, which `faster-whisper` needs at runtime. 
**Direct download (Windows x86_64):** [cuda_12.9.1_576.57_windows.exe](https://developer.download.nvidia.com/compute/cuda/12.9.1/local_installers/cuda_12.9.1_576.57_windows.exe) (~3.56 GB) or the archive page [developer.nvidia.com/cuda-12-9-1-download-archive](https://developer.nvidia.com/cuda-12-9-1-download-archive). All 12.x versions are listed in the [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive). |\n| **Admin rights** | Not required. The hotkey works in all normal apps (browsers, editors, Office, terminals). Only elevated foreground windows like Task Manager or `regedit` are not observable without admin -- start wispy as Administrator manually if you need that. |\n| **Microphone** | Check privacy settings: *Settings -\u003e Privacy -\u003e Microphone -\u003e Allow desktop apps* |\n| **Disk space** | ~4 GB (model ~1.6 GB + venv + dependencies) |\n\n---\n\n## Setup\n\n```powershell\n# 1. Clone the repo and change into it\ngit clone https://github.com/mka-codelake/wispy.git C:\\path\\to\\wispy\ncd C:\\path\\to\\wispy\n\n# 2. Create and activate a venv\npython -m venv .venv\n.\\.venv\\Scripts\\activate\n\n# 3. Install wispy as an editable package (pulls all dependencies from pyproject.toml)\npip install -e .\n\n# 4. First run (downloads the ~1.6 GB model on first launch)\npython -m wispy\n```\n\n\u003e On the **first** launch, `src/wispy/model_fetch.py` downloads the `large-v3-turbo` model (~1.6 GB) via `huggingface_hub.snapshot_download` directly into `\u003crepo-root\u003e\\models\\large-v3-turbo\\`. No HuggingFace cache is created in your user profile -- the model sits next to the source code and moves with it if you relocate the folder. The target path is determined by `src/wispy/paths.py::resolve_model_path`; you can set a custom directory via `model_path` in `config.yaml`.\n\n---\n\n## Portable Build (optional)\n\nIf you don't want a Python installation on the target machine, you can build wispy as a portable one-folder bundle. 
The script `build/build.ps1` invokes PyInstaller with `build/wispy.spec` and produces `dist/wispy/` with `wispy.exe` plus an `_internal/` directory.\n\n```powershell\n# In the repo root, in a PowerShell:\n.\\build\\build.ps1\n```\n\nThe resulting `dist/wispy/` folder is self-contained:\n\n- **No CUDA Toolkit** required on the target machine -- the bundle includes `cudart64_12.dll`, `cublas64_12.dll`, and `cudnn_*.dll` in `_internal/`. Only a current NVIDIA driver is needed (for `nvcuda.dll` and the kernel module, which must come from the system).\n- **No installer.** Copy the folder, run `wispy.exe`, done.\n- **Portable.** The folder can be moved to a USB drive or another machine; the already downloaded model travels along in `models/`.\n\nEnd-user documentation for the bundle is located in `build/README.txt` and is copied by PyInstaller to `dist/wispy/README.txt`.\n\n---\n\n## Controls\n\n**Default mode: Hold (Push-to-Talk)**\n\n| Action | Key press |\n|---|---|\n| Start recording | **Hold F9** -\u003e Beep 800 Hz |\n| Stop recording + transcribe | Release F9 -\u003e Beep 400 Hz -\u003e Text is inserted at cursor |\n| Recording discarded | Released in \u003c 0.3 s -\u003e `(too short, skipped)` in console |\n| Quit wispy | **Ctrl+C** in the console window |\n\n**Toggle mode** (set in `config.yaml`: `record_mode: toggle`)\n\n| Action | Key press |\n|---|---|\n| Start recording | Press F9 once |\n| Stop recording + transcribe | Press F9 again |\n\nText is inserted via clipboard + simulated Ctrl+V -- works in any application, including umlauts and special characters. The previous clipboard content is restored after insertion (can be disabled via `restore_clipboard: false`).\n\n---\n\n## Configuration\n\nAll settings are in `config.yaml`. 
The most important ones:\n\n```yaml\nhotkey: \"F9\"              # Any key -- \"F9\", \"F12\", \"ctrl+space\", ...\nrecord_mode: \"hold\"       # \"hold\" or \"toggle\"\nlanguage: \"de\"            # ISO code -- \"de\", \"en\", \"fr\", ...\nmodel_name: \"large-v3-turbo\"   # Also: \"small\", \"medium\", \"large-v3\"\ndevice: \"cuda\"            # \"cuda\" or \"cpu\"\ncompute_type: \"float16\"   # \"float16\" (GPU) / \"int8\" (CPU)\naudio_device: null        # null = default microphone, otherwise index\nrestore_clipboard: true   # Restore old clipboard content after insertion\n```\n\nLoad a custom config:\n\n```powershell\npython -m wispy --config C:\\path\\to\\my-config.yaml\n```\n\n---\n\n## Vocabulary (Hotwords)\n\nWhisper sometimes mis-transcribes technical terms, file names, or proper names (e.g. `wispy` → `Whispy`, `.gitignore` → `Gitignore`). The vocabulary file lets you bias the model towards recognising specific terms correctly.\n\n**Location:** `hotwords.txt` next to `config.yaml` (same folder as `wispy.exe` or the repo root when running from source).\n\n**Format:** plain text, one term per line. Lines starting with `#` and blank lines are ignored.\n\n```text\n# wispy vocabulary\nwispy\n.gitignore\npyproject.toml\nMyCompanyName\n```\n\n**How it works:** Terms are passed to `faster-whisper`'s `hotwords` parameter on every transcription call. This is a soft bias — it makes the model *prefer* these spellings but does not guarantee them. For hard replacements, a post-processing step is planned separately.\n\n**Hot-reload:** Not supported. 
Restart wispy after editing `hotwords.txt`.\n\n**Startup feedback:** wispy prints the number of loaded terms at startup:\n```\n[wispy] vocabulary  = 3 term(s) loaded\n```\n\n---\n\n## Project Structure\n\n```\nwispy/\n├── src/wispy/\n│   ├── __init__.py       # Package marker, __version__\n│   ├── __main__.py       # Entry point for `python -m wispy`\n│   ├── main.py           # Main loop, orchestration\n│   ├── audio.py          # Microphone recording (sounddevice/PortAudio)\n│   ├── transcribe.py     # Whisper model loading and transcription\n│   ├── hotkey.py         # Global hotkey listener (pynput, hold + toggle)\n│   ├── hotkey_match.py   # Pure hotkey parser + match state machine\n│   ├── output.py         # Text output via clipboard paste\n│   ├── feedback.py       # Beep sounds (winsound)\n│   ├── config.py         # Config dataclass + YAML loader\n│   ├── paths.py          # Model path resolution (src-aware + frozen)\n│   └── model_fetch.py    # First-run download via HuggingFace Hub\n├── build/\n│   ├── build.ps1         # Portable build script (uv + PyInstaller)\n│   ├── wispy.spec        # PyInstaller spec\n│   └── README.txt        # End-user documentation for the bundle\n├── etc/\n│   └── logo.svg          # Project logo\n├── config.yaml           # Default configuration\n├── hotwords.txt          # Vocabulary list for transcription biasing (hotwords)\n└── pyproject.toml        # Package metadata and dependencies\n```\n\nInternal imports use relative imports (`from .audio import Recorder`). Exception: `__main__.py` uses an absolute import so that PyInstaller can correctly load the entry script as top-level.\n\n---\n\n## Auto-Update\n\nwispy checks for updates in the background on every start and notifies you if a newer release is available. It never downloads anything without your explicit consent.\n\n### How the update flow works\n\n1. **Version check (automatic):** At every start, wispy queries the GitHub release API in a background thread. 
Dictation is immediately ready — the check does not block startup. If a newer version is available, a message appears in the console:\n   ```\n   [update] Update available: v0.2.0 -\u003e v0.3.0\n   [update] To download, start wispy again with --update\n   ```\n\n2. **Download (explicit, with `--update`):** When you want to fetch the new version, start wispy once with `--update`:\n   ```powershell\n   wispy.exe --update\n   ```\n   The release ZIP is downloaded to `update-staging/` next to `wispy.exe`. Dictation works normally for the rest of that session.\n\n3. **Apply on next normal start (automatic):** On the next regular start (without `--update`), wispy detects the staged ZIP, unpacks it, and launches a PowerShell helper script that performs the swap while wispy is not running. The new version then starts automatically.\n\n### Protected files — never touched during an update\n\nThe following files and folders are always excluded from the swap:\n\n| Path | What it contains |\n|---|---|\n| `config.yaml` | Your configuration |\n| `models/` | Downloaded Whisper model (~1.6 GB) |\n| `hotwords.txt` | Your vocabulary list |\n\n### Disable update check\n\nSet `update_check: false` in `config.yaml`:\n\n```yaml\nupdate_check: false\n```\n\nWhen disabled, wispy performs no background check at startup, no staging, no swap, and `--update` has no effect (displays a message instead).\n\n### Authentication (optional)\n\nIf the repository is private or you hit GitHub's anonymous rate limit, set the `GITHUB_TOKEN` environment variable. wispy uses it automatically as a Bearer token for all API and download requests.\n\n---\n\n## Contributing\n\nContributions are welcome. 
Please read [CONTRIBUTING.md](CONTRIBUTING.md) first for guidelines on branching, commit conventions, and the pull request process.\n\n---\n\n## Troubleshooting\n\n| Symptom | Cause | Solution |\n|---|---|---|\n| `Could not load library cudnn_*.dll` / `cublas64_*.dll` | CUDA Toolkit is missing, is **version 13.x** (incompatible with CTranslate2), or is not in PATH | Install CUDA Toolkit **12.x** (recommended: 12.9.1), then restart the console |\n| Hotkey does not respond in Task Manager / regedit / other elevated apps | UIPI blocks low-level hooks from observing input directed at elevated windows | Start wispy as Administrator manually if you need to dictate into elevated apps. Normal apps (browsers, editors, Office) work without admin. |\n| `Failed to query device 0` / no audio | No microphone detected or permission missing | Check Windows privacy settings, try a different `audio_device` in config |\n| `(too short, skipped)` on every press | Hotkey held too briefly (\u003c 0.3 s) | Hold longer or reduce `MIN_DURATION_SEC` in `src/wispy/main.py` |\n| First transcription takes very long | Model is being downloaded (~1.6 GB) | One-time process, cached afterwards |\n| Transcription wrong or empty | Wrong language, poor microphone signal, speaking too quietly | Check `language` in config, move closer to the microphone |\n\n---\n\n## License\n\nCopyright 2026 Michael Kagel\n\nwispy is free software and licensed under the **GNU General Public License v3.0 or (at your option) any later version**. 
See [LICENSE](LICENSE) for the full license text.\n\nwispy is distributed in the hope that it will be useful, but **WITHOUT ANY WARRANTY**; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\nContributions are also licensed under GPL v3 -- details in [CONTRIBUTING.md](CONTRIBUTING.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmka-codelake%2Fwispy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmka-codelake%2Fwispy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmka-codelake%2Fwispy/lists"}