{"id":50573203,"url":"https://github.com/rtfirst/voice-to-text","last_synced_at":"2026-06-04T20:01:02.087Z","repository":{"id":349489324,"uuid":"1202519511","full_name":"rtfirst/voice-to-text","owner":"rtfirst","description":"Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.","archived":false,"fork":false,"pushed_at":"2026-04-06T06:28:49.000Z","size":40,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-06T08:42:44.662Z","etag":null,"topics":["cuda","macos","push-to-talk","python","speech-to-text","voice-input","whisper","windows"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rtfirst.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-06T05:28:47.000Z","updated_at":"2026-04-06T06:28:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/rtfirst/voice-to-text","commit_stats":null,"previous_names":["rtfirst/voice-to-text"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/rtfirst/voice-to-text","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rtfirst%2Fvoice-to-text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rtfirst%2Fvoice-to-text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rtfirst%2Fvoice-to-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rtfirst%2Fvoice-to-text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rtfirst","download_url":"https://codeload.github.com/rtfirst/voice-to-text/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rtfirst%2Fvoice-to-text/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33917184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-04T02:00:06.755Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","macos","push-to-talk","python","speech-to-text","voice-input","whisper","windows"],"created_at":"2026-06-04T20:01:01.163Z","updated_at":"2026-06-04T20:01:02.080Z","avatar_url":"https://github.com/rtfirst.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Voice-to-Text\n\nA cross-platform background application for speech input via Push-to-Talk. Transcribes locally using OpenAI Whisper (GPU-accelerated) and optionally corrects text via the Anthropic API (Haiku). 
The transcribed text is automatically pasted into the active application — editor, browser, or terminal.\n\nSupports **Windows 11** and **macOS**.\n\n## Features\n\n- **Push-to-Talk** — hold a configurable hotkey to record, release to transcribe and paste\n- **Local transcription** — runs Whisper on your own machine (GPU-accelerated where available); speech-to-text needs no cloud service\n- **Optional AI correction** — fixes grammar and punctuation via the Anthropic API\n- **Live VU meter** — pill-shaped overlay at the bottom of the screen reacts to your voice\n- **Smart paste** — auto-detects window type and uses the appropriate paste method\n- **Configurable via tray menu** — hotkey, model size, language, auto-correction\n- **Persistent settings** — saved to `settings.json`, survives restarts\n- **Cross-platform** — Windows (CUDA) and macOS (MPS / CPU)\n\n## Requirements\n\n- Python 3.13+\n- [OpenAI Whisper](https://github.com/openai/whisper) with PyTorch\n\n### Windows\n- NVIDIA GPU with CUDA support (tested with RTX 3070, 8 GB VRAM)\n- PyTorch with CUDA\n- Packages: `pywin32`\n\n### macOS\n- Apple Silicon (MPS acceleration) or Intel (CPU fallback)\n- PyTorch with MPS support (macOS 12.3+)\n- Accessibility permissions required (System Settings → Privacy \u0026 Security → Accessibility)\n\n### Both platforms\n- `openai-whisper`, `anthropic`, `pystray`, `Pillow`, `numpy`, `sounddevice`\n\n## Installation\n\n```bash\npip install sounddevice\n```\n\nThis assumes the remaining dependencies listed under Requirements (PyTorch, Whisper, the Anthropic SDK, etc.) are already installed.\n\n### macOS additional setup\n\nGrant Accessibility permissions to your terminal app or Python interpreter so the app can detect the global hotkey and simulate keystrokes:\n\n**System Settings → Privacy \u0026 Security → Accessibility** → add Terminal / iTerm2 / Python\n\n### API Key\n\nFor optional text correction, an Anthropic API key is required. 
Create a `.env` file in the project root:\n\n```\nANTHROPIC_API_KEY=sk-ant-...\n```\n\n## Usage\n\n```bash\n# With console output (for debugging)\npython main.py\n\n# Without console window (Windows only)\npythonw main.py\n```\n\n| Action | Description |\n|--------|-------------|\n| **Hold hotkey** | Start recording (VU meter reacts live) |\n| **Release hotkey** | Stop recording, transcribe, paste into active app |\n| **Right-click tray icon** | Settings and quit |\n\n### VU Meter Overlay\n\n| State | Display |\n|-------|---------|\n| Idle | Dark segments, semi-transparent |\n| Recording | Live level: green → yellow → red |\n| Transcribing | All segments yellow |\n| Done | All segments green (briefly) |\n\n## Settings (Tray Menu)\n\n- **Hotkey** — Ctrl+Space, Alt+Space, Ctrl+F9, F13, and more\n- **Model** — Whisper model size: tiny, small, medium, large-v3, turbo\n- **Language** — auto, Deutsch (German), English\n- **Auto-Correction** — toggles text correction via the Anthropic API\n\nSettings are saved to `settings.json` and persist across restarts.\n\n## GPU Acceleration\n\nThe application auto-detects the best available compute device:\n\n| Platform | Device | Notes |\n|----------|--------|-------|\n| Windows + NVIDIA | CUDA | Best performance |\n| macOS Apple Silicon | MPS | Good performance on M1/M2/M3 |\n| Any | CPU | Fallback, slower |\n\n## Autostart\n\n```bash\n# Enable\npython setup_autostart.py\n\n# Disable\npython setup_autostart.py --disable\n```\n\n- **Windows**: adds a registry entry under `HKCU\\...\\Run`\n- **macOS**: creates a LaunchAgent plist in `~/Library/LaunchAgents/`\n\n## Project Structure\n\n```\nvoice-to-text/\n  main.py                                Entry point\n  setup_autostart.py                     Autostart management (Windows + macOS)\n  requirements.txt\n  src/\n    voice_to_text/\n      __init__.py\n      __main__.py                        python -m voice_to_text\n      app.py                             Main orchestration\n      audio.py                           Microphone recording (sounddevice)\n      config.py                          Configuration + persistent settings\n      hotkey.py                          Hotkey dispatcher\n      overlay.py                         VU meter overlay (tkinter)\n      paste.py                           Paste dispatcher\n      transcriber.py                     Whisper + Anthropic API\n      tray.py                            System tray menu (pystray)\n      platform/\n        __init__.py                      Platform detection\n        hotkey_win.py                    Windows: GetAsyncKeyState polling\n        hotkey_mac.py                    macOS: Quartz event tap\n        paste_win.py                     Windows: win32 clipboard + SendInput\n        paste_mac.py                     macOS: pbcopy + osascript\n        overlay_win.py                   Windows: Win32 layered window\n        overlay_mac.py                   macOS: tkinter alpha\n```\n\n## Paste Detection (Windows)\n\nOn Windows, the tool detects the foreground window type:\n\n- **Standard apps** (editor, browser, IDE) → Ctrl+V\n- **Windows Terminal** → Ctrl+Shift+V\n- **mintty / Git Bash** → Shift+Insert\n- **Legacy cmd.exe** → Shift+Insert\n\nOn macOS, Cmd+V is used universally.\n\n## License\n\n[MIT](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frtfirst%2Fvoice-to-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frtfirst%2Fvoice-to-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frtfirst%2Fvoice-to-text/lists"}
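The README's API Key section asks for a `.env` file containing `ANTHROPIC_API_KEY`, but it does not say how that file is read. The following is a minimal standard-library sketch of one possible approach; the project may instead use python-dotenv or another mechanism. The helper names `parse_env` and `load_env` are hypothetical. The sketch skips blank lines and `#` comments, strips one pair of matching quotes, and by default does not overwrite variables that are already set in the environment.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env-style text (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key:
            values[key] = value
    return values


def load_env(path: Path = Path(".env"), override: bool = False) -> None:
    """Copy .env values into os.environ; existing variables win unless override=True."""
    if not path.is_file():
        return
    for key, value in parse_env(path.read_text(encoding="utf-8")).items():
        if override or key not in os.environ:
            os.environ[key] = value
```

Calling `load_env()` at startup, before creating `anthropic.Anthropic()`, should be enough. The Anthropic SDK client reads `ANTHROPIC_API_KEY` from the environment when no `api_key` argument is passed.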
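The Usage table describes push-to-talk as "hold to record, release to transcribe", and the project structure lists `GetAsyncKeyState` polling for the Windows hotkey backend. A polling loop only learns whether the key is down at each sample, so it has to convert those samples into press and release events. Below is a minimal edge-detector sketch. The `PushToTalk` class and its `"start"`/`"stop"` event names are hypothetical and do not come from the project.

```python
from typing import Optional


class PushToTalk:
    """Turn polled key states into 'start'/'stop' events (hypothetical helper)."""

    def __init__(self) -> None:
        self._down = False  # key state observed at the previous poll

    def update(self, is_down: bool) -> Optional[str]:
        """Feed one poll sample; return 'start' on press, 'stop' on release, else None."""
        event: Optional[str] = None
        if is_down and not self._down:
            event = "start"  # rising edge: begin recording
        elif not is_down and self._down:
            event = "stop"   # falling edge: stop, transcribe, paste
        self._down = is_down
        return event
```

On Windows, a loop could sample `bool(win32api.GetAsyncKeyState(vk) & 0x8000)` (pywin32) every few tens of milliseconds and pass the result to `update`. The virtual-key code and the polling interval are assumptions here. Reacting only to edges means that holding the key produces exactly one `"start"`, however long the hold lasts.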
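The VU Meter Overlay table maps four states to segment colors, with the Recording state showing a live green → yellow → red level. The README gives neither the level scale nor the color thresholds. The sketch below therefore assumes a 12-segment bar, a -60 dBFS floor, and color bands that switch at 60% and 85% of the bar; all of these numbers and every function name are hypothetical. It computes an RMS level in dBFS from float samples in [-1, 1], which is what sounddevice delivers with its default `float32` dtype.

```python
import math
from typing import Sequence

SEGMENTS = 12        # hypothetical number of segments in the pill
FLOOR_DB = -60.0     # hypothetical level treated as silence
GREEN_UNTIL = 0.60   # segments below 60% of the bar are green
YELLOW_UNTIL = 0.85  # 60-85% yellow, the rest red


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of float samples in [-1, 1], in dBFS, clamped to FLOOR_DB."""
    if len(samples) == 0:
        return FLOOR_DB
    mean_sq = sum(float(s) * float(s) for s in samples) / len(samples)
    if mean_sq <= 0.0:
        return FLOOR_DB
    return max(FLOOR_DB, 10.0 * math.log10(mean_sq))


def lit_segments(db: float, segments: int = SEGMENTS) -> int:
    """Map a dBFS level linearly onto 0..segments lit segments."""
    frac = (db - FLOOR_DB) / -FLOOR_DB
    return max(0, min(segments, round(frac * segments)))


def segment_colors(state: str, n_lit: int = 0, segments: int = SEGMENTS) -> list[str]:
    """Per-segment colors for the four overlay states in the README table."""
    if state == "idle":
        return ["dark"] * segments
    if state == "transcribing":
        return ["yellow"] * segments
    if state == "done":
        return ["green"] * segments
    if state != "recording":
        raise ValueError(f"unknown state: {state!r}")
    colors = []
    for i in range(segments):
        pos = i / segments  # position of this segment along the bar
        if i >= n_lit:
            colors.append("dark")
        elif pos < GREEN_UNTIL:
            colors.append("green")
        elif pos < YELLOW_UNTIL:
            colors.append("yellow")
        else:
            colors.append("red")
    return colors
```

A sounddevice `InputStream` callback receives a `(frames, channels)` array. Passing its first channel (`indata[:, 0]`) to `rms_dbfs` would give a mono level for each audio block.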
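The Settings section says that tray-menu choices are saved to `settings.json` and persist across restarts. Below is a sketch of one robust way to do that, assuming a flat JSON object. Stored values are merged over built-in defaults, so a missing or corrupt file, or an option added in a newer version, still yields a complete configuration. Writes go to a temporary file that is then moved into place with `os.replace`, so a crash mid-write cannot leave a truncated file. The key names and default values in `DEFAULTS` are hypothetical and are not the project's actual schema.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical keys/defaults mirroring the tray menu options.
DEFAULTS = {
    "hotkey": "ctrl+space",
    "model": "turbo",
    "language": "auto",
    "auto_correction": False,
}


def load_settings(path: Path) -> dict:
    """Return DEFAULTS overlaid with any known keys found in the JSON file."""
    settings = dict(DEFAULTS)
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return settings  # first run or unreadable/corrupt file: use defaults
    if isinstance(stored, dict):
        settings.update({k: v for k, v in stored.items() if k in DEFAULTS})
    return settings


def save_settings(path: Path, settings: dict) -> None:
    """Write settings atomically: temp file in the same directory, then os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(settings, f, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # don't leave stray temp files behind
        raise
```

Ignoring unknown keys on load keeps stale entries from older versions out of the running configuration. The temporary file is created in the same directory as the target so that `os.replace` does not have to move it across filesystems.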
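The GPU Acceleration table gives a device preference order: CUDA first, then MPS, then CPU. The sketch below implements that order using PyTorch's standard availability checks, `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. The function names are hypothetical. The ordering logic lives in a pure `pick_device` function so that it can be checked without a GPU.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Preference order from the README table: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)  # absent in old PyTorch builds
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

The result could then be passed to Whisper, for example `whisper.load_model("turbo", device=detect_device())`. `load_model` accepts a `device` argument, but the README does not show how the project itself calls it.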
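According to the Autostart section, `setup_autostart.py` creates a LaunchAgent plist in `~/Library/LaunchAgents/` on macOS. The sketch below shows how such a file can be generated with the standard-library `plistlib`. `Label`, `ProgramArguments`, and `RunAtLoad` are standard launchd keys. The label `com.example.voice-to-text` and the helper names are hypothetical, because the README does not show the project's actual plist. launchd loads per-user agents from this directory at login.

```python
import plistlib
import sys
from pathlib import Path

LABEL = "com.example.voice-to-text"  # hypothetical reverse-DNS label
AGENTS_DIR = Path.home() / "Library" / "LaunchAgents"


def launch_agent_plist(script: Path, python: str = sys.executable) -> bytes:
    """Serialize a minimal LaunchAgent that runs `python script`."""
    return plistlib.dumps({
        "Label": LABEL,
        "ProgramArguments": [python, str(script)],
        "RunAtLoad": True,  # start as soon as launchd loads the agent (at login)
    })


def enable_autostart(script: Path, agents_dir: Path = AGENTS_DIR) -> Path:
    """Write the plist into the LaunchAgents directory and return its path."""
    agents_dir.mkdir(parents=True, exist_ok=True)
    target = agents_dir / f"{LABEL}.plist"
    target.write_bytes(launch_agent_plist(script))
    return target


def disable_autostart(agents_dir: Path = AGENTS_DIR) -> None:
    """Remove the plist if it exists."""
    (agents_dir / f"{LABEL}.plist").unlink(missing_ok=True)
```

In this sketch, `enable_autostart(Path("main.py").resolve())` would correspond to the README's enable step. Using `sys.executable` pins the agent to the interpreter that ran the setup script, which matters when Whisper and PyTorch live in a virtual environment.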
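The Paste Detection section maps the type of the foreground window to a paste shortcut. The sketch below implements that dispatch as a lookup table. The window class names are assumptions based on the classes these programs commonly register: `CASCADIA_HOSTING_WINDOW_CLASS` for Windows Terminal, `mintty` for mintty/Git Bash, and `ConsoleWindowClass` for the legacy console host behind cmd.exe. The function and key names are hypothetical, and the project's `paste_win.py` may identify windows differently.

```python
import sys
from typing import Optional

# Assumed Win32 window class names -> paste shortcut, per the README's list.
WINDOWS_PASTE_KEYS = {
    "CASCADIA_HOSTING_WINDOW_CLASS": ("ctrl", "shift", "v"),  # Windows Terminal
    "mintty": ("shift", "insert"),                            # mintty / Git Bash
    "ConsoleWindowClass": ("shift", "insert"),                # legacy cmd.exe console
}
DEFAULT_KEYS = ("ctrl", "v")  # standard apps: editors, browsers, IDEs


def paste_keys(window_class: Optional[str], platform: str = sys.platform) -> tuple[str, ...]:
    """Return the key combination to send once the text is on the clipboard."""
    if platform == "darwin":
        return ("cmd", "v")  # macOS: Cmd+V everywhere
    return WINDOWS_PASTE_KEYS.get(window_class or "", DEFAULT_KEYS)
```

On Windows, the class name could be read with pywin32 as `win32gui.GetClassName(win32gui.GetForegroundWindow())`. Any class missing from the table falls back to Ctrl+V, which matches the README's "standard apps" default.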