{"id":50468178,"url":"https://github.com/git-blame-dev/vox-golem","last_synced_at":"2026-06-01T09:00:35.575Z","repository":{"id":351783685,"uuid":"1212470783","full_name":"git-blame-dev/vox-golem","owner":"git-blame-dev","description":"🎙️Local AI voice assistant for hands-free coding workflows with wake-word capture, local transcription, and configurable coding backends.","archived":false,"fork":false,"pushed_at":"2026-06-01T07:10:32.000Z","size":3564,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T09:00:15.430Z","etag":null,"topics":["bun","eslint","llama-cpp","opencode","react","rust","tauri","typescript","vite","vitest"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/git-blame-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-16T12:13:26.000Z","updated_at":"2026-06-01T06:52:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/git-blame-dev/vox-golem","commit_stats":null,"previous_names":["git-blame-dev/vox-golem"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/git-blame-dev/vox-golem","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/git-blame-dev%2Fvox-golem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/git-blame-dev%2Fvox-golem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/git-blame-dev%2Fvox-golem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/git-blame-dev%2Fvox-golem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/git-blame-dev","download_url":"https://codeload.github.com/git-blame-dev/vox-golem/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/git-blame-dev%2Fvox-golem/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33767437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bun","eslint","llama-cpp","opencode","react","rust","tauri","typescript","vite","vitest"],"created_at":"2026-06-01T09:00:16.426Z","updated_at":"2026-06-01T09:00:35.570Z","avatar_url":"https://github.com/git-blame-dev.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎙️ Vox Golem\n\n[![CI](https://github.com/git-blame-dev/vox-golem/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/git-blame-dev/vox-golem/actions/workflows/ci.yml)\n[![Latest release](https://img.shields.io/github/v/release/git-blame-dev/vox-golem?label=release)](https://github.com/git-blame-dev/vox-golem/releases/latest)\n\nA Windows-only local AI voice assistant for hands-free coding workflows with wake-word capture, local transcription, and configurable coding backends.\n\n![Vox Golem desktop voice assistant showing transcript, response state, and command output](docs/assets/demo.gif)\n\n## 🔎 Overview\n\nVox Golem is a desktop voice assistant for developers who want to drive coding tasks without leaving their editor flow. It combines wake-word listening, silence-based utterance capture, local speech-to-text, and a chat-style command transcript in a Windows Tauri app.\n\nThe app is built around local components and explicit runtime configuration. Backends can be selected through `%APPDATA%\\VoxGolem\\config.toml`, including an `opencode` command path or a local `llama.cpp` server profile.\n\n## ✨ Features\n\n- Wake-word voice capture with automatic stop after silence.\n- Typed prompt fallback in the same chat-style interface.\n- Transcript, response state, and command output displayed in one desktop UI.\n- Configurable response backend with command execution or local `llama.cpp` profiles.\n- Local model/profile switching for fast and quality response modes when configured.\n- Optional local text-to-speech configuration for spoken responses.\n\n## 🛠️ Tech Stack\n\n- **Desktop shell:** Tauri 2 Windows app.\n- **Frontend/tooling:** React 19, TypeScript, Vite, Bun, Vitest, ESLint.\n- **Rust core:** Rust workspace for Tauri commands, audio/model/platform crates, and local process orchestration.\n- **Voice pipeline:** wake-word detection, voice activity detection, local Parakeet transcription, and app-managed capture state.\n- **Local integrations:** `opencode` command execution and `llama.cpp` server profiles.\n- **CI / release:** Linux-hosted GitHub Actions for checks, Windows cross-build artifacts, and release publishing.\n\n## 🧠 Engineering Highlights\n\n- Uses a typed runtime state machine so listening, processing, executing, error, and recovery states are explicit in the UI flow.\n- Keeps local asset paths and backend selection in `%APPDATA%\\VoxGolem\\config.toml` instead of hard-coding user machine paths.\n- Parses structured `opencode` JSON events into labeled assistant/system output for reviewer-friendly command traces.\n- Separates frontend parsing/state tests from Rust runtime checks and Linux-hosted Windows artifact packaging in CI.\n- Packages Windows release artifacts with a config template and verifies expected runtime DLLs before publishing.\n\n## 🏗️ Architecture\n\n```text\nMicrophone / typed prompt\n        |\n        v\nReact 19 + Tauri 2 desktop shell\n        |\n        v\nRust runtime commands\n        |\n        +--\u003e Voice pipeline: wake word -\u003e speech activity -\u003e local transcription\n        |\n        +--\u003e Response backend: opencode or local llama.cpp profile\n        |\n        v\nTranscript, runtime status, and command output in the UI\n```\n\nThe frontend owns interaction state and renders the transcript. Tauri commands bridge UI events to Rust runtime code, which resolves local config, initializes voice components, and routes prompts to the selected backend.\n\nKey directories:\n\n- `frontend/` - React UI, transcript rendering, interaction state, and typed prompt flow.\n- `apps/windows-tauri/` - Tauri desktop shell and UI-to-runtime command bridge.\n- `crates/audio/` - wake-word, capture, and voice activity pipeline boundaries.\n- `crates/model/` - local model and transcription-related runtime code.\n- `crates/core/` - prompt execution, backend routing, and shared runtime behavior.\n- `crates/platform/` - platform-specific runtime integration.\n- `Makefile` - canonical local and CI command surface for checks, Windows cross-builds, and release staging.\n\n## 🚀 Getting Started\n\n### Prerequisites\n\n- Windows runtime environment.\n- Bun for frontend development scripts.\n- Rust stable toolchain for workspace checks and Tauri builds.\n- Linux build tools for Windows cross-builds: `cargo-xwin`, Tauri CLI, CMake, Ninja, LLVM/Clang 19+ tools, `lld`, `curl`, and `unzip`.\n- Local model/runtime assets referenced by `%APPDATA%\\VoxGolem\\config.toml`.\n\n### Configure local assets\n\n1. Create the config directory:\n\n   ```powershell\n   New-Item -ItemType Directory -Force \"$env:APPDATA\\VoxGolem\"\n   ```\n\n2. Copy [`config.example.toml`](config.example.toml):\n\n   ```powershell\n   Copy-Item .\\config.example.toml \"$env:APPDATA\\VoxGolem\\config.toml\"\n   ```\n\n3. Update the paths in `config.toml` for your local wake-word, transcription, VAD, backend, model, and optional TTS assets.\n\n### Install and verify\n\n```bash\nmake test\n```\n\nRun the frontend development shell when you need live UI iteration:\n\n```bash\nmake app-dev\n```\n\nBuild the portable Windows app from Linux:\n\n```bash\nmake pc\n```\n\nStage the user-testable release files locally:\n\n```bash\nmake dist\n```\n\nStaged files are written to `dist/VoxGolem/`, matching the GitHub Actions artifact layout.\n\n## ✅ Testing\n\nLocal checks cover frontend type safety, linting, UI state, startup parsing, runtime control, prompt execution parsing, voice-flow behavior, and production frontend build output.\n\n```bash\nmake test\n```\n\nCI runs `make test` on Linux and builds the Windows artifact with `make pc-dist` on Linux.\n\nCI proves formatting, linting, tests, and Linux-hosted Windows build/package creation; final microphone, audio-device, WebView, GPU/runtime, and model behavior still requires manual validation on a Windows machine.\n\n## 📦 Releases / Artifacts\n\n[GitHub Releases](https://github.com/git-blame-dev/vox-golem/releases) publish `vox-golem-\u003cversion\u003e.zip` and `SHA256SUMS` from successful CI runs on `main` when release-relevant files change.\n\nThe packaged artifact includes the Windows executable, `config.toml` template, CUDA/cuDNN runtime DLLs, and required runtime DLLs verified by the workflow.\n\nLocal staging uses the same layout:\n\n```text\ndist/VoxGolem/config.toml\ndist/VoxGolem/vox-golem.exe\ndist/VoxGolem/*.dll\n```\n\n## ⚠️ Limitations\n\n- Windows-only runtime target; Linux is used to build and package the Windows artifact but is not the supported app runtime.\n- Requires local model/runtime assets that are not bundled in the source tree.\n- Voice quality, latency, and backend behavior depend on the user's configured models and machine.\n- Windows runtime readiness should be validated from the generated Windows artifact, not inferred from frontend-only checks.\n- Voice input and generated outputs should be treated as local user data; avoid committing recordings, model files, or generated artifacts.\n\n## 🧯 Troubleshooting\n\n- **Missing config:** ensure `%APPDATA%\\VoxGolem\\config.toml` exists and is based on [`config.example.toml`](config.example.toml).\n- **Missing model or executable:** check that every configured path points to an existing file, directory, or executable on the Windows machine.\n- **Backend does not respond:** confirm `response_backend` matches the configured `[opencode]` or `[llama_cpp]` table.\n- **Unsure which release files to use:** download the latest GitHub Release zip and verify it against `SHA256SUMS`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgit-blame-dev%2Fvox-golem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgit-blame-dev%2Fvox-golem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgit-blame-dev%2Fvox-golem/lists"}