https://github.com/hackjutsu/vibe-speech
Local near real-time voice AI assistant
ai aiassistant chatbot llm ollama python whisper
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/hackjutsu/vibe-speech
- Owner: hackjutsu
- Created: 2025-11-27T04:26:09.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-12-08T00:27:11.000Z (6 months ago)
- Last Synced: 2025-12-08T12:01:17.742Z (6 months ago)
- Topics: ai, aiassistant, chatbot, llm, ollama, python, whisper
- Language: Python
- Homepage:
- Size: 64.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Vibe Speech (WIP)
Cross-platform, local-first voice helper that listens to the mic, runs Whisper locally, and feeds the transcript to a local LLM assistant. The assistant replies with your configured personality and the response is spoken aloud via `xsst2`. Designed for macOS and Windows, with a fallback automation layer that works anywhere `pyautogui` does.
## Current status
- Push-to-talk hotkey with tail padding; buffers while held, then transcribes once and routes the transcript to an LLM assistant.
- Faster-Whisper backend (configurable model; defaults to `large-v3-turbo` unless you change it). Beam size, language, and an optional initial prompt are configurable.
- Optional rewriter (Ollama or llama.cpp) for grammar polish before the assistant sees the text.
- Assistant replies are generated by a local/Ollama LLM with a customizable personality and spoken through `xsst2`.
- Assistant prompts can include a configurable window of recent conversation history (`assistant.history_length`).
- Spinner/colored timing logs so you can see when transcription/rewriting/assistant work is running.
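As a rough illustration of the Whisper settings listed above, the sketch below shows how a model name, beam size, language, and initial prompt might be handed to faster-whisper. `WhisperSettings`, `transcribe_options`, and `transcribe_file` are hypothetical names and the default beam size of 5 is an assumption; none of this is taken from the project's code. Only `WhisperModel(...)` and the `transcribe(...)` keyword arguments come from the public faster-whisper API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WhisperSettings:
    # Hypothetical settings holder; field names are illustrative,
    # not the project's actual config schema.
    model_size: str = "large-v3-turbo"
    beam_size: int = 5                    # assumed default for this sketch
    language: Optional[str] = None        # None lets Whisper auto-detect
    initial_prompt: Optional[str] = None


def transcribe_options(settings: WhisperSettings) -> dict:
    """Build keyword arguments for WhisperModel.transcribe, skipping unset values."""
    options = {"beam_size": settings.beam_size}
    if settings.language:
        options["language"] = settings.language
    if settings.initial_prompt:
        options["initial_prompt"] = settings.initial_prompt
    return options


def transcribe_file(path: str, settings: WhisperSettings) -> str:
    """Transcribe one audio file and return the joined segment text."""
    # Imported lazily so the helpers above work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(settings.model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, **transcribe_options(settings))
    # `segments` is a generator; iterating it is what runs the transcription.
    return " ".join(segment.text.strip() for segment in segments)
```

Keeping the option builder separate from the model call makes it easy to check which settings would be sent without loading a model.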
## Architecture (plain text)
```
Mic -> AudioCapture (chunk/tail) -> WhisperEngine (transcribe)
-> Processor (cleanup/correct + optional Rewriter)
-> Assistant (LLM; uses conversation history window)
-> SpeechSynthesizer (TTS) + Automation (typing)
```
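The flow in the diagram can be sketched as a chain of plain callables. This is only an illustration of the data flow: `Pipeline` and its field names are hypothetical stand-ins for the components named above, the automation (typing) step is left out, and the real classes may be wired differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    # Each stage is a plain callable so the sketch stays backend-agnostic.
    transcribe: Callable[[bytes], str]              # WhisperEngine stand-in
    process: Callable[[str], str]                   # Processor stand-in (cleanup/correction)
    assist: Callable[[str], str]                    # Assistant stand-in (LLM call)
    speak: Callable[[str], None]                    # SpeechSynthesizer stand-in
    rewrite: Optional[Callable[[str], str]] = None  # optional Rewriter stand-in

    def run(self, audio: bytes) -> str:
        """Push one buffered utterance through every stage and return the reply."""
        text = self.process(self.transcribe(audio))
        if self.rewrite is not None:
            text = self.rewrite(text)
        reply = self.assist(text)
        self.speak(reply)
        return reply


if __name__ == "__main__":
    spoken = []
    demo = Pipeline(
        transcribe=lambda audio: " hello there ",
        process=str.strip,
        assist=lambda text: f"You said: {text}",
        speak=spoken.append,
    )
    print(demo.run(b"\x00\x00"))
```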
## Project layout
- `src/vibe_speech/cli.py` – entrypoint (`vibe-speech` script) with `serve` and `doctor`.
- `src/vibe_speech/config.py` – config models and loader.
- `src/vibe_speech/runtime.py` – audio capture, hotkey handling, buffering, transcription, output, and logging.
- `src/vibe_speech/whisper_engine.py` – Whisper wrapper (faster-whisper).
- `src/vibe_speech/automation.py` – text output automation (pyautogui).
- `src/vibe_speech/processor.py` – processing modes (`raw`, cleanup, optional correction/rewriter).
- `src/vibe_speech/rewriter.py` – optional grammar rewriter (Ollama/local llama.cpp).
- `config.sample.yaml` – defaults for local development.
## Quick start
1) Install Python 3.11+ and make sure `ffmpeg` is on your PATH.
2) Install: `python -m venv .venv && source .venv/bin/activate && pip install -e .`
3) Copy and edit config: `cp config.sample.yaml config.yaml`
- Set `audio.device_name` to your mic.
- Choose a Whisper model (`whisper.model_size`), beam size, `initial_prompt` if desired.
- Set `whisper.remote_url` to offload transcription to your remote `/transcribe` service (leave empty to use local Whisper).
- Set the assistant provider/model/personality/history length in `assistant.*`; adjust `speech.*` if your `xsst2` path or args differ.
- Enable/disable the optional rewriter as needed.
### Prompt shape (with history)
```
assistant.system_prompt
Personality:
[last N user/assistant turns, up to assistant.history_length]
User:
Assistant:
User:
Assistant:
```
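One way to assemble a prompt of this shape is sketched below. It assumes that `assistant.history_length` counts user/assistant turn pairs and that each label is followed by its text on the same line; the source does not confirm either detail. `ConversationHistory` and `build_prompt` are hypothetical names.

```python
from collections import deque
from typing import Deque, List, Tuple


class ConversationHistory:
    """Keeps only the most recent turns, mirroring a history_length window."""

    def __init__(self, history_length: int) -> None:
        # A bounded deque silently drops the oldest turn once it is full.
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=history_length)

    def add(self, user_text: str, assistant_text: str) -> None:
        self._turns.append((user_text, assistant_text))

    def turns(self) -> List[Tuple[str, str]]:
        return list(self._turns)


def build_prompt(system_prompt: str, personality: str,
                 history: ConversationHistory, user_text: str) -> str:
    """Join the system prompt, personality, recent turns, and the new user text."""
    lines = [system_prompt, f"Personality: {personality}"]
    for past_user, past_assistant in history.turns():
        lines.append(f"User: {past_user}")
        lines.append(f"Assistant: {past_assistant}")
    lines.append(f"User: {user_text}")
    lines.append("Assistant:")  # left open for the model to complete
    return "\n".join(lines)
```

With a window of 1, only the latest exchange is replayed ahead of the new user message, which keeps the prompt short at the cost of older context.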
4) Run: `vibe-speech --config config.yaml serve` (use `--dry-run` to log without speaking).
5) Hold `ctrl+shift+space` (default) while speaking; release to transcribe, send to the assistant, and hear the reply. Spinner shows work in progress; logs include timing and raw/final text.
Virtualenv activation (if you created `.venv` above):
- macOS/Linux (bash/zsh): `source .venv/bin/activate`
- Windows PowerShell: `.venv\Scripts\Activate.ps1`
- Windows cmd: `.venv\Scripts\activate.bat`
## Debug logging
- CLI flags: `vibe-speech --log-level DEBUG --config config.yaml serve` (or `python -m vibe_speech.cli --log-level DEBUG --config config.yaml serve`).
- Config file: set `log_level: DEBUG` in `config.yaml` and run normally.
## Platform notes
- macOS: grant Accessibility permission to your terminal/editor so typing works, and grant microphone permission to the terminal. Tail padding helps avoid clipping the end of an utterance; adjust it in config.
- Windows/Linux: relies on `pyautogui` for typing; focus targeting not yet implemented.
## Troubleshooting
- Mic errors (AUHAL -50, etc.): set `audio.device_name` to a valid input from `sounddevice.query_devices()`, and ensure mic permission is granted.
- Model downloads: set `whisper.offline: false` for the first run to cache; then flip to `true` for offline use.
- Accuracy vs speed: use a smaller model with a beam size of 1–3 for speed, or a larger model with a beam size of 5 for accuracy.
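For the mic troubleshooting step above, a small helper like the following can list candidate values for `audio.device_name`. It is a sketch: `input_device_names` and `list_microphones` are hypothetical helpers, while `sounddevice.query_devices()` and the `name` / `max_input_channels` keys are part of the python-sounddevice API.

```python
from typing import Iterable, List, Mapping


def input_device_names(devices: Iterable[Mapping]) -> List[str]:
    """Return the names of devices that expose at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def list_microphones() -> List[str]:
    """Query the host audio system for input-capable devices."""
    # Imported lazily; requires `pip install sounddevice` and a working PortAudio.
    import sounddevice as sd

    return input_device_names(sd.query_devices())
```

Calling `list_microphones()` in a Python shell should print names that can be copied into the config; output-only devices are filtered out.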
## Notes
- Streaming/partials are not implemented; the app buffers until hotkey release (with optional tail capture).
- The rewriter can change phrasing; set `processing.mode: raw` and `rewriter.enabled: false` for unaltered Whisper output.
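The buffer-until-release behaviour with tail capture described above might look roughly like this. `PushToTalkBuffer`, its method names, and the 0.3 second default are assumptions made for illustration and do not reflect the actual `runtime.py`. Timestamps are passed in explicitly so the logic can be exercised without a real clock.

```python
from typing import List, Optional


class PushToTalkBuffer:
    """Collects audio chunks while the hotkey is held, plus a short tail after release."""

    def __init__(self, tail_seconds: float = 0.3) -> None:
        self.tail_seconds = tail_seconds
        self._chunks: List[bytes] = []
        self._recording = False
        self._released_at: Optional[float] = None

    def press(self) -> None:
        """Hotkey down: start a fresh utterance."""
        self._chunks.clear()
        self._recording = True
        self._released_at = None

    def release(self, now: float) -> None:
        """Hotkey up: keep recording until the tail window has elapsed."""
        if self._recording:
            self._released_at = now

    def feed(self, chunk: bytes, now: float) -> Optional[bytes]:
        """Add a chunk; return the whole utterance once the tail window has passed."""
        if not self._recording:
            return None  # audio arriving outside a press is ignored
        if self._released_at is not None and now - self._released_at > self.tail_seconds:
            self._recording = False
            return b"".join(self._chunks)
        self._chunks.append(chunk)
        return None
```

In a real audio callback, `now` would typically come from `time.monotonic()`, and the returned bytes would be handed to the transcription step exactly once per press.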