https://github.com/hackjutsu/vibe-speech

Local near real-time voice AI assistant

# Vibe Speech (WIP)

Cross-platform, local-first voice helper that listens to the mic, transcribes speech with a local Whisper model, and feeds the transcript to a local LLM assistant. The assistant replies in your configured personality, and the response is spoken aloud via `xsst2`. Designed for macOS and Windows, with a fallback automation layer that works anywhere `pyautogui` does.

## Current status
- Push-to-talk hotkey with tail padding; buffers while held, then transcribes once and routes the transcript to an LLM assistant.
- Faster-Whisper backend with a configurable model (default: `large-v3-turbo`). Beam size, language, and an optional initial prompt are also configurable.
- Optional rewriter (Ollama or llama.cpp) for grammar polish before the assistant sees the text.
- Assistant replies are generated by a local/Ollama LLM with a customizable personality and spoken through `xsst2`.
- Assistant prompts can include a configurable window of recent conversation history (`assistant.history_length`).
- Spinner/colored timing logs so you can see when transcription/rewriting/assistant work is running.
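
The push-to-talk behavior in the first bullet (buffer while held, keep a short tail after release, then transcribe once) can be sketched as below. This is a minimal illustration and not the project's `runtime.py`; the class name `PushToTalkBuffer`, its methods, and the `tail_chunks` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PushToTalkBuffer:
    """Collects mic chunks while the hotkey is held, plus a short tail after release."""

    tail_chunks: int = 2  # hypothetical knob: extra chunks kept after release
    _chunks: List[bytes] = field(default_factory=list)
    _held: bool = False
    _tail_left: int = 0

    def press(self) -> None:
        # Hotkey down: start a fresh recording.
        self._chunks.clear()
        self._held = True
        self._tail_left = 0

    def release(self) -> None:
        # Hotkey up: stop normal capture but still accept `tail_chunks` more chunks.
        self._held = False
        self._tail_left = self.tail_chunks

    def feed(self, chunk: bytes) -> None:
        # Called for every chunk the mic produces; chunks outside a recording are dropped.
        if self._held:
            self._chunks.append(chunk)
        elif self._tail_left > 0:
            self._chunks.append(chunk)
            self._tail_left -= 1

    @property
    def complete(self) -> bool:
        # True once the hotkey is up, the tail is captured, and there is audio to send.
        return not self._held and self._tail_left == 0 and bool(self._chunks)

    def audio(self) -> bytes:
        # The whole utterance is handed to the transcriber in one piece.
        return b"".join(self._chunks)
```

Counting the tail in chunks keeps the sketch free of timers; a real implementation would more likely express the padding in milliseconds.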

## Architecture (plain text)
```
Mic -> AudioCapture (chunk/tail) -> WhisperEngine (transcribe)
-> Processor (cleanup/correct + optional Rewriter)
-> Assistant (LLM; uses conversation history window)
-> SpeechSynthesizer (TTS) + Automation (typing)
```
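
One way to read the diagram is as a chain of functions, each taking the previous stage's output. The sketch below uses stand-in callables in place of the real components; `run_pipeline` and its parameter names are hypothetical, and the automation (typing) output is left out for brevity.

```python
from typing import Callable, List


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    process: Callable[[str], str],
    assistant: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Push one buffered utterance through the stages in diagram order."""
    raw_text = transcribe(audio)    # WhisperEngine
    final_text = process(raw_text)  # Processor (+ optional Rewriter)
    reply = assistant(final_text)   # Assistant (LLM)
    speak(reply)                    # SpeechSynthesizer
    return reply


# Stand-ins that let the flow run without a mic, Whisper, or an LLM.
spoken: List[str] = []
reply = run_pipeline(
    b"\x00\x01",
    transcribe=lambda audio: "  hello there  ",
    process=str.strip,
    assistant=lambda text: f"You said: {text}",
    speak=spoken.append,
)
```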

## Project layout
- `src/vibe_speech/cli.py` – entrypoint (`vibe-speech` script) with the `serve` and `doctor` subcommands.
- `src/vibe_speech/config.py` – config models and loader.
- `src/vibe_speech/runtime.py` – audio capture, hotkey handling, buffering, transcription, output, and logging.
- `src/vibe_speech/whisper_engine.py` – Whisper wrapper (faster-whisper).
- `src/vibe_speech/automation.py` – text output automation (pyautogui).
- `src/vibe_speech/processor.py` – processing modes (`raw`, cleanup, optional correction/rewriter).
- `src/vibe_speech/rewriter.py` – optional grammar rewriter (Ollama/local llama.cpp).
- `config.sample.yaml` – defaults for local development.

## Quick start
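1) Prerequisites: Python 3.11+ and `ffmpeg` on your PATH.
2) Install (macOS/Linux shell shown; Windows activation commands are listed under "Virtualenv activation" below): `python -m venv .venv && source .venv/bin/activate && pip install -e .`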
3) Copy and edit config: `cp config.sample.yaml config.yaml`
   - Set `audio.device_name` to your mic.
   - Choose a Whisper model (`whisper.model_size`), a beam size, and, if desired, an `initial_prompt`.
   - Set `whisper.remote_url` to offload transcription to your remote `/transcribe` service (leave empty to use local Whisper).
   - Set the assistant provider, model, personality, and history length in `assistant.*`; adjust `speech.*` if your `xsst2` path or args differ.
   - Enable or disable the optional rewriter as needed.
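
To show how the dotted keys above map onto nested config sections, here is a rough sketch of config models and a loader. It is not the project's `config.py`: the class names and `load_config` are hypothetical, only keys named in this README are modeled, and every default except `large-v3-turbo` is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AudioConfig:
    device_name: Optional[str] = None  # placeholder: None stands for "not set"


@dataclass
class WhisperConfig:
    model_size: str = "large-v3-turbo"  # default named in this README
    remote_url: str = ""                # empty means local Whisper
    offline: bool = False               # placeholder default


@dataclass
class AssistantConfig:
    history_length: int = 4  # placeholder default, not the project's value


@dataclass
class AppConfig:
    audio: AudioConfig = field(default_factory=AudioConfig)
    whisper: WhisperConfig = field(default_factory=WhisperConfig)
    assistant: AssistantConfig = field(default_factory=AssistantConfig)
    log_level: str = "INFO"  # placeholder default


def load_config(data: Dict[str, Any]) -> AppConfig:
    """Build an AppConfig from a nested mapping, such as a parsed YAML file."""
    # Unknown keys raise TypeError, which surfaces typos in the config early.
    return AppConfig(
        audio=AudioConfig(**data.get("audio", {})),
        whisper=WhisperConfig(**data.get("whisper", {})),
        assistant=AssistantConfig(**data.get("assistant", {})),
        log_level=data.get("log_level", "INFO"),
    )
```

For example, `load_config({"audio": {"device_name": "USB Mic"}})` fills in the remaining sections with the placeholder defaults.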

### Prompt shape (with history)
```
assistant.system_prompt
Personality:

[last N user/assistant turns, up to assistant.history_length]
User:
Assistant:

User:
Assistant:
```
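
A minimal sketch of assembling a prompt in this shape follows. The function `build_prompt` is hypothetical, and it assumes `history_length` counts user/assistant pairs rather than individual messages.

```python
from typing import List, Tuple


def build_prompt(
    system_prompt: str,
    personality: str,
    history: List[Tuple[str, str]],
    user_text: str,
    history_length: int,
) -> str:
    """Assemble system prompt, personality, recent turns, and the new user message."""
    # A plain history[-0:] would return the whole list, so zero is handled explicitly.
    recent = history[-history_length:] if history_length > 0 else []
    lines = [system_prompt, f"Personality: {personality}", ""]
    for user, assistant in recent:
        lines += [f"User: {user}", f"Assistant: {assistant}", ""]
    # The trailing "Assistant:" is left open for the model to complete.
    lines += [f"User: {user_text}", "Assistant:"]
    return "\n".join(lines)
```
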
4) Run: `vibe-speech --config config.yaml serve` (use `--dry-run` to log without speaking).
5) Hold `ctrl+shift+space` (default) while speaking; release to transcribe, send to the assistant, and hear the reply. Spinner shows work in progress; logs include timing and raw/final text.

Virtualenv activation (if you created `.venv` above):
- macOS/Linux (bash/zsh): `source .venv/bin/activate`
- Windows PowerShell: `.venv\Scripts\Activate.ps1`
- Windows cmd: `.venv\Scripts\activate.bat`

## Debug logging
- CLI flags: `vibe-speech --log-level DEBUG --config config.yaml serve` (or `python -m vibe_speech.cli --log-level DEBUG --config config.yaml serve`).
- Config file: set `log_level: DEBUG` in `config.yaml` and run normally.
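
With two places to set the level, something has to decide which one wins. The sketch below assumes the CLI flag takes precedence over the config file, which this README does not confirm; `resolve_log_level` is a hypothetical helper.

```python
import logging
from typing import Optional


def resolve_log_level(cli_level: Optional[str], config_level: Optional[str]) -> int:
    """Pick the effective level: CLI flag first (assumed), then config, then INFO."""
    name = (cli_level or config_level or "INFO").upper()
    level = logging.getLevelName(name)
    # For unknown names getLevelName returns a string such as "Level LOUD".
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {name}")
    return level
```

The result can be passed straight to `logging.basicConfig(level=...)`.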

## Platform notes
- macOS: grant Accessibility permission to your terminal/editor so typing works, and microphone permission to the terminal. Tail padding helps avoid clipping the end of an utterance; adjust it in the config.
- Windows/Linux: typing relies on `pyautogui`; focus targeting is not yet implemented.

## Troubleshooting
- Mic errors (AUHAL -50, etc.): set `audio.device_name` to a valid input from `sounddevice.query_devices()`, and ensure mic permission is granted.
- Model downloads: set `whisper.offline: false` for the first run so the model is downloaded and cached; then flip it to `true` for offline use.
- Accuracy vs speed: a smaller `whisper.model_size` with a beam size of 1–3 favors speed; a larger model with a beam size of 5 favors accuracy.
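
To find a valid value for `audio.device_name`, it helps to list only the devices that can record. The sketch below filters the entries returned by `sounddevice.query_devices()` on their `max_input_channels` field; the helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def input_device_names(devices: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the names of devices that have at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


def list_system_inputs() -> List[str]:
    # Imported lazily so the filter above stays usable without the package installed.
    import sounddevice as sd

    return input_device_names(sd.query_devices())
```

Calling `list_system_inputs()` in the project's virtualenv prints candidate names to copy into `config.yaml`.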

## Notes
- Streaming/partials are not implemented; the app buffers until hotkey release (with optional tail capture).
- The rewriter can change phrasing; set `processing.mode: raw` and `rewriter.enabled: false` for unaltered Whisper output.