{"id":35214466,"url":"https://github.com/zkhan93/respeaker-openai-assistant","last_synced_at":"2026-04-13T20:32:50.244Z","repository":{"id":330747896,"uuid":"1123527129","full_name":"zkhan93/respeaker-openai-assistant","owner":"zkhan93","description":"Pi Realtime Voice - A voice assistant for Raspberry Pi with local  hotword detection and OpenAI Realtime API integration. Features event-driven  architecture, ReSpeaker 4-Mic Array support, and bidirectional voice conversations.","archived":false,"fork":false,"pushed_at":"2026-02-24T21:25:57.000Z","size":163,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-25T01:50:51.443Z","etag":null,"topics":["edge-computing","iot","openai","python","raspberry-pi","realtime-api","respeaker","respeaker-4mics-array","speech-recognition","voice-ai","voice-assistant","wake-word-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zkhan93.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-27T04:12:23.000Z","updated_at":"2026-02-23T19:41:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zkhan93/respeaker-openai-assistant","commit_stats":null,"previous_names":["zkhan93/respeaker-openai-assistant"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zkhan93/respeaker-openai-assistant","repositor
y_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkhan93%2Frespeaker-openai-assistant","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkhan93%2Frespeaker-openai-assistant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkhan93%2Frespeaker-openai-assistant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkhan93%2Frespeaker-openai-assistant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zkhan93","download_url":"https://codeload.github.com/zkhan93/respeaker-openai-assistant/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zkhan93%2Frespeaker-openai-assistant/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31770720,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T20:17:16.280Z","status":"ssl_error","status_checked_at":"2026-04-13T20:17:08.216Z","response_time":93,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["edge-computing","iot","openai","python","raspberry-pi","realtime-api","respeaker","respeaker-4mics-array","speech-recognition","voice-ai","voice-assistant","wake-word-detection"],"created_at":"2025-12-29T21:19:50.325Z","updated_at":"2026-04-13T20:32:50.230Z","avatar_url":"https://github.com/zkhan93.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Voice Assistant for ReSpeaker 4-Mic Array\n\nA voice assistant service for Raspberry Pi using the ReSpeaker 4-Mic Array, with local hotword detection (\"alexa\") and OpenAI integration.\n\n## Features\n\n- 🎤 **Local Hotword Detection**: openWakeWord for offline \"alexa\" wake word detection\n- 🔊 **ReSpeaker 4-Mic Array**: Full support for AC108 device (paInt16 mono)\n- 🤖 **OpenAI Integration**: Speech-to-text with Whisper, future Realtime API support\n- 🎯 **Real-Time Audio**: Multi-consumer architecture with callback-based capture\n- 📡 **Event-Driven**: Pub-sub system for decoupled components\n- 🐍 **Modern Python**: Built with Python 3.11+ using `uv` package manager\n\n## Quick Start\n\n```bash\n# 1. Install dependencies\nexport PATH=\"$HOME/.local/bin:$PATH\"\nuv sync\n\n# 2. Configure\ncp config/config.yaml.example config/config.yaml\nnano config/config.yaml  # Add your OpenAI API key\n\n# 3. Download models\nuv run voice-assistant download-models\n\n# 4. Test event system (no OpenAI needed)\nuv run voice-assistant test-events\n\n# 5. 
Test speech-to-text (requires OpenAI API key)\nuv run voice-assistant test-stt\n```\n\n## Hardware Requirements\n\n- Raspberry Pi 4B (2GB or more)\n- ReSpeaker 4-Mic Array for Raspberry Pi\n- Internet connection for OpenAI API\n\n## Installation\n\n### System Dependencies\n\nThese should already be installed on Raspberry Pi OS:\n- `portaudio19-dev` - Audio I/O\n- `libasound2-dev` - ALSA support\n- `python3-dev` - Python headers\n- `libffi-dev` - FFI library\n\n### Python Setup\n\n```bash\ncd /home/pi/llm-assistant/voice-assistant\nexport PATH=\"$HOME/.local/bin:$PATH\"\nuv sync\n```\n\n### Configuration\n\n```bash\ncp config/config.yaml.example config/config.yaml\nnano config/config.yaml\n```\n\nAdd your OpenAI API key:\n```yaml\nopenai:\n  api_key: \"sk-...\"  # Your actual API key\n```\n\n###  Download Hotword Models\n\n```bash\nuv run voice-assistant download-models\n```\n\n### Verify Installation\n\n```bash\nuv run voice-assistant verify\n```\n\n## CLI Commands\n\n### Core Commands\n\n```bash\n# Run the voice assistant (future)\nuv run voice-assistant run [--log-level DEBUG]\n\n# Show configuration\nuv run voice-assistant config\n\n# Verify setup\nuv run voice-assistant verify\n\n# Download hotword models\nuv run voice-assistant download-models\n```\n\n### Test Commands\n\n```bash\n# Monitor all events in real-time (diagnostic tool)\nuv run voice-assistant test-events\n\n# Test speech-to-text (event-driven demo)\nuv run voice-assistant test-stt\n\n# Test OpenAI Realtime API (full voice conversation)\nuv run voice-assistant test-realtime\n\n# Test hotword detection (records 5s after \"alexa\")\nuv run voice-assistant test-hotword [--debug]\n\n# Test with native paInt16 mono (verification)\nuv run voice-assistant test-hotword-native\n\n# Test audio recording (15s capture \u0026 playback)\nuv run voice-assistant record [--duration 15]\n\n# Test audio hardware\nuv run voice-assistant test-audio\n```\n\n**Recommended Testing Flow:**\n1. 
**`test-events`** - See all events in real-time (no API key needed)\n   - Verify hotword detection works\n   - Check voice activity detection\n   - Understand event timing\n2. **`test-stt`** - Test full STT pipeline (requires API key)\n   - Verifies OpenAI integration\n   - Tests complete event-driven flow\n   - **How it works:**\n     - Say \"alexa\" → System starts recording\n     - Continue speaking → Audio captured\n     - Stop speaking (~1s pause) → Transcription sent to OpenAI\n     - Wait 1-3 seconds → Transcription displayed\n   - **Important:** Only speech after \"alexa\" is transcribed\n   - If you speak without saying \"alexa\", it's ignored (by design)\n   - **Multiple hotwords:** If you say \"alexa\" again before stopping, the recording restarts (allows correction/new command)\n\n3. **`test-realtime`** - Test OpenAI Realtime API (requires API key)\n   - Full bidirectional voice conversation with AI\n   - **How it works:**\n     - Say \"alexa\" → Connects to OpenAI Realtime API\n     - Speak your question/command → Streams audio to OpenAI\n     - Stop speaking (~1s pause) → AI processes and responds\n     - **AI speaks back!** → Plays audio response through speakers\n   - **Features:**\n     - Real-time audio streaming (no waiting for transcription)\n     - AI responds with voice (not just text)\n     - Natural conversation flow\n     - Say \"alexa\" again to interrupt/start new command\n   - **Best for:** Interactive conversations, questions that need spoken responses\n\n## Architecture\n\n### Overview\n\n```\n┌───────────────────────────────────────────────────────────┐\n│              Event-Driven Architecture                    │\n├───────────────────────────────────────────────────────────┤\n│                                                           │\n│  Audio Stream (Callback Thread)                          │\n│         ↓                                                 │\n│    ┌────────────────────┐                                 │\n│    │  
AudioHandler      │  Emits VAD events               │\n│    │  (Producer + VAD)  │  Broadcasts audio to:           │\n│    └─────┬──────────┬───┘  • hotword_queue (skip-ahead)  │\n│          │          │      • audio_queue (buffered)       │\n│          │ VAD      │ Audio                               │\n│          │ Events   │ Frames                              │\n│          ↓          ↓                                      │\n│    ┌─────────┐  ┌──────────────────┐                      │\n│    │EventBus │  │VoiceDetection    │                      │\n│    │         │←─│Service           │                      │\n│    │         │  │(Hotword Loop)    │                      │\n│    └────┬────┘  └──────────────────┘                      │\n│         │                                                  │\n│         │ Events:                                          │\n│         │  • hotword_detected                              │\n│         │  • voice_activity_started                        │\n│         │  • voice_activity_stopped                        │\n│         │                                                  │\n│         ↓                                                  │\n│    ┌────────────┬────────────┬────────────┐               │\n│    ↓            ↓            ↓            ↓               │\n│ Consumer1   Consumer2    Consumer3    Consumer4           │\n│ (STT)       (Realtime)   (Recording)  (Custom)            │\n│                                                           │\n└───────────────────────────────────────────────────────────┘\n```\n\n### Key Concepts\n\n#### 1. 
Multi-Consumer Audio\n\nOne audio stream broadcasts to multiple queues:\n\n- **hotword_queue** (size=3): Small, skip-ahead for low latency detection\n- **audio_queue** (size=100): Large, buffered for complete audio capture\n\n```python\n# Hotword detection (skip-ahead)\naudio = audio_handler.read_hotword_chunk()\n\n# Complete audio for streaming/transcription (buffered)\naudio = audio_handler.read_audio_chunk()\n```\n\n#### 2. Event-Driven\n\nComponents communicate via events, not direct calls:\n\n**Available Events:**\n\n1. **hotword_detected** - Wake word detected\n```python\nevent = HotwordEvent(\n    timestamp=now,\n    hotword=\"alexa\",\n    score=0.95,\n    audio_queue_size=42\n)\nevent_bus.publish(\"hotword_detected\", event)\n```\n\n2. **voice_activity_started** - User started speaking\n```python\nevent = VoiceActivityEvent(\n    timestamp=now,\n    activity_type='started'\n)\nevent_bus.publish(\"voice_activity_started\", event)\n```\n\n3. **voice_activity_stopped** - User stopped speaking\n```python\nevent = VoiceActivityEvent(\n    timestamp=now,\n    activity_type='stopped',\n    duration=3.2  # seconds\n)\nevent_bus.publish(\"voice_activity_stopped\", event)\n```\n\n**Subscribing to Events:**\n\n```python\n# Subscribe to events\nevent_bus.subscribe(\"hotword_detected\", on_hotword)\nevent_bus.subscribe(\"voice_activity_stopped\", on_voice_stopped)\n\n# Example: Capture exact duration of user speech\ndef on_hotword(event: HotwordEvent):\n    self.recording = True\n    # Start background thread to collect audio\n\ndef on_voice_stopped(event: VoiceActivityEvent):\n    self.recording = False\n    # Transcribe collected audio (exact duration!)\n```\n\n#### 3. Real-Time Performance\n\n- **Callback mode**: Audio captured in background thread (non-blocking)\n- **Skip-ahead**: Hotword queue drops old frames to stay current\n- **Parallel consumers**: All process independently, no blocking\n\n#### 4. 
Voice Detection Service (Core Loop)\n\nThe `VoiceDetectionService` is a reusable orchestration component that:\n- Runs the main detection loop (hotword detection)\n- Publishes hotword events\n- Integrates with AudioHandler (which publishes voice activity events)\n- Can be used by any command to build different functionality\n\n**Why separate from commands?**\n- Commands are UI/entry points\n- Core loop is reusable business logic\n- Different commands can use the same detection service with different consumers\n\n**Example Usage:**\n\n```python\n# Create core components\nevent_bus = EventBus()\naudio_handler = AudioHandler(event_bus=event_bus)  # VAD events enabled\nhotword_detector = HotwordDetector()\ndetection_service = VoiceDetectionService(audio_handler, event_bus, hotword_detector)\n\n# Register consumers (they subscribe to events)\nstt_consumer = SpeechToTextConsumer(event_bus, audio_handler, api_key)\nrealtime_consumer = RealtimeConsumer(event_bus, audio_handler, api_key)\n\n# Start audio stream\naudio_handler.start_stream()\n\n# Run detection loop (blocks until stopped)\ndetection_service.start()\n```\n\n### Code Structure\n\n```\nsrc/voice_assistant/\n├── core/                        # Core components (producers \u0026 orchestration)\n│   ├── audio_handler.py         # Audio capture + VAD event emission\n│   ├── detection_service.py     # Detection loop (hotword + orchestration)\n│   ├── event_bus.py             # Pub-sub event system\n│   └── hotword_detector.py      # Wake word detection\n│\n├── consumers/                   # Event subscribers\n│   └── stt_consumer.py          # Speech-to-text consumer\n│\n├── services/                    # External services\n│   ├── openai_client.py         # OpenAI Realtime API client\n│   └── state_machine.py         # State management\n│\n├── commands/                    # CLI commands (use core components)\n│   ├── run.py                   # Main service command\n│   ├── test_stt.py              # Test STT 
consumer\n│   ├── test_hotword.py          # Hotword detection test\n│   └── ...                      # Other utilities\n│\n├── cli.py                   # Command-line interface\n├── config.py                # Configuration management\n└── main.py                  # Service orchestrator (future)\n```\n\n## Configuration\n\nEdit `config/config.yaml`:\n\n### Audio Settings\n\n```yaml\naudio:\n  device: \"ac108\"        # ALSA device name\n  sample_rate: 16000     # Hz\n  channels: 1            # Mono (works best with openWakeWord)\n  chunk_size: 1280       # 80ms chunks (required by openWakeWord)\n```\n\n### Hotword Detection\n\n```yaml\nhotword:\n  model: \"alexa\"\n  threshold: 0.5         # 0.0-1.0 (lower = more sensitive)\n```\n\n**Tuning**:\n- Lower (0.3-0.4): More sensitive, may have false positives\n- Higher (0.6-0.7): Less sensitive, may miss wake word\n- Use `--debug` to see scores and tune\n\n**Debouncing**: The system automatically prevents multiple hotword events for a single utterance using a 2-second cooldown period. 
This means after detecting \"alexa\" once, it won't fire another event for 2 seconds, even if the detection continues (which is normal as you speak the word).\n\n### Voice Activity Detection\n\n```yaml\nvad:\n  aggressiveness: 3      # 0-3 (recommended: 3)\n  speech_threshold: 3    # Consecutive frames (filters noise)\n  silence_threshold: 15  # Frames before stopping (~1 second)\n```\n\n**Aggressiveness** (0-3):\n- **0**: Least aggressive - detects any sound (not recommended)\n- **3**: Most aggressive - only clear speech (recommended)\n- Use 3 to avoid false triggers from taps, movements, background noise\n\n**Speech Threshold** (consecutive frames):\n- **3** (default): Requires 3 consecutive speech frames (~240ms)\n- **Higher (5-7)**: Stricter - ignores very brief sounds\n- **Lower (1-2)**: More sensitive - may trigger on brief noises\n- Filters out taps, clicks, and momentary sounds\n\n**Silence Threshold** (frames):\n- **15** (default): ~1 second of silence before stopping\n- **Higher (20-25)**: Waits longer before considering speech ended\n- **Lower (10-12)**: Faster response but may cut off pauses\n\n## How It Works\n\n### Hotword Detection\n\n1. Audio captured in background thread (callback mode)\n2. Broadcast to `hotword_queue` (skip-ahead) and `audio_queue` (buffered)\n3. Hotword detector reads from `hotword_queue`\n4. When \"alexa\" is detected and the cooldown period has passed → publishes `HotwordEvent`\n   - **Debouncing**: 2-second cooldown prevents duplicate events\n   - Single word \"alexa\" = single event (even though detection spans multiple frames)\n5. All subscribed consumers react independently\n\n### Speech-to-Text Consumer\n\n1. Subscribes to `hotword_detected` and `voice_activity_stopped` events\n2. When hotword detected:\n   - Starts recording from `audio_queue` in background thread\n   - If another hotword detected: restarts recording (allows correction)\n3. 
When voice activity stops:\n   - Stops recording and sends audio to OpenAI Whisper API\n   - Displays transcription\n4. Behavior:\n   - Only transcribes speech AFTER hotword\n   - Speech without hotword is ignored\n   - Multiple \"alexa\" → uses last one before voice stops\n\n### Realtime API Consumer\n\n1. Subscribes to `hotword_detected` and `voice_activity_stopped` events\n2. When hotword detected:\n   - Connects to OpenAI Realtime API via WebSocket\n   - Starts streaming audio from `audio_queue` in real-time\n   - If another hotword detected: cancels current response, restarts\n3. When voice activity stops:\n   - Commits audio buffer and requests AI response\n   - Receives audio response from OpenAI\n   - Plays audio back through speakers\n4. Features:\n   - Bidirectional audio streaming (send + receive)\n   - Low latency (real-time processing)\n   - AI speaks back with voice\n   - Interruption support (say \"alexa\" to cancel/restart)\n5. Architecture:\n   - Runs async event loop in background thread\n   - Audio streaming in separate thread\n   - Audio playback in separate thread\n   - All synchronized via event bus\n\n### Adding Custom Consumers\n\n```python\nfrom voice_assistant.core import EventBus, HotwordEvent, AudioHandler\n\nclass MyConsumer:\n    def __init__(self, event_bus, audio_handler):\n        self.event_bus = event_bus\n        self.audio_handler = audio_handler\n        event_bus.subscribe(\"hotword_detected\", self.on_hotword)\n    \n    def on_hotword(self, event: HotwordEvent):\n        # React to hotword\n        audio = self.audio_handler.read_audio_chunk()\n        # ... 
process audio ...\n    \n    def cleanup(self):\n        self.event_bus.unsubscribe(\"hotword_detected\", self.on_hotword)\n```\n\n## Tuning Voice Activity Detection\n\n### Problem: False Triggers (taps, movements, background noise)\n\n**Symptoms:**\n- Voice activity events when you tap the desk\n- Events triggered by keyboard typing\n- Background sounds causing false starts\n\n**Solutions (in order of effectiveness):**\n\n1. **Increase aggressiveness to 3** (default)\n   ```yaml\n   vad:\n     aggressiveness: 3  # Most strict\n   ```\n\n2. **Increase speech threshold**\n   ```yaml\n   vad:\n     speech_threshold: 5  # Require 5 consecutive frames (~400ms)\n   ```\n   - Brief sounds (taps, clicks) won't trigger\n   - Real speech sustained longer than 400ms will trigger\n\n3. **Test with `test-events`**\n   ```bash\n   uv run voice-assistant test-events\n   # Tap desk, type, make noise\n   # Should NOT trigger voice activity events\n   # Only speaking should trigger\n   ```\n\n### Problem: Speech Not Detected\n\n**Symptoms:**\n- You speak but no voice activity event\n- Hotword detected but no speech activity\n\n**Solutions:**\n\n1. **Speak louder/clearer**\n   - VAD aggressiveness=3 requires clear speech\n   - Move closer to microphone\n\n2. **Lower speech threshold**\n   ```yaml\n   vad:\n     speech_threshold: 2  # More sensitive\n   ```\n\n3. **Lower aggressiveness (not recommended)**\n   ```yaml\n   vad:\n     aggressiveness: 2  # Less strict (may cause false triggers)\n   ```\n\n### Understanding the Settings\n\n```\naggressiveness: 3  ← Filters out non-speech sounds\n       ↓\nspeech_threshold: 3  ← Requires sustained sound (not brief tap)\n       ↓\nVoice Activity Started! 🗣️\n       ↓\n(user speaks...)\n       ↓\nsilence_threshold: 15  ← Waits for pause before stopping\n       ↓\nVoice Activity Stopped! 
🔇\n```\n\n**Recommended defaults** (already set):\n- `aggressiveness: 3` - Only clear speech\n- `speech_threshold: 3` - Filters taps/clicks (~240ms)\n- `silence_threshold: 15` - Natural pause (~1 second)\n\n## Troubleshooting\n\n### Realtime API Errors\n\n**Error: \"Unknown parameter: 'session.modalities'\"**\n\nThis error occurred in older versions. The fix:\n- Removed `modalities` from session configuration\n- The API now infers modalities from context\n- **Already fixed** in current version\n\n**Error: \"Invalid authentication\" or \"401 Unauthorized\"**\n- Check your OpenAI API key in `config/config.yaml`\n- Ensure you have access to the Realtime API (requires payment method)\n- The model name should be `gpt-4o-realtime-preview-2024-12-17`\n\n**No audio playback from AI:**\n- Check speaker volume and connections\n- Verify ALSA playback device is working: `speaker-test -t wav -c 2`\n- OpenAI outputs 24kHz audio - ensure your speakers support it\n\n**High latency or delays:**\n- Check internet connection speed\n- Realtime API requires stable, low-latency connection\n- Consider using Ethernet instead of WiFi\n\n### Hotword Not Detecting\n\n**Check scores**:\n```bash\nuv run voice-assistant test-hotword --debug\n```\n\nLook for lines like:\n```\nDebug: Max score = 0.0129 (alexa), threshold = 0.5\n```\n\n**If scores are always 0.0000**:\n- Run `uv run voice-assistant download-models`\n- Check model file: `ls -lh models/`\n\n**If scores are low (0.01-0.3)**:\n- Speak louder or closer to mic\n- Lower threshold in config\n- Check audio levels: `alsamixer`\n\n**If scores are good but no detection**:\n- Check threshold setting\n- Ensure using correct audio format (paInt16 mono)\n\n### Audio Issues\n\n**Test audio capture**:\n```bash\nuv run voice-assistant record --duration 10\n```\n\nThis records 10s and plays it back.\n\n**Check audio device**:\n```bash\narecord -l\nuv run voice-assistant config\n```\n\n**Low audio levels**:\n```bash\nalsamixer\n# Adjust \"Capture\" or 
\"ADC\" levels\n```\n\n### Import Errors After Reorganization\n\nUpdate imports:\n```python\n# Old\nfrom voice_assistant.audio_handler import AudioHandler\n\n# New\nfrom voice_assistant.core import AudioHandler\n# or\nfrom voice_assistant.core.audio_handler import AudioHandler\n```\n\n### Performance Issues\n\nCheck queue status in logs:\n```\nhotword_queue: 0-3 frames (good - skip-ahead working)\naudio_queue: 10-50 frames (good - buffering)\n```\n\nIf audio_queue grows \u003e80 frames, system may be falling behind.\n\n### Audio Playback Issues\n\n**Test speaker first:**\n```bash\n# Quick speaker test (plays a 440Hz beep)\nuv run python test_speaker.py\n\n# Or use system tools\nspeaker-test -t sine -f 440 -l 1\n```\n\n**No audio from AI response:**\n1. Check if audio chunks are being received:\n   - Look for `\"Received audio delta\"` in logs\n   - Should see `\"🔊 AI is responding...\"` message\n\n2. Verify default output device:\n   ```bash\n   aplay -l  # List playback devices\n   ```\n\n3. Check ALSA mixer:\n   ```bash\n   alsamixer\n   # Press F6 to select sound card\n   # Adjust \"Master\" or \"PCM\" volume\n   ```\n\n4. Test with different output device:\n   ```python\n   # In realtime_consumer.py, specify device index\n   self.playback_stream = self.audio.open(\n       ...\n       output_device_index=0,  # Try different indices\n   )\n   ```\n\n5. 
Check logs for OpenAI events:\n   - `response.audio.delta` - Audio chunks received\n   - `response.audio.done` - Audio response complete\n   - `Event: response.content_part.added` - Response structure\n\n**Audio choppy or distorted:**\n- Increase `frames_per_buffer` in playback stream (try 2048 or 4096)\n- Check system load: `top` or `htop`\n\n## Running as Service (Future)\n\n```bash\n# Install\nsudo cp voice-assistant.service /etc/systemd/system/\nsudo systemctl daemon-reload\nsudo systemctl enable voice-assistant\nsudo systemctl start voice-assistant\n\n# Monitor\nsudo journalctl -u voice-assistant -f\n```\n\n## Development\n\n### Project Setup\n\n```bash\n# Clone\ngit clone \u003crepo\u003e\ncd voice-assistant\n\n# Install\nuv sync\n\n# Run tests\nuv run pytest\n\n# Lint\nuv run ruff check src/\n```\n\n### Code Style\n\n- Use `ruff` for linting and formatting\n- Follow PEP 8\n- Type hints where appropriate\n- Docstrings for public APIs\n\n### Adding Features\n\n1. **New Consumer**: Add to `src/voice_assistant/consumers/`\n2. **New Service**: Add to `src/voice_assistant/services/`\n3. **New Command**: Add to `src/voice_assistant/commands/` and `cli.py`\n4. **Core Component**: Add to `src/voice_assistant/core/`\n\n## Technical Details\n\n### Audio Format\n\n- **Capture**: paInt16 mono @ 16kHz\n- **Chunks**: 1280 samples (80ms) - required by openWakeWord\n- **Queues**: Separate for hotword (skip-ahead) and consumers (buffered)\n\n### Hotword Detection\n\n- **Library**: openWakeWord (TensorFlow Lite)\n- **Model**: alexa_v0.1.tflite\n- **Input**: int16 numpy array (not float32!)\n- **Stateful**: Needs every frame for context\n\n### Real-Time Performance\n\n**Before optimization**:\n- Blocking read: 62ms\n- Detection: 18ms\n- Total: 80ms (falling behind 0.13ms/frame)\n\n**After optimization**:\n- Callback mode: ~0ms (background thread)\n- Detection: 18ms\n- Total: 18ms (real-time capable!)\n\n## Known Issues\n\n1. 
**NumPy 2.x incompatibility**: Constrained to numpy \u003c2.0 for tflite-runtime\n2. **ALSA warnings**: Harmless warnings about unavailable devices (ignore)\n3. **GPU discovery warning**: Normal on Raspberry Pi (uses CPU)\n\n## Future Enhancements\n\n- [ ] OpenAI Realtime API consumer (bidirectional streaming)\n- [ ] Recording consumer (save conversations)\n- [ ] Analytics consumer (usage tracking)\n- [ ] Web UI for monitoring\n- [ ] Multi-hotword support\n- [ ] Custom wake word training\n\n## Credits\n\n- [openWakeWord](https://github.com/dscripka/openWakeWord) - Local hotword detection\n- [OpenAI](https://platform.openai.com/) - Whisper API \u0026 Realtime API\n- [ReSpeaker 4-Mic Array](https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/) - Hardware\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzkhan93%2Frespeaker-openai-assistant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzkhan93%2Frespeaker-openai-assistant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzkhan93%2Frespeaker-openai-assistant/lists"}