{"id":50328909,"url":"https://github.com/levivoelz/openclaw-plugin-voice-chat","last_synced_at":"2026-05-29T08:05:35.796Z","repository":{"id":358545985,"uuid":"1241204462","full_name":"levivoelz/openclaw-plugin-voice-chat","owner":"levivoelz","description":"Voice that behaves like chat for OpenClaw — STT/TTS bracket your real agent so it keeps its models, memory, and skills.","archived":false,"fork":false,"pushed_at":"2026-05-17T23:32:06.000Z","size":220,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-17T23:37:12.275Z","etag":null,"topics":["ai-agents","elevenlabs","openai","openclaw","openclaw-plugin","plugin","realtime","stt","tts","voice"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/levivoelz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-17T04:48:43.000Z","updated_at":"2026-05-17T23:32:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/levivoelz/openclaw-plugin-voice-chat","commit_stats":null,"previous_names":["levivoelz/openclaw-plugin-voice-chat"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/levivoelz/openclaw-plugin-voice-chat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levivoelz%2Fopenclaw-plugin-voice-chat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levivoelz%2Fopenclaw-plugin-voice-chat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levivoelz%2Fopenclaw-plugin-voice-chat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levivoelz%2Fopenclaw-plugin-voice-chat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/levivoelz","download_url":"https://codeload.github.com/levivoelz/openclaw-plugin-voice-chat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/levivoelz%2Fopenclaw-plugin-voice-chat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33642338,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","elevenlabs","openai","openclaw","openclaw-plugin","plugin","realtime","stt","tts","voice"],"created_at":"2026-05-29T08:05:35.069Z","updated_at":"2026-05-29T08:05:35.791Z","avatar_url":"https://github.com/levivoelz.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# openclaw-plugin-voice-chat\n\nVoice that behaves like chat. STT and TTS bracket the real OpenClaw agent —\nthe transcript becomes a real user turn, and the agent's reply streams back\nthrough TTS as it generates. Same model, same memory, same skills, same\npermissions. The voice layer is just I/O.\n\nBuilt for the Mac Studio + Iris setup. Not a public OpenClaw plugin — uses\nlocal secrets-daemon plumbing.\n\n## How it works\n\n```\n                          (this plugin)\n                                │\n  mic ─► sox ─► WS frames ──► VAD ──► STT provider ──► transcript\n                                │                          │\n                                │                          ▼\n                                │             runtime.channel.turn.runPrepared\n                                │                          │\n                                │       (real OpenClaw agent session — sonnet/opus/whatever)\n                                │                          │\n                                │                          ▼\n                                ◄── sentence buffer ◄── reply stream\n                                │\n                                ▼\n                          TTS provider ──► audio chunks ──► WS frames ──► speaker\n```\n\n- Registers as an OpenClaw **channel** (`voice-chat`).\n- Hosts its **own** WebSocket on `127.0.0.1:18790` (separate from gateway 18789).\n- A CLI client streams mic audio in, plays TTS audio back.\n- Transcript runs as a real channel turn via `runtime.channel.turn.runPrepared`\n  — the agent inherits its configured model, memory, skills, and permissions.\n- Streaming throughout: STT emits as you talk, agent reply streams as deltas,\n  sentence buffer emits to TTS as sentences complete, TTS chunks play as they\n  arrive. End-to-end latency is dominated by your network + the LLM's\n  first-token time, not the pipeline.\n\n## Quick start\n\n```bash\n# Build\ncd ~/openclaw-plugin-voice-chat\nnpm pack\n\n# Deploy to iris's openclaw install\nscp -i ~/.ssh/iris-local -o IdentitiesOnly=yes \\\n  levivoelz-openclaw-plugin-voice-chat-0.1.0.tgz iris@localhost:/tmp/\nssh -i ~/.ssh/iris-local -o IdentitiesOnly=yes iris@localhost \\\n  \"openclaw plugins install /tmp/levivoelz-openclaw-plugin-voice-chat-0.1.0.tgz \\\n   --force --dangerously-force-unsafe-install \\\n   \u0026\u0026 openclaw gateway restart\"\n\n# Install local STT (default — Parakeet TDT via MLX, runs on-device)\nuv tool install parakeet-mlx\n\n# Talk\nnode ~/openclaw-plugin-voice-chat/dist/cli/index.js doctor\nnode ~/openclaw-plugin-voice-chat/dist/cli/index.js \\\n  --agent iris --gateway ws://127.0.0.1:18790\n```\n\nThe `--dangerously-force-unsafe-install` is needed because the plugin uses\n`child_process` (`macos-say` TTS provider, parakeet daemon spawn).\n\n## STT providers\n\n| Provider id | Backend | When to use |\n|---|---|---|\n| `voice-chat/parakeet-local` ★ default | Parakeet TDT via MLX, served by `daemon/parakeet-daemon.py` over a Unix socket. Auto-spawns on first utterance, keeps model warm | On-device, free, fast first-token. Apple Silicon only |\n| `voice-chat/openai-realtime` | OpenAI Realtime API (GA) | Lowest latency, cloud cost, fewer hallucinations on noisy audio |\n| `voice-chat/openai-whisper` | OpenAI Whisper REST | Simple, slower than realtime, no streaming partials |\n\n### Parakeet daemon\n\nSee `daemon/README.md`. The daemon avoids the ~1s Python+MLX cold-start\nthat would otherwise hit every utterance. Auto-spawns; manual start is\ndocumented in the sibling README.\n\n## TTS providers\n\n| Provider id | Backend | When to use |\n|---|---|---|\n| `voice-chat/openai` ★ default | OpenAI TTS (`tts-1`, voice `shimmer` by default) | Quality + latency balance, paid |\n| `voice-chat/elevenlabs` | ElevenLabs | Best voice quality, needs `elevenlabs.apiKey` (pulled from iris-secrets-daemon) |\n| `voice-chat/macos-say` | macOS `say` command | Zero cost, zero deps, robotic. Useful for dev / offline |\n\nAudio formats supported: `mp3` (default), `pcm16`, `opus`.\n\n## CLI usage\n\n```bash\nopenclaw-voice [resume] [options]\n```\n\nEvery invocation starts a NEW chat session by default. Use `resume` to\ncontinue the most recent session for the given agent.\n\n| Flag | What |\n|---|---|\n| `--gateway \u003curl\u003e` | WS URL of the plugin (default `ws://127.0.0.1:18790`, or `$OPENCLAW_GATEWAY`) |\n| `--agent \u003cid\u003e` | Target agent id (default = gateway's default agent) |\n| `--mode \u003cptt\\|vad\u003e` | `ptt` = push-to-talk (default), `vad` = voice-activity detection |\n| `--stt \u003cprovider\u003e` | Override STT provider (e.g. `voice-chat/openai-realtime`) |\n| `--stt-model \u003cname\u003e` | Model id within that provider |\n| `--tts \u003cprovider\u003e` | Override TTS provider |\n| `--tts-model \u003cname\u003e` | Model id |\n| `--voice \u003cname\u003e` | TTS voice |\n| `--format \u003cmp3\\|pcm16\\|wav\u003e` | Audio format |\n| `--no-tts` | Transcript-only, no audio playback |\n| `--no-stt` | Type input instead of speaking |\n| `--print` | Echo transcripts and replies to stderr/stdout |\n| `--audio-cues \u003cvoice\\|off\u003e` | A short sci-fi \"working\" cue (Zarvox) plays on the first thinking/tool event of a turn so you know iris is alive during long waits |\n| `--device-token \u003ctok\u003e` | Auth token (or `$OPENCLAW_DEVICE_TOKEN`) |\n| `--debug` | Verbose logging |\n\n### Subcommands\n\n| Command | What |\n|---|---|\n| `resume` | Resume the last voice session for `--agent` |\n| `doctor` | Check `sox`, mic, player, gateway reachability, plugin install |\n| `sessions` | List chat sessions via gateway API |\n| `pair` | Device pairing (stub) |\n\n### UX behaviors worth knowing\n\n- **Push-to-talk:** space to talk, release to send. Esc to interrupt in-flight TTS.\n- **Barge-in:** in VAD mode, starting to speak cancels iris's current TTS and the\n  in-flight turn — feels like a real conversation. Mic VAD ducks while local\n  playback is active so the speaker bleed doesn't trigger phantom utterances.\n- **Working cues:** if iris goes quiet because she's thinking or running a tool,\n  a brief sci-fi tone plays so you don't think the line dropped.\n- **Auto-reconnect:** if the WebSocket drops, the CLI reconnects with\n  exponential backoff and resumes the same client id.\n- **Utterance stitching:** consecutive utterances inside an 800ms gap merge so\n  pausing mid-sentence doesn't fragment the transcript.\n\n### Exit codes\n\n```\n0  clean exit\n2  gateway unreachable\n3  auth failed\n4  no mic / sox missing\n5  plugin not installed\n```\n\n## Plugin config\n\nLives in iris's `~/.openclaw/openclaw.json` under `channels.voice-chat`:\n\n```jsonc\n{\n  \"channels\": {\n    \"voice-chat\": {\n      \"enabled\": true,\n      \"host\": \"127.0.0.1\",\n      \"port\": 18790,\n\n      \"stt\": {\n        \"provider\": \"voice-chat/parakeet-local\",\n        \"model\":    \"mlx-community/parakeet-tdt-0.6b-v3\",\n        \"language\": \"en\"\n      },\n      \"tts\": {\n        \"provider\": \"voice-chat/openai\",\n        \"model\":    \"tts-1\",\n        \"voice\":    \"shimmer\",\n        \"format\":   \"mp3\"\n      },\n\n      \"openai\":     { \"apiKey\": \"...\", \"baseUrl\": \"...\" },\n      \"elevenlabs\": { \"apiKey\": \"...\" },\n\n      \"mode\":      \"ptt\",   // or \"vad\"\n      \"interrupt\": true,    // cancel in-flight TTS on user speech\n\n      // Per-agent overrides keyed by agent id\n      \"perAgent\": {\n        \"iris\": { \"tts\": { \"voice\": \"nova\" } }\n      }\n    }\n  }\n}\n```\n\nAPI keys can also be wired through the iris-secrets-daemon (see daemon\nintegration notes — credential plumbing is intentionally per-deploy).\n\n## Repo layout\n\n```\nsrc/\n  plugin.ts                 channel plugin entry\n  channel-runtime.ts        host runtime store\n  types.ts                  WS protocol frames\n  core/\n    voice-session.ts        per-WS orchestrator (turn lifecycle, streaming)\n    sentence-buffer.ts      stream-aware sentence emitter (chunks the LLM\n                            output into TTS-ready sentences as deltas arrive)\n    speculative.ts          speculative LLM dispatch helpers\n    resolve-config.ts       merge defaults/per-agent/hints\n  providers/\n    registry.ts\n    daemon.ts               shared daemon-client helpers\n    stt/  parakeet-local.ts, openai-realtime.ts, openai-whisper.ts\n    tts/  openai.ts, elevenlabs.ts, macos-say.ts\n  cli/\n    index.ts                CLI entry + arg parsing\n    talk.ts                 main interactive loop\n    doctor.ts               environment + reachability checks\n    sessions.ts             list/manage chat sessions\n    pair.ts                 device pairing (stub)\n    vad.ts                  client-side VAD\n    audio-mac.ts            sox capture + afplay playback (macOS)\n    audio-linux.ts          sox capture + ffplay/aplay playback (Linux)\n    client-id.ts            persistent per-CLI client id\n    ws.ts                   reconnecting WebSocket\n  ui/\ndaemon/\n  parakeet-daemon.py        long-lived Parakeet MLX inference daemon\n  README.md\ntest/                       unit tests\ntypes/openclaw.d.ts         ambient SDK shim\nopenclaw.plugin.json        OpenClaw plugin manifest + UI hints\n```\n\n## Dev loop\n\n```bash\nnpx tsc --noEmit          # typecheck\nnpx tsc                   # build to dist/\nnpx tsx --test test/*.ts  # run tests\nnpm pack                  # produce tarball\n```\n\n## What makes it fast\n\nEnd-to-end latency from \"release space\" to \"first TTS audio playing\" is\ndominated by LLM first-token time. Pipeline contributions are below ~150ms\non a warm path:\n\n- **Speculative LLM dispatch** — kick off the agent turn as soon as the\n  transcript looks final-shaped, before the user has fully stopped speaking\n- **Streaming TTS** — first complete sentence in the reply goes to TTS\n  immediately; we don't wait for the agent to finish\n- **Parakeet daemon keepalive** — model stays warm in MLX, ~1s cold start\n  amortized to zero across utterances\n- **16kHz mic + small stream chunks** — minimum viable for STT, smallest\n  meaningful frame size for streaming\n- **TTS prewarm** — first sentence triggers the TTS connection ahead of audio bytes\n- **VAD ducking during playback** — mic stays open, just gets less sensitive,\n  so barge-in works without phantom-utterance corruption\n\n## Status\n\n- Plugin loads cleanly as a channel on OpenClaw 2026.5.12+.\n- STT verified live: Parakeet local (MLX), OpenAI Realtime (GA), OpenAI Whisper.\n- TTS verified live: OpenAI (multiple voices/models), ElevenLabs (with paid\n  key, now wired through iris-secrets-daemon), macOS `say`.\n- End-to-end driven against the live gateway: transcript → real agent turn →\n  streaming reply → TTS chunks → playback. ✓\n- Thinking + tool events surface to the client via real SDK hooks\n  (`replyOptions` callbacks) — drives the working-cue UX.\n\n## Known host limitations (OpenClaw 2026.5.x)\n\n- **No plugin-UI registration API.** Control UI is a monolithic SPA with\n  hardcoded renderers per channel; third-party channels get a generic default\n  view. Anything visual ships as a separate sibling-origin app, or not at all.\n- **Channels require `auth: \"gateway\"` for WS upgrades.** Bypassed locally via\n  `gateway.controlUi.dangerouslyDisableDeviceAuth: true`. Re-enable once the\n  CLI implements the device-token challenge.\n\n## License\n\nMIT (in repo for future portability).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flevivoelz%2Fopenclaw-plugin-voice-chat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flevivoelz%2Fopenclaw-plugin-voice-chat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flevivoelz%2Fopenclaw-plugin-voice-chat/lists"}