https://github.com/bubustack/livekit-turn-detector-engram
Turn detector Engram for bobrapet: buffers PCM audio, runs WebRTC VAD, and emits speech turn payloads.
- Host: GitHub
- URL: https://github.com/bubustack/livekit-turn-detector-engram
- Owner: bubustack
- License: apache-2.0
- Created: 2025-12-16T09:44:47.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-05-02T07:11:06.000Z (about 1 month ago)
- Last Synced: 2026-05-02T09:18:42.571Z (about 1 month ago)
- Topics: batch, bubustack, engram, go, kubernetes, livekit, streaming, turn-detection, vad
- Language: Go
- Homepage: https://bubustack.io/
- Size: 74.2 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Support: SUPPORT.md
# LiveKit Turn Detector Engram
This engram buffers the short PCM chunks emitted by the `livekit-bridge-engram` ingress,
runs WebRTC VAD to detect speech activity, and forwards utterance-sized audio
turns (by default as WAV files) downstream. It is intended to sit between the
ingress and STT steps so that transcription services receive properly framed
audio instead of hundreds of 10 ms packets.
## Highlights
- Uses the pure-Go WebRTC VAD port (no cgo).
- Configurable frame size, aggressiveness, silence tail, and min/max turn
durations.
- Emits `speech.vad.active` payloads (the shared Tractatus stream type for VAD turns)
that contain a sequence number, duration, and the buffered audio (WAV or PCM).
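
To illustrate what "WAV or PCM" means for the buffered audio, here is a minimal sketch of wrapping raw PCM16 in a canonical 44-byte RIFF/WAVE header. The function name `wrapWAV` and its parameters are hypothetical and not taken from this repository; the engram's own encoder may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wrapWAV prepends a canonical 44-byte RIFF/WAVE header to raw PCM16 samples.
// The name and signature are illustrative, not the engram's actual API.
func wrapWAV(pcm []byte, sampleRate, channels int) []byte {
	const bitsPerSample = 16
	blockAlign := channels * bitsPerSample / 8 // bytes per sample frame
	byteRate := sampleRate * blockAlign        // bytes per second

	h := make([]byte, 44, 44+len(pcm))
	copy(h[0:4], "RIFF")
	binary.LittleEndian.PutUint32(h[4:8], uint32(36+len(pcm))) // RIFF chunk size
	copy(h[8:12], "WAVE")
	copy(h[12:16], "fmt ")
	binary.LittleEndian.PutUint32(h[16:20], 16) // fmt chunk size for PCM
	binary.LittleEndian.PutUint16(h[20:22], 1)  // audio format 1 = linear PCM
	binary.LittleEndian.PutUint16(h[22:24], uint16(channels))
	binary.LittleEndian.PutUint32(h[24:28], uint32(sampleRate))
	binary.LittleEndian.PutUint32(h[28:32], uint32(byteRate))
	binary.LittleEndian.PutUint16(h[32:34], uint16(blockAlign))
	binary.LittleEndian.PutUint16(h[34:36], bitsPerSample)
	copy(h[36:40], "data")
	binary.LittleEndian.PutUint32(h[40:44], uint32(len(pcm)))
	return append(h, pcm...)
}

func main() {
	pcm := make([]byte, 1920) // one 20 ms frame of 48 kHz mono PCM16 silence
	wav := wrapWAV(pcm, 48000, 1)
	fmt.Println(len(wav), string(wav[:4]), string(wav[8:12])) // 1964 RIFF WAVE
}
```

The `pcm` output encoding would skip this step and forward the samples unchanged.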
## Quick Start
```bash
make lint
go test ./...
make docker-build
```
Apply `Engram.yaml` and place the turn detector between your ingress audio step
and downstream STT/VAD consumers.
## Configuration (`Engram.spec.with`)
| Name | Default | Description |
|--------------------|---------|-------------|
| `frameDurationMs` | `20` | Frame size fed into the VAD (10/20/30 ms supported). |
| `silenceDurationMs`| `400` | Required trailing silence before a turn is closed. |
| `minDurationMs` | `200` | Minimum speech duration for a valid turn. |
| `maxDurationMs` | `10000` | Hard cap for a single buffered turn. |
| `aggressiveness` | `2` | WebRTC VAD mode (0 = relaxed, 3 = very aggressive). |
| `outputEncoding` | `wav` | `wav` wraps the PCM chunk in a WAVE header; `pcm` keeps raw PCM16. |
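
The sketch below shows one way the timing options in the table could interact. It takes per-frame VAD decisions as plain booleans rather than calling a VAD library, and the `Config`, `Turn`, and `Segment` names are hypothetical. It also assumes that a turn keeps its trailing silence and that an open turn is flushed at end of input; the engram itself may behave differently.

```go
package main

import "fmt"

// Config mirrors the timing options from the table; the struct is illustrative.
type Config struct {
	FrameDurationMs   int
	SilenceDurationMs int
	MinDurationMs     int
	MaxDurationMs     int
}

// DefaultConfig returns the defaults listed in the table.
func DefaultConfig() Config {
	return Config{FrameDurationMs: 20, SilenceDurationMs: 400, MinDurationMs: 200, MaxDurationMs: 10000}
}

// Validate rejects frame sizes the table does not list and inverted limits.
func (c Config) Validate() error {
	switch c.FrameDurationMs {
	case 10, 20, 30:
	default:
		return fmt.Errorf("frameDurationMs must be 10, 20, or 30, got %d", c.FrameDurationMs)
	}
	if c.MinDurationMs > c.MaxDurationMs {
		return fmt.Errorf("minDurationMs %d exceeds maxDurationMs %d", c.MinDurationMs, c.MaxDurationMs)
	}
	return nil
}

// Turn is a half-open range of frame indices [Start, End).
type Turn struct{ Start, End int }

// Segment walks per-frame speech flags and returns the detected turns.
// cfg is assumed to have passed Validate; durations are rounded down to frames.
func Segment(cfg Config, speech []bool) []Turn {
	silenceFrames := cfg.SilenceDurationMs / cfg.FrameDurationMs
	minFrames := cfg.MinDurationMs / cfg.FrameDurationMs
	maxFrames := cfg.MaxDurationMs / cfg.FrameDurationMs

	var turns []Turn
	start, voiced, silent := -1, 0, 0
	flush := func(end int) {
		if voiced >= minFrames { // drop turns with too little speech
			turns = append(turns, Turn{start, end})
		}
		start, voiced, silent = -1, 0, 0
	}
	for i, isSpeech := range speech {
		switch {
		case isSpeech:
			if start < 0 {
				start = i // first voiced frame opens a turn
			}
			voiced++
			silent = 0
		case start >= 0:
			silent++ // silence inside an open turn
		}
		if start < 0 {
			continue
		}
		// Close on enough trailing silence or when the hard cap is reached.
		if silent >= silenceFrames || i+1-start >= maxFrames {
			flush(i + 1)
		}
	}
	if start >= 0 {
		flush(len(speech))
	}
	return turns
}

func main() {
	cfg := DefaultConfig()
	if err := cfg.Validate(); err != nil {
		panic(err)
	}
	speech := make([]bool, 45) // 900 ms of 20 ms frames
	for i := 5; i < 20; i++ {
		speech[i] = true // 300 ms of speech starting at 100 ms
	}
	fmt.Println(Segment(cfg, speech)) // [{5 40}]
}
```

With the defaults, the 300 ms burst passes the 200 ms minimum and the turn closes once 400 ms (20 frames) of silence follow it.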
## Inputs
Input messages must be the `speech.audio.v1` packets emitted by the
ingress engram (PCM16 mono, typically 48 kHz).
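
As a worked example of the frame arithmetic this implies, the sketch below computes the byte size of one PCM16 mono frame and cuts a buffer into whole frames, carrying the remainder over to the next packet. The helper names `frameBytes` and `splitFrames` are hypothetical and not part of the engram.

```go
package main

import "fmt"

// frameBytes returns the size in bytes of one PCM16 mono frame.
func frameBytes(sampleRate, frameMs int) int {
	const bytesPerSample = 2 // PCM16
	return sampleRate * frameMs / 1000 * bytesPerSample
}

// splitFrames cuts buf into whole frames of the given size and returns the
// leftover bytes, which a caller would prepend to the next incoming packet.
func splitFrames(buf []byte, size int) (frames [][]byte, rest []byte) {
	if size <= 0 {
		return nil, buf
	}
	for len(buf) >= size {
		frames = append(frames, buf[:size])
		buf = buf[size:]
	}
	return frames, buf
}

func main() {
	size := frameBytes(48000, 20)
	fmt.Println(size) // 1920

	// Two and a half 10 ms packets (960 bytes each) of 48 kHz audio.
	frames, rest := splitFrames(make([]byte, 2400), size)
	fmt.Println(len(frames), len(rest)) // 1 480
}
```

At 48 kHz a 10 ms packet is 960 bytes, so two ingress packets fill one default 20 ms VAD frame.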
## Outputs
Output messages have `type=speech.vad.active` and contain the buffered audio
inline plus metadata about the participant, room, duration, and a
monotonically increasing sequence number per speaker.
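
A rough sketch of how such a payload and its per-speaker counter could be modeled follows. The `TurnPayload` field names, the `sequencer` type, and the choice to start counting at 1 are all assumptions made for illustration; the actual wire format is defined by the engram, not by this example.

```go
package main

import (
	"fmt"
	"sync"
)

// TurnPayload is an illustrative shape for a speech.vad.active message.
// The field names are hypothetical.
type TurnPayload struct {
	Type        string
	Room        string
	Participant string
	Sequence    uint64
	DurationMs  int
	Audio       []byte // WAV or raw PCM16, depending on outputEncoding
}

// sequencer hands out a monotonically increasing number per speaker.
type sequencer struct {
	mu   sync.Mutex
	next map[string]uint64
}

func newSequencer() *sequencer {
	return &sequencer{next: make(map[string]uint64)}
}

// Next returns the next sequence number for a speaker, starting at 1.
func (s *sequencer) Next(room, participant string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := room + "/" + participant
	s.next[key]++
	return s.next[key]
}

func main() {
	seq := newSequencer()
	p := TurnPayload{
		Type:        "speech.vad.active",
		Room:        "demo-room",
		Participant: "alice",
		Sequence:    seq.Next("demo-room", "alice"),
		DurationMs:  700,
	}
	fmt.Println(p.Type, p.Participant, p.Sequence)  // speech.vad.active alice 1
	fmt.Println(seq.Next("demo-room", "alice"))     // 2
	fmt.Println(seq.Next("demo-room", "bob"))       // 1
}
```

Keying the counter by room and participant lets a consumer detect dropped or reordered turns for one speaker without being affected by other speakers in the same room.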
## Streaming Mode
| Stream type | Description |
|-------------|-------------|
| `speech.vad.active` | Emitted whenever WebRTC VAD detects a contiguous speech turn. Includes buffered audio. |
Downstream steps can simply filter on the `speech.vad.*` namespace regardless of which
detector produced the signal, enabling mix-and-match pipelines.
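
For instance, a downstream consumer could accept any VAD signal with a simple prefix check. This is a minimal sketch; `isVADSignal` is a hypothetical helper, and real pipelines may express the filter in their step configuration instead.

```go
package main

import (
	"fmt"
	"strings"
)

// isVADSignal reports whether a stream type belongs to the speech.vad.* namespace.
func isVADSignal(streamType string) bool {
	return strings.HasPrefix(streamType, "speech.vad.")
}

func main() {
	for _, t := range []string{"speech.vad.active", "speech.audio.v1"} {
		fmt.Println(t, isVADSignal(t))
	}
}
```

Matching on the trailing dot avoids accepting unrelated types that merely start with `speech.vad`.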
## Local Development
- `make lint`: Run the shared lint and static-analysis checks.
- `go test ./...`: Run the turn-detector test suite.
- `make docker-build`: Build the engram image for local clusters.
- Set `BUBU_DEBUG=true` to log sanitized packet summaries and turn boundaries without dumping raw PCM into the logs.
## Community & Support
- [Contributing](./CONTRIBUTING.md)
- [Support](./SUPPORT.md)
- [Security Policy](./SECURITY.md)
- [Code of Conduct](./CODE_OF_CONDUCT.md)
- [Discord](https://discord.gg/dysrB7D8H6)
## License
Copyright 2025 BubuStack.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.