https://github.com/zserge/govad
Silero VAD (Voice Activity Detector) in Pure Go
https://github.com/zserge/govad
Last synced: 10 days ago
JSON representation
Silero VAD (Voice Activity Detector) in Pure Go
- Host: GitHub
- URL: https://github.com/zserge/govad
- Owner: zserge
- License: mit
- Created: 2026-03-30T15:46:18.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-30T15:54:05.000Z (3 months ago)
- Last Synced: 2026-03-30T17:39:12.948Z (3 months ago)
- Language: Go
- Size: 1.03 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# govad
[](https://github.com/zserge/govad/actions/workflows/ci.yml)
[](https://pkg.go.dev/github.com/zserge/govad)
[](https://goreportcard.com/report/github.com/zserge/govad)
Pure Go voice activity detection using the [Silero VAD](https://github.com/snakers4/silero-vad) neural network.
No CGo. No ONNX runtime. No external dependencies.
The model weights are embedded in the binary.
## Features
- Pure Go inference (~300 lines), zero C dependencies
- Processes 512-sample frames (32 ms at 16 kHz)
- Stateful LSTM — feed frames sequentially, get speech probabilities
- Embedded model weights — no extra files to ship
- Validated against the ONNX reference (max diff < 0.001)
## Installation
```
go get github.com/zserge/govad@latest
```
## Usage
```go
package main
import (
"fmt"
"github.com/zserge/govad"
)
func main() {
// Create a VAD detector (uses embedded weights)
v, err := govad.New()
if err != nil {
panic(err)
}
// Feed 512 float32 samples at 16 kHz per call
samples := make([]float32, govad.SamplesPerFrame)
// ... fill samples from your audio source ...
prob := v.Process(samples)
if prob > 0.5 {
fmt.Println("Speech detected!")
}
// Call Reset() between unrelated audio streams
v.Reset()
}
```
## Live microphone example
The `examples/live-vad` directory contains a complete real-time VAD demo
using [malgo](https://github.com/gen2brain/malgo) (miniaudio bindings):
```
cd examples/live-vad
go run . -threshold 0.5
```
It captures audio from your default microphone and prints speech/silence
transitions in real time.
## API
| Function | Description |
|----------|-------------|
| `govad.New()` | Create a detector with embedded default weights |
| `govad.NewFromFile(path)` | Load weights from a file |
| `govad.NewFromReader(r)` | Load weights from an `io.Reader` |
| `v.Process(samples)` | Run inference on 512 samples, returns probability `[0, 1]` |
| `v.Reset()` | Clear LSTM state for a new audio stream |
## Performance
On Apple M1:
```
BenchmarkProcess-8 1911 632370 ns/op 10112 B/op 7 allocs/op
```
~632 µs per 32 ms frame — roughly 50× faster than real time.
## Model
The weights are exported from `silero_vad_half.onnx` (Silero VAD v5, 16 kHz only).
The architecture is:
```
Audio (512 samples, 16 kHz)
→ Reflect pad (64 right)
→ Conv-STFT (n_fft=256, hop=128)
→ Magnitude spectrum
→ Conv1d(129→128, k=3) + ReLU
→ Conv1d(128→64, k=3, stride=2) + ReLU
→ Conv1d(64→64, k=3, stride=2) + ReLU
→ Conv1d(64→128, k=3) + ReLU
→ LSTMCell(128)
→ ReLU → Linear(128→1) → Sigmoid
→ Speech probability
```
## License
The Go code is MIT licensed. The model weights are from [Silero VAD](https://github.com/snakers4/silero-vad), also MIT licensed.