https://github.com/zserge/govad

Silero VAD (Voice Activity Detector) in Pure Go

Host: GitHub
URL: https://github.com/zserge/govad
Owner: zserge
License: mit
Created: 2026-03-30T15:46:18.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-30T15:54:05.000Z (3 months ago)
Last Synced: 2026-03-30T17:39:12.948Z (3 months ago)
Language: Go
Size: 1.03 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# govad

[![CI](https://github.com/zserge/govad/actions/workflows/ci.yml/badge.svg)](https://github.com/zserge/govad/actions/workflows/ci.yml)

[![GoDoc](https://pkg.go.dev/badge/github.com/zserge/govad.svg)](https://pkg.go.dev/github.com/zserge/govad)

[![Go Report Card](https://goreportcard.com/badge/github.com/zserge/govad)](https://goreportcard.com/report/github.com/zserge/govad)

Pure Go voice activity detection using the [Silero VAD](https://github.com/snakers4/silero-vad) neural network.

No CGo. No ONNX runtime. No external dependencies.

The model weights are embedded in the binary.

## Features

- Pure Go inference (~300 lines), zero C dependencies

- Processes 512-sample frames (32 ms at 16 kHz)

- Stateful LSTM — feed frames sequentially, get speech probabilities

- Embedded model weights — no extra files to ship

- Validated against the ONNX reference (max diff < 0.001)

## Installation

```
go get github.com/zserge/govad@latest
```

## Usage

```go
package main

import (
	"fmt"

	"github.com/zserge/govad"
)

func main() {
	// Create a VAD detector (uses embedded weights)
	v, err := govad.New()
	if err != nil {
		panic(err)
	}

	// Feed 512 float32 samples at 16 kHz per call
	samples := make([]float32, govad.SamplesPerFrame)
	// ... fill samples from your audio source ...
	prob := v.Process(samples)
	if prob > 0.5 {
		fmt.Println("Speech detected!")
	}

	// Call Reset() between unrelated audio streams
	v.Reset()
}
```
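Audio sources often deliver signed 16-bit PCM in buffers of arbitrary length rather than ready-made 512-sample `float32` frames. The following is a minimal sketch of the preparation step, not part of govad itself. The helper names `pcm16ToFloat32` and `splitFrames` are hypothetical, and scaling to the range [-1, 1) is an assumption about the input the detector expects, so check the package documentation before relying on it.

```go
package main

import "fmt"

// samplesPerFrame mirrors the 512-sample frame size stated in this README.
// It is a local copy so the sketch compiles without importing govad.
const samplesPerFrame = 512

// pcm16ToFloat32 scales signed 16-bit PCM to float32 values in [-1, 1).
// Assumption: the detector expects normalized samples.
func pcm16ToFloat32(pcm []int16) []float32 {
	out := make([]float32, len(pcm))
	for i, s := range pcm {
		out[i] = float32(s) / 32768
	}
	return out
}

// splitFrames cuts audio into fixed-size frames and zero-pads the last
// frame when the input length is not a multiple of samplesPerFrame.
func splitFrames(audio []float32) [][]float32 {
	var frames [][]float32
	for start := 0; start < len(audio); start += samplesPerFrame {
		frame := make([]float32, samplesPerFrame)
		copy(frame, audio[start:]) // copy stops at the shorter of the two slices
		frames = append(frames, frame)
	}
	return frames
}

func main() {
	pcm := make([]int16, 1200) // 75 ms of hypothetical 16 kHz audio
	pcm[0] = -32768
	pcm[1] = 16384

	frames := splitFrames(pcm16ToFloat32(pcm))
	fmt.Println(len(frames), frames[0][0], frames[0][1])
}
```

Each resulting frame could then be passed to `v.Process` in order. Zero-padding the tail is one possible choice; dropping the incomplete frame or buffering it until more audio arrives would work as well.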

## Live microphone example

The `examples/live-vad` directory contains a complete real-time VAD demo using [malgo](https://github.com/gen2brain/malgo) (miniaudio bindings):

```
cd examples/live-vad
go run . -threshold 0.5
```

It captures audio from your default microphone and prints speech/silence transitions in real time.
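Turning per-frame probabilities into speech/silence transitions can be sketched as a small state machine. The code below is an illustration only and is not taken from the demo: it uses two thresholds (hysteresis) so that probabilities hovering around a single cutoff do not cause rapid flipping, whereas the demo exposes a single `-threshold` flag. The names `Transition` and `detectTransitions` and the threshold values are hypothetical.

```go
package main

import "fmt"

// Transition records the frame index at which the state changed.
type Transition struct {
	Frame  int
	Speech bool
}

// detectTransitions switches to speech when a probability reaches
// onThreshold and back to silence when it drops below offThreshold.
// With offThreshold < onThreshold, values in between keep the current state.
func detectTransitions(probs []float32, onThreshold, offThreshold float32) []Transition {
	var out []Transition
	speaking := false
	for i, p := range probs {
		switch {
		case !speaking && p >= onThreshold:
			speaking = true
			out = append(out, Transition{Frame: i, Speech: true})
		case speaking && p < offThreshold:
			speaking = false
			out = append(out, Transition{Frame: i, Speech: false})
		}
	}
	return out
}

func main() {
	// Hypothetical probabilities, one per 32 ms frame.
	probs := []float32{0.1, 0.6, 0.45, 0.7, 0.2, 0.1}
	for _, t := range detectTransitions(probs, 0.5, 0.35) {
		// Frame index times 32 gives the offset in milliseconds at 16 kHz.
		fmt.Printf("t=%dms speech=%v\n", t.Frame*32, t.Speech)
	}
}
```

Setting both thresholds to the same value reduces this to a plain single-threshold comparison like the `prob > 0.5` check in the usage example.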

## API

| Function | Description |
|----------|-------------|
| `govad.New()` | Create a detector with embedded default weights |
| `govad.NewFromFile(path)` | Load weights from a file |
| `govad.NewFromReader(r)` | Load weights from an `io.Reader` |
| `v.Process(samples)` | Run inference on 512 samples, returns probability `[0, 1]` |
| `v.Reset()` | Clear LSTM state for a new audio stream |
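Because the detector carries LSTM state between calls, code that handles several unrelated recordings needs to call `Reset` at each boundary. The sketch below shows that pattern against a hypothetical `Detector` interface that mirrors the two methods in the table. The interface, the `float32` return type of `Process`, and the `countingDetector` stand-in are assumptions made so the example runs without the library.

```go
package main

import "fmt"

// Detector is a hypothetical interface with the two stateful methods
// listed in the API table. Assumption: Process returns a float32.
type Detector interface {
	Process(samples []float32) float32
	Reset()
}

// scoreStreams scores each stream independently. It resets the detector
// before every stream so state from one recording cannot leak into the next.
func scoreStreams(d Detector, streams [][][]float32) [][]float32 {
	results := make([][]float32, len(streams))
	for i, frames := range streams {
		d.Reset()
		for _, frame := range frames {
			results[i] = append(results[i], d.Process(frame))
		}
	}
	return results
}

// countingDetector is a stand-in that makes state visible: it returns the
// number of frames seen since the last Reset instead of a probability.
type countingDetector struct{ seen int }

func (c *countingDetector) Process(_ []float32) float32 {
	c.seen++
	return float32(c.seen)
}

func (c *countingDetector) Reset() { c.seen = 0 }

func main() {
	frame := make([]float32, 512)
	streams := [][][]float32{{frame, frame}, {frame}}
	fmt.Println(scoreStreams(&countingDetector{}, streams))
}
```

If the value returned by `govad.New()` has these method signatures, it would satisfy the interface and could be passed in place of the stand-in.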

## Performance

On Apple M1:

```
BenchmarkProcess-8    1911    632370 ns/op    10112 B/op    7 allocs/op
```

~632 µs per 32 ms frame — roughly 50× faster than real time.

## Model

The weights are exported from `silero_vad_half.onnx` (Silero VAD v5, 16 kHz only).

The architecture is:

```
Audio (512 samples, 16 kHz)
  → Reflect pad (64 right)
  → Conv-STFT (n_fft=256, hop=128)
  → Magnitude spectrum
  → Conv1d(129→128, k=3) + ReLU
  → Conv1d(128→64,  k=3, stride=2) + ReLU
  → Conv1d(64→64,   k=3, stride=2) + ReLU
  → Conv1d(64→128,  k=3) + ReLU
  → LSTMCell(128)
  → ReLU → Linear(128→1) → Sigmoid
  → Speech probability
```
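The first stage of the pipeline, reflect padding on the right, can be illustrated in a few lines. This sketch follows the common "reflect" convention in which the signal is mirrored around its last sample without repeating that sample. Whether govad implements the step exactly this way is an assumption, and `reflectPadRight` is a hypothetical name.

```go
package main

import "fmt"

// reflectPadRight returns x followed by n samples mirrored around the
// last element, excluding the last element itself. It requires n < len(x).
func reflectPadRight(x []float32, n int) []float32 {
	if n >= len(x) {
		panic("reflectPadRight: pad must be smaller than the input length")
	}
	out := make([]float32, len(x)+n)
	copy(out, x)
	for i := 0; i < n; i++ {
		out[len(x)+i] = x[len(x)-2-i]
	}
	return out
}

func main() {
	// Small input to make the mirroring visible.
	fmt.Println(reflectPadRight([]float32{1, 2, 3, 4, 5}, 2)) // [1 2 3 4 5 4 3]

	// A 512-sample frame padded by 64 samples on the right.
	frame := make([]float32, 512)
	fmt.Println(len(reflectPadRight(frame, 64))) // 576
}
```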

## License

The Go code is MIT licensed. The model weights are from [Silero VAD](https://github.com/snakers4/silero-vad), also MIT licensed.
