https://github.com/nitely/nim-silero-vad

Host: GitHub
URL: https://github.com/nitely/nim-silero-vad
Owner: nitely
License: mit
Created: 2024-12-11T21:31:38.000Z (22 days ago)
Default Branch: master
Last Pushed: 2024-12-11T22:17:43.000Z (22 days ago)
Last Synced: 2024-12-11T22:29:04.139Z (22 days ago)
Language: C++
Size: 0 Bytes
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

# Silero VAD

Voice Activity Detection using the [Silero VAD](https://github.com/snakers4/silero-vad) ONNX model. Port of [silero-vad-go](https://github.com/streamer45/silero-vad-go).

## Install

```
nimble install silerovad
```

## Compatibility

- Nim 2.2.0+

- [ONNX Runtime v1.20.1](https://github.com/microsoft/onnxruntime/releases/tag/v1.20.1)

## Install ONNX

Download and install the [ONNX Runtime dynamic library](https://github.com/microsoft/onnxruntime/releases/tag/v1.20.1) (v1.20.1).

Compile with `-d:silerovadNoDynLib` if you want to avoid dynamic linking.

## Usage

```nim
import pkg/silerovad

let samples = readWav("./samples/jfk.wav")
let cfg = newDetectorConfig(
  modelPath = "./models/silero_vad.onnx",
  sampleRate = 16000,
  threshold = 0.5,
  minSilenceDurationMs = 100,
  speechPadMs = 30,
  logLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_WARNING
)
var dtr = newDetector(cfg)
doAssert dtr.detect(samples) ==
  @[
    Segment(startAt: 0.29, endAt: 2.238),  # And so my fellow Americans
    Segment(startAt: 3.586, endAt: 3.774),  # ask
    Segment(startAt: 4.002, endAt: 4.382),  # not
    Segment(startAt: 5.378, endAt: 7.678),  # what your country can do for you
    Segment(startAt: 8.162, endAt: 10.654)  # ask what you can do for your country
  ]
```

Note: the last segment's `endAt` is 0 if the audio does not end in silence.
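
One way to consume the detected segments is to map them onto sample indices, treating an `endAt` of 0 as "speech runs to the end of the data". The sketch below is not part of the library: `toSampleRange` is a hypothetical helper, the local `Segment` type only mirrors the two fields shown in the usage example, and the 16 kHz rate is taken from the configuration above.

```nim
import std/math

type
  # Local stand-in mirroring the fields used in the usage example (assumption);
  # in real code, use the Segment type exported by silerovad instead.
  Segment = object
    startAt, endAt: float

const sampleRate = 16000  # matches the sampleRate passed to the detector config

proc toSampleRange(seg: Segment, totalSamples: int): Slice[int] =
  ## Map a segment given in seconds to an inclusive range of sample indices.
  ## An endAt of 0 is interpreted as "speech continues to the end of the data".
  let first = clamp(int(round(seg.startAt * sampleRate.float)), 0, totalSamples)
  let stop =
    if seg.endAt == 0.0: totalSamples
    else: clamp(int(round(seg.endAt * sampleRate.float)), first, totalSamples)
  first .. stop - 1
```

Assuming `samples` is a `seq`, the speech for one segment could then be taken with `samples[toSampleRange(seg, samples.len)]`.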

## Examples

- [Real-time speech to text](https://github.com/nitely/speech-to-text/blob/master/src/app.nim)

## Notes

This library expects mono audio with a 16 kHz sample rate.

Use this command to convert audio files into the expected format:

```
ffmpeg -i audio_src.wav -ar 16000 -ac 1 audio_dest.wav
```
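
To confirm that a file is already in the expected format before running detection, the WAV header can be inspected. The following is a minimal sketch, not something the library provides: `isExpectedFormat`, `le16`, and `le32` are hypothetical helpers, and the code assumes the canonical RIFF/WAVE layout in which the `fmt ` chunk directly follows the `WAVE` tag. Files that place other chunks first are reported as not matching.

```nim
proc le16(s: string, pos: int): int =
  ## Read a little-endian unsigned 16-bit value at byte offset pos.
  s[pos].ord or (s[pos + 1].ord shl 8)

proc le32(s: string, pos: int): int =
  ## Read a little-endian unsigned 32-bit value at byte offset pos.
  le16(s, pos) or (le16(s, pos + 2) shl 16)

proc isExpectedFormat(header: string): bool =
  ## True when a canonical WAV header describes mono audio at 16 kHz.
  ## Offsets: channel count at byte 22, sample rate at byte 24.
  header.len >= 28 and
    header[0 .. 3] == "RIFF" and
    header[8 .. 11] == "WAVE" and
    header[12 .. 15] == "fmt " and
    le16(header, 22) == 1 and
    le32(header, 24) == 16000
```

For example, `isExpectedFormat(readFile("audio_dest.wav"))` reads the whole file as a string and checks only its first 28 bytes.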

## LICENSE

MIT