https://github.com/numq/voice-activity-detection
JVM library for voice activity detection written in Kotlin based on C library fvad and Silero
- Host: GitHub
- URL: https://github.com/numq/voice-activity-detection
- Owner: numq
- License: apache-2.0
- Created: 2024-11-25T22:56:14.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2025-03-01T15:44:25.000Z (7 months ago)
- Last Synced: 2025-03-01T16:31:05.289Z (7 months ago)
- Topics: cpp, fvad, java, jni, jvm, kotlin, libfvad, ml, onnx, silero, silero-vad, vad, voice-activity-detection
- Language: Kotlin
- Homepage:
- Size: 2.49 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Voice Activity Detection
JVM library for voice activity detection written in Kotlin based on the C
library [libfvad](https://github.com/dpirch/libfvad) and ML model [Silero](https://github.com/snakers4/silero-vad)
### See also
- [Stretch](https://github.com/numq/stretch) *to change the speed of audio without changing the pitch*
- [Speech recognition](https://github.com/numq/speech-recognition) *to transcribe audio to text*
- [Speech generation](https://github.com/numq/speech-generation) *to generate voice audio from text*
- [Text generation](https://github.com/numq/text-generation) *to generate text from prompt*
- [Noise reduction](https://github.com/numq/noise-reduction) *to remove noise from audio*
## When to use
> [!NOTE]
> For best results, it is recommended to apply noise reduction to the input data.
### libfvad
Detects any audio activity, regardless of the sound type. The detection behavior depends on the selected mode. Suitable
for general voice activity detection.
### Silero
Detects voice activity specifically containing human speech. Best for speech-focused tasks like transcription and
voice-controlled systems.
## Features
- Detects voice activity in PCM audio data
- Supports any sampling rate and number of channels due to resampling and downmixing
- Supports different detection modes to balance between sensitivity and accuracy (fvad)
## Installation
- Download the latest [release](https://github.com/numq/voice-activity-detection/releases)
- Add the library as a dependency
```kotlin
dependencies {
    // A local JAR is declared with files(...), which yields a file collection
    implementation(files("/path/to/jar"))
}
```
### libfvad
- Unzip binaries
### Silero
- Add the ONNX Runtime dependency
```kotlin
dependencies {
    implementation("com.microsoft.onnxruntime:onnxruntime:1.20.0")
}
```
## Usage
> See the [example](example) module for implementation details
### TL;DR
- Call `detect` to process the input data; pass `isContinuous = true` for streaming audio
### Step-by-step
- Load binaries if you are going to use fvad
```kotlin
VoiceActivityDetection.Fvad.load(
    libfvad = "/path/to/libfvad",
    voiceActivityDetection = "/path/to/voice-activity-detection"
)
```
- Create an instance
### fvad
```kotlin
VoiceActivityDetection.Fvad.create()
```
### Silero
```kotlin
VoiceActivityDetection.Silero.create()
```
- Call `inputSizeForMillis` to get the input data size for N milliseconds
- Call `minimumInputSize` to get the audio producer buffer size for real-time detection
- Call `detect` passing the input data, sample rate and number of channels as arguments
- Call `reset` to reset the internal state, for example when the audio source changes
- Call `close` to release resources
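
To make the relationship between milliseconds, sample rate, channel count and buffer size concrete, here is a minimal sketch of the underlying arithmetic. It is not the library's implementation: `pcmBytesForMillis` and `chunkPcm` are hypothetical helper names, and the code assumes 16-bit interleaved PCM, which the README does not state. In real code, prefer the sizes returned by `inputSizeForMillis` and `minimumInputSize`.

```kotlin
// Illustrative helpers only; these are NOT part of the library's API.
// Assumption: samples are 16-bit (2 bytes each) and channels are interleaved.
const val BYTES_PER_SAMPLE = 2

/** Number of bytes holding [millis] milliseconds of interleaved 16-bit PCM. */
fun pcmBytesForMillis(sampleRate: Int, channels: Int, millis: Int): Int {
    require(sampleRate > 0 && channels > 0 && millis > 0) { "arguments must be positive" }
    // Widen to Long so the intermediate product cannot overflow Int.
    val frames = sampleRate.toLong() * millis / 1000
    return (frames * channels * BYTES_PER_SAMPLE).toInt()
}

/** Splits [pcm] into consecutive chunks of [chunkSize] bytes; a trailing partial chunk is dropped. */
fun chunkPcm(pcm: ByteArray, chunkSize: Int): List<ByteArray> {
    require(chunkSize > 0) { "chunkSize must be positive" }
    return (0 until pcm.size / chunkSize).map { i ->
        pcm.copyOfRange(i * chunkSize, (i + 1) * chunkSize)
    }
}
```

For example, 20 ms of 48 kHz stereo audio is 960 frames, or 3840 bytes under these assumptions, so one second of such audio splits into 50 chunks. Each chunk could then be passed to `detect` together with the sample rate and channel count.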
## Requirements
- JVM version 9 or higher
## License
This project is licensed under the [Apache License 2.0](LICENSE)
## Acknowledgments
- [libfvad](https://github.com/dpirch/libfvad)
- [Silero](https://github.com/snakers4/silero-vad)