https://github.com/helloooideeeeea/realtimecutvadlibrary
A real-time Voice Activity Detection (VAD) library for iOS and macOS using Silero models powered by ONNX Runtime. Includes advanced noise suppression and audio preprocessing with WebRTC APM, supporting seamless WAV data output with header metadata.
ios macos onnxruntime silero-vad vad webrtc-audio-processing
- Host: GitHub
- URL: https://github.com/helloooideeeeea/realtimecutvadlibrary
- Owner: helloooideeeeea
- License: mit
- Created: 2025-02-03T12:55:14.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-04-06T08:34:39.000Z (6 months ago)
- Last Synced: 2025-04-12T08:59:16.778Z (6 months ago)
- Topics: ios, macos, onnxruntime, silero-vad, vad, webrtc-audio-processing
- Language: Swift
- Homepage:
- Size: 4.02 MB
- Stars: 7
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# RealTime Silero VAD iOS/macOS Library
A real-time Voice Activity Detection (VAD) library for iOS and macOS using Silero models. This library helps detect human voice in real-time, allowing developers to implement efficient voice-based features in their applications.
---
## Features
- **Real-time Voice Activity Detection (VAD)**
- **Supports Silero Model Versions v4 and v5**
- **Customizable audio sample rates**
- **Outputs WAV data with automatic sample rate conversion to 16 kHz**
- **iOS and macOS support**
- **Supports CocoaPods and Swift Package Manager (SPM)**
- **🆕 Real-time PCM stream callback (`voiceDidContinueWithPCMFloatData`)**

---
## Sample iOS App Demo
Check out the sample iOS app demonstrating real-time VAD:
[Sample iOS App Demo](https://github.com/user-attachments/assets/6e4d6ae5-4d34-4114-930b-f399bcf123ba)
---
## Installation
### Using CocoaPods
Add the following to your `Podfile` to integrate the library:
```ruby
pod 'RealTimeCutVADLibrary', '~> 1.0.9'
```
Then, run:
```bash
pod install
```

### Using Swift Package Manager (SPM)
You can also integrate the library using Swift Package Manager. Add the following to your `Package.swift` file:
```swift
dependencies: [
    .package(url: "https://github.com/helloooideeeeea/RealTimeCutVADLibrary.git", from: "1.0.9")
]
```
Or, add the URL directly through Xcode's **File > Swift Packages > Add Package Dependency**.
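For context, a complete manifest using this dependency might look like the sketch below. The package and target name `MyApp` are hypothetical, and the product name `RealTimeCutVADLibrary` is an assumption based on the repository name; check the library's own `Package.swift` for the actual product name.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    dependencies: [
        .package(url: "https://github.com/helloooideeeeea/RealTimeCutVADLibrary.git", from: "1.0.9")
    ],
    targets: [
        .target(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Assumption: the product is named after the repository.
                .product(name: "RealTimeCutVADLibrary", package: "RealTimeCutVADLibrary")
            ]
        )
    ]
)
```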
---
## Usage
Import the library and set up VAD in your `ViewController`:
```swift
import RealTimeCutVADLibrary

class ViewController: UIViewController {
    var vadManager: VADWrapper?

    override func viewDidLoad() {
        super.viewDidLoad()

        // Initialize VAD Manager
        vadManager = VADWrapper()

        // Set VAD delegate to receive callbacks
        vadManager?.delegate = self

        // Set Silero model version (v4 or v5). Version v5 is recommended.
        vadManager?.setSileroModel(.v5)

        // Calling setVADThreshold is optional. If not called, the recommended default values will be used.
        // vadManager?.setThresholdWithVadStartDetectionProbability(0.7,0.7,0.5,0.95,10,57)

        // Set audio sample rate (8, 16, 24, or 48 kHz)
        vadManager?.setSamplerate(.SAMPLERATE_48)

        // Retrieve audio channel data from Microphone
        guard let channelData = buffer.floatChannelData else {
            return
        }

        // Extract frame length from the audio buffer
        let frameLength = UInt(buffer.frameLength)

        // Select the first channel as mono audio data
        let monoralData = channelData[0] // This is UnsafeMutablePointer<Float>

        // Send the audio data directly to VAD processing
        vadManager?.processAudioData(withBuffer: monoralData, count: frameLength)

        // ❌ Deprecated Usage. Do NOT use this method: Slow due to NSNumber conversion
        var monoralDataArray: [NSNumber] = []
for i in 0..,
vadEndDetectionProbability: <#T##Float#>,
voiceStartVadTrueRatio: <#T##Float#>,
voiceEndVadFalseRatio: <#T##Float#>,
voiceStartFrameCount: <#T##Int32#>,
voiceEndFrameCount: <#T##Int32#>)
```
By adjusting these parameters, you can fine-tune the strictness of voice segmentation to better suit your application needs.
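To make the role of the six parameters more concrete, here is a minimal, self-contained sketch of a start/end detector driven by per-frame speech probabilities. It is an assumption about how such thresholds could interact, not the library's actual implementation: `SegmenterThresholds`, `SegmentEvent`, and `VoiceSegmenter` are hypothetical types, and only the parameter names and the default values (0.7, 0.7, 0.5, 0.95, 10, 57) are taken from the usage example above.

```swift
// Hypothetical sketch: none of these types are part of RealTimeCutVADLibrary.
struct SegmenterThresholds {
    var vadStartDetectionProbability: Float = 0.7  // a frame counts as voiced at or above this
    var vadEndDetectionProbability: Float = 0.7    // a frame counts as silent below this
    var voiceStartVadTrueRatio: Float = 0.5        // share of voiced frames needed to start
    var voiceEndVadFalseRatio: Float = 0.95        // share of silent frames needed to end
    var voiceStartFrameCount: Int = 10             // window length while waiting for speech
    var voiceEndFrameCount: Int = 57               // window length while waiting for silence
}

enum SegmentEvent: Equatable { case started, ended }

struct VoiceSegmenter {
    let thresholds: SegmenterThresholds
    private(set) var isSpeaking = false
    private var recent: [Float] = []  // most recent per-frame probabilities

    init(thresholds: SegmenterThresholds = SegmenterThresholds()) {
        self.thresholds = thresholds
    }

    /// Feeds one per-frame speech probability and returns a state change, if any.
    mutating func push(_ probability: Float) -> SegmentEvent? {
        let window = isSpeaking ? thresholds.voiceEndFrameCount
                                : thresholds.voiceStartFrameCount
        recent.append(probability)
        if recent.count > window { recent.removeFirst(recent.count - window) }
        // Decide only once a full window of frames is available.
        guard recent.count == window else { return nil }

        if !isSpeaking {
            let voiced = recent.filter { $0 >= thresholds.vadStartDetectionProbability }.count
            if Float(voiced) / Float(window) >= thresholds.voiceStartVadTrueRatio {
                isSpeaking = true
                recent.removeAll()
                return .started
            }
        } else {
            let silent = recent.filter { $0 < thresholds.vadEndDetectionProbability }.count
            if Float(silent) / Float(window) >= thresholds.voiceEndVadFalseRatio {
                isSpeaking = false
                recent.removeAll()
                return .ended
            }
        }
        return nil
    }
}
```

Under this reading, raising `voiceEndFrameCount` or `voiceEndVadFalseRatio` makes the segmenter wait for a longer or cleaner stretch of silence before closing a segment, while lowering `voiceStartVadTrueRatio` makes it open a segment more readily.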
- **Silero v5 Performance**:
The performance of Silero model v5 may vary, and adjusting the thresholds might be necessary to achieve optimal results. There are also discussions on this topic, such as [this one](https://github.com/SYSTRAN/faster-whisper/issues/934#issuecomment-2439340290).

## Algorithm Explanation
### ONNX Runtime for Silero VAD
This library leverages **ONNX Runtime (C++)** to run the Silero VAD models efficiently. By utilizing ONNX Runtime, the library achieves high-performance inference across different platforms (iOS/macOS), ensuring fast and accurate voice activity detection.

### Why Use WebRTC's Audio Processing Module (APM)?
This library utilizes WebRTC's APM for several key reasons:
- **High-pass Filtering**: Removes low-frequency noise.
- **Noise Suppression**: Reduces background noise for clearer voice detection.
- **Gain Control**: Adaptive digital gain control enhances audio levels.
- **Sample Rate Conversion**: Silero VAD requires a sample rate of 16 kHz, and APM ensures conversion from other sample rates (8, 24, or 48 kHz).

### Audio Processing Workflow
1. **Input Audio Configuration**: The library supports sample rates of 8 kHz, 16 kHz, 24 kHz, and 48 kHz.
2. **Audio Preprocessing**:
   - The audio is split into chunks based on the sample rate.
   - APM processes these chunks with filters and gain adjustments.
   - Audio is converted to 16 kHz for Silero VAD compatibility.
3. **Voice Activity Detection**:
   - The processed audio chunks are passed to Silero VAD.
   - VAD outputs a probability score indicating voice activity.
4. **Algorithm for Voice Detection**:
   - **Voice Start Detection**: When the VAD probability exceeds the threshold, a pre-buffer stores audio frames to capture speech onset.
   - **Voice End Detection**: Once silence is detected over a set number of frames, recording stops, and the audio is output as WAV data.
5. **Output**:
   - The resulting audio data is provided as WAV with a sample rate of 16 kHz.

### WebRTC APM Configuration
The following configurations are applied to optimize voice detection:
```cpp
config.gain_controller1.enabled = true;
config.gain_controller1.mode = webrtc::AudioProcessing::Config::GainController1::kAdaptiveDigital;
config.gain_controller2.enabled = true;
config.high_pass_filter.enabled = true;
config.noise_suppression.enabled = true;
config.transient_suppression.enabled = true;
config.voice_detection.enabled = false;
```

---
## Additional Resources
- [RealTimeCutVADCXXLibrary](https://github.com/helloooideeeeea/RealTimeCutVADCXXLibrary)
---
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.