https://github.com/thewh1teagle/pyannote-rs
pyannote audio diarization in rust
- Host: GitHub
- URL: https://github.com/thewh1teagle/pyannote-rs
- Owner: thewh1teagle
- License: mit
- Created: 2024-07-31T18:38:31.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-13T11:38:44.000Z (5 months ago)
- Last Synced: 2025-02-04T07:38:32.952Z (4 months ago)
- Topics: asr, diarization, onnxruntime, rust, speech-recognition, whisper
- Language: Rust
- Homepage: http://crates.io/crates/pyannote-rs
- Size: 141 KB
- Stars: 49
- Watchers: 3
- Forks: 4
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# pyannote-rs
Pyannote audio diarization in Rust
## Features
- Process 1 hour of audio in under a minute on CPU.
- Faster inference with DirectML on Windows and CoreML on macOS.
- Accurate timestamps with Pyannote segmentation.
- Identify speakers with wespeaker embeddings.

## Install
```console
cargo add pyannote-rs
```

## Usage
See [Building](BUILDING.md)
## Examples
See [examples](examples)
## How it works
pyannote-rs uses 2 models for speaker diarization:
1. **Segmentation**: [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) identifies when speech occurs.
2. **Speaker Identification**: [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM) identifies who is speaking.

Inference is powered by [onnxruntime](https://onnxruntime.ai/).
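To make the "when" stage more concrete, here is a minimal sketch of how per-frame speech decisions could be merged into time-stamped segments. It assumes the segmentation output has already been reduced to one speech/non-speech flag per frame and that frames are spaced by a fixed step. `Segment`, `frames_to_segments`, and `frame_step_secs` are hypothetical names for illustration, not part of the pyannote-rs API, and the step value used below is arbitrary.

```rust
/// A contiguous stretch of speech, in seconds (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    start: f64,
    end: f64,
}

/// Merge runs of consecutive speech frames into segments.
/// Assumes one bool per frame and a fixed, hypothetical `frame_step_secs`.
fn frames_to_segments(is_speech: &[bool], frame_step_secs: f64) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut run_start: Option<usize> = None;

    for (i, &speech) in is_speech.iter().enumerate() {
        match (speech, run_start) {
            // Silence -> speech: remember where the run began.
            (true, None) => run_start = Some(i),
            // Speech -> silence: close the current run.
            (false, Some(start)) => {
                segments.push(Segment {
                    start: start as f64 * frame_step_secs,
                    end: i as f64 * frame_step_secs,
                });
                run_start = None;
            }
            // Still in speech, or still in silence: nothing to do.
            _ => {}
        }
    }

    // A run that lasts until the final frame is closed at the end of the input.
    if let Some(start) = run_start {
        segments.push(Segment {
            start: start as f64 * frame_step_secs,
            end: is_speech.len() as f64 * frame_step_secs,
        });
    }
    segments
}

fn main() {
    let flags = [false, true, true, false, false, true];
    for seg in frames_to_segments(&flags, 0.5) {
        println!("{:.1}s - {:.1}s", seg.start, seg.end);
    }
    // Prints "0.5s - 1.5s" then "2.5s - 3.0s".
}
```

Each resulting segment marks a span of audio that could then be passed to the embedding model to decide who is speaking.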
- The segmentation model processes audio in windows of up to 10 s, sliding over longer recordings one chunk at a time.
- The embedding model processes filter banks (audio features) extracted with [knf-rs](https://github.com/thewh1teagle/knf-rs).

Speaker comparison (e.g., determining whether Alice spoke again) is done using cosine similarity.
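The windowing and speaker-comparison steps can be sketched in a few lines of Rust. This is an illustrative sketch under stated assumptions, not the crate's code: it assumes non-overlapping 10 s windows over 16 kHz mono samples, keeps one reference embedding per speaker, and uses a hypothetical similarity threshold of 0.5. `split_into_windows`, `cosine_similarity`, and `SpeakerTracker` are hypothetical names.

```rust
/// Split mono `samples` into consecutive, non-overlapping windows of
/// `window_secs` seconds (an assumption; the last window may be shorter).
fn split_into_windows(samples: &[f32], sample_rate: usize, window_secs: usize) -> Vec<&[f32]> {
    let window_len = sample_rate * window_secs;
    assert!(window_len > 0, "window length must be non-zero");
    samples.chunks(window_len).collect()
}

/// Cosine similarity between two embeddings; returns 0.0 if either is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical online tracker: stores the first embedding seen for each speaker.
struct SpeakerTracker {
    threshold: f32,
    speakers: Vec<Vec<f32>>,
}

impl SpeakerTracker {
    fn new(threshold: f32) -> Self {
        Self { threshold, speakers: Vec::new() }
    }

    /// Return the index of the most similar known speaker, or register a
    /// new speaker if no similarity reaches the threshold.
    fn assign(&mut self, embedding: &[f32]) -> usize {
        let best = self
            .speakers
            .iter()
            .enumerate()
            .map(|(i, s)| (i, cosine_similarity(s, embedding)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((i, sim)) if sim >= self.threshold => i,
            _ => {
                self.speakers.push(embedding.to_vec());
                self.speakers.len() - 1
            }
        }
    }
}

fn main() {
    // 25 s of silence at 16 kHz as stand-in audio: windows of 10 s, 10 s, 5 s.
    let sample_rate = 16_000;
    let audio = vec![0.0f32; sample_rate * 25];
    let windows = split_into_windows(&audio, sample_rate, 10);
    println!("{} windows", windows.len()); // 3 windows

    // Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
    let alice: [f32; 3] = [1.0, 0.0, 0.2];
    let bob: [f32; 3] = [0.0, 1.0, 0.1];
    let alice_again: [f32; 3] = [0.9, 0.1, 0.2];

    let mut tracker = SpeakerTracker::new(0.5); // hypothetical threshold
    for embedding in [&alice[..], &bob[..], &alice_again[..]] {
        println!("speaker {}", tracker.assign(embedding));
    }
    // Prints speaker 0, speaker 1, speaker 0: Alice is matched when she speaks again.
}
```

Keeping only the first embedding per speaker is the simplest possible choice; the threshold controls the trade-off between merging distinct speakers and splitting one speaker into several labels.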
## Credits
Big thanks to [pyannote-onnx](https://github.com/pengzhendong/pyannote-onnx) and [kaldi-native-fbank](https://github.com/csukuangfj/kaldi-native-fbank).