https://github.com/thewh1teagle/pyannote-rs
pyannote audio diarization in Rust
- Host: GitHub
- URL: https://github.com/thewh1teagle/pyannote-rs
- Owner: thewh1teagle
- License: MIT
- Created: 2024-07-31T18:38:31.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-09-09T23:05:19.000Z (2 months ago)
- Last Synced: 2024-09-27T06:22:13.453Z (about 2 months ago)
- Topics: asr, diarization, onnxruntime, rust, speech-recognition, whisper
- Language: Rust
- Homepage: http://crates.io/crates/pyannote-rs
- Size: 69.3 KB
- Stars: 15
- Watchers: 2
- Forks: 0
- Open Issues: 4
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
# pyannote-rs
[![Crates](https://img.shields.io/crates/v/pyannote-rs?logo=rust)](https://crates.io/crates/pyannote-rs/)
[![License](https://img.shields.io/github/license/thewh1teagle/pyannote-rs?color=00aaaa&logo=license)](https://github.com/thewh1teagle/pyannote-rs/blob/main/LICENSE)

Pyannote audio diarization in Rust
## Features
- Process 1 hour of audio in less than a minute on CPU.
- Faster performance with DirectML on Windows and CoreML on macOS.
- Accurate timestamps with Pyannote segmentation.
- Identify speakers with wespeaker embeddings.

## Install
```console
cargo add pyannote-rs
```

## Usage
See [Building](BUILDING.md)
## Examples
See [examples](examples)
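
For orientation before opening the examples, here is a minimal sketch of what a diarization loop might look like. Everything named below (`read_wav`, `segment`, `EmbeddingExtractor`, `EmbeddingManager`, the segment fields, and the model file names) is an assumption pieced together from this README, not a verified API; treat the [examples](examples) directory as authoritative.

```rust
// A minimal sketch, not the verified API: read_wav, segment,
// EmbeddingExtractor, EmbeddingManager, the segment fields, and the
// model file names are assumptions; see the examples directory.
use pyannote_rs::{EmbeddingExtractor, EmbeddingManager};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 16 kHz mono WAV; the ONNX model files are downloaded separately.
    let (samples, sample_rate) = pyannote_rs::read_wav("meeting.wav")?;

    // Step 1: segmentation-3.0 finds when speech occurs.
    let segments = pyannote_rs::segment(&samples, sample_rate, "segmentation-3.0.onnx")?;

    // Step 2: wespeaker embeddings identify who is speaking.
    let mut extractor = EmbeddingExtractor::new("wespeaker.onnx")?;
    let mut manager = EmbeddingManager::new(usize::MAX);

    for seg in segments {
        let embedding: Vec<f32> = extractor.compute(&seg.samples)?.collect();
        // Match against known speakers by cosine similarity (0.5 threshold
        // assumed); unmatched embeddings register a new speaker.
        let speaker = manager.search_speaker(embedding, 0.5);
        println!("{:.2}s..{:.2}s -> {:?}", seg.start, seg.end, speaker);
    }
    Ok(())
}
```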
## How it works
pyannote-rs uses two models for speaker diarization:
1. **Segmentation**: [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) identifies when speech occurs.
2. **Speaker Identification**: [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM) identifies who is speaking.

Inference is powered by [onnxruntime](https://onnxruntime.ai/).
- The segmentation model processes up to 10 s of audio at a time, using a sliding-window approach (iterating in chunks; see the sketches after this list).
- The embedding model processes filter banks (audio features) extracted with [knf-rs](https://github.com/thewh1teagle/knf-rs).

Speaker comparison (e.g., determining if Alice spoke again) is done using cosine similarity.
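
That comparison is plain cosine similarity between the two embedding vectors. A self-contained sketch of the math (the crate's actual implementation and threshold may differ):

```rust
/// Cosine similarity between two speaker embeddings: 1.0 means the same
/// direction, values near 0 mean unrelated. Sketch only; the crate's own
/// comparison code may differ.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```

Two segments are attributed to the same speaker when the similarity of their embeddings clears a chosen threshold.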
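
The sliding-window iteration from the first bullet can be sketched just as briefly; this is illustrative only, not the crate's internal code:

```rust
/// Illustrative only: walk a long recording in the fixed-size windows the
/// segmentation model consumes (10 s of samples per chunk). The last chunk
/// may be shorter and would typically be zero-padded before inference.
fn ten_second_chunks(samples: &[f32], sample_rate: usize) -> std::slice::Chunks<'_, f32> {
    samples.chunks(10 * sample_rate)
}
```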
## Credits
Big thanks to [pyannote-onnx](https://github.com/pengzhendong/pyannote-onnx) and [kaldi-native-fbank](https://github.com/csukuangfj/kaldi-native-fbank).