{"id":15136464,"url":"https://github.com/thewh1teagle/pyannote-rs","last_synced_at":"2025-10-23T11:31:28.536Z","repository":{"id":251953548,"uuid":"836390054","full_name":"thewh1teagle/pyannote-rs","owner":"thewh1teagle","description":"pyannote audio diarization in rust","archived":false,"fork":false,"pushed_at":"2024-12-13T11:38:44.000Z","size":144,"stargazers_count":49,"open_issues_count":7,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-04T07:38:32.952Z","etag":null,"topics":["asr","diarization","onnxruntime","rust","speech-recognition","whisper"],"latest_commit_sha":null,"homepage":"http://crates.io/crates/pyannote-rs","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thewh1teagle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-31T18:38:31.000Z","updated_at":"2025-01-30T09:39:33.000Z","dependencies_parsed_at":"2024-08-06T20:14:36.055Z","dependency_job_id":"21638de2-8d67-4aaf-bf5e-0893cddbdbb0","html_url":"https://github.com/thewh1teagle/pyannote-rs","commit_stats":{"total_commits":66,"total_committers":1,"mean_commits":66.0,"dds":0.0,"last_synced_commit":"72d8056565d92cc1c9551bc90ad4cabdaee75772"},"previous_names":["thewh1teagle/pyannote-rs"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thewh1teagle%2Fpyannote-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thewh1teagle%2Fpyannote-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thewh1teagle%2Fpyannote-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thewh1teagle%2Fpyannote-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thewh1teagle","download_url":"https://codeload.github.com/thewh1teagle/pyannote-rs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237821552,"owners_count":19371784,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","diarization","onnxruntime","rust","speech-recognition","whisper"],"created_at":"2024-09-26T06:21:57.384Z","updated_at":"2025-10-23T11:31:28.169Z","avatar_url":"https://github.com/thewh1teagle.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pyannote-rs\n\n[![Crates](https://img.shields.io/crates/v/pyannote-rs?logo=rust)](https://crates.io/crates/pyannote-rs/)\n[![License](https://img.shields.io/github/license/thewh1teagle/pyannote-rs?color=00aaaa\u0026logo=license)](https://github.com/thewh1teagle/pyannote-rs/blob/main/LICENSE)\n\nPyannote audio diarization in Rust\n\n## Features\n\n- Compute 1 hour of audio in less than a minute on CPU.\n- Faster performance with DirectML on Windows and CoreML on macOS.\n- Accurate timestamps with Pyannote segmentation.\n- Identify speakers with wespeaker embeddings.\n\n## Install\n\n```console\ncargo add pyannote-rs\n```\n\n## Usage\n\nSee [Building](BUILDING.md)\n\n## Examples\n\nSee [examples](examples)\n\n\u003cdetails\u003e\n\u003csummary\u003eHow it works\u003c/summary\u003e\n\npyannote-rs uses 2 models for speaker diarization:\n\n1. **Segmentation**: [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) identifies when speech occurs.\n2. **Speaker Identification**: [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM) identifies who is speaking.\n\nInference is powered by [onnxruntime](https://onnxruntime.ai/).\n\n- The segmentation model processes up to 10s of audio, using a sliding window approach (iterating in chunks).\n- The embedding model processes filter banks (audio features) extracted with [knf-rs](https://github.com/thewh1teagle/knf-rs).\n\nSpeaker comparison (e.g., determining if Alice spoke again) is done using cosine similarity.\n\u003c/details\u003e\n\n## Credits\n\nBig thanks to [pyannote-onnx](https://github.com/pengzhendong/pyannote-onnx) and [kaldi-native-fbank](https://github.com/csukuangfj/kaldi-native-fbank)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthewh1teagle%2Fpyannote-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthewh1teagle%2Fpyannote-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthewh1teagle%2Fpyannote-rs/lists"}