{"id":13434219,"url":"https://github.com/pyannote/pyannote-audio","last_synced_at":"2025-05-13T15:02:15.708Z","repository":{"id":37390612,"uuid":"53344691","full_name":"pyannote/pyannote-audio","owner":"pyannote","description":"Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding ","archived":false,"fork":false,"pushed_at":"2025-05-02T08:13:52.000Z","size":264172,"stargazers_count":7421,"open_issues_count":42,"forks_count":873,"subscribers_count":77,"default_branch":"main","last_synced_at":"2025-05-05T22:15:49.036Z","etag":null,"topics":["overlapped-speech-detection","pretrained-models","pytorch","speaker-change-detection","speaker-diarization","speaker-embedding","speaker-recognition","speaker-verification","speech-activity-detection","speech-processing","voice-activity-detection"],"latest_commit_sha":null,"homepage":"http://pyannote.github.io","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pyannote.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["hbredin"]}},"created_at":"2016-03-07T17:26:15.000Z","updated_at":"2025-05-05T20:38:05.000Z","dependencies_parsed_at":"2023-09-26T14:14:25.606Z","dependency_job_id":"947d3ba7-5c03-4733-9981-0a215b2ccf5c","html_url":"https://github.com/pyannote/pyannote-audio","commit_stats":{"total_commits":2138,"total_committers":50,"mean_commits":42.76,"dds":0.2483629560336763,"last_synced_commit":"11b56a137a578db9335
efc00298f6ec1932e6317"},"previous_names":[],"tags_count":37,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyannote%2Fpyannote-audio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyannote%2Fpyannote-audio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyannote%2Fpyannote-audio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyannote%2Fpyannote-audio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pyannote","download_url":"https://codeload.github.com/pyannote/pyannote-audio/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253968000,"owners_count":21992252,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["overlapped-speech-detection","pretrained-models","pytorch","speaker-change-detection","speaker-diarization","speaker-embedding","speaker-recognition","speaker-verification","speech-activity-detection","speech-processing","voice-activity-detection"],"created_at":"2024-07-31T02:01:50.284Z","updated_at":"2025-05-13T15:02:15.636Z","avatar_url":"https://github.com/pyannote.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/hbredin"],"categories":["Software","Jupyter Notebook","Audio","Python","Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","By Language","Pytorch \u0026 related libraries","Speech Enhancement \u0026 Audio Processing","语音识别与合成_其他","Uncategorized","Speech, Voice \u0026 
Alignment","Speech Processing","Audio Related Packages"],"sub_categories":["Framework","NLP \u0026 Speech Processing｜自然语言处理 \u0026 语音处理:","Data Science","NLP \u0026 Speech Processing:","Voice Activity Detection (VAD)","网络服务_其他","Uncategorized","IO router / deconstructed loops anchor","Speech-to-Text"],"readme":"Using the `pyannote.audio` open-source toolkit in production?  \nConsider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.\n\n# `pyannote.audio` speaker diarization toolkit\n\n`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the [PyTorch](https://pytorch.org) machine learning framework, it comes with state-of-the-art [pretrained models and pipelines](https://hf.co/pyannote) that can be further fine-tuned on your own data for even better performance.\n\n\u003cp align=\"center\"\u003e\n \u003ca href=\"https://www.youtube.com/watch?v=37R_R82lfwA\"\u003e\u003cimg src=\"https://img.youtube.com/vi/37R_R82lfwA/0.jpg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n## TL;DR\n\n1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`\n2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions\n3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions\n4. 
Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).\n\n```python\nimport torch\nfrom pyannote.audio import Pipeline\n\npipeline = Pipeline.from_pretrained(\n    \"pyannote/speaker-diarization-3.1\",\n    use_auth_token=\"HUGGINGFACE_ACCESS_TOKEN_GOES_HERE\")\n\n# send pipeline to GPU when available, otherwise keep it on CPU\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\npipeline.to(device)\n\n# apply pretrained pipeline\ndiarization = pipeline(\"audio.wav\")\n\n# print the result\nfor turn, _, speaker in diarization.itertracks(yield_label=True):\n    print(f\"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}\")\n# start=0.2s stop=1.5s speaker_0\n# start=1.8s stop=3.9s speaker_1\n# start=4.2s stop=5.7s speaker_0\n# ...\n```\n\n## Highlights\n\n- :hugs: pretrained [pipelines](https://hf.co/models?other=pyannote-audio-pipeline) (and [models](https://hf.co/models?other=pyannote-audio-model)) on the [:hugs: model hub](https://huggingface.co/pyannote)\n- :exploding_head: state-of-the-art performance (see [Benchmark](#benchmark))\n- :snake: Python-first API\n- :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/)\n\n## Documentation\n\n- [Changelog](CHANGELOG.md)\n- [Frequently asked questions](FAQ.md)\n- Models\n  - Available tasks explained\n  - [Applying a pretrained model](tutorials/applying_a_model.ipynb)\n  - [Training, fine-tuning, and transfer learning](tutorials/training_a_model.ipynb)\n- Pipelines\n  - Available pipelines explained\n  - [Applying a pretrained pipeline](tutorials/applying_a_pipeline.ipynb)\n  - [Adapting a pretrained pipeline to your own data](tutorials/adapting_pretrained_pipeline.ipynb)\n  - [Training a pipeline](tutorials/voice_activity_detection.ipynb)\n- Contributing\n  - [Adding a new model](tutorials/add_your_own_model.ipynb)\n  - [Adding a new task](tutorials/add_your_own_task.ipynb)\n  - Adding a new pipeline\n  - Sharing pretrained models and pipelines\n- Blog\n  - 2022-12-02 \u003e [\"How I reached 1st place at Ego4D 2022, 1st 
place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges\"](tutorials/adapting_pretrained_pipeline.ipynb)\n  - 2022-10-23 \u003e [\"One speaker segmentation model to rule them all\"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)\n  - 2021-08-05 \u003e [\"Streaming voice activity detection with pyannote.audio\"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)\n- Videos\n  - [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min\n  - [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min\n  - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min\n- Community contributions (not maintained by the core team)\n  - 2024-04-05 \u003e [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)\n\n## Benchmark\n\nOut of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) v3.1 is expected to be much better (and faster) than v2.x.\nThose numbers are diarization error rates (in %):\n\n| Benchmark                                                                                                                   | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |\n| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------ |\n| 
[AISHELL-4](https://arxiv.org/abs/2104.03603)                                                                               | 14.1                                                   | 12.2                                                   | 11.9                                             |\n| [AliMeeting](https://www.openslr.org/119/) (channel 1)                                                                      | 27.4                                                   | 24.4                                                   | 22.5                                             |\n| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM)                                                                        | 18.9                                                   | 18.8                                                   | 16.6                                             |\n| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM)                                                                        | 27.1                                                   | 22.4                                                   | 20.9                                             |\n| [AVA-AVD](https://arxiv.org/abs/2111.14448)                                                                                 | 66.3                                                   | 50.0                                                   | 39.8                                             |\n| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6                                                   | 28.4                                                   | 22.2                                             |\n| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477))                             | 26.9                                                   | 21.7                                                 
  | 17.2                                             |\n| [Earnings21](https://github.com/revdotcom/speech-datasets)                                                                  | 17.0                                                   | 9.4                                                    | 9.0                                              |\n| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.)                                                                            | 61.5                                                   | 51.2                                                   | 43.8                                             |\n| [MSDWild](https://github.com/X-LANCE/MSDWILD)                                                                               | 32.8                                                   | 25.3                                                   | 19.8                                             |\n| [RAMC](https://www.openslr.org/123/)                                                                                        | 22.5                                                   | 22.2                                                   | 18.4                                             |\n| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2)                                                       | 8.2                                                    | 7.8                                                    | 7.6                                              |\n| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3)                                                                | 11.2                                                   | 11.3                                                   | 9.4                                              |\n\n[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)\n\n## Citations\n\nIf you use `pyannote.audio` please use the following 
citations:\n\n```bibtex\n@inproceedings{Plaquet23,\n  author={Alexis Plaquet and Hervé Bredin},\n  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},\n  year=2023,\n  booktitle={Proc. INTERSPEECH 2023},\n}\n```\n\n```bibtex\n@inproceedings{Bredin23,\n  author={Hervé Bredin},\n  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},\n  year=2023,\n  booktitle={Proc. INTERSPEECH 2023},\n}\n```\n\n## Development\n\nThe commands below install the packages needed for developing the `pyannote.audio` library and set up its pre-commit hooks.\n\n```bash\npip install -e \".[dev,testing]\"\npre-commit install\n```\n\n## Test\n\n```bash\npytest\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyannote%2Fpyannote-audio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpyannote%2Fpyannote-audio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyannote%2Fpyannote-audio/lists"}