Using the `pyannote.audio` open-source toolkit in production?
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.

# `pyannote.audio` speaker diarization toolkit

`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the [PyTorch](https://pytorch.org) machine learning framework, it comes with state-of-the-art [pretrained models and pipelines](https://hf.co/pyannote) that can be further fine-tuned to your own data for even better performance.



## TL;DR

1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`
2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)

```python
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
import torch
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```
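
If the number of speakers is known in advance, it can be passed to the pipeline, and the resulting annotation can be saved to disk in RTTM format. A minimal sketch, reusing the `pipeline` object from the snippet above (file names are placeholders):

```python
# optional: tell the pipeline how many speakers to expect
diarization = pipeline("audio.wav", num_speakers=2)

# save the result in RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```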

## Highlights

- :hugs: pretrained [pipelines](https://hf.co/models?other=pyannote-audio-pipeline) (and [models](https://hf.co/models?other=pyannote-audio-model)) on [:hugs: model hub](https://huggingface.co/pyannote)
- :exploding_head: state-of-the-art performance (see [Benchmark](#benchmark))
- :snake: Python-first API
- :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/)
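
For the multi-GPU training mentioned above, models are trained through [pytorch-lightning](https://pytorchlightning.ai/). A minimal sketch of training a voice activity detection model on your own data; the protocol name is a placeholder for a `pyannote.database` protocol you have registered, and exact task parameters may vary across `pyannote.audio` versions:

```python
import pytorch_lightning as pl
from pyannote.database import registry
from pyannote.audio.tasks import VoiceActivityDetection
from pyannote.audio.models.segmentation import PyanNet

# placeholder protocol name: replace with your own registered protocol
protocol = registry.get_protocol("MyDatabase.SpeakerDiarization.MyProtocol")

# wrap the annotated corpus in a task and attach it to a segmentation model
vad = VoiceActivityDetection(protocol, duration=2.0)
model = PyanNet(task=vad)

# multi-GPU training with pytorch-lightning
trainer = pl.Trainer(accelerator="gpu", devices=2, max_epochs=10)
trainer.fit(model)
```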

## Documentation

- [Changelog](CHANGELOG.md)
- [Frequently asked questions](FAQ.md)
- Models
  - Available tasks explained
  - [Applying a pretrained model](tutorials/applying_a_model.ipynb)
  - [Training, fine-tuning, and transfer learning](tutorials/training_a_model.ipynb)
- Pipelines
  - Available pipelines explained
  - [Applying a pretrained pipeline](tutorials/applying_a_pipeline.ipynb)
  - [Adapting a pretrained pipeline to your own data](tutorials/adapting_pretrained_pipeline.ipynb)
  - [Training a pipeline](tutorials/voice_activity_detection.ipynb)
- Contributing
  - [Adding a new model](tutorials/add_your_own_model.ipynb)
  - [Adding a new task](tutorials/add_your_own_task.ipynb)
  - Adding a new pipeline
  - Sharing pretrained models and pipelines
- Blog
  - 2022-12-02 > ["How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"](tutorials/adapting_pretrained_pipeline.ipynb)
  - 2022-10-23 > ["One speaker segmentation model to rule them all"](https://herve.niderb.fr/fastpages/2022/10/23/One-speaker-segmentation-model-to-rule-them-all)
  - 2021-08-05 > ["Streaming voice activity detection with pyannote.audio"](https://herve.niderb.fr/fastpages/2021/08/05/Streaming-voice-activity-detection-with-pyannote.html)
- Videos
  - [Introduction to speaker diarization](https://umotion.univ-lemans.fr/video/9513-speech-segmentation-and-speaker-diarization/) / JSALT 2023 summer school / 90 min
  - [Speaker segmentation model](https://www.youtube.com/watch?v=wDH2rvkjymY) / Interspeech 2021 / 3 min
  - [First release of pyannote.audio](https://www.youtube.com/watch?v=37R_R82lfwA) / ICASSP 2020 / 8 min
- Community contributions (not maintained by the core team)
  - 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)
  - 2024-09-24 > [Evaluating `pyannote` pretrained speech separation pipelines](tutorials/community/eval_separation_pipeline.ipynb) by [Clément Pagés](https://github.com/)

## Benchmark

Out of the box, version 3.1 of the `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) is expected to be much better (and faster) than v2.x.
The numbers below are diarization error rates (in %):

| Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |
| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------ |
| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 22.5 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 16.6 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 20.9 |
| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.8 |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 22.2 |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 17.2 |
| [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.0 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 43.8 |
| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.8 |
| [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 18.4 |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.6 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.4 |

[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
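
Diarization error rates like these can be computed with [`pyannote.metrics`](http://pyannote.github.io/pyannote-metrics/). A minimal sketch, assuming reference and hypothesis RTTM files that both describe a single recording with URI `audio` (file names and the URI are placeholders):

```python
from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# load reference and hypothesis annotations (placeholder paths and URI)
reference = load_rttm("reference.rttm")["audio"]
hypothesis = load_rttm("hypothesis.rttm")["audio"]

# diarization error rate is returned as a fraction
metric = DiarizationErrorRate()
print(f"DER = {100 * metric(reference, hypothesis):.1f}%")
```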

## Citations

If you use `pyannote.audio`, please cite the following papers:

```bibtex
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```

```bibtex
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```

## Development

The commands below set up the pre-commit hooks and packages needed for developing the `pyannote.audio` library.

```bash
pip install -e .[dev,testing]
pre-commit install
```

## Test

```bash
pytest
```