awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
https://github.com/wq2012/awesome-diarization
-
Publications
-
Other
- Speaker diarization using deep neural network embeddings
- Unsupervised methods for speaker diarization: An integrated and iterative approach
- Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation
- DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding
- DOVER-Lap: A method for combining overlap-aware diarization outputs
- An End-to-End Speaker Diarization Service for improving Multimedia Content Access
- Spot the conversation: speaker diarisation in the wild
- Speaker Diarization with Region Proposal Network
- Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
- Overlap-aware diarization: resegmentation using neural end-to-end overlapped speech detection
- Speaker diarization using latent space clustering in generative adversarial network
- A study of semi-supervised speaker diarization system using gan mixture model
- Learning deep representations by multilayer bootstrap networks for speaker diarization
- Enhancements for Audio-only Diarization Systems
- LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
- Speaker diarisation using 2D self-attentive combination of embeddings
- Speaker Diarization with Lexical Information
- Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks
- Joint Speaker Diarization and Recognition Using Convolutional and Recurrent Neural Networks
- Speaker Diarization with LSTM
- pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
- Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
- Diarization resegmentation in the factor analysis subspace
- Speaker diarization with PLDA i-vector scoring and unsupervised calibration
- Artificial neural network features for speaker diarization
- Speaker diarization of meetings based on speaker role n-gram models
- An overview of automatic speaker diarization systems
- A spectral clustering approach to speaker diarization
- A study of the cosine distance-based mean shift for telephone speech diarization
- Stream-based speaker segmentation using speaker factors and eigenvoices
- End-to-end speaker segmentation for overlap-aware resegmentation
- Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks
- Meeting Transcription Using Virtual Microphone Arrays
- Neural speech turn segmentation and affinity propagation for speaker diarization
- Speaker diarization using convolutional neural network for statistics accumulation refinement
- Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
- A Speaker Diarization System for Studying Peer-Led Team Learning Groups
- PLDA-based Clustering for Speaker Diarization of Broadcast Streams
- Speaker Diarization for Meeting Room Audio
- AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
-
Special topics
- A review on speaker diarization systems and approaches
- DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
- Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach
- Lexical speaker error correction: Leveraging language models for speaker diarization error correction
- DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
- TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization
- Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis
- End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings
- Supervised online diarization with sample mean loss for multi-domain data
- Discriminative Neural Clustering for Speaker Diarisation
- End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- End-to-End Neural Speaker Diarization with Self-attention
- Fully Supervised Speaker Diarization
- A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
- Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection
- Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR
- Joint Speech Recognition and Speaker Diarization via Sequence Transduction
- M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
- The Hitachi-JHU DIHARD III system: Competitive end-to-end neural diarization and x-vector clustering systems combined by DOVER-Lap
- Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Challenge
- DyViSE: Dynamic Vision-Guided Speaker Embedding for Audio-Visual Speaker Diarization
- End-to-End Audio-Visual Neural Speaker Diarization
- MSDWild: Multi-modal Speaker Diarization Dataset in the Wild
- A Review of Speaker Diarization: Recent Advances with Deep Learning
- Speaker diarization: A review of recent research
- Says who? Deep learning models for joint speech recognition, segmentation and diarization
- Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox
- Online Speaker Diarization with Relation Network
- VoiceID on the Fly: A Speaker Recognition System that Learns from Scratch
- ODESSA at Albayzin Speaker Diarization Challenge 2018
- AVA-AVD: Audio-Visual Speaker Diarization in the Wild
-
-
Software
-
Framework
- MiniVox - An open-source evaluation system for the online speaker diarization task.
- SpeechBrain - An open-source and all-in-one speech toolkit based on PyTorch.
- pyAudioAnalysis
- AaltoASR - Python & Perl | Speaker diarization scripts, based on AaltoASR.
- kaldi-speaker-diarization - Bash | Icelandic speaker diarization scripts using Kaldi.
- pyannote-audio - Python | Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding.
- pyBK
- Speaker-Diarization - Python | Speaker diarization using uis-rnn and GhostVLAD. An easier way to support open-set speakers.
- EEND - Python & Bash & Perl | End-to-End Neural Diarization.
- VBx - x-vectors diarization. x-vector extractor [recipe](https://github.com/phonexiaresearch/VBx-training-recipe).
- RE-VERB - Python & JavaScript | RE: VERB is a speaker diarization system that lets the user send or record audio of a conversation and receive timestamps of who spoke when.
- simple_diarizer
- Picovoice Falcon - Speaker diarization engine written in C and available in Python, running on CPU with minimal overhead.
- DiaPer - [End-to-End Neural Diarization with Perceiver-Based Attractors](https://arxiv.org/pdf/2312.04324.pdf), including models pre-trained on free and public data.
- sherpa-onnx - C++ & C & `C#` & Dart & Go & Java & JavaScript & Kotlin & Pascal & Python & Rust & Swift | Supports speaker diarization, speech recognition, and text-to-speech on various platforms with various language bindings.
- FluidAudio - Real-time audio processing with high accuracy.
- SIDEKIT for diarization (s4d)
- LIUM SpkDiarization
- kaldi-asr - Bash | Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation.
- Alize LIA_SpkSeg
-
Evaluation
- pyannote-metrics - Python | A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems.
- SimpleDER
- dscore
- spyder
- CDER - [The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines](https://arxiv.org/abs/2208.08042)
- md-eval.pl - [md-eval-v21.pl](https://github.com/jitendrab/btp/blob/master/c_code/single_diag_gaussian_no_viterbi/md-eval-v21.pl) from [jitendra](https://github.com/jitendrab); [md-eval-22.pl](https://github.com/nryant/dscore/blob/master/scorelib/md-eval-22.pl) from [nryant](https://github.com/nryant)
- Sequence Match Accuracy
- DiarizationLM - Python | Implements Word Error Rate (WER), Word Diarization Error Rate (WDER), and concatenated minimum-permutation Word Error Rate (cpWER).
-
Clustering
- uis-rnn - Python & PyTorch | Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. This clustering algorithm is **supervised**.
- uis-rnn-sml - Python & PyTorch | A variant of UIS-RNN, for the paper Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data.
- DNC - Discriminative Neural Clustering (DNC) for Speaker Diarisation. Like UIS-RNN, it is **supervised**.
- SpectralCluster - Python | Spectral clustering with affinity matrix refinement operations, auto-tune, and speaker turn constraints.
- PLDA
- PLDA - An open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis).
- Auto-Tuning Spectral Clustering - Python | Auto-tuning spectral clustering method that does not need a development set or supervised tuning.
- sklearn.cluster - Python | scikit-learn clustering algorithms.
-
Speaker embedding
- resemble-ai/Resemblyzer - d-vector | Python & PyTorch | PyTorch implementation of generalized end-to-end loss for speaker verification, which can be used for voice cloning and diarization.
- Speaker_Verification - d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification.
- PyTorch_Speaker_Verification - d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration.
- Real-Time Voice Cloning - d-vector | Python & PyTorch | Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real-time.
- conformer-speaker-encoder - Python & TFLite | Massively multilingual conformer-based speaker recognition models in TFLite format.
- deep-speaker - d-vector | Python & Keras | Third-party implementation of the Baidu paper Deep Speaker: an End-to-End Neural Speaker Embedding System.
- x-vector-kaldi-tf - x-vector | Python & TensorFlow & Perl | TensorFlow implementation of x-vector topology on top of Kaldi recipe.
- kaldi-ivector - i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure.
- voxceleb-ivector - i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system.
- pytorch_xvectors - x-vector | Python & PyTorch | PyTorch implementation of VoxCeleb x-vectors. Additionally, includes meta-learning architectures for embedding training. Evaluated with speaker diarization and speaker verification.
- ASVtorch - Python & PyTorch | ASVtorch is a toolkit for automatic speaker recognition.
- asv-subtools - i-vector & x-vector | Kaldi & PyTorch | ASV-Subtools is developed based on PyTorch and Kaldi for speaker recognition, language identification, etc. The 'sub' in 'subtools' means that there are many modular tools and the parts constitute the whole.
- ReDimNet
-
Speaker change detection
- change_detection - Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks.
- tidydiarize
-
Audio feature extraction
- LibROSA
- python_speech_features - Documentation: `…speech-features.readthedocs.io/en/latest/`
-
Audio data augmentation
- pyroomacoustics
- gpuRIR
- rir_simulator_python
- WavAugment
- EEND_dataprep - Data preparation for end-to-end diarization models.
-
Other software
- VB Diarization - Python | VB Diarization with Eigenvoice and HMM Priors.
- DOVER-Lap - Python | Python package for combining diarization system outputs.
- Diar-az - di dataset. Kaldi to Gecko to Kaldi and corpus and back
-
-
Datasets
-
Diarization datasets
- VoxConverse - Audio-visual diarisation dataset consisting of over 50 hours of multispeaker clips of human speech, extracted from YouTube videos.
- The AliMeeting Corpus
- 2000 NIST Speaker Recognition Evaluation - [Disk-6 (Switchboard)](https://github.com/google/speaker-id/tree/master/publications/LstmDiarization/evaluation/NIST_SRE2000/Disk6_ground_truth), [Disk-8 (CALLHOME)](https://github.com/google/speaker-id/tree/master/publications/LstmDiarization/evaluation/NIST_SRE2000/Disk8_ground_truth) | Multiple | $2400.00 | [Evaluation Plan](https://www.nist.gov/sites/default/files/documents/2017/09/26/spk-2000-plan-v1.0.htm_.pdf)
- 2003 NIST Rich Transcription Evaluation Data
- The ICSI Meeting Corpus
- The AMI Meeting Corpus
- Fisher English Training Speech Part 1 Speech
- Fisher English Training Part 2, Speech
- CALLHOME American English Speech - CH109 whitelist: `…id/blob/master/publications/LstmDiarization/evaluation/CALLHOME_American_English/ch109_whitelist.txt`
-
Speaker embedding training sets
- Multilingual LibriSpeech (MLS) - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish. |
- NISP-Dataset
- VoxBlink2 - CC BY-NC-SA 4.0 | Multilingual dataset from [VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark](https://arxiv.org/abs/2407.11510)
- VCTK
- LibriSpeech - Large-scale (1000 hours) corpus of read English speech.
- The Spoken Wikipedia Corpora
- BookTubeSpeech - Speech from YouTube videos where people share their opinions on books. The dataset can be downloaded using [BookTubeSpeech-download](https://github.com/wq2012/BookTubeSpeech-download).
- TIMIT
- LibriVox
- DeepMine
- VoxCeleb 1&2 - Audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube.
-
Augmentation noise sources
-
-
Other learning materials
-
Books
-
Tech blogs
- Speaker Diarization: Separation of Multiple Speakers in an Audio File
- Who spoke when! How to Build your own Speaker Diarization Module
- Speaker Diarization with Kaldi
- Literature Review For Speaker Change Detection
- Halil Erdoğan
-
Online courses
-
Video tutorials
- pyannote audio: neural building blocks for speaker diarization
- Google's Diarization System: Speaker Diarization with LSTM
- Fully Supervised Speaker Diarization: Say Goodbye to clustering
- Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection
- Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings
- Robust Speaker Diarization for Meetings: the ICSI system
- [机器之心 & 博文视点] Introduction to Voiceprint Technology, Lecture 2: Speaker Diarization and Other Applications
-
-
Products
-
- Recorder app
- Google Cloud Speech-to-Text API
- Watson Speech To Text API
- Speaker Diarization API
- Tingwu (听悟)
- Azure Conversation Transcription API
-
-