awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
https://github.com/wq2012/awesome-diarization
Datasets
Augmentation noise sources
Diarization datasets
- 2000 NIST Speaker Recognition Evaluation - ground truth for [Disk-6 (Switchboard)](https://github.com/google/speaker-id/tree/master/publications/LstmDiarization/evaluation/NIST_SRE2000/Disk6_ground_truth) and [Disk-8 (CALLHOME)](https://github.com/google/speaker-id/tree/master/publications/LstmDiarization/evaluation/NIST_SRE2000/Disk8_ground_truth). Language: multiple. Price: $2400.00. [Evaluation Plan](https://www.nist.gov/sites/default/files/documents/2017/09/26/spk-2000-plan-v1.0.htm_.pdf)
- 2003 NIST Rich Transcription Evaluation Data
- CALLHOME American English Speech - id/blob/master/publications/LstmDiarization/evaluation/CALLHOME_American_English/ch109_whitelist.txt
- The ICSI Meeting Corpus
- The AMI Meeting Corpus
- Fisher English Training Speech Part 1 Speech
- Fisher English Training Part 2, Speech
- VoxConverse - audio-visual diarisation dataset consisting of over 50 hours of multi-speaker clips of human speech, extracted from YouTube videos.
- The AliMeeting Corpus
Speaker embedding training sets
- TIMIT
- VCTK
- LibriSpeech - large-scale (1000 hours) corpus of read English speech.
- Multilingual LibriSpeech (MLS) - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.
- LibriVox
- The Spoken Wikipedia Corpora
- BookTubeSpeech - YouTube videos in which people share their opinions on books. The dataset can be downloaded using [BookTubeSpeech-download](https://github.com/wq2012/BookTubeSpeech-download).
- DeepMine
- NISP-Dataset
- VoxBlink2 - license: CC BY-NC-SA 4.0. Multilingual dataset from [VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark](https://arxiv.org/abs/2407.11510).
- VoxCeleb 1&2 - audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube.
Other learning materials
Books
Online courses
Tech blogs
- Literature Review For Speaker Change Detection
- Halil Erdoğan
- Speaker Diarization: Separation of Multiple Speakers in an Audio File
- Speaker Diarization with Kaldi
- Who spoke when! How to Build your own Speaker Diarization Module
Video tutorials
- pyannote audio: neural building blocks for speaker diarization
- Google's Diarization System: Speaker Diarization with LSTM
- Fully Supervised Speaker Diarization: Say Goodbye to clustering
- Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection
- Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings
- Robust Speaker Diarization for Meetings: the ICSI system
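- 【机器之心&博文视点】Getting Started with Voiceprint Technology | Lecture 2: Speaker Diarization (Voiceprint Segmentation and Clustering) and Other Applications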
Products
- Recorder app
- Google Cloud Speech-to-Text API
- Watson Speech To Text API
- Speaker Diarization API
- Tingwu (听悟)
- Azure Conversation Transcription API
- Amazon Transcribe
Publications
Other
- Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation
- End-to-end speaker segmentation for overlap-aware resegmentation
- DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding
- DOVER-Lap: A method for combining overlap-aware diarization outputs
- Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks
- An End-to-End Speaker Diarization Service for improving Multimedia Content Access
- Spot the conversation: speaker diarisation in the wild
- Speaker Diarization with Region Proposal Network
- Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
- Overlap-aware diarization: resegmentation using neural end-to-end overlapped speech detection
- Speaker diarization using latent space clustering in generative adversarial network
- A study of semi-supervised speaker diarization system using GAN mixture model
- Learning deep representations by multilayer bootstrap networks for speaker diarization
- Enhancements for Audio-only Diarization Systems
- LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
- Meeting Transcription Using Virtual Microphone Arrays
- Speaker diarisation using 2D self-attentive combination of embeddings
- Speaker Diarization with Lexical Information
- Neural speech turn segmentation and affinity propagation for speaker diarization
- Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks
- Joint Speaker Diarization and Recognition Using Convolutional and Recurrent Neural Networks
- Speaker Diarization with LSTM
- Speaker diarization using deep neural network embeddings
- Speaker diarization using convolutional neural network for statistics accumulation refinement
- pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
- Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
- Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
- A Speaker Diarization System for Studying Peer-Led Team Learning Groups
- Diarization resegmentation in the factor analysis subspace
- A study of the cosine distance-based mean shift for telephone speech diarization
- Speaker diarization with PLDA i-vector scoring and unsupervised calibration
- Artificial neural network features for speaker diarization
- Unsupervised methods for speaker diarization: An integrated and iterative approach
- PLDA-based Clustering for Speaker Diarization of Broadcast Streams
- Speaker diarization of meetings based on speaker role n-gram models
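Several of the publications above deal with evaluation, for example the pyannote.metrics toolkit. The metric these papers commonly report is the diarization error rate (DER). DER adds up missed speech, false alarm and speaker confusion, then divides by the total reference speech time. Before scoring, reference and hypothesis speakers are matched by an optimal one-to-one mapping.

The following is a minimal frame-level sketch of that computation. It rests on several assumptions:
- Segments are hypothetical `(start_sec, end_sec, speaker)` tuples.
- Frames use a 10 ms step.
- No forgiveness collar is applied.
- Overlapped speech is scored.

The names `to_frames`, `best_mapping_overlap` and `frame_der` are illustrative only. They are not taken from any library listed here.

```python
from itertools import permutations


def to_frames(segments, num_frames, frame_step):
    """Map each speaker label to the set of frame indices where it is active."""
    frames = {}
    for start, end, speaker in segments:
        first = int(round(start / frame_step))
        last = min(int(round(end / frame_step)), num_frames)
        frames.setdefault(speaker, set()).update(range(first, last))
    return frames


def best_mapping_overlap(ref, hyp):
    """Largest total overlap (in frames) over one-to-one ref/hyp speaker mappings.

    Brute force over permutations, so only practical for a handful of speakers.
    """
    # Iterate over the smaller speaker set; intersection size is symmetric.
    small, large = (ref, hyp) if len(ref) <= len(hyp) else (hyp, ref)
    keys = list(small)
    best = 0
    for chosen in permutations(large, len(keys)):
        total = sum(len(small[a] & large[b]) for a, b in zip(keys, chosen))
        best = max(best, total)
    return best


def frame_der(ref_segments, hyp_segments, frame_step=0.01):
    """Frame-level DER with no collar; overlapped speech is scored.

    Segments are (start_sec, end_sec, speaker_label) tuples. Assumes the
    reference contains some speech (otherwise DER is undefined).
    """
    end_time = max(end for _, end, _ in list(ref_segments) + list(hyp_segments))
    num_frames = int(round(end_time / frame_step))
    ref = to_frames(ref_segments, num_frames, frame_step)
    hyp = to_frames(hyp_segments, num_frames, frame_step)

    # Per frame, missed + false alarm + confusion = max(N_ref, N_hyp) - N_correct.
    # Sum max(N_ref, N_hyp) here and subtract the summed N_correct afterwards.
    errors, ref_speech = 0, 0
    for t in range(num_frames):
        n_ref = sum(t in active for active in ref.values())
        n_hyp = sum(t in active for active in hyp.values())
        errors += max(n_ref, n_hyp)
        ref_speech += n_ref
    errors -= best_mapping_overlap(ref, hyp)
    return errors / ref_speech


# Reference: A then B; the hypothesis lumps everything into one speaker.
reference = [(0.0, 1.0, "A"), (1.0, 2.0, "B")]
hypothesis = [(0.0, 2.0, "x")]
print(frame_der(reference, hypothesis))  # 0.5
```

The brute-force mapping grows exponentially with the number of speakers. A Hungarian-algorithm assignment, such as `scipy.optimize.linear_sum_assignment`, is the usual replacement. Reference scorers such as pyannote.metrics also offer a collar around reference boundaries and an option to skip overlapped regions. This sketch omits both.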