https://github.com/ranchlai/awesome-speaker-embedding

A curated list of speaker-embedding speaker-verification, speaker-identification resources.
https://github.com/ranchlai/awesome-speaker-embedding

List: awesome-speaker-embedding

speaker-embedding speaker-recognition speaker-verification

Last synced: 7 months ago
JSON representation

A curated list of speaker-embedding speaker-verification, speaker-identification resources.

Host: GitHub
URL: https://github.com/ranchlai/awesome-speaker-embedding
Owner: ranchlai
Created: 2021-05-27T03:23:30.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2021-08-12T08:22:43.000Z (almost 4 years ago)
Last Synced: 2024-05-22T13:04:12.838Z (about 1 year ago)
Topics: speaker-embedding, speaker-recognition, speaker-verification
Homepage:
Size: 334 KB
Stars: 42
Watchers: 3
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

ultimate-awesome - awesome-speaker-embedding - A curated list of speaker-embedding speaker-verification, speaker-identification resources. . (Other Lists / TeX Lists)

README

        # awesome-speaker-embedding

A curated list of speaker embedding/verification resources

## Must-read papers

- \[01\] [Deep Speaker: an End-to-End Neural Speaker Embedding System](https://arxiv.org/abs/1705.02304), Baidu inc, 2017

- \[02\] [Text-Independent Speaker Verification Using 3D Convolutional Neural Networks](https://arxiv.org/abs/1705.09422), 2017

- \[03\] [Speaker Recognition from Raw Waveform with SincNet](https://arxiv.org/abs/1808.00158), Bengio team,  raw waveform, 2018

- \[04\] [VoxCeleb2: Deep Speaker Recognition](https://arxiv.org/abs/1806.05622) VGG group, Interspeech 2018

- \[05\] [Generalized End-to-End Loss for Speaker Verification](https://arxiv.org/abs/1710.10467), Google, ICASSP 2017

- \[06\] [Voxceleb: Large-scale speaker verification in the wild](https://www.robots.ox.ac.uk/~vgg/publications/2019/Nagrani19/nagrani19.pdf),VGG group, 2019

- \[07\] [Deep neural network embeddings for text-independent speaker verification](http://danielpovey.com/files/2017_interspeech_embeddings.pdf), Interspeech 2017, original TDNN paper from Johns Hopkins , MFCC/frame-based/time-delay/multi-class, softmax + cross-entropy loss

- \[08\] [Robust DNN Embeddings for Speaker Recognition](https://arxiv.org/pdf/1803.09153v1.pdf), ICASSP 2018, the X-vector paper Johns Hopkins,  based on TDNN, improved by adding Noise and reverberation for augmentation

- \[09\] [Front-end factor analysis for speaker verification](http://groups.csail.mit.edu/sls/archives/root/publications/2010/Dehak_IEEE_Transactions.pdf), 2011, IEEE TASLP,  the 'i-vector' paper from Johns Hopkins 

- \[10\] [TDNN-UBM Time delay deep recognition neural network-based universal background models for speaker](https://www.danielpovey.com/files/2015_asru_tdnn_ubm.pdf) , 2015 

- \[11\] [Deep neural networks for small footprint text-dependent speaker verification](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41939.pdf), The 'D-vector' paper from Johns Hopkins 

- \[12\] [Analysis of Score Normalization in Multilingual Speaker Recognition](http://www.fit.vutbr.cz/research/groups/speech/publi/2017/matejka_interspeech2017_IS170803.pdf), Interspeech 2017, The S-norm paper, useful for score normalization 

## Benchmarks (not very accurate)

Results reported (by the authors) on [Voxceleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt), [VoxCeleb1-E](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/list_test_all2.txt) and [VoxCeleb1-H](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/list_test_hard2.txt).

Voxceleb1 public results (continuously updating...)

| Name |  feature,model,activation/loss |  VoxCeleb1| VoxCeleb1-E| VoxCeleb1-H| Link |Affiliation|Year |

| ---- | -------- | -------- | ------- | -------  |-------  |-------  |--------  |

|X205| DPN68,Res2Net50| 0.7712%| 0.8968%| 1.637% |[report](https://arxiv.org/pdf/2011.00200.pdf) | AISpeech | 2020|

|Veridas| ResNet152|1.08%|-|-|[report](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/data_workshop_2020/veridas.pdf)|das-nano|2020

|DKU-DukeECE | Resnet,ECAPA-TDNN| 0.888%|1.133%|2.008%|[report](https://arxiv.org/pdf/2010.12731.pdf)|Duke University|2020|

|IDLAB | Resnet,ECAPA-TDNN| -|-|-|[report](https://arxiv.org/pdf/2010.12468.pdf)|Ghent University -|2020|

|speechbrain | ECAPA-TDNN| 0.69% |-|-|[link](https://github.com/speechbrain/speechbrain/tree/develop/recipes/VoxCeleb/SpeakerRec)| -|2021|

## Must-read technical reports

[VOXSRC 2019 reports](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/files/VoxSRC19.pdf)

## Datasets

Commonly-used speaker datasets: 

- [TIMIT](https://catalog.ldc.upenn.edu/LDC93S1): A small dataset for speaker and asr, non-free

- [Free ST](https://www.openslr.org/38/): Mandarin speech corpus for speaker and asr, free 

- [NIST SRE](https://sre.nist.gov/) NIST Speaker Recognition Evaluation, non-free

- [AIShell-1](https://www.openslr.org/33/): Mandarin speech corpus, divided into train/dev/test, free. 

- [AIShell-2](http://www.aishelltech.com/aishell_2): free for education, non-free for commercial

- [AIShell-3](https://www.openslr.org/93/): free, for speaker, asr and tts

- [AIShell-4](https://arxiv.org/abs/2104.03603), will be released soon

- [HI-MIA](https://www.openslr.org/85/): free, for far-field text-dependent  speaker verification and  keyword spotting

- [SITW](http://www.speech.sri.com/projects/sitw/) Speakers in the Wild, 

- [Voxceleb 1&2](https://www.openslr.org/82/), Celebrity interview video/audio extracted from Youtube

- [Cn-Celeb 1&2](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html), Multi-genres speaker dataset in the wild, utterances are from chinese celebrities. 

## Challenges

- [VoxCeleb Speaker Recognition Challenge (VoxSRC 2019)](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/competition2019.html) [report](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/files/VoxSRC19.pdf)

- [VoxCeleb Speaker Recognition Challenge (VoxSRC 2020)](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/competition2020.html)

- [VoxCeleb Speaker Recognition Challenge (VoxSRC 2021)](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/competition2021.html)

- [Short-duration Speaker Verification (SdSV) Challenge 2020](https://sdsvc.github.io/2020/)

- [Short-duration Speaker Verification (SdSV) Challenge 2021](https://sdsvc.github.io/)

- [CTS Speaker Recognition Challenge 2020](https://sre.nist.gov/cts-challenge)

- [Far-Field Speaker Verification Challenge (FFSVC 2020)](http://2020.ffsvc.org/)

## Great Talks / Tutorials

- [X-vectors: Neural Speech Embeddings for Speaker Recognition](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/data_workshop_2020/keynote/daniel_talk.mp4), Daniel Garcia-Romero, 2020

- [2020声纹识别研究与应用学术讨论会](https://hub.baai.ac.cn/view/4289)

## Code/Tools/Frameworks/Libraries

- [VGGVox](https://github.com/a-nagrani/VGGVox) The first baseline system for voxceleb dataset, originally implementated in Matlab.

- [DeepSpeaker](https://github.com/philipperemy/deep-speaker]) An End-to-End Neural Speaker Embedding System.

- [SincNet](https://github.com/mravanelli/SincNet), also in [speechbrain](https://github.com/speechbrain/speechbrain)

- [3D CNN](https://github.com/astorfi/3D-convolutional-speaker-recognition) TensorFlow implementation of 3D Convolutional Neural Networks for Speaker Verification 

- [GE2E](https://github.com/HarryVolek/PyTorch_Speaker_Verification), implementation is also in [tensorlow](https://github.com/Janghyun1230/Speaker_Verification) 

- [asv-subtools](https://github.com/Snowdar/asv-subtools)  An Open Source Tools based on Pytorch and Kaldi for speaker recognition/language identification, XMU Speech Lab. 

- [Resemblyzer](https://github.com/resemble-ai/Resemblyzer), high-level representation of a voice through a deep learning model (referred to as the voice encoder).

- [voxceleb](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/) audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube

- [Triplet-loss](https://omoindrot.github.io/triplet-loss) Triplet Loss and Online Triplet Mining in TensorFlow. 

- [Res2Net](https://github.com/Res2Net/Res2Net-PretrainedModels) The Res2net architecture used commonly in VoxCeleb speaker recognition challenge. 

- [voxceleb_trainer](https://github.com/clovaai/voxceleb_trainer) A very good speaker framework written in pytorch with pretrained models. 

- [Speechbrain](https://github.com/speechbrain/speechbrain/tree/develop/recipes/VoxCeleb/SpeakerRec)  Voxceleb recipe. 

- [kaldi](https://github.com/kaldi-asr/kaldi/tree/master/egs/voxceleb) Kaldi recipe for voxceleb. 

- [pytorch_xvectors](https://github.com/manojpamk/pytorch_xvectors) pytorch implementation of x-vectors. 

### More-recent papers

- [Attention Back-end](https://arxiv.org/pdf/2104.01541.pdf), Compare PLDA and cosine with proposed attention Back-end, model: TDNN, Resnet, data: cn-celeb

### Wining solutions of Challenges

#### VoxSRC2019

- Rank 1:  FBank, "r-vectors" using resnet, AAM loss. From Brno University of Technolog, [REPORT](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/data_workshop/BUT_Zeinali_VoxSRC.pdf)

- Rank 2: 80-dim FBank features, E-TDNN/F-TDNN models, various classification loss including softmax/AM-softmax/PLDA-softmax. From Johns Hopkins University, [REPORT](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/data_workshop/JHU-HLTCOE_VoxSRC.pdf)

- Rank 3: FBank, resnet + attentive pooling + Phonetic attention, BLSTM + ResNET, loss unclear(?). From Microsoft, [REPORT](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/data_workshop/VoxSRC_TZ_microsoft.pdf)

#### VoxSRC2020

- Rank 1: 60-dim log-FBank, ECAPA-TDNN/SE-ResNet34, S-Norm, AAM-Softmax. From IDLab, [REPORT](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/data_workshop_2020/participants/JTBD.pdf)

- Rank 2: 40-dim FBank/mean-normalized, no VAD, resnet/Res2Net, S-Norm, CM-Softmax. From AI Speech, [REPORT](https://arxiv.org/pdf/2011.00200.pdf), kaldi [recipe](https://github.com/kaldi-asr/kaldi/tree/master/egs/voxceleb) for data-aug

- Rank 3: Report not available

Please let me know if your code/repo is not listed here (ranchlai at 163.com)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ranchlai/awesome-speaker-embedding

Awesome Lists containing this project

README