{"id":13668539,"url":"https://github.com/Snowdar/asv-subtools","last_synced_at":"2025-04-27T01:31:31.899Z","repository":{"id":38372104,"uuid":"244350981","full_name":"Snowdar/asv-subtools","owner":"Snowdar","description":"An Open Source Tools for Speaker Recognition","archived":false,"fork":false,"pushed_at":"2024-08-05T05:55:44.000Z","size":11396,"stargazers_count":599,"open_issues_count":21,"forks_count":134,"subscribers_count":22,"default_branch":"master","last_synced_at":"2024-11-11T05:38:01.291Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Snowdar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-02T11:12:06.000Z","updated_at":"2024-11-04T22:32:11.000Z","dependencies_parsed_at":"2024-11-11T05:41:17.966Z","dependency_job_id":null,"html_url":"https://github.com/Snowdar/asv-subtools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snowdar%2Fasv-subtools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snowdar%2Fasv-subtools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snowdar%2Fasv-subtools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snowdar%2Fasv-subtools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Snowdar","download_url":"https://codeload.github.com/Snowdar/asv-subtools/tar
.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251076943,"owners_count":21532603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T08:00:40.378Z","updated_at":"2025-04-27T01:31:26.887Z","avatar_url":"https://github.com/Snowdar.png","language":"Python","funding_links":[],"categories":["Software","Code/Tools/Frameworks/Libraries","语音识别"],"sub_categories":["Speaker embedding","网络服务_其他"],"readme":"# ASV-Subtools: An Open Source Tools for Speaker Recognition\n\nASV-Subtools is developed based on [Pytorch](https://pytorch.org/) and [Kaldi](http://www.kaldi-asr.org/) for tasks such as speaker recognition and language identification.  \nThe 'sub' in 'subtools' means that the toolkit consists of many modular tools, and these parts constitute the whole. 
\n\n\u003e Copyright: [TalentedSoft-XMU Speech Lab] [XMU Speech Lab](https://speech.xmu.edu.cn/) (Xiamen University, China) [TalentedSoft](http://www.talentedsoft.com/) (TalentedSoft, China)\n\u003e Apache 2.0\n\u003e\n\u003e Author   : Miao Zhao (Email: snowdar@stu.xmu.edu.cn), Jianfeng Zhou, Zheng Li, Hao Lu, Fuchuan Tong, Dexin Liao, Tao Jiang  \n\u003e Current Maintainer: Tao Jiang (Email: sssyousen@163.com)  \n\u003e Co-author: Lin Li, Qingyang Hong\n\n\nCitation: \n\n```\n@inproceedings{tong2021asv,\n  title={{ASV-Subtools}: {Open} Source Toolkit for Automatic Speaker Verification},\n  author={Tong, Fuchuan and Zhao, Miao and Zhou, Jianfeng and Lu, Hao and Li, Zheng and Li, Lin and Hong, Qingyang},\n  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},\n  pages={6184--6188},\n  year={2021},\n  organization={IEEE}\n}\n```\n\n\n---\n- **Content**\n  * [Introduction](#introduction)\n    + [Project Structure](#project-structure)\n    + [Training Framework](#training-framework)\n    + [Data Pipeline](#data-pipeline)\n    + [Update Pipeline](#update-pipeline)\n    + [Support List](#support-list)\n  * [Ready to Start](#ready-to-start)\n    + [1. Install Kaldi](#1-install-kaldi)\n    + [2. Create Project](#2-create-project)\n    + [3. Clone ASV-Subtools](#3-clone-asv-subtools)\n    + [4. Install Python Requirements](#4-install-python-requirements)\n    + [5. Support Multi-GPU Training](#5-support-multi-gpu-training)\n    + [6. 
Extra Installation (Option)](#6-extra-installation-option)\n      - [Train A Multi-Task Learning Model Based on Kaldi](#train-a-multi-task-learning-model-based-on-kaldi)\n      - [Accelerate X-vector Extractor of Kaldi](#accelerate-x-vector-extractor-of-kaldi)\n      - [Add A MMI-GMM Classifier for The Back-End](#add-a-mmi-gmm-classifier-for-the-back-end)\n  * [Training Model](#training-model)\n  * [Recipe](#recipe)\n    + [[1] Voxceleb Recipe [Speaker Recognition]](#1-voxceleb-recipe-speaker-recognition)\n    + [[2] OLR Challenge 2020 Baseline Recipe [Language Identification]](#2-olr-challenge-2020-baseline-recipe-language-identification)\n    + [[3] OLR Challenge 2021 Baseline Recipe [Language Identification]](#3-olr-challenge-2021-baseline-recipe-language-identification)\n    + [[4] CNSRC 2022 Baseline Recipe [Speaker Recognition]](#4-cnsrc-2022-baseline-recipe-speaker-recognition)\n  * [Feedback](#feedback)\n  * [Acknowledgement](#acknowledgement)\n\n\u003c!--Table of contents generated with markdown-toc, see http://ecotrust-canada.github.io/markdown-toc--\u003e\n---\n\n## Introduction  \n\nIn ASV-Subtools, [Kaldi](http://www.kaldi-asr.org/) is used for acoustic feature extraction and back-end scoring, while [Pytorch](https://pytorch.org/) is used to build models freely and train them in a custom style.\n\nThe project structure, training framework, and data pipeline shown below should give you some insight into ASV-Subtools.\n\n\u003e By the way, **if you cannot see the pictures on GitHub**, try checking the DNS settings of your network or using a VPN. If you are a student of XMU, the campus network VPN can help with this type of problem (see [https://vpn.xmu.edu.cn](https://vpn.xmu.edu.cn) for configuration). 
As a last resort, you can always **clone ASV-Subtools to your local machine.**\n\n### Project Structure  \nASV-Subtools contains **three main branches**:\n+ Basic Shell Scripts: data processing, back-end scoring (most are based on Kaldi)\n+ Kaldi: training of basic models (i-vector, TDNN, F-TDNN and multi-task learning x-vector)\n+ Pytorch: training of custom models (fewer limitations)\n\n\u003c/br\u003e\n\u003ccenter\u003e\u003cimg src=\"./doc/ASV-Subtools-project-structure.png\" width=\"600\"/\u003e\u003c/center\u003e\n\u003c/br\u003e\n\nFor the Pytorch branch, there are **two important concepts**:\n+ **Model Blueprint**: the path of ```your_model.py```\n+ **Model Creation**: the code to init a model class, such as ```resnet(40, 1211, loss=\"AM\")```\n\nIn ASV-Subtools, each model is standalone. To use a model in the training or testing module, we only need to know the path of its ```model.py``` and how to initialize its model class. This structure is designed to avoid frequent modifications to the code of static modules. For example, suppose the embedding extractor were written as a fixed program that uses an inline import such as ```from my_model_py import my_model``` to load a fixed model from a fixed ```model.py```. It could then not be reused freely for ```model_2.py```, ```model_3.py```, and so on.\n\n**Note that** all models ([torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)) should inherit from the [libs.nnet.framework.TopVirtualNnet](./pytorch/libs/nnet/framework.py) class to get some default functions. These include **auto-saving the model creation and blueprint**, extracting the embedding of a whole utterance, step-training, and computing accuracy. It is easy to convert an original Pytorch model into an ASV-Subtools model through inheritance. Just modify your ```model.py``` following this [x-vector example](./pytorch/model/xvector.py).\n\n### Training Framework  \nThe basic training framework is provided here, and the relations between modules are clear. 
So it should not be complex to change any part when you want a customized ASV-Subtools.  \n\n**Note that** [libs/support/utils.py](./pytorch/libs/support/utils.py) contains many common functions, so it is imported in most ```*.py``` files.\n\n\u003c/br\u003e\n\u003ccenter\u003e\u003cimg src=\"./doc/pytorch-training-framework.png\" width=\"600\"/\u003e\u003c/center\u003e\n\u003c/br\u003e\n\n### Data Pipeline  \nHere, a data pipeline is given to show the relation between Kaldi and Pytorch. There are only two interfaces, **reading acoustic features** and **writing x-vectors**, and both of them are implemented with [kaldi_io](https://github.com/vesis84/kaldi-io-for-python).\n\nThis data pipeline can also be followed to understand the basic principle of x-vector-based speaker recognition.  \n\n\u003c/br\u003e\n\u003ccenter\u003e\u003cimg src=\"./doc/pytorch-data-pipeline.png\" width=\"600\"/\u003e\u003c/center\u003e\n\u003c/br\u003e\n\n### Update Pipeline\n- **[20221113](https://mp.weixin.qq.com/s/HaqiwuwALsWdNxYnbU17YA)**\n  + Runtime module is implemented.\n  + Conformer X-vector is implemented.\n  + [User Manual for ASV-Subtools is released](https://speech.xmu.edu.cn/2022/1124/c18207a465302/page.htm)\n- **[20220707](https://mp.weixin.qq.com/s/L1TJdZdUyE1OruqcNOJy8Q)**\n  + Online Datasets are implemented (including online feature extraction, online VAD, online augmentation and online x-vector extraction)\n  + Mixed precision training is supported.\n  + Runtime module for exporting jit models.\n  + Some models are updated.\n  + Feature Decomposition and Cosine Similar Adversarial Learning (FD-AL)\n\n### Support List\n\n- **Multi-GPU Training Solution**\n  + [x] [DistributedDataParallel (DDP)](https://pytorch.org/docs/stable/nn.html#distributeddataparallel) [Built-in function of Pytorch]\n  + [x] [Horovod](https://github.com/horovod/horovod)\n\n- **Front-end**\n  + [x] [Convenient Augmentation of Reverb, Noise, Music and Babble](./augmentDataByNoise.sh)\n  + [x] 
Inverted [Specaugment](https://arxiv.org/pdf/1904.08779.pdf) [Note: it is still not available with multi-GPU training, and you will not get a better result if you use it.]\n  + [x] [Mixup](https://arxiv.org/pdf/1710.09412.pdf) [For speaker recognition, see this [paper](https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2250.pdf).]\n  + [x] Online Datasets [Including online feature extraction, online VAD, online augmentation and online x-vector extraction, developed by Dexin Liao] \n\n- **Model**\n  + [x] [Standard X-vector](http://www.danielpovey.com/files/2017_interspeech_embeddings.pdf)\n  + [x] [Extended X-vector](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=8683760)\n  + [x] Resnet1d\n  + [x] [Resnet2d](http://www.danielpovey.com/files/2019_interspeech_nist_sre18.pdf)\n  + [x] [F-TDNN X-vector](http://www.danielpovey.com/files/2019_interspeech_nist_sre18.pdf)\n  + [x] [ECAPA X-vector](https://arxiv.org/abs/2005.07143) [[Source code](https://github.com/lawlict/ECAPA-TDNN)]\n  + [x] [RepVGG](https://arxiv.org/pdf/2101.03697.pdf) \n  + [x] [Conformer X-vector](https://arxiv.org/abs/2211.07201.pdf)  \n\n- **Component**\n  + [x] [Attentive Statistics Pooling](https://arxiv.org/pdf/1803.10963v1.pdf)\n  + [x] [Learnable Dictionary Encoding (LDE) Pooling](https://arxiv.org/pdf/1804.00385.pdf)\n  + [x] [Multi-Head Attention Pooling](https://upcommons.upc.edu/bitstream/handle/2117/178623/2616.pdf?sequence=1\u0026isAllowed=y) [The code can be found [here](./pytorch/libs/nnet/pooling.py), by Snowdar.]\n  + [x] [Global Multi-Head Attention Pooling](https://www.researchgate.net/publication/341085045_Multi-Resolution_Multi-Head_Attention_in_Deep_Speaker_Embedding)\n  + [x] [Multi-Resolution Multi-Head Attention Pooling](https://www.researchgate.net/publication/341085045_Multi-Resolution_Multi-Head_Attention_in_Deep_Speaker_Embedding)\n  + [x] [Squeeze-and-Excitation (SE)](https://arxiv.org/pdf/1709.01507.pdf) [A resnet1d-based SE example of speaker recognition 
can be found in this [paper](https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1704.pdf), by Jianfeng Zhou.] [Updating resnet2d-based SE]\n  + [x] [Xi-vector embedding](https://ieeexplore.ieee.org/document/9463712) [by [Dr. Kong Aik Lee](https://ieeexplore.ieee.org/author/37293718000).]\n\n- **Loss Function**\n  + [x] Softmax Loss (Affine + Softmax + Cross-Entropy)\n  + [x] [AM-Softmax Loss](https://arxiv.org/pdf/1801.05599.pdf)\n  + [x] [AAM-Softmax Loss](https://arxiv.org/pdf/1801.07698v1.pdf)\n  + [x] [Ring Loss](https://arxiv.org/pdf/1803.00130.pdf)\n\n  \u003c!--+ [x] [Curricular Margin Softmax Loss](https://arxiv.org/pdf/2004.00288.pdf)--\u003e\n  \u003c!--It does not work in my experiments--\u003e\n\n- **Optimizer** [Beyond Pytorch built-in optimizers]\n  + [x] [Lookahead](https://arxiv.org/pdf/1907.08610.pdf) [A wrapper optimizer]\n  + [x] [RAdam](https://arxiv.org/pdf/1908.03265v1.pdf)\n  + [x] Ralamb [RAdam + [Layer-wise Adaptive Rate Scaling](https://openreview.net/pdf?id=rJ4uaX2aW) (LARS)]\n  + [x] [Novograd](https://arxiv.org/pdf/1905.11286.pdf)\n  + [x] [Gradient Centralization](https://arxiv.org/pdf/2004.01461.pdf) [Extra option bound to the optimizer]\n\n- **Training Strategy**\n  + [x] [AdamW](https://arxiv.org/pdf/1711.05101v1.pdf) + [WarmRestarts](https://arxiv.org/pdf/1608.03983v4.pdf)\n  + [x] SGD + [ReduceLROnPlateau](https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.ReduceLROnPlateau)\n  + [x] [Training with Margin Warmup Strategy](https://arxiv.org/pdf/1904.03479.pdf)\n  + [x] [Heated Up Strategy](https://arxiv.org/pdf/1809.04157.pdf)\n  + [x] [Multi-task Learning with Phonetic Information](http://yiliu.org.cn/papers/Speaker_Embedding_Extraction_with_Phonetic_Information.pdf) (Kaldi) [[Source code](https://github.com/mycrazycracy/speaker-embedding-with-phonetic-information) was contributed by [Yi Liu](http://yiliu.org.cn/). 
Thanks.]\n  + [x] [Multi-task Learning with Phonetic Information (Pytorch)](./recipe/ap-olr/runMultiTaskXvector.py) [developed by Zheng Li]\n  + [x] [Feature Decomposition and Cosine Similar Adversarial Learning (FD-AL)](./pytorch/launcher/runEtdnn-FD-AL-trainer.py) [[Reference](https://doi.org/10.48550/arXiv.2205.14294)] [developed by Fuchuan Tong] \n\n- **Back-End**\n  + [x] LDA, Submean, Whiten (ZCA), Vector Length Normalization\n  + [x] Cosine Similarity\n  + [x] Basic Classifiers: SVM, GMM, Logistic Regression (LR)\n  + [x] PLDA Classifiers: [PLDA](https://ravisoji.com/assets/papers/ioffe2006probabilistic.pdf), APLDA, [CORAL](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12443/11842), [CORAL+](https://arxiv.org/pdf/1812.10260), [LIP](http://150.162.46.34:8080/icassp2014/papers/p4075-garcia-romero.pdf), [CIP](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=9054113) [[Python versions](./score/pyplda) were contributed by Jianfeng Zhou. For more details, see the [note](./score/pyplda/Domain-Adaptation-of-PLDA-in-Speaker-Recognition.pdf).]\n  + [x] Score Normalization: [S-Norm](http://www.crim.ca/perso/patrick.kenny/kenny_Odyssey2010_presentation.pdf), [AS-Norm](https://www.researchgate.net/profile/Daniele_Colibro/publication/221480280_Comparison_of_Speaker_Recognition_Approaches_for_Real_Applications/links/545e4f6e0cf295b561602c42/Comparison-of-Speaker-Recognition-Approaches-for-Real-Applications.pdf)\n  + [x] Metric: EER, Cavg, minDCF\n\n- **Runtime**\n  + [x] [Export JIT model](./pytorch/pytorch/pipeline/export_jit_model.sh)\n  + [x] [CMake for building the project](./runtime)\n\n- **Others**\n  + [x] [Learning Rate Finder](https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html)\n  + [x] Support for [TensorboardX](https://tensorflow.google.cn/tensorboard) in the log system\n  + [x] Training with AMP\n\n## Ready to Start  \n### 1. 
Install Kaldi  \nPytorch training does not depend much on Kaldi, but we currently provide no other interface to connect acoustic features to the training module. So if you don't want to use Kaldi, you can modify the [libs.egs.egs.ChunkEgs](./pytorch/libs/egs/egs.py) class, which feeds the features to Pytorch only through [torch.utils.data.Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset). You should also change the interface for extracting x-vectors after training. Note that most scripts that require Kaldi will not be available in this case, such as subtools/makeFeatures.sh and subtools/augmentDataByNoise.sh.\n\n**If you prefer to use Kaldi, install Kaldi first following http://www.kaldi-asr.org/doc/install.html.**\n\nHere is a summary of the steps:\n\n```shell\n# Download Kaldi\ngit clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream\ncd kaldi\n\n# Check the INSTALL file in the current directory for more installation details\ncat INSTALL\n\n# Compile tools first\ncd tools\nbash extras/check_dependencies.sh\nmake -j 4\n\n# Configure src before compiling\ncd ../src\n./configure --shared\n\n# Generate dependencies and compile\nmake depend -j 4\nmake -j 4\ncd ..\n```\n\n### 2. Create Project  \nCreate your project with a **4-level path** relative to the Kaldi root directory (level 1), such as **kaldi/egs/xmuspeech/sre**. This is required by the project environment. For more details, see [subtools/path.sh](./path.sh).\n\n```shell\n# Suppose the current directory is the parent of the Kaldi root directory\nmkdir -p kaldi/egs/xmuspeech/sre\n```\n\n### 3. 
Clone ASV-Subtools  \nASV-Subtools can be seen as a set of tools like Kaldi's 'utils' or 'steps', so only two extra steps are needed to complete the installation:\n+ Clone ASV-Subtools to your project.\n+ Install the Python requirements (**Python3 is recommended**).\n\nHere is how to clone ASV-Subtools from GitHub:\n\n```shell\n# Clone asv-subtools from GitHub\ncd kaldi/egs/xmuspeech/sre\ngit clone https://github.com/Snowdar/asv-subtools.git subtools\n```\n\n### 4. Install Python Requirements  \n+ Pytorch\u003e=1.10: \n  ```shell\n  conda create -n subtools python=3.8\n  conda activate subtools\n  conda install pytorch=1.10.0 torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge\n  ```\n+ Other requirements: numpy, thop, pandas, progressbar2, matplotlib, scipy (optional), sklearn (optional)  \n  ```shell\n  # progressbar2 requires progressbar to be installed first\n  pip3 install progressbar\n  pip3 install progressbar2\n  pip3 install -r subtools/requirements.txt\n  ```\n\n### 5. Support Multi-GPU Training  \nASV-Subtools provides both **DDP (recommended)** and Horovod solutions to support multi-GPU training.\n\n**For how to use multi-GPU training, see [subtools/pytorch/launcher/runSnowdarXvector.py](./pytorch/launcher/runSnowdarXvector.py). It is now very convenient and easy.**\n\nRequirements List:  \n+ DDP: Pytorch, NCCL  \n+ Horovod: Pytorch, NCCL, Openmpi, Horovod  \n\n**An Example of Installing NCCL Based on Linux-Centos-7 and CUDA-10.2**  \nReference: https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html.  
\n\n```shell\n# The simplest way takes only three steps.\n# [1] Download the NVIDIA rpm package\nwget https://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm\n\n# [2] Add the NVIDIA repo to yum (a NOKEY warning can be ignored)\nsudo rpm -i nvidia-machine-learning-repo-rhel7-1.0.0-1.x86_64.rpm\n\n# [3] Install NCCL with yum\nsudo yum install libnccl-2.6.4-1+cuda10.2 libnccl-devel-2.6.4-1+cuda10.2 libnccl-static-2.6.4-1+cuda10.2\n```\n\nThese yum cleanup commands can be very useful if you run into trouble with yum.\n\n```shell\n# Install yum-utils first\nyum -y install yum-utils\n\n# Clean up unfinished transactions\nyum-complete-transaction --cleanup-only\n\n# Clean duplicate and conflicting packages\npackage-cleanup --cleandupes\n\n# Clean cached headers and packages\nyum clean all\n```\n\nIf you want to install Openmpi and Horovod, see https://github.com/horovod/horovod for more details.\n\n### 6. Extra Installation (Option)\nSome special applications need extra installations.\n\n#### Train A Multi-Task Learning Model Based on Kaldi\nUse [subtools/kaldi/runMultiTaskXvector.sh](./kaldi/runMultiTaskXvector.sh) to train a model with multi-task learning, but it requires some extra code.\n```shell\n# Enter your project, such as kaldi/egs/xmuspeech/sre, and make sure ASV-Subtools is cloned here\n# Just run this patch to compile some extra C++ commands in Kaldi's format\ncd kaldi/egs/xmuspeech/sre\nbash subtools/kaldi/patch/runPatch-multitask.sh\n```\n\n#### Accelerate X-vector Extractor of Kaldi\nWhen extracting x-vectors with Kaldi, compiling nnet3 models for utterances with different numbers of frames takes a lot of time. To address this problem, ASV-Subtools provides an **offline** modification (MOD) in [subtools/kaldi/sid/nnet3/xvector/extract_xvectors.sh](./kaldi/sid/nnet3/xvector/extract_xvectors.sh) to accelerate extraction. 
This MOD requires two extra commands, **nnet3-compile-xvector-net** and **nnet3-offline-xvector-compute**. When extracting x-vectors, models for all the different input chunk sizes are compiled first. Utterances with the same number of frames can then share a compiled nnet3 network, which saves much time by avoiding many duplicate dynamic compilations.\n\nIn addition, the ```scp``` file is split according to utterance length ([subtools/splitDataByLength.sh](./splitDataByLength.sh)) to balance the number of frames across the ```nj``` jobs when multiple processes are used.\n\n```shell\n# Enter your project, such as kaldi/egs/xmuspeech/sre, and make sure ASV-Subtools is cloned here\n# Just run this patch to compile some extra C++ commands in Kaldi's format\n\n# Target *.cc:\n#     src/nnet3bin/nnet3-compile-xvector-net.cc\n#     src/nnet3bin/nnet3-offline-xvector-compute.cc\n\ncd kaldi/egs/xmuspeech/sre\nbash subtools/kaldi/patch/runPatch-base-command.sh\n```\n\n#### Add A MMI-GMM Classifier for The Back-End\nIf you have already run [subtools/kaldi/patch/runPatch-base-command.sh](./kaldi/patch/runPatch-base-command.sh), you don't need to run it again.\n\n```shell\n# Enter your project, such as kaldi/egs/xmuspeech/sre, and make sure ASV-Subtools is cloned here\n# Just run this patch to compile some extra C++ commands in Kaldi's format\n\n# Target *.cc:\n#    src/gmmbin/gmm-global-init-from-feats-mmi.cc\n#    src/gmmbin/gmm-global-est-gaussians-ebw.cc\n#    src/gmmbin/gmm-global-est-map.cc\n#    src/gmmbin/gmm-global-est-weights-ebw.cc\n\ncd kaldi/egs/xmuspeech/sre\nbash subtools/kaldi/patch/runPatch-base-command.sh\n```\n\n## Training Model\nIf you have completed the [Ready to Start](#ready-to-start) stage, you can try to train a model with ASV-Subtools.\n\nFor Kaldi training, some launcher scripts named ```run*.sh``` can be found in [subtools/kaldi/](./kaldi).\n\nFor Pytorch training, some launcher scripts named ```run*.py``` can be found in 
[subtools/pytorch/launcher/](./pytorch/launcher/), and some models named ```*.py``` can be found in [subtools/pytorch/model/](./pytorch/model). Note that the model is called in ```launcher.py```.\n\nHere is a Pytorch training example. Before training, follow the [pipeline](./recipe/voxceleb/runVoxceleb.sh) of a [recipe](#recipe) to prepare your data and features. Data preprocessing is not complex and is the same as in Kaldi. \n\n```shell\n# Suppose you have followed the recipe to prepare your data and features; training can then be run as follows.\n# Enter your project, such as kaldi/egs/xmuspeech/sre, and make sure ASV-Subtools is cloned here\n\n# First, copy a launcher to your project\ncp subtools/pytorch/launcher/runSnowdarXvector.py ./\n\n# Modify this launcher and run it\n# Most of the time, only two files need to be changed: model.py and launcher.py.\nsubtools/runLauncher.sh runSnowdarXvector.py --gpu-id=0,1,2,3 --stage=0\n```\n\n## Recipe\n### [1] Voxceleb Recipe [Speaker Recognition]\n[Voxceleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/index.html#about) is a popular dataset for the task of speaker recognition. It has two parts, [Voxceleb1](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) and [Voxceleb2](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html).\n\nThere are **two recipes for Voxceleb**:\n\n**i. Test Voxceleb1-O only**\n\nThis means the training set can be sampled from both Voxceleb1.dev and Voxceleb2 under a fixed training condition. 
The training script is available in [subtools/recipe/voxceleb/runVoxceleb.sh](./recipe/voxceleb/runVoxceleb.sh).\n\nThe voxceleb1 recipe with mfcc23\u0026pitch features is available:  \n**Link**: https://pan.baidu.com/s/1nMXaAXiOnFGRhahzVyrQmg  \n**Password**: 24sg\n\n```shell\n# Download this recipe to the kaldi/egs/xmuspeech directory\ncd kaldi/egs/xmuspeech\ntar xzf voxceleb1_recipe.tar.gz\ncd voxceleb1\n\n# Clone ASV-Subtools (assuming the related environment has already been configured)\ngit clone https://github.com/Snowdar/asv-subtools.git subtools\n\n# Train an extended x-vector model (do not use multi-GPU training, because it is not stable with SpecAugment)\nsubtools/runPytorchLauncher.sh runSnowdarXvector-extended-spec-am.py --stage=0\n\n# Score (EER = 2.444% for voxceleb1.test)\nsubtools/recipe/voxceleb/gather_results_from_epochs.sh --vectordir exp/extended_spec_am --epochs 21 --score plda\n```\n\n**Results of Voxceleb1-O with Voxceleb1.dev.aug1:1 Training only**\n\n![results-1.png](./recipe/voxceleb/results-1.png)\n\n\u003c!--\n\u003ctable\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003cth\u003eIndex\u003c/th\u003e\n\u003cth\u003eFeatures\u003c/th\u003e\n\u003cth\u003eModel\u003c/th\u003e\n\u003cth\u003eInSpecAug\u003c/th\u003e\n\u003cth\u003eAM-Softmax (m=0.2)\u003c/th\u003e\n\u003cth\u003eBack-End\u003c/th\u003e\n\u003cth\u003eEER%\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003ctd\u003e1\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003ex-vector\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e3.362\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: 
nowrap;text-align:left;\"\u003e\n\u003ctd\u003e2\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003ex-vector\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e2.778\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003ctd\u003e3\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003ex-vector\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e3.240\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003ctd\u003e4\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003ex-vector\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e2.635\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003ctd\u003e5\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003eextended x-vector\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e3.112\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003ctd\u003e6\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003eextended x-vector\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e2.598\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003ctd\u003e7\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003eextended 
x-vector\u003c/td\u003e\n\u003ctd\u003eno\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e3.293\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr style=\"white-space: nowrap;text-align:left;\"\u003e\n\u003ctd\u003e8\u003c/td\u003e\n\u003ctd\u003emfcc23\u0026pitch\u003c/td\u003e\n\u003ctd\u003eextended x-vector\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003eyes\u003c/td\u003e\n\u003ctd\u003ePLDA\u003c/td\u003e\n\u003ctd\u003e2.444\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n--\u003e\n\n\u003c!--The HTML code of the table is generated by subtools/linux/generate_html_table_for_markdown.sh--\u003e\n\n**Results of Voxceleb1-O with Voxceleb1\u00262.dev.aug1:1 Training**\n\n![results-2.png](./recipe/voxceleb/results-2.png)\n\nNote: 2000 utterances are selected from the non-augmented training set as the AS-Norm cohort set, here and below.\n\n---\n\n**ii. Test Voxceleb1-O/E/H**\n\nThis means the training set can only be sampled from Voxceleb2 under a fixed training condition.\n\n**Old Results of Voxceleb1-O/E/H with Voxceleb2.dev.aug1:4 Training (EER%)**\n\n![results-3.png](./recipe/voxcelebSRC/results-adam.png)\n\nThese models were trained with Adam + WarmRestarts and are outdated (so the related scripts have been removed).\nNote: Voxceleb1.dev is used as the back-end training set for the Voxceleb1-O* tasks, and Voxceleb2.dev for the others. \n\n \u003e **These basic models perform well, but the results are not yet state-of-the-art**. I found that training strategies, such as the number of epochs, the weight decay value, and the choice of optimizer, can have an important influence on the final performance. 
Unfortunately, I do not have enough time or GPUs to fine-tune so many models, especially when training on a large dataset like Voxceleb2, whose duration exceeds 2300 h (in this case, it takes 1~2 days to train one fbank81-based Resnet2d model for 6 epochs on 4 V100 GPUs).\n \u003e\n \u003e --#--Snowdar--2020-06-02--#--\n\n**New Results of Voxceleb1-O/E/H with Voxceleb2.dev.aug1:4 Training (EER%)**\n\nThis is a resnet34 benchmark model. The training script is available in [subtools/recipe/voxcelebSRC/runVoxcelebSRC.sh](./recipe/voxcelebSRC/runVoxcelebSRC.sh); see it for more details. (by Snowdar)\n\n|EER%|vox1-O|vox1-O-clean|vox1-E|vox1-E-clean|vox1-H|vox1-H-clean|\n| :--: | :--: | :--: | :--: | :--: | :--: | :--: |\n|Baseline|1.304|1.159|1.35|1.223|2.357|2.238|\n|Submean|1.262|1.096|1.338|1.206|2.355|2.223|\n|AS-Norm|1.161|1.026|-|-|-|-|\n\n---\n**New Results of Voxceleb1-O/E/H with Voxceleb2.dev.aug.speed1:4:2 Training (EER%)**\n\nThis is an ECAPA benchmark model. The training script is available in [subtools/pytorch/launcher/runEcapaXvector.py](./pytorch/launcher/runEcapaXvector.py); see it for more details. (by Fuchuan Tong) ==new==\n\n|EER%|vox1-O|vox1-O-clean|vox1-E|vox1-E-clean|vox1-H|vox1-H-clean|\n| :--: | :--: | :--: | :--: | :--: | :--: | :--: |\n|Baseline|1.506|1.393|1.583|1.462|2.811|2.683|\n|Submean|1.225|1.112|1.515|1.394|2.781|2.652|\n|AS-Norm|1.140|0.963|-|-|-|-|\n\n---\n**New Results of Voxceleb1-O/E/H with original Voxceleb2.dev (without data augmentation) Training (EER%)**\n\nThis is a statistics pooling and Xi-vector embedding benchmark model (implemented on TDNN). The training script is available in [subtools/pytorch/launcher/runSnowdar_Xivector.py](./pytorch/launcher/runSnowdar_Xivector.py). We would like to thank Dr. Kong Aik Lee for providing code and useful discussions. 
(experiments conducted by Fuchuan Tong) ==2021-10-30==

|EER%|vox1-O|vox1-E|vox1-H|
| :--: | :--: | :--: | :--: |
|Statistics Pooling|1.85|2.01|3.57|
|Multi-head|1.76|2.00|3.54|
|Xi-Vector(∅,𝜎)|1.59|1.90|3.38|

---

**New Results of Voxceleb1-O/E/H with Voxceleb2.dev (online random augmentation) Training (EER%)**

This is a ResNet34 benchmark model. The training script is available in [subtools/pytorch/launcher/runResnetXvector_online.py](./pytorch/launcher/runResnetXvector_online.py); see it for more details. (experiments conducted by Dexin Liao) ==2022-07-07==

|EER%|vox1-O|vox1-O-clean|vox1-E|vox1-E-clean|vox1-H|vox1-H-clean|
| :--: | :--: | :--: | :--: | :--: | :--: | :--: |
|Submean|1.071|0.920|1.257|1.135|2.205|2.072|
|AS-Norm|0.970|0.819|-|-|-|-|

This is an ECAPA benchmark model. The training script is available in [subtools/pytorch/launcher/runEcapaXvector_online.py](./pytorch/launcher/runEcapaXvector_online.py); see it for more details. (experiments conducted by Dexin Liao) ==2022-07-07==

|EER%|vox1-O|vox1-O-clean|vox1-E|vox1-E-clean|vox1-H|vox1-H-clean|
| :--: | :--: | :--: | :--: | :--: | :--: | :--: |
|Submean|1.045|0.904|1.330|1.211|2.430|2.303|
|AS-Norm|0.991|0.856|-|-|-|-|

---

**New Results of Voxceleb1-O/E/H with Voxceleb2.dev (online random augmentation) Training (EER%)**

This is a Conformer benchmark model. The training script is available in [subtools/pytorch/launcher/runTransformerXvector.py](./pytorch/launcher/runTransformerXvector.py); see it for more details. 
(experiments conducted by Dexin Liao) ==2022-11-15==

* Egs = Voxceleb2_dev (online random aug) + random chunk (3s)
* Optimization = [adamW (lr = 1e-6 ~ 1e-3) + 1cycle] x 4 GPUs (total batch-size=512)
* Conformer + FC-Swish-LN + ASP + FC-LN + AAM-Softmax (margin = 0.2)
* Back-end = near + Cosine
* LM: Large-Margin Fine-tune (margin: 0.2 --> 0.5, chunk: 6s)

|Config|EER%|vox1-O|vox1-O-clean|vox1-E|vox1-E-clean|vox1-H|vox1-H-clean|
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
|6L-256D-4H-4Sub|Submean|1.204|1.074|1.386|1.267|2.416|2.294|
| |AS-Norm|1.029|0.952|-|-|-|-|
|+SAM training|cosine|1.103|0.984|1.350|1.234|2.380|2.257|
| |LM|1.034|0.899|1.181|1.060|2.079|1.953|
| |AS-Norm|0.943|0.792|-|-|-|-|
|6L-256D-4H-2Sub|cosine|1.066|0.915|1.298|1.177|2.167|2.034|
| |LM|1.029|0.888|1.160|1.043|1.923|1.792|
| |AS-Norm|0.949|0.792|-|-|-|-|

---

**Real-Time Factor (RTF) Results**
* RTF is evaluated with the LibTorch-based runtime; see `subtools/runtime`.
* A single thread is used for CPU threading and TorchScript inference.
* CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz.

| Model | Config | Params | RTF |
|:-----|:------|:------:|:---:|
| ResNet34 | base32 | 6.80M | 0.090 |
| ECAPA | C1024 | 16.0M | 0.071 |
| | C512 | 6.53M | 0.030 |
| Conformer | 6L-256D-4H-4Sub | 18.8M | 0.025 |
| | 6L-256D-4H-2Sub | 22.5M | 0.070 |

---

### [2] OLR Challenge 2020 Baseline Recipe [Language Identification]

OLR Challenge 2020 is now closed.

**Baseline**: [subtools/recipe/ap-olr2020-baseline](./recipe/ap-olr2020-baseline).  
> The **top-level training script of the baseline** is available in [subtools/recipe/ap-olr2020-baseline/run.sh](./recipe/ap-olr2020-baseline/run.sh). 
The baseline results can be found in [subtools/recipe/ap-olr2020-baseline/results.txt](./recipe/ap-olr2020-baseline/results.txt).

**Plan**: Zheng Li, Miao Zhao, Qingyang Hong, Lin Li, Zhiyuan Tang, Dong Wang, Liming Song and Cheng Yang: [AP20-OLR Challenge: Three Tasks and Their Baselines](https://arxiv.org/pdf/2006.03473.pdf), submitted to APSIPA ASC 2020.

### [3] OLR Challenge 2021 Baseline Recipe [Language Identification]

**Baseline**: [subtools/recipe/olr2021-baseline](./recipe/olr2021-baseline).  
> The **top-level training script of the baseline** is available in [subtools/recipe/olr2021-baseline/run.sh](./recipe/olr2021-baseline/run.sh).

**Plan**: Binling Wang, Wenxuan Hu, Jing Li, Yiming Zhi, Zheng Li, Qingyang Hong, Lin Li, Dong Wang, Liming Song and Cheng Yang: [OLR 2021 Challenge: Datasets, Rules and Baselines](http://cslt.riit.tsinghua.edu.cn/mediawiki/images/a/a8/OLR_2021_Plan.pdf), submitted to APSIPA ASC 2021.

For previous challenges (2016-2020), see http://olr.cslt.org.

### [4] CNSRC 2022 Baseline Recipe [Speaker Recognition]

**Baseline**: [subtools/recipe/cnsrc](./recipe/cnsrc).
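
The EER values throughout this README, and the minDCF reported for CNSRC Task1 (SV), are both computed from a list of scored trials. As an illustration only (this is not the scoring code used by subtools), the pure-Python sketch below computes both metrics from trial scores and target/non-target labels. The function name `compute_eer_mindcf` is hypothetical, and the minDCF defaults (`p_target=0.01`, `c_miss=c_fa=1`) are an assumption; check the CNSRC 2022 evaluation plan for the official operating point.

```python
from typing import List, Tuple


def compute_eer_mindcf(scores: List[float], labels: List[int],
                       p_target: float = 0.01, c_miss: float = 1.0,
                       c_fa: float = 1.0) -> Tuple[float, float]:
    """Hypothetical helper: return (EER, normalized minDCF).

    labels[i] is 1 for a target (same-speaker) trial and 0 otherwise;
    a higher score means "more likely the same speaker".
    The default DCF parameters are an assumption, not an official setting.
    """
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need both target and non-target trials")

    # Sweep the decision threshold upward over the sorted scores.
    # Below every score all trials are accepted: P_miss = 0, P_fa = 1.
    trials = sorted(zip(scores, labels))
    p_miss, p_fa = [0.0], [1.0]
    misses, false_alarms = 0, n_non
    i = 0
    while i < len(trials):
        score = trials[i][0]
        # Trials with tied scores are rejected together by one threshold step.
        while i < len(trials) and trials[i][0] == score:
            if trials[i][1] == 1:
                misses += 1
            else:
                false_alarms -= 1
            i += 1
        p_miss.append(misses / n_tar)
        p_fa.append(false_alarms / n_non)

    # EER: linearly interpolate where the rising P_miss curve crosses
    # the falling P_fa curve (index 0 always has P_miss < P_fa, so k >= 1).
    k = next(j for j in range(len(p_miss)) if p_miss[j] >= p_fa[j])
    d0 = p_fa[k - 1] - p_miss[k - 1]   # > 0 just before the crossing
    d1 = p_miss[k] - p_fa[k]           # >= 0 at the crossing
    t = d0 / (d0 + d1)
    eer = p_miss[k - 1] + t * (p_miss[k] - p_miss[k - 1])

    # minDCF: minimum detection cost over all thresholds, normalized by
    # the cost of the best trivial system (accept all or reject all).
    dcf = min(c_miss * pm * p_target + c_fa * pf * (1 - p_target)
              for pm, pf in zip(p_miss, p_fa))
    min_dcf = dcf / min(c_miss * p_target, c_fa * (1 - p_target))
    return eer, min_dcf


if __name__ == "__main__":
    # Toy trial list: two target and two non-target trials that overlap.
    eer, min_dcf = compute_eer_mindcf([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
    print(f"EER = {eer * 100:.2f}%, minDCF = {min_dcf:.4f}")
```

In practice the scores would come from a back-end such as cosine or PLDA over a trial list. Tools differ in how they pick the EER point (interpolation vs. nearest threshold), so small discrepancies against other scorers are expected.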

> The **top-level training scripts of the baseline** are available in [subtools/recipe/cnsrc/sv/run-cnsrc_sv.sh](./recipe/cnsrc/sv/run-cnsrc_sv.sh) (SV) and [subtools/recipe/cnsrc/sr/run-cnsrc_sr.sh](./recipe/cnsrc/sr/run-cnsrc_sr.sh) (SR).

**Plan**: Dong Wang, Qingyang Hong, Lantian Li, Hui Bu: [CNSRC 2022 Evaluation Plan](http://aishell-jiaofu.oss-cn-hangzhou.aliyuncs.com/cnsrc.pdf).

For more information, see http://cnceleb.org.
For challenge questions, please contact lilt@cslt.org; for baseline questions, contact sssyousen@163.com.

|Tasks|Training|Evaluation|Metrics|
| :--: | :--: | :--: | :--: |
|Task1 SV|CN-Celeb.T|CN-Celeb.E|minDCF:0.463 EER:9.141%|
|Task2 SR|CN-Celeb.T|SR.eval|mAP:0.242|

---

**New Results of CN-Celeb.E with CN-Celeb.T (online random augmentation) Training (EER% and minDCF)**

|Config|Pretrained ASR|EER%|minDCF|
| :--: | :--: | :--: | :--: |
|6L-256D-4H|-|8.39%|0.4748|
| |Multi-CN|7.95%|0.4534|
| |WenetSpeech|7.42%|0.4427|

---


## Feedback
+ If you find bugs or have questions, please open a GitHub issue in this repository so that everyone can see it and a good solution can be contributed.
+ For other questions, send an e-mail to sssyousen@163.com (Tao Jiang) or snowdar@stu.xmu.edu.cn (Snowdar) for SRE, and to wangbling1207@stu.xmu.edu.cn for LID. We usually reply in our free time.
+ If you want to join the WeChat group of asv-subtools, scan the QR code on the left to follow XMUSPEECH and reply "join group" + your institution/university + your name. 
In addition, you can scan the QR code on the right, and its owner will invite you to the chat group.
+ <img src="./doc/xmuspeech.jpg" width="300" height="300"/><img src="./doc/sssyousen_wechat_qr.jpg" width="300" height="300"/>

## Acknowledgement
+ Thanks to everyone who contributes their time, ideas and code to ASV-Subtools.
+ Thanks to [XMU Speech Lab](https://speech.xmu.edu.cn/) for providing machines and GPUs.
+ Thanks to these excellent projects: [Kaldi](http://www.kaldi-asr.org/), [PyTorch](https://pytorch.org/), [Kaldi I/O](https://github.com/vesis84/kaldi-io-for-python), [NumPy](https://numpy.org/), [Pandas](https://pandas.pydata.org/), [Horovod](https://github.com/horovod/horovod), [Progressbar2](https://github.com/WoLpH/python-progressbar), [Matplotlib](https://matplotlib.org/index.html), [Prefetch Generator](https://github.com/justheuristic/prefetch_generator), [Thop](https://github.com/Lyken17/pytorch-OpCounter), [GPU Manager](https://github.com/QuantumLiu/tf_gpu_manager), etc.

","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSnowdar%2Fasv-subtools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSnowdar%2Fasv-subtools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSnowdar%2Fasv-subtools/lists"}