{"id":13958545,"url":"https://github.com/wenet-e2e/wespeaker","last_synced_at":"2025-05-16T04:03:58.615Z","repository":{"id":37916220,"uuid":"411337492","full_name":"wenet-e2e/wespeaker","owner":"wenet-e2e","description":"Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit","archived":false,"fork":false,"pushed_at":"2025-02-26T07:36:05.000Z","size":6525,"stargazers_count":893,"open_issues_count":36,"forks_count":136,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-05-13T11:22:19.745Z","etag":null,"topics":["asv","campplus","cnceleb","dino","ecapa-tdnn","eres2net","nist-sre","plda","production-ready","pytorch","redimnet","repvgg","resnet","speaker-diarization","speaker-recognition","speaker-verification","ssl","voxceleb","wavlm","xvector"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wenet-e2e.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null}},"created_at":"2021-09-28T15:25:44.000Z","updated_at":"2025-05-12T11:16:21.000Z","dependencies_parsed_at":"2023-11-13T06:26:57.244Z","dependency_job_id":"54911616-9298-4435-809d-169f83543d47","html_url":"https://github.com/wenet-e2e/wespeaker","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwespeaker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwespeaker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/wenet-e2e%2Fwespeaker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwespeaker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wenet-e2e","download_url":"https://codeload.github.com/wenet-e2e/wespeaker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254464891,"owners_count":22075570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asv","campplus","cnceleb","dino","ecapa-tdnn","eres2net","nist-sre","plda","production-ready","pytorch","redimnet","repvgg","resnet","speaker-diarization","speaker-recognition","speaker-verification","ssl","voxceleb","wavlm","xvector"],"created_at":"2024-08-08T13:01:42.920Z","updated_at":"2025-05-16T04:03:58.572Z","avatar_url":"https://github.com/wenet-e2e.png","language":"Python","funding_links":[],"categories":["语音识别","Speaker Recognition/Verification:"],"sub_categories":["网络服务_其他","Toolkit"],"readme":"# WeSpeaker\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Python-Version](https://img.shields.io/badge/Python-3.8%7C3.9-brightgreen)](https://github.com/wenet-e2e/wespeaker)\n\n[**Roadmap**](ROADMAP.md)\n| [**Docs**](http://wenet.org.cn/wespeaker)\n| [**Paper**](https://arxiv.org/abs/2210.17016)\n| [**Runtime**](https://github.com/wenet-e2e/wespeaker/tree/master/runtime)\n| [**Pretrained Models**](docs/pretrained.md)\n| [**Huggingface Demo**](https://huggingface.co/spaces/wenet/wespeaker_demo)\n| [**Modelscope 
Demo**](https://www.modelscope.cn/studios/wenet/Speaker_Verification_in_WeSpeaker/summary)\n\n\nWeSpeaker mainly focuses on [**speaker embedding learning**](https://wsstriving.github.io/talk/ncmmsc_slides_shuai.pdf), with application to the speaker verification task. We support\nonline feature extraction or loading pre-extracted features in Kaldi format.\n\n## Installation\n\n### Install python package\n``` sh\npip install git+https://github.com/wenet-e2e/wespeaker.git\n```\n**Command-line usage** (use `-h` for parameters):\n\n``` sh\n$ wespeaker --task embedding --audio_file audio.wav --output_file embedding.txt\n$ wespeaker --task embedding_kaldi --wav_scp wav.scp --output_file /path/to/embedding\n$ wespeaker --task similarity --audio_file audio.wav --audio_file2 audio2.wav\n$ wespeaker --task diarization --audio_file audio.wav\n```\n\n**Python programming usage**:\n\n``` python\nimport wespeaker\n\nmodel = wespeaker.load_model('chinese')\nembedding = model.extract_embedding('audio.wav')\nutt_names, embeddings = model.extract_embedding_list('wav.scp')\nsimilarity = model.compute_similarity('audio1.wav', 'audio2.wav')\ndiar_result = model.diarize('audio.wav')\n```\n\nPlease refer to [python usage](docs/python_package.md) for more command-line and Python usage examples.\n\n### Install for development \u0026 deployment\n* Clone this repo\n``` sh\ngit clone https://github.com/wenet-e2e/wespeaker.git\n```\n\n* Create conda env: PyTorch version \u003e= 1.12.1 is recommended\n``` sh\nconda create -n wespeaker python=3.9\nconda activate wespeaker\nconda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge\npip install -r requirements.txt\npre-commit install  # for clean and tidy code\n```\n\n## 🔥 News\n* 2025.02.23: Add support for the Xi-vector, see [#404](https://github.com/wenet-e2e/wespeaker/pull/404).\n* 2024.09.03: Support the SimAM_ResNet and the model pretrained on VoxBlink2, check [Pretrained Models](docs/pretrained.md) for 
the pretrained model, [VoxCeleb Recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxceleb/v2) for its strong performance, and [python usage](docs/python_package.md) for the command-line usage.\n* 2024.08.30: We support whisper_encoder based frontend and propose the [Whisper-PMFA](https://arxiv.org/pdf/2408.15585) framework, check [#356](https://github.com/wenet-e2e/wespeaker/pull/356).\n* 2024.08.20: Update diarization recipe for VoxConverse dataset by leveraging UMAP dimensionality reduction and HDBSCAN clustering, see [#347](https://github.com/wenet-e2e/wespeaker/pull/347) and [#352](https://github.com/wenet-e2e/wespeaker/pull/352).\n* 2024.08.18: Support using SSL pre-trained models as the frontend. The [WavLM recipe](https://github.com/wenet-e2e/wespeaker/blob/master/examples/voxceleb/v2/run_wavlm.sh) is also provided, see [#344](https://github.com/wenet-e2e/wespeaker/pull/344).\n* 2024.05.15: Add support for [quality-aware score calibration](https://arxiv.org/pdf/2211.00815), see [#320](https://github.com/wenet-e2e/wespeaker/pull/320).\n* 2024.04.25: Add support for the gemini-dfresnet model, see [#291](https://github.com/wenet-e2e/wespeaker/pull/291).\n* 2024.04.23: Support MNN inference engine in runtime, see [#310](https://github.com/wenet-e2e/wespeaker/pull/310).\n* 2024.04.02: Release [Wespeaker document](http://wenet.org.cn/wespeaker) with detailed model-training tutorials, introduction of various runtime platforms, etc.\n* 2024.03.04: Support the [eres2net-cn-common-200k](https://www.modelscope.cn/models/iic/speech_eres2net_sv_zh-cn_16k-common/summary) and [campplus-cn-common-200k](https://www.modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) of damo [#281](https://github.com/wenet-e2e/wespeaker/pull/281), check [python usage](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) for details.\n* 2024.02.05: Support the ERes2Net [#272](https://github.com/wenet-e2e/wespeaker/pull/272) and Res2Net 
[#273](https://github.com/wenet-e2e/wespeaker/pull/273) models.\n* 2023.11.13: Support CLI usage of wespeaker, check [python usage](https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md) for details.\n* 2023.07.18: Support the kaldi-compatible PLDA and unsupervised adaptation, see [#186](https://github.com/wenet-e2e/wespeaker/pull/186).\n* 2023.07.14: Support the [NIST SRE16 recipe](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016), see [#177](https://github.com/wenet-e2e/wespeaker/pull/177).\n\n## Recipes\n\n* [VoxCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxceleb): Speaker Verification recipe on the [VoxCeleb dataset](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/)\n    * 🔥 UPDATE 2024.05.15: We support score calibration for Voxceleb and achieve better performance!\n    * 🔥 UPDATE 2023.07.10: We support self-supervised learning recipe on Voxceleb! Achieving **2.627%** (ECAPA_TDNN_GLOB_c1024) EER on vox1-O-clean test set without any labels.\n    * 🔥 UPDATE 2022.10.31: We support deep r-vector up to the 293-layer version! 
Achieving **0.447%/0.043** EER/minDCF on vox1-O-clean test set\n    * 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe, and obtain SOTA performance among open-source systems\n      - EER/minDCF on vox1-O-clean test set are **0.723%/0.069** (ResNet34) and **0.728%/0.099** (ECAPA_TDNN_GLOB_c1024), after LM fine-tuning and AS-Norm\n* [CNCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/cnceleb/v2): Speaker Verification recipe on the [CnCeleb dataset](http://cnceleb.org/)\n    * 🔥 UPDATE 2024.05.16: We support score calibration for Cnceleb and achieve better EER.\n    * 🔥 UPDATE 2022.10.31: 221-layer ResNet achieves **5.655%/0.330**  EER/minDCF\n    * 🔥 UPDATE 2022.07.12: We migrate the winning system of CNSRC 2022 [report](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082.pdf) [slides](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082-ZhengyangChen.pdf)\n      - EER/minDCF reduction from 8.426%/0.487 to **6.492%/0.354** after large margin fine-tuning and AS-Norm\n* [NIST SRE16](https://github.com/wenet-e2e/wespeaker/tree/master/examples/sre/v2): Speaker Verification recipe for the [2016 NIST Speaker Recognition Evaluation Plan](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016). A similar recipe can be found in [Kaldi](https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16).\n   * 🔥 UPDATE 2023.07.14: We support NIST SRE16 recipe. After PLDA adaptation, we achieved 6.608%, 10.01%, and 2.974% EER on the Pooled, Tagalog, and Cantonese trials, respectively.\n* [VoxConverse](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse): Diarization recipe on the [VoxConverse dataset](https://www.robots.ox.ac.uk/~vgg/data/voxconverse/)\n\n## Discussion\n\nChinese users can scan the QR code on the left to follow our official `WeNet Community` account.\nWe also created a WeChat group for better discussion and quicker response. 
Please scan the QR code on the right to join the chat group.\n| \u003cimg src=\"https://github.com/wenet-e2e/wenet-contributors/blob/main/wenet_official.jpeg\" width=\"250px\"\u003e | \u003cimg src=\"https://github.com/wenet-e2e/wenet-contributors/blob/main/wespeaker/wangshuai.jpg\" width=\"250px\"\u003e |\n| ---- | ---- |\n\n## Citations\nIf you find wespeaker useful, please cite it as\n```bibtex\n@article{wang2024advancing,\n  title={Advancing speaker embedding learning: Wespeaker toolkit for research and production},\n  author={Wang, Shuai and Chen, Zhengyang and Han, Bing and Wang, Hongji and Liang, Chengdong and Zhang, Binbin and Xiang, Xu and Ding, Wen and Rohdin, Johan and Silnova, Anna and others},\n  journal={Speech Communication},\n  volume={162},\n  pages={103104},\n  year={2024},\n  publisher={Elsevier}\n}\n\n@inproceedings{wang2023wespeaker,\n  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},\n  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},\n  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},\n  pages={1--5},\n  year={2023},\n  organization={IEEE}\n}\n```\n## Looking for contributors\n\nIf you are interested in contributing, feel free to contact @wsstriving or @robin1001.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenet-e2e%2Fwespeaker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwenet-e2e%2Fwespeaker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenet-e2e%2Fwespeaker/lists"}