{"id":19695828,"url":"https://github.com/wenet-e2e/wesep","last_synced_at":"2025-10-24T02:18:35.459Z","repository":{"id":258141907,"uuid":"848757661","full_name":"wenet-e2e/wesep","owner":"wenet-e2e","description":"Target Speaker Extraction Toolkit","archived":false,"fork":false,"pushed_at":"2025-03-05T13:11:49.000Z","size":695,"stargazers_count":151,"open_issues_count":11,"forks_count":16,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-29T00:07:49.236Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wenet-e2e.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-28T10:59:28.000Z","updated_at":"2025-03-28T16:04:16.000Z","dependencies_parsed_at":"2025-02-24T16:13:15.393Z","dependency_job_id":"4101305b-18b5-4c2e-9498-9d0239ad67fd","html_url":"https://github.com/wenet-e2e/wesep","commit_stats":null,"previous_names":["wenet-e2e/wesep"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwesep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwesep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwesep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wenet-e2e%2Fwesep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wenet-e2e","download_url":"https://codeload.github.com/wenet-e2e/wesep/tar.gz/refs
/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271519,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T19:31:03.838Z","updated_at":"2025-10-24T02:18:30.437Z","avatar_url":"https://github.com/wenet-e2e.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Wesep\r\n\r\n\u003e We aim to build a toolkit focusing on front-end processing in the cocktail party setup, including target speaker extraction and ~~speech separation (Future work)~~\r\n\r\n\r\n### Install for development \u0026 deployment\r\n* Clone this repo\r\n``` sh\r\ngit clone https://github.com/wenet-e2e/wesep.git\r\n```\r\n\r\n* Create conda env (PyTorch version \u003e= 1.12.0 is required)\r\n``` sh\r\nconda create -n wesep python=3.9\r\nconda activate wesep\r\nconda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge\r\npip install -r requirements.txt\r\npre-commit install  # for clean and tidy code\r\n```\r\n\r\n## The Target Speaker Extraction Task\r\n\r\n\u003e Target speaker extraction (TSE) focuses on isolating the speech of a specific target speaker from overlapped multi-talker speech, which is a typical setup in the cocktail party problem.\r\nWeSep features flexible target speaker modeling, scalable data management, effective on-the-fly data simulation, structured recipes, and deployment support.\r\n\r\n\u003cimg src=\"resources/tse.png\" width=\"600px\"\u003e\r\n\r\n## Features (To Do List)\r\n\r\n- [x] On the fly data 
simulation\r\n  - [x] Dynamic Mixture simulation\r\n  - [x] Dynamic Reverb simulation\r\n  - [x] Dynamic Noise simulation\r\n- [x] Support time- and frequency-domain models\r\n    - Time-domain\r\n        - [x] conv-tasnet based models\r\n            - [x] Spex+\r\n    - Frequency-domain\r\n        - [x] pBSRNN\r\n        - [x] pDPCCN\r\n        - [x] tf-gridnet (Extremely slow, needs double-checking)\r\n- [ ] Training Criteria\r\n    - [x] SISNR loss\r\n    - [x] GAN loss  (Needs further investigation)\r\n- [ ] Datasets\r\n  - [x] Libri2Mix (Illustration for pre-mixed speech)\r\n  - [x] VoxCeleb (Illustration for online training)\r\n  - [ ] WSJ0-2Mix\r\n- [ ] Speaker Embedding\r\n  - [x] Wespeaker Integration\r\n  - [x] Jointly Learned Speaker Embedding\r\n  - [x] Different fusion methods\r\n- [ ] Pretrained models\r\n- [ ] CLI Usage\r\n- [x] Runtime\r\n\r\n## Data Pipe Design\r\n\r\nFollowing Wenet and Wespeaker, WeSep organizes the data processing modules as a pipeline of processors. The following figure shows such a pipeline with essential processors.\r\n\r\n\u003cimg src=\"resources/datapipe.png\" width=\"800px\"\u003e\r\n\r\n## Discussion\r\n\r\nChinese users can scan the QR code on the left to join our group directly. 
If it has expired, please scan the personal WeChat QR code on the right.\r\n\r\n|\u003cimg src='resources/Wechat_group.jpg' style=\" width: 200px; height: 300px;\"\u003e|\u003cimg src='resources/Wechat.jpg' style=\" width: 200px; height: 300px;\"\u003e|\r\n| ---- | ---- |\r\n\r\n\r\n\r\n## Citations\r\nIf you find WeSep useful, please cite it as\r\n\r\n```bibtex\r\n@inproceedings{wang24fa_interspeech,\r\n  title     = {WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction},\r\n  author    = {Shuai Wang and Ke Zhang and Shaoxiong Lin and Junjie Li and Xuefei Wang and Meng Ge and Jianwei Yu and Yanmin Qian and Haizhou Li},\r\n  year      = {2024},\r\n  booktitle = {Interspeech 2024},\r\n  pages     = {4273--4277},\r\n  doi       = {10.21437/Interspeech.2024-1840},\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenet-e2e%2Fwesep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwenet-e2e%2Fwesep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwenet-e2e%2Fwesep/lists"}