{"id":13532040,"url":"https://github.com/funcwj/deep-clustering","last_synced_at":"2025-04-01T20:31:14.605Z","repository":{"id":34795639,"uuid":"137379139","full_name":"funcwj/deep-clustering","owner":"funcwj","description":"deep clustering method for single-channel speech separation","archived":false,"fork":false,"pushed_at":"2022-06-21T21:19:45.000Z","size":24,"stargazers_count":108,"open_issues_count":3,"forks_count":34,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-02T19:33:47.678Z","etag":null,"topics":["pytorch","speech-separation"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/funcwj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-14T15:50:16.000Z","updated_at":"2024-08-12T19:39:19.000Z","dependencies_parsed_at":"2022-09-02T07:41:05.633Z","dependency_job_id":null,"html_url":"https://github.com/funcwj/deep-clustering","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2Fdeep-clustering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2Fdeep-clustering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2Fdeep-clustering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2Fdeep-clustering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/funcwj","download_url":"https://codeload.github.com/funcwj/deep-clustering/tar.gz/refs/heads/master","host":{"name":"GitHub","url":
"https://github.com","kind":"github","repositories_count":246709923,"owners_count":20821297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pytorch","speech-separation"],"created_at":"2024-08-01T07:01:07.761Z","updated_at":"2025-04-01T20:31:09.591Z","avatar_url":"https://github.com/funcwj.png","language":"Python","readme":"## Deep clustering for single-channel speech separation\n\nImplementation of \"Deep Clustering: Discriminative Embeddings for Segmentation and Separation\"\n\n### Requirements\n\nSee [requirements.txt](requirements.txt)\n\n### Usage\n\n1. Configure experiments in .yaml files, for example: `train.yaml`\n\n2. Training:\n\n    ```shell\n    python ./train_dcnet.py --config conf/train.yaml --num-epoches 20 \u003e train.log 2\u003e\u00261 \u0026\n    ```\n\n3. Inference:\n    ```\n    python ./separate.py --num-spks 2 $mdl_dir/train.yaml $mdl_dir/final.pkl egs.scp\n    ```\n\n### Experiments\n\n| Configure | Epoch |  FM   |  FF  |  MM  | FF/MM | AVG  |\n| :-------: | :---: | :---: | :--: | :--: | :---: | :--: |\n| [config-1](conf/1.config.yaml) |  25   | 11.42 | 6.85 | 7.88 | 7.36  | 9.54 |\n\n### Q \u0026 A\n\n1. What is the format of the `.scp` file?\n\n    The format of the `wav.scp` file follows the definition in the Kaldi toolkit. Each line contains a `key value` pair, where the key is a unique string that identifies an audio file and the value is the path to that file. For example:\n    ```\n    mix-utt-00001 /home/data/train/mix-utt-00001.wav\n    ...\n    mix-utt-XXXXX /home/data/train/mix-utt-XXXXX.wav\n    ```\n\n2. 
How to prepare the training dataset?\n\n    The original paper uses MATLAB scripts from [create-speaker-mixtures.zip](http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip) to simulate two- and three-speaker datasets. You can also use your own data source (e.g., Librispeech, TIMIT) to create mixtures, keeping the clean sources alongside them.\n\n\n### Reference\n\n1. Hershey J R, Chen Z, Le Roux J, et al. Deep clustering: Discriminative embeddings for segmentation and separation[C]//Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016: 31-35.\n2. Isik Y, Le Roux J, Chen Z, et al. Single-channel multi-speaker separation using deep clustering[J]. arXiv preprint arXiv:1607.02173, 2016.\n","funding_links":[],"categories":["Speech Separation (single channel)","Python"],"sub_categories":["NN-based separation"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffuncwj%2Fdeep-clustering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffuncwj%2Fdeep-clustering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffuncwj%2Fdeep-clustering/lists"}