{"id":13532086,"url":"https://github.com/funcwj/uPIT-for-speech-separation","last_synced_at":"2025-04-01T20:31:20.232Z","repository":{"id":93560788,"uuid":"138253308","full_name":"funcwj/uPIT-for-speech-separation","owner":"funcwj","description":"Speech separation with utterance-level PIT experiments","archived":false,"fork":false,"pushed_at":"2018-07-12T09:29:59.000Z","size":39,"stargazers_count":101,"open_issues_count":0,"forks_count":39,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-11-02T19:33:49.082Z","etag":null,"topics":["pit","pytorch","speech-separation"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/funcwj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-06-22T04:08:07.000Z","updated_at":"2024-10-21T05:47:08.000Z","dependencies_parsed_at":"2023-06-11T00:00:45.403Z","dependency_job_id":null,"html_url":"https://github.com/funcwj/uPIT-for-speech-separation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2FuPIT-for-speech-separation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2FuPIT-for-speech-separation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2FuPIT-for-speech-separation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/funcwj%2FuPIT-for-speech-separation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/funcw
j","download_url":"https://codeload.github.com/funcwj/uPIT-for-speech-separation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246709923,"owners_count":20821297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pit","pytorch","speech-separation"],"created_at":"2024-08-01T07:01:08.101Z","updated_at":"2025-04-01T20:31:15.218Z","avatar_url":"https://github.com/funcwj.png","language":"Python","funding_links":[],"categories":["Speech Separation (single channel)"],"sub_categories":["NN-based separation"],"readme":"## Speech Separation with uPIT\n\nSpeech separation with utterance-level PIT (Permutation Invariant Training).\n\n### Requirements\n\nSee [requirements.txt](requirements.txt).\n\n### Usage\n\n1. Generate the dataset using [create-speaker-mixtures.zip](http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip)\n\n2. Prepare the CMVN statistics and .scp files, then configure the experiments in the .yaml files\n\n3. Training:\n    ```shell\n    ./run_pit.py --config $conf --num-epoches 100 \u003e $checkpoint/train.log 2\u003e\u00261 \u0026\n    ```\n\n4. 
Inference:\n    ```\n    ./separate.py --dump-dir cache $mdl_dir/train.yaml $mdl_dir/epoch.40.pkl egs.scp\n    ```\n\n### Experiments\n\n| Configure | Mask | Epoch |  FM   |  FF  |  MM  | FF/MM | AVG  |\n| :-------: | :--: | :---: | :---: | :--: | :--: | :---: | :--: |\n| [config-1](conf/1.config.yaml) |  AM-ReLU    |  75   | 10.41 |  6.73 |  7.35 | 7.19  | 8.82  |\n| [config-2](conf/2.config.yaml) |  AM-sigmoid |  50   | 9.95  |  5.99 |  6.72 | 6.35  | 8.26  |\n| [config-3](conf/3.config.yaml) |  PSM-ReLU   |  73   | 10.29 |  6.54 |  7.28 | 7.09  | 8.71  |\n| [config-4](conf/4.config.yaml) |  PSM-ReLU   |  80   | 10.37 |  6.59 |  7.29 | 7.10  | 8.76  |\n| [config-5](conf/5.config.yaml) |  PSM-ReLU   |  62   | 10.58 |  7.00 |  7.55 | 7.40  | 9.01  |\n| [config-6](conf/6.config.yaml) |  PSM-ReLU   |  62   | 10.47 |  7.44 |  7.78 | 7.69  | 9.10  |\n| [config-7](conf/7.config.yaml) |  PSM-ReLU   |  61   | 10.43 |  7.17 |  7.41 | 7.34  | 8.91  |\n|             -                  |  IAM-oracle |   -   | 12.49 | 12.73 | 11.58 | 11.88 | 12.19 |\n|             -                  |  IBM-oracle |   -   | 12.94 | 13.20 | 12.04 | 12.35 | 12.65 |\n|             -                  |  IRM-oracle |   -   | 12.86 | 13.14 | 11.96 | 12.27 | 12.57 |\n|             -                  |  PSM-oracle |   -   | 15.79 | 16.03 | 14.90 | 15.20 | 15.50 |\n\n\n### Reference\n\n* Kolbæk M, Yu D, Tan Z H, et al. Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2017, 25(10): 1901-1913.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffuncwj%2FuPIT-for-speech-separation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffuncwj%2FuPIT-for-speech-separation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffuncwj%2FuPIT-for-speech-separation/lists"}