{"id":13532145,"url":"https://github.com/kaituoxu/Conv-TasNet","last_synced_at":"2025-04-01T20:31:27.595Z","repository":{"id":39158141,"uuid":"162601462","full_name":"kaituoxu/Conv-TasNet","owner":"kaituoxu","description":"A PyTorch implementation of Conv-TasNet described in \"TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation\" with Permutation Invariant Training (PIT).","archived":false,"fork":false,"pushed_at":"2023-04-06T07:59:54.000Z","size":1288,"stargazers_count":679,"open_issues_count":28,"forks_count":151,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-11-02T19:33:53.058Z","etag":null,"topics":["audio-separation","conv-tasnet","permutation-invariant-training","pit","pytorch","source-separation","speech-separation","tasnet"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kaituoxu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-12-20T16:00:54.000Z","updated_at":"2024-10-30T07:03:09.000Z","dependencies_parsed_at":"2022-07-21T22:30:48.715Z","dependency_job_id":"d95b3446-4b0f-4e3e-88bc-c4713a2c7b7d","html_url":"https://github.com/kaituoxu/Conv-TasNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kaituoxu%2FConv-TasNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kaituoxu%2FConv-TasNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kaituoxu%2FConv-TasNet/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/kaituoxu%2FConv-TasNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kaituoxu","download_url":"https://codeload.github.com/kaituoxu/Conv-TasNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246709923,"owners_count":20821297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-separation","conv-tasnet","permutation-invariant-training","pit","pytorch","source-separation","speech-separation","tasnet"],"created_at":"2024-08-01T07:01:08.531Z","updated_at":"2025-04-01T20:31:22.574Z","avatar_url":"https://github.com/kaituoxu.png","language":"Python","funding_links":[],"categories":["Speech Separation (single channel)"],"sub_categories":["NN-based separation"],"readme":"# Conv-TasNet\nA PyTorch implementation of Conv-TasNet described in [\"TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation\"](https://arxiv.org/abs/1809.07454).\n\n## Results\n| From | N | L | B | H | P | X | R | Norm | Causal | batch size |SI-SNRi(dB) | SDRi(dB)|\n|:----:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:----:|:------:|:----------:|:----------:|:-------:|\n| Paper|256|20 |256|512| 3 | 8 | 4 |  gLN |   X    |     -      |    14.6    |  15.0   |\n| Here |256|20 |256|512| 3 | 8 | 4 |  gLN |   X    |     3      |    15.5    |  15.7   |\n\n## Install\n- PyTorch 0.4.1+\n- Python 3 (Anaconda recommended)\n- `pip install -r requirements.txt`\n- If you need to convert wsj0 to wav format and generate mixture files, run `cd tools; make`\n\n## Usage\nIf you already have mixture wsj0 
data:\n1. `$ cd egs/wsj0`, then change the wsj0 data path `data` at the beginning of `run.sh` to your own path.\n2. `$ bash run.sh`, that's all!\n\nIf you only have the original wsj0 data (sphere format):\n1. `$ cd egs/wsj0`, then change the three wsj0 data paths at the beginning of `run.sh` to your own paths.\n2. Convert the sphere-format wsj0 data to wav format and generate the mixtures. The `Stage 0` part provides an example.\n3. `$ bash run.sh`, that's all!\n\nYou can change a hyper-parameter with `$ bash run.sh --parameter_name parameter_value`, e.g., `$ bash run.sh --stage 3`. See the parameter names in `egs/aishell/run.sh` before `. utils/parse_options.sh`.\n### Workflow\nWorkflow of `egs/wsj0/run.sh`:\n- Stage 0: Convert sphere format to wav format and generate mixture (optional)\n- Stage 1: Generate json files including wav path and duration\n- Stage 2: Training\n- Stage 3: Evaluate separation performance\n- Stage 4: Separate speech using Conv-TasNet\n### More detail\n```bash\n# Set PATH and PYTHONPATH\n$ cd egs/wsj0/; . ./path.sh\n# Train:\n$ train.py -h\n# Evaluate performance:\n$ evaluate.py -h\n# Separate mixture audio:\n$ separate.py -h\n```\n#### How to visualize loss?\nIf you want to visualize your loss, you can use [visdom](https://github.com/facebookresearch/visdom) to do that:\n1. Open a new terminal in your remote server (recommend tmux) and run `$ visdom`\n2. Open a new terminal and run `$ bash run.sh --visdom 1 --visdom_id \"\u003cany-string\u003e\"` or `$ train.py ... --visdom 1 --visdom_id \"\u003cany-string\u003e\"`\n3. Open your browser and type `\u003cyour-remote-server-ip\u003e:8097`, e.g., `127.0.0.1:8097`\n4. 
In the visdom web page, choose `\u003cany-string\u003e` under `Environment` to see your loss\n![im](egs/wsj0/loss.png)\n#### How to resume training?\n```bash\n$ bash run.sh --continue_from \u003cmodel-path\u003e\n```\n#### How to use multi-GPU?\nPass a comma-separated sequence of GPU ids, such as:\n```bash\n$ bash run.sh --id \"0,1\"\n```\n#### How to solve out of memory?\n- If it happens during training, reduce `batch_size` or use more GPUs: `$ bash run.sh --batch_size \u003clower-value\u003e`\n- If it happens during cross validation, reduce `cv_maxlen`: `$ bash run.sh --cv_maxlen \u003clower-value\u003e`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaituoxu%2FConv-TasNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkaituoxu%2FConv-TasNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaituoxu%2FConv-TasNet/lists"}