{"id":13532276,"url":"https://github.com/AppleHolic/source_separation","last_synced_at":"2025-04-01T20:31:47.443Z","repository":{"id":56214227,"uuid":"198524244","full_name":"AppleHolic/source_separation","owner":"AppleHolic","description":"Deep learning based speech source separation using Pytorch","archived":false,"fork":false,"pushed_at":"2020-11-20T05:25:34.000Z","size":4313,"stargazers_count":312,"open_issues_count":0,"forks_count":45,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-02T19:34:06.381Z","etag":null,"topics":["audio","deep-learning","pytorch","source-separation","speech","speech-separation"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AppleHolic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-23T23:44:42.000Z","updated_at":"2024-10-19T20:26:16.000Z","dependencies_parsed_at":"2022-08-15T14:50:22.928Z","dependency_job_id":null,"html_url":"https://github.com/AppleHolic/source_separation","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AppleHolic%2Fsource_separation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AppleHolic%2Fsource_separation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AppleHolic%2Fsource_separation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AppleHolic%2Fsource_separation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Appl
eHolic","download_url":"https://codeload.github.com/AppleHolic/source_separation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246709923,"owners_count":20821297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","deep-learning","pytorch","source-separation","speech","speech-separation"],"created_at":"2024-08-01T07:01:09.674Z","updated_at":"2025-04-01T20:31:42.431Z","avatar_url":"https://github.com/AppleHolic.png","language":"Jupyter Notebook","readme":"# Source Separation\n\n[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/) [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FAppleholic%2Fsource_separation)](https://hits.seeyoufarm.com)\n[![Synthesis Example On Colab Notebook](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Appleholic/source_separation/blob/master/assets/Source_Separation_first_notebook.ipynb)\n---\n\n\n## Introduction\n\n*Source Separation* is a repository for extracting speech from various recorded sounds.\nIt focuses on adapting more realistic datasets for training models.\n\n### Main components and distinguishing features\n\nThe latest model in this repository is built on spectrogram-based models.\nMainly, [Phase-aware Speech Enhancement with Deep Complex U-Net](https://arxiv.org/abs/1903.03107) is implemented with modifications:\n- Complex Convolution, Masking, Weighted SDR Loss\n\n\nTo make inference more stable in 
real cases, the following are adopted.\n\n- Audioset data is used to augment noises.\n\nThe dataset source is available at [audioset_augmentor](https://github.com/AppleHolic/audioset_augmentor).\nSee this [link](https://research.google.com/audioset/download.html) for an explanation of AudioSet.\nThis repo uses the *balanced train label dataset* (label-balanced, non-human classes, 18055 samples).\n\n- Pre-emphasis is used to remove high-frequency noise when adapting to real samples.\n\nThis is not an official implementation by the authors of the paper.\n\n\n#### Singing Voice Separation\n\nSinging Voice Separation with the [DSD100](https://sigsep.github.io/datasets/dsd100.html) dataset!\nThis model is trained with a larger architecture and a higher sample rate (44.1 kHz), so it produces more stable, higher-quality audio.\nCheck out this [Youtube Playlist](https://www.youtube.com/playlist?list=PLQ4ukFz6Ieir5bZYOns08_2gMjt4hYP4I) with samples of my favorites!\n\n#### Recent Updates\n\n- Added a synthesis Colab notebook example. You can check it out via the badge above or [here](https://colab.research.google.com/github/Appleholic/source_separation/blob/master/assets/Source_Separation_first_notebook.ipynb).\n\n\n### Dataset\n\nYou can use the pre-defined preprocessing and dataset sources at https://github.com/Appleholic/pytorch_sound\n\n\n## Environment\n\n- Python \u003e 3.6\n- pytorch 1.0\n- ubuntu 16.04\n- Brain Cloud (Kakaobrain Cluster) V2.XLARGE (2 V100 GPUs, 28-core CPU, 244 GB memory)\n\n\n## External Repositories\n\nThis repository depends on three external repositories.\n*These will be updated to set up via recursive clone or internal code.*\n\n- pytorch_sound package\n\nThis project is built using [pytorch_sound](https://github.com/AppleHolic/pytorch_sound).\n*pytorch_sound* is a modeling toolkit that allows engineers to train custom models for sound-related tasks.\nMany sources in this repository are based on the pytorch_sound template.\n\n- audioset_augmentor\n\nExplained in the section above. 
[link](https://github.com/AppleHolic/audioset_augmentor)\n\n- pypesq : [git+https://github.com/ludlows/python-pesq](git+https://github.com/ludlows/python-pesq)\n\nFor evaluation, a PESQ Python wrapper repository is added.\n\n\n\n## Pretrained Checkpoint\n\n- General Voice Source Separation\n  - Model Name : refine_unet_base (see settings.py)\n  - Link : [Google Drive](https://drive.google.com/open?id=1JRK-0RVV2o7cyRdvFuwe5iw84ESvfcyR)\n\n- Singing Voice Separation\n  - Model Name : refine_unet_larger\n  - Link : [Google Drive](https://drive.google.com/open?id=1ywgFZ7ms7CmiCCv2MikrKx9g-2j9kd-I)\n\n- Current Tag : v0.1.1\n\n## Predicted Samples\n\n- *General Voice Source Separation*\n  - 10 random validation samples\n    - Link : [Google Drive](https://drive.google.com/open?id=1CafFnqWn_QvVPu2feNLn6pnjRYIa_rbP)\n\n  - Test Samples :\n    - Link : [Google Drive](https://drive.google.com/open?id=19Sn6pe5-BtWXYa6OiLbYGH7iCU-mzB8j)\n\n- *Singing Voice Separation*\n  - Check out my youtube playlist!\n    - Link : [Youtube Playlist](https://www.youtube.com/playlist?list=PLQ4ukFz6Ieir5bZYOns08_2gMjt4hYP4I)\n\n\n## Installation\n\n- Install the external repos above\n\n\u003e You should first read the README.md of audioset_augmentor and pytorch_sound to prepare the dataset and train separation models.\n\n```bash\n$ pip install git+https://github.com/Appleholic/audioset_augmentor\n$ pip install git+https://github.com/Appleholic/pytorch_sound@v0.0.3\n$ pip install git+https://github.com/ludlows/python-pesq  # for evaluation code\n```\n\n- Install this package\n\n```bash\n$ pip install -e .\n```\n\n## Usage\n\n- Train\n\n```bash\n$ python source_separation/train.py [YOUR_META_DIR] [SAVE_DIR] [MODEL NAME, see settings.py] [SAVE_PREFIX] [[OTHER OPTIONS...]]\n```\n\n- Joint Train (Voice Bank and DSD100)\n\n```bash\n$ python source_separation/train_jointly.py [YOUR_VOICE_BANK_META_DIR] [YOUR_DSD100_META_DIR] [SAVE_DIR] [MODEL NAME, see settings.py] [SAVE_PREFIX] [[OTHER OPTIONS...]]\n```\n\n\n- 
Synthesize\n  - Be careful about the difference in sample rate between the general case and the singing voice case!\n  - Running inference more than once can help produce better results.\n    - The samples (voice bank, dsd) were run twice.\n\nSingle sample\n\n```bash\n$ python source_separation/synthesize.py separate [INPUT_PATH] [OUTPUT_PATH] [MODEL NAME] [PRETRAINED_PATH] [[OTHER OPTIONS...]]\n```\n\n\nWhole validation samples (with evaluation)\n\n```bash\n$ python source_separation/synthesize.py validate [YOUR_META_DIR] [MODEL NAME] [PRETRAINED_PATH] [[OTHER OPTIONS...]]\n```\n\n\nAll samples in a given directory\n\n```bash\n$ python source_separation/synthesize.py test-dir [INPUT_DIR] [OUTPUT_DIR] [MODEL NAME] [PRETRAINED_PATH] [[OTHER OPTIONS...]]\n```\n\n\n## Experiments\n\n### Reproduce experiments\n\n- General Voice Separation\n  - single train code\n  - The pretrained checkpoint is trained with the default options.\n  - The options above will be changed with curriculum learning and other experiments.\n\n- Singing Voice Separation\n  - joint train code\n  - The pretrained checkpoint is trained on 4 GPUs with a doubled (256) batch size.\n\n### Parameters and settings\n\nModels are tuned to achieve a good validation WSDR loss.\n- refine_unet_base : 75M parameters\n- refine_unet_larger : 95M parameters\n\n\n### Evaluation Scores (on validation dataset)\n\nThe *PESQ score* is evaluated on the whole validation dataset, while the WSDR loss is the best loss on a small subset tracked during training.\n\u003e Results may vary slightly depending on the meta file and random state.\n\n\u003e - The validation results tend to be different from the test results.\n\u003e - The original sample rate is 22050 Hz, but PESQ requires 16 kHz. 
So audio is resampled before calculating PESQ.\n\n- General (voice bank), 200k steps\n\n|training type|score name| value |\n|:------------:|:--------:|:-----:|\n|without audioset|PESQ|2.346|\n|without audioset|wsdr loss|-0.9389|\n|with audioset|PESQ|2.375|\n|with audioset|wsdr loss|-0.9078|\n\n- Singing Voice Separation, 200k steps, WSDR Loss\n  - An error occurred when calculating PESQ in this case.\n\n|training type| value |\n|:------------:|:--------:|\n|dsd only|-0.9593|\n|joint with voice bank|-0.9325|\n\n### Loss curves (Voice Bank)\n\n#### Train\n\n![Train WSDR loss curve](./assets/train_curve_wsdr.png)\n\n#### Valid\n\n![Valid WSDR loss curve](./assets/valid_curve_wsdr.png)\n\n\n## License\n\nThis repository is developed by [ILJI CHOI](https://github.com/Appleholic). It is distributed under the Apache License 2.0.\n","funding_links":[],"categories":["Speech Separation (single channel)"],"sub_categories":["NN-based separation"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAppleHolic%2Fsource_separation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAppleHolic%2Fsource_separation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAppleHolic%2Fsource_separation/lists"}