{"id":13585013,"url":"https://github.com/tsurumeso/vocal-remover","last_synced_at":"2025-05-15T13:07:42.916Z","repository":{"id":37465793,"uuid":"198052141","full_name":"tsurumeso/vocal-remover","owner":"tsurumeso","description":"Vocal Remover using Deep Neural Networks","archived":false,"fork":false,"pushed_at":"2024-07-23T18:19:55.000Z","size":173,"stargazers_count":1658,"open_issues_count":69,"forks_count":242,"subscribers_count":38,"default_branch":"develop","last_synced_at":"2025-04-15T02:11:18.354Z","etag":null,"topics":["audio","deep-learning","pytorch","segmentation","spectrogram","vocal-separation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tsurumeso.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-21T12:14:04.000Z","updated_at":"2025-04-12T04:17:21.000Z","dependencies_parsed_at":"2022-08-08T20:30:23.508Z","dependency_job_id":"07fd1a84-2668-4671-8a70-59ab0b6c044c","html_url":"https://github.com/tsurumeso/vocal-remover","commit_stats":null,"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsurumeso%2Fvocal-remover","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsurumeso%2Fvocal-remover/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsurumeso%2Fvocal-remover/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tsurumeso%2Fvocal-remover/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tsurumeso","download_url":"https://codeload.github.com/tsurumeso/vocal-remover/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254346624,"owners_count":22055808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","deep-learning","pytorch","segmentation","spectrogram","vocal-separation"],"created_at":"2024-08-01T15:04:40.881Z","updated_at":"2025-05-15T13:07:42.854Z","avatar_url":"https://github.com/tsurumeso.png","language":"Python","funding_links":[],"categories":["Python","Stale"],"sub_categories":["ML \u0026 Training"],"readme":"# vocal-remover\n\n[![Release](https://img.shields.io/github/release/tsurumeso/vocal-remover.svg)](https://github.com/tsurumeso/vocal-remover/releases/latest)\n[![Release](https://img.shields.io/github/downloads/tsurumeso/vocal-remover/total.svg)](https://github.com/tsurumeso/vocal-remover/releases)\n\nThis is a deep-learning-based tool to extract instrumental track from your songs.\n\n## Installation\n\n### Getting vocal-remover\nDownload the latest version from [here](https://github.com/tsurumeso/vocal-remover/releases).\n\n### Install PyTorch\n**See**: [GET STARTED](https://pytorch.org/get-started/locally/)\n\n### Install the other packages\n```\ncd vocal-remover\npip install -r requirements.txt\n```\n\n## Usage\nThe following command separates the input into instrumental and vocal tracks. 
They are saved as `*_Instruments.wav` and `*_Vocals.wav`.\n\n### Run on CPU\n```\npython inference.py --input path/to/an/audio/file\n```\n\n### Run on GPU\n```\npython inference.py --input path/to/an/audio/file --gpu 0\n```\n\n### Advanced options\nThe `--tta` option performs test-time augmentation (TTA) to improve the separation quality.\n```\npython inference.py --input path/to/an/audio/file --tta --gpu 0\n```\n\nThe `--postprocess` option masks the instrumental part based on the vocal volume to improve the separation quality.\n\u003e [!WARNING]\n\u003e This is an experimental feature. If you run into problems with this option, please disable it.\n```\npython inference.py --input path/to/an/audio/file --postprocess --gpu 0\n```\n\n## Train your own model\n\n### Place your dataset\n```\npath/to/dataset/\n  +- instruments/\n  |    +- 01_foo_inst.wav\n  |    +- 02_bar_inst.mp3\n  |    +- ...\n  +- mixtures/\n       +- 01_foo_mix.wav\n       +- 02_bar_mix.mp3\n       +- ...\n```\n\n### Train a model\n```\npython train.py --dataset path/to/dataset --mixup_rate 0.5 --reduction_rate 0.5 --gpu 0\n```\n\n## References\n- [1] Jansson et al., \"Singing Voice Separation with Deep U-Net Convolutional Networks\", https://ejhumphrey.com/assets/pdf/jansson2017singing.pdf\n- [2] Takahashi et al., \"Multi-scale Multi-band DenseNets for Audio Source Separation\", https://arxiv.org/pdf/1706.09588.pdf\n- [3] Takahashi et al., \"MMDENSELSTM: AN EFFICIENT COMBINATION OF CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS FOR AUDIO SOURCE SEPARATION\", https://arxiv.org/pdf/1805.02410.pdf\n- [4] Choi et al., \"PHASE-AWARE SPEECH ENHANCEMENT WITH DEEP COMPLEX U-NET\", https://openreview.net/pdf?id=SkeRTsAcYm\n- [5] Jansson et al., \"Learned complex masks for multi-instrument source separation\", https://arxiv.org/pdf/2103.12864.pdf\n- [6] Liutkus et al., \"The 2016 Signal Separation Evaluation Campaign\", Latent Variable Analysis and Signal Separation - 12th International 
Conference\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftsurumeso%2Fvocal-remover","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftsurumeso%2Fvocal-remover","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftsurumeso%2Fvocal-remover/lists"}