{"id":13861552,"url":"https://github.com/RetroCirce/Zero_Shot_Audio_Source_Separation","last_synced_at":"2025-07-14T09:32:24.180Z","repository":{"id":41836071,"uuid":"437672810","full_name":"RetroCirce/Zero_Shot_Audio_Source_Separation","owner":"RetroCirce","description":"The official code repo for \"Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data\", in AAAI 2022","archived":false,"fork":false,"pushed_at":"2022-07-14T05:33:59.000Z","size":700,"stargazers_count":178,"open_issues_count":2,"forks_count":32,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-08-05T06:03:25.392Z","etag":null,"topics":["audio-source-separation","music-information-retrieval","python","query-based-learning","transformer-models","zero-shot-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2112.07891","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RetroCirce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-12T22:41:14.000Z","updated_at":"2024-07-29T12:37:59.000Z","dependencies_parsed_at":"2022-08-28T23:32:01.133Z","dependency_job_id":null,"html_url":"https://github.com/RetroCirce/Zero_Shot_Audio_Source_Separation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RetroCirce%2FZero_Shot_Audio_Source_Separation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RetroCirce%2FZero_Shot_Audio_Source_Separation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RetroCirce%2FZero_
Shot_Audio_Source_Separation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RetroCirce%2FZero_Shot_Audio_Source_Separation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RetroCirce","download_url":"https://codeload.github.com/RetroCirce/Zero_Shot_Audio_Source_Separation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225968837,"owners_count":17553146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-source-separation","music-information-retrieval","python","query-based-learning","transformer-models","zero-shot-learning"],"created_at":"2024-08-05T06:01:24.980Z","updated_at":"2024-11-22T21:30:48.453Z","avatar_url":"https://github.com/RetroCirce.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Zero Shot Audio Source Separation\n\n## Introduction\n\nThis is the code repository for \"[Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data](https://arxiv.org/abs/2112.07891)\", in AAAI 2022.\n\nIn this paper, we propose a three-component pipeline that allows you to train an audio source separator to separate *any source* from a track. All you need is a mixture audio to separate and a sample of the target source as a query; the model then separates the specified source from the track. Our model operates in a zero-shot setting because we never use a separation dataset, only the general audio dataset **AudioSet**. 
However, we achieve very competitive separation performance (SDR) on the MUSDB18 dataset compared with supervised models. Our model also generalizes to unseen sources outside the training set. Indeed, training requires no separation dataset at all, only **AudioSet**.\n\nThe demos and introduction are presented in our [short introduction video](https://youtu.be/8XQ5ZyYRLQM) and [full presentation video](https://youtu.be/RgNwB_pJ7Cw). \n\nMore demos will be presented on [my personal website](https://www.knutchen.com) (now under construction).\n\nCheck out this interactive demo at Replicate \u003ca href=\"https://replicate.com/retrocirce/zero_shot_audio_source_separation\"\u003e\u003cimg src=\"https://replicate.com/retrocirce/zero_shot_audio_source_separation/badge\"\u003e\u003c/a\u003e Thanks to @[ariel415el](https://github.com/ariel415el) for creating this!\n\n![Model Arch](fig/arch.png)\n\n\n\n## Main Separation Performance on MUSDB18 Dataset\nWe achieve very competitive separation performance (SDR) on the MUSDB18 dataset compared with supervised models, **with neither seeing the MUSDB18 training data nor specifying source targets**.\n\nAdditionally, our model can easily separate many other sources, such as violin, harmonica, guitar, etc. 
(demos are shown in the videos linked above)\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"fig/results.png\" align=\"center\" alt=\"MUSDB results\" width=\"50%\"/\u003e\n\u003c/p\u003e\n\n## Getting Started\n\n### Install Requirements\n```\npip install -r requirements.txt\n```\n\n### Download and Process Datasets\n\n* config.py\n```\nchange the variable \"dataset_path\" to your AudioSet path\nset classes_num to 527\n```\n\n* [AudioSet](https://research.google.com/audioset/download.html)\n```\n./create_index.sh\n# remember to change the paths in the script\n# more information about this script is in https://github.com/qiuqiangkong/audioset_tagging_cnn\n\npython main.py save_idc\n# count the number of samples in each class and save the npy files\n```\n\n* [MUSDB18](https://sigsep.github.io/datasets/musdb.html) - You can directly use [our processed musdb audio files](https://drive.google.com/drive/folders/1VwRnCxp3t2bXUS_MbXiFiggwkkJQEmha?usp=sharing) at a 32000 Hz sample rate. Alternatively, set \"musdb_path\" to your download path and run: \n\n```\npython main.py musdb_process\n# Note that the training set is a highlight version, while the test set is the full version\n```\n\n\n### Set the Configuration File: config.py\n\nThe script *config.py* contains all the settings you need to assign to run the code. \n\nPlease read the introductory comments in the file and adjust your settings.\n\nThe most important part:\n\nIf you want to train/test your model on AudioSet, you need to set:\n```\ndataset_path = \"your processed audioset folder\"\nbalanced_data = True\nsample_rate = 32000\nhop_size = 320 \nclasses_num = 527\n```\n\n### Training and Evaluation\n\n#### Train the sound event detection system ST-SED/HTS-AT\nWe further integrated the ST-SED system into an independent repository, evaluated it on more datasets, and improved it substantially, achieving better performance. 
\n\nYou can follow [this repo](https://github.com/RetroCirce/HTS-Audio-Transformer) to train and evaluate the sound event detection system ST-SED (more aptly named HTS-AT). The configuration file for training the model for this separation task should be [htsat_config.py](htsat_config.py).\n\nFor this separation task, if you want to save time, you can also download [the checkpoint](https://drive.google.com/drive/folders/1RouwHsGsMs8n3l_jF8XifWtbPzur_YQS?usp=sharing) directly.\n\n#### Train, Evaluate, and Run Inference with the Separation Model\n\nAll scripts are run through main.py:\n```\n# Train\nCUDA_VISIBLE_DEVICES=1,2,3,4 python main.py train\n\n# Test\nCUDA_VISIBLE_DEVICES=1,2,3,4 python main.py test\n\n```\nWe recommend using at least 4 GPUs with more than 20GB of memory each. In our training phase, we used 8 Nvidia V100 (32GB) GPUs. \n\nWe provide a quick **inference** interface:\n```\nCUDA_VISIBLE_DEVICES=1 python main.py inference\n```\nThis lets you separate any given source from the track. You need to set the values of \"inference_file\" and \"inference_query\" in *config.py*; check the comments there to get started. For inference, we recommend using only one GPU, which is sufficient.\n\n\n#### Model Checkpoints:\n\nWe provide the model checkpoints at this [link](https://drive.google.com/drive/folders/1RouwHsGsMs8n3l_jF8XifWtbPzur_YQS?usp=sharing). 
Feel free to download and test them.\n\n## Citing\n```\n@inproceedings{zsasp-ke2022,\n  author = {Ke Chen* and Xingjian Du* and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},\n  title = {Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data},\n  booktitle = {{AAAI} 2022}\n}\n\n@inproceedings{htsat-ke2022,\n  author = {Ke Chen and Xingjian Du and Bilei Zhu and Zejun Ma and Taylor Berg-Kirkpatrick and Shlomo Dubnov},\n  title = {HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection},\n  booktitle = {{ICASSP} 2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRetroCirce%2FZero_Shot_Audio_Source_Separation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRetroCirce%2FZero_Shot_Audio_Source_Separation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRetroCirce%2FZero_Shot_Audio_Source_Separation/lists"}