{"id":25215618,"url":"https://github.com/deezer/musicfpaugment","last_synced_at":"2025-10-25T14:31:25.064Z","repository":{"id":203925648,"uuid":"710693328","full_name":"deezer/musicFPaugment","owner":"deezer","description":"Code for reproducting the paper Music Augmentation and Denoising For Peak-Based Audio Fingerprinting","archived":false,"fork":false,"pushed_at":"2023-10-31T13:40:19.000Z","size":1406,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-04-16T11:27:17.930Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-27T08:34:35.000Z","updated_at":"2024-02-11T20:35:39.000Z","dependencies_parsed_at":"2023-10-29T11:27:37.123Z","dependency_job_id":null,"html_url":"https://github.com/deezer/musicFPaugment","commit_stats":null,"previous_names":["deezer/musicfpaugment"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2FmusicFPaugment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2FmusicFPaugment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2FmusicFPaugment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2FmusicFPaugment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deezer","download_url":"https://codeload.github.com/deezer/musicFPaugment/tar
.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238161490,"owners_count":19426669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-10T18:15:09.494Z","updated_at":"2025-10-25T14:31:19.499Z","avatar_url":"https://github.com/deezer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Music Augmentation and Denoising For Peak-Based Audio Fingerprinting\n\nThis repository provides the code to reproduce the paper:  \n\n```\n@article{akesbi2023music,\n  title={Music Augmentation and Denoising For Peak-Based Audio Fingerprinting},\n  author={Akesbi, Kamil and Desblancs, Dorian and Martin, Benjamin},\n  journal={arXiv preprint arXiv:2310.13388},\n  year={2023}\n}\n```\nwhich can be found [here](https://arxiv.org/abs/2310.13388).\n\nThis work is going to be presented at the Late-Breaking Demo Session of ISMIR 2023.\n\n## Setup\n\n### Datasets\n\nTo use the music augmentation pipeline, you will need to download the following datasets: \n\n### Room Impulse Responses\n\nWe use the MIT impulse response (IR) survey dataset. You can download the dataset [here](https://mcdermottlab.mit.edu/Reverb/IR_Survey.html).\n\n### Background Noise\n\nWe build our own background noise dataset by mixing samples from past [acoustic scene datasets from the DCASE challenge](https://dcase-repo.github.io/dcase_datalist/datasets_scenes.html). We select: \n\n- The TUT Acoustic Scenes 2017. 
Development dataset [here](https://zenodo.org/records/400515), and evaluation dataset [here](https://zenodo.org/records/1040168). \n- The TUT Urban Acoustic Scenes 2018 Mobile. Development dataset [here](https://zenodo.org/records/1228235), and evaluation dataset [here](https://zenodo.org/records/1293901). \n- The TAU Urban Acoustic Scenes 2020 Mobile. Development dataset [here](https://zenodo.org/records/3819968), and evaluation dataset [here](https://zenodo.org/records/3685828).\n\n### Music Dataset\n\nWe use the MTG-Jamendo dataset to train different music denoising models. Download the dataset [here](https://mtg.github.io/mtg-jamendo-dataset/). \n\n### Audio Fingerprinting Dataset\n\nWe use the Free Music Archive (FMA) Large dataset as our reference database to evaluate the robustness of audio fingerprinting systems to noisy query snippets.\nDownload the dataset [here](https://github.com/mdeff/fma). \n\n## Running some Code \n\nYou can specify the paths to the dataset folders in `docker/install/.env`.\n\nFrom there, you can build and launch your Docker environment for experiments using the following commands:\n```\ndocker-compose -f docker/install/docker-compose.yaml build\ndocker-compose -f docker/install/docker-compose.yaml up -d\ndocker-compose -f docker/install/docker-compose.yaml run python /bin/bash\n```\nYour workspace will then have the following structure:\n```\nworkspace/ \n    src/\n    noise_databases/\n        mit_ir_survey/\n        dcase/\n            tut_2017_development/\n            tut_2017_evaluation/\n            tut_2018_development_mobile/\n            tut_2018_evaluation_mobile/\n            tut_2020_development_mobile/\n            tut_2020_evaluation_mobile/\n    fma/\n    mtg-jamendo-dataset/\n    queries/\n```\nYou can then install the required dependencies using [poetry](https://python-poetry.org/):\n```\ncd src/\npoetry install \npoetry shell\npoetry run python ...\n```\n\n## Music Augmentation Pipeline: 
\n\n![pipeline](images/SourcesOfNoise.png)\n\nThe augmentation pipeline is composed of several transformations applied to an audio input. It is designed to reproduce degradations caused by room responses, background noise, recording devices and loudspeakers. \n\n### Augmented Audio Generation\n\nTo generate an augmented music recording from a clean music snippet, you can use the following script: \n``` python\nimport random\n\nimport torchaudio\n\nfrom augmentation import AugmentFP\nfrom training.parameters import WAVEFORM_SAMPLING_RATE, DURATION\n\n# Placeholder: list of paths to the noise data, as expected by AugmentFP.\nnoise_paths = [\"path_to_noise_dataset\"]\naf = AugmentFP(noise_paths, WAVEFORM_SAMPLING_RATE)\n\n# Load the audio, downmix to mono and resample to the pipeline's sampling rate.\nwaveform, sr = torchaudio.load(\"path_to_audio\")\nwaveform = waveform.mean(axis=0)\nwaveform = torchaudio.transforms.Resample(sr, WAVEFORM_SAMPLING_RATE)(waveform)\n\n# Pick a random segment of DURATION seconds.\nnb_samples_segment = int(WAVEFORM_SAMPLING_RATE * DURATION)\nstart = random.randint(0, waveform.shape[0] - nb_samples_segment)\nwaveform = waveform[start : start + nb_samples_segment].unsqueeze(0)\n\naug = af(waveform)\n```\n\n### Streamlit integration: \n\nThe pipeline and its different parameters can also be tested in a user-friendly interface using Streamlit. To launch the Streamlit interface, run:\n```\nstreamlit run streamlit_app/app.py --server.port=8501 --server.address=0.0.0.0 --server.fileWatcherType=None\n```\nwhich opens an interface that looks like this:\n\n![augm](images/StreamlitApp.png)\n\n## Experiments\n\n![schema](images/general_schema.png)\n\n### Music Denoising\n\nTo train the models (UNet on magnitude spectrograms or Demucs on raw audio waveforms): \n\n```\npython -m training.train --model=unet\n```\nTo visualize the TensorBoard logs: \n```\ntensorboard --logdir=monitoring/ --port=6006\n```\nThe weights of two pretrained models can be found in the following Google Drive: https://drive.google.com/file/d/1wAV5EP3oh-V-Q3k-Qf6BEdJjQjZATZJ4/view?usp=sharing. \n\nIn particular, we provide the pretrained weights of: \n\n- A UNet trained to denoise magnitude spectrograms of 8 kHz audio signals. 
\n- A Demucs trained to denoise 8 kHz raw audio waveforms.  \n\nUse the models to generate audio and spectrograms:\n```\npython -m training.generate_audios --model=unet\n```\n\n## Audio Fingerprinting\n\nWe evaluate the robustness of AFP systems to the distortions our augmentation pipeline generates. We use two popular open-source systems: [Audfprint](https://github.com/dpwe/audfprint) and [Dejavu](https://github.com/worldveil/dejavu).\n\nWe use the FMA Large dataset as our reference database. To preprocess it, use: \n```\npython testing/fma_preprocessing.py\n```\nWe can then generate 10,000 eight-second audio queries using: \n```\npython -m testing.generate_queries --queries=cleans\npython -m testing.generate_queries --queries=augmented\n```\n\n### Audfprint:\n\nTo index FMA Large with Audfprint, use: \n```\npython -m testing.audfpring_exps --action=index\n```\nTo obtain results with Audfprint (specify the demucs or unet model):\n```\npython -m testing.audfpring_exps --action=identification_rate --model=unet\npython -m testing.audfpring_exps --action=peaks_metrics --model=unet\n```\n\n### Dejavu: \n\nTo index FMA Large with Dejavu, use: \n```\npython -m testing.dejavu_exps --action=index\n```\nTo obtain results with Dejavu (specify the demucs or unet model):\n```\npython -m testing.dejavu_exps --action=identification_rate --model=unet\npython -m testing.dejavu_exps --action=peaks_metrics --model=unet \n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fmusicfpaugment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeezer%2Fmusicfpaugment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fmusicfpaugment/lists"}