{"id":13861584,"url":"https://github.com/deezer/cover_song_detection","last_synced_at":"2025-10-27T09:12:57.235Z","repository":{"id":37070344,"uuid":"137506647","full_name":"deezer/cover_song_detection","owner":"deezer","description":"Tools to run experiments around large scale cover detection.","archived":false,"fork":false,"pushed_at":"2022-09-30T18:31:03.000Z","size":1088,"stargazers_count":27,"open_issues_count":3,"forks_count":5,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-01T14:01:41.080Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-15T15:55:51.000Z","updated_at":"2024-04-28T15:55:23.000Z","dependencies_parsed_at":"2023-01-17T13:30:36.927Z","dependency_job_id":null,"html_url":"https://github.com/deezer/cover_song_detection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/deezer/cover_song_detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fcover_song_detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fcover_song_detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fcover_song_detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fcover_song_detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deezer","download_url":"https://codeload.github.com/deezer/cover_song_detec
tion/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fcover_song_detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280968764,"owners_count":26422274,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-25T02:00:06.499Z","response_time":81,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T06:01:25.615Z","updated_at":"2025-10-27T09:12:57.193Z","avatar_url":"https://github.com/deezer.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Large Scale Cover Detection in Digital Music Libraries using Metadata, Lyrics and Audio Features\n\n\nSource code and supplementary materials for the paper \"Correya, Albin, Romain Hennequin, and Mickaël Arcos. 
\"Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features.\" arXiv preprint arXiv:1808.10351 (2018)\".\n\nThis repo contains scripts to run text-based experiments for the cover song detection task on the [MillionSongDataset (MSD)](https://labrosa.ee.columbia.edu/millionsong/),\nwhich is imported into an [Elasticsearch (ES)](https://www.elastic.co/blog/what-is-an-elasticsearch-index) index as described in the above-mentioned paper.\n\n# Requirements\n\nInstall the Python dependencies from the requirements.txt file:\n\n```\n$ pip install -r requirements.txt\n```\n\n# Setup\n\n* Use the [ElasticMSD](https://github.com/deezer/elasticmsd) scripts to set up your local Elasticsearch index of MSD.\n* Set your ES credentials (host, port and index) as environment variables on your local system.\nSee the [templates.py](templates.py) file.\n\n## Datasets\n\nThe following datasets have mappings to MSD tracks. Their data are ingested into the ES index in an update operation.\n\n* [Second Hand Songs (SHS)](https://labrosa.ee.columbia.edu/millionsong/secondhand) dataset. 
See the ./data folder.\n* For lyrics, we used the [musiXmatch (MXM)](https://labrosa.ee.columbia.edu/millionsong/musixmatch) dataset.\n\n# Usage\n\n## Modular mode\n\nThis section gives a glimpse of how to use the classes and their methods to run experiments.\n\n```python\n# Import modules\nfrom es_search import SearchModule\nfrom experiments import Experiments\nimport templates as presets\n\n# Initiate the ES search class\nes = SearchModule(presets.uri_config)\n\n# Search by exact MSD track title in view mode\nresults = es.search_by_exact_title('Listen To My Babe', 'TRPIIKF128F1459A09', mode='view')\n\n# You can also use the Experiments class to automate particular experiments for a method.\n# Initiate it with the SearchModule instance and the path to the dataset as arguments.\nexp = Experiments(es, './data/test_shs.csv')\n\n# Run the song title match experiment with the top 100 results\nresults = exp.run_song_title_match_task(size=100)\n\n# Compute evaluation metrics for the task\nmean_avg_precision = exp.mean_average_precision(results)\n\n# Reset the preset if you want to run another experiment on the same SearchModule instance\nexp.reset_preset()\n\nresults = exp.run_mxm_lyrics_search_task(size=1000)\n\nmean_avg_precision = exp.mean_average_precision(results)\n\n```\n\n## Evaluation tasks\n\nSome examples of using the functions in the evaluations.py script to reproduce the results reported in the paper:\n\n```python\nfrom evaluations import *\n\n# Evaluation task on the SHS train set against the whole MSD (1 x 999,999 songs)\nshs_train_set_evals(size=100, method=\"msd_title\", mode=\"msd\", with_duplicates=True)\n\n# You can specify various prune sizes and methods as parameters\nshs_train_set_evals(size=1000, method=\"mxm_lyrics\", mode=\"msd\", with_duplicates=False)\n\n# You can run the same experiment on the SHS train set against itself by setting the \"mode\" param to \"shs\" (1 x 12,960)\nshs_train_set_evals(size=100, method=\"msd_title\", 
mode=\"shs\", with_duplicates=True)\n\n# In the same way, you can run the evaluation experiments on the SHS test set\nshs_test_set_evals(size=100, method=\"title_mxm_lyrics\", with_duplicates=True)\n\n```\n\n\nIf you don't care how the modules work and only need the results of various experiments, then this is for you.\nIt is a wrapper around the modules that runs automated experiments and saves the results to a .log file or a json_template.\nThe experiments are multi-threaded and can be run from a terminal using command-line arguments.\n\n```bash\n$ python evaluations.py -m test -t -1 -e msd -d 0 -s 100\n\n    -m : (type: string) Choose between \"train\" or \"test\" modes\n    -t : (type: int) Number of threads\n    -e : (type: int) Choose between \"msd\"\n    -d : (type: boolean) Include duplicates\n    -s : (type: int) Required pruning size for the experiments\n\n```\n\n# Cite\n\nIf you use this work, please cite our paper.\n\n```\nCorreya, Albin, Romain Hennequin, and Mickaël Arcos. \"Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features.\" arXiv preprint arXiv:1808.10351 (2018).\n```\n\nBibTeX format\n```\n@article{correya2018large,\n  title={Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features},\n  author={Correya, Albin and Hennequin, Romain and Arcos, Micka{\\\"e}l},\n  journal={arXiv preprint arXiv:1808.10351},\n  year={2018}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fcover_song_detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeezer%2Fcover_song_detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fcover_song_detection/lists"}