https://github.com/deezer/cover_song_detection

Tools to run experiments around large scale cover detection.

Host: GitHub
URL: https://github.com/deezer/cover_song_detection
Owner: deezer
Created: 2018-06-15T15:55:51.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2022-09-30T18:31:03.000Z (over 3 years ago)
Last Synced: 2025-04-01T14:01:41.080Z (about 1 year ago)
Language: Python
Size: 1.04 MB
Stars: 27
Watchers: 10
Forks: 5
Open Issues: 3
Metadata Files:
- Readme: README.md

# Large Scale Cover Detection in Digital Music Libraries using Metadata, Lyrics and Audio Features

Source code and supplementary materials for the paper: Correya, Albin, Romain Hennequin, and Mickaël Arcos. "Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features." arXiv preprint arXiv:1808.10351 (2018).

This repo contains scripts to run text-based experiments for the cover song detection task on the [MillionSongDataset (MSD)](https://labrosa.ee.columbia.edu/millionsong/), which is imported into an [Elasticsearch (ES)](https://www.elastic.co/blog/what-is-an-elasticsearch-index) index as described in the above-mentioned paper.

# Requirements

Install the Python dependencies from the requirements.txt file:

```
$ pip install -r requirements.txt
```

# Setup

* Use the [ElasticMSD](https://github.com/deezer/elasticmsd) scripts to set up your local Elasticsearch index of MSD.

* Set your ES database credentials (host, port and index) as environment variables on your local system. See the [templates.py](templates.py) file.
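
As a rough illustration of the second step, the sketch below reads the three settings from environment variables. The variable names `ES_HOST`, `ES_PORT` and `ES_INDEX`, the default values and the helper `es_config_from_env` are assumptions made for this example; the names actually expected by the project are the ones defined in templates.py.

```python
import os


def es_config_from_env(environ=os.environ):
    """Collect Elasticsearch connection settings from environment variables.

    ES_HOST, ES_PORT and ES_INDEX are hypothetical names used only in this
    sketch; check templates.py for the names the project really reads.
    """
    return {
        "host": environ.get("ES_HOST", "localhost"),
        # Environment values are always strings, so the port is cast to int
        "port": int(environ.get("ES_PORT", "9200")),
        "index": environ.get("ES_INDEX", "msd"),
    }


# Example with an explicit mapping instead of the real process environment
config = es_config_from_env(
    {"ES_HOST": "127.0.0.1", "ES_PORT": "9200", "ES_INDEX": "msd"}
)
print(config)
```

Passing the mapping as an argument keeps the helper easy to test without touching the real environment.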

## Datasets

The following datasets provide mappings to MSD tracks. Their data are ingested into the ES index through an update operation.

* [Second Hand Songs (SHS)](https://labrosa.ee.columbia.edu/millionsong/secondhand) dataset. Check the ./data folder

* For lyrics we used the [musiXmatch (MXM)](https://labrosa.ee.columbia.edu/millionsong/musixmatch) dataset
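
The update operation mentioned above can be pictured with the Elasticsearch bulk API, where each partial update is an action line followed by a `doc` payload line. This is only a sketch of the request body: the helper `bulk_update_lines`, the index name, the placeholder track ID and the field name `shs_work_id` are all hypothetical, and it assumes each document's `_id` is the MSD track ID. The repository's own ingestion scripts may work differently.

```python
import json


def bulk_update_lines(index, rows):
    """Yield NDJSON lines for an Elasticsearch bulk request.

    Each (track_id, fields) pair becomes two lines: an "update" action
    naming the target document, then a partial document to merge into it.
    """
    for track_id, fields in rows:
        yield json.dumps({"update": {"_index": index, "_id": track_id}})
        yield json.dumps({"doc": fields})


# Hypothetical row: a placeholder track ID and an illustrative field name
rows = [("TRACK_ID_1", {"shs_work_id": 42})]

# The bulk API expects newline-delimited JSON terminated by a final newline
body = "\n".join(bulk_update_lines("msd", rows)) + "\n"
print(body)
```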

# Usage

## Modular mode

This section gives a brief overview of how to use the classes and their methods to run experiments.

```python
# Import modules
from es_search import SearchModule
from experiments import Experiments
import templates as presets

# Initiate the ES search class
es = SearchModule(presets.uri_config)

# Search by MSD track title in view mode
results = es.search_by_exact_title('Listen To My Babe', 'TRPIIKF128F1459A09', mode='view')

# You can also use the Experiments class to automate particular experiments for a method.
# Initiate it with the SearchModule instance and the path to the dataset as arguments.
exp = Experiments(es, './data/test_shs.csv')

# Run the song title match experiment with the top 100 results
results = exp.run_song_title_match_task(size=100)

# Compute evaluation metrics for the task
mean_avg_precision = exp.mean_average_precision(results)

# Reset the preset if you want to run another experiment on the same SearchModule instance
exp.reset_preset()
results = exp.run_mxm_lyrics_search_task(size=1000)
mean_avg_precision = exp.mean_average_precision(results)
```
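
For readers unfamiliar with the metric reported above, here is a minimal, standalone sketch of mean average precision over ranked result lists. It is not the repository's `mean_average_precision` method, whose inputs and normalisation may differ; the function names, the input layout and the toy IDs below are assumptions for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant IDs.

    Assumes ranked_ids contains no duplicates. Normalises by the total
    number of relevant items, so covers that are never retrieved lower
    the score.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, track_id in enumerate(ranked_ids, start=1):
        if track_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """Mean of the average precision over (ranked_ids, relevant_ids) pairs."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up IDs: two queries and their known covers
queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # hits at ranks 1 and 3
    (["x", "y"], {"y"}),                 # hit at rank 2
]
print(mean_average_precision(queries))
```

In the first query the hits at ranks 1 and 3 give precisions of 1 and 2/3, so its average precision is 5/6; the second query scores 1/2, and the mean over both is 2/3.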

## Evaluation tasks

The following examples use functions from the evaluations.py script to reproduce the results reported in the paper.

```python
from evaluations import *

# Evaluation task on the SHS train set against the whole MSD (1 x 999,999 songs)
shs_train_set_evals(size=100, method="msd_title", mode="msd", with_duplicates=True)

# You can specify various prune sizes and methods as parameters
shs_train_set_evals(size=1000, method="mxm_lyrics", mode="msd", with_duplicates=False)

# You can run the same experiment on the SHS train set against itself
# by setting the "mode" param to "shs" (1 x 12,960)
shs_train_set_evals(size=100, method="msd_title", mode="shs", with_duplicates=True)

# In the same way, you can run the evaluation experiments on the SHS test set
shs_test_set_evals(size=100, method="title_mxm_lyrics", with_duplicates=True)
```

If you only need the results of the various experiments and do not care how the modules work, you can run evaluations.py directly. It is a wrapper around the modules that runs automated experiments and saves the results to a .log file or a json_template. The experiments are multi-threaded and can be run from the terminal using command-line arguments.

```bash
$ python evaluations.py -m test -t -1 -e msd -d 0 -s 100

    -m : (type: string) Choose between "train" or "test" modes
    -t : (type: int) Number of threads
    -e : (type: string) Experiment mode, e.g. "msd"
    -d : (type: boolean) Include duplicates
    -s : (type: int) Required pruning size for the experiments
```
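
To make the flag types concrete, the sketch below parses the same command line with `argparse`. It is an illustration only and not the parser shipped in evaluations.py: the destination names, defaults and help strings are assumptions, `-e` is read as a string because the example passes `msd`, and `-d` is read as 0 or 1 because the example passes `0`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed above."""
    parser = argparse.ArgumentParser(description="Run cover detection evaluations")
    parser.add_argument("-m", dest="mode", choices=["train", "test"], default="test",
                        help='dataset split: "train" or "test"')
    parser.add_argument("-t", dest="threads", type=int, default=-1,
                        help="number of threads")
    parser.add_argument("-e", dest="experiment", type=str, default="msd",
                        help='experiment mode, e.g. "msd"')
    parser.add_argument("-d", dest="duplicates", type=int, choices=[0, 1], default=0,
                        help="include duplicates (1) or not (0)")
    parser.add_argument("-s", dest="size", type=int, default=100,
                        help="pruning size for the experiments")
    return parser


# Parse the same arguments as the terminal example
args = build_parser().parse_args(
    ["-m", "test", "-t", "-1", "-e", "msd", "-d", "0", "-s", "100"]
)
print(args.mode, args.threads, args.experiment, bool(args.duplicates), args.size)
```

Reading `-d` as an integer avoids a common pitfall: `type=bool` would treat any non-empty string, including `"0"`, as true.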

# Cite

If you use this work, please cite our paper.

```

Correya, Albin, Romain Hennequin, and Mickaël Arcos. "Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features." arXiv preprint arXiv:1808.10351 (2018).

```

BibTeX format:

```
@article{correya2018large,
  title={Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features},
  author={Correya, Albin and Hennequin, Romain and Arcos, Micka{\"e}l},
  journal={arXiv preprint arXiv:1808.10351},
  year={2018}
}
```
