https://github.com/deezer/similar_artists_ranking
Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders (RecSys 2021)
https://github.com/deezer/similar_artists_ranking
Last synced: 8 months ago
JSON representation
Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders (RecSys 2021)
- Host: GitHub
- URL: https://github.com/deezer/similar_artists_ranking
- Owner: deezer
- Created: 2021-07-12T14:04:06.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2021-10-17T13:44:06.000Z (over 4 years ago)
- Last Synced: 2024-04-16T11:27:18.821Z (about 2 years ago)
- Language: Python
- Homepage:
- Size: 86.4 MB
- Stars: 19
- Watchers: 6
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders
This repository provides code and data to reproduce results from the article [Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders](https://arxiv.org/pdf/2108.01053.pdf), published in the proceedings of the 15th ACM Conference on Recommender Systems ([RecSys 2021](https://recsys.acm.org/recsys21/)).
## Recommending Similar Artists on Music Streaming Services
On an artist’s profile page, music streaming services such as [Deezer](https://www.deezer.com/) frequently recommend a ranked list of _"similar artists"_ that fans also liked. However, implementing such a feature is challenging for new artists, for which usage data on the service (e.g. streams or likes) is not
yet available. In Section 3 of our paper, we model this cold start similar artists ranking problem as a _directed link prediction task_ in a directed and attributed graph, connecting artists to their top-_k_ most similar neighbors and incorporating side musical information.
Then, we address this task by learning node embedding representations from this graph, notably by leveraging [gravity-inspired graph (variational) autoencoders](https://github.com/deezer/gravity_graph_autoencoders/) models.
## Datasets
In the `data` repository, we release two datasets associated with this work, and detailed in Section 4.1.1 of the paper.
#### Directed graph `deezer_graph.csv`
This file provides a directed graph dataset of 24 270 artists from the Deezer catalog. Artist point towards 20 other artists[1](#myfootnote1). They correspond, up to internal business rules, to the top-20 artists from the same graph that would be recommended in production in our _"Similar Artists"_ feature. Due to confidentiality constraints, artists are unfortunately anonymized.
Each row corresponds to a directed edge from an artist `i` to an artist `j` in the format `(id_i, id_j, S_ij)`. The edge weight `S_ij` denotes the similarity score[2](#myfootnote2) of `j` with respect to `i`, as described in the paper.
1: A minority of artists actually have fewer than 20 neighbors in this graph. This corresponds to special case where similarity scores associated the removed connections were lower than 0.
2: For graph construction purposes, `deezer_graph.csv` includes self-loops. They are removed when loading the graph for experiments. Also, while similarity scores might be > 1, they were normalized in the [0,1] set for AE/VAE-related model trainings.
#### Descriptive features `deezer_features.csv`
This file provides descriptions of these artists, as detailed in Section 4.1.1. Specifically, each artist `i` is described by a 56-dimensional feature vector:
- the first column of `deezer_features.csv` corresponds to artist ids;
- the next 32 columns correspond to the _music genre_ embedding vector of each artist;
- the next 20 columns correspond to the _country_ indicator vector of each artist. Countries are anonymized;
- the last 4 columns correspond to the _music mood_ vector of each artist.
In our experiments, the top-80% of artists ids are _train_ artists. The next 10% correspond to _test_ artists and the last 10% to _validation_ artists.
Train artists are ranked by popularity on Deezer, i.e. artist `1` from the dataset is the most popular artist from the graph.
## Experiments
#### Installation
```Bash
git clone https://github.com/deezer/similar_artists_ranking
cd similar_artists_ranking
python setup.py install
cd src
```
Requirements: python **3.8**, networkx, numpy, scikit-learn, scipy.
#### Run experiments
The following command will execute experiments corresponding to Table 1 from the paper and report all Recall@K, MAP@K and NDCG@K scores:
```Bash
python main.py
```
Various options can be changed in the `option.py` file.
We re-compute the following methods from scratch: Popularity, Popularity-Country, In-Degree, In-Degree by Country, K-NN, K-NN+Popularity, K-NN+In-degree.
Regarding the DEAL, DropoutNet, STAR-GCN and SVD-DNN methods, we provide representative pre-computed node embedding vectors (that we obtained from models trained on Deezer internal usage data on train artists) in the `embeddings` folder. Then, we re-compute scores[3](#myfootnote3) obtained from these embedding vectors.
Regarding the Standard, Source-Target, and Gravity GAE/VGAE models, we also provide representative pre-computed node embedding vectors in this repository.
For model training, we used our Tensorflow implementation of these models available [here](https://github.com/deezer/gravity_graph_autoencoders/).
This implementation builds upon T. Kipf's original [gae](https://github.com/tkipf/gae) repository.
3: Note: as these methods include _random_ components during training, scores from Table 1 are actually averaged over 20 model trainings with different neural initializations. Scores obtained from the specific embeddings provided in `embeddings` will therefore slighly deviate from these averages.
#### To do list:
- incorporate FastGAE in TF code for faster training
## Cite
Please consider citing our paper if you use these datasets in your own work:
```BibTeX
@inproceedings{salha2021coldstart,
title={Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders},
author={Salha-Galvan, Guillaume and Hennequin, Romain and Chapus, Benjamin and Tran, Viet-Anh and Vazirgiannis, Michalis},
booktitle={Fifteenth ACM Conference on Recommender Systems},
pages={443--452},
year={2021}
}
```