{"id":25215728,"url":"https://github.com/deezer/similar_artists_ranking","last_synced_at":"2025-11-05T09:03:12.694Z","repository":{"id":59311373,"uuid":"385268171","full_name":"deezer/similar_artists_ranking","owner":"deezer","description":"Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders (RecSys 2021)","archived":false,"fork":false,"pushed_at":"2021-10-17T13:44:06.000Z","size":90578,"stargazers_count":19,"open_issues_count":1,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-04-16T11:27:18.821Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-12T14:04:06.000Z","updated_at":"2022-11-25T07:44:08.000Z","dependencies_parsed_at":"2022-09-23T19:02:35.333Z","dependency_job_id":null,"html_url":"https://github.com/deezer/similar_artists_ranking","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsimilar_artists_ranking","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsimilar_artists_ranking/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsimilar_artists_ranking/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsimilar_artists_ranking/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deezer","download_url":"https://codeload.github.com/deezer/similar_artists_ra
nking/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238161490,"owners_count":19426669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-10T18:15:30.189Z","updated_at":"2025-10-20T02:58:56.310Z","avatar_url":"https://github.com/deezer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders\n\n\nThis repository provides code and data to reproduce results from the article [Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders](https://arxiv.org/pdf/2108.01053.pdf), published in the proceedings of the 15th ACM Conference on Recommender Systems ([RecSys 2021](https://recsys.acm.org/recsys21/)).\n\n\n## Recommending Similar Artists on Music Streaming Services \n\nOn an artist’s profile page, music streaming services such as [Deezer](https://www.deezer.com/) frequently recommend a ranked list of _\"similar artists\"_ that fans also liked. However, implementing such a feature is challenging for new artists, for which usage data on the service (e.g. streams or likes) is not\nyet available. 
In Section 3 of our paper, we model this cold start similar artists ranking problem as a _directed link prediction task_ in a directed and attributed graph, connecting artists to their top-_k_ most similar neighbors and incorporating side musical information.\nThen, we address this task by learning node embedding representations from this graph, notably by leveraging [gravity-inspired graph (variational) autoencoders](https://github.com/deezer/gravity_graph_autoencoders/) models.\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg\n        src=\"images/image.png\"\n        width=\"65%\"\u003e\n\u003c/div\u003e\n\n\n## Datasets\n\nIn the `data` folder, we release two datasets associated with this work, both detailed in Section 4.1.1 of the paper.\n\n#### Directed graph `deezer_graph.csv`\n\nThis file provides a directed graph dataset of 24 270 artists from the Deezer catalog. Each artist points towards 20 other artists\u003csup\u003e[1](#myfootnote1)\u003c/sup\u003e. They correspond, up to internal business rules, to the top-20 artists from the same graph that would be recommended in production in our _\"Similar Artists\"_ feature. Due to confidentiality constraints, artists are unfortunately anonymized. \n\nEach row corresponds to a directed edge from an artist `i` to an artist `j` in the format `(id_i, id_j, S_ij)`. The edge weight `S_ij` denotes the similarity score\u003csup\u003e[2](#myfootnote2)\u003c/sup\u003e of `j` with respect to `i`, as described in the paper.\n\n\u003csub\u003e\u003csup\u003e\u003ca name=\"footnote1\"\u003e1\u003c/a\u003e: A minority of artists actually have fewer than 20 neighbors in this graph. This corresponds to special cases where the similarity scores associated with the removed connections were lower than 0.\u003c/sup\u003e\u003c/sub\u003e\n\n\u003csub\u003e\u003csup\u003e\u003ca name=\"footnote2\"\u003e2\u003c/a\u003e: For graph construction purposes, `deezer_graph.csv` includes self-loops. 
They are removed when loading the graph for experiments. Also, while similarity scores might be \u003e 1, they were normalized to the [0,1] interval for AE/VAE-related model training.\u003c/sup\u003e\u003c/sub\u003e\n\n#### Descriptive features `deezer_features.csv`\n\nThis file provides descriptions of these artists, as detailed in Section 4.1.1. Specifically, each artist `i` is described by a 56-dimensional feature vector:\n- the first column of `deezer_features.csv` corresponds to artist ids;\n- the next 32 columns correspond to the _music genre_ embedding vector of each artist;\n- the next 20 columns correspond to the _country_ indicator vector of each artist. Countries are anonymized;\n- the last 4 columns correspond to the _music mood_ vector of each artist.\n\nIn our experiments, the top 80% of artist ids are _train_ artists. The next 10% correspond to _test_ artists and the last 10% to _validation_ artists.\nTrain artists are ranked by popularity on Deezer, i.e. artist `1` from the dataset is the most popular artist from the graph.\n\n## Experiments\n\n#### Installation\n\n```Bash\ngit clone https://github.com/deezer/similar_artists_ranking\ncd similar_artists_ranking\npython setup.py install\ncd src\n```\n\nRequirements: Python **3.8**, networkx, numpy, scikit-learn, scipy.\n\n\n#### Run experiments\n\nThe following command will execute experiments corresponding to Table 1 from the paper and report all Recall@K, MAP@K and NDCG@K scores:\n\n```Bash\npython main.py\n```\n\nVarious options can be changed in the `option.py` file.\n\nWe re-compute the following methods from scratch: Popularity, Popularity-Country, In-Degree, In-Degree by Country, K-NN, K-NN+Popularity, K-NN+In-degree.\n\nRegarding the DEAL, DropoutNet, STAR-GCN and SVD-DNN methods, we provide representative pre-computed node embedding vectors (that we obtained from models trained on Deezer internal usage data on train artists) in the `embeddings` folder. 
Then, we re-compute scores\u003csup\u003e[3](#myfootnote3)\u003c/sup\u003e obtained from these embedding vectors.\n\nRegarding the Standard, Source-Target, and Gravity GAE/VGAE models, we also provide representative pre-computed node embedding vectors in this repository.\nFor model training, we used our TensorFlow implementation of these models available [here](https://github.com/deezer/gravity_graph_autoencoders/).\nThis implementation builds upon T. Kipf's original [gae](https://github.com/tkipf/gae) repository.\n\n\u003csub\u003e\u003csup\u003e\u003ca name=\"footnote3\"\u003e3\u003c/a\u003e: Note: as these methods include _random_ components during training, scores from Table 1 are actually averaged over 20 training runs with different neural initializations. Scores obtained from the specific embeddings provided in `embeddings` will therefore slightly deviate from these averages.\u003c/sup\u003e\u003c/sub\u003e\n\n#### To do list:\n\n- incorporate FastGAE in TF code for faster training\n\n\n## Cite\n\nPlease consider citing our paper if you use these datasets in your own work:\n\n```BibTeX\n@inproceedings{salha2021coldstart,\n  title={Cold Start Similar Artists Ranking with Gravity-Inspired Graph Autoencoders},\n  author={Salha-Galvan, Guillaume and Hennequin, Romain and Chapus, Benjamin and Tran, Viet-Anh and Vazirgiannis, Michalis},\n  booktitle={Fifteenth ACM Conference on Recommender Systems},\n  pages={443--452},\n  year={2021}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fsimilar_artists_ranking","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeezer%2Fsimilar_artists_ranking","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fsimilar_artists_ranking/lists"}