{"id":15554886,"url":"https://github.com/finn-no/recsys_slates_dataset","last_synced_at":"2025-04-14T10:33:17.572Z","repository":{"id":39866610,"uuid":"348984858","full_name":"finn-no/recsys_slates_dataset","owner":"finn-no","description":"FINN.no Slate Dataset for Recommender Systems. A dataset containing all interactions (viewed items + response (clicked item / no click) for users over a longer time horizon.","archived":false,"fork":false,"pushed_at":"2023-01-29T16:32:55.000Z","size":2384,"stargazers_count":52,"open_issues_count":5,"forks_count":5,"subscribers_count":11,"default_branch":"main","last_synced_at":"2024-04-14T00:45:43.732Z","etag":null,"topics":["dataset","deep-learning","pytorch","recommender-system"],"latest_commit_sha":null,"homepage":"https://opensource.finntech.no/recsys_slates_dataset/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/finn-no.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-18T07:47:10.000Z","updated_at":"2024-01-04T16:55:49.000Z","dependencies_parsed_at":"2023-02-16T00:00:57.232Z","dependency_job_id":null,"html_url":"https://github.com/finn-no/recsys_slates_dataset","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finn-no%2Frecsys_slates_dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finn-no%2Frecsys_slates_dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finn-no%2Frecsys_slates_dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/finn-no%2Frecsys_slates_dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/finn-no","download_url":"https://codeload.github.com/finn-no/recsys_slates_dataset/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248862825,"owners_count":21173891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","deep-learning","pytorch","recommender-system"],"created_at":"2024-10-02T15:04:06.033Z","updated_at":"2025-04-14T10:33:17.537Z","avatar_url":"https://github.com/finn-no.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"FINN.no Slate Dataset for Recommender Systems\n================\n\n\u003c!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! --\u003e\n\nWe release the *FINN.no slate dataset* to improve recommender systems\nresearch. The dataset includes both search and recommendation\ninteractions between users and the platform over a 30 day period. The\ndataset has logged both exposures and clicks, *including interactions\nwhere the user did not click on any of the items in the slate*. 
To our\nknowledge no comparable large-scale dataset exists, and we hope this\ncontribution helps researchers construct improved models and\nimprove offline evaluation metrics.\n\n![A visualization of a presented slate to the user on the frontpage of\nFINN.no](finn-frontpage.png)\n\nFor each user u and interaction step t we recorded all items in the\nvisible slate\n![equ](https://latex.codecogs.com/gif.latex?a_t%5Eu(s_t%5Eu)) (up to the\nscroll length ![equ](https://latex.codecogs.com/gif.latex?s_t%5Eu)), and\nthe user’s click response\n![equ](https://latex.codecogs.com/gif.latex?c_t%5Eu). The dataset\nconsists of 37.4 million interactions, \\|U\\| ≈ 2.3 million users and\n\\|I\\| ≈ 1.3 million items that belong to one of G = 290 item groups. For\na detailed description of the data please see the\n[paper](https://arxiv.org/abs/2104.15046).\n\n![A visualization of a presented slate to the user on the frontpage of\nFINN.no](interaction_illustration.png)\n\nFINN.no is the leading marketplace in the Norwegian classifieds market\nand provides users with a platform to buy and sell general merchandise,\ncars, real estate, as well as house rentals and job offerings. For\nquestions, email simen.eide@finn.no or file an issue.\n\n## Install\n\n`pip install recsys_slates_dataset`\n\n## How to use\n\nTo download the generic NumPy data files:\n\n``` python\nfrom recsys_slates_dataset import data_helper\ndata_helper.download_data_files(data_dir=\"data\")\n```\n\nDownload and prepare data into ready-to-use PyTorch dataloaders:\n\n``` python\nfrom recsys_slates_dataset import dataset_torch\nind2val, itemattr, dataloaders = dataset_torch.load_dataloaders(data_dir=\"data\")\n```\n\n## Organization\n\nThe repository is organized as follows: - The dataset is placed in\n`data/` and stored using git-lfs. We also provide an automatic download\nfunction in the pip package (preferred usage). 
- The code open-sourced\nwith the article [“Dynamic Slate Recommendation with Gated Recurrent\nUnits and Thompson Sampling”](https://arxiv.org/abs/2104.15046) is found\nin `code_eide_et_al21/`. However, we are in the process of making the\ndata more generally available, which makes the code incompatible with the\ncurrent (newer) version of the data. Please use [the v1.0 release of the\nrepository](https://github.com/finn-no/recsys-slates-dataset/tree/v1.0)\nfor a compatible version of the code and dataset.\n\n## Quickstart dataset [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/finn-no/recsys-slates-dataset/blob/main/examples/quickstart-finn-recsys-slate-data.ipynb)\n\nWe provide a quickstart Jupyter notebook that runs on Google Colab\n(quickstart-finn-recsys-slate-data.ipynb) and includes all the necessary\nsteps above. It gives a quick introduction to how to use the dataset.\n\n## Example training scripts\n\nWe provide an example training Jupyter notebook in `examples/` that\nimplements a matrix factorization model with categorical loss.\nIt is also runnable using Google Colab:\n[![matrix_factorization.ipynb](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/finn-no/recsys-slates-dataset/blob/main/examples/matrix_factorization.ipynb)  \nWork is ongoing to build additional examples and use\nthem as benchmarks for the dataset.\n\n### Dataset files\n\nThe dataset `data.npz` contains the following fields: - userId: The\nunique identifier of the user. - click: The items the user clicked on in\neach of the 20 presented slates. - click_idx: The index of the clicked item\nin each of the 20 presented slates. - slate_lengths: The lengths\nof the 20 presented slates. - slate: All the items in each of the 20\npresented slates. 
- interaction_type: The recommendation slate can be\nthe result of a search query (1), a recommendation (2) or can be\nundefined (0).\n\nThe dataset `itemattr.npz` contains the categories ranging from 0 to\n290, corresponding to the 290 unique groups that the items belong to.\nThese 290 unique groups are constructed using a combination of\ncategorical information and the geographical location.\n\nThe dataset `ind2val.json` contains the mapping between the indices and\nthe values of the categories (e.g. `\"287\": \"JOB, Rogaland\"`) and\ninteraction types (e.g. `\"1\": \"search\"`).  \n\n## Citations\n\nThis repository accompanies the paper [“Dynamic Slate\nRecommendation with Gated Recurrent Units and Thompson\nSampling”](https://arxiv.org/abs/2104.15046) by Simen Eide, David S.\nLeslie and Arnoldo Frigessi.\n\nIf you use the code, data or paper, please consider citing the\npaper.\n\n    Eide, S., Leslie, D.S. \u0026 Frigessi, A. Dynamic slate recommendation with gated recurrent units and Thompson sampling. Data Min Knowl Disc (2022). https://doi.org/10.1007/s10618-022-00849-w\n\n## Todo\n\nThis repository is currently *work in progress*, and we will provide\ndescriptions and tutorials. Suggestions and contributions to make the\nmaterial more accessible are welcome. There are some features of the\nrepository that we are working on:\n\n- [ ] Add more usable functions that compute relevant metrics such as\n  F1, counterfactual metrics etc.\n- [ ] Git LFS is currently broken because we removed some lines in\n  .gitattributes that are in conflict with nbdev. The dataset is still\n  usable through the built-in download functions, as they use a different\n  source. However, we should fix this. 
An issue is [posted on\n  nbdev](https://github.com/fastai/nbdev/issues/506).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinn-no%2Frecsys_slates_dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffinn-no%2Frecsys_slates_dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinn-no%2Frecsys_slates_dataset/lists"}