{"id":13732703,"url":"https://github.com/westlake-repl/PixelRec","last_synced_at":"2025-05-08T08:32:10.268Z","repository":{"id":195284658,"uuid":"692620761","full_name":"westlake-repl/PixelRec","owner":"westlake-repl","description":"A  Large-scale Multimodal Dataset for recommender System","archived":false,"fork":false,"pushed_at":"2024-09-21T07:00:36.000Z","size":101982,"stargazers_count":107,"open_issues_count":0,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-15T01:32:57.765Z","etag":null,"topics":["foundation-model","image-recommendation","large-image-dataset","large-language-model","llm4rec","multimodal-recommendation","multimodal-recommendation-dataset","pre-train-recommendation","recommender-system","text-recommendation","vision-recommendation","visual-recommender-system"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/westlake-repl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-17T03:49:57.000Z","updated_at":"2024-11-13T14:21:10.000Z","dependencies_parsed_at":"2024-08-09T10:26:04.105Z","dependency_job_id":"286be3c2-1bd7-441f-9a30-e5060c0621ec","html_url":"https://github.com/westlake-repl/PixelRec","commit_stats":null,"previous_names":["westlake-repl/pixelrec"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/westlake-repl%2FPixelRec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/westlake-repl%2FPixelRec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/westlake-repl%2FPixelRec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/westlake-repl%2FPixelRec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/westlake-repl","download_url":"https://codeload.github.com/westlake-repl/PixelRec/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253029154,"owners_count":21843032,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["foundation-model","image-recommendation","large-image-dataset","large-language-model","llm4rec","multimodal-recommendation","multimodal-recommendation-dataset","pre-train-recommendation","recommender-system","text-recommendation","vision-recommendation","visual-recommender-system"],"created_at":"2024-08-03T03:00:32.542Z","updated_at":"2025-05-08T08:32:10.257Z","avatar_url":"https://github.com/westlake-repl.png","language":"Python","funding_links":[],"categories":["Dataset","2. Datasets \u0026 Benchmarks","The papers and related projects"],"sub_categories":["2.1 Datasets","Common Datasets"],"readme":"# [SDM2024] [PixelRec: An Image Dataset for Benchmarking Recommender Systems with Raw Pixels](https://arxiv.org/pdf/2309.06789.pdf)\n\n\u003ca href=\"https://arxiv.org/pdf/2309.06789.pdf\" alt=\"paper\"\u003e\u003cimg src=\"https://img.shields.io/badge/ArXiv-2309.06789-FAA41F.svg?style=flat\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/westlake-repl/PixelRec/blob/main/doc/pre.pdf\" alt=\"slides\"\u003e\u003cimg src=\"https://img.shields.io/badge/Slides-SDM2024-yellow\" /\u003e\u003c/a\u003e\n\u003ca href=\"https://medium.com/@lifengyi_6964/pixelrec-a-large-scale-multimodal-recommendation-dataset-under-short-video-scenario-b5e4113ee4ea\" alt=\"blog\"\u003e\u003cimg src=\"https://img.shields.io/badge/Blog-English-blue\" /\u003e\u003c/a\u003e \n\u003ca href=\"https://zhuanlan.zhihu.com/p/684805058\" alt=\"博客\"\u003e\u003cimg src=\"https://img.shields.io/badge/%E5%8D%9A%E5%AE%A2-%E4%B8%AD%E6%96%87-purple\" /\u003e\u003c/a\u003e \n\n![Multi-Modal](https://img.shields.io/badge/Task-Multi--Modal-red) \n![Foundation-Model](https://img.shields.io/badge/Task-Foundation_Model-red) \n![Recommendation](https://img.shields.io/badge/Task-Recommendation-red) \n\nQuick Links: [🗃️Dataset](#Dataset) |\n[🛠️Experiments](#Experiments) |\n[👀Others](#Others) |\n[📭Citation](#Citation) |\n[💡News](#News)\n\n\u003c!--# Note\nIn this paper, we evaluate the PixelNet model by training the recommendation backbone and the item modality encoders end to end, which is computationally expensive. We do this because end-to-end training performs better than using pre-extracted multi-modal features. However, we hope that PixelRec can inspire more effective and efficient ways to exploit visual features, rather than limiting them to the end-to-end training paradigm. 
If you can develop a highly efficient method that goes beyond end-to-end training, it will be a great contribution to the community!--\u003e\n\n# Dataset\n\n### Overview\n\n\u003cdiv align=center\u003e\u003cimg src=\"https://github.com/westlake-repl/PixelRec/blob/main/dataset/overview.png\"/\u003e\u003c/div\u003e\n\n### Download Links\n\n[**Interaction**](https://drive.google.com/drive/folders/1vR1lgQUZCy1cuhzPkM2q7AsdYRP43feQ?usp=drive_link): the interaction records of **Pixel200K**, **Pixel1M**, **Pixel8M** and **PixelRec**. See `dataset/statistics` for detailed statistics.\n\n[**Item Information**](https://drive.google.com/drive/folders/1rXBM-zi5sSdLHNshXWtGgVReWLuYNgDB?usp=drive_link): the item descriptions/attributes of **Pixel200K**, **Pixel1M**, **Pixel8M** and **PixelRec**. See `dataset` for a detailed description of the fields.\n\n[**Cover**](https://drive.google.com/file/d/17V70KN6UOAdphNEc0wXlocFwgmI7hVOo/view?usp=drive_link): all the cover images in **PixelRec**, 408,374 in total.\n\n[**Extracted Features**](https://drive.google.com/drive/folders/1ahUB5fwaBU-6RS2hjtEdBjoWw-Z4r6pk?usp=drive_link): currently includes [text feature vectors](https://drive.google.com/file/d/1t1ZknzSY-8KxhhfTWMORh66BOdV7qmCj/view?usp=drive_link) and [image feature vectors](https://drive.google.com/file/d/12VW6o5AToMFWLbSILi5_c6tlSXS43qm6/view?usp=drive_link).\n\nWe also provide a sampled dataset, PixelRec50K, to help you quickly understand the data in PixelRec. It contains 989,494 interactions from 50,000 users on 82,865 items. Its interaction data, item attributes, and covers can be downloaded [here](https://drive.google.com/drive/folders/1bQPgM-6yAnzcD0jKBoUUheA9LL5xnCHG?usp=drive_link).\n\nWe provide an [integrated folder](https://drive.google.com/file/d/1fu0tqCmmXkte5PAsyMo0DQrDS0zofTLH/view?usp=drive_link) for Pixel200K. 
After downloading this folder, you can directly run the paper's experiments on Pixel200K.\n\n\u003e :warning: **Caution**: Privately modifying the dataset and then redistributing it for download is prohibited. If you have altered the dataset in your work, we encourage you to open-source your data processing code so that others can benefit from your methods. Alternatively, notify us of your new dataset so that we can list it on this GitHub page together with your paper.\n\n**Note that this is an image recommendation dataset. If you need video information, please see our MicroLens repository (https://github.com/westlake-repl/MicroLens), a large-scale micro-video recommendation dataset collected from a different platform.**\n\n# Experiments\n\n## Environments\n\n```\nPytorch==1.10.2\ncudatoolkit==11.2.1\npython==3.9.7\n```\n\nSee `requirements.txt` for the other packages:\n\n```bash\npip install -r requirements.txt\n```\n\n## Run Baselines\n\nTo run the baselines:\n\n- Download the interaction data and images.\n\n- Generate an LMDB database from the images:\n\n```bash\ncd code \u0026\u0026 python generate_lmdb.py\n```\n\n- Choose a `yaml` config file to select the baseline to run. The `yaml` files are in the folders `IDNet`, `PixelNet`, `ViNet` and `overall`.\n\nTo run IDNet, for example, run the `SASRec` model on one GPU:\n\n```bash\npython main.py --device 0 --config_file IDNet/sasrec.yaml overall/ID.yaml\n```\n\nReplace `IDNet/sasrec.yaml` with another file from `IDNet` to run other IDNet baselines.\n\nTo run PixelNet, for example, run the `SASRec` model with a `ViT` image encoder on four GPUs:\n\n```bash\npython main.py --device 0,1,2,3 --config_file PixelNet/sasrec.yaml overall/ViT.yaml\n```\n\nReplace `PixelNet/sasrec.yaml` to run other PixelNet baselines with `ViT` as the item encoder, or replace `overall/ViT.yaml` to run `SASRec` with other image encoders.\n\nTo run ViNet, e.g. 
run the `VBPR` model on one GPU:\n\n```bash\npython main.py --device 0 --config_file ViNet/vbpr.yaml\n```\n\nReplace `ViNet/vbpr.yaml` to run other ViNet baselines.\n\nNote: depending on where you put the downloaded data, you may need to modify some paths in the files under the `ViNet` and `overall` folders and in `generate_lmdb.py`.\n\n## Hyperparameters\n\n\u003e Hyperparameter ranges:\n\u003e\n\u003e embedding size [128, 512, 1024, 2048, 4096, 8192]\n\u003e\n\u003e learning rate [0.000001, 0.00005, ..., 0.001]\n\u003e\n\u003e weight decay [0, 0.01, 0.1]\n\u003e\n\u003e batch size [64, 128, 256, 512, 1024]\n\n**Hyperparameters of IDNet. $\\gamma$, $\\beta$ and $B$ denote the learning rate, weight decay and batch size, respectively.**\n\n| Method (IDNet) | Model Parameters                                             | Training Parameters                                          |\n| -------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| MF             | dropout prob [0]    embedding size [4096]                    | γ [0.0001]   B [512]     β [0]                               |\n| FM             | embedding size [4096]                                        | γ [0.00005]   B [64]     β [0]                               |\n| DSSM           | dnn layer number [0]    embedding size [4096]                | γ [0.0001]   B [64]     β [0]                                |\n| LightGCN       | step [2]    embedding size [256]                             | γ [0.0005]   B [1024]     β [0.01]                           |\n| SASRec         | trm layer number [2]    inner size [2]    embedding size [512] | γ [0.00005]   B [64]     β [0.1]                             |\n| BERT4Rec       | mask ratio [0.6]    trm layer number [2]    inner size [1]    embedding size [512] | γ [0.00005]   B [64]     β [0.1]                             |\n| LightSANs      | k [3]    trm layer number [1]  embedding size [512]          | γ [0.00005]   B [512]     β [0.1]                            |\n| GRU4Rec        | dropout prob [0]    gru layer number [1]    inner size [2]    embedding size [2048] | γ [0.0001]   B [64]     β [0.01]                             |\n| NextItNet      | block number [3]    embedding size [1024]                    | γ [0.0005]   B [64]     β [0.01]                             |\n| SRGNN          | step [2]    embedding size [512]                             | γ [0.00005]   B [64]     β [0.01]                            |\n| VisRank        | visual feature [RN_2048]    method [maximum]                 |                                                              |\n| VBPR           |                                                              | id γ [0.001]    id β [0]    visual γ [0.0001]    visual β [0.1] |\n| ACF            | embedding size [128]                                         | γ [0.0001]   B [64]     β [0.1]                              |\n\n**For most architectures, PixelNet uses the same hyperparameters as the corresponding IDNet model, with the exceptions listed below. 
The embedding size refers to the hidden dimension of the user encoder.**\n\n| Method (PixelNet) | Model Parameters                                             | Training Parameters               |\n| ------------------ | ------------------------------------------------------------ | --------------------------------- |\n| SASRec             | trm layer number [2]    inner size [2]    embedding size [512] | γ [0.0001]   B [64]     β [0.1]   |\n| BERT4Rec           | mask ratio [0.6]    trm layer number [2]    inner size [1]    embedding size [512] | γ [0.0001]   B [64]     β [0.1]   |\n| LightSANs          | k [3]    trm layer number [1]  embedding size [512]          | γ [0.0001]   B [512]     β [0.1]  |\n| NextItNet          | block number [3]    embedding size [1024]                    | γ [0.0001]   B [64]     β [0.01]  |\n| SRGNN              | step [2]    embedding size [512]                             | γ [0.0001]   B [512]     β [0.01] |\n\n**In PixelNet, we use separate learning rates and weight decays for the image encoder and for the rest of the model. 
The hyperparameters used for the image encoders are listed below.**\n\n| Image Encoder                            | Hyperparameters (γ: learning rate, β: weight decay) |\n| ---------------------------------------- | --------------------- |\n| RN50, RN50x4, RN50x16, RN50x64, ResNet50 | γ [0.0001]   β [0.01] |\n| ViT, Swin-T, Swin-B, BEiT                | γ [0.0001]   β [0]    |\n\n# Citation\n\nIf our work helps your research, please cite our paper:\n\n```bibtex\n@article{cheng2023image,\n  title={An Image Dataset for Benchmarking Recommender Systems with Raw Pixels},\n  author={Cheng, Yu and Pan, Yunzhu and Zhang, Jiaqi and Ni, Yongxin and Sun, Aixin and Yuan, Fajie},\n  journal={arXiv preprint arXiv:2309.06789},\n  year={2023}\n}\n```\n\n# More Resources\n\n| Dataset | Repository |\n| --- | --- |\n| MicroLens (a short-video recommendation dataset) | https://github.com/westlake-repl/MicroLens |\n| Tenrec (a dataset with 10 diverse recommendation tasks) | https://github.com/yuangh-x/2022-NIPS-Tenrec |\n| NineRec (a dataset suite covering 9 downstream recommendation tasks) | https://github.com/westlake-repl/NineRec |\n\n# News\n\n\\- **2024/11/18**: Added the previously missing item \"i192714\" to the item information files.\n\n\\- **2024/04/18**: Added the \"description\" column to the item information files.\n\n**License**\n\n- The **code** in this repository is released under the MIT License; see the [LICENSE](LICENSE) file for details.\n- See the `dataset/LICENSE` file for **dataset** license details.\n\n#### :bulb: If you have an innovative idea for building a foundation recommendation model but need a large dataset and computational resources, consider joining our lab as an intern or visiting scholar. We can provide access to 100 NVIDIA A100 80GB GPUs and a billion-scale dataset of user-image/text interactions.\n\n#### The laboratory is hiring research assistants, interns, doctoral students, and postdoctoral researchers. Please contact the corresponding author for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwestlake-repl%2FPixelRec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwestlake-repl%2FPixelRec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwestlake-repl%2FPixelRec/lists"}