{"id":13774829,"url":"https://github.com/fuxiAIlab/RL4RS","last_synced_at":"2025-05-11T07:30:42.823Z","repository":{"id":44409210,"uuid":"458985611","full_name":"fuxiAIlab/RL4RS","owner":"fuxiAIlab","description":"A Real-World Benchmark for Reinforcement Learning based Recommender System","archived":false,"fork":false,"pushed_at":"2024-02-03T07:09:15.000Z","size":2029,"stargazers_count":217,"open_issues_count":6,"forks_count":26,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-08-03T17:11:08.442Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-sa-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fuxiAIlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-14T02:05:45.000Z","updated_at":"2024-07-23T07:33:51.000Z","dependencies_parsed_at":"2024-01-07T21:02:25.798Z","dependency_job_id":"a51e0eb5-fe17-4f6c-bc83-50d644049ba5","html_url":"https://github.com/fuxiAIlab/RL4RS","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fuxiAIlab%2FRL4RS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fuxiAIlab%2FRL4RS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fuxiAIlab%2FRL4RS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fuxiAIlab%2FRL4RS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fuxiAIlab","download_url":"https://codeload.gi
thub.com/fuxiAIlab/RL4RS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225022309,"owners_count":17408607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T17:01:30.763Z","updated_at":"2024-11-17T09:31:25.719Z","avatar_url":"https://github.com/fuxiAIlab.png","language":"Python","funding_links":[],"categories":["Open Source Software/Implementations","其他_推荐系统"],"sub_categories":["Off-Policy Evaluation and Learning: Applications","网络服务_其他"],"readme":"# RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender System\n\u003c!-- [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) --\u003e\n\n[![License](https://licensebuttons.net/l/by/3.0/88x31.png)](https://creativecommons.org/licenses/by/4.0/)\n\nRL4RS is a real-world deep reinforcement learning recommender system dataset for practitioners and researchers.\n\n```py\nimport gym\nfrom rl4rs.env.slate import SlateRecEnv, SlateState\n\nsim = SlateRecEnv(config, state_cls=SlateState)\nenv = gym.make('SlateRecEnv-v0', recsim=sim)\nfor i in range(epoch):\n    obs = env.reset()\n    for j in range(config[\"max_steps\"]):\n        action = env.offline_action\n        next_obs, reward, done, info = env.step(action)\n        if done[0]:\n            break\n```\nDataset Download(data only): https://zenodo.org/record/6622390#.YqBBpRNBxQK\n\nDataset Download(for reproduction): https://drive.google.com/file/d/1YbPtPyYrMvMGOuqD4oHvK0epDtEhEb9v/view?usp=sharing\n\nPaper: 
https://arxiv.org/pdf/2110.11073.pdf\n\n\u003c!--Paper_latest: https://openreview.net/pdf?id=euli0I5CKvy--\u003e\n\nAppendix: https://github.com/fuxiAIlab/RL4RS/blob/main/RL4RS_appendix.pdf\n\nKaggle Competition (old version): https://www.kaggle.com/c/bigdata2021-rl-recsys/overview\n\nResource Page: https://fuxi-up-research.gitbook.io/fuxi-up-challenges/\n\nTutorial: https://github.com/fuxiAIlab/RL4RS/blob/main/tutorial.ipynb\n\n## RL4RS News\n![new](/assets/new.gif) **04/20/2023**: SIGIR 2023 Resource Track, [Accept].\n\n**09/02/2022**: We released RL4RS [v1.1.0](https://github.com/fuxiAIlab/RL4RS/releases/tag/v1.1.0), which adds: 1) two additional RS datasets for comparison, Last.fm and CIKMCup2016; 2) two additional model-free baselines, TD3 and RAINBOW, and two additional model-based batch RL baselines, MOPO (Model-based Offline Policy Optimization) and COMBO (Conservative Offline Model-Based Policy Optimization); 3) continuous action space support for BCQ and CQL.\n\n\u003c!--**08/28/2022**: NeurIPS 2022 Track Datasets and Benchmarks, [Under Review](https://openreview.net/forum?id=euli0I5CKvy).--\u003e\n\n**09/17/2022**: A hands-on invited talk at the [DRL4IR Workshop](https://drl4ir.github.io/), SIGIR2022.\n\n**12/17/2021**: Hosting [IEEE BigData2021 Cup Challenges](http://bigdataieee.org/BigData2021/BigDataCupChallenges.html), [Track I](https://www.kaggle.com/c/bigdata2021-rl-recsys/overview) for Supervised Learning and [Track II](https://fuxi-up-research.gitbook.io/fuxi-up-challenges/challenge/bigdatacup2021-rl4rs-challenge) for Reinforcement Learning.\n\n\n## key features\n\n### :star: Real-World Datasets\n- **two real-world datasets**: In contrast to artificial or semi-simulated datasets, RL4RS collects raw logged data from one of the most popular games released by NetEase Game, which is naturally a sequential decision-making problem.\n- **data understanding tool**: RL4RS provides a data understanding tool for testing the proper use of RL on recommendation system 
datasets.\n- **advanced dataset setting**: For each dataset, RL4RS provides separate data from before and after reinforcement learning deployment, which simulates the difficulty of training a good RL policy from data collected by an SL-based algorithm.\n\n### :zap: Practical RL Baselines\n- **model-free RL**: RL4RS supports state-of-the-art RL libraries, such as RLlib and Tianshou. We provide example code for state-of-the-art model-free algorithms (A2C, PPO, etc.) implemented with RLlib on both the discrete and the continuous (combining policy gradients with a K-NN search) RL4RS environments.\n- **offline RL**: RL4RS implements offline RL algorithms including BC, BCQ and CQL through the d3rlpy library. RL4RS is also the first to report the effectiveness of offline RL algorithms (BCQ and CQL) in the RL-based RS domain.\n- **RL-based RS baselines**: RL4RS implements some algorithms proposed in the RL-based RS domain, including Exact-k and Adversarial User Model.\n- **offline RL evaluation**: In addition to the reward indicator and the traditional RL evaluation setting (train and test on the same environment), RL4RS tries to provide a complete evaluation framework by placing more emphasis on counterfactual policy evaluation.\n\n### :beginner: Easy-To-Use scalable API\n- **low coupling structure**: RL4RS specifies a fixed data format to reduce code coupling, and data-related logic is unified into data preprocessing scripts or user-defined state classes.\n- **file-based RL environment**: RL4RS implements a file-based gym environment, which enables random sampling and sequential access to datasets exceeding memory size. It is easy to extend to distributed file systems.\n- **http-based vector Env**: RL4RS naturally supports Vector Env, that is, the environment processes a batch of data at a time. 
We further encapsulate the env through the HTTP interface, so that it can be deployed on multiple servers to accelerate the generation of samples.\n\n## experimental features (welcome contributions!)\n- A new dataset for bundle recommendation with variable discounts, flexible recommendation trigger, and modifiable item content is in preparation.\n- Take raw features rather than hidden layer embeddings as observation input for offline RL\n- Model-based RL Algorithms\n- Reward-oriented simulation environment construction\n- Reproduce more algorithms (RL models, safe exploration techniques, etc.) proposed in the RL-based RS domain\n- Support Parametric-Action DQN, in which we input concatenated state-action pairs and output the Q-value for each pair.\n\n## installation\nRL4RS supports Linux and requires at least 64 GB of memory.\n\n### Github (recommended)\n```\n$ git clone https://github.com/fuxiAIlab/RL4RS\n$ export PYTHONPATH=$PYTHONPATH:`pwd`/rl4rs\n$ conda env create -f environment.yml\n$ conda activate rl4rs\n```\n\n### Dataset Download (Google Drive) \nDataset Download: https://drive.google.com/file/d/1YbPtPyYrMvMGOuqD4oHvK0epDtEhEb9v/view?usp=sharing\n\n```\n.\n|-- batchrl\n|   |-- BCQ_SeqSlateRecEnv-v0_b_all.h5\n|   |-- BCQ_SlateRecEnv-v0_a_all.h5\n|   |-- BC_SeqSlateRecEnv-v0_b_all.h5\n|   |-- BC_SlateRecEnv-v0_a_all.h5\n|   |-- CQL_SeqSlateRecEnv-v0_b_all.h5\n|   `-- CQL_SlateRecEnv-v0_a_all.h5\n|-- data_understanding_tool\n|   |-- dataset\n|   |   |-- ml-25m.zip\n|   |   `-- yoochoose-clicks.dat.zip\n|   `-- finetuned\n|       |-- movielens.csv\n|       |-- movielens.h5\n|       |-- recsys15.csv\n|       |-- recsys15.h5\n|       |-- rl4rs.csv\n|       `-- rl4rs.h5\n|-- exactk\n|   |-- exact_k.ckpt.10000.data-00000-of-00001\n|   |-- exact_k.ckpt.10000.index\n|   `-- exact_k.ckpt.10000.meta\n|-- ope\n|   `-- logged_policy.h5\n|-- raw_data\n|   |-- item_info.csv\n|   |-- rl4rs_dataset_a_rl.csv\n|   |-- rl4rs_dataset_a_sl.csv\n|   |-- 
rl4rs_dataset_b_rl.csv\n|   `-- rl4rs_dataset_b_sl.csv\n`-- simulator\n    |-- finetuned\n    |   |-- simulator_a_dien\n    |   |   |-- checkpoint\n    |   |   |-- model.data-00000-of-00001\n    |   |   |-- model.index\n    |   |   `-- model.meta\n    |   `-- simulator_b2_dien\n    |       |-- checkpoint\n    |       |-- model.data-00000-of-00001\n    |       |-- model.index\n    |       `-- model.meta\n    |-- rl4rs_dataset_a_shuf.csv\n    `-- rl4rs_dataset_b3_shuf.csv\n```\n\n## two ways to use this resource\n### Reinforcement Learning Only \n```\n# move simulator/*.csv to rl4rs/dataset\n# move simulator/finetuned/* to rl4rs/output\ncd reproductions/\n# run exact-k\nbash run_exact_k.sh\n# start http-based Env, then run RLlib library\nnohup python -u rl4rs/server/gymHttpServer.py \u0026\nbash run_modelfree_rl.sh DQN/PPO/DDPG/PG/PG_conti/etc.\n```\n\n### start from scratch (batch-rl, environment simulation, etc.)\n```\ncd reproductions/\n# first step: generate tfrecords for supervised learning (environment simulation)\n# this step is time-consuming; you can comment it out at first.\nbash run_split.sh\n\n# environment simulation part (needs tfrecords)\n# run these scripts to compare different SL methods\nbash run_supervised_item.sh dnn/widedeep/dien/lstm\nbash run_supervised_slate.sh dnn_slate/adversarial_slate/etc.\n# or you can directly train the DIEN-based simulator as the RL Env.\nbash run_simulator_train.sh dien\n\n# model-free part (needs run_simulator_train.sh)\n# run exact-k\nbash run_exact_k.sh\n# start http-based Env, then run RLlib library\nnohup python -u rl4rs/server/gymHttpServer.py \u0026\nbash run_modelfree_rl.sh DQN/PPO/DDPG/PG/PG_conti/etc.\n\n# offline RL part (needs run_simulator_train.sh)\n# generate offline dataset for offline RL first (dataset_generate stage)\n# then train the offline RL algorithm (train stage)\nbash run_batch_rl.sh BC/BCQ/CQL\n```\n\n## reported baselines \n| algorithm  | category | support mode |\n|:-|:-:|:-:|\n| 
[Wide\u0026Deep](https://dl.acm.org/doi/pdf/10.1145/2988450.2988454) | supervised learning | item-wise classification/slate-wise classification/item ranking |\n| [GRU4Rec](https://arxiv.org/pdf/1511.06939) | supervised learning | item-wise classification/slate-wise classification/item ranking |\n| [DIEN](https://www.researchgate.net/profile/Xiaoqiang-Zhu-7/publication/327591686_Deep_Interest_Evolution_Network_for_Click-Through_Rate_Prediction/links/5bc0398f458515a7a9e2a6db/Deep-Interest-Evolution-Network-for-Click-Through-Rate-Prediction.pdf) | supervised learning | item-wise classification/slate-wise classification/item ranking |\n| [Adversarial User Model](http://proceedings.mlr.press/v97/chen19f/chen19f.pdf) | supervised learning | item-wise classification/slate-wise classification/item ranking |\n| [Exact-K](https://arxiv.org/pdf/1905.07089.pdf) | model-free learning | discrete env \u0026 hidden state as observation |\n| [Policy Gradient (PG)](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) | model-free RL | discrete/conti env \u0026 raw feature/hidden state as observation |\n| [Deep Q-Network (DQN)](https://www.nature.com/articles/nature14236) | model-free RL | discrete env \u0026 raw feature/hidden state as observation |\n| [Deep Deterministic Policy Gradients (DDPG)](https://arxiv.org/abs/1509.02971) | model-free RL | conti env \u0026 raw feature/hidden state as observation |\n| [Advantage Actor-Critic (A2C)](http://proceedings.mlr.press/v48/mniha16.pdf) | model-free RL | discrete/conti env \u0026 raw feature/hidden state as observation |\n| [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf) | model-free RL | discrete/conti env \u0026 raw feature/hidden state as observation |\n| Behavior Cloning | supervised learning/Offline RL | discrete env \u0026 hidden state as observation |\n| [Batch Constrained Q-learning (BCQ)](https://arxiv.org/abs/1812.02900) | Offline RL | 
discrete env \u0026 hidden state as observation |\n| [Conservative Q-Learning (CQL)](https://arxiv.org/abs/2006.04779) | Offline RL | discrete env \u0026 hidden state as observation |\n\n## supported algorithms (from RLlib and d3rlpy)\n| algorithm | discrete control | continuous control | offline RL? |\n|:-|:-:|:-:|:-:|\n| Behavior Cloning (supervised learning) | :white_check_mark: | :white_check_mark: | |\n| [Deep Q-Network (DQN)](https://www.nature.com/articles/nature14236) | :white_check_mark: | :no_entry: | |\n| [Double DQN](https://arxiv.org/abs/1509.06461) | :white_check_mark: | :no_entry: | |\n| [Rainbow](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/17204/16680) | :white_check_mark: | :no_entry: | |\n| [PPO](https://arxiv.org/pdf/1707.06347.pdf) | :white_check_mark: | :white_check_mark: | |\n| [A2C A3C](http://proceedings.mlr.press/v48/mniha16.pdf) | :white_check_mark: | :white_check_mark: | |\n| [IMPALA](https://arxiv.org/pdf/1802.01561.pdf) | :white_check_mark: | :white_check_mark: | |\n| [Deep Deterministic Policy Gradients (DDPG)](https://arxiv.org/abs/1509.02971) | :no_entry: | :white_check_mark: | |\n| [Twin Delayed Deep Deterministic Policy Gradients (TD3)](https://arxiv.org/abs/1802.09477) | :no_entry: | :white_check_mark: | |\n| [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1812.05905) | :white_check_mark: | :white_check_mark: | |\n| [Batch Constrained Q-learning (BCQ)](https://arxiv.org/abs/1812.02900) | :white_check_mark: | :white_check_mark: | :white_check_mark: |\n| [Bootstrapping Error Accumulation Reduction (BEAR)](https://arxiv.org/abs/1906.00949) | :no_entry: | :white_check_mark: | :white_check_mark: |\n| [Advantage-Weighted Regression (AWR)](https://arxiv.org/abs/1910.00177) | :white_check_mark: | :white_check_mark: | :white_check_mark: |\n| [Conservative Q-Learning (CQL)](https://arxiv.org/abs/2006.04779) | :white_check_mark: | :white_check_mark: | :white_check_mark: |\n| [Advantage Weighted Actor-Critic 
(AWAC)](https://arxiv.org/abs/2006.09359) | :no_entry: | :white_check_mark: | :white_check_mark: |\n| [Critic Regularized Regression (CRR)](https://arxiv.org/abs/2006.15134) | :no_entry: | :white_check_mark: | :white_check_mark: |\n| [Policy in Latent Action Space (PLAS)](https://arxiv.org/abs/2011.07213) | :no_entry: | :white_check_mark: | :white_check_mark: |\n| [TD3+BC](https://arxiv.org/abs/2106.06860) | :no_entry: | :white_check_mark: | :white_check_mark: |\n\n\n## examples\nSee script/ and reproductions/.\n\nRLlib examples: https://docs.ray.io/en/latest/rllib-examples.html\n\nd3rlpy examples: https://d3rlpy.readthedocs.io/en/v1.0.0/\n\n## reproductions\nSee reproductions/.\n```bash\nbash run_xx.sh ${param}\n```\n| experiment in the paper  | shell script | optional param. | description | \n|:-|:-:|:-:|:-:|\n| Sec.3 | run_split.sh  | - | dataset split/shuffle/align(for datasetB)/to tfrecord |\n| Sec.4 | run_mdp_checker.sh | recsys15/movielens/rl4rs | unzip ml-25m.zip and yoochoose-clicks.dat.zip into dataset/ |\n| Sec.5.1 | run_supervised_item.sh | dnn/widedeep/lstm/dien | Table 5. Item-wise classification |\n| Sec.5.1 | run_supervised_slate.sh | dnn_slate/widedeep_slate/lstm_slate/dien_slate/adversarial_slate | Table 5. Item-wise rank |\n| Sec.5.1 | run_supervised_slate.sh | dnn_slate_multiclass/widedeep_slate_multiclass/lstm_slate_multiclass/dien_slate_multiclass | Table 5. Slate-wise classification |\n| Sec.5.1 \u0026 Sec.6 | run_simulator_train.sh | dien | dien-based simulator for different trainsets |\n| Sec.5.1 \u0026 Sec.6 | run_simulator_eval.sh | dien | Table 6. |\n| Sec.5.1 \u0026 Sec.6 | run_modelfree_rl.sh | PG/DQN/A2C/PPO/IMPALA/DDPG/*_conti | Table 7. |\n| Sec.5.2 \u0026 Sec.6 | run_batch_rl.sh | BC/BCQ/CQL | Table 8. 
|\n| Sec.5.1 | run_exact_k.sh | - | Exact-k |\n| - | run_simulator_env_test.sh | - | examining the consistency of features (observations) between RL env and supervised simulator |\n\n\n## contributions\nAny kind of contribution to RL4RS would be highly appreciated!\nPlease contact us by email.\n\n## community\n| Channel | Link |\n|:-|:-|\n| Materials | [Google Drive](https://drive.google.com/file/d/1YbPtPyYrMvMGOuqD4oHvK0epDtEhEb9v/view?usp=sharing) |\n| Email | [Mail](asdqsczser@gmail.com) |\n| Issues | [GitHub Issues](https://github.com/fuxiAIlab/RL4RS/issues) |\n| Fuxi Team | [Fuxi HomePage](https://fuxi.163.com/en/) |\n| Our Team | [Open-project](https://fuxi-up-research.gitbook.io/open-project/) |\n\n## citation\n```\n@article{2021RL4RS,\ntitle={RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System},\nauthor={ Kai Wang and Zhene Zou and Yue Shang and Qilin Deng and Minghao Zhao and Runze Wu and Xudong Shen and Tangjie Lyu and Changjie Fan},\njournal={ArXiv},\nyear={2021},\nvolume={abs/2110.11073}\n}\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FfuxiAIlab%2FRL4RS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FfuxiAIlab%2FRL4RS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FfuxiAIlab%2FRL4RS/lists"}