https://github.com/opendilab/awesome-model-based-RL
A curated list of awesome model based RL resources (continually updated)
List: awesome-model-based-RL
- Host: GitHub
- URL: https://github.com/opendilab/awesome-model-based-RL
- Owner: opendilab
- License: apache-2.0
- Created: 2021-12-28T06:34:00.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-05-20T10:01:29.000Z (7 months ago)
- Last Synced: 2024-05-23T08:03:34.463Z (7 months ago)
- Topics: awesome, awesome-list, model-based-reinforcement-learning, model-based-rl, reinforcement-learning, reinforcement-learning-algorithms
- Homepage:
- Size: 131 KB
- Stars: 735
- Watchers: 34
- Forks: 39
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Awesome Model-Based Reinforcement Learning
[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) [![docs](https://img.shields.io/badge/docs-latest-blue)](https://github.com/opendilab/awesome-model-based-RL) ![GitHub stars](https://img.shields.io/github/stars/opendilab/awesome-model-based-RL?color=yellow) ![GitHub forks](https://img.shields.io/github/forks/opendilab/awesome-model-based-RL?color=9cf) [![GitHub license](https://img.shields.io/github/license/opendilab/awesome-model-based-RL)](https://github.com/opendilab/awesome-model-based-RL/blob/main/LICENSE)
This is a collection of research papers on **model-based reinforcement learning (MBRL)**.
The repository will be continuously updated to track the frontier of model-based RL. Welcome to follow and star!
[2024.10.27] New: We update the NeurIPS 2024 paper list of model-based rl!
[2024.05.20] We update the ICML 2024 paper list of model-based rl.
[2023.11.29] We update the ICLR 2024 paper list of model-based rl.
[2023.09.29] We update the NeurIPS 2023 paper list of model-based rl.
[2023.06.15] We update the ICML 2023 paper list of model-based rl.
[2023.02.05] We update the ICLR 2023 paper list of model-based rl.
[2022.11.03] We update the NeurIPS 2022 paper list of model-based rl.
[2022.07.06] We update the ICML 2022 paper list of model-based rl.
[2022.02.13] We update the ICLR 2022 paper list of model-based rl.
[2021.12.28] We release the awesome model-based rl.
## Table of Contents
- [Awesome Model-Based Reinforcement Learning](#awesome-model-based-reinforcement-learning)
- [Table of Contents](#table-of-contents)
- [A Taxonomy of Model-Based RL Algorithms](#a-taxonomy-of-model-based-rl-algorithms)
- [Papers](#papers)
- [Classic Model-Based RL Papers](#classic-model-based-rl-papers)
- [NeurIPS 2024🔥](#neurips-2024)
- [ICML 2024](#icml-2024)
- [ICLR 2024](#iclr-2024)
- [NeurIPS 2023](#neurips-2023)
- [ICML 2023](#icml-2023)
- [ICLR 2023](#iclr-2023)
- [NeurIPS 2022](#neurips-2022)
- [ICML 2022](#icml-2022)
- [ICLR 2022](#iclr-2022)
- [NeurIPS 2021](#neurips-2021)
- [ICLR 2021](#iclr-2021)
- [ICML 2021](#icml-2021)
- [Other](#other)
- [Tutorial](#tutorial)
- [Codebase](#codebase)
- [Contributing](#contributing)
- [License](#license)

## A Taxonomy of Model-Based RL Algorithms
We'll start this section with a disclaimer: it's really quite hard to draw an accurate, all-encompassing taxonomy of algorithms in the Model-Based RL space, because the modularity of algorithms is not well-represented by a tree structure. So we will publish a series of related blogs to explain more Model-Based RL algorithms.
A non-exhaustive, but useful taxonomy of algorithms in modern Model-Based RL.

We simply divide `Model-Based RL` into two categories: `Learn the Model` and `Given the Model`.
- `Learn the Model` mainly focuses on how to build the environment model.
- `Given the Model` cares about how to utilize the learned model.
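
The two categories can be illustrated with a toy sketch. This is not taken from any of the listed papers: the linear environment, the function names (`env_step`, `learn_model`, `plan`), and all hyperparameters are assumptions made up for illustration. `learn_model` stands in for `Learn the Model` (here, a least-squares fit of the dynamics), and `plan` stands in for `Given the Model` (here, a simple random-shooting planner that rolls candidate action sequences through the fitted model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth environment: a linear system s' = A s + B a.
A_TRUE = np.array([[1.0, 0.1], [0.0, 1.0]])
B_TRUE = np.array([[0.0], [0.1]])


def env_step(state, action):
    """True transition function (unknown to the agent)."""
    return A_TRUE @ state + B_TRUE @ action


def learn_model(num_samples=200):
    """'Learn the Model': fit s' = A s + B a by least squares on random transitions."""
    states = rng.normal(size=(num_samples, 2))
    actions = rng.normal(size=(num_samples, 1))
    next_states = np.array([env_step(s, a) for s, a in zip(states, actions)])
    inputs = np.hstack([states, actions])  # shape (N, 3)
    # Solve inputs @ weights ~= next_states, so weights stacks A^T over B^T.
    weights, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return weights[:2].T, weights[2:].T  # (A_hat, B_hat)


def plan(model, state, horizon=10, num_candidates=256):
    """'Given the Model': random shooting; return the first action of the cheapest sequence."""
    a_hat, b_hat = model
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, 1))
    costs = np.zeros(num_candidates)
    for i, sequence in enumerate(candidates):
        s = state.copy()
        for action in sequence:
            s = a_hat @ s + b_hat @ action  # imagined rollout inside the learned model
            costs[i] += np.sum(s ** 2)      # cost: squared distance to the origin
    return candidates[np.argmin(costs), 0]


model = learn_model()
state = np.array([1.0, 0.0])
for _ in range(30):
    # Replan at every step and execute only the first planned action (MPC style).
    state = env_step(state, plan(model, state))
final_distance = float(np.linalg.norm(state))
```

Real methods replace the least-squares fit with learned neural dynamics (often ensembles or latent-space models) and the random-shooting planner with stronger planners or policy learning inside the model.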
Some examples are shown in the figure above. The algorithms in the taxonomy are linked below.
>[1] [World Models](https://worldmodels.github.io/): Ha and Schmidhuber, 2018
[2] [I2A](https://arxiv.org/abs/1707.06203) (Imagination-Augmented Agents): Weber et al, 2017
[3] [MBMF](https://sites.google.com/view/mbmf) (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al, 2017
[4] [MBVE](https://arxiv.org/abs/1803.00101) (Model-Based Value Expansion): Feinberg et al, 2018
[5] [ExIt](https://arxiv.org/abs/1705.08439) (Expert Iteration): Anthony et al, 2017
[6] [AlphaZero](https://arxiv.org/abs/1712.01815): Silver et al, 2017
[7] [POPLIN](https://openreview.net/forum?id=H1exf64KwH) (Model-Based Policy Planning): Wang et al, 2019
[8] [M2AC](https://arxiv.org/abs/2010.04893) (Masked Model-based Actor-Critic): Pan et al, 2020

## Papers
```
format:
- [title](paper link) [links]
- author1, author2, and author3
- Key: key problems and insights
- OpenReview: optional
- ExpEnv: experiment environments
```

### Classic Model-Based RL Papers
- [Dyna, an integrated architecture for learning, planning, and reacting](https://dl.acm.org/doi/10.1145/122344.122377)
- Richard S. Sutton. *ACM 1991*
- Key: dyna architecture
- ExpEnv: None
- [PILCO: A Model-Based and Data-Efficient Approach to Policy Search](https://www.researchgate.net/publication/221345233_PILCO_A_Model-Based_and_Data-Efficient_Approach_to_Policy_Search)
- Marc Peter Deisenroth, Carl Edward Rasmussen. *ICML 2011*
- Key: probabilistic dynamics model
- ExpEnv: cart-pole system, robotic unicycle
- [Learning Complex Neural Network Policies with Trajectory Optimization](https://proceedings.mlr.press/v32/levine14.html)
- Sergey Levine, Vladlen Koltun. *ICML 2014*
- Key: guided policy search
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Learning Continuous Control Policies by Stochastic Value Gradients](https://arxiv.org/abs/1510.09142)
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez. *NIPS 2015*
- Key: backpropagation through paths, gradient on real trajectory
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Value Prediction Network](https://arxiv.org/abs/1707.03497)
- Junhyuk Oh, Satinder Singh, Honglak Lee. *NIPS 2017*
- Key: value-prediction model
- ExpEnv: collect domain, [atari](https://github.com/openai/gym)
- [Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion](https://arxiv.org/abs/1807.01675)
- Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee. *NIPS 2018*
- Key: ensemble model and Qnet, value expansion
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [roboschool](https://github.com/openai/roboschool)
- [Recurrent World Models Facilitate Policy Evolution](https://arxiv.org/abs/1809.01999)
- David Ha, Jürgen Schmidhuber. *NIPS 2018*
- Key: vae(representation), rnn(predictive model)
- ExpEnv: [car racing](https://github.com/openai/gym), [vizdoom](https://github.com/mwydmuch/ViZDoom)
- [Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models](https://arxiv.org/abs/1805.12114)
- Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine. *NIPS 2018*
- Key: probabilistic ensembles with trajectory sampling
- ExpEnv: [cartpole](https://github.com/openai/gym), [mujoco](https://github.com/openai/mujoco-py)
- [When to Trust Your Model: Model-Based Policy Optimization](https://arxiv.org/abs/1906.08253)
- Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine. *NeurIPS 2019*
- Key: ensemble model, sac, *k*-branched rollout
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees](https://arxiv.org/abs/1807.03858)
- Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma. *ICLR 2019*
- Key: Discrepancy Bounds Design, ME-TRPO with multi-step, Entropy regularization
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Model-Ensemble Trust-Region Policy Optimization](https://openreview.net/forum?id=SJJinbWRZ)
- Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel. *ICLR 2018*
- Key: ensemble model, TRPO
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Dream to Control: Learning Behaviors by Latent Imagination](https://arxiv.org/abs/1912.01603)
- Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. *ICLR 2020*
- Key: DreamerV1, latent space imagination
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [atari](https://github.com/openai/gym), [deepmind lab](https://github.com/deepmind/lab)
- [Exploring Model-based Planning with Policy Networks](https://openreview.net/forum?id=H1exf64KwH)
- Tingwu Wang, Jimmy Ba. *ICLR 2020*
- Key: model-based policy planning in action space and parameter space
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://arxiv.org/abs/1911.08265)
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. *Nature 2020*
- Key: MCTS, value equivalence
- ExpEnv: chess, shogi, go, [atari](https://github.com/openai/gym)

### NeurIPS 2024
- [The Surprising Ineffectiveness of Pre-Trained Visual Representations for Model-Based Reinforcement Learning](https://openreview.net/pdf?id=LvAy07mCxU)
- Moritz Schneider, Robert Krug, Narunas Vaskevicius, Luigi Palmieri, Joschka Boedecker
- Key: reinforcement learning, rl, model-based reinforcement learning, representation learning, pvr, visual representations
- ExpEnv: DMC, ManiSkill2, Miniworld
- [Multi-Agent Domain Calibration with a Handful of Offline Data](https://openreview.net/pdf?id=LvAy07mCxU)
- Tao Jiang, Lei Yuan, Lihe Li, Cong Guan, Zongzhang Zhang, Yang Yu
- Key: Multi-agent reinforcement learning, domain transfer
- ExpEnv: D4RL
- [WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment](https://arxiv.org/abs/2402.12275)
- Hao Tang, Darren Key, Kevin Ellis
- Key: learn world models as code, LLM
- ExpEnv: [sokoban](https://github.com/mpSchrader/gym-sokoban), [minigrid](https://github.com/Farama-Foundation/Minigrid), [alfworld](https://github.com/alfworld/alfworld)
- [The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning](https://arxiv.org/abs/2402.12527)
- Anya Sims, Cong Lu, Jakob Foerster, Yee Whye Teh
- Key: edge-of-reach problem, reach-aware value learning
- ExpEnv: [d4rl](https://github.com/Farama-Foundation/D4RL), [v-d4rl](https://github.com/conglu1997/v-d4rl)
- [Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning](https://arxiv.org/abs/2406.04088)
- Abdullah Akgül, Manuel Haussmann, Melih Kandemir
- Key: The paper argues that uncertainty-based reward penalization introduces excessive conservatism, potentially resulting in suboptimal policies through underestimation.
- ExpEnv: [d4rl](https://github.com/Farama-Foundation/D4RL)
- [BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning](https://arxiv.org/abs/2407.10967)
- Haohong Lin, Wenhao Ding, Jian Chen, Laixi Shi, Jiacheng Zhu, Bo Li, Ding Zhao
- Key: objective mismatch problem, capture causal representation for both states and actions
- ExpEnv: [lift](https://github.com/ARISE-Initiative/robosuite), [unlock](https://github.com/Farama-Foundation/Minigrid), [crash](https://github.com/Farama-Foundation/HighwayEnv)
- [Model-Based Transfer Learning for Contextual Reinforcement Learning](https://arxiv.org/abs/2408.04498)
- Jung-Hoon Cho, Vindula Jayawardana, Sirui Li, Cathy Wu
- Key: bayesian optimization, contextual rl
- ExpEnv: [gaussian process, traffic signal, eco-driving, advisory autonomy, control tasks]()
- [Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity](https://arxiv.org/abs/2312.17248)
- Guhao Feng, Han Zhong
- Key: rl representation complexity
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)

### ICML 2024
- [HarmonyDream: Task Harmonization Inside World Models](https://arxiv.org/abs/2310.00344)
- Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long
- Key: observation modeling and reward modeling analysis in world models
- ExpEnv: [meta-world](https://github.com/Farama-Foundation/Metaworld), [rlbench](https://github.com/stepjam/RLBench), [deepmind control suite](https://github.com/deepmind/dm_control), [atari 100k](https://github.com/openai/gym)
- [3D-VLA: A 3D Vision-Language-Action Generative World Model](https://arxiv.org/abs/2403.09631)
- Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan
- Key: unify 3D perception, reasoning, and action with a generative world model; create a large-scale 3D embodied instruction tuning dataset
- ExpEnv: [rlbench](https://github.com/stepjam/RLBench), [calvin](https://github.com/mees/calvin)
- [CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents](https://arxiv.org/abs/2310.17512)
- Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie
- Key: propose a competitive framework for LLM-based agents; build a simulated competitive environment
- ExpEnv: a virtual town with only restaurants and customers
- [Model-based Reinforcement Learning for Parameterized Action Spaces](https://arxiv.org/abs/2404.03037)
- Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris
- Key: discrete-continuous hybrid action space, dynamics model with parameterized actions, MPC with parameterized actions
- ExpEnv: [platform, goal, hard goal, catch point, hard move](https://github.com/Valarzz/Model-based-Reinforcement-Learning-for-Parameterized-Action-Spaces/tree/main/common)
- [Learning Latent Dynamic Robust Representations for World Models](https://arxiv.org/abs/2405.06263)
- Ruixiang Sun, Hongyu Zang, Xin Li, Riashat Islam
- Key: modified Dreamer architecture, hybrid-recurrent state space model
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [distracted deepmind control suite](https://github.com/bit1029public/HRSSM/tree/main/env), [mani-skill2](https://github.com/haosulab/ManiSkill2)
- [AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors](https://arxiv.org/abs/2403.09976)
- Yucen Wang, Shenghua Wan, Le Gan, Shuai Feng, De-Chuan Zhan
- Key: implicit action generator, action-conditioned separated world models
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [Hieros: Hierarchical Imagination on Structured State Space Sequence World Models](https://arxiv.org/abs/2310.05167)
- Paul Mattes, Rainer Schlosser, Ralf Herbrich
- Key: state-space models, multilayered hierarchical imagination, S5 based world model
- ExpEnv: [atari 100k](https://github.com/openai/gym)
- [Improving Token-Based World Models with Parallel Observation Prediction](https://arxiv.org/abs/2402.05643)
- Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
- Key: pixel-based mbrl, token-based world models, retentive environment model
- ExpEnv: [atari 100k](https://github.com/openai/gym)
- [Do Transformer World Models Give Better Policy Gradients?](https://arxiv.org/abs/2402.05290)
- Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
- Key: actions world model
- ExpEnv: [double-pendulum](https://github.com/openai/gym), [Myriad](https://github.com/nikihowe/myriad)
- [Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming](https://arxiv.org/abs/2402.18866)
- Hany Hamed, Subin Kim, Dongyeong Kim, Jaesik Yoon, Sungjin Ahn
- Key: during strategic dreaming, train three policies (a highway policy, an explorer policy, and an achiever policy), then use them to achieve downstream tasks
- ExpEnv: 2D Navigation, 3D-Maze Navigation, RoboKitchen
- [Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption](https://arxiv.org/abs/2402.08991)
- Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang
- Key: theoretical analysis of adversarial corruption for model-based rl, encompassing both online and offline settings
- ExpEnv: None
- [Model-based Reinforcement Learning for Confounded POMDPs](https://proceedings.mlr.press/v235/hong24d.html)
- Mao Hong, Zhengling Qi, Yanxun Xu
- Key: model-based RL, POMDP
- ExpEnv: None

### ICLR 2024
- [Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning](https://openreview.net/forum?id=m3xVPaZp6Z)
- Chengxing Jia, Chenxiao Gao, Hao Yin, Fuxiang Zhang, Xiong-Hui Chen, Tian Xu, Lei Yuan, Zongzhang Zhang, Zhi-Hua Zhou, Yang Yu
- Key: Reinforcement Learning, Model-based Reinforcement Learning, Offline Reinforcement Learning
- OpenReview: 8, 8, 8, 6
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)
- [Efficient Dynamics Modeling in Interactive Environments with Koopman Theory](https://openreview.net/forum?id=fkrYDQaHOJ)
- Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh
- Key: Koopman Theory, Reinforcement Learning, Dynamical System, Planning, Long-range dynamics prediction models, Efficient forward dynamics
- OpenReview: 8, 6, 5, 3
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Combining Spatial and Temporal Abstraction in Planning for Better Generalization](https://openreview.net/forum?id=eo9dHwtTFt)
- Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
- Key: Reinforcement Learning, Planning, Neural Networks, Temporal Difference Learning, Generalization, Deep Reinforcement Learning
- OpenReview: 6, 6, 6, 5
- ExpEnv: [MiniGrid-BabyAI framework](https://github.com/maximecb/gym-minigrid)
- [Mastering Memory Tasks with World Models](https://openreview.net/forum?id=1vDArHJ68h)
- Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar
- Key: recall to imagine module, based on DreamerV3
- OpenReview: 10, 8, 6
- ExpEnv: [bsuite](https://github.com/google-deepmind/bsuite), [popgym](https://github.com/proroklab/popgym), [atari](https://github.com/openai/gym), [deepmind control suite](https://github.com/deepmind/dm_control), [memory maze](https://github.com/jurgisp/memory-maze)
- [Privileged Sensing Scaffolds Reinforcement Learning](https://openreview.net/forum?id=EpVe8jAjdx)
- Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman
- Key: privileged information, based on DreamerV3
- OpenReview: 10, 8, 8, 8
- ExpEnv: [gymnasium robotics](https://github.com/Farama-Foundation/Gymnasium-Robotics)
- [TD-MPC2: Scalable, Robust World Models for Continuous Control](https://openreview.net/forum?id=Oxh5CstDJU)
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: implicit world model, model predictive control, generalist td-mpc2
- OpenReview: 8, 8, 8, 8
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [Meta-World](https://github.com/Farama-Foundation/Metaworld), [maniskill2](https://github.com/haosulab/ManiSkill2), [myosuite](https://github.com/MyoHub/myosuite)
- [Robust Model Based Reinforcement Learning Using L1 Adaptive Control](https://openreview.net/forum?id=GaLCLvJaoF)
- Minjun Sung, Sambhu Harimanas Karumanchi, Aditya Gahlawat, Naira Hovakimyan
- Key: L1 Adaptive Control
- OpenReview: 8, 6, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics](https://openreview.net/forum?id=TjCDNssXKU)
- Christian Gumbsch, Noor Sajid, Georg Martius, Martin V. Butz
- Key: Context-specific Recurrent State Space Model, hierarchical world model
- OpenReview: 8, 6, 6
- ExpEnv: [MiniHack](https://github.com/facebookresearch/minihack), [VisualPinPad](https://github.com/danijar/director/blob/main/embodied/envs/pinpad.py), [MultiWorld](https://github.com/vitchyr/multiworld)
- [Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion](https://arxiv.org/abs/2311.01017)
- Lunjun Zhang, Yuwen Xiong, Ze Yang, Sergio Casas, Rui Hu, Raquel Urtasun
- Key: discrete diffusion; world model; autonomous driving
- OpenReview: 10, 8, 6, 6, 6
- ExpEnv: [NuScenes](https://www.nuscenes.org/), [KITTI Odometry](https://www.cvlibs.net/datasets/kitti/eval_odometry.php), [Argoverse2 Lidar](https://www.argoverse.org/av2.html)
- [COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL](https://openreview.net/forum?id=jnFcKjtUPN)
- Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang
- Key: conservative model rollouts, optimistic environment exploration
- OpenReview: 6, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [deepmind control suite](https://github.com/deepmind/dm_control)
- [Efficient Multi-agent Reinforcement Learning by Planning](https://openreview.net/forum?id=CpnKq3UJwp)
- Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang
- Key: mcts, optimistic search lambda, advantage-weighted policy optimization
- OpenReview: 8, 6, 6, 6
- ExpEnv: [smac](https://github.com/oxwhirl/smac)
- [Differentiable Trajectory Optimization as a Policy Class for Reinforcement and Imitation Learning](https://openreview.net/forum?id=HL5P4H8eO2)
- Weikang Wan, Yufei Wang, Zackory Erickson, David Held
- Key: differentiable trajectory optimization
- OpenReview: 10, 8, 8, 5
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [robomimic](https://github.com/ARISE-Initiative/robomimic), [maniskill](https://github.com/haosulab/ManiSkill2)
- [DMBP: Diffusion model based predictor for robust offline reinforcement learning against state observation perturbations](https://openreview.net/forum?id=ZULjcYLWKe)
- Zhihe Yang, Yunjian Xu
- Key: conditional diffusion, offline RL
- OpenReview: 8, 8, 6, 6
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)
- [MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning](https://openreview.net/forum?id=1RE0H6mU7M)
- Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar
- Key: context-based meta-RL, based on dreamer
- OpenReview: 6, 6, 6, 6
- ExpEnv: [Point Robot Navigation, Escape Room](https://github.com/Rondorf/BOReL/blob/main/environments/toy_navigation/point_robot.py), [Reacher Sparse](https://github.com/deepmind/dm_control)
- [Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning](https://openreview.net/forum?id=GSBHKiw19c)
- Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
- Key: reward learning, offline RL
- OpenReview: 8, 6, 6, 6
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl), [NeoRL](https://github.com/polixir/NeoRL)
- [DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing](https://openreview.net/forum?id=GruDNzQ4ux)
- Vint Lee, Pieter Abbeel, Youngwoon Lee
- Key: learn to predict a temporally-smoothed reward rather than the exact reward at each timestep
- OpenReview: 6, 6, 6, 5
- ExpEnv: [robodesk](https://github.com/google-research/robodesk), [hand](https://github.com/openai/gym), [earthmoving](https://www.algoryx.se/agx-dynamics/)
- [Informed POMDP: Leveraging Additional Information in Model-Based RL](https://openreview.net/forum?id=5NJzNAXAmx)
- Gaspard Lambrechts, Adrien Bolland, Damien Ernst
- Key: informed world model, based on DreamerV3
- OpenReview: 6, 6, 6, 5
- ExpEnv: [varying mountain hike](https://github.com/maximilianigl/DVRL/tree/master), [deepmind control suite](https://github.com/deepmind/dm_control), [pop gym](https://github.com/proroklab/popgym), [flickering atari and flickering control](https://github.com/openai/gym)

### NeurIPS 2023
- [Large Language Models as Commonsense Knowledge for Large-Scale Task Planning](https://proceedings.neurips.cc/paper_files/paper/2023/hash/65a39213d7d0e1eb5d192aa77e77eeb7-Abstract-Conference.html)
- Zirui Zhao, Wee Sun Lee, David Hsu
- Key: LLM-MCTS
- ExpEnv: [VirtualHome]()
- [Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents](https://proceedings.neurips.cc/paper_files/paper/2023/file/6b8dfb8c0c12e6fafc6c256cb08a5ca7-Paper-Conference.pdf)
- Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian (Shawn) Ma, Yitao Liang
- Key: interactive planning approach based on LLM
- ExpEnv: [minecraft](https://github.com/minerllabs/minerl)
- [Facing Off World Model Backbones: RNNs, Transformers, and S4](https://proceedings.neurips.cc/paper_files/paper/2023/file/e6c65eb9b56719c1aa45ff73874de317-Paper-Conference.pdf)
- Fei Deng, Junyeong Park, Sungjin Ahn
- Key: world model backbones
- ExpEnv: [MiniGrid](https://github.com/maximecb/gym-minigrid), [memory maze](https://github.com/jurgisp/memory-maze)
- [Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning](https://proceedings.neurips.cc/paper_files/paper/2023/file/7ce1cbededb4b0d6202847ac1b484ee8-Paper-Conference.pdf)
- Jialong Wu, Haoyu Ma, Chaoyi Deng, Mingsheng Long
- Key: Contextualized World Models
- ExpEnv: [CARLA](https://github.com/wayveai/mile/tree/main/carla_gym), [deepmind control suite](https://github.com/deepmind/dm_control)
- [Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model](https://proceedings.neurips.cc/paper_files/paper/2023/file/fe318a2b6c699808019a456b706cd845-Paper-Conference.pdf)
- Jiankai Sun, Yiqi Jiang, Jianing Qiu, Parth Nobel, Mykel J Kochenderfer, Mac Schwager
- Key: Diffusion Dynamics Model
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl), [Maze2D](https://github.com/Farama-Foundation/D4RL/tree/master/d4rl)
- [LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios](https://openreview.net/forum?id=oIUXpBnyjv)
- Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
- Key: MCTS-style benchmark
- ExpEnv: [board games](https://github.com/opendilab/LightZero/tree/main/zoo/board_games), [atari](https://github.com/openai/gym), [mujoco](https://github.com/openai/mujoco-py), [gobigger](https://github.com/opendilab/GoBigger)
- [Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning](https://openreview.net/forum?id=fAdMly4ki5)
- Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
- Key: GPT-based diffusion model for planning and data synthesizing
- ExpEnv: [Meta-World](https://github.com/Farama-Foundation/Metaworld), [Maze2D](https://github.com/Farama-Foundation/D4RL/tree/master/d4rl)
- [MoVie: Visual Model-Based Policy Adaptation for View Generalization](https://openreview.net/forum?id=YV1MYtj2AR)
- Sizhe Yang, Yanjie Ze, Huazhe Xu
- Key: view generalization, spatial adaptive encoder
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [adroit](https://github.com/aravindr93/mjrl), [xArm](https://github.com/yangsizhe/MoVie/tree/main/src/envs/xarm_env)
- [Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms](https://openreview.net/forum?id=bUgqyyNo8j)
- Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao
- Key: model-based reparameterization policy gradient method, smoothness regularization
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning](https://openreview.net/forum?id=zDbsSscmuj)
- Lin Guan, Karthik Valmeekam, Sarath Sreedharan, Subbarao Kambhampati
- Key: construct an explicit world (domain) model in planning domain definition language
- ExpEnv: [household-robot domain](), [tyreworld and logistics]()
- [RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability](https://openreview.net/forum?id=OIJ3VXDy6s)
- Chuning Zhu, Max Simchowitz, Siri Gadipudi, Abhishek Gupta
- Key: representation resilience for visual RL
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [maniskill](https://github.com/haosulab/ManiSkill2)
- [Model-Based Control with Sparse Neural Dynamics](https://openreview.net/forum?id=ymBG2xs9Zf)
- Ziang Liu, Jeff He, Genggeng Zhou, Tobia Marcucci, Fei-Fei Li, Jiajun Wu, Yunzhu Li
- Key: network sparsification, mixed-integer formulation of ReLU neural dynamics
- ExpEnv: [gym, cartpole, reacher](https://github.com/openai/gym)
- [Optimal Exploration for Model-Based RL in Nonlinear Systems](https://openreview.net/forum?id=pJQu0zpKCS)
- Andrew Wagenmaker, Guanya Shi, Kevin Jamieson
- Key: optimal sample complexity for nonlinear dynamical systems
- ExpEnv: [affine dynamics system](https://github.com/ajwagen/nonlinear_sysid_for_control/blob/main/environments.py)
- [State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding](https://openreview.net/forum?id=xGz0wAIJrS)
- Devleena Das, Sonia Chernova, Been Kim
- Key: a joint embedding model between state-action pairs and concept-based explanations
- ExpEnv: [connect4](), [lunar lander](https://github.com/openai/gym)
- [Efficient Exploration in Continuous-time Model-based Reinforcement Learning](https://openreview.net/forum?id=VkhvDfY2dB)
- Lenart Treven, Jonas Hübotter, Bhavya, Florian Dorfler, Andreas Krause
- Key: nonlinear ordinary differential equations, regret bound, measurement selection strategies
- ExpEnv: [system's tasks]()
- [Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models](https://openreview.net/forum?id=WjlCQxpuxU)
- Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl
- Key: pretrained world models, imitation learning from observation only
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning](https://openreview.net/forum?id=WxnrX42rnS)
- Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, Gao Huang
- Key: categorical-VAE, transformer structure, DreamerV3
- ExpEnv: [atari](https://github.com/openai/gym)

### ICML 2023
- [Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels](https://arxiv.org/abs/2209.12016)
- Sai Rajeswar Mudumba, Pietro Mazzaglia, Tim Verbelen, Alexandre Piche, Bart Dhoedt, Aaron Courville, Alexandre Lacoste
- Key: unsupervised pretrain, task-aware finetune, dyna-mpc
- ExpEnv: [URLB benchmark](https://github.com/rll-research/url_benchmark), [RWRL suite](https://github.com/google-research/realworldrl_suite)
- [Reparameterized Policy Learning for Multimodal Trajectory Optimization](https://openreview.net/forum?id=5Akrk9Ln6N)
- Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
- Key: multimodal policy learning, reparameterized policy gradient
- ExpEnv: [Meta-World](https://github.com/Farama-Foundation/Metaworld), [mujoco](https://github.com/openai/mujoco-py)
- [Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy](https://arxiv.org/abs/2207.12141)
- Xiyao Wang, Wichayaporn Wongkamjan, Ruonan Jia, Furong Huang
- Key: policy-adapted model learning, weight design
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Predictable MDP Abstraction for Unsupervised Model-Based RL](https://arxiv.org/abs/2302.03921)
- Seohong Park, Sergey Levine
- Key: predictable MDP abstraction, tackle model exploitation
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Investigating the Role of Model-Based Learning in Exploration and Transfer](https://arxiv.org/abs/2302.04009)
- Jacob C Walker, Eszter Vértes, Yazhe Li, Gabriel Dulac-Arnold, Ankesh Anand, Jessica Hamrick, Theophane Weber
- Key Insights: (1) Is there an advantage to an agent being model-based during unsupervised exploration and/or fine-tuning? (2) What are the contributions of each component of a model-based agent for downstream task learning? (3) How well does the model-based agent deal with environmental shift between the unsupervised and downstream phases?
- ExpEnv: [Crafter](https://github.com/danijar/crafter), [RoboDesk](https://github.com/google-research/robodesk), [Meta-World](https://github.com/Farama-Foundation/Metaworld)
- [The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms](https://arxiv.org/abs/2303.00694)
- Anirudh Vemula, Yuda Song, Aarti Singh, J. Bagnell, Sanjiban Choudhury
- Key: objective mismatch, mbrl framework
- ExpEnv: [Helicopter, WideTree, Linear Dynamical System, Maze](https://github.com/vvanirudh/LAMPS-MBRL/tree/master), [mujoco](https://github.com/openai/mujoco-py)
- [The Benefits of Model-Based Generalization in Reinforcement Learning](https://arxiv.org/abs/2211.02222)
- Kenny Young, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
- Key: experience replay, when and how learned model generalization
- ExpEnv: [ProcMaze, ButtonGrid, PanFlute](https://github.com/kenjyoung/Model_Generalization_Code_supplement/blob/main/environments.py)
- [STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning](https://arxiv.org/abs/2301.12038)
- Souradip Chakraborty, Amrit Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha
- Key: information directed sampling, kernelized Stein discrepancy
- ExpEnv: [DeepSea](https://github.com/stratisMarkou/sample-efficient-bayesian-rl/blob/master/code/Environments.py)
- [Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators](https://openreview.net/forum?id=rDMAJECBM2)
- Paavo Parmas, Takuma Seno, Yuma Aoki
- Key: extension of Dreamer, total propagation computation graph
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [Reinforcement Learning with History Dependent Dynamic Contexts](https://openreview.net/forum?id=rdOuTlTUMX)
- Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
- Key: non-Markov context dynamics, logistic DCMDPs, theoretical analysis, extension of MuZero
- ExpEnv: [MovieLens dataset](https://www.tensorflow.org/datasets/catalog/movielens)
- [Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning](https://openreview.net/forum?id=rwLwGPdzDD)
- Yihao Sun, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, Yang Yu
- Key: pessimistic value estimation, theoretical analysis
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl), [NeoRL](https://github.com/polixir/NeoRL)
- [Simplified Temporal Consistency Reinforcement Learning](https://openreview.net/forum?id=IkhTCX9x5i)
- Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
- Key: representation learning, temporal consistency
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [Curious Replay for Model-based Adaptation](https://openreview.net/forum?id=7p7YakZP2H)
- Isaac Kauvar, Chris Doyle, Linqi Zhou, Nick Haber
- Key: extension of DreamerV3, curious replay, count-based replay, adversarial replay
- ExpEnv: [Crafter](https://github.com/danijar/crafter), [deepmind control suite](https://github.com/deepmind/dm_control)
- [On Many-Actions Policy Gradient](https://openreview.net/forum?id=HKfSTYLJh7)
- Michal Nauman, Marek Cygan
- Key: bias and variance, theoretical analysis
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [Posterior Sampling for Deep Reinforcement Learning](https://openreview.net/forum?id=ZwjSECgl6p)
- Remo Sasso, Michelangelo Conserva, Paulo Rauber
- Key: posterior sampling, continual value network
- ExpEnv: [atari](https://github.com/openai/gym)
- [Model-based Offline Reinforcement Learning with Count-based Conservatism](https://openreview.net/forum?id=T5VlejGx7f)
- Byeongchan Kim, Min-hwan Oh
- Key: count estimation, theoretical analysis
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)

### ICLR 2023
- [Transformers are Sample-Efficient World Models](https://openreview.net/forum?id=vhFu1Acb0xb)
- Vincent Micheli, Eloi Alonso, François Fleuret
- Key: discrete autoencoder, transformer based world model
- OpenReview: 8, 8, 8, 8
- ExpEnv: [atari](https://github.com/openai/gym)
- [Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization](https://openreview.net/forum?id=dNqxZgyjcYA)
- Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
- Key: model-based offline, bayesian posterior value estimate
- OpenReview: 8, 8, 6, 6
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)
- [User-Interactive Offline Reinforcement Learning](https://openreview.net/forum?id=a4COps0uokg)
- Phillip Swazinna, Steffen Udluft, Thomas Runkler
- Key: let the user adapt the policy behavior after training is finished
- OpenReview: 10, 8, 6, 3
- ExpEnv: [2d-world](), [industrial benchmark](https://github.com/siemens/industrialbenchmark/tree/offline_datasets/datasets)
- [CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning](https://openreview.net/forum?id=5aT4ganOd98)
- Sheng Yue, Guanbo Wang, Wei Shao, Zhaofeng Zhang, Sen Lin, Ju Ren, Junshan Zhang
- Key: offline IRL, reward extrapolation error
- OpenReview: 8, 8, 6, 6
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)
- [Efficient Offline Policy Optimization with a Learned Model](https://openreview.net/forum?id=Yt-yM-JbYFO)
- Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng YAN, Zhongwen Xu
- Key: offline rl, analysis of MuZero Unplugged, one-step look-ahead policy improvement
- OpenReview: 8, 6, 5
- ExpEnv: [atari dataset](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged)
- [Efficient Planning in a Compact Latent Action Space](https://openreview.net/forum?id=cA77NrVEuqn)
- Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
- Key: planning with VQ-VAE
- OpenReview: 6, 6, 6, 6
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function](https://openreview.net/forum?id=hNyJBk3CwR)
- Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang
- Key: lipschitz regularization
- OpenReview: 8, 8, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations](https://openreview.net/forum?id=JdTnc9gjVfJ)
- Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran
- Key: three phases -- policy pretraining, targeted exploration, interactive learning
- OpenReview: 8, 6, 6, 6
- ExpEnv: [adroit](https://github.com/aravindr93/mjrl), [meta-world](https://github.com/rlworkgroup/metaworld), [deepmind control suite](https://github.com/deepmind/dm_control)
- [Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective](https://openreview.net/forum?id=MQcmfgRxf7a)
- Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov
- Key: Aligned Latent Models
- OpenReview: 8, 6, 6, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning](https://openreview.net/forum?id=H4Ncs5jhTCu)
- Daniel Palenicek, Michael Lutter, Joao Carvalho, Jan Peters
- Key: longer horizons yield diminishing returns in terms of sample efficiency
- OpenReview: 8, 6, 6, 6
- ExpEnv: [brax](https://github.com/google/brax)
- [Planning Goals for Exploration](https://openreview.net/forum?id=6qeBuZSo7Pr)
- Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
- Key: sampling-based planning, set goals for each training episode to directly optimize an intrinsic exploration reward
- OpenReview: 8, 8, 8, 8, 6
- ExpEnv: [point maze](), [walker](https://github.com/deepmind/dm_control), [ant maze, 3-block stack](https://github.com/spitis/mrl/tree/master/envs)
- [Making Better Decision by Directly Planning in Continuous Control](https://openreview.net/forum?id=r8Mu7idxyF)
- Jinhua Zhu, Yue Wang, Lijun Wu, Tao Qin, Wengang Zhou, Tie-Yan Liu, Houqiang Li
- Key: deep differentiable dynamic programming planner
- OpenReview: 8, 8, 8, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Latent Variable Representation for Reinforcement Learning](https://openreview.net/forum?id=mQpmZVzXK1h)
- Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai
- Key: variational learning, representation learning
- OpenReview: 8, 6, 6, 3
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [deepmind control suite](https://github.com/deepmind/dm_control)
- [SpeedyZero: Mastering Atari with Limited Data and Time](https://openreview.net/forum?id=Mg5CLXZgvLJ)
- Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
- Key: distributed model-based rl, speed up EfficientZero
- OpenReview: 6, 6, 5
- ExpEnv: [atari 100k](https://github.com/openai/gym)
- [Transformer-based World Models Are Happy With 100k Interactions](https://openreview.net/forum?id=TdBaDGCpjly)
- Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling
- Key: autoregressive world model, Transformer-XL, balanced cross-entropy loss, balanced dataset sampling
- OpenReview: 8, 6, 6, 6
- ExpEnv: [atari 100k](https://github.com/openai/gym)
- [On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning](https://openreview.net/forum?id=KB1sc5pNKFv)
- Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
- Key: offline multi-task pretraining, online finetuning
- OpenReview: 6, 6, 6, 6
- ExpEnv: [atari 100k](https://github.com/openai/gym)
- [Become a Proficient Player with Limited Data through Watching Pure Videos](https://openreview.net/forum?id=Sy-o2N0hF4f)
- Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao
- Key: unsupervised pre-training, finetune with down-stream tasks
- OpenReview: 8, 6, 6, 5
- ExpEnv: [atari 100k](https://github.com/openai/gym)
- [EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model](https://openreview.net/forum?id=xQAjSr64PTc)
- Yifu Yuan, Jianye HAO, Fei Ni, Yao Mu, YAN ZHENG, Yujing Hu, Jinyi Liu, Yingfeng Chen, Changjie Fan
- Key: jointly pretrain the multi-headed dynamics model and unsupervised exploration policy, finetune to downstream tasks
- OpenReview: 6, 6, 6, 6
- ExpEnv: [URLB benchmark](https://github.com/rll-research/url_benchmark)
- [Choreographer: Learning and Adapting Skills in Imagination](https://openreview.net/forum?id=PhkWyijGi5b)
- Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Alexandre Lacoste, Sai Rajeswar
- Key: world model, skill discovery, skill learning, skill adaptation
- OpenReview: 8, 8, 6, 6
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [Meta-World](https://github.com/Farama-Foundation/Metaworld)

### NeurIPS 2022
- [Bidirectional Learning for Offline Infinite-width Model-based Optimization](https://openreview.net/forum?id=_j8yVIyp27Q)
- Can Chen, Yingxue Zhang, Jie Fu, Xue Liu, Mark Coates
- Key: model-based, offline
- OpenReview: 7, 6, 5
- ExpEnv: [design-bench](https://github.com/rail-berkeley/design-bench)
- [A Unified Framework for Alternating Offline Model Training and Policy Learning](https://openreview.net/forum?id=5yjM1sQ1uKZ)
- Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
- Key: model-based, offline, marginal importance weight
- OpenReview: 7, 6, 6, 5
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](https://openreview.net/forum?id=oDWyVsHBzNT)
- Kaiyang Guo, Shao Yunfeng, Yanhui Geng
- Key: model-based, offline
- OpenReview: 8, 8, 7, 7
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination](https://openreview.net/forum?id=3e3IQMLDSLP)
- Jiafei Lyu, Xiu Li, Zongqing Lu
- Key: double check mechanism, bidirectional modeling, offline RL
- OpenReview: 7, 6, 6
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Model-Based Opponent Modeling](https://arxiv.org/abs/2108.01843)
- XiaoPeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu
- Key: multi-agent, model-based
- OpenReview: 7, 6, 4, 3
- ExpEnv: [mpe](https://github.com/openai/multiagent-particle-envs), [google research football](https://github.com/google-research/football)
- [Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2204.09418)
- Zhiwei Xu, Dapeng Li, Bin Zhang, Yuan Zhan, Yunpeng Bai, Guoliang Fan
- Key: multi-agent, model-based
- OpenReview: 6, 5
- ExpEnv: [StarCraft II](https://github.com/deepmind/pysc2), [Google Research Football](https://github.com/google-research/football), [Multi-Agent Discrete MuJoCo](https://github.com/schroederdewitt/multiagent_mujoco)
- [MoCoDA: Model-based Counterfactual Data Augmentation](https://openreview.net/forum?id=w6tBOjPCrIO)
- Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg
- Key: data augmentation framework, offline RL
- OpenReview: 7, 7, 7, 6
- ExpEnv: [2D Navigation](https://github.com/spitis/mocoda/blob/main/augment_offline_toy.py#L45), [Hook-Sweep](https://github.com/spitis/mrl/blob/master/envs/customfetch/custom_fetch.py#L1699)
- [When to Update Your Model: Constrained Model-based Reinforcement Learning](https://openreview.net/forum?id=9a1oV7UunyP)
- Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, Wenbing Huang
- Key: event-triggered mechanism, constrained model-shift lower-bound optimization
- OpenReview: 6, 6, 5, 5
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm](https://openreview.net/forum?id=hYa_lseXK8)
- Ashish Jayant, Shalabh Bhatnagar
- Key: constrained RL, model-based
- OpenReview: 7, 6, 5, 5
- ExpEnv: [safety gym](https://github.com/openai/safety-gym)
- [Learning to Attack Federated Learning: A Model-based Reinforcement Learning Attack Framework](https://openreview.net/forum?id=4OHRr7gmhd4)
- Henger Li, Xiaolin Sun, Zizhan Zheng
- Key: attack & defense, federated learning, model-based
- OpenReview: 6, 6, 6, 5
- ExpEnv: MNIST, FashionMNIST, EMNIST, CIFAR-10 and synthetic dataset
- [Model-Based Imitation Learning for Urban Driving](https://openreview.net/forum?id=Zk1SbbdZwS)
- Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zachary Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton
- Key: model-based, imitation learning, autonomous driving
- OpenReview: 7, 6, 6
- ExpEnv: [CARLA](https://github.com/wayveai/mile/tree/main/carla_gym)
- [Data-Driven Model-Based Optimization via Invariant Representation Learning](https://openreview.net/forum?id=gKe_A-DxzkH)
- Han Qi, Yi Su, Aviral Kumar, Sergey Levine
- Key: domain adaptation, invariant objective models, representation learning (not about model-based RL)
- OpenReview: 7, 6, 6, 5, 5
- ExpEnv: [design-bench](https://github.com/rail-berkeley/design-bench)
- [Model-based Lifelong Reinforcement Learning with Bayesian Exploration](https://openreview.net/forum?id=6I3zJn9Slsb)
- Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris
- Key: lifelong RL, variational bayesian
- OpenReview: 7, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [meta-world](https://github.com/rlworkgroup/metaworld)
- [Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning](https://openreview.net/forum?id=L9YayWPcHA_)
- Zifan Wu, Chao Yu, Chen Chen, Jianye Hao, Hankz Hankui Zhuo
- Key: treat the model rollout process as a sequential decision making problem
- OpenReview: 7, 7, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [d4rl](https://github.com/rail-berkeley/d4rl)
- [Joint Model-Policy Optimization of a Lower Bound for Model-Based RL](https://openreview.net/forum?id=LYfFj-Vk6lt)
- Benjamin Eysenbach, Alexander Khazatsky, Sergey Levine, Russ Salakhutdinov
- Key: unified objective for model-based RL
- OpenReview: 8, 8, 7, 6
- ExpEnv: [gridworld](https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/gridworld.py), [mujoco](https://github.com/openai/mujoco-py), [ROBEL manipulation](https://github.com/google-research/robel)
- [RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning](https://openreview.net/forum?id=nrksGSRT7kX)
- Marc Rigter, Bruno Lacerda, Nick Hawes
- Key: offline rl, model-based rl, two-player game, adversarial model training
- OpenReview: 6, 6, 6, 4
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)
- [Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning](https://openreview.net/forum?id=xL7B5axplIe)
- Shenao Zhang
- Key: posterior sampling RL, referential update, constrained conservative update
- OpenReview: 7, 7, 5, 5
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [N-Chain MDPs](https://github.com/stratisMarkou/sample-efficient-bayesian-rl/blob/master/code/Environments.py)
- [Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning](https://openreview.net/forum?id=GdHVClGh9N)
- Chenyang Wu, Tianci Li, Zongzhang Zhang, Yang Yu
- Key: optimism in the face of uncertainty (OFU), BOO Regret
- OpenReview: 6, 6, 5
- ExpEnv: [RiverSwim, Chain, Random MDPs]()
- [Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity](https://openreview.net/forum?id=bEMrmaw8gOB)
- Alekh Agarwal, Tong Zhang
- Key: posterior sampling RL, Bellman error decoupling framework
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
- [Exponential Family Model-Based Reinforcement Learning via Score Matching](https://openreview.net/forum?id=G1uywu6vNZe)
- Gene Li, Junbo Li, Nathan Srebro, Zhaoran Wang, Zhuoran Yang
- Key: optimistic model-based, score matching
- OpenReview: 7, 7, 6
- ExpEnv: None
- [Deep Hierarchical Planning from Pixels](https://openreview.net/forum?id=wZk69kjy9_d)
- Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
- Key: hierarchical RL, long-horizon and sparse reward tasks
- OpenReview: 6, 6, 5
- ExpEnv: [atari](https://github.com/openai/gym), [deepmind control suite](https://github.com/deepmind/dm_control), [deepmind lab](https://github.com/deepmind/lab), [crafter](https://github.com/danijar/crafter)
- [Continuous MDP Homomorphisms and Homomorphic Policy Gradient](https://arxiv.org/abs/2209.07364)
- Sahand Rezaei-Shoshtari, Rosie Zhao, Prakash Panangaden, David Meger, Doina Precup
- Key: Homomorphic Policy Gradient, Continuous MDP Homomorphisms, Lax Bisimulation Loss
- OpenReview: 7, 7, 7
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)

### ICML 2022
- [DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations](https://arxiv.org/abs/2110.14565)
- Fei Deng, Ingook Jang, Sungjin Ahn
- Key: dreamer, prototypes
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [Denoised MDPs: Learning World Models Better Than the World Itself](https://arxiv.org/pdf/2206.15477.pdf)
- Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
- Key: representation learning, denoised model
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [RoboDesk](https://github.com/SsnL/robodesk)
- [Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search](https://arxiv.org/pdf/2102.08291.pdf)
- Qi Wang, Herke van Hoof
- Key: graph structured surrogate model, meta training
- ExpEnv: [atari, mujoco](https://github.com/openai/gym)
- [Towards Adaptive Model-Based Reinforcement Learning](https://arxiv.org/pdf/2204.11464.pdf)
- Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen
- Key: local change adaptation
- ExpEnv: [GridWorldLoCA, ReacherLoCA, MountaincarLoCA](https://github.com/chandar-lab/LoCA2)
- [Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation](https://arxiv.org/pdf/2203.07322.pdf)
- Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause
- Key: model-based multi-agent, confidence bound
- ExpEnv: [SMARTS](https://github.com/huawei-noah/SMARTS)
- [Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning](https://arxiv.org/pdf/2206.07166.pdf)
- Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
- Key: offline rl, model-based rl, stationary distribution regularization
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)
- [Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization](https://arxiv.org/pdf/2202.08450.pdf)
- Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine
- Key: benchmark, offline MBO
- ExpEnv: [Design-Bench Benchmark Tasks](https://github.com/rail-berkeley/design-bench)
- [Temporal Difference Learning for Model Predictive Control](https://arxiv.org/pdf/2203.04955.pdf)
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: td-learning, MPC
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [Meta-World](https://github.com/rlworkgroup/metaworld)

### ICLR 2022
- [Revisiting Design Choices in Offline Model Based Reinforcement Learning](https://openreview.net/forum?id=zz9hXVhf40)
- Cong Lu, Philip Ball, Jack Parker-Holder, Michael Osborne, Stephen J. Roberts
- Key: model-based offline, uncertainty quantification
- OpenReview: 8, 8, 6, 6, 6
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Value Gradient weighted Model-Based Reinforcement Learning](https://openreview.net/forum?id=4-D6CZkRXxI)
- Claas A Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand
- Key: Value-Gradient weighted Model loss
- OpenReview: 8, 8, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Planning in Stochastic Environments with a Learned Model](https://openreview.net/forum?id=X6D9bAHhBQ1)
- Ioannis Antonoglou, Julian Schrittwieser, Sherjil Ozair, Thomas K Hubert, David Silver
- Key: MCTS, stochastic MuZero
- OpenReview: 10, 8, 8, 5
- ExpEnv: 2048 game, Backgammon, Go
- [Policy improvement by planning with Gumbel](https://openreview.net/forum?id=bERaNdoegnO)
- Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
- Key: Gumbel AlphaZero, Gumbel MuZero
- OpenReview: 8, 8, 8, 6
- ExpEnv: go, chess, [atari](https://github.com/openai/gym)
- [Model-Based Offline Meta-Reinforcement Learning with Regularization](https://openreview.net/forum?id=EBn0uInJZWh)
- Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang
- Key: model-based offline Meta-RL
- OpenReview: 8, 6, 6, 6
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [On-Policy Model Errors in Reinforcement Learning](https://openreview.net/forum?id=81e1aeOt-sd)
- Lukas Froehlich, Maksym Lefarov, Melanie Zeilinger, Felix Berkenkamp
- Key: model errors, on-policy corrections
- OpenReview: 8, 6, 6, 5
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [pybullet](https://github.com/benelot/pybullet-gym)
- [A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning](https://openreview.net/forum?id=YRq0ZUnzKoZ)
- Jiaxian Guo, Mingming Gong, Dacheng Tao
- Key: relational intervention, dynamics generalization
- OpenReview: 8, 8, 6, 6
- ExpEnv: [Pendulum](https://github.com/openai/gym), [mujoco](https://github.com/openai/mujoco-py)
- [Information Prioritization through Empowerment in Visual Model-based RL](https://openreview.net/forum?id=DfUjyyRW90)
- Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine
- Key: mutual information, visual model-based RL
- OpenReview: 8, 8, 8, 6
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [Kinetics dataset](https://github.com/cvdfoundation/kinetics-dataset)
- [Transfer RL across Observation Feature Spaces via Model-Based Regularization](https://openreview.net/forum?id=7KdAoOsI81C)
- Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew E Cohen, Furong Huang
- Key: latent dynamics model, transfer RL
- OpenReview: 8, 6, 5, 5
- ExpEnv: [CartPole, Acrobot and Cheetah-Run](https://github.com/openai/gym), [mujoco](https://github.com/openai/mujoco-py), [3DBall](https://github.com/Unity-Technologies/ml-agents)
- [Learning State Representations via Retracing in Reinforcement Learning](https://openreview.net/forum?id=CLpxpXqqBV)
- Changmin Yu, Dong Li, Jianye HAO, Jun Wang, Neil Burgess
- Key: representation learning, learning via retracing
- OpenReview: 8, 6, 5, 3
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [Model-augmented Prioritized Experience Replay](https://openreview.net/forum?id=WuEiafqdy9H)
- Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
- Key: prioritized experience replay, mbrl
- OpenReview: 8, 8, 6, 5
- ExpEnv: [pybullet](https://github.com/benelot/pybullet-gym)
- [Evaluating Model-Based Planning and Planner Amortization for Continuous Control](https://openreview.net/forum?id=SS8F6tFX3-)
- Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller
- Key: model predictive control
- OpenReview: 8, 6, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Gradient Information Matters in Policy Optimization by Back-propagating through Model](https://openreview.net/forum?id=rzvOQrnclO0)
- Chongchong Li, Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
- Key: two-model-based method, analyze model error and policy gradient
- OpenReview: 8, 8, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Pareto Policy Pool for Model-based Offline Reinforcement Learning](https://openreview.net/forum?id=OqcZu8JIIzS)
- Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, Yuhui Shi
- Key: model-based offline, model return-uncertainty trade-off
- OpenReview: 8, 8, 6, 5
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage](https://openreview.net/forum?id=tyrJsbKAe6)
- Masatoshi Uehara, Wen Sun
- Key: model-based offline theory, PAC bounds
- OpenReview: 8, 6, 6, 5
- ExpEnv: None
- [Know Thyself: Transferable Visual Control Policies Through Robot-Awareness](https://openreview.net/forum?id=o0ehFykKVtr)
- Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman
- Key: world models that transfer to new robots
- OpenReview: 8, 6, 6, 5
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), WidowX and Franka Panda robot

### NeurIPS 2021
- [On Effective Scheduling of Model-based Reinforcement Learning](https://arxiv.org/abs/2111.08550)
- Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xing Zhang, Ruiming Tang, Yong Yu, Zhenguo Li
- Key: extension of mbpo, hyper-controller learning
- OpenReview: 8, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py), [pybullet](https://github.com/benelot/pybullet-gym)
- [COMBO: Conservative Offline Model-Based Policy Optimization](https://openreview.net/pdf?id=dUEpGV2mhf)
- Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
- Key: offline reinforcement learning, model-based reinforcement learning, deep reinforcement learning
- OpenReview: 6, 7, 6, 8
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Safe Reinforcement Learning by Imagining the Near Future](https://arxiv.org/abs/2202.07789)
- Garrett Thomas, Yuping Luo, Tengyu Ma
- Key: safe rl, reward penalty, theory about model-based rollouts
- OpenReview: 8, 6, 6
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Model-Based Reinforcement Learning via Imagination with Derived Memory](https://openreview.net/forum?id=jeATherHHGj)
- Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Eben Li, Chongjie Zhang, Jianye HAO
- Key: extension of dreamer, prediction-reliability weight
- OpenReview: 6, 6, 6, 6
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [MobILE: Model-Based Imitation Learning From Observation Alone](https://arxiv.org/abs/2102.10769)
- Rahul Kidambi, Jonathan Chang, Wen Sun
- Key: imitation learning from observations alone, mbrl
- OpenReview: 6, 6, 6, 4
- ExpEnv: [cartpole](https://github.com/openai/gym), [mujoco](https://github.com/openai/mujoco-py)
- [Model-Based Episodic Memory Induces Dynamic Hybrid Controls](https://arxiv.org/abs/2111.02104)
- Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh
- Key: model-based, episodic control
- OpenReview: 7, 7, 6, 6
- ExpEnv: [2D maze navigation](https://github.com/MattChanTK/gym-maze), [cartpole, mountainCar and lunarlander](https://github.com/openai/gym), [atari](https://gym.openai.com/envs/atari), [3D navigation: gym-miniworld](https://github.com/maximecb/gym-miniworld)
- [A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning](https://arxiv.org/abs/2106.02097)
- Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
- Key: mbrl, set representation
- OpenReview: 7, 7, 7, 6
- ExpEnv: [MiniGrid-BabyAI framework](https://github.com/maximecb/gym-minigrid)
- [Mastering Atari Games with Limited Data](https://openreview.net/forum?id=OKrNPg3xR3T)
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Key: muzero, self-supervised consistency loss
- OpenReview: 7, 7, 7, 5
- ExpEnv: [atari 100k](https://github.com/openai/gym), [deepmind control suite](https://github.com/deepmind/dm_control)
- [Online and Offline Reinforcement Learning by Planning with a Learned Model](https://openreview.net/forum?id=HKtsGW-lNbw)
- Julian Schrittwieser, Thomas K Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
- Key: muzero, reanalyse, offline
- OpenReview: 8, 8, 7, 6
- ExpEnv: [atari dataset, deepmind control suite dataset](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged)
- [Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
- Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
- Key: new approach to model learning
- OpenReview: 7, 7, 7, 6
- ExpEnv: tabular MDP, Sokoban, [atari](https://github.com/openai/gym)
- [Proper Value Equivalence](https://arxiv.org/abs/2106.10316)
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, Satinder Singh
- Key: value equivalence, value-based planning, muzero
- OpenReview: 8, 7, 7, 6
- ExpEnv: [four rooms](https://github.com/maximecb/gym-minigrid), [atari](https://github.com/openai/gym)
- [MOPO: Model-based Offline Policy Optimization](https://arxiv.org/abs/2005.13239)
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
- Key: model-based, offline
- OpenReview: None
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl), halfcheetah-jump and ant-angle
- [RoMA: Robust Model Adaptation for Offline Model-based Optimization](https://arxiv.org/abs/2110.14188)
- Sihyun Yu, Sungsoo Ahn, Le Song, Jinwoo Shin
- Key: model-based, offline
- OpenReview: 7, 6, 6
- ExpEnv: [design-bench](https://github.com/brandontrabucco/design-bench)
- [Offline Reinforcement Learning with Reverse Model-based Imagination](https://arxiv.org/abs/2110.00188)
- Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
- Key: model-based, offline
- OpenReview: 7, 6, 6, 5
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Offline Model-based Adaptable Policy Learning](https://openreview.net/forum?id=lrdXc17jm6)
- Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye
- Key: model-based, offline
- OpenReview: 6, 6, 6, 4
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Weighted model estimation for offline model-based reinforcement learning](https://openreview.net/pdf?id=zdC5eXljMPy)
- Toru Hishinuma, Kei Senda
- Key: model-based, offline, off-policy evaluation
- OpenReview: 7, 6, 6, 6
- ExpEnv: pendulum, [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation](https://arxiv.org/abs/2110.06394)
- Weitong Zhang, Dongruo Zhou, Quanquan Gu
- Key: learning theory, model-based reward-free RL, linear function approximation
- OpenReview: 6, 6, 5, 5
- ExpEnv: None
- [Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature](https://arxiv.org/abs/2102.04168)
- Kefan Dong, Jiaqi Yang, Tengyu Ma
- Key: learning theory, model-based bandit RL, nonlinear function approximation
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
- [Discovering and Achieving Goals via World Models](https://openreview.net/forum?id=6vWuYzkp8d)
- Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak
- Key: unsupervised goal reaching, goal-conditioned RL
- OpenReview: 6, 6, 6, 6, 6
- ExpEnv: [walker, quadruped, bins, kitchen](https://github.com/orybkin/lexa-benchmark)

### ICLR 2021
- [Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization](https://arxiv.org/abs/2006.03647)
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
- Key: model-based, behavior cloning (warmup), trpo
- OpenReview: 8, 7, 7, 5
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Control-Aware Representations for Model-based Reinforcement Learning](https://arxiv.org/abs/2006.13408)
- Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
- Key: representation learning, model-based soft actor-critic
- OpenReview: 6, 6, 6
- ExpEnv: planar system, inverted pendulum – swingup, cartpole, 3-link manipulator – swingUp & balance
- [Mastering Atari with Discrete World Models](https://arxiv.org/abs/2010.02193)
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
- Key: DreamerV2, many tricks (multiple categorical variables, KL balancing, etc.)
- OpenReview: 9, 8, 5, 4
- ExpEnv: [atari](https://github.com/openai/gym)
- [Model-Based Visual Planning with Self-Supervised Functional Distances](https://openreview.net/forum?id=UcoXdfrORC)
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
- Key: goal-reaching task, dynamics learning, distance learning (goal-conditioned Q-function)
- OpenReview: 7, 7, 7, 7
- ExpEnv: [sawyer](https://github.com/rlworkgroup/metaworld/tree/master/metaworld/envs), door sliding
- [Model-Based Offline Planning](https://arxiv.org/abs/2008.05556)
- Arthur Argenson, Gabriel Dulac-Arnold
- Key: model-based, offline
- OpenReview: 8, 7, 5, 5
- ExpEnv: [RL Unplugged(RLU)](https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged), [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation](https://arxiv.org/abs/2102.07970)
- Justin Fu, Sergey Levine
- Key: model-based, offline
- OpenReview: 8, 6, 6
- ExpEnv: [design-bench](https://github.com/brandontrabucco/design-bench)
- [On the role of planning in model-based deep reinforcement learning](https://arxiv.org/abs/2011.04021)
- Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber
- Key: discussion about planning in MuZero
- OpenReview: 7, 7, 6, 5
- ExpEnv: [atari](https://github.com/openai/gym), go, [deepmind control suite](https://github.com/deepmind/dm_control)
- [Representation Balancing Offline Model-based Reinforcement Learning](https://openreview.net/forum?id=QpNz8r_Ri2Y)
- Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim
- Key: Representation Balancing MDP, model-based, offline
- OpenReview: 7, 7, 7, 6
- ExpEnv: [d4rl dataset](https://github.com/rail-berkeley/d4rl)
- [Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?](https://openreview.net/forum?id=p5uylG94S68)
- Balázs Kégl, Gabriel Hurtado, Albert Thomas
- Key: mixture density nets, heteroscedasticity
- OpenReview: 7, 7, 7, 6, 5
- ExpEnv: [acrobot system](https://github.com/openai/gym)

### ICML 2021
- [Conservative Objective Models for Effective Offline Model-Based Optimization](https://arxiv.org/abs/2107.06882)
- Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
- Key: conservative objective model, offline mbrl
- ExpEnv: [design-bench](https://github.com/brandontrabucco/design-bench)
- [Continuous-Time Model-Based Reinforcement Learning](https://arxiv.org/abs/2102.04764)
- Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
- Key: continuous-time
- ExpEnv: [pendulum, cartPole and acrobot](https://github.com/openai/gym)
- [Model-Based Reinforcement Learning via Latent-Space Collocation](https://arxiv.org/abs/2106.13229)
- Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
- Key: latent space collocation
- ExpEnv: [sparse metaworld tasks](https://github.com/rlworkgroup/metaworld/tree/master/metaworld/envs)
- [Model-Free and Model-Based Policy Evaluation when Causality is Uncertain](http://proceedings.mlr.press/v139/bruns-smith21a.html)
- David A Bruns-Smith
- Key: worst-case bounds
- ExpEnv: [ope-tools](https://github.com/clvoloshin/COBS)
- [Muesli: Combining Improvements in Policy Optimization](https://arxiv.org/abs/2104.06159)
- Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
- Key: value equivalence
- ExpEnv: [atari](https://github.com/openai/gym)
- [Vector Quantized Models for Planning](https://arxiv.org/pdf/2106.04615.pdf)
- Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
- Key: VQVAE, MCTS
- ExpEnv: [chess datasets](https://www.ficsgames.org/download.html), [DeepMind Lab](https://github.com/deepmind/lab)
- [PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration](https://arxiv.org/abs/2107.07410)
- Yuda Song, Wen Sun
- Key: sample complexity, kernelized nonlinear regulators, linear MDPs
- ExpEnv: [mountain car, antmaze](https://github.com/openai/gym), [mujoco](https://github.com/openai/mujoco-py)
- [Temporal Predictive Coding For Model-Based Planning In Latent Space](https://arxiv.org/abs/2106.07156)
- Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon
- Key: temporal predictive coding with a RSSM, latent space
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control)
- [Model-based Reinforcement Learning for Continuous Control with Posterior Sampling](https://arxiv.org/abs/2012.09613)
- Ying Fan, Yifei Ming
- Key: regret bound of psrl, mpc
- ExpEnv: [continuous cartpole, pendulum swingup](https://github.com/openai/gym), [mujoco](https://github.com/openai/mujoco-py)
- [A Sharp Analysis of Model-based Reinforcement Learning with Self-Play](https://arxiv.org/abs/2010.01604)
- Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
- Key: learning theory, multi-agent, model-based self play, two-player zero-sum Markov games
- ExpEnv: None

### Other
- [Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving](https://openaccess.thecvf.com/content/CVPR2024/html/Wang_Driving_into_the_Future_Multiview_Visual_Forecasting_and_Planning_with_CVPR_2024_paper.html)
- Yuqi Wang, Jiawei He, Lue Fan, Hongxin Li, Yuntao Chen, Zhaoxiang Zhang *CVPR 2024*
- Key: AutoDrive world modeling
- ExpEnv: [nuScenes]()
- [DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving](https://openreview.net/pdf?id=tT3LUdmzbd)
- Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, Liping Jing, Yiming Nie, Bin Dai *CVPR 2024*
- Key: AutoDrive world modeling
- ExpEnv: [nuScenes](), [OpenScene]()
- [Masked Trajectory Models for Prediction, Representation, and Control](https://openreview.net/pdf?id=tT3LUdmzbd)
- Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran *ICLR 2023 Workshop RRL*
- Key: offline RL, learning for control, sequence modeling
- ExpEnv: [d4rl](https://github.com/rail-berkeley/d4rl)
- [World Models via Policy-Guided Trajectory Diffusion](https://arxiv.org/abs/2312.08533)
- Marc Rigter, Jun Yamada, Ingmar Posner *Arxiv 2023*
- Key: Diffusion model, world model
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [gridworld](https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/gridworld.py)
- [Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization](https://arxiv.org/abs/2312.04386)
- Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters *Arxiv 2023*
- Key: cumulative rewards uncertainty estimation in MBRL
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)
- [Sample-Efficient Learning to Solve a Real-World Labyrinth Game Using Data-Augmented Model-Based Reinforcement Learning](https://arxiv.org/abs/2312.09906)
- Thomas Bi, Raffaello D'Andrea. *Arxiv 2023*
- Key: Data-Augmented, DreamerV3
- ExpEnv: [Real-World Labyrinth Game]()
- [Mastering Diverse Domains through World Models](https://arxiv.org/abs/2301.04104)
- Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap. *Arxiv 2023*
- Key: DreamerV3, scaling property to world model
- ExpEnv: [deepmind control suite](https://github.com/deepmind/dm_control), [atari](https://github.com/openai/gym), [DMLab](https://github.com/deepmind/lab), [minecraft](https://github.com/minerllabs/minerl)
- [Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning](https://arxiv.org/abs/2307.12933)
- Chuming Li, Ruonan Jia, Jiawei Yao, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang. *IJCAI Workshop 2023*
- Key: extended policy improvement, model regularization, planning theorem
- ExpEnv: [mujoco](https://github.com/openai/mujoco-py)

## Tutorial
- [Video] [Csaba Szepesvári - The challenges of model-based reinforcement learning and how to overcome them](https://www.youtube.com/watch?v=-Y-fHsPIQ_Q)
- [Blog] [Model-Based Reinforcement Learning: Theory and Practice](https://bair.berkeley.edu/blog/2019/12/12/mbpo/)

## Codebase
- [mbrl-lib](https://github.com/facebookresearch/mbrl-lib) - Meta: Library for Model Based RL
- [DI-engine](https://github.com/opendilab/DI-engine) - OpenDILab: Decision AI Engine

## Contributing
Our purpose is to make this repo even better. If you are interested in contributing, please refer to [HERE](CONTRIBUTING.md) for contribution instructions.

## License
Awesome Model-Based RL is released under the Apache 2.0 license.