https://github.com/leofan90/awesome-world-models

A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related websites.
https://github.com/leofan90/awesome-world-models
List: awesome-world-models
artificial-intelligence autonomous-driving awesome deep-learning embodied-ai future-prediction video-prediction world-model
Last synced: about 1 month ago
JSON representation
Host: GitHub
URL: https://github.com/leofan90/awesome-world-models
Owner: leofan90
License: bsd-3-clause
Created: 2024-11-09T09:25:32.000Z (7 months ago)
Default Branch: main
Last Pushed: 2024-11-17T14:03:41.000Z (7 months ago)
Last Synced: 2024-11-17T15:18:30.099Z (7 months ago)
Topics: artificial-intelligence, autonomous-driving, awesome, deep-learning, embodied-ai, future-prediction, video-prediction, world-model
Homepage:
Size: 76.2 KB
Stars: 30
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project

ultimate-awesome - awesome-world-models - A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related websites. (Other Lists / Julia Lists)
README

        # Awesome World Models for Robotics [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

This repository provides a curated list of **papers for World Models for General Video Generation, Embodied AI, and Autonomous Driving**. Template from [Awesome-LLM-Robotics](https://github.com/GT-RIPL/Awesome-LLM-Robotics) and [Awesome-World-Model](https://github.com/LMD0311/Awesome-World-Model)


#### Contributions are welcome! Please feel free to submit [pull requests](https://github.com/leofan90/Awesome-World-Models/blob/main/how-to-PR.md) or reach out via [email](mailto:chunkaifan-changetoat-stu-changetodot-pku--changetodot-changetoedu-changetocn) to add papers! 


If you find this repository useful, please consider [citing](#citation) and giving this list a star ⭐. Feel free to share it with others!

---

## Overview

  - [Foundation paper of World Model](#foundation-paper-of-world-model)

  - [Blog or Technical Report](#blog-or-technical-report)

  - [Surveys](#surveys)

  - [Benchmarks](#benchmarks)

  - [General World Models](#general-world-models)

  - [World Models for Embodied AI](#world-models-for-embodied-ai)

  - [World Models for Autonomous Driving](#world-models-for-autonomous-driving)

  - [Citation](#citation)

---

## Foundation paper of World Model

* World Models, **`NIPS 2018 Oral`**. [[Paper](https://arxiv.org/abs/1803.10122)] [[Website](https://worldmodels.github.io/)] 

## Blog or Technical Report

* **`GAIA-2`**, GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving. [[Paper](https://arxiv.org/abs/2503.20523)] [[Website](https://wayve.ai/thinking/gaia-2)]

* **`Cosmos`**, Cosmos World Foundation Model Platform for Physical AI. [[Paper](https://arxiv.org/abs/2501.03575)] [[Website](https://www.nvidia.com/en-us/ai/cosmos/)] [[Code](https://github.com/NVIDIA/Cosmos)]

* **`1X Technologies`**, 1X World Model. [[Blog](https://www.1x.tech/discover/1x-world-model)]

* **`Runway`**, Introducing General World Models. [[Blog](https://runwayml.com/research/introducing-general-world-models)]

* **`Wayve`**, Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [[Paper](https://arxiv.org/pdf/2309.17080)] [[Blog](https://wayve.ai/thinking/introducing-gaia1/)] 

* **`Yann LeCun`**, A Path Towards Autonomous Machine Intelligence. [[Paper](https://openreview.net/pdf?id=BZ5a1r-kVsf)]

## Surveys

* "Exploring the Evolution of Physics Cognition in Video Generation: A Survey", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.21765)] [[Code](https://github.com/minnie-lin/Awesome-Physics-Cognition-based-Video-Generation)]

* "World Models in Artificial Intelligence: Sensing, Learning, and Reasoning Like a Child", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.15168)] 

* "Simulating the Real World: A Unified Survey of Multimodal Generative Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.04641)] [[Code](https://github.com/ALEEEHU/World-Simulator)]

* "Four Principles for Physically Interpretable World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02143)]

* "The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.10498)] [[Code](https://github.com/LMD0311/Awesome-World-Model)]

* "A Survey of World Models for Autonomous Driving", **`TPAMI`**. [[Paper](https://arxiv.org/abs/2501.11260)]

* "Understanding World or Predicting Future? A Comprehensive Survey of World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.14499)]

* "World Models: The Safety Perspective", **`ISSRE WDMD`**. [[Paper](https://arxiv.org/abs/2411.07690)]

* "Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.02914)]

* "From Efficient Multimodal Models to World Models: A Survey", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.00118)]

* "Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.06886)] [[Code](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)]

* "Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.03520)] [[Code](https://github.com/GigaAI-research/General-World-Models-Survey)]

* "World Models for Autonomous Driving: An Initial Survey", **`TIV`**. [[Paper](https://arxiv.org/abs/2403.02622)]

* "A survey on multimodal large language models for autonomous driving", **`WACVW 2024`**. [[Paper](https://arxiv.org/abs/2311.12320)] [[Code](https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving)]

---

## Benchmarks

* "Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments", **`arxiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.08122)] 

* **WorldModelBench**: "WorldModelBench: Judging Video Generation Models As World Models", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2502.13092)] [[Website](https://worldmodelbench-team.github.io/)]

* **Text2World**: "Text2World: Benchmarking Large Language Models for Symbolic World Model Generation", **`arxiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.20694)] [[Website](https://text-to-world.github.io/)] 

* **ACT-Bench**: "ACT-Bench: Towards Action Controllable World Models for Autonomous Driving", **`arxiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.05337)]

* **WorldSimBench**: "WorldSimBench: Towards Video Generation Models as World Simulators", **`arxiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.18072)] [[Website](https://iranqin.github.io/WorldSimBench.github.io/)] 

* **EVA**: "EVA: An Embodied World Model for Future Video Anticipation", **`arxiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.15461)] [[Website](https://sites.google.com/view/eva-publi)] 

* **AeroVerse**: "AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models", **`arxiv 2024.08`**. [[Paper](https://arxiv.org/pdf/2408.15511)]

* **CityBench**: "CityBench: Evaluating the Capabilities of Large Language Model as World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.13945)] [[Code](https://github.com/tsinghua-fib-lab/CityBench)]

* "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models", **`NIPS 2023`**. [[Paper](https://arxiv.org/abs/2311.09064)]

---

## General World Models

* **WebEvolver**: "WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model", **`arxiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.21024)] 

* **WALL-E 2.0**: "WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents", **`arxiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.15785)] [[Code](https://github.com/elated-sawyer/WALL-E)]

* **ViMo**: "ViMo: A Generative Visual GUI World Model for App Agent", **`arxiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.13936)] 

* "Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning", **`SIGIR 2025`**. [[Paper](https://arxiv.org/abs/2504.13643)] 

* **CheXWorld**: "CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.13820)] [[Code](https://github.com/LeapLabTHU/CheXWorld)]

* **EchoWorld**: "EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.13065)] [[Code](https://github.com/LeapLabTHU/EchoWorld)]

* "Adapting a World Model for Trajectory Following in a 3D Game", **`ICLR 2025 Workshop on World Models`**. [[Paper](https://arxiv.org/abs/2504.12299)] 

* **MineWorld**: "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.07257)] [[Website](https://aka.ms/mineworld)]

* **MoSim**: "Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning", **`CVPR 2025`**. [[Paper](https://arxiv.org/abs/2504.07095)]

* "Improving World Models using Deep Supervision with Linear Probes", **`ICLR 2025 Workshop on World Models`**. [[Paper](https://arxiv.org/abs/2504.03861)]

* "Decentralized Collective World Model for Emergent Communication and Coordination", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.03353)]

* "Adapting World Models with Latent-State Dynamics Residuals", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.02252)]

* "Can Test-Time Scaling Improve World Foundation Model?", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.24320)] [[Code](https://github.com/Mia-Cong/SWIFT.git)]

* "Synthesizing world models for bilevel planning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.20124)] 

* **Aether**: "Aether: Geometric-Aware Unified World Modeling", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.18945)] [[Website](https://aether-world.github.io/)]

* **FUSDREAMER**: "FUSDREAMER: Label-efficient Remote Sensing World Model for Multimodal Data Classification", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13814)] [[Website](https://github.com/Cimy-wang/FusDreamer)]

* "Inter-environmental world modeling for continuous and compositional dynamics", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.09911)] 

* **Disentangled World Models**: "Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.08751)] 

* "Revisiting the Othello World Model Hypothesis", **`ICLR World Models Workshop`**. [[Paper](https://arxiv.org/abs/2503.04421)] 

* "Learning Transformer-based World Models with Contrastive Predictive Coding", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.04416)] 

* "Surgical Vision World Model", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02904)] 

* "World Models for Anomaly Detection during Model-Based Reinforcement Learning Inference", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02552)] 

* **WMNav**: "WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.02247)] [[Website](https://b0b8k1ng.github.io/WMNav/)]

* **SENSEI**: "SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.01584)] [[Website](https://sites.google.com/view/sensei-paper)]

* "Learning Actionable World Models for Industrial Process Control", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.00713)]

* "Implementing Spiking World Model with Multi-Compartment Neurons for Model-based Reinforcement Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.00713)]

* "Discrete Codebook World Models for Continuous Control", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2503.00653)]

* **Multimodal Dreaming**: "Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.21142)]

* "Generalist World Model Pre-Training for Efficient Reinforcement Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.19544)]

* "Learning To Explore With Predictive World Model Via Self-Supervised Learning", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.13200)]

* **M^3**: "M^3: A Modular World Model over Streams of Tokens", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.11537)]

* "When do neural networks learn world models?", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.09297)]

* "Pre-Trained Video Generative Models as World Simulators", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.07825)]

* **DMWM**: "DMWM: Dual-Mind World Model with Long-Term Imagination", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.07591)]

* **EvoAgent**: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.05907)]

* "Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.05857)]

* "Generating Symbolic World Models via Test-time Scaling of Large Language Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.04728)] [[Website](https://vmlpddl.github.io/)]

* "Improving Transformer World Models for Data-Efficient RL", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.01591)]

* "Trajectory World Models for Heterogeneous Environments", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.01366)]

* "Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.00466)]

* "Objects matter: object-centric world models improve reinforcement learning in visually complex environments", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.16443)]

* **GLAM**: "GLAM: Global-Local Variation Awareness in Mamba-based World Model", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.11949)]

* **GAWM**: "GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.10116)]

* "Generative Emergent Communication: Large Language Model is a Collective World Model", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.00226)]

* "Towards Unraveling and Improving Generalization in World Models", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.00195)]

* "Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.12870)]

* "Transformers Use Causal World Models in Maze-Solving Tasks", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.11867)]

* "Causal World Representation in the GPT Model", **`NIPS 2024 Workshop`**. [[Paper](https://arxiv.org/abs/2412.07446)]

* **Owl-1**: "Owl-1: Omni World Model for Consistent Long Video Generation", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.09600)]

* "Navigation World Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.03572)] [[Website](https://www.amirbar.net/nwm/)]

* "Evaluating World Models with LLM for Decision Making", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.08794)] 

* **LLMPhy**: "LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.08027)] 

* **WebDreamer**: "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.06559)] [[Code](https://github.com/OSU-NLP-Group/WebDreamer)]

* "Scaling Laws for Pre-training Agents and World Models", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.04434)]

* **DINO-WM**: "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.04983)] [[Website](https://dino-wm.github.io/)]

* "Learning World Models for Unconstrained Goal Navigation", **`NIPS 2024`**. [[Paper](https://arxiv.org/abs/2411.02446)]

* "How Far is Video Generation from World Model: A Physical Law Perspective", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.02385)] [[Website](https://phyworld.github.io/)] [[Code](https://github.com/phyworld/phyworld)]

* **Adaptive World Models**: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity", **`NIPS 2024 Workshop Adaptive Foundation Models`**. [[Paper](https://arxiv.org/abs/2411.01342)]

* **LLMCWM**: "Language Agents Meet Causality -- Bridging LLMs and Causal World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.19923)] [[Code](https://github.com/j0hngou/LLMCWM/)]

* "Reward-free World Models for Online Imitation Learning", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.14081)]

* "Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.13232)]

* **AVID**: "AVID: Adapting Video Diffusion Models to World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.12822)] [[Code](https://github.com/microsoft/causica/tree/main/research_experiments/avid)]

* **SMAC**: "Grounded Answers for Multi-agent Decision-making Problem through Generative World Model", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2410.02664)]

* **OSWM**: "One-shot World Models Using a Transformer Trained on a Synthetic Prior", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.14084)]

* "Making Large Language Models into World Models with Precondition and Effect Knowledge", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.12278)]

* "Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.11816)]

* **MoReFree**: "World Models Increase Autonomy in Reinforcement Learning", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.09807)] [[Project](https://sites.google.com/view/morefree)]

* **UrbanWorld**: "UrbanWorld: An Urban World Model for 3D City Generation", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.11965)]

* **PWM**: "PWM: Policy Learning with Large World Models", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.02466)] [[Code](https://www.imgeorgiev.com/pwm/)]

* "Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.02446)]

* **GenRL**: "GenRL: Multimodal foundation world models for generalist embodied agents", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.18043)] [[Code](https://github.com/mazpie/genrl)]

* **DLLM**: "World Models with Hints of Large Language Models for Goal Achieving", **`arXiv 2024.06`**. [[Paper](http://arxiv.org/pdf/2406.07381)]

* "Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.15275)]

* **CoDreamer**: "CoDreamer: Communication-Based Decentralised World Models", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.13600)]

* **Pandora**: "Pandora: Towards General World Model with Natural Language Actions and Video States", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.09455)] [[Code](https://github.com/maitrix-org/Pandora)]

* **EBWM**: "Cognitively Inspired Energy-Based World Models", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.08862)]

* "Evaluating the World Model Implicit in a Generative Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.03689)] [[Code](https://github.com/keyonvafa/world-model-evaluation)]

* "Transformers and Slot Encoding for Sample Efficient Physical World Modelling", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.20180)] [[Code](https://github.com/torchipeppo/transformers-and-slot-encoding-for-wm)]

* **Puppeteer**: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.18418)] [[Code](https://nicklashansen.com/rlpuppeteer)]

* **BWArea Model**: "BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.17039)]

* **WKM**: "Agent Planning with World Knowledge Model", **`arXiv 2024.05`**.  [[Paper](https://arxiv.org/abs/2405.14205)] [[Code](https://github.com/zjunlp/WKM)]

* **Diamond**: "Diffusion for World Modeling: Visual Details Matter in Atari", **`arXiv 2024.05`**.  [[Paper](https://arxiv.org/abs/2405.12399)] [[Code](https://github.com/eloialonso/diamond)]

* "Compete and Compose: Learning Independent Mechanisms for Modular World Models", **`arXiv 2024.04`**.  [[Paper](https://arxiv.org/abs/2404.15109)]

* "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization", **`arXiv 2024.03`**.  [[Paper](https://arxiv.org/abs/2403.10967)] [[Code](https://github.com/sai-prasanna/dreaming_of_many_worlds)]

* **V-JEPA**: "V-JEPA: Video Joint Embedding Predictive Architecture", **`Meta AI`**. [[Blog](https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/)] [[Paper](https://ai.meta.com/research/publications/revisiting-feature-prediction-for-learning-visual-representations-from-video/)] [[Code](https://github.com/facebookresearch/jepa)]

* **IWM**: "Learning and Leveraging World Models in Visual Representation Learning", **`Meta AI`**. [[Paper](https://arxiv.org/abs/2403.00504)] 

* **Genie**: "Genie: Generative Interactive Environments", **`DeepMind`**. [[Paper](https://arxiv.org/abs/2402.15391)] [[Blog](https://sites.google.com/view/genie-2024/home)]

* **Sora**: "Video generation models as world simulators", **`OpenAI`**. [[Technical report](https://openai.com/research/video-generation-models-as-world-simulators)]

* **LWM**: "World Model on Million-Length Video And Language With RingAttention", **`arXiv 2024.02`**.  [[Paper](https://arxiv.org/abs/2402.08268)] [[Code](https://github.com/LargeWorldModel/LWM)]

* "Planning with an Ensemble of World Models", **`OpenReview`**. [[Paper](https://openreview.net/forum?id=cvGdPXaydP)]

* **WorldDreamer**: "WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens", **`arXiv 2024.01`**. [[Paper](https://arxiv.org/abs/2401.09985)] [[Code](https://github.com/JeffWang987/WorldDreamer)]

* **CWM**: "Understanding Physical Dynamics with Counterfactual World Modeling", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03523.pdf)] [[Code](https://neuroailab.github.io/cwm-physics/)]

* **Δ-IRIS**: "Efficient World Models with Context-Aware Tokenization", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2406.19320)] [[Code](https://github.com/vmicheli/delta-iris)]

* **LLM-Sim**: "Can Language Models Serve as Text-Based World Simulators?", **`ACL`**. [[Paper](https://arxiv.org/abs/2406.06485)] [[Code](https://github.com/cognitiveailab/GPT-simulator)]

* **AD3**: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2403.09976)]

* **MAMBA**: "MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning", **`ICLR 2024`**.  [[Paper](https://arxiv.org/abs/2403.09859)] [[Code](https://github.com/zoharri/mamba)]

* **R2I**: "Mastering Memory Tasks with World Models", **`ICLR 2024`**. [[Paper](http://arxiv.org/pdf/2403.04253)] [[Website](https://recall2imagine.github.io/)] [[Code](https://github.com/chandar-lab/Recall2Imagine)]

* **HarmonyDream**: "HarmonyDream: Task Harmonization Inside World Models", **`ICML 2024`**. [[Paper](https://openreview.net/forum?id=x0yIaw2fgk)] [[Code](https://github.com/thuml/HarmonyDream)]

* **REM**: "Improving Token-Based World Models with Parallel Observation Prediction", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2402.05643)] [[Code](https://github.com/leor-c/REM)]

* "Do Transformer World Models Give Better Policy Gradients?"", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2402.05290)]

* **DreamSmooth**: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing", **`ICLR 2024`**. [[Paper](https://arxiv.org/pdf/2311.01450)]

* **TD-MPC2**: "TD-MPC2: Scalable, Robust World Models for Continuous Control", **`ICLR 2024`**. [[Paper](https://arxiv.org/pdf/2310.16828)] [[Torch Code](https://github.com/nicklashansen/tdmpc2)]

* **Hieros**: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2310.05167)]

* **CoWorld**: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2305.15260)]

---

## World Models for Embodied AI

* **TesserAct**: "TesserAct: Learning 4D Embodied World Models", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.20995)] [[Website](https://tesseractworld.github.io/)]

* **PIN-WM**: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16693)] 

* "Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16680)] 

* **ManipDreamer**: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.16464)] 

* **UWM**: "Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.02792)] [[Website](https://weirdlabuw.github.io/uwm/)]

* "Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.20425)] 

* **AdaWorld**: "AdaWorld: Learning Adaptable World Models with Latent Actions", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.18938)] [[Website](https://adaptable-world-model.github.io/)] 

* **DyWA**: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.16806)] [[Website](https://pku-epic.github.io/DyWA/)] 

* "Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.12531)] [[Website](https://mkturkcan.github.io/suturingmodels/)] 

* "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.10480)] 

* **LUMOS**: "LUMOS: Language-Conditioned Imitation Learning with World Models", **`ICRA 2025`**. [[Paper](https://arxiv.org/abs/2503.10370)] [[Website](http://lumos.cs.uni-freiburg.de/)] 

* "Object-Centric World Model for Language-Guided Manipulation", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.06170)] 

* **DEMO^3**: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.01837)] [[Website](https://adrialopezescoriza.github.io/demo3/)] 

* "Accelerating Model-Based Reinforcement Learning with State-Space World Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.20168)] 

* "Learning Humanoid Locomotion with World Model Reconstruction", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.16230)] 

* "Strengthening Generative Robot Policies through Predictive World Modeling", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.00622)] [[Website](https://computationalrobotics.seas.harvard.edu/GPC)] 

* **Robotic World Model**: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.10100)]

* **RoboHorizon**: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.06605)] 

* **Dream to Manipulate**: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.14957)] [[Website](https://leobarcellona.github.io/DreamToManipulate/)] 

* **WHALE**: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.05619)]

* **VisualPredicator**: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.23156)] 

* "Multi-Task Interactive Robot Fleet Learning with Visual World Models", **`CoRL 2024`**. [[Paper](https://arxiv.org/abs/2410.22689)] [[Code](https://ut-austin-rpl.github.io/sirius-fleet/)]

* **X-MOBILITY**: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.17491)]

* **PIVOT-R**: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/pdf/2410.10394)]

* **GLIMO**: "Grounding Large Language Models In Embodied Environment With Imperfect World Models", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.02664)]

* **EVA**: "EVA: An Embodied World Model for Future Video Anticipation", **`arxiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.15461)] [[Website](https://sites.google.com/view/eva-publi)] 

* **PreLAR**: "PreLAR: World Model Pre-training with Learnable Action Representation", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03363.pdf)] [[Code](https://github.com/zhanglixuan0720/PreLAR)]

* **WMP**: "World Model-based Perception for Visual Legged Locomotion", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.16784)] [[Project](https://wmp-loco.github.io/)]

* **R-AIF**: "R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.14216)]

* "Representing Positional Information in Generative World Models for Object Manipulation" **`arXiv 2024.09`** [[Paper](https://arxiv.org/abs/2409.12005)]

* **DexSim2Real$^2$**: "DexSim2Real$^2: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.08750)]

* **DWL**: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning", **`RSS 2024 (Best Paper Award Finalist)`**. [[Paper](https://arxiv.org/abs/2408.14472)]

* "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.10788)] [[Website](https://embodied-gaussians.github.io/)]

* **HRSSM**: "Learning Latent Dynamic Robust Representations for World Models", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2405.06263)] [[Code](https://github.com/bit1029public/HRSSM)]

* **RoboDreamer**: "RoboDreamer: Learning Compositional World Models for Robot Imagination", **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2404.12377)] [[Code](https://robovideo.github.io/)]

* **COMBO**: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2404.10775)] [[Website](https://vis-www.cs.umass.edu/combo/)] [[Code](https://github.com/UMass-Foundation-Model/COMBO)]

* **3D-VLA**: "3D-VLA: A 3D Vision-Language-Action Generative World Model",  **`ICML 2024`**. [[Paper](https://arxiv.org/abs/2403.09631)]

* **ManiGaussian**: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", **`arXiv 2024.03`**.  [[Paper](https://arxiv.org/abs/2403.08321)] [[Code](https://guanxinglu.github.io/ManiGaussian/)]

---

## World Models for Autonomous Driving

### Refer to https://github.com/LMD0311/Awesome-World-Model

* "Learning to Drive from a World Model", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.19077)]

* **DriVerse**: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.18576)] 

* "End-to-End Driving with Online Trajectory Evaluation via BEV World Model", **`arXiv 2025.04`**. [[Paper](https://arxiv.org/abs/2504.01941)] [[Code](https://github.com/liyingyanUCAS/WoTE)] 

* "Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.21232)]

* **MiLA**: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.15875)] [[Project Page](https://github.com/xiaomi-mlab/mila.github.io)] 

* **SimWorld**: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13952)] [[Project Page](https://github.com/Li-Zn-H/SimWorld)] 

* **UniFuture**: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.13587)] [[Project Page](https://github.com/dk-liang/UniFuture)] 

* **EOT-WM**: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.09215)]

* "Temporal Triplane Transformers as Occupancy World Models", **`arXiv 2025.03`**. [[Paper](https://arxiv.org/abs/2503.07338)]

* **InDRiVE**: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2503.05573)]

* **MaskGWM**: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.11663)]

* **Dream to Drive**: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models", **`arXiv 2025.02`**. [[Paper](https://arxiv.org/abs/2502.10012)]

* "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2502.07309)]

* "Dream to Drive with Predictive Individual World Model", **`IEEE TIV`**. [[Paper](https://arxiv.org/abs/2501.16733)] [[Code](https://github.com/gaoyinfeng/PIWM)]

* **HERMES**: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.14729)] 

* **AdaWM**: "AdaWM: Adaptive World Model based Planning for Autonomous Driving", **`ICLR 2025`**. [[Paper](https://arxiv.org/abs/2501.13072)] 

* **AD-L-JEPA**: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data", **`arXiv 2025.01`**. [[Paper](https://arxiv.org/abs/2501.04969)]  

* **DrivingWorld**: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.19505)] [[Code](https://github.com/YvanYin/DrivingWorld)] [[Project Page](https://huxiaotaostasy.github.io/DrivingWorld/index.html)] 

* **DrivingGPT**: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.18607)] [[Project Page](https://rogerchern.github.io/DrivingGPT/)]

* "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.13772)]

* **GEM**: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.11198)] [[Project Page](https://vita-epfl.github.io/GEM.github.io/)]

* **GaussianWorld**: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.04380)] [[Code](https://github.com/zuosc19/GaussianWorld)]

* **Doe-1**: "Doe-1: Closed-Loop Autonomous Driving with Large World Model", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.09627)] [[Project Page](https://wzzheng.net/Doe/)] [[Code](https://github.com/wzzheng/Doe)]

* "Pysical Informed Driving World Model", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.08410)] [[Project Page](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html)]

* **InfiniCube**: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.03934)] [[Project Page](https://research.nvidia.com/labs/toronto-ai/infinicube/)]

* **InfinityDrive**: "InfinityDrive: Breaking Time Limits in Driving World Models", **`arXiv 2024.12`**. [[Paper](https://arxiv.org/abs/2412.01522)] [[Project Page](https://metadrivescape.github.io/papers_project/InfinityDrive/page.html)]

* **ReconDreamer**: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", **`arXiv 2024.11`**. [[Paper](https://arxiv.org/abs/2411.19548)] [[Project Page](https://recondreamer.github.io/)]

* **Imagine-2-Drive**: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles", **`ICRA 2025`**. [[Paper](https://arxiv.org/abs/2411.10171)] [[Project Page](https://anantagrg.github.io/Imagine-2-Drive.github.io/)]

* **DriveDreamer4D**: "World Models Are Effective Data Machines for 4D Driving Scene Representation", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.13571)] [[Project Page](https://drivedreamer4d.github.io/)]

* **DOME**: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model", **`arXiv 2024.10`**. [[Paper](https://arxiv.org/abs/2410.10429)] [[Project Page](https://gusongen.github.io/DOME)]

* **SSR**: "Does End-to-End Autonomous Driving Really Need Perception Tasks?", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.18341)] [[Code](https://github.com/PeidongLi/SSR)]

* "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.16663)]

* **LatentDriver**: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.15730)] [[Code](https://github.com/Sephirex-X/LatentDriver)]

* **RenderWorld**: "World Model with Self-Supervised 3D Label", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.11356)]

* **OccLLaMA**: "An Occupancy-Language-Action Generative World Model for Autonomous Driving", **`arXiv 2024.09`**. [[Paper](https://arxiv.org/abs/2409.03272)]

* **DriveGenVLM**: "Real-world Video Generation for Vision Language Model based Autonomous Driving", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.16647)]

* **Drive-OccWorld**: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving", **`arXiv 2024.08`**. [[Paper](https://arxiv.org/abs/2408.14197)]

* **CarFormer**: "Self-Driving with Learned Object-Centric Representations", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2407.15843)] [[Code](https://kuis-ai.github.io/CarFormer/)]

* **BEVWorld**: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.05679)] [[Code](https://github.com/zympsyche/BevWorld)]

* **TOKEN**: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving", **`arXiv 2024.07`**. [[Paper](https://arxiv.org/abs/2407.00959)]

* **UMAD**: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.06370)]

* **SimGen**: "Simulator-conditioned Driving Scene Generation", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.09386)] [[Code](https://metadriverse.github.io/simgen/)]

* **AdaptiveDriver**: "Planning with Adaptive World Models for Autonomous Driving", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.10714)] [[Code](https://arunbalajeev.github.io/world_models_planning/world_model_paper.html)]

* **UnO**: "Unsupervised Occupancy Fields for Perception and Forecasting", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2406.08691)] [[Code](https://waabi.ai/research/uno)]

* **LAW**: "Enhancing End-to-End Autonomous Driving with Latent World Model", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.08481)] [[Code](https://github.com/BraveGroup/LAW)]

* **Delphi**: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation", **`arXiv 2024.06`**. [[Paper](https://arxiv.org/abs/2406.01349)] [[Code](https://github.com/westlake-autolab/Delphi)]

* **OccSora**: "4D Occupancy Generation Models as World Simulators for Autonomous Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.20337)] [[Code](https://github.com/wzzheng/OccSora)]

* **MagicDrive3D**: "Controllable 3D Generation for Any-View Rendering in Street Scenes", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.14475)] [[Code](https://gaoruiyuan.com/magicdrive3d/)]

* **Vista**: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability", **`NeurIPS 2024`**. [[Paper](https://arxiv.org/abs/2405.17398)] [[Code](https://github.com/OpenDriveLab/Vista)]

* **CarDreamer**: "Open-Source Learning Platform for World Model based Autonomous Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.09111)] [[Code](https://github.com/ucd-dare/CarDreamer)]

* **DriveSim**: "Probing Multimodal LLMs as World Models for Driving", **`arXiv 2024.05`**. [[Paper](https://arxiv.org/abs/2405.05956)] [[Code](https://github.com/sreeramsa/DriveSim)]

* **DriveWorld**: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2405.04390)]

* **LidarDM**: "Generative LiDAR Simulation in a Generated World", **`arXiv 2024.04`**. [[Paper](https://arxiv.org/abs/2404.02903)] [[Code](https://github.com/vzyrianov/lidardm)]

* **SubjectDrive**: "Scaling Generative Data in Autonomous Driving via Subject Control", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.19438)] [[Project](https://subjectdrive.github.io/)]

* **DriveDreamer-2**: "LLM-Enhanced World Models for Diverse Driving Video Generation", **`arXiv 2024.03`**. [[Paper](https://arxiv.org/abs/2403.06845)] [[Code](https://drivedreamer2.github.io/)]

* **Think2Drive**: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2402.16720)]

* **MARL-CCE**: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05085.pdf)] [[Code](https://github.com/qiaoguanren/MARL-CCE)]

* **GenAD**: "Generalized Predictive Model for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2403.09630)] [[Data](https://github.com/OpenDriveLab/DriveAGI?tab=readme-ov-file#genad-dataset-opendv-youtube)]

* **GenAD**: "Generative End-to-End Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2402.11502)] [[Code](https://github.com/wzzheng/GenAD)]

* **NeMo**: "Neural Volumetric World Models for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02571.pdf)]

* **MARL-CCE**: "Modelling-Competitive-Behaviors-in-Autonomous-Driving-Under-Generative-World-Model", **`ECCV 2024`**. [[Code](https://github.com/qiaoguanren/MARL-CCE)]

* **ViDAR**: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2312.17655)] [[Code](https://github.com/OpenDriveLab/ViDAR)]

* **Drive-WM**: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.17918)] [[Code](https://github.com/BraveGroup/Drive-WM)]

* **Cam4DOCC**: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.17663)] [[Code](https://github.com/haomo-ai/Cam4DOcc)]

* **Panacea**: "Panoramic and Controllable Video Generation for Autonomous Driving", **`CVPR 2024`**. [[Paper](https://arxiv.org/abs/2311.16813)] [[Code](https://panacea-ad.github.io/)]

* **OccWorld**: "Learning a 3D Occupancy World Model for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2311.16038)] [[Code](https://github.com/wzzheng/OccWorld)]

* **Copilot4D**: "Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion", **`ICLR 2024`**. [[Paper](https://arxiv.org/abs/2311.01017)]

* **DrivingDiffusion**: "Layout-Guided multi-view driving scene video generation with latent diffusion model", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2310.07771)] [[Code](https://github.com/shalfun/DrivingDiffusion)]

* **SafeDreamer**: "Safe Reinforcement Learning with World Models", **`ICLR 2024`**. [[Paper](https://openreview.net/forum?id=tsE5HLYtYg)] [[Code](https://github.com/PKU-Alignment/SafeDreamer)]

* **MagicDrive**: "Street View Generation with Diverse 3D Geometry Control", **`ICLR 2024`**. [[Paper](https://arxiv.org/abs/2310.02601)] [[Code](https://github.com/cure-lab/MagicDrive)]

* **DriveDreamer**: "Towards Real-world-driven World Models for Autonomous Driving", **`ECCV 2024`**. [[Paper](https://arxiv.org/abs/2309.09777)] [[Code](https://github.com/JeffWang987/DriveDreamer)]

* **SEM2**: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model", **`TITS`**. [[Paper](https://ieeexplore.ieee.org/abstract/document/10538211/)]

----

## Citation

If you find this repository useful, please consider citing this list:

```

@misc{leo2024worldmodelspaperslist,

    title = {Awesome-World-Models},

    author = {Leo Fan},

    journal = {GitHub repository},

    url = {https://github.com/leofan90/Awesome-World-Models},

    year = {2024},

}

```
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/leofan90/awesome-world-models

Awesome Lists containing this project

README