https://github.com/knightnemo/Awesome-World-Models
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
List: Awesome-World-Models
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/knightnemo/Awesome-World-Models
- Owner: knightnemo
- License: bsd-3-clause
- Created: 2025-10-31T14:38:53.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-23T15:55:10.000Z (7 months ago)
- Last Synced: 2025-11-23T17:29:56.036Z (7 months ago)
- Homepage:
- Size: 2.24 MB
- Stars: 849
- Watchers: 13
- Forks: 25
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
Awesome Lists containing this project
- ultimate-awesome - Awesome-World-Models - A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling. (Other Lists / TeX Lists)
- awesome-physical-ai - [GitHub
- awesome-embodied-vla-va-vln - [repo
- awesome-code-agents - Awesome-World-Models (🙏 Acknowledgements / 🧪 Frontier Labs and Teams)
- awesome-latent-refinement - Awesome World Models
- Awesome-AI4DigitalPathology - Awesome World Models
README
# 🌍 Awesome World Models
[](https://github.com/sindresorhus/awesome) [](https://github.com/knightnemo/Awesome-World-Models/stargazers) [](LICENSE.txt) [](CONTRIBUTING.md)
**📜 A Curated List of Amazing Works in World Modeling, spanning applications in Embodied AI, Autonomous Driving, Natural Language Processing and Agents.**
*Based on [Awesome-World-Model-for-Autonomous-Driving](https://github.com/LMD0311/Awesome-World-Model) and [Awesome-World-Model-for-Robotics](https://github.com/leofan90/Awesome-World-Models)*.
*Photo Credit: [Gemini-Nano-Banana🍌](https://aistudio.google.com/models/gemini-2-5-flash-image)*.
---
## 🚩 News & Updates
_Major updates and announcements are shown below. Scroll for full timeline._
🗺️ **[2025-10] Enhanced Visual Navigation** — Introduced a badge system for papers! All entries now display badges linking to the paper, project website, and code for quick access to resources.
🔥 **[2025-10] Repository Launch** — Awesome World Models is now live! We're building a comprehensive collection spanning Embodied AI, Autonomous Driving, NLP, and more. See [CONTRIBUTING.md](CONTRIBUTING.md) for how to contribute.
💡 **[Ongoing] Community Contributions Welcome** — Help us maintain the most up-to-date world models resource! Submit papers via PR or contact us at [email](mailto:siqiaohuang981@gmail.com).
⭐ **[Ongoing] Support This Project** — If you find this useful, please [cite](#citation) our work and give us a star. Share with your research community!
---
## Overview
- 🎯 [Aim of the project](#aim-of-the-project)
- 📚 [Definition of World Models](#definition-of-world-models)
- 📖 [Surveys of World Models](#surveys-of-world-models)
- 🎮 [World Models for Game Simulation](#world-models-for-game-simulation)
- 🚗 [World Models for Autonomous Driving](#world-models-for-autonomous-driving)
- 🤖 [World Models for Embodied AI](#world-models-for-embodied-ai)
- 🔬 [World Models for Science](#world-models-for-science)
- 💭 [Positions on World Models](#positions-on-world-models)
- 📐 [Theory & World Models Explainability](#theory--world-models-explainability)
- 🛠️ [General Approaches to World Models](#general-approaches-to-world-models)
- 📊 [Evaluating World Models](#evaluating-world-models)
- 🙏 [Acknowledgements](#acknowledgements)
- 📝 [Citation](#citation)
---
## Aim of the Project
World Models have become a hot topic in both research and industry, attracting unprecedented attention from the AI community and beyond. However, due to the **interdisciplinary nature** of the field (_and because the term "world model" simply sounds amazing_), the concept has been used with varying definitions across different domains.
This repository aims to:
- 🔍 **Organize** the rapidly growing body of world model research across multiple application domains
- 🗺️ **Provide** a minimalist map of how world models are utilized in different fields (Embodied AI, Autonomous Driving, NLP, etc.)
- 🤝 **Bridge** the gap between different communities working on world models with varying perspectives
- 📚 **Serve** as a one-stop resource for researchers, practitioners, and enthusiasts interested in world modeling
- 🚀 **Track** the latest developments and breakthroughs in this exciting field
Whether you're a researcher looking for related work, a practitioner seeking implementation references, or simply curious about world models, we hope this curated list helps you navigate the landscape!
---
## Definition of World Models
Although the scope of world models has expanded repeatedly, it is widely accepted that the concept originates from these two works:
* [⭐️] **World Models**, "World Models". [](https://arxiv.org/abs/1803.10122) [](https://worldmodels.github.io/)
* [⭐️] **Yann LeCun's Speech**, "A Path Towards Autonomous Machine Intelligence". [](https://openreview.net/pdf?id=BZ5a1r-kVsf)
Some other great blogposts on world models include:
- [⭐️] **Towards Video World Models**, "Towards Video World Models". [](https://www.xunhuang.me/blogs/world_model.html)
- **Status of World Models in 2025**, "Beyond the Hype: How I See World Models Evolving in 2025". [](https://knightnemo.github.io/blog/posts/wm_2025/)
- [⭐️] **Jim Fan's tweet**. [](https://x.com/DrJimFan/status/1709947595525951787)
---
## Surveys of World Models
### 1. World Models and Video Generation:
- [⭐️] **Is Sora a World Simulator**, "Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond". [](https://arxiv.org/abs/2405.03520) [](https://github.com/GigaAI-research/General-World-Models-Survey)
- **Physics Cognition in Video Generation**, "Exploring the Evolution of Physics Cognition in Video Generation: A Survey". [](https://arxiv.org/abs/2503.21765) [](https://github.com/minnie-lin/Awesome-Physics-Cognition-based-Video-Generation)
### 2. World Models and 3D Generation:
- [⭐️] **3D and 4D World Modeling: A Survey**, "3D and 4D World Modeling: A Survey". [](https://arxiv.org/abs/2509.07996)
- [⭐️] **Understanding World or Predicting Future?**, "Understanding World or Predicting Future? A Comprehensive Survey of World Models". [](https://arxiv.org/abs/2411.14499)
- **From 2D to 3D Cognition**, "From 2D to 3D Cognition: A Brief Survey of General World Models". [](https://arxiv.org/abs/2506.20134)
### 3. World Models and Embodied Artificial Intelligence:
- [⭐️] **World Models for Embodied AI**, "A Comprehensive Survey on World Models for Embodied AI". [](https://arxiv.org/abs/2510.16732) [](https://github.com/Li-Zn-H/AwesomeWorldModels)
- **World Models and Physical Simulation**, "A Survey: Learning Embodied Intelligence from Physical Simulators and World Models". [](https://arxiv.org/abs/2507.00917) [](https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey)
- **Embodied AI Agents: Modeling the World**, "Embodied AI Agents: Modeling the World". [](https://arxiv.org/abs/2506.22355)
- **Aligning Cyber Space with Physical World**, "Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI". [](https://arxiv.org/abs/2407.06886) [](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)
### 4. World Models for Autonomous Driving:
- [⭐️] **A Survey of World Models for Autonomous Driving**, "A Survey of World Models for Autonomous Driving". [](https://arxiv.org/abs/2501.11260)
- **World Models for Autonomous Driving: An Initial Survey**, "World Models for Autonomous Driving: An Initial Survey". [](https://arxiv.org/abs/2403.02622)
- **Interplay Between Video Generation and World Models in Autonomous Driving**, "Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey". [](https://arxiv.org/abs/2411.02914)
### 5. Other Good Surveys:
- **From Masks to Worlds**, "From Masks to Worlds: A Hitchhiker's Guide to World Models". [](https://arxiv.org/abs/2510.20668) [](https://github.com/M-E-AGI-Lab/Awesome-World-Models)
- **The Safety Challenge of World Models**, "The Safety Challenge of World Models for Embodied AI Agents: A Review". [](https://arxiv.org/abs/2510.05865)
- **World Models in AI: Like a Child**, "World Models in Artificial Intelligence: Sensing, Learning, and Reasoning Like a Child". [](https://arxiv.org/abs/2503.15168)
- **World Model Safety**, "World Models: The Safety Perspective". [](https://arxiv.org/abs/2411.07690)
- **Model-Based Reinforcement Learning**, "A survey on model-based reinforcement learning". [](https://link.springer.com/article/10.1007/s11432-022-3696-5)
---
## World Models for Game Simulation
Pixel Space:
- [⭐️] **GameNGen**, "Diffusion Models Are Real-Time Game Engines". [](https://arxiv.org/abs/2408.14837)
- [⭐️] **DIAMOND**, "Diffusion for World Modeling: Visual Details Matter in Atari". [](https://arxiv.org/abs/2405.12399) [](https://github.com/eloialonso/diamond)
- **MineWorld**, "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft". [](https://arxiv.org/abs/2504.07257) [](https://aka.ms/mineworld)
- **Oasis**, "Oasis: A Universe in a Transformer". [](https://oasis-model.github.io/)
- **AnimeGamer**, "AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction". [](http://arxiv.org/abs/2504.01014) [](https://howe125.github.io/AnimeGamer.github.io/)
- [⭐️] **Matrix-Game**, "Matrix-Game: Interactive World Foundation Model". [](https://arxiv.org/abs/2506.18701) [](https://github.com/SkyworkAI/Matrix-Game)
- [⭐️] **Matrix-Game 2.0**, "Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model". [](https://arxiv.org/abs/2508.13009) [](https://matrix-game-v2.github.io/)
- **RealPlay**, "From Virtual Games to Real-World Play". [](https://arxiv.org/abs/2506.18901) [](https://wenqsun.github.io/RealPlay/) [](https://github.com/wenqsun/Real-Play)
- **GameFactory**, "GameFactory: Creating New Games with Generative Interactive Videos". [](http://arxiv.org/abs/2501.08325) [](https://yujiwen.github.io/gamefactory/) [](https://github.com/KwaiVGI/GameFactory)
- **WORLDMEM**, "Worldmem: Long-term Consistent World Simulation with Memory". [](http://arxiv.org/abs/2504.12369) [](https://xizaoqu.github.io/worldmem/) [](https://github.com/xizaoqu/WorldMem)
3D Mesh Space:
- [⭐️] **HunyuanWorld 1.0**, "HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels". [](https://arxiv.org/abs/2507.21809) [](https://3d-models.hunyuan.tencent.com/world/) [](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0)
- [⭐️] **Matrix-3D**, "Matrix-3D: Omnidirectional Explorable 3D World Generation". [](https://arxiv.org/abs/2508.08086) [](https://matrix-3d.github.io)
---
## World Models for Autonomous Driving
_Refer to https://github.com/LMD0311/Awesome-World-Model for full list._
> [!NOTE]
> 📢 [Call for Maintenance] The repo creator is not an expert in autonomous driving, so this is a more-than-concise list of works without classification. We welcome community effort to make this section cleaner and better sorted.
- [⭐️] **Cosmos-Drive-Dreams**, "Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models". [](https://arxiv.org/abs/2506.09042) [](https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams)
- [⭐️] **GAIA-2**, "GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving". [](https://arxiv.org/abs/2503.20523) [](https://wayve.ai/thinking/gaia-2)
- **Copilot4D**, "Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion". [](https://arxiv.org/abs/2311.01017)
- **OmniNWM**: "OmniNWM: Omniscient Driving Navigation World Models". [](https://arxiv.org/abs/2510.18313) [](https://arlo0o.github.io/OmniNWM/)
- **GAIA-1**, "Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy". [](https://arxiv.org/abs/2309.17080) [](https://wayve.ai/thinking/introducing-gaia1/)
* **PWM**, "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction". [](https://arxiv.org/abs/2510.19654) [](https://github.com/6550Zhao/Policy-World-Model)
* **Dream4Drive**, "Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks". [](https://arxiv.org/abs/2510.19195) [](https://wm-research.github.io/Dream4Drive/)
* **SparseWorld**, "SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries". [](https://arxiv.org/abs/2510.17482) [](https://github.com/MSunDYY/SparseWorld)
* **DriveVLA-W0**: "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving". [](https://arxiv.org/abs/2510.12796) [](https://github.com/BraveGroup/DriveVLA-W0)
* "Enhancing Physical Consistency in Lightweight World Models". [](https://arxiv.org/abs/2509.12437)
* **IRL-VLA**: "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model". [](https://arxiv.org/abs/2508.06571)
* **LiDARCrafter**: "LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences". [](https://arxiv.org/abs/2508.03692) [](https://lidarcrafter.github.io) [](https://github.com/lidarcrafter/toolkit)
* **FASTopoWM**: "FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models". [](https://arxiv.org/abs/2507.23325) [](https://github.com/YimingYang23/FASTopoWM)
* **Orbis**: "Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models". [](https://arxiv.org/abs/2507.13162) [](https://lmb-freiburg.github.io/orbis.github.io/)
* "World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving". [](https://arxiv.org/abs/2507.12762)
* **NRSeg**: "NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models". [](https://arxiv.org/abs/2507.04002) [](https://github.com/lynn-yu/NRSeg)
* **World4Drive**: "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model". [](https://arxiv.org/abs/2507.00603) [](https://github.com/ucaszyp/World4Drive)
* **Epona**: "Epona: Autoregressive Diffusion World Model for Autonomous Driving". [](https://arxiv.org/abs/2506.24113) [](https://kevin-thu.github.io/Epona/)
* "Towards foundational LiDAR world models with efficient latent flow matching". [](https://arxiv.org/abs/2506.23434)
* **SceneDiffuser++**: "SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model". [](https://arxiv.org/abs/2506.21976)
* **COME**: "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model". [](https://arxiv.org/abs/2506.13260) [](https://github.com/synsin0/COME)
* **STAGE**: "STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation". [](https://arxiv.org/abs/2506.13138)
* **ReSim**: "ReSim: Reliable World Simulation for Autonomous Driving". [](https://arxiv.org/abs/2506.09981) [](https://github.com/OpenDriveLab/ReSim) [](https://opendrivelab.com/ReSim)
* "Ego-centric Learning of Communicative World Models for Autonomous Driving". [](https://arxiv.org/abs/2506.08149)
* **Dreamland**: "Dreamland: Controllable World Creation with Simulator and Generative Models". [](https://arxiv.org/abs/2506.08006) [](https://metadriverse.github.io/dreamland/)
* **LongDWM**: "LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model". [](https://arxiv.org/abs/2506.01546) [](https://wang-xiaodong1899.github.io/longdwm/)
* **GeoDrive**: "GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control". [](https://arxiv.org/abs/2505.22421) [](https://github.com/antonioo-c/GeoDrive)
* **FutureSightDrive**: "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving". [](https://arxiv.org/abs/2505.17685) [](https://github.com/MIV-XJTU/FSDrive)
* **Raw2Drive**: "Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)". [](https://arxiv.org/abs/2505.16394)
* **VL-SAFE**: "VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving". [](https://arxiv.org/abs/2505.16377) [](https://ys-qu.github.io/vlsafe-website/)
* **PosePilot**: "PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth". [](https://arxiv.org/abs/2505.01729)
* "World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks". [](https://arxiv.org/abs/2505.01712)
* "Learning to Drive from a World Model". [](https://arxiv.org/abs/2504.19077)
* **DriVerse**: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment". [](https://arxiv.org/abs/2504.18576)
* "End-to-End Driving with Online Trajectory Evaluation via BEV World Model". [](https://arxiv.org/abs/2504.01941) [](https://github.com/liyingyanUCAS/WoTE)
* "Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles". [](https://arxiv.org/abs/2503.21232)
* **MiLA**: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving". [](https://arxiv.org/abs/2503.15875) [](https://github.com/xiaomi-mlab/mila.github.io)
* **SimWorld**: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model". [](https://arxiv.org/abs/2503.13952) [](https://github.com/Li-Zn-H/SimWorld)
* **UniFuture**: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception". [](https://arxiv.org/abs/2503.13587) [](https://github.com/dk-liang/UniFuture)
* **EOT-WM**: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space". [](https://arxiv.org/abs/2503.09215)
* "Temporal Triplane Transformers as Occupancy World Models". [](https://arxiv.org/abs/2503.07338)
* **InDRiVE**: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model". [](https://arxiv.org/abs/2503.05573)
* **MaskGWM**: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction". [](https://arxiv.org/abs/2502.11663)
* **Dream to Drive**: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models". [](https://arxiv.org/abs/2502.10012)
* "Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving". [](https://arxiv.org/abs/2502.07309)
* "Dream to Drive with Predictive Individual World Model". [](https://arxiv.org/abs/2501.16733) [](https://github.com/gaoyinfeng/PIWM)
* **HERMES**: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation". [](https://arxiv.org/abs/2501.14729)
* **AdaWM**: "AdaWM: Adaptive World Model based Planning for Autonomous Driving". [](https://arxiv.org/abs/2501.13072)
* **AD-L-JEPA**: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data". [](https://arxiv.org/abs/2501.04969)
* **DrivingWorld**: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT". [](https://arxiv.org/abs/2412.19505) [](https://github.com/YvanYin/DrivingWorld) [](https://huxiaotaostasy.github.io/DrivingWorld/index.html)
* **DrivingGPT**: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers". [](https://arxiv.org/abs/2412.18607) [](https://rogerchern.github.io/DrivingGPT/)
* "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training". [](https://arxiv.org/abs/2412.13772)
* **GEM**: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control". [](https://arxiv.org/abs/2412.11198) [](https://vita-epfl.github.io/GEM.github.io/)
* **GaussianWorld**: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction". [](https://arxiv.org/abs/2412.04380) [](https://github.com/zuosc19/GaussianWorld)
* **Doe-1**: "Doe-1: Closed-Loop Autonomous Driving with Large World Model". [](https://arxiv.org/abs/2412.09627) [](https://wzzheng.net/Doe/) [](https://github.com/wzzheng/Doe)
* "Physical Informed Driving World Model". [](https://arxiv.org/abs/2412.08410) [](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html)
* **InfiniCube**: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models". [](https://arxiv.org/abs/2412.03934) [](https://research.nvidia.com/labs/toronto-ai/infinicube/)
* **InfinityDrive**: "InfinityDrive: Breaking Time Limits in Driving World Models". [](https://arxiv.org/abs/2412.01522) [](https://metadrivescape.github.io/papers_project/InfinityDrive/page.html)
* **ReconDreamer**: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration". [](https://arxiv.org/abs/2411.19548) [](https://recondreamer.github.io/)
* **Imagine-2-Drive**: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles". [](https://arxiv.org/abs/2411.10171) [](https://anantagrg.github.io/Imagine-2-Drive.github.io/)
* **DynamicCity**: "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes". [](https://arxiv.org/abs/2410.18084) [](https://dynamic-city.github.io) [](https://github.com/3DTopia/DynamicCity)
* **DriveDreamer4D**: "World Models Are Effective Data Machines for 4D Driving Scene Representation". [](https://arxiv.org/abs/2410.13571) [](https://drivedreamer4d.github.io/)
* **DOME**: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model". [](https://arxiv.org/abs/2410.10429) [](https://gusongen.github.io/DOME)
* **SSR**: "Does End-to-End Autonomous Driving Really Need Perception Tasks?". [](https://arxiv.org/abs/2409.18341) [](https://github.com/PeidongLi/SSR)
* "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models". [](https://arxiv.org/abs/2409.16663)
* **LatentDriver**: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving". [](https://arxiv.org/abs/2409.15730) [](https://github.com/Sephirex-X/LatentDriver)
* **RenderWorld**: "World Model with Self-Supervised 3D Label". [](https://arxiv.org/abs/2409.11356)
* **OccLLaMA**: "An Occupancy-Language-Action Generative World Model for Autonomous Driving". [](https://arxiv.org/abs/2409.03272)
* **DriveGenVLM**: "Real-world Video Generation for Vision Language Model based Autonomous Driving". [](https://arxiv.org/abs/2408.16647)
* **Drive-OccWorld**: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving". [](https://arxiv.org/abs/2408.14197)
* **CarFormer**: "Self-Driving with Learned Object-Centric Representations". [](https://arxiv.org/abs/2407.15843) [](https://kuis-ai.github.io/CarFormer/)
* **BEVWorld**: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space". [](https://arxiv.org/abs/2407.05679) [](https://github.com/zympsyche/BevWorld)
* **TOKEN**: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving". [](https://arxiv.org/abs/2407.00959)
* **UMAD**: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving". [](https://arxiv.org/abs/2406.06370)
* **SimGen**: "Simulator-conditioned Driving Scene Generation". [](https://arxiv.org/abs/2406.09386) [](https://metadriverse.github.io/simgen/)
* **AdaptiveDriver**: "Planning with Adaptive World Models for Autonomous Driving". [](https://arxiv.org/abs/2406.10714) [](https://arunbalajeev.github.io/world_models_planning/world_model_paper.html)
* **UnO**: "Unsupervised Occupancy Fields for Perception and Forecasting". [](https://arxiv.org/abs/2406.08691) [](https://waabi.ai/research/uno)
* **LAW**: "Enhancing End-to-End Autonomous Driving with Latent World Model". [](https://arxiv.org/abs/2406.08481) [](https://github.com/BraveGroup/LAW)
* **Delphi**: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation". [](https://arxiv.org/abs/2406.01349) [](https://github.com/westlake-autolab/Delphi)
* **OccSora**: "4D Occupancy Generation Models as World Simulators for Autonomous Driving". [](https://arxiv.org/abs/2405.20337) [](https://github.com/wzzheng/OccSora)
* **MagicDrive3D**: "Controllable 3D Generation for Any-View Rendering in Street Scenes". [](https://arxiv.org/abs/2405.14475) [](https://gaoruiyuan.com/magicdrive3d/)
* **Vista**: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability". [](https://arxiv.org/abs/2405.17398) [](https://github.com/OpenDriveLab/Vista)
* **CarDreamer**: "Open-Source Learning Platform for World Model based Autonomous Driving". [](https://arxiv.org/abs/2405.09111) [](https://github.com/ucd-dare/CarDreamer)
* **DriveSim**: "Probing Multimodal LLMs as World Models for Driving". [](https://arxiv.org/abs/2405.05956) [](https://github.com/sreeramsa/DriveSim)
* **DriveWorld**: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving". [](https://arxiv.org/abs/2405.04390)
* **LidarDM**: "Generative LiDAR Simulation in a Generated World". [](https://arxiv.org/abs/2404.02903) [](https://github.com/vzyrianov/lidardm)
* **SubjectDrive**: "Scaling Generative Data in Autonomous Driving via Subject Control". [](https://arxiv.org/abs/2403.19438) [](https://subjectdrive.github.io/)
* **DriveDreamer-2**: "LLM-Enhanced World Models for Diverse Driving Video Generation". [](https://arxiv.org/abs/2403.06845) [](https://drivedreamer2.github.io/)
* **Think2Drive**: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving". [](https://arxiv.org/abs/2402.16720)
* **MARL-CCE**: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model". [](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05085.pdf) [](https://github.com/qiaoguanren/MARL-CCE)
* **GenAD**: "Generalized Predictive Model for Autonomous Driving". [](https://arxiv.org/abs/2403.09630) [](https://github.com/OpenDriveLab/DriveAGI?tab=readme-ov-file#genad-dataset-opendv-youtube)
* **GenAD**: "Generative End-to-End Autonomous Driving". [](https://arxiv.org/abs/2402.11502) [](https://github.com/wzzheng/GenAD)
* **NeMo**: "Neural Volumetric World Models for Autonomous Driving". [](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02571.pdf)
* **ViDAR**: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving". [](https://arxiv.org/abs/2312.17655) [](https://github.com/OpenDriveLab/ViDAR)
* **Drive-WM**: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving". [](https://arxiv.org/abs/2311.17918) [](https://github.com/BraveGroup/Drive-WM)
* **Cam4DOCC**: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications". [](https://arxiv.org/abs/2311.17663) [](https://github.com/haomo-ai/Cam4DOcc)
* **Panacea**: "Panoramic and Controllable Video Generation for Autonomous Driving". [](https://arxiv.org/abs/2311.16813) [](https://panacea-ad.github.io/)
* **OccWorld**: "Learning a 3D Occupancy World Model for Autonomous Driving". [](https://arxiv.org/abs/2311.16038) [](https://github.com/wzzheng/OccWorld)
* **DrivingDiffusion**: "Layout-Guided multi-view driving scene video generation with latent diffusion model". [](https://arxiv.org/abs/2310.07771) [](https://github.com/shalfun/DrivingDiffusion)
* **SafeDreamer**: "Safe Reinforcement Learning with World Models". [](https://openreview.net/forum?id=tsE5HLYtYg) [](https://github.com/PKU-Alignment/SafeDreamer)
* **MagicDrive**: "Street View Generation with Diverse 3D Geometry Control". [](https://arxiv.org/abs/2310.02601) [](https://github.com/cure-lab/MagicDrive)
* **DriveDreamer**: "Towards Real-world-driven World Models for Autonomous Driving". [](https://arxiv.org/abs/2309.09777) [](https://github.com/JeffWang987/DriveDreamer)
* **SEM2**: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model". [](https://ieeexplore.ieee.org/abstract/document/10538211/)
* **Comparative Study of World Models**: "Comparative Study of World Models, NVAE-Based Hierarchical Models, and NoisyNet-Augmented Models in CarRacing-V2". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **Knowledge Graphs as World Models**: "Knowledge Graphs as World Models for Material-Aware Obstacle Handling in Autonomous Vehicles". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **Uncertainty Modeling**: "Uncertainty Modeling in Autonomous Vehicle Trajectory Prediction: A Comprehensive Survey". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://worldmodelbench.github.io/)
* **Divide and Merge**: "Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **RDAR**: "RDAR: Reward-Driven Agent Relevance Estimation for Autonomous Driving". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
---
## World Models for Embodied AI
### 1. Foundation Embodied World Models
- [⭐️] **Genie Envisioner**: "Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation". [](https://arxiv.org/abs/2508.05635) [](https://genie-envisioner.github.io/)
- [⭐️] **WoW**, "WoW: Towards a World omniscient World model Through Embodied Interaction". [](https://arxiv.org/abs/2509.22642) [](https://wow-world-model.github.io) [](https://github.com/wow-world-model/wow-world-model)
- **UnifoLM-WMA-0**, "UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family". [](https://unigen-x.github.io/unifolm-world-model-action.github.io/) [](https://github.com/unitreerobotics/unifolm-world-model-action/tree/main)
- [⭐️] **iVideoGPT**, "iVideoGPT: Interactive VideoGPTs are Scalable World Models". [](https://arxiv.org/abs/2405.15223) [](https://thuml.github.io/iVideoGPT/)
* **Direct Robot Configuration Space Construction**: "Direct Robot Configuration Space Construction using Convolutional Encoder-Decoders". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **ViPRA**: "ViPRA: Video Prediction for Robot Actions". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **ROPES**: "ROPES: Robotic Pose Estimation via Score-based Causal Representation Learning". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
### 2. World Models for Manipulation
- [⭐️] **FLARE**, "FLARE: Robot Learning with Implicit World Modeling". [](http://arxiv.org/abs/2505.15659) [](https://research.nvidia.com/labs/gear/flare/)
- [⭐️] **EnerVerse**, "EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation". [](http://arxiv.org/abs/2501.01895) [](https://sites.google.com/view/enerverse)
- [⭐️] **AgiBot-World**, "AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems". [](https://arxiv.org/abs/2503.06669) [](https://agibot-world.com/) [](https://github.com/OpenDriveLab/AgiBot-World)
- [⭐️] **DyWA**: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation". [](https://arxiv.org/abs/2503.16806) [](https://pku-epic.github.io/DyWA/)
- [⭐️] **TesserAct**, "TesserAct: Learning 4D Embodied World Models". [](https://arxiv.org/abs/2504.20995) [](https://tesseractworld.github.io/)
- [⭐️] **DreamGen**: "DreamGen: Unlocking Generalization in Robot Learning through Video World Models". [](https://arxiv.org/abs/2505.12705) [](https://github.com/nvidia/GR00T-dreams)
- [⭐️] **HiP**, "Compositional Foundation Models for Hierarchical Planning". [](http://arxiv.org/abs/2309.08587) [](https://hierarchical-planning-foundation-model.github.io/)
- **PAR**: "Physical Autoregressive Model for Robotic Manipulation without Action Pretraining". [](https://arxiv.org/abs/2508.09822) [](https://songzijian1999.github.io/PAR_ProjectPage/)
- **iMoWM**: "iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation". [](https://arxiv.org/abs/2510.07313) [](https://xingyoujun.github.io/imowm/)
- **WristWorld**: "WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation". [](https://arxiv.org/abs/2510.07313)
- "A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models". [](https://arxiv.org/abs/2510.02538)
- **EMMA**: "EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer". [](https://arxiv.org/abs/2509.22407)
- **PhysTwin**, "PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos". [](http://arxiv.org/abs/2503.17973) [](https://jianghanxiao.github.io/phystwin-web/) [](https://github.com/Jianghanxiao/PhysTwin)
- [⭐️] **KeyWorld**: "KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models". [](https://arxiv.org/abs/2509.21027)
- **World4RL**: "World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation". [](https://arxiv.org/abs/2509.19080)
- [⭐️] **SAMPO**: "SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models". [](https://arxiv.org/abs/2509.15536)
- **PhysicalAgent**: "PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models". [](https://arxiv.org/abs/2509.13903)
- "Empowering Multi-Robot Cooperation via Sequential World Models". [](https://arxiv.org/abs/2509.13095)
- [⭐️] "Learning Primitive Embodied World Models: Towards Scalable Robotic Learning". [](https://arxiv.org/pdf/2508.20840) [](https://qiaosun22.github.io/PrimitiveWorld/)
- [⭐️] **GWM**: "GWM: Towards Scalable Gaussian World Models for Robotic Manipulation". [](https://arxiv.org/abs/2508.17600) [](https://gaussian-world-model.github.io/)
- [⭐️] **Flow-as-Action**, "Latent Policy Steering with Embodiment-Agnostic Pretrained World Models". [](https://arxiv.org/abs/2507.13340)
- **EmbodieDreamer**: "EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling". [](https://arxiv.org/pdf/2507.05198) [](https://embodiedreamer.github.io/)
- **RoboScape**: "RoboScape: Physics-informed Embodied World Model". [](https://arxiv.org/abs/2506.23135) [](https://github.com/tsinghua-fib-lab/RoboScape)
- **FWM**, "Factored World Models for Zero-Shot Generalization in Robotic Manipulation". [](http://arxiv.org/abs/2202.05333)
- [⭐️] **ParticleFormer**: "ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation". [](https://arxiv.org/abs/2506.23126) [](https://particleformer.github.io/)
- **ManiGaussian++**: "ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model". [](https://arxiv.org/abs/2506.19842) [](https://github.com/April-Yz/ManiGaussian_Bimanual)
- **ReOI**: "Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control". [](https://arxiv.org/abs/2506.16565)
- **GAF**: "GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation". [](https://arxiv.org/abs/2506.14135) [](http://chaiying1.github.io/GAF.github.io/project_page/)
- "Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins". [](https://arxiv.org/abs/2506.13761) [](https://prompting-with-the-future.github.io/)
- "Time-Aware World Model for Adaptive Prediction and Control". [](https://arxiv.org/abs/2506.08441)
- [⭐️] **3DFlowAction**: "3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model". [](https://arxiv.org/abs/2506.06199)
- [⭐️] **ORV**: "ORV: 4D Occupancy-centric Robot Video Generation". [](https://arxiv.org/abs/2506.03079) [](https://github.com/OrangeSodahub/ORV) [](https://orangesodahub.github.io/ORV/)
- [⭐️] **WoMAP**: "WoMAP: World Models For Embodied Open-Vocabulary Object Localization". [](https://arxiv.org/abs/2506.01600)
- "Sparse Imagination for Efficient Visual World Model Planning". [](https://arxiv.org/abs/2506.01392)
- [⭐️] **OSVI-WM**: "OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation". [](https://arxiv.org/abs/2505.20425)
- [⭐️] **LaDi-WM**: "LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation". [](https://arxiv.org/abs/2505.11528)
- **FlowDreamer**: "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation". [](https://arxiv.org/abs/2505.10075) [](https://sharinka0715.github.io/FlowDreamer/)
- **PIN-WM**: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation". [](https://arxiv.org/abs/2504.16693)
- **RoboMaster**, "Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control". [](http://arxiv.org/abs/2506.01943) [](https://fuxiao0719.github.io/projects/robomaster/) [](https://github.com/KwaiVGI/RoboMaster)
- **ManipDreamer**: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance". [](https://arxiv.org/abs/2504.16464)
- [⭐️] **AdaWorld**: "AdaWorld: Learning Adaptable World Models with Latent Actions". [](https://arxiv.org/abs/2503.18938) [](https://adaptable-world-model.github.io/)
- "Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks". [](https://arxiv.org/abs/2503.12531) [](https://mkturkcan.github.io/suturingmodels/)
- [⭐️] **EVA**: "EVA: An Embodied World Model for Future Video Anticipation". [](https://arxiv.org/abs/2410.15461) [](https://sites.google.com/view/eva-publi)
- "Representing Positional Information in Generative World Models for Object Manipulation". [](https://arxiv.org/abs/2409.12005)
- **DexSim2Real$^2$**: "DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation". [](https://arxiv.org/abs/2409.08750)
- "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics". [](https://arxiv.org/abs/2406.10788) [](https://embodied-gaussians.github.io/)
- [⭐️] **LUMOS**: "LUMOS: Language-Conditioned Imitation Learning with World Models". [](https://arxiv.org/abs/2503.10370) [](http://lumos.cs.uni-freiburg.de/)
- [⭐️] "Object-Centric World Model for Language-Guided Manipulation". [](https://arxiv.org/abs/2503.06170)
- [⭐️] **DEMO^3**: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning". [](https://arxiv.org/abs/2503.01837) [](https://adrialopezescoriza.github.io/demo3/)
- "Strengthening Generative Robot Policies through Predictive World Modeling". [](https://arxiv.org/abs/2502.00622) [](https://computationalrobotics.seas.harvard.edu/GPC)
- **RoboHorizon**: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation". [](https://arxiv.org/abs/2501.06605)
- **Dream to Manipulate**: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination". [](https://arxiv.org/abs/2412.14957) [](https://leobarcellona.github.io/DreamToManipulate/)
- [⭐️] **RoboDreamer**: "RoboDreamer: Learning Compositional World Models for Robot Imagination". [](https://arxiv.org/abs/2404.12377) [](https://robovideo.github.io/)
- **ManiGaussian**: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation". [](https://arxiv.org/abs/2403.08321) [](https://guanxinglu.github.io/ManiGaussian/)
- [⭐️] **WHALE**: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making". [](https://arxiv.org/abs/2411.05619)
- [⭐️] **VisualPredicator**: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning". [](https://arxiv.org/abs/2410.23156)
- [⭐️] "Multi-Task Interactive Robot Fleet Learning with Visual World Models". [](https://arxiv.org/abs/2410.22689) [](https://ut-austin-rpl.github.io/sirius-fleet/)
- **PIVOT-R**: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation". [](https://arxiv.org/pdf/2410.10394)
- **Video2Action**, "Grounding Video Models to Actions through Goal Conditioned Exploration". [](http://arxiv.org/abs/2411.07223) [](https://video-to-action.github.io/) [](https://github.com/video-to-action/video-to-action-release)
- **Diffuser**, "Planning with Diffusion for Flexible Behavior Synthesis". [](http://arxiv.org/abs/2205.09991)
- **Decision Diffuser**, "Is Conditional Generative Modeling all you need for Decision-Making?". [](http://arxiv.org/abs/2211.15657)
- **Potential Based Diffusion Motion Planning**, "Potential Based Diffusion Motion Planning". [](http://arxiv.org/abs/2407.06169)
* **GRIM**: "GRIM: Task-Oriented Grasping with Conditioning on Generative Examples". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **World4Omni**: "World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **In-Context Policy Iteration**: "In-Context Policy Iteration for Dynamic Manipulation". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **HDFlow**: "HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Robotic Assembly". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Mobile Manipulation with Active Inference**: "Mobile Manipulation with Active Inference for Long-Horizon Rearrangement Tasks". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
### 3. World Models for Navigation
- [⭐️] **NWM**, "Navigation World Models". [](https://arxiv.org/abs/2412.03572) [](https://www.amirbar.net/nwm/)
- [⭐️] **MindJourney**: "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning". [](https://arxiv.org/abs/2507.12508) [](https://umass-embodied-agi.github.io/MindJourney)
* **Test-Time Scaling**: "Test-Time Scaling with World Models for Spatial Reasoning". [](https://arxiv.org/abs/2507.12508) [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://umass-embodied-agi.github.io/MindJourney/)
* **Scaling Inference-Time Search**: "Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **FalconWing**: "FalconWing: An Ultra-Light Fixed-Wing Platform for Indoor Aerial Applications". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Foundation Models as World Models**: "Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Geosteering Through the Lens of Decision Transformers**: "Geosteering Through the Lens of Decision Transformers: Toward Embodied Sequence Decision-Making". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Latent Weight Diffusion**: "Latent Weight Diffusion: Generating reactive policies instead of trajectories". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Abstract Sim2Real**: "Abstract Sim2Real through Approximate Information States". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **FLAM**: "FLAM: Scaling Latent Action Models with Factorization". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
- **NavMorph**: "NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments". [](https://arxiv.org/abs/2506.23468) [](https://github.com/Feliciaxyao/NavMorph)
- **Unified World Models**: "Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation". [](https://arxiv.org/abs/2510.08713) [[code](https://github.com/F1y1113/UniWM)]
- **RECON**, "Rapid Exploration for Open-World Navigation with Latent Goal Models". [](http://arxiv.org/abs/2104.05859) [](https://sites.google.com/view/recon-robot)
- **WMNav**: "WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation". [](https://arxiv.org/abs/2503.02247) [](https://b0b8k1ng.github.io/WMNav/)
- **NaVi-WM**, "Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model". [](https://arxiv.org/abs/2510.23509) [](https://sites.google.com/view/NaviWM)
- **AIF**, "Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation". [](https://arxiv.org/abs/2510.23258)
- "Kinodynamic Motion Planning for Mobile Robot Navigation across Inconsistent World Models". [](https://arxiv.org/abs/2509.26339)
- "World Model Implanting for Test-time Adaptation of Embodied Agents". [](https://arxiv.org/abs/2509.03956)
- "Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation". [](https://arxiv.org/abs/2508.06990)
- [⭐️] **Persistent Embodied World Models**, "Learning 3D Persistent Embodied World Models". [](https://arxiv.org/abs/2505.05495)
- "Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation". [](https://arxiv.org/abs/2503.20425)
- **X-MOBILITY**: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling". [](https://arxiv.org/abs/2410.17491)
- **MWM**, "Masked World Models for Visual Control". [](http://arxiv.org/abs/2206.14244) [](https://sites.google.com/view/mwm-rl) [](https://github.com/younggyoseo/MWM)
### 4. World Models for Locomotion
Locomotion:
- [⭐️] **Ego-VCP**, "Ego-Vision World Model for Humanoid Contact Planning". [](https://arxiv.org/abs/2510.11682) [](https://ego-vcp.github.io/) [](https://github.com/HybridRobotics/Ego-VCP)
- [⭐️] **RWM-O**, "Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator". [](https://arxiv.org/abs/2504.16680)
- [⭐️] **DWL**: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning". [](https://arxiv.org/abs/2408.14472)
- **HRSSM**: "Learning Latent Dynamic Robust Representations for World Models". [](https://arxiv.org/abs/2405.06263) [](https://github.com/bit1029public/HRSSM)
- **WMP**: "World Model-based Perception for Visual Legged Locomotion". [](https://arxiv.org/abs/2409.16784) [](https://wmp-loco.github.io/)
- **TrajWorld**, "Trajectory World Models for Heterogeneous Environments". [](https://arxiv.org/abs/2502.01366) [](https://github.com/thuml/TrajWorld)
- **Puppeteer**: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers". [](https://arxiv.org/abs/2405.18418) [](https://nicklashansen.com/rlpuppeteer)
- **ProTerrain**: "ProTerrain: Probabilistic Physics-Informed Rough Terrain World Modeling". [](https://arxiv.org/abs/2510.19364)
- **Occupancy World Model**, "Occupancy World Model for Robots". [](https://arxiv.org/abs/2505.05512)
- [⭐️] "Accelerating Model-Based Reinforcement Learning with State-Space World Models". [](https://arxiv.org/abs/2502.20168)
- [⭐️] "Learning Humanoid Locomotion with World Model Reconstruction". [](https://arxiv.org/abs/2502.16230)
- [⭐️] **Robotic World Model**: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics". [](https://arxiv.org/abs/2501.10100)
Loco-Manipulation:
- [⭐️] **1X World Model**, "1X World Model". [](https://www.1x.tech/discover/1x-world-model)
- [⭐️] **GR00T-Dreams**, "Dream Come True — NVIDIA Isaac GR00T-Dreams Advances Robot Training With Synthetic Data and Neural Simulation". [](https://blogs.nvidia.com/blog/nvidia-gtc-washington-dc-2025-news/#gr00t-dreams)
- **Humanoid World Models**: "Humanoid World Models: Open World Foundation Models for Humanoid Robotics". [](https://arxiv.org/abs/2506.01182)
- **EgoAgent**, "EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds". [](https://arxiv.org/abs/2502.05857)
- **D^2PO**, "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning". [](https://arxiv.org/abs/2503.10480)
- **COMBO**: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation". [](https://arxiv.org/abs/2404.10775) [](https://vis-www.cs.umass.edu/combo/) [](https://github.com/UMass-Foundation-Model/COMBO)
* **Scalable Humanoid Whole-Body Control**: "Scalable Humanoid Whole-Body Control via Differentiable Neural Network Dynamics". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **HuWo**: "HuWo: Building Physical Interaction World Models for Humanoid Robot Locomotion". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **Bridging the Sim-to-Real Gap**: "Bridging the Sim-to-Real Gap in Humanoid Dynamics via Learned Nonlinear Operators". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
### 5. World Models x VLAs
Unifying World Models and VLAs in one model:
- [⭐️] **CoT-VLA**: "CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models". [](https://arxiv.org/abs/2503.22020) [](https://cot-vla.github.io/)
- [⭐️] **UP-VLA**, "UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent". [](https://arxiv.org/abs/2501.18867) [](https://github.com/CladernyJorn/UP-VLA)
- [⭐️] **VPP**, "Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations". [](https://arxiv.org/abs/2412.14803) [](https://video-prediction-policy.github.io)
- [⭐️] **FLARE**: "FLARE: Robot Learning with Implicit World Modeling". [](https://arxiv.org/abs/2505.15659) [](https://github.com/NVIDIA/Isaac-GR00T) [](https://research.nvidia.com/labs/gear/flare)
- [⭐️] **MinD**: "MinD: Unified Visual Imagination and Control via Hierarchical World Models". [](https://arxiv.org/abs/2506.18897) [](https://manipulate-in-dream.github.io/)
- [⭐️] **DreamVLA**, "DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge". [](https://arxiv.org/abs/2507.04447) [](https://github.com/Zhangwenyao1/DreamVLA) [](https://zhangwenyao1.github.io/DreamVLA/)
- [⭐️] **WorldVLA**: "WorldVLA: Towards Autoregressive Action World Model". [](https://arxiv.org/abs/2506.21539) [](https://github.com/alibaba-damo-academy/WorldVLA)
- **3D-VLA**: "3D-VLA: A 3D Vision-Language-Action Generative World Model". [](https://arxiv.org/abs/2403.09631)
- **LAWM**: "Latent Action Pretraining Through World Modeling". [](https://arxiv.org/abs/2509.18428) [](https://github.com/baheytharwat/lawm)
- [⭐️] **UniVLA**: "UniVLA: Unified Vision-Language-Action Model". [](https://arxiv.org/abs/2506.19850) [](https://robertwyq.github.io/univla.github.)
- [⭐️] **dVLA**, "dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought". [](https://arxiv.org/abs/2509.25681)
- [⭐️] **Vidar**, "Vidar: Embodied Video Diffusion Model for Generalist Manipulation". [](https://arxiv.org/pdf/2507.12898)
- [⭐️] **UD-VLA**, "Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process". [](https://arxiv.org/abs/2511.01718) [](https://github.com/OpenHelix-Team/UD-VLA) [](https://irpn-eai.github.io/UD-VLA.github.io/)
- **Goal-VLA**: "Goal-VLA: Image-Generative VLMs as Object-Centric World Models Empowering Zero-shot Robot Manipulation". [](https://arxiv.org/abs/2506.23919) [](https://nus-lins-lab.github.io/goalvlaweb/)
Combining World Models and VLAs:
- [⭐️] **Ctrl-World**: "Ctrl-World: A Controllable Generative World Model for Robot Manipulation". [](https://arxiv.org/pdf/2510.10125) [](https://ctrl-world.github.io/) [](https://github.com/Robert-gyj/Ctrl-World)
- **VLA-RFT**: "VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators". [](https://arxiv.org/abs/2510.00406)
- **World-Env**: "World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training". [](https://arxiv.org/abs/2509.24948)
- [⭐️] **Self-Improving Embodied Foundation Models**, "Self-Improving Embodied Foundation Models". [](https://arxiv.org/abs/2509.15155)
- **GigaBrain-0**, "GigaBrain-0: A World Model-Powered Vision-Language-Action Model". [](https://arxiv.org/abs/2510.19430) [](https://gigabrain0.github.io/)
* **NinA**: "NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Ada-Diffuser**: "Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Steering Diffusion Policies**: "Steering Diffusion Policies with Value-Guided Denoising". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **SPUR**: "SPUR: Scaling Reward Learning from Human Demonstrations". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **A Smooth Sea Never Made a Skilled SAILOR**: "A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **RADI**: "RADI: LLMs as World Models for Robotic Action Decomposition and Imagination". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
- **WMPO**: "WMPO: World Model-based Policy Optimization for Vision-Language-Action Models". [](https://arxiv.org/abs/2511.09515) [](https://wm-po.github.io)
### 6. World Models x Policy Learning
This subsection covers general policy learning methods in embodied intelligence that leverage world models.
- [⭐️] **UWM**, "Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets". [](https://arxiv.org/abs/2504.02792) [](https://weirdlabuw.github.io/uwm/)
- [⭐️] **UVA**, "Unified Video Action Model". [](https://arxiv.org/abs/2503.00200) [](https://unified-video-action-model.github.io/) [](https://github.com/ShuangLI59/unified_video_action)
- **DiWA**, "DiWA: Diffusion Policy Adaptation with World Models". [](https://arxiv.org/abs/2508.03645) [](https://diwa.cs.uni-freiburg.de)
- [⭐️] **Dreamerv4**, "Training Agents Inside of Scalable World Models". [](https://arxiv.org/abs/2509.24527) [](https://danijar.com/project/dreamer4/)
* **Latent Action Learning Requires Supervision**: "Latent Action Learning Requires Supervision in the Presence of Distractors". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **Beyond Experience**: "Beyond Experience: Fictive Learning as an Inherent Advantage of World Models". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Robotic World Model**: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Sim-to-Real Contact-Rich Pivoting**: "Sim-to-Real Contact-Rich Pivoting via Optimization-Guided RL with Vision and Touch". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
* **Hierarchical Task Environments**: "Hierarchical Task Environments as the Next Frontier for Embodied World Models in Robot Soccer". [](https://openreview.net/group?id=NeurIPS.cc/2025/Workshop/EWM#tab-accept-oral) [](https://embodied-world-models.github.io/)
### 7. World Models for Policy Evaluation
Real-world policy evaluation is expensive and noisy. The promise of world models is that, by accurately capturing environment dynamics, they can serve as surrogate evaluation environments whose results correlate highly with policy performance in the real world. Before world models, simulators played this role:
- [⭐️] **Simpler**, "Evaluating Real-World Robot Manipulation Policies in Simulation". [](https://arxiv.org/abs/2405.05941) [](https://github.com/simpler-env/SimplerEnv)
World models as policy evaluators:
- [⭐️] **WorldGym**, "WorldGym: Evaluating Robot Policies in a World Model". [](https://arxiv.org/abs/2506.00613) [](https://world-model-eval.github.io)
- [⭐️] **WorldEval**: "WorldEval: World Model as Real-World Robot Policies Evaluator". [](https://arxiv.org/abs/2505.19017) [](https://worldeval.github.io)
- [⭐️] **WoW!**: "WOW!: World Models in a Closed-Loop World". [](https://openreview.net/pdf/e6aed49462d9e080633e727436cc95a0a8d61c57.pdf) [](https://wow202509.github.io/WOW_project_page/)
- **Cosmos-Surg-dVRK**: "Cosmos-Surg-dVRK: World Foundation Model-based Automated Online Evaluation of Surgical Robot Policy Learning". [](https://arxiv.org/abs/2510.16240)
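
The "high correlation" criterion mentioned at the top of this subsection can be made concrete with a small sketch. The snippet below is only an illustration, not the protocol of any listed paper: it assumes you have a success rate per policy from real-world trials and from rollouts in a surrogate (simulator or world model), and it computes the Pearson correlation and checks whether the surrogate preserves the policy ranking. The function names and all numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranking(scores):
    """Indices of the policies ordered from worst to best score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Hypothetical success rates for four policies (made-up numbers).
real_world = [0.20, 0.45, 0.60, 0.85]   # measured on a physical robot
surrogate = [0.25, 0.40, 0.70, 0.80]    # measured inside the world model

r = pearson(real_world, surrogate)
same_order = ranking(real_world) == ranking(surrogate)
print(f"Pearson r = {r:.3f}, ranking preserved: {same_order}")
```

A surrogate can be useful for model selection even when absolute success rates are off, as long as the ranking of policies is preserved, which is why the sketch reports both quantities.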
---
## World Models for Science
Natural Science:
- [⭐️] **CellFlux**, "CellFlux: Simulating Cellular Morphology Changes via Flow Matching". [](https://arxiv.org/abs/2502.09775) [](https://yuhui-zh15.github.io/CellFlux/)
- **CheXWorld**, "CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning". [](http://arxiv.org/abs/2504.13820) [](https://github.com/LeapLabTHU/CheXWorld)
- **EchoWorld**: "EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance". [](https://arxiv.org/abs/2504.13065) [](https://github.com/LeapLabTHU/EchoWorld)
- **ODesign**, "ODesign: A World Model for Biomolecular Interaction Design". [](https://arxiv.org/pdf/2510.22304) [](https://odesign.lglab.ac.cn)
- [⭐️] **SFP**, "Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models". [](https://arxiv.org/abs/2510.04020)
- **Xray2Xray**, "Xray2Xray: World Model from Chest X-rays with Volumetric Context". [](https://arxiv.org/abs/2506.19055)
- [⭐️] **Medical World Model**: "Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning". [](https://arxiv.org/abs/2506.02327)
- **Surgical Vision World Model**, "Surgical Vision World Model". [](https://arxiv.org/abs/2503.02904)
Social Science:
- **Social World Models**, "Social World Models". [](https://arxiv.org/abs/2509.00559)
- "Social World Model-Augmented Mechanism Design Policy Learning". [](https://arxiv.org/abs/2510.19270)
- **SocioVerse**, "SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users". [](http://arxiv.org/abs/2504.10157) [](https://github.com/FudanDISC/SocioVerse)
* **Effectively Designing 2-Dimensional Sequence Models**: "Effectively Designing 2-Dimensional Sequence Models for Multivariate Time Series". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **A Virtual Reality-Integrated System**: "A Virtual Reality-Integrated System for Behavioral Analysis in Neurological Decline". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **TwinMarket**: "TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **Latent Representation Encoding**: "Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **Reconstructing Dynamics**: "Reconstructing Dynamics from Steady Spatial Patterns with Partial Observations". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)
* **SP: Learning Physics from Sparse Observations**: "SP: Learning Physics from Sparse Observations — Three Pitfalls of PDE-Constrained Diffusion Models". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **SP: Continuous Autoregressive Generation**: "SP: Continuous Autoregressive Generation with Mixture of Gaussians". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **EquiReg**: "EquiReg: Symmetry-Driven Regularization for Physically Grounded Diffusion-based Inverse Solvers". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **Neural Modular World Model**: "Neural Modular World Model". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **Bidding for Influence**: "Bidding for Influence: Auction-Driven Diffusion Image Generation". [](https://openreview.net/group?id=ICML.cc/2025/Workshop/World_Models#tab-accept) [](https://physical-world-modeling.github.io/)
* **PINT**: "PINT: Physics-Informed Neural Time Series Models with Applications to Long-term Inference on WeatherBench 2m-Temperature Data". [](https://openreview.net/group?id=ICLR.cc/2025/Workshop/World_Models#tab-accept) [](https://sites.google.com/view/worldmodel-iclr2025/accepted-papers)