{"id":13440555,"url":"https://github.com/LMD0311/Awesome-World-Model","last_synced_at":"2025-03-20T10:31:24.897Z","repository":{"id":215072193,"uuid":"738046322","full_name":"LMD0311/Awesome-World-Model","owner":"LMD0311","description":"Collect some World Models for Autonomous Driving papers. ","archived":false,"fork":false,"pushed_at":"2024-05-22T13:51:08.000Z","size":74,"stargazers_count":204,"open_issues_count":0,"forks_count":3,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-05-22T14:57:29.387Z","etag":null,"topics":["artificial-intelligence","artificial-intelligence-algorithms","autonomous-driving","autonomous-vehicles","awesome","computer-vision","deep-learning","future-predict","world-model"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LMD0311.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-02T09:38:21.000Z","updated_at":"2024-05-28T16:48:44.396Z","dependencies_parsed_at":"2024-01-18T16:26:50.963Z","dependency_job_id":"7acfc077-bc63-41d0-b7c7-dc11ccd71d63","html_url":"https://github.com/LMD0311/Awesome-World-Model","commit_stats":null,"previous_names":["lmd0311/awesome-world-model"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LMD0311%2FAwesome-World-Model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LMD0311%2FAwesome-World-Model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LMD0311%2FAwesome-W
orld-Model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LMD0311%2FAwesome-World-Model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LMD0311","download_url":"https://codeload.github.com/LMD0311/Awesome-World-Model/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":213296150,"owners_count":15566157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","artificial-intelligence-algorithms","autonomous-driving","autonomous-vehicles","awesome","computer-vision","deep-learning","future-predict","world-model"],"created_at":"2024-07-31T03:01:23.857Z","updated_at":"2025-03-20T10:31:24.890Z","avatar_url":"https://github.com/LMD0311.png","language":null,"funding_links":[],"categories":["Others","Tools","Other Lists","💻 Open-Source Projects","🙏 Acknowledgements","Acknowledgements"],"sub_categories":["Australia","TeX Lists","Papers","🧪 Frontier Labs and Teams","10. 
Memory in World Model"],"readme":"# Awesome World Models for Autonomous Driving [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) [![arXiv](https://img.shields.io/badge/Arxiv-2502.10498-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2502.10498)\n\nThis repo records, tracks, and benchmarks recent World Model methods (for Autonomous Driving or Robotics), as a supplement to our [**survey**](https://arxiv.org/abs/2502.10498).\n\nIf you notice any missing papers, **feel free to [*create pull requests*](https://github.com/LMD0311/Awesome-World-Model/blob/main/ContributionGuidelines.md), or [*open issues*](https://github.com/LMD0311/Awesome-World-Model/issues/new)**. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣\n\nIf you find this repository useful, please consider **giving us a star** 🌟 and [**citing our survey**](https://github.com/LMD0311/Awesome-World-Model#citation).\n\n## Workshop \u0026 Challenge\n\n- [`CVPR 2024 Workshop \u0026 Challenge | OpenDriveLab`](https://opendrivelab.com/challenge2024/#predictive_world_model) Track #4: Predictive World Model.\n  \u003e Serving as an abstract spatio-temporal representation of reality, the world model can predict future states based on the current state. The learning process of world models has the potential to elevate a pre-trained foundation model to the next level. Given vision-only inputs, the neural network outputs point clouds in the future to testify its predictive capability of the world.\n  \n- [`CVPR 2023 Workshop on Autonomous Driving`](https://cvpr2023.wad.vision/) CHALLENGE 3: ARGOVERSE CHALLENGES, [3D Occupancy Forecasting](https://eval.ai/web/challenges/challenge-page/1977/overview) using the [Argoverse 2 Sensor Dataset](https://www.argoverse.org/av2.html#sensor-link). 
Predict the spacetime occupancy of the world for the next 3 seconds.\n\n## Papers\n\n### World model original paper\n\n- Using Occupancy Grids for Mobile Robot Perception and Navigation [[paper](http://www.sci.brooklyn.cuny.edu/~parsons/courses/3415-fall-2011/papers/elfes.pdf)]\n\n### Technical blog or video\n\n- **`Yann LeCun`**: A Path Towards Autonomous Machine Intelligence [[paper](https://openreview.net/pdf?id=BZ5a1r-kVsf)] [[Video](https://www.youtube.com/watch?v=OKkEdTchsiE)]\n- **`CVPR'23 WAD`** Keynote - Ashok Elluswamy, Tesla [[Video](https://www.youtube.com/watch?v=6x-Xb_uT7ts)]\n- **`Wayve`** Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [[blog](https://wayve.ai/thinking/introducing-gaia1/)] \n  \u003e World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.\n  \n\n### Survey\n\n- A survey on multimodal large language models for autonomous driving. **`WACVW 2024`** [[Paper](https://arxiv.org/abs/2311.12320)] [[Code](https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving)]\n- World Models: The Safety Perspective. **`ISSREW`** [[Paper](https://arxiv.org/abs/2411.07690)]\n- The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey. **`arXiv 2025.02`** [[Paper](https://arxiv.org/abs/2502.10498)]\n- A Survey of World Models for Autonomous Driving. **`arXiv 2025.01`** [[Paper](https://arxiv.org/abs/2501.11260)]\n- Generative Physical AI in Vision: A Survey. 
**`arXiv 2025.01`** [[Paper](https://arxiv.org/abs/2501.10928)] [[Code](https://github.com/BestJunYu/Awesome-Physics-aware-Generation)]\n- Understanding World or Predicting Future? A Comprehensive Survey of World Models. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.14499)]\n- Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.02914)]\n- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI. **`arXiv 2024.7`** [[Paper](https://arxiv.org/abs/2407.06886)] [[Code](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)]\n- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond. **`arXiv 2024.5`** [[Paper](https://arxiv.org/abs/2405.03520)] [[Code](https://github.com/GigaAI-research/General-World-Models-Survey)]\n- World Models for Autonomous Driving: An Initial Survey. **`arXiv 2024.3`** [[Paper](https://arxiv.org/abs/2403.02622)]\n\n### 2025\n- [**PIWM**] Dream to Drive with Predictive Individual World Model.  **`TIV 2025`** [[Paper](https://arxiv.org/abs/2501.16733)]  [[Code](https://github.com/gaoyinfeng/PIWM)]\n- **DriveDreamer4D**: World Models Are Effective Data Machines for 4D Driving Scene Representation. **`CVPR 2025`** [[Paper](https://arxiv.org/abs/2410.13571)] [[Project Page](https://drivedreamer4d.github.io/)]\n- **GaussianWorld**: Gaussian World Model for Streaming 3D Occupancy Prediction. **`CVPR 2025`** [[Paper](https://arxiv.org/abs/2412.10373)] [[Code](https://github.com/zuosc19/GaussianWorld)]\n- **ReconDreamer**: Crafting World Models for Driving Scene Reconstruction via Online Restoration. **`CVPR 2025`** [[Paper](https://arxiv.org/abs/2411.19548)] [[Code](https://github.com/GigaAI-research/ReconDreamer)]\n- **MaskGWM**: A Generalizable Driving World Model with Video Mask Reconstruction.  
**`CVPR 2025`** [[Paper](https://arxiv.org/abs/2502.11663)] [[Code](https://github.com/SenseTime-FVG/OpenDWM)]\n- **UniScene**: Unified Occupancy-centric Driving Scene Generation. **`CVPR 2025`** [[Paper](https://arxiv.org/abs/2412.05435)] [[Project](https://arlo0o.github.io/uniscene/)]\n- **GEM**: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control. **`CVPR 2025`** [[Paper](https://arxiv.org/abs/2412.11198)] [[Project](https://vita-epfl.github.io/GEM.github.io/)]\n- **DynamicCity**: Large-Scale LiDAR Generation from Dynamic Scenes.  **`ICLR 2025`** [[Paper](https://arxiv.org/abs/2410.18084)] [[Code](https://github.com/3DTopia/DynamicCity)]\n- **AdaWM**: Adaptive World Model based Planning for Autonomous Driving.  **`ICLR 2025`** [[Paper](https://arxiv.org/abs/2501.13072)]\n- **OccProphet**: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework.  **`ICLR 2025`** [[Paper](https://arxiv.org/abs/2502.15180)] [[Code](https://github.com/JLChen-C/OccProphet)]\n- [**PreWorld**] Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving.  **`ICLR 2025`** [[Paper](https://arxiv.org/abs/2502.07309)] [[Code](https://github.com/getterupper/PreWorld)]\n- [**SSR**] Does End-to-End Autonomous Driving Really Need Perception Tasks? **`ICLR 2025`** [[Paper](https://arxiv.org/abs/2409.18341)] [[Code](https://github.com/PeidongLi/SSR)]\n- **Occ-LLM**: Enhancing Autonomous Driving with Occupancy-Based Large Language Models.  **`ICRA 2025`** [[Paper](https://arxiv.org/abs/2502.06419)]\n- [**UniFuture**] Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception.  
**`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.13587)] [[Code](https://github.com/dk-liang/UniFuture)] [[Project](https://dk-liang.github.io/UniFuture/)]\n- **HERMES**: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation.  **`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.14729)] [[Code](https://github.com/LMD0311/HERMES)] [[Project](https://lmd0311.github.io/HERMES/)]\n- **DiST-4D**: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation.  **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.15208)] [[Project](https://royalmelon0505.github.io/DiST-4D/)]\n- [**EOT-WM**] Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space. **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.09215)]\n- [**T^3Former**] Temporal Triplane Transformers as Occupancy World Models. **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.07338)]\n- **AVD2**: Accident Video Diffusion for Accident Video Description. **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.14801)] [[Project](https://an-answer-tree.github.io/)]\n- **VaViM and VaVAM**: Autonomous Driving through Video Generative Modeling.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.15672)] [[Code](https://github.com/valeoai/VideoActionModel)]\n- **Dream to Drive**: Model-Based Vehicle Control Using Analytic World Models.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.10012)]\n- **FUTURIST**: Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers. **`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.08303)] [[Code](https://github.com/Sta8is/FUTURIST)]\n- **AD-L-JEPA**: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data.  
**`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.04969)] [[Code](https://github.com/HaoranZhuExplorer/AD-L-JEPA-Release)]\n\n### 2024\n- [**SEM2**] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. **`TITS`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10538211/)]\n- **Vista**: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. **`NeurIPS 2024`** [[Paper](https://arxiv.org/abs/2405.17398)] [[Code](https://github.com/OpenDriveLab/Vista)]\n- **DrivingDojo Dataset**: Advancing Interactive and Knowledge-Enriched Driving World Model. **`NeurIPS 2024`** [[Paper](https://arxiv.org/abs/2410.10738)] [[Project](https://drivingdojo.github.io/)]\n- **Think2Drive**: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. **`ECCV 2024`** [[Paper](https://arxiv.org/abs/2402.16720)]\n- [**MARL-CCE**] Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. **`ECCV 2024`** [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/05085.pdf)] [[Code](https://github.com/qiaoguanren/MARL-CCE)]\n- **DriveDreamer**: Towards Real-world-driven World Models for Autonomous Driving. **`ECCV 2024`** [[Paper](https://arxiv.org/abs/2309.09777)] [[Code](https://github.com/JeffWang987/DriveDreamer)]\n- **GenAD**: Generative End-to-End Autonomous Driving. **`ECCV 2024`** [[Paper](https://arxiv.org/abs/2402.11502)] [[Code](https://github.com/wzzheng/GenAD)]\n- **OccWorld**: Learning a 3D Occupancy World Model for Autonomous Driving. **`ECCV 2024`** [[Paper](https://arxiv.org/abs/2311.16038)] [[Code](https://github.com/wzzheng/OccWorld)]\n- [**NeMo**] Neural Volumetric World Models for Autonomous Driving. **`ECCV 2024`** [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02571.pdf)]\n- **CarFormer**: Self-Driving with Learned Object-Centric Representations. 
**`ECCV 2024`** [[Paper](https://arxiv.org/abs/2407.15843)] [[Code](https://kuis-ai.github.io/CarFormer/)]\n- [**GUMP**] Solving Motion Planning Tasks with a Scalable Generative Model. **`ECCV 2024`** [[Paper](https://arxiv.org/abs/2407.02797)] [[Code](https://github.com/HorizonRobotics/GUMP/)]\n- **DrivingDiffusion**: Layout-Guided multi-view driving scene video generation with latent diffusion model. **`ECCV 2024`** [[Paper](https://arxiv.org/abs/2310.07771)] [[Code](https://github.com/shalfun/DrivingDiffusion)]\n- **3D-VLA**: A 3D Vision-Language-Action Generative World Model.  **`ICML 2024`** [[Paper](https://arxiv.org/abs/2403.09631)]\n- [**ViDAR**] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. **`CVPR 2024`** [[Paper](https://arxiv.org/abs/2312.17655)] [[Code](https://github.com/OpenDriveLab/ViDAR)]\n- [**GenAD**] Generalized Predictive Model for Autonomous Driving. **`CVPR 2024`** [[Paper](https://arxiv.org/abs/2403.09630)] [[Data](https://github.com/OpenDriveLab/DriveAGI?tab=readme-ov-file#genad-dataset-opendv-youtube)]\n- **Cam4DOCC**: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. **`CVPR 2024`** [[Paper](https://arxiv.org/abs/2311.17663)] [[Code](https://github.com/haomo-ai/Cam4DOcc)]\n- [**Drive-WM**] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. **`CVPR 2024`** [[Paper](https://arxiv.org/abs/2311.17918)] [[Code](https://github.com/BraveGroup/Drive-WM)]\n- **DriveWorld**: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. **`CVPR 2024`** [[Paper](https://arxiv.org/abs/2405.04390)]\n- **Panacea**: Panoramic and Controllable Video Generation for Autonomous Driving. 
**`CVPR 2024`** [[Paper](https://arxiv.org/abs/2311.16813)] [[Code](https://panacea-ad.github.io/)]\n- **UnO**: Unsupervised Occupancy Fields for Perception and Forecasting. **`CVPR 2024`** [[Paper](https://arxiv.org/abs/2406.08691)] [[Code](https://waabi.ai/research/uno)]\n- **MagicDrive**: Street View Generation with Diverse 3D Geometry Control. **`ICLR 2024`** [[Paper](https://arxiv.org/abs/2310.02601)] [[Code](https://github.com/cure-lab/MagicDrive)]\n- **Copilot4D**: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. **`ICLR 2024`** [[Paper](https://arxiv.org/abs/2311.01017)]\n- **SafeDreamer**: Safe Reinforcement Learning with World Models. **`ICLR 2024`** [[Paper](https://openreview.net/forum?id=tsE5HLYtYg)] [[Code](https://github.com/PKU-Alignment/SafeDreamer)]\n- **DINO-Foresight**: Looking into the Future with DINO. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.11673)] [[Code](https://github.com/Sta8is/DINO-Foresight)]\n- **DrivingWorld**: Constructing World Model for Autonomous Driving via Video GPT. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.19505)] [[Code](https://github.com/YvanYin/DrivingWorld)]\n- **DrivingGPT**: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.18607)] [[Project](https://rogerchern.github.io/DrivingGPT/)]\n- An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.13772)]\n- **Doe-1**: Closed-Loop Autonomous Driving with Large World Model. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.09627)] [[Code](https://github.com/wzzheng/Doe)]\n- [**DrivePhysica**] Physical Informed Driving World Model. 
**`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.08410)] [[Code](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html)]\n- **Terra** **ACT-Bench**: Towards Action Controllable World Models for Autonomous Driving. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.05337)] [[Code](https://github.com/turingmotors/ACT-Bench)] [[Project](https://turingmotors.github.io/actbench/)] [[Hugging Face](https://huggingface.co/turing-motors/Terra)] \n- **UniMLVG**: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.04842)] [[Project](https://sensetime-fvg.github.io/UniMLVG/)] [[Code](https://github.com/SenseTime-FVG/OpenDWM)]\n- **HoloDrive**: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.01407)]\n- **InfinityDrive**: Breaking Time Limits in Driving World Models. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.01522)] [[Project Page](https://metadrivescape.github.io/papers_project/InfinityDrive/page.html)]\n- Generating Out-Of-Distribution Scenarios Using Language Models. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.16554)]\n- **Imagine-2-Drive**: High-Fidelity World Modeling in CARLA for Autonomous Vehicles. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.10171)] [[Project Page](https://anantagrg.github.io/Imagine-2-Drive.github.io/)]\n- **WorldSimBench**: Towards Video Generation Models as World Simulator. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.18072)] [[Project Page](https://iranqin.github.io/WorldSimBench.github.io/)]\n- **DOME**: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model. 
**`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.10429)] [[Project Page](https://gusongen.github.io/DOME)]\n- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.16663)]\n- [**LatentDriver**] Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.15730)] [[Code](https://github.com/Sephirex-X/LatentDriver)]\n- **RenderWorld**: World Model with Self-Supervised 3D Label. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.11356)]\n- **OccLLaMA**: An Occupancy-Language-Action Generative World Model for Autonomous Driving. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.03272)]\n- **DriveGenVLM**: Real-world Video Generation for Vision Language Model based Autonomous Driving. **`arXiv 2024.8`** [[Paper](https://arxiv.org/abs/2408.16647)]\n- [**Drive-OccWorld**] Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving. **`arXiv 2024.8`** [[Paper](https://arxiv.org/abs/2408.14197)]\n- **BEVWorld**: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space. **`arXiv 2024.7`** [[Paper](https://arxiv.org/abs/2407.05679)] [[Code](https://github.com/zympsyche/BevWorld)]\n- [**TOKEN**] Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. **`arXiv 2024.7`** [[Paper](https://arxiv.org/abs/2407.00959)]\n- **UMAD**: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.06370)]\n- **SimGen**: Simulator-conditioned Driving Scene Generation. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.09386)] [[Code](https://metadriverse.github.io/simgen/)]\n- [**AdaptiveDriver**] Planning with Adaptive World Models for Autonomous Driving. 
**`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.10714)] [[Code](https://arunbalajeev.github.io/world_models_planning/world_model_paper.html)]\n- [**LAW**] Enhancing End-to-End Autonomous Driving with Latent World Model. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.08481)] [[Code](https://github.com/BraveGroup/LAW)]\n- [**Delphi**] Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.01349)] [[Code](https://github.com/westlake-autolab/Delphi)]\n- **OccSora**: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. **`arXiv 2024.5`** [[Paper](https://arxiv.org/abs/2405.20337)] [[Code](https://github.com/wzzheng/OccSora)]\n- **MagicDrive3D**: Controllable 3D Generation for Any-View Rendering in Street Scenes. **`arXiv 2024.5`** [[Paper](https://arxiv.org/abs/2405.14475)] [[Code](https://gaoruiyuan.com/magicdrive3d/)]\n- **CarDreamer**: Open-Source Learning Platform for World Model based Autonomous Driving. **`arXiv 2024.5`** [[Paper](https://arxiv.org/abs/2405.09111)] [[Code](https://github.com/ucd-dare/CarDreamer)]\n- [**DriveSim**] Probing Multimodal LLMs as World Models for Driving. **`arXiv 2024.5`** [[Paper](https://arxiv.org/abs/2405.05956)] [[Code](https://github.com/sreeramsa/DriveSim)]\n- **LidarDM**: Generative LiDAR Simulation in a Generated World. **`arXiv 2024.4`** [[Paper](https://arxiv.org/abs/2404.02903)] [[Code](https://github.com/vzyrianov/lidardm)]\n- **SubjectDrive**: Scaling Generative Data in Autonomous Driving via Subject Control. **`arXiv 2024.3`** [[Paper](https://arxiv.org/abs/2403.19438)] [[Project](https://subjectdrive.github.io/)]\n- **DriveDreamer-2**: LLM-Enhanced World Models for Diverse Driving Video Generation. 
**`arXiv 2024.3`** [[Paper](https://arxiv.org/abs/2403.06845)] [[Code](https://drivedreamer2.github.io/)]\n\n### 2023\n\n- **TrafficBots**: Towards World Models for Autonomous Driving Simulation and Motion Prediction. **`ICRA 2023`** [[Paper](https://arxiv.org/abs/2303.04116)] [[Code](https://github.com/zhejz/TrafficBots)]\n- **WoVoGen**: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. **`arXiv 2023.12`** [[Paper](https://arxiv.org/abs/2312.02934)] [[Code](https://github.com/fudan-zvg/WoVoGen)]\n- [**CTT**] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. **`arXiv 2023.11`** [[Paper](https://arxiv.org/abs/2311.18307)]\n- **MUVO**: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. **`arXiv 2023.11`** [[Paper](https://arxiv.org/abs/2311.11762)]\n- **GAIA-1**: A Generative World Model for Autonomous Driving. **`arXiv 2023.9`** [[Paper](https://arxiv.org/abs/2309.17080)]\n- **ADriver-I**: A General World Model for Autonomous Driving. **`arXiv 2023.9`** [[Paper](https://arxiv.org/abs/2311.13549)]\n- **UniWorld**: Autonomous Driving Pre-training via World Models. **`arXiv 2023.8`** [[Paper](https://arxiv.org/abs/2308.07234)] [[Code](https://github.com/chaytonmin/UniWorld)]\n\n### 2022\n\n- [**MILE**] Model-Based Imitation Learning for Urban Driving. **`NeurIPS 2022`** [[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/827cb489449ea216e4a257c47e407d18-Abstract-Conference.html)] [[Code](https://github.com/wayveai/mile)]\n- **Iso-Dream**: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models.  **`NeurIPS 2022 Spotlight`** [[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/9316769afaaeeaad42a9e3633b14e801-Abstract-Conference.html)] [[Code](https://github.com/panmt/Iso-Dream)]\n- **Symphony**: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. 
**`ICRA 2022`** [[Paper](https://arxiv.org/abs/2205.03195)] \n- Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. **`IROS 2022`** [[Paper](https://arxiv.org/abs/2210.09539)]\n- [**SEM2**] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. **`NeurIPS 2022 workshop`** [[Paper](https://arxiv.org/abs/2210.04017)]\n\n## Other World Model Papers\n### 2025\n- [**NWM**] Navigation World Models.  **`CVPR 2025`** **`Yann LeCun`** [[Paper](https://arxiv.org/abs/2412.03572)] [[Project](https://www.amirbar.net/nwm/)]\n- **LS-Imagine**: Open-World Reinforcement Learning over Long Short-Term Imagination. **`ICLR 2025 Oral`** [[Paper](https://openreview.net/pdf?id=vzItLaEoDa)] [[Code](https://github.com/qiwang067/LS-Imagine)]\n- **Cosmos-Transfer1** **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.14492)] [[Code](https://github.com/nvidia-cosmos/cosmos-transfer1)]\n- Meta-Reinforcement Learning with Discrete World Models for Adaptive Load Balancing. **`ACMSE 2025`** [[Paper](https://arxiv.org/abs/2503.08872)]\n- **LUMOS**: Language-Conditioned Imitation Learning with World Models. **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.10370)] [[Project](http://lumos.cs.uni-freiburg.de/)]\n- **World Modeling Makes a Better Planner**: Dual Preference Optimization for Embodied Task Planning. **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.10480)]\n- [**WLA**] Inter-environmental world modeling for continuous and compositional dynamics. **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.09911)]\n- **Disentangled World Models**: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning. **`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.08751)]\n- Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments. 
**`arXiv 2025.3`** [[Paper](https://arxiv.org/abs/2503.08122)]\n- **WorldModelBench**: Judging Video Generation Models As World Models. **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.20694)] [[Project](https://worldmodelbench-team.github.io/)]\n- **Multimodal Dreaming**: A Global Workspace Approach to World Model-Based Reinforcement Learning. **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.21142)]\n- Learning To Explore With Predictive World Model Via Self-Supervised Learning. **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.13200)]\n- **Text2World**: Benchmarking Large Language Models for Symbolic World Model Generation.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.13092)] [[Project](https://text-to-world.github.io/)]\n- **M^3**: A Modular World Model over Streams of Tokens.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.11537)]  [[Code](https://github.com/leor-c/M3)]\n- When do Neural Networks Learn World Models?  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.09297)]\n- [**DWS**] Pre-Trained Video Generative Models as World Simulators.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.07825)]\n- **DMWM**: Dual-Mind World Model with Long-Term Imagination.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.07591)]\n- **EvoAgent**: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.05907)]\n- Generating Symbolic World Models via Test-time Scaling of Large Language Models.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.04728)]\n- [**HMA**] Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression.  **`arXiv 2025.2`** [[Paper](https://arxiv.org/abs/2502.04296)] [[Code](https://github.com/liruiw/HMA)] [[Project](https://liruiw.github.io/hma/)]\n- **UP-VLA**: A Unified Understanding and Prediction Model for Embodied Agent.  
**`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.18867)]\n- **GLAM**: Global-Local Variation Awareness in Mamba-based World Model.  **`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.11949)] [[Code](https://github.com/GLAM2025/glam)]\n- **Robotic World Model**: A Neural Network Simulator for Robust Policy Optimization in Robotics.  **`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.10100)]\n- **GAWM**: Global-Aware World Model for Multi-Agent Reinforcement Learning.  **`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.10116)]\n- **RoboHorizon**: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation.  **`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.06605)]\n- **EnerVerse**: Envisioning Embodied Future Space for Robotics Manipulation. **`AgiBot`**  **`arXiv 2025.1`** [[Paper](https://arxiv.org/abs/2501.06605)] [[Website](https://sites.google.com/view/enerverse)]\n- **Cosmos** World Foundation Model Platform for Physical AI. **`NVIDIA`** **`arXiv 2025.1`** [[Paper](https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos_4.pdf)] [[Code](https://github.com/NVIDIA/Cosmos)]\n### 2024\n- [**SMAC**] Grounded Answers for Multi-agent Decision-making Problem through Generative World Model. **`NeurIPS 2024`** [[Paper](https://arxiv.org/abs/2410.02664)]\n- [**CoWorld**] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. **`NeurIPS 2024`** [[Paper](https://arxiv.org/pdf/2305.15260)] [[Website](https://qiwang067.github.io/coworld)] [[Torch Code](https://github.com/qiwang067/CoWorld)]\n- [**Diamond**] Diffusion for World Modeling: Visual Details Matter in Atari. **`NeurIPS 2024`**  [[Paper](https://arxiv.org/abs/2405.12399)] [[Code](https://github.com/eloialonso/diamond)]\n- **PIVOT-R**: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. 
**`NeurIPS 2024`** [[Paper](https://arxiv.org/pdf/2410.10394)]\n- [**MUN**] Learning World Models for Unconstrained Goal Navigation. **`NeurIPS 2024`** [[Paper](https://arxiv.org/abs/2411.02446)] [[Code](https://github.com/RU-Automated-Reasoning-Group/MUN)]\n- **VidMan**: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation. **`NeurIPS 2024`** [[Paper](https://arxiv.org/abs/2411.09153)]\n- **Adaptive World Models**: Learning Behaviors by Latent Imagination Under Non-Stationarity. **`NeurIPSW 2024`** [[Paper](https://arxiv.org/abs/2411.01342)]\n- Emergence of Implicit World Models from Mortal Agents. **`NeurIPSW 2024`** [[Paper](https://arxiv.org/abs/2411.12304)]\n- Causal World Representation in the GPT Model. **`NeurIPSW 2024`** [[Paper](https://arxiv.org/abs/2412.07446)]\n- **PreLAR**: World Model Pre-training with Learnable Action Representation. **`ECCV 2024`** [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03363.pdf)] [[Code](https://github.com/zhanglixuan0720/PreLAR)]\n- [**CWM**] Understanding Physical Dynamics with Counterfactual World Modeling. **`ECCV 2024`** [[Paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03523.pdf)] [[Code](https://neuroailab.github.io/cwm-physics/)]\n- **ManiGaussian**: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. **`ECCV 2024`** [[Paper](https://arxiv.org/abs/2403.08321)] [[Code](https://github.com/GuanxingLu/ManiGaussian)]\n- [**DWL**] Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. **`RSS 2024 (Best Paper Award Finalist)`** [[Paper](https://arxiv.org/abs/2408.14472)]\n- [**LLM-Sim**] Can Language Models Serve as Text-Based World Simulators? **`ACL`** [[Paper](https://arxiv.org/abs/2406.06485)] [[Code](https://github.com/cognitiveailab/GPT-simulator)]\n- **RoboDreamer**: Learning Compositional World Models for Robot Imagination. 
**`ICML 2024`** [[Paper](https://arxiv.org/abs/2404.12377)] [[Code](https://robovideo.github.io/)]\n- [**Δ-IRIS**] Efficient World Models with Context-Aware Tokenization. **`ICML 2024`** [[Paper](https://arxiv.org/abs/2406.19320)] [[Code](https://github.com/vmicheli/delta-iris)]\n- **AD3**: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. **`ICML 2024`** [[Paper](https://arxiv.org/abs/2403.09976)]\n- **Hieros**: Hierarchical Imagination on Structured State Space Sequence World Models. **`ICML 2024`** [[Paper](https://arxiv.org/abs/2310.05167)]\n- [**HRSSM**] Learning Latent Dynamic Robust Representations for World Models. **`ICML 2024`** [[Paper](https://arxiv.org/abs/2405.06263)] [[Code](https://github.com/bit1029public/HRSSM)]\n- **HarmonyDream**: Task Harmonization Inside World Models. **`ICML 2024`** [[Paper](https://openreview.net/forum?id=x0yIaw2fgk)] [[Code](https://github.com/thuml/HarmonyDream)]\n- [**REM**] Improving Token-Based World Models with Parallel Observation Prediction. **`ICML 2024`** [[Paper](https://arxiv.org/abs/2402.05643)] [[Code](https://github.com/leor-c/REM)]\n- Do Transformer World Models Give Better Policy Gradients? **`ICML 2024`** [[Paper](https://arxiv.org/abs/2402.05290)]\n- **TD-MPC2**: Scalable, Robust World Models for Continuous Control. **`ICLR 2024`** [[Paper](https://arxiv.org/pdf/2310.16828)] [[Torch Code](https://github.com/nicklashansen/tdmpc2)]\n- **DreamSmooth**: Improving Model-based Reinforcement Learning via Reward Smoothing. **`ICLR 2024`** [[Paper](https://arxiv.org/pdf/2311.01450)]\n- [**R2I**] Mastering Memory Tasks with World Models. **`ICLR 2024`** [[Paper](http://arxiv.org/pdf/2403.04253)] [[JAX Code](https://github.com/OpenDriveLab/ViDAR)]\n- **MAMBA**: an Effective World Model Approach for Meta-Reinforcement Learning. 
**`ICLR 2024`**  [[Paper](https://arxiv.org/abs/2403.09859)] [[Code](https://github.com/zoharri/mamba)]\n- Multi-Task Interactive Robot Fleet Learning with Visual World Models. **`CoRL 2024`** [[Paper](https://arxiv.org/abs/2410.22689)] [[Code](https://ut-austin-rpl.github.io/sirius-fleet/)]\n- **Generative Emergent Communication**: Large Language Model is a Collective World Model. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2501.00226)]\n- Towards Unraveling and Improving Generalization in World Models. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2501.00195)]\n- **Towards Physically Interpretable World Models**: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.13772)]\n- **Dream to Manipulate**: Compositional World Models Empowering Robot Imitation Learning with Imagination. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.14957)]  [[Project](https://leobarcellona.github.io/DreamToManipulate/)]\n- Transformers Use Causal World Models in Maze-Solving Tasks. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.11867)]\n- **Owl-1**: Omni World Model for Consistent Long Video Generation. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.09600)] [[Code](https://github.com/huang-yh/Owl)]\n- **StoryWeaver**: A Unified World Model for Knowledge-Enhanced Story Character Customization. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.07375)] [[Code](https://github.com/Aria-Zhangjl/StoryWeaver)]\n- **SimuDICE**: Offline Policy Optimization Through World Model Updates and DICE Estimation. **`BNAIC 2024`** [[Paper](https://arxiv.org/abs/2412.06486)]\n- Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm. **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.06139)]\n- **Genie 2**: A large-scale foundation world model.  
**`2024.12`** **`Google DeepMind`** [[Blog](https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/)]\n- **The Matrix**: Infinite-Horizon World Generation with Real-Time Moving Control.  **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.03568)] [[Project](https://thematrix1999.github.io/)]\n- **Motion Prompting**: Controlling Video Generation with Motion Trajectories.  **`arXiv 2024.12`** [[Paper](https://arxiv.org/abs/2412.02700)] [[Project](https://motion-prompting.github.io/)]\n- Generative World Explorer. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.11844)] [[Project](https://generative-world-explorer.github.io/)]\n- [**WebDreamer**] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.06559)] [[Code](https://github.com/OSU-NLP-Group/WebDreamer)]\n- **WHALE**: Towards Generalizable and Scalable World Models for Embodied Decision-making. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.05619)]\n- **DINO-WM**: World Models on Pre-trained Visual Features enable Zero-shot Planning. **`arXiv 2024.11`** **`Yann LeCun`** [[Paper](https://arxiv.org/abs/2411.04983)]\n- Scaling Laws for Pre-training Agents and World Models. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.04434)]\n- [**Phyworld**] How Far is Video Generation from World Model: A Physical Law Perspective. **`arXiv 2024.11`** [[Paper](https://arxiv.org/abs/2411.02385)] [[Project](https://phyworld.github.io/)]\n- **IGOR**: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2411.00785)] [[Project](https://www.microsoft.com/en-us/research/project/igor-image-goal-representations/)]\n- **EVA**: An Embodied World Model for Future Video Anticipation. 
**`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.15461)] \n- **VisualPredicator**: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.23156)] \n- [**LLMCWM**] Language Agents Meet Causality -- Bridging LLMs and Causal World Models. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.19923)] [[Code](https://github.com/j0hngou/LLMCWM/)]\n- Reward-free World Models for Online Imitation Learning. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.14081)]\n- **Web Agents with World Models**: Learning and Leveraging Environment Dynamics in Web Navigation. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.13232)]\n- [**GLIMO**] Grounding Large Language Models In Embodied Environment With Imperfect World Models. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.02664)]\n- **AVID**: Adapting Video Diffusion Models to World Models. **`arXiv 2024.10`** [[Paper](https://arxiv.org/abs/2410.12822)] [[Code](https://github.com/microsoft/causica/tree/main/research_experiments/avid)]\n- [**WMP**] World Model-based Perception for Visual Legged Locomotion. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.16784)] [[Project](https://wmp-loco.github.io/)]\n- [**OSWM**] One-shot World Models Using a Transformer Trained on a Synthetic Prior. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.14084)]\n- **R-AIF**: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.14216)]\n- Representing Positional Information in Generative World Models for Object Manipulation. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.12005)]\n- Making Large Language Models into World Models with Precondition and Effect Knowledge. 
**`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.12278)]\n- **DexSim2Real$^2$**: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation. **`arXiv 2024.9`** [[Paper](https://arxiv.org/abs/2409.08750)]\n- Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction. **`arXiv 2024.8`** [[Paper](https://arxiv.org/abs/2408.11816)]\n- [**MoReFree**] World Models Increase Autonomy in Reinforcement Learning. **`arXiv 2024.8`** [[Paper](https://arxiv.org/abs/2408.09807)] [[Project](https://sites.google.com/view/morefree)]\n- **UrbanWorld**: An Urban World Model for 3D City Generation. **`arXiv 2024.7`** [[Paper](https://arxiv.org/abs/2407.11965)]\n- **PWM**: Policy Learning with Large World Models. **`arXiv 2024.7`** [[Paper](https://arxiv.org/abs/2407.02466)] [[Code](https://www.imgeorgiev.com/pwm/)]\n- **Predicting vs. Acting**: A Trade-off Between World Modeling \u0026 Agent Modeling. **`arXiv 2024.7`** [[Paper](https://arxiv.org/abs/2407.02446)]\n- [**GenRL**] Multimodal foundation world models for generalist embodied agents. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.18043)] [[Code](https://github.com/mazpie/genrl)]\n- [**DLLM**] World Models with Hints of Large Language Models for Goal Achieving. **`arXiv 2024.6`** [[Paper](http://arxiv.org/pdf/2406.07381)]\n- Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.15275)]\n- **CityBench**: Evaluating the Capabilities of Large Language Model as World Model. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.13945)] [[Code](https://github.com/tsinghua-fib-lab/CityBench)]\n- **CoDreamer**: Communication-Based Decentralised World Models. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.13600)]\n- [**EBWM**] Cognitively Inspired Energy-Based World Models. 
**`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.08862)]\n- Evaluating the World Model Implicit in a Generative Model. **`arXiv 2024.6`** [[Paper](https://arxiv.org/abs/2406.03689)] [[Code](https://github.com/mazpie/genrl)]\n- Transformers and Slot Encoding for Sample Efficient Physical World Modelling. **`arXiv 2024.5`** [[Paper](https://arxiv.org/abs/2405.20180)] [[Code](https://github.com/torchipeppo/transformers-and-slot-encoding-for-wm)]\n- [**Puppeteer**] Hierarchical World Models as Visual Whole-Body Humanoid Controllers. **`arXiv 2024.5`** **`Yann LeCun`** [[Paper](https://arxiv.org/abs/2405.18418)] [[Code](https://nicklashansen.com/rlpuppeteer)]\n- **BWArea Model**: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation. **`arXiv 2024.5`** [[Paper](https://arxiv.org/abs/2405.17039)]\n- **Pandora**: Towards General World Model with Natural Language Actions and Video States. [[Paper](https://world-model.maitrix.org/assets/pandora.pdf)] [[Code](https://github.com/maitrix-org/Pandora)]\n- [**WKM**] Agent Planning with World Knowledge Model. **`arXiv 2024.5`**  [[Paper](https://arxiv.org/abs/2405.14205)] [[Code](https://github.com/zjunlp/WKM)]\n- **Newton**™ – a first-of-its-kind foundation model for understanding the physical world. **`Archetype AI`** [[Blog](https://www.archetypeai.io/blog/introducing-archetype-ai---understand-the-real-world-in-real-time)]\n- **Compete and Compose**: Learning Independent Mechanisms for Modular World Models. **`arXiv 2024.4`**  [[Paper](https://arxiv.org/abs/2404.15109)]\n- **MagicTime**: Time-lapse Video Generation Models as Metamorphic Simulators. **`arXiv 2024.4`**  [[Paper](https://arxiv.org/abs/2404.05014)] [[Code](https://github.com/PKU-YuanGroup/MagicTime)]\n- **Dreaming of Many Worlds**: Learning Contextual World Models Aids Zero-Shot Generalization. 
**`arXiv 2024.3`**  [[Paper](https://arxiv.org/abs/2403.10967)] [[Code](https://github.com/sai-prasanna/dreaming_of_many_worlds)]\n- **V-JEPA**: Video Joint Embedding Predictive Architecture. **`Meta AI`** **`Yann LeCun`** [[Blog](https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/)] [[Paper](https://ai.meta.com/research/publications/revisiting-feature-prediction-for-learning-visual-representations-from-video/)] [[Code](https://github.com/facebookresearch/jepa)]\n- [**IWM**] Learning and Leveraging World Models in Visual Representation Learning. **`Meta AI`** [[Paper](https://arxiv.org/abs/2403.00504)] \n- **Genie**: Generative Interactive Environments. **`DeepMind`** [[Paper](https://arxiv.org/abs/2402.15391)] [[Blog](https://sites.google.com/view/genie-2024/home)]\n- [**Sora**] Video generation models as world simulators. **`OpenAI`** [[Technical report](https://openai.com/research/video-generation-models-as-world-simulators)]\n- [**LWM**] World Model on Million-Length Video And Language With RingAttention. **`arXiv 2024.2`**  [[Paper](https://arxiv.org/abs/2402.08268)] [[Code](https://github.com/LargeWorldModel/LWM)]\n- Planning with an Ensemble of World Models. **`OpenReview`** [[Paper](https://openreview.net/forum?id=cvGdPXaydP)]\n- **WorldDreamer**: Towards General World Models for Video Generation via Predicting Masked Tokens. **`arXiv 2024.1`** [[Paper](https://arxiv.org/abs/2401.09985)] [[Code](https://github.com/JeffWang987/WorldDreamer)]\n\n### 2023\n- [**IRIS**] Transformers are Sample-Efficient World Models. 
**`ICLR 2023 Oral`** [[Paper](https://arxiv.org/pdf/2209.00588)] [[Torch Code](https://github.com/eloialonso/iris)]\n- **STORM**: Efficient Stochastic Transformer based World Models for Reinforcement Learning. **`NeurIPS 2023`** [[Paper](https://arxiv.org/pdf/2310.09615)] [[Torch Code](https://github.com/weipu-zhang/STORM)]\n- [**TWM**] Transformer-based World Models Are Happy with 100k Interactions. **`ICLR 2023`** [[Paper](https://arxiv.org/pdf/2303.07109)] [[Torch Code](https://github.com/jrobine/twm)]\n- [**Dynalang**] Learning to Model the World with Language. **`arXiv 2023.8`** [[Paper](https://arxiv.org/pdf/2308.01399)] [[JAX Code](https://github.com/jlin816/dynalang)]\n- [**DreamerV3**] Mastering Diverse Domains through World Models. **`arXiv 2023.1`** [[Paper](https://arxiv.org/pdf/2301.04104)] [[JAX Code](https://github.com/danijar/dreamerv3)] [[Torch Code](https://github.com/NM512/dreamerv3-torch)]\n### 2022\n- [**TD-MPC**] Temporal Difference Learning for Model Predictive Control. **`ICML 2022`** [[Paper](https://arxiv.org/pdf/2203.04955)] [[Torch Code](https://github.com/nicklashansen/tdmpc)]\n- **DreamerPro**: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. **`ICML 2022`** [[Paper](https://proceedings.mlr.press/v162/deng22a/deng22a.pdf)] [[TF Code](https://github.com/fdeng18/dreamer-pro)]\n- **DayDreamer**: World Models for Physical Robot Learning. **`CoRL 2022`** [[Paper](https://proceedings.mlr.press/v205/wu23c/wu23c.pdf)] [[TF Code](https://github.com/danijar/daydreamer)]\n- Deep Hierarchical Planning from Pixels. **`NeurIPS 2022`** [[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/a766f56d2da42cae20b5652970ec04ef-Paper-Conference.pdf)] [[TF Code](https://github.com/danijar/director)]\n- **Iso-Dream**: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. 
**`NeurIPS 2022 Spotlight`** [[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/9316769afaaeeaad42a9e3633b14e801-Paper-Conference.pdf)] [[Torch Code](https://github.com/panmt/Iso-Dream)]\n- **DreamingV2**: Reinforcement Learning with Discrete World Models without Reconstruction. **`arXiv 2022.3`** [[Paper](https://arxiv.org/pdf/2203.00494)] \n### 2021\n- [**DreamerV2**] Mastering Atari with Discrete World Models. **`ICLR 2021`** [[Paper](https://arxiv.org/pdf/2010.02193)] [[TF Code](https://github.com/danijar/dreamerv2)] [[Torch Code](https://github.com/jsikyoon/dreamer-torch)]\n- **Dreaming**: Model-based Reinforcement Learning by Latent Imagination without Reconstruction. **`ICRA 2021`** [[Paper](https://arxiv.org/pdf/2007.14535)]\n### 2020\n- [**DreamerV1**] Dream to Control: Learning Behaviors by Latent Imagination. **`ICLR 2020`** [[Paper](https://arxiv.org/pdf/1912.01603)] [[TF Code](https://github.com/danijar/dreamer)] [[Torch Code](https://github.com/juliusfrost/dreamer-pytorch)]\n- [**Plan2Explore**] Planning to Explore via Self-Supervised World Models. **`ICML 2020`** [[Paper](https://arxiv.org/pdf/2005.05960)] [[TF Code](https://github.com/ramanans1/plan2explore)] [[Torch Code](https://github.com/yusukeurakami/plan2explore-pytorch)]\n\n### 2018\n- World Models. 
**`NeurIPS 2018 Oral`** [[Paper](https://arxiv.org/pdf/1803.10122)]\n\n## Citation\nIf you find this repository useful in your research, please consider giving it a star ⭐ and citing it:\n```bibtex\n@article{tu2025drivingworldmodel,\n      title={The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey}, \n      author={Tu, Sifan and Zhou, Xin and Liang, Dingkang and Jiang, Xingyu and Zhang, Yumeng and Li, Xiaofan and Bai, Xiang},\n      journal={arXiv preprint arXiv:2502.10498},\n      year={2025}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLMD0311%2FAwesome-World-Model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLMD0311%2FAwesome-World-Model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLMD0311%2FAwesome-World-Model/lists"}