3D and 4D World Modeling: A Survey
https://github.com/worldbench/survey
- Host: GitHub
- URL: https://github.com/worldbench/survey
- Owner: worldbench
- License: apache-2.0
- Created: 2025-06-20T23:42:53.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-06-21T21:08:04.000Z (7 months ago)
- Last Synced: 2025-06-21T21:36:31.432Z (7 months ago)
- Topics: 3d-generation, 4d-generation, awesome-list, embodied-ai, lidar-generation, occupancy-generation, spatial-intelligence, video-generation, world-models
- Homepage: https://worldbench.github.io/survey
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
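The fields above correspond closely to what GitHub's REST API reports for a repository. As a minimal sketch of how such metadata could be re-synced (assuming the public `GET /repos/{owner}/{repo}` endpoint and the third-party `requests` package; the field selection is our own, not the aggregator's actual pipeline):

```python
import requests

def fetch_repo_metadata(owner: str, repo: str) -> dict:
    """Fetch public repository metadata from the GitHub REST API."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # Keep only the fields shown in the metadata block above.
    return {
        "url": data["html_url"],
        "owner": data["owner"]["login"],
        "license": data["license"]["spdx_id"] if data["license"] else None,
        "created": data["created_at"],
        "default_branch": data["default_branch"],
        "last_pushed": data["pushed_at"],
        "topics": data["topics"],
        "homepage": data["homepage"],
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "open_issues": data["open_issues_count"],
    }

print(fetch_repo_metadata("worldbench", "survey"))
```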
Awesome Lists containing this project
- ultimate-awesome - survey - 🌐 3D and 4D World Modeling: A Survey. (Other Lists / TeX Lists)
- StarryDivineSky - worldbench/survey - ...D cameras), modeling algorithms (e.g., neural radiance fields (NeRF) and implicit surface representations), and optimization methods (e.g., physics-constrained optimization and multi-modal fusion), through to application scenarios (e.g., virtual reality, autonomous driving, and robot navigation), covering the complete technical pipeline. Its core contribution is a taxonomy that organizes the different modeling paradigms (e.g., explicit mesh-based modeling, implicit voxel-based modeling, and neural-network-parameterized modeling) and compares their trade-offs in accuracy, efficiency, and scalability. The project also summarizes open challenges in current research, such as temporal consistency in dynamic scene modeling, alignment in multi-modal data fusion, and the computational complexity of high-dimensional spaces, and points out possible future directions for each, such as physics-engine-coupled simulation, reinforcement-learning-driven adaptive modeling, and cross-modal representation learning. By structuring the existing literature and technical routes, the survey gives researchers and developers a clear view of the field's technical evolution together with practical guidance, and is especially suited to researchers or engineering teams who need a quick overview of the area. (3D Vision Generation & Reconstruction / Resource Transfer & Download)
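The description above pivots on the contrast between explicit (mesh/voxel) and implicit (neural) modeling paradigms. As a toy, illustrative sketch of that distinction (the grid size and the two-layer network are our own placeholders, not anything from the survey): an explicit model stores occupancy in memory and trades storage for O(1) lookups, while an implicit model stores parameters and trades a forward pass per query for resolution-independent memory.

```python
import numpy as np

# Explicit paradigm: occupancy stored directly in a dense voxel grid.
grid = np.zeros((128, 128, 16), dtype=bool)  # X x Y x Z voxels
grid[60:68, 60:68, 0:4] = True               # mark a box-shaped obstacle

def occupied_explicit(ix: int, iy: int, iz: int) -> bool:
    return bool(grid[ix, iy, iz])            # O(1) lookup, O(N^3) memory

# Implicit paradigm: occupancy is the output of a learned function f(x, y, z).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 64)), rng.normal(size=(64, 1))  # toy 2-layer MLP

def occupied_implicit(x: float, y: float, z: float) -> bool:
    h = np.tanh(np.array([x, y, z]) @ W1)    # hidden features
    return float(h @ W2) > 0.0               # thresholded occupancy logit

print(occupied_explicit(62, 62, 1), occupied_implicit(0.1, 0.2, 0.0))
```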
README
[Awesome](https://github.com/sindresorhus/awesome) [PRs Welcome](https://github.com/worldbench/survey/pulls)
# :sunglasses: Awesome 3D and 4D World Models
### Table of Contents
- [**1. World Modeling from Video Generation**](#1-world-modeling-from-video-generation)
  - [Data Engine](#one-data-engine)
  - []()
  - [Closed-Loop Simulator](#three-closed-loop-simulator)
  - [Scene Reconstructor](#four-scene-reconstructor)
- [**2. World Modeling from Occupancy Generation**](#2-world-modeling-from-occupancy-generation)
  - []()
  - [Occupancy Forecaster](#two-occupancy-forecaster)
  - []()
- [**3. World Modeling from LiDAR Generation**](#3-world-modeling-from-lidar-generation)
  - []()
  - []()
  - []()
- [**4. Datasets & Benchmarks**](#4-datasets--benchmarks)
  - []()
  - []()
  - []()
- [**5. Applications**](#5-applications)
  - []()
  - []()
  - []()
- [**6. Other Resources**](#6-other-resources)
  - [Workshops](#workshops)
  - [Tutorials](#tutorials)
  - [Talks & Seminars](#talks--seminars)
- [**7. Acknowledgements**](#7-acknowledgements)
# 1. World Modeling from Video Generation
### :one: Data Engine
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `BEVControl` | [BEVControl: Accurately Controlling Street-View Elements with Multi-Perspective Consistency via BEV Sketch Layout](https://arxiv.org/abs/2308.01661) | arXiv 2023 | - | - |
| `BEVGen` | [Street-View Image Generation from a Bird's-Eye View Layout](https://arxiv.org/abs/2301.04634) | RA-L 2024 | [Link](https://metadriverse.github.io/bevgen/) | [Code](https://github.com/alexanderswerdlow/BEVGen) |
| `MagicDrive` | [MagicDrive: Street View Generation with Diverse 3D Geometry Control](https://arxiv.org/abs/2310.02601) | ICLR 2024 | [Link](https://gaoruiyuan.com/magicdrive/) | [Code](https://github.com/cure-lab/MagicDrive) |
| `Panacea` | [Panacea: Panoramic and Controllable Video Generation for Autonomous Driving](https://arxiv.org/abs/2311.16813) | CVPR 2024 | [Link](https://panacea-ad.github.io/) | [Code](https://github.com/wenyuqing/panacea) |
| `DrivingDiffusion` | [DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model](https://arxiv.org/abs/2310.07771) | ECCV 2024 | [Link](https://drivingdiffusion.github.io/) | [Code](https://github.com/shalfun/DrivingDiffusion) |
| `WoVoGen` | [WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation](https://arxiv.org/abs/2312.02934) | ECCV 2024 | - | [Code](https://github.com/fudan-zvg/WoVoGen) |
| `Delphi` | [Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation](https://arxiv.org/abs/2406.01349) | arXiv 2024 | [Link](https://westlake-autolab.github.io/delphi.github.io/) | [Code](https://github.com/westlake-autolab/Delphi) |
| `SimGen` | [SimGen: Simulator-conditioned Driving Scene Generation](https://arxiv.org/abs/2406.09386) | NeurIPS 2024 | [Link](https://metadriverse.github.io/simgen/) | [Code](https://github.com/metadriverse/SimGen) |
| `BEVWorld` | [BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents](https://arxiv.org/abs/2407.05679) | arXiv 2024 | - | - |
| `PerLDiff` | [PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models](https://arxiv.org/abs/2407.06109) | arXiv 2024 | [Link](https://perldiff.github.io/) | [Code](https://github.com/LabShuHangGU/PerlDiff) |
| `Panacea+` | [Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving](https://arxiv.org/abs/2408.07605) | arXiv 2024 | [Link](https://panacea-ad.github.io/) | - |
| `DiVE` | [DiVE: DiT-Based Video Generation with Enhanced Control](https://arxiv.org/abs/2409.01595) | arXiv 2024 | [Link](https://liautoad.github.io/DIVE/) | [Code](https://github.com/LiAutoAD/DIVE) |
| `SyntheOcc` | [SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs](https://arxiv.org/abs/2410.00337) | arXiv 2024 | [Link](https://len-li.github.io/syntheocc-web/) | [Code](https://github.com/EnVision-Research/SyntheOcc) |
| `MagicDrive-V2` | [MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control](https://arxiv.org/abs/2411.13807) | arXiv 2024 | [Link](https://gaoruiyuan.com/magicdrive-v2/) | - |
| `HoloDrive` | [HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving](https://arxiv.org/abs/2412.01407) | arXiv 2024 | - | - |
| `CogDriving` | [Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention](https://arxiv.org/abs/2412.03520) | arXiv 2024 | [Link](https://luhannan.github.io/CogDrivingPage/) | - |
| `UniMLVG` | [UniMLVG: Unified Framework for Multi-View Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving](https://arxiv.org/abs/2412.04842) | arXiv 2024 | - | [Code](https://github.com/SenseTime-FVG/OpenDWM) |
| `DrivePhysica` | [Physical Informed Driving World Model](https://arxiv.org/abs/2412.08410) | arXiv 2024 | [Link](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html) | - |
| `DriveDreamer-2` | [DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation](https://arxiv.org/abs/2403.06845) | AAAI 2025 | [Link](https://drivedreamer2.github.io/) | [Code](https://github.com/f1yfisher/DriveDreamer2) |
| `SubjectDrive` | [SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control](https://arxiv.org/abs/2403.19438) | AAAI 2025 | [Link](https://subjectdrive.github.io/) | - |
| `Glad` | [Glad: A Streaming Scene Generator for Autonomous Driving](https://arxiv.org/abs/2503.00045) | ICLR 2025 | - | [Code](https://github.com/xb534/Glad) |
| `DualDiff` | [DualDiff: Dual-Branch Diffusion Model for Autonomous Driving with Semantic Fusion](https://arxiv.org/abs/2505.01857) | ICRA 2025 | - | [Code](https://github.com/yangzhaojason/DualDiff) |
| `UniScene` | [UniScene: Unified Occupancy-Centric Driving Scene Generation](https://arxiv.org/abs/2412.05435) | CVPR 2025 | [Link](https://arlo0o.github.io/uniscene/) | [Code](https://github.com/Arlo0o/UniScene-Unified-Occupancy-centric-Driving-Scene-Generation) |
| `DriveScape` | [DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation](https://arxiv.org/abs/2409.05463) | CVPR 2025 | [Link](https://metadrivescape.github.io/papers_project/drivescapev1/index.html) | - |
| `Cosmos-Transfer1` | [Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control](https://arxiv.org/abs/2503.14492) | arXiv 2025 | [Link](https://research.nvidia.com/labs/dir/cosmos-transfer1/) | [Code](https://github.com/nvidia-cosmos/cosmos-transfer1) |
| `DualDiff+` | [DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance](https://arxiv.org/abs/2503.03689) | arXiv 2025 | - | [Code](https://github.com/yangzhaojason/DualDiff) |
| `CoGen` | [CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving](https://arxiv.org/abs/2503.22231) | arXiv 2025 | [Link](https://xiaomi-research.github.io/cogen/) | - |
| `NoiseController` | [NoiseController: Towards Consistent Multi-View Video Generation via Noise Decomposition and Collaboration](https://arxiv.org/abs/2504.18448) | arXiv 2025 | - | - |
| `STAGE` | [STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation](https://arxiv.org/abs/2506.13138) | arXiv 2025 | - | - |
### :two:
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `GAIA-1` | [GAIA-1: A Generative World Model for Autonomous Driving](https://arxiv.org/abs/2309.17080) | arXiv 2023 | [Link](https://wayve.ai/thinking/scaling-gaia-1/) | - |
| `ADriver-I` | [ADriver-I: A General World Model for Autonomous Driving](https://arxiv.org/abs/2311.13549) | arXiv 2023 | - | - |
| `Drive-WM` | [Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving](https://arxiv.org/abs/2311.17918) | CVPR 2024 | [Link](https://drive-wm.github.io/) | [Code](https://github.com/BraveGroup/Drive-WM) |
| `DriveDreamer` | [DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving](https://arxiv.org/abs/2309.09777) | ECCV 2024 | [Link](https://drivedreamer.github.io/) | [Code](https://github.com/JeffWang987/DriveDreamer) |
| `GenAD` | [GenAD: Generalized Predictive Model for Autonomous Driving](https://arxiv.org/abs/2403.09630) | ECCV 2024 | - | [Code](https://github.com/OpenDriveLab/DriveAGI) |
| `Vista` | [Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability](https://arxiv.org/abs/2405.17398) | NeurIPS 2024 | [Link](https://vista-demo.github.io/) | [Code](https://github.com/OpenDriveLab/Vista) |
| `InfinityDrive` | [InfinityDrive: Breaking Time Limits in Driving World Models](https://arxiv.org/abs/2412.01522) | arXiv 2024 | [Link](https://metadrivescape.github.io/papers_project/InfinityDrive/page.html) | - |
| `DrivingGPT` | [DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers](https://arxiv.org/abs/2412.18607) | arXiv 2024 | [Link](https://rogerchern.github.io/DrivingGPT/) | - |
| `DrivingWorld` | [DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT](https://arxiv.org/abs/2412.19505) | arXiv 2024 | [Link](https://huxiaotaostasy.github.io/DrivingWorld/index.html) | [Code](https://github.com/YvanYin/DrivingWorld) |
| `GEM` | [GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control](https://arxiv.org/abs/2412.11198) | CVPR 2025 | [Link](https://vita-epfl.github.io/GEM.github.io/) | [Code](https://github.com/vita-epfl/GEM) |
| `MaskGWM` | [MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction](https://arxiv.org/abs/2502.11663) | CVPR 2025 | - | [Code](https://github.com/SenseTime-FVG/OpenDWM) |
| `VaViM & VaVAM` | [VaViM and VaVAM: Autonomous Driving through Video Generative Modeling](https://arxiv.org/abs/2502.15672) | arXiv 2025 | [Link](https://valeoai.github.io/vavim-vavam/) | [Code](https://github.com/valeoai/VideoActionModel) |
| `MiLA` | [MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving](https://arxiv.org/abs/2503.15875) | arXiv 2025 | - | [Code](https://github.com/xiaomi-mlab/mila.github.io) |
| `GAIA-2` | [GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving](https://arxiv.org/abs/2503.20523) | arXiv 2025 | [Link](https://wayve.ai/thinking/gaia-2) | - |
| `DriVerse` | [DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment](https://arxiv.org/abs/2504.18576) | arXiv 2025 | - | - |
| `PosePilot` | [PosePilot: Steering Camera Pose for Generative World Models with Self-Supervised Depth](https://arxiv.org/abs/2505.01729) | arXiv 2025 | - | - |
| `ProphetDWM` | [ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos](https://arxiv.org/abs/2505.18650) | arXiv 2025 | - | - |
| `GeoDrive` | [GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control](https://arxiv.org/abs/2505.22421) | arXiv 2025 | - | [Code](https://github.com/antonioo-c/GeoDrive) |
| `LongDWM` | [LongDWM: Cross-Granularity Distillation for Building A Long-Term Driving World Model](https://arxiv.org/abs/2506.01546) | arXiv 2025 | [Link](https://wang-xiaodong1899.github.io/longdwm/) | [Code](https://github.com/Wang-Xiaodong1899/Long-DWM) |
### :three: Closed-Loop Simulator
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `MagicDrive3D` | [MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes](https://arxiv.org/abs/2405.14475) | arXiv 2024 | [Link](https://gaoruiyuan.com/magicdrive3d/) | [Code](https://github.com/flymin/MagicDrive3D) |
| `DriveArena` | [DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving](https://arxiv.org/abs/2408.00415) | arXiv 2024 | [Link](https://pjlab-adg.github.io/DriveArena/) | [Code](https://github.com/PJLab-ADG/DriveArena) |
| `DreamForge` | [DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes](https://arxiv.org/abs/2409.04003) | arXiv 2024 | [Link](https://pjlab-adg.github.io/DriveArena/dreamforge/) | [Code](https://github.com/PJLab-ADG/DriveArena) |
| `InfiniCube` | [InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models](https://arxiv.org/abs/2412.03934) | arXiv 2024 | [Link](https://research.nvidia.com/labs/toronto-ai/infinicube/) | - |
| `Doe-1` | [Doe-1: Closed-Loop Autonomous Driving with Large World Model](https://arxiv.org/abs/2412.09627) | arXiv 2024 | [Link](https://wzzheng.net/Doe/) | [Code](https://github.com/wzzheng/Doe) |
| `DrivingSphere` | [DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation](https://arxiv.org/abs/2411.11252) | CVPR 2025 | [Link](https://yanty123.github.io/DrivingSphere/) | [Code](https://github.com/yanty123/DrivingSphere) |
| `UMGen` | [Generating Multimodal Driving Scenes via Next-Scene Prediction](https://arxiv.org/abs/2503.14945) | CVPR 2025 | [Link](https://yanhaowu.github.io/UMGen/) | [Code](https://github.com/YanhaoWu/UMGen) |
| `UniFuture` | [Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception](https://arxiv.org/abs/2503.13587) | arXiv 2025 | [Link](https://dk-liang.github.io/UniFuture/) | [Code](https://github.com/dk-liang/UniFuture) |
| `DiST-4D` | [DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation](https://arxiv.org/abs/2503.15208) | arXiv 2025 | - | - |
| `Nexus` | [Decoupled Diffusion Sparks Adaptive Scene Generation](https://arxiv.org/abs/2504.10485) | arXiv 2025 | [Link](https://opendrivelab.com/Nexus/) | [Code](https://github.com/OpenDriveLab/Nexus) |
| `Challenger` | [Challenger: Affordable Adversarial Driving Video Generation](https://arxiv.org/abs/2505.15880) | arXiv 2025 | [Link](https://pixtella.github.io/Challenger/) | [Code](https://github.com/Pixtella/Challenger) |
| `Cosmos-Drive` | [Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models](https://arxiv.org/abs/2506.09042) | arXiv 2025 | [Link](https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams/) | [Code](https://github.com/nv-tlabs/Cosmos-Drive-Dreams) |
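What distinguishes the closed-loop simulators above from pure video generators is that the policy's actions feed back into the world model, so future frames depend on what the policy did. A minimal, illustrative sketch of that loop (the `WorldModel` interface and all names below are our own placeholders, not any listed simulator's actual API):

```python
from typing import Any, Protocol

class WorldModel(Protocol):
    """Placeholder interface for a generative closed-loop simulator."""
    def reset(self) -> Any: ...                          # initial scene state
    def step(self, state: Any, action: Any) -> Any: ...  # advance the scene
    def render(self, state: Any) -> Any: ...             # e.g., camera frames

def closed_loop_rollout(world: WorldModel, policy, horizon: int = 100) -> list:
    """Roll a policy out inside the world model instead of replaying a log."""
    trajectory = []
    state = world.reset()
    for _ in range(horizon):
        observation = world.render(state)   # simulator -> policy
        action = policy(observation)        # policy -> simulator
        state = world.step(state, action)   # loop closes: actions shape the future
        trajectory.append((observation, action))
    return trajectory
```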
### :four: Scene Reconstructor
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `3DGS` | [3D Gaussian Splatting for Real-Time Radiance Field Rendering](https://arxiv.org/abs/2308.04079) | TOG 2023 | [Link](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) | [Code](https://github.com/graphdeco-inria/gaussian-splatting) |
| `StreetGaussian` | [Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting](https://arxiv.org/abs/2401.01339) | ECCV 2024 | [Link](https://zju3dv.github.io/street_gaussians) | [Code](https://github.com/zju3dv/street_gaussians) |
| `4DGF` | [Dynamic 3D Gaussian Fields for Urban Areas](https://arxiv.org/abs/2406.03175) | NeurIPS 2024 | [Link](https://tobiasfshr.github.io/pub/4dgf/) | [Code](https://github.com/tobiasfshr/map4d) |
| `MagicDrive3D` | [MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes](https://arxiv.org/abs/2405.14475) | arXiv 2024 | [Link](https://gaoruiyuan.com/magicdrive3d/) | [Code](https://github.com/flymin/MagicDrive3D) |
| `S3Gaussian` | [S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving](https://arxiv.org/abs/2405.20323) | arXiv 2024 | [Link](https://wzzheng.net/S3Gaussian/) | [Code](https://github.com/nnanhuang/S3Gaussian/) |
| `VDG` | [VDG: Vision-Only Dynamic Gaussian for Driving Simulation](https://arxiv.org/abs/2406.18198) | arXiv 2024 | [Link](https://3d-aigc.github.io/VDG/) | [Code](https://github.com/lifuguan/VDG_official) |
| `UniGaussian` | [UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations](https://arxiv.org/abs/2411.15355) | arXiv 2024 | - | - |
| `InfiniCube` | [InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models](https://arxiv.org/abs/2412.03934) | arXiv 2024 | [Link](https://research.nvidia.com/labs/toronto-ai/infinicube/) | - |
| `Stag-1` | [Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model](https://arxiv.org/abs/2412.05280) | arXiv 2024 | [Link](https://wzzheng.net/Stag/) | [Code](https://github.com/wzzheng/Stag) |
| `DrivingRecon` | [DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving](https://arxiv.org/abs/2412.09043) | arXiv 2024 | - | [Code](https://github.com/EnVision-Research/DriveRecon) |
| `OccScene` | [OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation](https://arxiv.org/abs/2412.11183) | arXiv 2024 | - | - |
| `SGD` | [SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior](https://arxiv.org/abs/2403.20079) | WACV 2025 | - | - |
| `OmniRe` | [OmniRe: Omni Urban Scene Reconstruction](https://arxiv.org/abs/2408.16760) | ICLR 2025 | [Link](https://ziyc.github.io/omnire/) | - |
| `DriveDreamer4D` | [DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation](https://arxiv.org/abs/2410.13571) | CVPR 2025 | [Link](https://drivedreamer4d.github.io/) | [Code](https://github.com/GigaAI-research/DriveDreamer4D) |
| `DeSiRe-GS` | [DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes](https://arxiv.org/abs/2411.11921) | CVPR 2025 | - | [Code](https://github.com/chengweialan/DeSiRe-GS) |
| `SplatAD` | [SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving](https://arxiv.org/abs/2411.16816) | CVPR 2025 | [Link](https://research.zenseact.com/publications/splatad/) | [Code](https://github.com/carlinds/splatad) |
| `ReconDreamer` | [ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration](https://arxiv.org/abs/2411.19548) | CVPR 2025 | [Link](https://recondreamer.github.io/) | [Code](https://github.com/GigaAI-research/ReconDreamer/) |
| `FreeSim` | [FreeSim: Toward Free-Viewpoint Camera Simulation in Driving Scenes](https://arxiv.org/abs/2412.03566) | CVPR 2025 | [Link](https://drive-sim.github.io/freesim/) | - |
| `StreetCrafter` | [StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models](https://arxiv.org/abs/2412.13188) | CVPR 2025 | [Link](https://zju3dv.github.io/street_crafter/) | [Code](https://github.com/zju3dv/street_crafter) |
| `FlexDrive` | [FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering](https://arxiv.org/abs/2502.21093) | CVPR 2025 | - | - |
| `S-NeRF++` | [S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation](https://arxiv.org/abs/2402.02112) | TPAMI 2025 | - | - |
| `DreamDrive` | [DreamDrive: Generative 4D Scene Modeling from Street View Images](https://arxiv.org/abs/2501.00601) | arXiv 2025 | [Link](https://pointscoder.github.io/DreamDrive/) | - |
| `Uni-Gaussians` | [Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios](https://arxiv.org/abs/2503.08317) | arXiv 2025 | [Link](https://zikangyuan.github.io/UniGaussians/) | - |
| `MuDG` | [MuDG: Taming Multi-Modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction](https://arxiv.org/abs/2503.10604) | arXiv 2025 | - | [Code](https://github.com/heiheishuang/MuDG) |
| `UniFuture` | [Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception](https://arxiv.org/abs/2503.13587) | arXiv 2025 | - | [Code](https://github.com/dk-liang/UniFuture) |
| `DiST-4D` | [Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation](https://arxiv.org/abs/2503.15208) | arXiv 2025 | [Link](https://royalmelon0505.github.io/DiST-4D/) | [Code](https://github.com/royalmelon0505/dist4d) |
| `SceneCrafter` | [Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving](https://arxiv.org/abs/2503.18108) | arXiv 2025 | - | [Code](https://github.com/cancaries/SceneCrafter) |
| `ReconDreamer++` | [ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation](https://arxiv.org/abs/2503.18438) | arXiv 2025 | [Link](https://recondreamer-plus.github.io/) | [Code](https://github.com/GigaAI-research/ReconDreamer-Plus) |
| `RealEngine` | [RealEngine: Simulating Autonomous Driving in Realistic Context](https://arxiv.org/abs/2505.16902) | arXiv 2025 | - | [Code](https://github.com/fudan-zvg/RealEngine) |
| `GeoDrive` | [GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control](https://arxiv.org/abs/2505.22421) | arXiv 2025 | - | [Code](https://github.com/antonioo-c/GeoDrive) |
| `PseudoSimulation` | [Pseudo-Simulation for Autonomous Driving](https://arxiv.org/abs/2506.04218) | arXiv 2025 | - | [Code](https://github.com/autonomousvision/navsim) |
| `Dreamland` | [Dreamland: Controllable World Creation with Simulator and Generative Models](https://arxiv.org/abs/2506.08006) | arXiv 2025 | [Link](https://metadriverse.github.io/dreamland/) | - |
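Many of the reconstructors above build on the 3DGS representation, where a scene is a set of anisotropic Gaussians, each with a position, a rotation/scale factorization of its covariance, an opacity, and spherical-harmonic color. A minimal sketch of the per-Gaussian parameterization (the factorization Sigma = R S S^T R^T follows the 3DGS paper; the dataclass itself is our own illustration):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian:
    mean: np.ndarray        # (3,) center position in world space
    quaternion: np.ndarray  # (4,) unit quaternion encoding rotation R
    scale: np.ndarray       # (3,) per-axis standard deviations (diagonal of S)
    opacity: float          # alpha in [0, 1], used for alpha compositing
    sh_coeffs: np.ndarray   # (16, 3) degree-3 spherical-harmonic RGB coefficients

    def covariance(self) -> np.ndarray:
        """Sigma = R S S^T R^T, positive semi-definite by construction."""
        w, x, y, z = self.quaternion / np.linalg.norm(self.quaternion)
        R = np.array([
            [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T
```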
# 2. World Modeling from Occupancy Generation
### :one:
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `SSD` | [Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data](https://arxiv.org/abs/2301.00527) | arXiv 2023 | - | [Code](https://github.com/zoomin-lee/scene-scale-diffusion) |
| `WoVoGen` | [WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation](https://arxiv.org/abs/2312.02934) | ECCV 2024 | - | [Code](https://github.com/fudan-zvg/WoVoGen) |
| `UrbanDiff` | [Urban Scene Diffusion through Semantic Occupancy Map](https://arxiv.org/abs/2403.11697) | arXiv 2024 | [Link](https://metadriverse.github.io/urbandiff/) | - |
| `DrivingSphere` | [DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation](https://arxiv.org/abs/2411.11252) | CVPR 2025 | [Link](https://yanty123.github.io/DrivingSphere/) | [Code](https://github.com/yanty123/DrivingSphere) |
| `UniScene` | [UniScene: Unified Occupancy-Centric Driving Scene Generation](https://arxiv.org/abs/2412.05435) | CVPR 2025 | [Link](https://arlo0o.github.io/uniscene/) | [Code](https://github.com/Arlo0o/UniScene-Unified-Occupancy-centric-Driving-Scene-Generation) |
| `OccScene` | [OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation](https://arxiv.org/abs/2412.11183) | arXiv 2024 | - | - |
| `Liu et al.` | [Controllable 3D Outdoor Scene Generation via Scene Graphs](https://arxiv.org/abs/2503.07152) | ICCV 2025 | [Link](https://yuheng.ink/project-page/control-3d-scene/) | [Code](https://github.com/yuhengliu02/control-3d-scene) |
### :two: Occupancy Forecaster
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `Emergent-Occ` | [Differentiable Raycasting for Self-supervised Occupancy Forecasting](https://arxiv.org/abs/2210.01917) | ECCV 2022 | - | [Code](https://github.com/tarashakhurana/emergent-occ-forecasting) |
| `Khurana et al.` | [Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting](https://arxiv.org/abs/2302.13130) | CVPR 2023 | [Link](https://www.cs.cmu.edu/~tkhurana/ff4d/index.html) | [Code](https://github.com/tarashakhurana/4d-occ-forecasting) |
| `UniWorld` | [UniWorld: Autonomous Driving Pre-Training via World Models](https://arxiv.org/abs/2308.07234) | arXiv 2023 | - | - |
| `UniScene` | [UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction for Autonomous Driving](https://arxiv.org/abs/2305.18829) | arXiv 2023 | - | [Code](https://github.com/chaytonmin/UniScene) |
| `OccWorld` | [OccWorld: Learning A 3D Occupancy World Model for Autonomous Driving](https://arxiv.org/abs/2311.16038) | ECCV 2024 | [Link](https://wzzheng.net/OccWorld/) | [Code](https://github.com/wzzheng/OccWorld) |
| `Cam4DOcc` | [Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications](https://arxiv.org/abs/2311.17663) | CVPR 2024 | - | [Code](https://github.com/haomo-ai/Cam4DOcc) |
| `DriveWorld` | [DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving](https://arxiv.org/abs/2405.04390) | CVPR 2024 | - | - |
| `OccSora` | [OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving](https://arxiv.org/abs/2405.20337) | arXiv 2024 | [Link](https://wzzheng.net/OccSora/) | [Code](https://github.com/wzzheng/OccSora) |
| `UnO` | [UnO: Unsupervised Occupancy Fields for Perception and Forecasting](https://arxiv.org/abs/2406.08691) | CVPR 2024 | [Link](https://waabi.ai/uno/) | - |
| `LOPR` | [Self-Supervised Multi-Future Occupancy Forecasting for Autonomous Driving](https://arxiv.org/abs/2407.21126) | arXiv 2024 | - | - |
| `FSF-Net` | [FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving](https://arxiv.org/abs/2409.15841) | arXiv 2024 | - | - |
| `OccLLaMA` | [OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving](https://arxiv.org/abs/2409.03272) | arXiv 2024 | - | - |
| `DOME` | [DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model](https://arxiv.org/abs/2410.10429) | arXiv 2024 | [Link](https://gusongen.github.io/DOME) | [Code](https://github.com/gusongen/DOME) |
| `GaussianAD` | [GaussianAD: Gaussian-Centric End-to-End Autonomous Driving](https://arxiv.org/abs/2412.10371) | arXiv 2024 | [Link](https://wzzheng.net/GaussianAD) | [Code](https://github.com/wzzheng/GaussianAD) |
| `DFIT-OccWorld` | [An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training](https://arxiv.org/abs/2412.13772) | arXiv 2024 | - | - |
| `Drive-OccWorld` | [Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving](https://arxiv.org/abs/2408.14197) | AAAI 2025 | [Link](https://drive-occworld.github.io/) | [Code](https://github.com/yuyang-cloud/Drive-OccWorld) |
| `RenderWorld` | [RenderWorld: World Model with Self-Supervised 3D Label](https://arxiv.org/abs/2409.11356) | ICRA 2025 | - | - |
| `Occ-LLM` | [Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models](https://arxiv.org/abs/2502.06419) | ICRA 2025 | - | - |
| `EfficientOCF` | [Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting](https://arxiv.org/abs/2411.14169) | CVPR 2025 | - | - |
| `DIO` | [DIO: Decomposable Implicit 4D Occupancy-Flow World Model](https://openaccess.thecvf.com/content/CVPR2025/papers/Diehl_DIO_Decomposable_Implicit_4D_Occupancy-Flow_World_Model_CVPR_2025_paper.pdf) | CVPR 2025 | - | - |
| `T3Former` | [Temporal Triplane Transformers as Occupancy World Models](https://arxiv.org/abs/2503.07338) | arXiv 2025 | - | - |
| `UniOcc` | [UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving](https://arxiv.org/abs/2503.24381) | arXiv 2025 | [Link](https://huggingface.co/datasets/tasl-lab/uniocc) | [Code](https://github.com/tasl-lab/UniOcc) |
| `COME` | [COME: Adding Scene-Centric Forecasting Control to Occupancy World Model](https://arxiv.org/abs/2506.13260) | arXiv 2025 | - | [Code](https://github.com/synsin0/COME) |
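Across the forecasters above, the common interface is a 4D semantic occupancy sequence: one voxel grid per timestep, with past grids (often plus ego actions) mapped to future grids. An illustrative, shape-only sketch of that interface (the grid resolution, horizons, class count, and action encoding below are made-up values, not taken from any specific benchmark):

```python
import numpy as np

NUM_CLASSES = 17          # semantic classes incl. "free" (made-up count)
X, Y, Z = 200, 200, 16    # voxels per frame (made-up resolution)
T_PAST, T_FUT = 4, 6      # observed and forecast horizons, in frames

def forecast_occupancy(past: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Dummy forecaster: input  (T_PAST, X, Y, Z) integer class labels,
                         output (T_FUT,  X, Y, Z) predicted class labels.
    A real model replaces this body with a learned network; here we simply
    persist the last observed frame (a common trivial baseline)."""
    assert past.shape == (T_PAST, X, Y, Z)
    assert actions.shape == (T_FUT, 2)      # e.g., (speed, yaw rate) per step
    return np.repeat(past[-1:], T_FUT, axis=0)

past = np.zeros((T_PAST, X, Y, Z), dtype=np.int64)
acts = np.zeros((T_FUT, 2), dtype=np.float32)
print(forecast_occupancy(past, acts).shape)   # (6, 200, 200, 16)
```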
### :three:
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `SemCity` | [SemCity: Semantic Scene Generation with Triplane Diffusion](https://arxiv.org/abs/2403.07773) | CVPR 2024 | [Link](https://sglab.kaist.ac.kr/SemCity/) | [Code](https://github.com/zoomin-lee/SemCity) |
| `XCube` | [XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies](https://arxiv.org/abs/2312.03806) | CVPR 2024 | [Link](https://research.nvidia.com/labs/toronto-ai/xcube/) | [Code](https://github.com/nv-tlabs/XCube) |
| `PDD` | [Pyramid Diffusion for Fine 3D Large Scene Generation](https://arxiv.org/abs/2311.12085) | ECCV 2024 | [Link](https://yuheng.ink/project-page/pyramid-discrete-diffusion) | [Code](https://github.com/yuhengliu02/pyramid-discrete-diffusion) |
| `InfiniCube` | [InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models](https://arxiv.org/abs/2412.03934) | arXiv 2024 | [Link](https://research.nvidia.com/labs/toronto-ai/infinicube/) | - |
| `DynamicCity` | [DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes](https://arxiv.org/abs/2410.18084) | ICLR 2025 | [Link](https://dynamic-city.github.io/) | [Code](https://github.com/3DTopia/DynamicCity) |
| `X-Scene` | [X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability](https://arxiv.org/abs/2506.13558) | arXiv 2025 | [Link](https://x-scene.github.io/) | [Code](https://github.com/yuyang-cloud/X-Scene) |
| `PrITTI` | [PrITTI: Primitive-Based Generation of Controllable and Editable 3D Semantic Scenes](https://arxiv.org/abs/2506.19117) | arXiv 2025 | [Link](https://raniatze.github.io/pritti/) | [Code](https://github.com/avg-dev/PrITTI) |
# 3. World Modeling from LiDAR Generation
### :one:
> :timer_clock: In chronological order, from the earliest to the latest.
| Model | Paper | Venue | Website | GitHub |
|:-:|:-|:-:|:-:|:-:|
| `DUSty-GAN` | [Learning to Drop Points for LiDAR Scan Synthesis](https://arxiv.org/abs/2102.11952) | IROS 2021 | - | - |
| `LiDARGen` | [Learning to Generate Realistic LiDAR Point Clouds](https://arxiv.org/abs/2209.03954) | ECCV 2022 | - | - |
| `DUSty-GAN v2` | [Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data](https://arxiv.org/abs/2210.11750) | WACV 2023 | - | - |
| `UltraLiDAR` | [UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation](https://arxiv.org/abs/2311.01448) | CVPR 2023 | [Link](https://waabi.ai/ultralidar/) | - |
| `Copilot4D` | [Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion](https://arxiv.org/abs/2311.01017) | ICLR 2024 | - | - |
| `R2DM` | [LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2309.09256) | ICRA 2024 | [Link](https://kazuto1011.github.io/r2dm) | - |
| `LiDiff` | [Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion](https://arxiv.org/abs/2403.13470) | CVPR 2024 | - | [Code](https://github.com/PRBonn/LiDiff) |
| `LiDM` | [Towards Realistic Scene Generation with LiDAR Diffusion Models](https://arxiv.org/abs/2404.00815) | CVPR 2024 | - | - |
| `ViDAR` | [Visual Point Cloud Forecasting enables Scalable Autonomous Driving](https://arxiv.org/abs/2312.17655) | CVPR 2024 | - | - |
| `RangeLDM` | [RangeLDM: Fast Realistic LiDAR Point Cloud Generation](https://arxiv.org/abs/2403.10094) | ECCV 2024 | - | - |
| `Text2LiDAR` | [Text2LiDAR: Text-Guided LiDAR Point Cloud Generation via Equirectangular Transformer](https://arxiv.org/abs/2407.19628) | ECCV 2024 | - | - |
| `BEVWorld` | [BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents](https://arxiv.org/abs/2407.05679) | arXiv 2024 | - | - |
| `HoloDrive` | [HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving](https://arxiv.org/abs/2412.01407) | arXiv 2024 | - | - |
| `LiDARGRIT` | [Taming Transformers for Realistic Lidar Point Cloud Generation](https://arxiv.org/abs/2404.05505) | arXiv 2024 | - | - |
| `SDS` | [Simultaneous Diffusion Sampling for Conditional LiDAR Generation](https://arxiv.org/abs/2410.11628) | arXiv 2024 | - | - |
| `OLiDM` | [OLiDM: Object-Aware LiDAR Diffusion Models for Autonomous Driving](https://arxiv.org/abs/2412.17226) | AAAI 2025 | [Link](https://yanty123.github.io/OLiDM) | - |
| `X-Drive` | [X-Drive: Cross-Modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios](https://arxiv.org/abs/2411.01123) | ICLR 2025 | - | [Code](https://github.com/yichen928/X-Drive) |
| `R2Flow` | [Fast LiDAR Data Generation with Rectified Flows](https://arxiv.org/abs/2412.02241) | ICRA 2025 | [Link](https://kazuto1011.github.io/r2flow/) | [Code](https://github.com/kazuto1011/r2flow) |
| `LidarDM` |
| `WeatherGen` |
| `HERMES` |
| `DriveX` |
| `SPIRAL` |
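Several of the generators above (e.g., the range-image GAN and diffusion lines) operate on an equirectangular range image rather than a raw point set: each LiDAR return is binned by azimuth and inclination, and the pixel value stores its distance. A minimal sketch of that projection, assuming a generic spinning LiDAR (the 64x1024 resolution and vertical field of view here are illustrative, not tied to any dataset or method in the table):

```python
import numpy as np

H, W = 64, 1024                  # elevation x azimuth bins (illustrative)
FOV_UP, FOV_DOWN = 3.0, -25.0    # vertical field of view in degrees

def points_to_range_image(points: np.ndarray) -> np.ndarray:
    """Project an (N, 3) point cloud to an (H, W) range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                   # range per return
    azimuth = np.arctan2(y, x)                           # in [-pi, pi)
    inclination = np.arcsin(z / np.clip(r, 1e-6, None))  # elevation angle

    # Map angles to pixel coordinates.
    u = ((azimuth + np.pi) / (2 * np.pi) * W).astype(int) % W
    fov = np.radians(FOV_UP) - np.radians(FOV_DOWN)
    v = ((np.radians(FOV_UP) - inclination) / fov * H).astype(int)
    v = np.clip(v, 0, H - 1)

    image = np.zeros((H, W), dtype=np.float32)
    order = np.argsort(-r)            # write far points first, so that
    image[v[order], u[order]] = r[order]  # the nearest return wins per pixel
    return image

pts = np.random.uniform(-50, 50, size=(10000, 3))
print(points_to_range_image(pts).shape)   # (64, 1024)
```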
# 4. Datasets & Benchmarks
# 5. Applications
# 6. Other Resources
### Workshops
### Tutorials
### Talks & Seminars
# 7. Acknowledgements