{"id":33900172,"url":"https://github.com/worldbench/WorldLens","last_synced_at":"2025-12-16T22:00:47.795Z","repository":{"id":327908806,"uuid":"1102815866","full_name":"worldbench/WorldLens","owner":"worldbench","description":"WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World","archived":false,"fork":false,"pushed_at":"2025-12-10T08:59:01.000Z","size":56792,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-10T09:50:16.211Z","etag":null,"topics":["3d","4d","aigc","aigc3d","autonomous-driving","generation","generative-ai","human-preferences","lidar","occupancy","reconstruction","scene-understanding","spatial-intelligence","video-generation","world-model"],"latest_commit_sha":null,"homepage":"https://worldbench.github.io/worldlens","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/worldbench.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-24T04:40:09.000Z","updated_at":"2025-12-10T09:00:36.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/worldbench/WorldLens","commit_stats":null,"previous_names":["worldbench/worldlens"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/worldbench/WorldLens","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbench%2FWorldLens","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbench%2FWorldLens/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbench%2FWorldLens/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbench%2FWorldLens/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/worldbench","download_url":"https://codeload.github.com/worldbench/WorldLens/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbench%2FWorldLens/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27772317,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-16T02:00:10.477Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d","4d","aigc","aigc3d","autonomous-driving","generation","generative-ai","human-preferences","lidar","occupancy","reconstruction","scene-understanding","spatial-intelligence","video-generation","world-model"],"created_at":"2025-12-11T23:00:24.409Z","updated_at":"2025-12-16T22:00:47.790Z","avatar_url":"https://github.com/worldbench.png","language":"Python","funding_links":[],"categories":["What Are World Models in 3D and 4D?"],"sub_categories":["Benchmarks"],"readme":"\u003cp align=\"right\"\u003eEnglish | \u003ca href=\"./README_CN.md\"\u003eSimplified Chinese\u003c/a\u003e\u003c/p\u003e  \n\n\n\u003cp 
align=\"center\"\u003e\n  \u003cimg src=\"docs/figures/worldbench.gif\" width=\"12.8%\" align=\"center\"\u003e\n\n  \u003ch1 align=\"center\"\u003e\n    \u003cstrong\u003eWorldLens: Full-Spectrum Evaluations of Driving World Models in Real World\u003c/strong\u003e\n  \u003c/h1\u003e\n\n  \u003cp align=\"center\"\u003e\n    \u003cstrong\u003e:earth_asia: WorldBench Team\u003c/strong\u003e \n  \u003c/p\u003e\n\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://worldbench.github.io/assets_common/papers/worldlens.pdf\" target='_blank'\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Paper-%F0%9F%93%96-darkred\"\u003e\n    \u003c/a\u003e\u0026nbsp;\n    \u003ca href=\"https://worldbench.github.io/worldlens\" target='_blank'\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Project-%F0%9F%94%97-blue\"\u003e\n    \u003c/a\u003e\u0026nbsp;\n    \u003ca href=\"https://worldbench.github.io/worldlens\" target='_blank'\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Leaderboard-%F0%9F%94%97-yellow\"\u003e\n    \u003c/a\u003e\u0026nbsp;\n    \u003ca href=\"https://huggingface.co/datasets/worldbench/videogen\" target='_blank'\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Dataset-%F0%9F%94%97-green\"\u003e\n    \u003c/a\u003e\u0026nbsp;\n    \u003ca href=\"\" target='_blank'\u003e\n      \u003cimg src=\"https://visitor-badge.laobi.icu/badge?page_id=worldbench.WorldLens\"\u003e\n    \u003c/a\u003e\n  \u003c/p\u003e\n\u003c/p\u003e\n\n\n| \u003cimg src=\"docs/figures/teaser.png\" alt=\"teaser\" width=\"100%\"\u003e |\n| :-: |\n\n### :grey_question: Is your driving world model an all-around player? 
\n\n- This work presents `WorldLens`, a unified benchmark encompassing evaluations on $^1$**Generation**, $^2$**Reconstruction**, $^3$**Action-Following**, $^4$**Downstream Task**, and $^5$**Human Preference**, across **a total of 24 dimensions** spanning visual realism, geometric consistency, functional reliability, and perceptual alignment.\n- We observe that no single model dominates across all axes, highlighting the need for balanced progress toward physically and behaviorally realistic world modeling.\n- For additional visual examples, please refer to our :earth_asia: [Project Page](https://worldbench.github.io/worldlens).\n\n\n\n\n### :books: Citation\nIf you find this work helpful for your research, please consider citing our papers:\n\n```bibtex\n@article{worldlens,\n    title   = {{WorldLens}: Full-Spectrum Evaluations of Driving World Models in Real World},\n    author  = {Ao Liang and Lingdong Kong and Tianyi Yan and Hongsi Liu and Wesley Yang and Ziqi Huang and Wei Yin and Jialong Zuo and Yixuan Hu and Dekai Zhu and Dongyue Lu and Youquan Liu and Guangfeng Jiang and Linfeng Li and Xiangtai Li and Long Zhuo and Lai Xing Ng and Benoit R. Cottereau and Changxin Gao and Liang Pan and Wei Tsang Ooi and Ziwei Liu},\n    journal = {arXiv preprint arXiv:2512.10958},\n    year    = {2025}\n}\n```\n\n```bibtex\n@article{survey_3d_4d_world_models,\n    title   = {{3D} and {4D} World Modeling: A Survey},\n    author  = {Lingdong Kong and Wesley Yang and Jianbiao Mei and Youquan Liu and Ao Liang and Dekai Zhu and Dongyue Lu and Wei Yin and Xiaotao Hu and Mingkai Jia and Junyuan Deng and Kaiwen Zhang and Yang Wu and Tianyi Yan and Shenyuan Gao and Song Wang and Linfeng Li and Liang Pan and Yong Liu and Jianke Zhu and Wei Tsang Ooi and Steven C. H. Hoi and Ziwei Liu},\n    journal = {arXiv preprint arXiv:2509.07996},\n    year    = {2025}\n}\n```\n\n\n## Updates\n\n- **[12/2025]** - The official :balance_scale: [WorldLens Leaderboard](https://huggingface.co/spaces/worldbench/WorldLens) is online at HuggingFace Spaces. We invite researchers and practitioners to submit their models for evaluation on the leaderboard, enabling consistent comparison and supporting progress in world model research.\n- **[12/2025]** - A collection of 3D and 4D world models is available at :hugs: [`awesome-3d-4d-world-models`](https://github.com/worldbench/awesome-3d-4d-world-models).\n- **[12/2025]** - The [Project Page](https://worldbench.github.io/worldlens) is online. :rocket:\n\n\n\n## Outline\n- [WorldLens Benchmark](#earth_asia-worldlens-benchmark)\n- [WorldLens Leaderboard](#balance_scale-worldlens-leaderboard)\n- [Installation](#gear-installation)\n- [Data Preparation](#hotsprings-data-preparation)\n- [Getting Started](#rocket-getting-started)\n- [WorldLens-26K](#hugs-worldlens-26k)\n- [WorldLens-Agent](#robot-worldlens-agent)\n- [TODO List](#memo-todo-list)\n- [License](#license)\n- [Acknowledgements](#acknowledgements)\n- [Related Projects](#related-projects)\n\n\n\n## :earth_asia: WorldLens Benchmark\n\n| \u003cimg src=\"docs/figures/bench.png\" alt=\"framework\" width=\"100%\"\u003e|\n| :-: |\n\n- Generative world models must go beyond visual realism to achieve geometric consistency, physical plausibility, and functional reliability. 
`WorldLens` is a unified benchmark that evaluates these capabilities across five complementary aspects, ranging from low-level appearance fidelity to high-level behavioral realism.\n\n- Each aspect is decomposed into fine-grained, interpretable dimensions, forming a comprehensive framework that bridges human perception, physical reasoning, and downstream utility.\n\nFor additional details and visual examples, please refer to our :books: [Paper](https://worldbench.github.io/assets_common/papers/worldlens.pdf) and :earth_asia: [Project Page](https://worldbench.github.io/worldlens).\n\n\n\n## :balance_scale: WorldLens Leaderboard\n\n||||\n|:-:|:-:|:-|\n| \u003cimg src=\"docs/icons/generation.gif\" width=\"100\"\u003e | Generation | Measuring whether a model can synthesize visually realistic, temporally stable, and semantically consistent scenes. Even state-of-the-art models that achieve low perceptual error (e.g., LPIPS, FVD) often suffer from view flickering or motion instability, revealing the limits of current diffusion-based architectures.\n| \u003cimg src=\"docs/icons/reconstruction.gif\" width=\"100\"\u003e | Reconstruction | Probing whether generated videos can be reprojected into a coherent 4D scene using differentiable rendering. Models that appear sharp in 2D frequently collapse when reconstructed, producing geometric \"floaters\": a gap showing that temporal coherence remains only weakly coupled to 3D geometry in most pipelines.\n| \u003cimg src=\"docs/icons/action-following.gif\" width=\"100\"\u003e | Action-Following | Testing whether a pre-trained action planner can operate safely inside the generated world. 
High open-loop realism does not guarantee safe closed-loop control; almost all existing world models trigger collisions or off-road drift, underscoring that photometric realism alone cannot yield functional fidelity.\n| \u003cimg src=\"docs/icons/downstream.gif\" width=\"100\"\u003e | Downstream Task | Evaluating whether the synthetic data support downstream perception models trained on real-world datasets. Even visually appealing worlds may degrade detection or segmentation accuracy by 30-50%, highlighting that alignment with task distributions, not just image quality, is vital for practical usability.\n| \u003cimg src=\"docs/icons/human-preference.gif\" width=\"100\"\u003e | Human Preference | Capturing subjective judgments of world realism, physical plausibility, and behavioral safety through large-scale human annotations. Our study reveals that models with strong geometric consistency are generally rated as more \"real\", confirming that perceptual fidelity is inseparable from structural coherence.\n||||\n\n### Leaderboard\n\nAn interactive :balance_scale: [WorldLens Leaderboard](https://huggingface.co/spaces/worldbench/WorldLens) is online at :hugs: HuggingFace Spaces. We invite researchers and practitioners to submit their models for evaluation on the leaderboard, enabling consistent comparison and supporting progress in world model research.\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u0026nbsp;\u003cb\u003eBenchmarked Models\u003c/b\u003e\u003c/summary\u003e\n\n\u003e - [x] **MagicDrive, ICLR 2024.**\n\u003e - [x] **Panacea, CVPR 2024.**\n\u003e - [x] **DreamForge, arXiv 2024.**\n\u003e - [x] **DriveDreamer-2, AAAI 2025.**\n\u003e - [x] **DrivingSphere, CVPR 2025.**\n\u003e - [x] **OpenDWM, CVPR 2025.**\n\u003e - [x] **MagicDrive-V2, ICCV 2025.**\n\u003e - [x] **DiST-4D, ICCV 2025.**\n\u003e - [x] **RLGF, NeurIPS 2025.**\n\u003e - [x] **X-Scene, NeurIPS 2025.**\n\u003e - [ ] . . 
.\n\n\u003c/details\u003e\n\n\n## :gear: Installation\n\nThe `WorldLens` evaluation toolkit is developed and tested with Python 3.9 and CUDA 11.8. We recommend using Conda to manage the environment. Unless noted otherwise, run each command block below from the repository root.\n\n- Clone the Repository and Create Environment:\n```shell\ngit clone --recursive https://github.com/worldbench/dev-evalkit.git\ncd dev-evalkit\nconda create -n worldbench python=3.9.20\nconda activate worldbench\n```\n\n- Install PyTorch:\n```shell\npip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 \\\n    --index-url https://download.pytorch.org/whl/cu118\n```\n\n- Install MMCV (with CUDA):\n```shell\ncd worldbench/third_party/mmcv-1.6.0\nMMCV_WITH_OPS=1 pip install -e .\n```\n\n\u003e **Note**: We changed the C++ standard to C++17 for better compatibility. You may adjust it in `worldbench/third_party/mmcv-1.6.0/setup.py` to suit your system.\n\n- Install MMSegmentation:\n```shell\npip install https://github.com/open-mmlab/mmsegmentation/archive/refs/tags/v0.30.0.zip\n```\n\n- Install MMDetection:\n```shell\npip install mmdet==2.28.2\n```\n\n- Install BEVFusion-based MMDet3D:\n```shell\ncd worldbench/third_party/bevfusion\npython setup.py develop\n```\n\u003e Additional Notes:\n\u003e 1. The C++ standard was updated to C++17.\n\u003e 2. We modified the sparse convolution import logic in `worldbench/third_party/bevfusion/mmdet3d/ops/spconv/conv.py`.\n\n- Install MMDetection3D (v1.0.0rc6):\n```shell\ncd worldbench/third_party/mmdetection3d-1.0.0rc6\npip install -v -e .\n```\nPin the required dependency versions:\n```shell\npip install numpy==1.23.5 numba==0.53.0\n```\n\n- Pretrained Models:\nWorldLens relies on several pretrained models (e.g., CLIP, segmentation, and depth networks). 
Please download them from [HuggingFace](https://huggingface.co/datasets/worldbench/videogen/tree/main/pretrained_models) and place them under `./pretrained_models/`.\n\n\n\n\n## :hotsprings: Data Preparation\nWe use nuScenes as an example.\n\nRequired files:\n- The official nuScenes dataset\n- 12 Hz interpolated annotations from [ECCV 2024 Workshop – CODA Track 2](https://coda-dataset.github.io/w-coda2024/track2/)\n- Tracking \u0026 temporal `.pkl` files from [HuggingFace – WorldLens Data Preparation](https://huggingface.co/datasets/worldbench/videogen/tree/main/data_preparation)\n\n**Final Directory Structure**\n```text\ndata\n  ├── nuscenes\n  │   ├── can_bus\n  │   ├── lidarseg\n  │   ├── maps\n  │   ├── occ3d\n  │   ├── samples\n  │   ├── sweeps\n  │   ├── v1.0-mini\n  │   └── v1.0-trainval\n  ├── nuscenes_map_aux_12Hz_interp\n  │   └── val_200x200_12Hz_interp.h5\n  ├── nuscenes_mmdet3d-12Hz\n  │   ├── nuscenes_interp_12Hz_dbinfos_train.pkl\n  │   ├── nuscenes_interp_12Hz_infos_track2_eval.pkl\n  │   ├── nuscenes_interp_12Hz_infos_train.pkl\n  │   └── nuscenes_interp_12Hz_infos_val.pkl\n  ├── nuscenes_mmdet3d-12Hz_description\n  │   ├── nuscenes_interp_12Hz_updated_description_train.pkl\n  │   └── nuscenes_interp_12Hz_updated_description_val.pkl\n  ├── nuscenes_mmdet3d_2\n  │   └── nuscenes_infos_temporal_val_3keyframes.pkl\n  └── nuscenes_track\n      ├── ada_track_infos_train.pkl\n      └── ada_track_infos_val.pkl\n```\n\n\n\n## :rocket: Getting Started\n- Configure Metrics:\n\nAll evaluation metrics are defined in a unified YAML format under `tools/configs/`.\nExample: Temporal (Depth) Consistency:\n```yaml\ntemporal_consistency:\n  - name: temporal_consistency\n    method_name: ${method_name}\n    need_preprocessing: true\n    repeat_times: 1\n    local_save_path: pretrained_models/clip/ViT-B-32.pt\n```\n\n- Run Evaluation:\n```shell\nbash tools/scripts/evaluate.sh $TASK $METHOD_NAME\n```\n- Example: evaluating MagicDrive (video-based world 
model)\n```shell\nbash tools/scripts/evaluate.sh videogen magicdrive\n```\n\n\n\n### Visualizations\n- Prepare Generated Results:\nDownload model outputs from [HuggingFace](https://huggingface.co/datasets/worldbench/videogen/tree/main/nuscenes) and move them to:\n```text\n./generated_results\n  ├── dist4d\n  ├── dreamforge\n  ├── drivedreamer2\n  ├── gt\n  ├── magicdrive\n  ├── opendwm\n  └── xscene\n      └── video_submission\n```\n\n- Visualization Tools:\n  - Multi-view Panorama Viewer (Cross-view Consistency):\n  ```shell\n  python tools/showcase/video_multi_view_app.py\n  ```\n\n  - Method-to-Method Comparison:\n  ```shell\n  python tools/showcase/video_method_compare_app.py\n  ```\n\n  - GIF-based Comparison:\n  ```shell\n  python tools/showcase/gif_method_compare_app.py\n  ```\n\n\n\n## :hugs: WorldLens-26K\n\nTo be updated.\n\n\n\n## :robot: WorldLens-Agent\n\nTo be updated.\n\n\n\n## :memo: TODO List\n- [x] Initial release. 🚀\n- [ ] Release the WorldLens-26K dataset.\n- [ ] Support additional datasets (Waymo, Argoverse, and more).\n- [ ] Add agent-based automatic evaluators.\n- [ ] . . .\n\n\n\n## License\nThis work is released under the \u003ca rel=\"license\" href=\"https://www.apache.org/licenses/LICENSE-2.0\"\u003eApache License Version 2.0\u003c/a\u003e; some specific implementations in this codebase may be under other licenses. 
If you plan to use this code for commercial purposes, please check [LICENSE.md](docs/LICENSE.md) carefully.\n\n\n\n## Acknowledgements\n\nTo be added.\n\n\n## Related Projects\n\n| :sunglasses: Awesome | Projects |\n|:-:|:-|\n| |\n| \u003cimg width=\"95px\" src=\"https://github.com/ldkong1205/ldkong1205/blob/master/Images/worldbench_survey.webp\"\u003e | **3D and 4D World Modeling: A Survey**\u003cbr\u003e[[GitHub Repo](https://github.com/worldbench/survey)] - [[Project Page](https://worldbench.github.io/survey)] - [[Paper](https://worldbench.github.io/assets_common/papers/survey.pdf)] |\n| \u003cimg width=\"95px\" src=\"https://github.com/worldbench/worldbench.github.io/blob/main/assets_common/teasers/vbench.png\"\u003e | **VBench: Comprehensive Benchmark Suite for Video Generative Models**\u003cbr\u003e[[GitHub Repo](https://github.com/Vchitect/VBench)] - [[Project Page](https://vchitect.github.io/VBench-project/)] - [[Paper](https://arxiv.org/abs/2311.17982)] |\n| \u003cimg width=\"95px\" src=\"https://github.com/worldbench/worldbench.github.io/blob/main/assets_common/teasers/vbench2.png\"\u003e | **VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models**\u003cbr\u003e[[GitHub Repo](https://github.com/Vchitect/VBench)] - [[Project Page](https://vchitect.github.io/VBench-project/)] - [[Paper](https://arxiv.org/abs/2411.13503)] |\n| \u003cimg width=\"95px\" src=\"https://github.com/ldkong1205/ldkong1205/blob/master/Images/lidarcrafter.png\"\u003e | **LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences**\u003cbr\u003e[[GitHub Repo](https://github.com/lidarcrafter/toolkit)] - [[Project Page](https://lidarcrafter.github.io/)] - [[Paper](https://arxiv.org/abs/2508.03692)] |\n| \u003cimg width=\"95px\" src=\"https://github.com/ldkong1205/ldkong1205/blob/master/Images/3eed.png\"\u003e | **3EED: Ground Everything Everywhere in 3D**\u003cbr\u003e[[GitHub Repo](https://github.com/worldbench/3EED)] - [[Project Page](https://project-3eed.github.io/)] - [[Paper](https://arxiv.org/abs/2511.01755)] |\n| \u003cimg width=\"95px\" src=\"https://github.com/ldkong1205/ldkong1205/blob/master/Images/drivebench.png\"\u003e | **Are VLMs Ready for Autonomous Driving? A Study from Reliability, Data \u0026 Metric Perspectives**\u003cbr\u003e[[GitHub Repo](https://github.com/drive-bench/toolkit)] - [[Project Page](https://drive-bench.github.io/)] - [[Paper](https://arxiv.org/abs/2501.04003)] |\n| \u003cimg width=\"95px\" src=\"https://github.com/ldkong1205/ldkong1205/blob/master/Images/pi3det.png\"\u003e | **Perspective-Invariant 3D Object Detection**\u003cbr\u003e[[GitHub Repo](https://github.com/pi3det/toolkit)] - [[Project Page](https://pi3det.github.io/)] - [[Paper](https://arxiv.org/abs/2507.17665)] |\n| \u003cimg width=\"95px\" src=\"https://github.com/ldkong1205/ldkong1205/blob/master/Images/dynamiccity.webp\"\u003e | **DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes**\u003cbr\u003e[[GitHub Repo](https://github.com/3DTopia/DynamicCity)] - [[Project Page](https://dynamic-city.github.io/)] - [[Paper](https://arxiv.org/abs/2410.18084)] |\n| |\n\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbench%2FWorldLens","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fworldbench%2FWorldLens","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbench%2FWorldLens/lists"}