{"id":13441452,"url":"https://github.com/OpenDriveLab/ViDAR","last_synced_at":"2025-03-20T12:30:45.017Z","repository":{"id":214701304,"uuid":"728532286","full_name":"OpenDriveLab/ViDAR","owner":"OpenDriveLab","description":"[CVPR 2024 Highlight] Visual Point Cloud Forecasting","archived":false,"fork":false,"pushed_at":"2024-06-24T02:28:03.000Z","size":37456,"stargazers_count":243,"open_issues_count":7,"forks_count":17,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-08-01T03:34:33.017Z","etag":null,"topics":["autonomous-driving","point-cloud-forecasting","pre-training","world-model"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2312.17655","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenDriveLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["OpenDriveLab"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2023-12-07T06:21:44.000Z","updated_at":"2024-07-29T15:05:23.000Z","dependencies_parsed_at":"2024-01-16T02:45:22.983Z","dependency_job_id":"c7c14827-a2d1-4037-9f30-2363f398278c","html_url":"https://github.com/OpenDriveLab/ViDAR","commit_stats":null,"previous_names":["opendrivelab/vidar"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FViDAR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/OpenDriveLab%2FViDAR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FViDAR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenDriveLab%2FViDAR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenDriveLab","download_url":"https://codeload.github.com/OpenDriveLab/ViDAR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221759984,"owners_count":16876329,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autonomous-driving","point-cloud-forecasting","pre-training","world-model"],"created_at":"2024-07-31T03:01:34.089Z","updated_at":"2025-03-20T12:30:45.011Z","avatar_url":"https://github.com/OpenDriveLab.png","language":"Python","funding_links":["https://github.com/sponsors/OpenDriveLab"],"categories":["Python","General-Purpose / Sequential / Spatial Latent Grid"],"sub_categories":["2024"],"readme":"# ViDAR: Visual Point Cloud Forecasting\n\n![](./assets/teaser.png \"Visual point cloud forecasting\")\n\n\u003e **Visual Point Cloud Forecasting enables Scalable Autonomous Driving [CVPR 2024 Highlight]**\n\u003e\n\u003e [Zetong Yang](https://scholar.google.com/citations?user=oPiZSVYAAAAJ\u0026hl=en), [Li Chen](https://scholar.google.com/citations?user=ulZxvY0AAAAJ\u0026hl=en\u0026authuser=1), [Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ\u0026hl=en), and [Hongyang Li](https://lihongyang.info/)\n\u003e \n\u003e - Presented by [OpenDriveLab](https://opendrivelab.com/) at Shanghai AI 
Lab\n\u003e - :mailbox_with_mail: Primary contact: [Zetong Yang](https://scholar.google.com/citations?user=oPiZSVYAAAAJ\u0026hl=en) (tomztyang@gmail.com)\n\u003e - [arXiv paper](https://arxiv.org/abs/2312.17655) | [Video (YouTube, 5min)](https://www.youtube.com/watch?v=j1dU1ii5Rvg) | [Tutorial on World Model (Bilibili)](https://www.bilibili.com/video/BV1ub421p7Rg/?share_source=copy_web\u0026vd_source=47bdbb6c67891d390b613c403e23dcfb)\n\u003e - [CVPR 2024 Autonomous Driving Challenge - Predictive World Model](https://opendrivelab.com/challenge2024/#predictive_world_model)\n\n\n## Highlights \u003ca name=\"highlights\"\u003e\u003c/a\u003e\n\n:fire: **Visual point cloud forecasting**, a new self-supervised pre-training task for end-to-end autonomous driving, which predicts future point clouds from historical visual inputs and jointly models 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.\n\n:star2: **ViDAR**, the first visual point cloud forecasting architecture.\n\n![method](./assets/vidar.png \"Architecture of ViDAR\")\n\n:trophy: Predictive world modeling, in the form of visual point cloud forecasting, will be a main track in the `CVPR 2024 Autonomous Driving Challenge`. Please [stay tuned](https://opendrivelab.com/AD24Challenge.html) for further details!\n\n## News \u003ca name=\"news\"\u003e\u003c/a\u003e\n\n- `[2024/4]` :fire: ViDAR pre-training for **End-to-End Autonomous Driving (UniAD)** is released. Please refer to the [ViDAR-UniAD page](./UniAD/README.md) for more information.\n- `[2024/4]` :fire: ViDAR pre-training on **nuScenes-fullset** is released. Please check the configs for [pre-training](projects/configs/vidar_pretrain/nusc_fullset/vidar_full_nusc_1future.py) and [fine-tuning](projects/configs/vidar_finetune/nusc_fullset/vidar_full_nusc_1future.py). 
Corresponding models are available: [pre-trained](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/pretrain-ViDAR-RN101-nus-full-1future.pth) and [fine-tuned](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/finetune-ViDAR-RN101-nus-full-1future.pth).\n- `[2024/3]` :fire: The predictive world model challenge is launched. Please refer to [CHALLENGE.md](docs/CHALLENGE.md) for more details.\n- `[2024/2]` ViDAR code and models initially released.\n- `[2024/2]` ViDAR is accepted by CVPR 2024.\n- `[2023/12]` ViDAR [paper](https://arxiv.org/abs/2312.17655) released.\n\n## TODO List \u003ca name=\"TODO List\"\u003e\u003c/a\u003e\n\nCompleted:\n- [x] ViDAR-nuScenes-1/8 training and BEVFormer fine-tuning configurations.\n- [x] ViDAR-OpenScene-mini training configurations. (You are welcome to join the [predictive world model challenge](https://opendrivelab.com/challenge2024/#predictive_world_model)!)\n- [x] ViDAR-nuScenes-full training and BEVFormer full fine-tuning configurations.\n- [x] UniAD fine-tuning code and configuration.\n\n\n## Table of Contents\n\n1. [Results and Model Zoo](#models)\n2. [Installation](#installation)\n3. [Prepare Datasets](#prepare-datasets)\n4. [Train and Evaluate](#train-and-evaluate)\n5. [License and Citation](#license-and-citation)\n6. 
[Related Resources](#resources)\n\n## Results and Model Zoo \u003ca name=\"models\"\u003e\u003c/a\u003e\n\n### Visual point cloud forecasting pre-training\n\n**nuScenes Dataset:**\n\n|  Pre-train Model | Dataset  | Config | CD@1s | CD@2s | CD@3s | models \u0026 logs |\n| :------: | :---: | :---: | :----: | :----: | :----: | :----: |\n|   ViDAR-RN101-nus-1-8-1future | nuScenes (12.5% Data)   |  [vidar-nusc-pretrain-1future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_1future.py)  |  -   | - | - |  [models](https://drive.google.com/file/d/1NrJ49fFJaIPtnM9mfP_OsomY8AydMlNx/view?usp=sharing) / [logs](https://drive.google.com/file/d/1_80pYnhAHk7ZAiDMJKJW7_jXKGylZ3-D/view?usp=sharing) |\n|   ViDAR-RN101-nus-1-8-3future | nuScenes (12.5% Data)   |  [vidar-nusc-pretrain-3future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_3future.py)  |  1.25   | 1.48 | 1.79 |  [models](https://drive.google.com/file/d/1FR5lZGIA2KBzg-CsERDegNCuRNrMJsmR/view?usp=sharing) / [logs](https://drive.google.com/file/d/1HeiTGv8ss3fT2wCrFyzSGWwHbn7IR0mH/view?usp=sharing) |\n|   ViDAR-RN101-nus-full-1future | nuScenes (100% Data)   |  [vidar-nusc-pretrain-1future](projects/configs/vidar_pretrain/nusc_fullset/vidar_full_nusc_1future.py)  |  -   | - | - |  [models](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/pretrain-ViDAR-RN101-nus-full-1future.pth) |\n\n* **Metric**: CD@Ns is the Chamfer Distance between the forecast and ground-truth point clouds N seconds into the future (lower is better).\n* **HINT**: Before running ViDAR on the full nuScenes set, run `python tools/merge_nusc_fullset_pkl.py` to generate\n*nuscenes_infos_temporal_traintest.pkl* for pre-training.\n\n**OpenScene Dataset:**\n\n|  Pre-train Model | Dataset  | Config | CD@1s | CD@2s | CD@3s | models \u0026 logs |\n| :------: | :---: | :---: | :----: | :----: | :----: | :----: |\n|   ViDAR-RN101-OpenScene-3future | OpenScene-mini (12.5% Data)   |  [vidar-OpenScene-pretrain-3future-1-8](projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_mini_1_8_3future.py)  |  1.41   | 1.57 | 1.78 |  
[models](https://drive.google.com/file/d/1aai3Z7JZavtDAFYzY1pwe41MNNRO_Wn_/view?usp=sharing) / [logs](https://drive.google.com/file/d/1oHdLH11l_ik2M5KyJBtklxa5Skz1bVra/view?usp=sharing) |\n|   ViDAR-RN101-OpenScene-3future | OpenScene-mini-Full (100% Data)   |  [vidar-OpenScene-pretrain-3future-full](projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_mini_full_3future.py)  |  1.03   | 1.15 | 1.35 |  [models](https://drive.google.com/file/d/1FiiZBHTtZYIvetwru9sTcVpDtKx_zAqd/view?usp=sharing) / [logs](https://drive.google.com/file/d/1mKiX-q6xSbhGa8tmsC19zUQbdTwg1JeA/view?usp=sharing) |\n\n### Down-stream fine-tuning (Perception)\n| Downstream Model | Dataset |  pre-train | Config | NDS | mAP | models \u0026 logs |\n| :------: | :------: | :---: | :---: | :----: | :----: | :----: |\n| BEVFormer-Base (baseline) | nuScenes (25% Data) |  [FCOS3D](https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth)  | [bevformer-base](projects/configs/vidar_finetune/nusc_1_4_subset/bevformer_1_4_baseline.py)  |  43.40   | 35.47 | [models](https://drive.google.com/file/d/19FKge9dANm7qG_hb1WRmokS3svWiMhE4/view?usp=sharing) / [logs](https://drive.google.com/file/d/1YwvW-ON6hHM4tLyWpo-orVUTXErRAfsu/view?usp=sharing) |\n| BEVFormer-Base | nuScenes (25% Data) |   [ViDAR-RN101-nus-1-8-1future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_1future.py)   | [vidar-nusc-finetune-1future](projects/configs/vidar_finetune/nusc_1_4_subset/vidar_1_8_nusc_1future.py)  |  45.77   | 36.90 | [models](https://drive.google.com/file/d/1t-SQUf41QcVOnyQk2TaSu7MBYcTqA_sf/view?usp=sharing) / [logs](https://drive.google.com/file/d/1Mq99JK_wATQdz6iwUPlN9YAtraB_HgjJ/view?usp=sharing) |\n| BEVFormer-Base | nuScenes (25% Data) |   [ViDAR-RN101-nus-1-8-3future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_3future.py)   | [vidar-nusc-finetune-3future](projects/configs/vidar_finetune/nusc_1_4_subset/vidar_1_8_nusc_3future.py)  |  45.61   | 36.84 
| [models](https://drive.google.com/file/d/1D6yogBruaIcItgU-dPQt8qCPrDmxin5i/view?usp=sharing) / [logs](https://drive.google.com/file/d/1f7LiYp2hP64KnJzpDjj6JfK6lC4GtIly/view?usp=sharing) |\n| BEVFormer-Base (baseline) | nuScenes (100% Data) | [FCOS3D](https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth)  | [bevformer-base](projects/configs/bevformer/bevformer_base.py)  |  51.7   | 41.6 | [models](https://github.com/zhiqi-li/storage/releases/download/v1.0/bevformer_r101_dcn_24ep.pth) |\n| BEVFormer-Base | nuScenes (100% Data) |   [ViDAR-RN101-nus-full-1future](projects/configs/vidar_pretrain/nusc_fullset/vidar_full_nusc_1future.py)   | [vidar-nusc-finetune-1future](projects/configs/vidar_finetune/nusc_fullset/vidar_full_nusc_1future.py)  |  55.33   | 45.20 | [models](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/finetune-ViDAR-RN101-nus-full-1future.pth) |\n\n### Down-stream fine-tuning (End-to-End)\n\nPlease refer to the [ViDAR-UniAD page](UniAD/README.md).\n\n## Installation \u003ca name=\"installation\"\u003e\u003c/a\u003e\n\nThe installation steps are similar to those of [BEVFormer](https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md). For convenience, we list them below:\n```bash\nconda create -n vidar python=3.8 -y\nconda activate vidar\n\npip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html\nconda install -c omgarcia gcc-6 # (optional) gcc-6.2\n```\n\nInstall the mm-series packages.\n```bash\npip install mmcv-full==1.4.0\npip install mmdet==2.14.0\npip install mmsegmentation==0.14.1\n\n# Install mmdetection3d from source.\ngit clone https://github.com/open-mmlab/mmdetection3d.git\ncd mmdetection3d\ngit checkout v0.17.1 # Other versions may not be compatible.\npython setup.py install\n```\n\nInstall Detectron2, timm, and other dependencies.\n```bash\npip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13  
typing-extensions==4.5.0 pylint ipython==8.12  numpy==1.19.5 matplotlib==3.5.2 numba==0.48.0 pandas==1.4.4 scikit-image==0.19.3 setuptools==59.5.0\npython -m pip install 'git+https://github.com/facebookresearch/detectron2.git'\n```\n\nSet up the ViDAR project.\n```bash\ngit clone https://github.com/OpenDriveLab/ViDAR\n\ncd ViDAR\nmkdir pretrained\n# Download the FCOS3D checkpoint into pretrained/, then return to the repository root.\ncd pretrained \u0026\u0026 wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth \u0026\u0026 cd ..\n\n# Install the Chamfer distance library.\ncd third_lib/chamfer_dist/chamferdist/\npip install .\n```\n\n## Prepare Datasets \u003ca name=\"prepare-datasets\"\u003e\u003c/a\u003e\n\n- [OpenScene](https://github.com/OpenDriveLab/OpenScene): please refer to [DATASET.md](docs/DATASET.md).\n- [nuScenes](https://www.nuscenes.org/): please refer to [DATASET.md](docs/DATASET.md#nuscenes).\n\n\n## Train and Evaluate \u003ca name=\"train-and-evaluate\"\u003e\u003c/a\u003e\n\n### Train\n\nWe recommend using 8 A100 GPUs for training. GPU memory usage is around 63 GB during pre-training.\n* **HINT**: To save GPU memory, you can change *supervise_all_future=True* to *False* and use smaller values for *vidar_head_pred_history_frame_num* and\n*vidar_head_pred_future_frame_num*.\nFor example, by setting `supervise_all_future=False`, `vidar_head_pred_history_frame_num=0`, `vidar_head_pred_future_frame_num=0`,\nand `vidar_head_per_frame_loss_weight=(1.0,)`,\nthe GPU memory consumption of [vidar-pretrain-3future-model](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_3future.py) is reduced to ~34 GB.\nAn example configuration is provided in [mem_efficient_vidar_1_8_nusc_3future.py](projects/configs/vidar_pretrain/nusc_1_8_subset/mem_efficient_vidar_1_8_nusc_3future.py).\n* **Full-nuScenes-Training**: To pre-train ViDAR on the full nuScenes dataset, first run `python tools/merge_nusc_fullset_pkl.py` to generate\n*nuscenes_infos_temporal_traintest.pkl* for pre-training.\n\n\n```bash\nCONFIG=path/to/config.py\nGPU_NUM=8\n\n./tools/dist_train.sh ${CONFIG} 
${GPU_NUM}\n```\n\n### Evaluate\n\n```bash\nCONFIG=path/to/vidar_config.py\nCKPT=path/to/checkpoint.pth\nGPU_NUM=8\n\n./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM}\n```\n\n### Visualize\n\n```bash\nCONFIG=path/to/vidar_config.py\nCKPT=path/to/checkpoint.pth\nGPU_NUM=1\n\n./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM} \\\n  --cfg-options 'model._viz_pcd_flag=True' 'model._viz_pcd_path=/path/to/output'\n```\n\n\n## License and Citation \u003ca name=\"license-and-citation\"\u003e\u003c/a\u003e\n\nAll assets and code are under the [Apache 2.0 license](./LICENSE) unless specified otherwise.\n\nIf this work is helpful for your research, please consider citing the following BibTeX entry.\n\n```bibtex\n@inproceedings{yang2023vidar,\n  title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},\n  author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  year={2024}\n}\n```\n\n## Related Resources \u003ca name=\"resources\"\u003e\u003c/a\u003e\n\nWe thank the open-source contributors of the following projects, which made this work possible:\n\n- [BEVFormer](https://github.com/fundamentalvision/BEVFormer) | [UniAD](https://github.com/OpenDriveLab/UniAD) | [4D Occ](https://github.com/tarashakhurana/4d-occ-forecasting)\n\n\u003ca href=\"https://twitter.com/OpenDriveLab\" target=\"_blank\"\u003e\n    \u003cimg alt=\"Twitter Follow\" src=\"https://img.shields.io/twitter/follow/OpenDriveLab?style=social\u0026color=brightgreen\u0026logo=twitter\" /\u003e\n  \u003c/a\u003e\n\n- [DriveAGI](https://github.com/OpenDriveLab/DriveAGI) | [Survey on BEV Perception](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)\n- [BEVFormer](https://github.com/fundamentalvision/BEVFormer) | [UniAD](https://github.com/OpenDriveLab/UniAD) | 
[OpenLane-V2](https://github.com/OpenDriveLab/OpenLane-V2) | [OccNet](https://github.com/OpenDriveLab/OccNet)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenDriveLab%2FViDAR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenDriveLab%2FViDAR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenDriveLab%2FViDAR/lists"}