[CVPR 2024 Highlight] Visual Point Cloud Forecasting
- Host: GitHub
- URL: https://github.com/OpenDriveLab/ViDAR
- Owner: OpenDriveLab
- License: apache-2.0
- Created: 2023-12-07T06:21:44.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-24T02:28:03.000Z (over 1 year ago)
- Last Synced: 2024-08-01T03:34:33.017Z (about 1 year ago)
- Topics: autonomous-driving, point-cloud-forecasting, pre-training, world-model
- Language: Python
- Homepage: https://arxiv.org/abs/2312.17655
- Size: 35.7 MB
- Stars: 243
- Watchers: 9
- Forks: 17
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# ViDAR: Visual Point Cloud Forecasting

> **Visual Point Cloud Forecasting enables Scalable Autonomous Driving [CVPR 2024 Highlight]**
>
> [Zetong Yang](https://scholar.google.com/citations?user=oPiZSVYAAAAJ&hl=en), [Li Chen](https://scholar.google.com/citations?user=ulZxvY0AAAAJ&hl=en&authuser=1), [Yanan Sun](https://scholar.google.com/citations?user=6TA1oPkAAAAJ&hl=en), and [Hongyang Li](https://lihongyang.info/)
>
> - Presented by [OpenDriveLab](https://opendrivelab.com/) at Shanghai AI Lab
> - :mailbox_with_mail: Primary contact: [Zetong Yang](https://scholar.google.com/citations?user=oPiZSVYAAAAJ&hl=en) (tomztyang@gmail.com)
> - [arXiv paper](https://arxiv.org/abs/2312.17655) | [Video (YouTube, 5min)](https://www.youtube.com/watch?v=j1dU1ii5Rvg) | [Tutorial on World Model (Bilibili)](https://www.bilibili.com/video/BV1ub421p7Rg/?share_source=copy_web&vd_source=47bdbb6c67891d390b613c403e23dcfb)
> - [CVPR 2024 Autonomous Driving Challenge - Predictive World Model](https://opendrivelab.com/challenge2024/#predictive_world_model)

:fire: **Visual point cloud forecasting**: a new self-supervised pre-training task for end-to-end autonomous driving that predicts future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.

:star2: **ViDAR**: the first visual point cloud forecasting architecture.

:trophy: Predictive world model, in the form of visual point cloud forecasting, will be a main track in the `CVPR 2024 Autonomous Driving Challenge`. Please [stay tuned](https://opendrivelab.com/AD24Challenge.html) for further details!
- `[2024/4]` :fire: ViDAR-pretraining on **End-to-End Autonomous Driving (UniAD)** is released. Please refer to [ViDAR-UniAD Page](./UniAD/README.md) for more information.
- `[2024/4]` :fire: ViDAR-pretraining on **nuScenes-fullset** is released. Please check the configs for [pre-training](projects/configs/vidar_pretrain/nusc_fullset/vidar_full_nusc_1future.py) and [fine-tuning](projects/configs/vidar_finetune/nusc_fullset/vidar_full_nusc_1future.py). Corresponding
models are available at [pre-trained](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/pretrain-ViDAR-RN101-nus-full-1future.pth) and [fine-tuned](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/finetune-ViDAR-RN101-nus-full-1future.pth).
- `[2024/3]` :fire: Predictive world model challenge is launched. Please refer to the [link](docs/CHALLENGE.md) for more details.
- `[2024/2]` ViDAR code and models initially released.
- `[2024/2]` ViDAR is accepted by CVPR 2024.
- `[2023/12]` ViDAR [paper](https://arxiv.org/abs/2312.17655) released.Still in progress:
- [x] ViDAR-nuScenes-1/8 training and BEVFormer fine-tuning configurations.
- [x] ViDAR-OpenScene-mini training configurations. (You are welcome to join the [predictive world model challenge](https://opendrivelab.com/challenge2024/#predictive_world_model)!)
- [x] ViDAR-nuScenes-full training and BEVFormer full fine-tuning configurations.
- [x] UniAD fine-tuning code and configuration.

## Table of Contents
1. [Results and Model Zoo](#models)
2. [Installation](#installation)
3. [Prepare Datasets](#prepare-datasets)
4. [Train and Evaluate](#train-and-evaluate)
5. [License and Citation](#license-and-citation)
6. [Related Resources](#resources)

## Results and Model Zoo

### Visual point cloud forecasting pre-training

**nuScenes Dataset:**
| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
| :------: | :---: | :---: | :----: | :----: | :----: | :----: |
| ViDAR-RN101-nus-1-8-1future | nuScenes (12.5% Data) | [vidar-nusc-pretrain-1future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_1future.py) | - | - | - | [models](https://drive.google.com/file/d/1NrJ49fFJaIPtnM9mfP_OsomY8AydMlNx/view?usp=sharing) / [logs](https://drive.google.com/file/d/1_80pYnhAHk7ZAiDMJKJW7_jXKGylZ3-D/view?usp=sharing) |
| ViDAR-RN101-nus-1-8-3future | nuScenes (12.5% Data) | [vidar-nusc-pretrain-3future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_3future.py) | 1.25 | 1.48 | 1.79 | [models](https://drive.google.com/file/d/1FR5lZGIA2KBzg-CsERDegNCuRNrMJsmR/view?usp=sharing) / [logs](https://drive.google.com/file/d/1HeiTGv8ss3fT2wCrFyzSGWwHbn7IR0mH/view?usp=sharing) |
| ViDAR-RN101-nus-full-1future | nuScenes (100% Data) | [vidar-nusc-pretrain-1future](projects/configs/vidar_pretrain/nusc_fullset/vidar_full_nusc_1future.py) | - | - | - | [models](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/pretrain-ViDAR-RN101-nus-full-1future.pth) |

* **HINT**: For running ViDAR on the nuScenes-full set, please run `python tools/merge_nusc_fullset_pkl.py` beforehand to generate *nuscenes_infos_temporal_traintest.pkl* for pre-training.

**OpenScene Dataset:**
| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
| :------: | :---: | :---: | :----: | :----: | :----: | :----: |
| ViDAR-RN101-OpenScene-3future | OpenScene-mini (12.5% Data) | [vidar-OpenScene-pretrain-3future-1-8](projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_mini_1_8_3future.py) | 1.41 | 1.57 | 1.78 | [models](https://drive.google.com/file/d/1aai3Z7JZavtDAFYzY1pwe41MNNRO_Wn_/view?usp=sharing) / [logs](https://drive.google.com/file/d/1oHdLH11l_ik2M5KyJBtklxa5Skz1bVra/view?usp=sharing) |
| ViDAR-RN101-OpenScene-3future | OpenScene-mini-Full (100% Data) | [vidar-OpenScene-pretrain-3future-full](projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_mini_full_3future.py) | 1.03 | 1.15 | 1.35 | [models](https://drive.google.com/file/d/1FiiZBHTtZYIvetwru9sTcVpDtKx_zAqd/view?usp=sharing) / [logs](https://drive.google.com/file/d/1mKiX-q6xSbhGa8tmsC19zUQbdTwg1JeA/view?usp=sharing) |
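The CD@1s/2s/3s columns report the Chamfer Distance between forecasted and ground-truth point clouds at 1, 2, and 3 seconds into the future (lower is better). For intuition only, the sketch below shows a minimal NumPy version of a symmetric Chamfer Distance between two point clouds; the actual evaluation uses the CUDA kernel installed from `third_lib/chamfer_dist`, and details such as the exact distance reduction may differ.

```python
# Intuition-only sketch of a symmetric Chamfer Distance between two point
# clouds given as (N, 3) and (M, 3) float arrays. The repo's evaluation uses
# the chamferdist CUDA package from third_lib/chamfer_dist; its exact
# reduction (e.g. squared vs. unsquared distances) may differ from this sketch.
import numpy as np

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return float(np.sqrt(d2.min(axis=1)).mean() + np.sqrt(d2.min(axis=0)).mean())
```

The dense pairwise matrix above is fine for toy inputs but impractical for full LiDAR sweeps; for real data, the dedicated CUDA implementation installed during setup should be used instead.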
### Down-stream fine-tuning (Perception)

| Downstream Model | Dataset | pre-train | Config | NDS | mAP | models & logs |
| :------: | :------: | :---: | :---: | :----: | :----: | :----: |
| BEVFormer-Base (baseline) | nuScenes (25% Data) | [FCOS3D](https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth) | [bevformer-base](projects/configs/vidar_finetune/nusc_1_4_subset/bevformer_1_4_baseline.py) | 43.40 | 35.47 | [models](https://drive.google.com/file/d/19FKge9dANm7qG_hb1WRmokS3svWiMhE4/view?usp=sharing) / [logs](https://drive.google.com/file/d/1YwvW-ON6hHM4tLyWpo-orVUTXErRAfsu/view?usp=sharing) |
| BEVFormer-Base | nuScenes (25% Data) | [ViDAR-RN101-nus-1-8-1future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_1future.py) | [vidar-nusc-finetune-1future](projects/configs/vidar_finetune/nusc_1_4_subset/vidar_1_8_nusc_1future.py) | 45.77 | 36.90 | [models](https://drive.google.com/file/d/1t-SQUf41QcVOnyQk2TaSu7MBYcTqA_sf/view?usp=sharing) / [logs](https://drive.google.com/file/d/1Mq99JK_wATQdz6iwUPlN9YAtraB_HgjJ/view?usp=sharing) |
| BEVFormer-Base | nuScenes (25% Data) | [ViDAR-RN101-nus-1-8-3future](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_3future.py) | [vidar-nusc-finetune-3future](projects/configs/vidar_finetune/nusc_1_4_subset/vidar_1_8_nusc_3future.py) | 45.61 | 36.84 | [models](https://drive.google.com/file/d/1D6yogBruaIcItgU-dPQt8qCPrDmxin5i/view?usp=sharing) / [logs](https://drive.google.com/file/d/1f7LiYp2hP64KnJzpDjj6JfK6lC4GtIly/view?usp=sharing) |
| BEVFormer-Base (baseline) | nuScenes (100% Data) | [FCOS3D](https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth) | [bevformer-base](projects/configs/bevformer/bevformer_base.py) | 51.7 | 41.6 | [models](https://github.com/zhiqi-li/storage/releases/download/v1.0/bevformer_r101_dcn_24ep.pth) |
| BEVFormer-Base | nuScenes (100% Data) | [ViDAR-RN101-nus-full-1future](projects/configs/vidar_pretrain/nusc_fullset/vidar_full_nusc_1future.py) | [vidar-nusc-finetune-1future](projects/configs/vidar_finetune/nusc_fullset/vidar_full_nusc_1future.py) | 55.33 | 45.20 | [models](https://github.com/OpenDriveLab/ViDAR/releases/download/v1.0.0/finetune-ViDAR-RN101-nus-full-1future.pth) |

### Down-stream fine-tuning (End-to-End)
Please refer to the [ViDAR-UniAD page](UniAD/README.md).

## Installation

The installation steps are similar to [BEVFormer](https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md).
For convenience, we list the steps below:
```bash
conda create -n vidar python=3.8 -y
conda activate vidar

pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
conda install -c omgarcia gcc-6 # (optional) gcc-6.2
```

Install mm-series packages.
```bash
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1

# Install mmdetection3d from source code.
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
```

Install Detectron2 and Timm.
```bash
pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13 typing-extensions==4.5.0 pylint ipython==8.12 numpy==1.19.5 matplotlib==3.5.2 numba==0.48.0 pandas==1.4.4 scikit-image==0.19.3 setuptools==59.5.0
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
```

Set up the ViDAR project.
```bash
git clone https://github.com/OpenDriveLab/ViDAR
cd ViDAR
mkdir pretrained
# Download the FCOS3D-pretrained backbone into ./pretrained (without changing directory).
wget -P pretrained https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth

# Install the chamferdist library.
cd third_lib/chamfer_dist/chamferdist/
pip install .
```
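With everything installed, an optional sanity check (not part of the official instructions) is to confirm that the versions pinned above are the ones actually importable inside the `vidar` environment:

```python
# Optional environment sanity check: verify that the versions pinned in the
# installation steps above are the ones importable in the "vidar" environment.
import torch
import mmcv
import mmdet
import mmseg
import mmdet3d

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv-full:", mmcv.__version__)        # expected 1.4.0
print("mmdet:", mmdet.__version__)           # expected 2.14.0
print("mmsegmentation:", mmseg.__version__)  # expected 0.14.1
print("mmdet3d:", mmdet3d.__version__)       # expected 0.17.1
```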
## Prepare Datasets

- [OpenScene](https://github.com/OpenDriveLab/OpenScene): please refer to [HERE](docs/DATASET.md).
- [nuScenes](https://www.nuscenes.org/): please refer to [HERE](docs/DATASET.md#nuscenes).

## Train and Evaluate

### Train
We recommend using 8 A100 GPUs for training. GPU memory usage during pre-training is around 63 GB.
* **HINT**: To save GPU memory, you can change *supervise_all_future* from *True* to *False* and use smaller values for *vidar_head_pred_history_frame_num* and *vidar_head_pred_future_frame_num*.
  For example, setting `supervise_all_future=False`, `vidar_head_pred_history_frame_num=0`, `vidar_head_pred_future_frame_num=0`, and `vidar_head_per_frame_loss_weight=(1.0,)` reduces the GPU memory consumption of the [vidar-pretrain-3future-model](projects/configs/vidar_pretrain/nusc_1_8_subset/vidar_1_8_nusc_3future.py) to ~34 GB (see the sketch after this list).
  An example configuration is provided at [link](projects/configs/vidar_pretrain/nusc_1_8_subset/mem_efficient_vidar_1_8_nusc_3future.py).
* **Full-nuScenes-Training**: To pre-train ViDAR on the full nuScenes dataset, run `python tools/merge_nusc_fullset_pkl.py` beforehand to generate *nuscenes_infos_temporal_traintest.pkl* for pre-training.
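For reference, the memory-saving overrides from the HINT above would look roughly like the following inside a pre-training config. This is a sketch only; the exact placement of these options follows the ready-made example at [mem_efficient_vidar_1_8_nusc_3future.py](projects/configs/vidar_pretrain/nusc_1_8_subset/mem_efficient_vidar_1_8_nusc_3future.py), which is the authoritative version.

```python
# Sketch of the memory-saving settings described in the HINT above. See
# projects/configs/vidar_pretrain/nusc_1_8_subset/mem_efficient_vidar_1_8_nusc_3future.py
# for the real, complete configuration.

# Supervise only the standard future rollout instead of all future frames.
supervise_all_future = False

# Do not predict/supervise additional history or future frames per timestep.
vidar_head_pred_history_frame_num = 0
vidar_head_pred_future_frame_num = 0

# One loss weight per supervised frame (a single frame here).
vidar_head_per_frame_loss_weight = (1.0,)
```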
```bash
CONFIG=path/to/config.py
GPU_NUM=8

./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
```

### Evaluate
```bash
CONFIG=path/to/vidar_config.py
CKPT=path/to/checkpoint.pth
GPU_NUM=8

./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM}
```

### Visualize
```bash
CONFIG=path/to/vidar_config.py
CKPT=path/to/checkpoint.pth
GPU_NUM=1

./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM} \
--cfg-options 'model._viz_pcd_flag=True' 'model._viz_pcd_path=/path/to/output'
```

## License and Citation

All assets and code are under the [Apache 2.0 license](./LICENSE) unless specified otherwise.
If this work is helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@inproceedings{yang2023vidar,
  title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
  author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
```

## Related Resources

We acknowledge all the open-source contributors of the following projects for making this work possible:
- [BEVFormer](https://github.com/fundamentalvision/BEVFormer) | [UniAD](https://github.com/OpenDriveLab/UniAD) | [4D Occ](https://github.com/tarashakhurana/4d-occ-forecasting)
- [DriveAGI](https://github.com/OpenDriveLab/DriveAGI) | [Survey on BEV Perception](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [BEVFormer](https://github.com/fundamentalvision/BEVFormer) | [UniAD](https://github.com/OpenDriveLab/UniAD) | [OpenLane-V2](https://github.com/OpenDriveLab/OpenLane-V2) | [OccNet](https://github.com/OpenDriveLab/OccNet)