

[ICCV 2023] SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos

3d-object-detection autonomous-driving bev-perception transformer





# SparseBEV


This is the official PyTorch implementation for our ICCV 2023 paper:

> **SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos**

> Haisong Liu, Yao Teng, Tao Lu, Haiguang Wang, Limin Wang
Nanjing University, Shanghai AI Lab



## News

* 2024-03-31: The code of SparseOcc is released.
* 2023-12-29: Check out our new paper to learn about SparseOcc, a fully sparse architecture for panoptic occupancy!
* 2023-10-20: We provide code for visualizing the predictions and the sampling points, as requested in #25.
* 2023-09-23: We release the native PyTorch implementation of sparse sampling. You can use this version if you encounter problems when compiling CUDA operators. It is only about 15% slower.
* 2023-08-21: We release the paper, code and pretrained weights.
* 2023-07-14: SparseBEV is accepted to ICCV 2023.
* 2023-02-09: SparseBEV-Beta achieves 65.6 NDS on the nuScenes leaderboard.

## Model Zoo

| Setting | Pretrain | Training Cost | NDS (val) | NDS (test) | FPS | Weights |
|---|---|---|---|---|---|---|
| `r50_nuimg_704x256` | nuImg | 21h (8x2080Ti) | 55.6 | - | 15.8 | gdrive |
| `r50_nuimg_704x256_400q_36ep` | nuImg | 28h (8x2080Ti) | 55.8 | - | 23.5 | gdrive |
| `r101_nuimg_1408x512` | nuImg | 2d8h (8xV100) | 59.2 | - | 6.5 | gdrive |
| `vov99_dd3d_1600x640_trainval_future` | DD3D | 4d1h (8xA100) | 84.9 | 67.5 | - | gdrive |
| `vit_eva02_1600x640_trainval_future` | EVA02 | 11d (8xA100) | 85.3 | 70.2 | - | gdrive |

* We use `r50_nuimg_704x256` for ablation studies and `r50_nuimg_704x256_400q_36ep` for comparison with others.
* We recommend using `r50_nuimg_704x256` to validate new ideas since it trains faster and its results are more stable.
* FPS is measured on an AMD 5800X CPU and an RTX 3090 GPU (without `fp16`).
* Results fluctuate by around 0.3 NDS between runs.
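
For readers unfamiliar with the metric in the table, NDS (nuScenes Detection Score) combines mAP with five true-positive error terms. The helper below is a minimal sketch of the formula as defined by the nuScenes detection benchmark, not code from this repository; the function name `nds` is illustrative.

```python
def nds(mean_ap, tp_errors):
    """Compute the nuScenes Detection Score as a fraction in [0, 1].

    mean_ap:   mAP as a fraction in [0, 1].
    tp_errors: the five mean true-positive errors
               (mATE, mASE, mAOE, mAVE, mAAE).
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error terms")
    # Each error is clipped to 1 and turned into a score (higher is better).
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries half of the total weight, the five TP scores the other half.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

The table reports NDS as a percentage, so multiply the returned value by 100 before comparing. Given the run-to-run variation noted above, differences of a few tenths of a point are hard to interpret.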

## Environment

Install PyTorch 2.0 + CUDA 11.8:

    conda create -n sparsebev python=3.8
    conda activate sparsebev
    conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia

or PyTorch 1.10.2 + CUDA 10.2 for older GPUs:

    conda create -n sparsebev python=3.8
    conda activate sparsebev
    conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch

Install other dependencies:

    pip install openmim
    mim install mmcv-full==1.6.0
    mim install mmdet==2.28.2
    mim install mmsegmentation==0.30.0
    mim install mmdet3d==1.0.0rc6
    pip install setuptools==59.5.0
    pip install numpy==1.23.5
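
Because several packages are pinned to exact versions, it can help to confirm what actually ended up in the environment. The following is a minimal sketch using the standard-library `importlib.metadata`; the `PINNED` table simply mirrors the install commands above, and `check_versions` is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

# Versions copied from the install commands above.
PINNED = {
    "mmcv-full": "1.6.0",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "mmdet3d": "1.0.0rc6",
    "setuptools": "59.5.0",
    "numpy": "1.23.5",
}


def check_versions(pinned=PINNED, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package matches. The `get_version` parameter exists only so the check can be exercised without the packages installed.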

Install turbojpeg and pillow-simd to speed up data loading (optional but important):

    sudo apt-get update
    sudo apt-get install -y libturbojpeg
    pip install pyturbojpeg
    pip uninstall pillow
    pip install pillow-simd==9.0.0.post1
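
Since these packages are optional, a quick check of which ones are present can save a slow first training run. The sketch below only tests whether the Python modules are importable; it does not verify that the native `libturbojpeg` library loads. It also assumes that Pillow-SIMD builds report a `.post` suffix in their version string (as in the pinned `9.0.0.post1`). The function name is hypothetical.

```python
import importlib.util


def fast_image_backends():
    """Report which optional fast image-loading packages are importable."""
    has_turbojpeg = importlib.util.find_spec("turbojpeg") is not None

    pillow_version = None
    if importlib.util.find_spec("PIL") is not None:
        import PIL  # imported lazily so the check works without Pillow

        pillow_version = PIL.__version__

    # Assumption: Pillow-SIMD versions look like "9.0.0.post1".
    has_pillow_simd = pillow_version is not None and ".post" in pillow_version

    return {
        "turbojpeg": has_turbojpeg,
        "pillow_simd": has_pillow_simd,
        "pillow_version": pillow_version,
    }


if __name__ == "__main__":
    print(fast_image_backends())
```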

Compile CUDA extensions:

    cd models/csrc
    python build_ext --inplace

## Prepare Dataset

1. Download nuScenes from the official nuScenes website and put it in `data/nuscenes`.
2. Download the generated info file from Google Drive and unzip it.
3. Folder structure:

        ├── maps
        ├── nuscenes_infos_test_sweep.pkl
        ├── nuscenes_infos_train_sweep.pkl
        ├── nuscenes_infos_train_mini_sweep.pkl
        ├── nuscenes_infos_val_sweep.pkl
        ├── nuscenes_infos_val_mini_sweep.pkl
        ├── samples
        ├── sweeps
        ├── v1.0-test
        └── v1.0-trainval

These `*.pkl` files can also be generated with our script.
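
Before launching a long training job, it may be worth checking that the folder matches the layout listed above. The following is a minimal sketch; the entry names are copied from that listing, the default root `data/nuscenes` comes from step 1, and `missing_entries` is a hypothetical helper rather than a script shipped with the repository.

```python
from pathlib import Path

# Entries taken from the folder structure listed above.
EXPECTED_ENTRIES = [
    "maps",
    "nuscenes_infos_test_sweep.pkl",
    "nuscenes_infos_train_sweep.pkl",
    "nuscenes_infos_train_mini_sweep.pkl",
    "nuscenes_infos_val_sweep.pkl",
    "nuscenes_infos_val_mini_sweep.pkl",
    "samples",
    "sweeps",
    "v1.0-test",
    "v1.0-trainval",
]


def missing_entries(root="data/nuscenes", expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that are absent under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under data/nuscenes:", ", ".join(missing))
    else:
        print("Dataset folder looks complete.")
```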

## Training

Download the pretrained weights and put them in the `pretrain/` directory:

    ├── cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth
    ├── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth

Train SparseBEV with 8 GPUs:

    torchrun --nproc_per_node 8 --config configs/

Train SparseBEV with 4 GPUs (i.e., the last four GPUs of an 8-GPU machine):

    export CUDA_VISIBLE_DEVICES=4,5,6,7
    torchrun --nproc_per_node 4 --config configs/

The batch size for each GPU is scaled automatically, so there is no need to modify `batch_size` in the config files.
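
As a rough illustration of what automatic scaling could look like, the sketch below assumes that `batch_size` in the config is a global value split evenly across GPUs. That assumption and the function name `per_gpu_batch_size` are ours; they are not taken from the training script.

```python
def per_gpu_batch_size(total_batch_size: int, world_size: int) -> int:
    """Split a global batch size evenly across `world_size` GPUs.

    Assumption: the configured batch size is global, so using fewer GPUs
    means each one processes proportionally more samples per step.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if total_batch_size % world_size != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch_size // world_size
```

Under this assumption, a global batch size of 8 gives 1 sample per GPU on 8 GPUs and 2 samples per GPU on 4 GPUs, so the effective batch size stays the same.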

## Evaluation

Single-GPU evaluation:

    python --config configs/ --weights checkpoints/r50_nuimg_704x256.pth

Multi-GPU evaluation:

    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
    torchrun --nproc_per_node 8 --config configs/ --weights checkpoints/r50_nuimg_704x256.pth
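
`torchrun` starts one process per GPU, so `--nproc_per_node` should match the number of devices exposed through `CUDA_VISIBLE_DEVICES`. The helper below (a hypothetical name, not part of the repository) is a small sketch for reading that list before launching.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device id strings.

    Returns None when the variable is unset, which means CUDA does not
    restrict the visible devices.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    ids = visible_gpu_ids()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset; all GPUs are visible.")
    else:
        print(f"Use --nproc_per_node {len(ids)} for devices {ids}")
```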

## Timing

FPS is measured with a single GPU:

    python --config configs/ --weights checkpoints/r50_nuimg_704x256.pth
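
For context, a throughput measurement of this kind usually discards a few warm-up iterations and synchronizes the GPU before reading the clock. The following is a generic sketch of that pattern, not the repository's timing script; `measure_fps`, `step`, and the iteration counts are illustrative choices.

```python
import time


def measure_fps(step, num_iters=50, num_warmup=10, sync=None):
    """Return iterations per second for the callable `step`.

    step: runs one forward pass (or any unit of work).
    sync: optional callable that blocks until queued GPU work finishes,
          for example torch.cuda.synchronize when timing CUDA code.
    """
    # Warm-up iterations are excluded so one-time setup costs do not skew timing.
    for _ in range(num_warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(num_iters):
        step()
    if sync is not None:
        sync()  # wait for asynchronous GPU kernels before stopping the clock
    elapsed = time.perf_counter() - start

    return num_iters / elapsed
```

Without the synchronization step, CUDA kernels may still be running when the timer stops, which makes the measured FPS look higher than it is.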

## Visualization

Visualize the predicted bounding boxes:

    python --config configs/ --weights checkpoints/r50_nuimg_704x256.pth

Visualize the sampling points (like Fig. 6 in the paper):

    python --config configs/ --weights checkpoints/r50_nuimg_704x256.pth

## Acknowledgements

Many thanks to these excellent open-source projects:

* 3D Detection: DETR3D, PETR, BEVFormer, BEVDet, StreamPETR
* 2D Detection: AdaMixer, DN-DETR
* Codebase: MMDetection3D, CamLiFlow