{"id":13441931,"url":"https://github.com/MCG-NJU/SparseBEV","last_synced_at":"2025-03-20T13:31:22.460Z","repository":{"id":189555661,"uuid":"672462338","full_name":"MCG-NJU/SparseBEV","owner":"MCG-NJU","description":"[ICCV 2023] SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos","archived":false,"fork":false,"pushed_at":"2024-03-31T16:01:21.000Z","size":787,"stargazers_count":348,"open_issues_count":12,"forks_count":24,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-10-28T05:12:12.306Z","etag":null,"topics":["3d-object-detection","autonomous-driving","bev-perception","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2308.09244","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MCG-NJU.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-30T07:00:14.000Z","updated_at":"2024-10-28T02:26:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"51663ecf-4be9-424e-a906-cf911580129c","html_url":"https://github.com/MCG-NJU/SparseBEV","commit_stats":null,"previous_names":["mcg-nju/sparsebev"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCG-NJU%2FSparseBEV","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCG-NJU%2FSparseBEV/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCG-NJU%2FSparseBEV/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCG-NJU%2FSparseBEV/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MCG-NJU","download_url":"https://codeload.github.com/MCG-NJU/SparseBEV/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244619148,"owners_count":20482369,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-object-detection","autonomous-driving","bev-perception","transformer"],"created_at":"2024-07-31T03:01:39.760Z","updated_at":"2025-03-20T13:31:22.006Z","avatar_url":"https://github.com/MCG-NJU.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# SparseBEV\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sparsebev-high-performance-sparse-3d-object/3d-object-detection-on-nuscenes-camera-only)](https://paperswithcode.com/sota/3d-object-detection-on-nuscenes-camera-only?p=sparsebev-high-performance-sparse-3d-object)\n\nThis is the official PyTorch implementation for our ICCV 2023 paper:\n\n\u003e [**SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos**](https://arxiv.org/abs/2308.09244)\u003cbr\u003e\n\u003e [Haisong Liu](https://scholar.google.com/citations?user=Z9yWFA0AAAAJ\u0026hl=en\u0026oi=sra), [Yao Teng](https://scholar.google.com/citations?user=eLIsViIAAAAJ\u0026hl=en\u0026oi=sra), [Tao Lu](https://scholar.google.com/citations?user=Ch28NiIAAAAJ\u0026hl=en\u0026oi=sra), [Haiguang Wang](https://miraclesinwang.github.io/), [Limin 
Wang](https://scholar.google.com/citations?user=HEuN8PcAAAAJ\u0026hl=en\u0026oi=sra)\u003cbr\u003eNanjing University, Shanghai AI Lab\n\nExplanation in Chinese: [https://zhuanlan.zhihu.com/p/654821380](https://zhuanlan.zhihu.com/p/654821380)\n\n![](asserts/banner.jpg)\n\n## News\n\n* 2024-03-31: The code of SparseOcc is released at [https://github.com/MCG-NJU/SparseOcc](https://github.com/MCG-NJU/SparseOcc).\n* 2023-12-29: Check out our new paper ([https://arxiv.org/abs/2312.17118](https://arxiv.org/abs/2312.17118)) to learn about SparseOcc, a fully sparse architecture for panoptic occupancy!\n* 2023-10-20: We provide code for visualizing the predictions and the sampling points, as requested in [#25](https://github.com/MCG-NJU/SparseBEV/issues/25).\n* 2023-09-23: We release [the native PyTorch implementation of sparse sampling](https://github.com/MCG-NJU/SparseBEV/blob/97c8c798284555accedd0625395dd397fa4511d2/models/csrc/wrapper.py#L14). You can use this version if you encounter problems when compiling CUDA operators. 
It’s only about 15% slower.\n* 2023-08-21: We release the paper, code and pretrained weights.\n* 2023-07-14: SparseBEV is accepted to ICCV 2023.\n* 2023-02-09: SparseBEV-Beta achieves 65.6 NDS on [the nuScenes leaderboard](https://eval.ai/web/challenges/challenge-page/356/leaderboard/1012).\n\n## Model Zoo\n\n| Setting  | Pretrain | Training Cost | NDS\u003csub\u003eval\u003c/sub\u003e | NDS\u003csub\u003etest\u003c/sub\u003e | FPS | Weights |\n|----------|:--------:|:-------------:|:-----------------:|:------------------:|:---:|:-------:|\n| [r50_nuimg_704x256](configs/r50_nuimg_704x256.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth) | 21h (8x2080Ti) | 55.6 | - | 15.8 | [gdrive](https://drive.google.com/file/d/1ft34-pxLpHGo2Aw-jowEtCxyXcqszHNn/view) |\n| [r50_nuimg_704x256_400q_36ep](configs/r50_nuimg_704x256_400q_36ep.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth) | 28h (8x2080Ti) | 55.8 | - | 23.5 | [gdrive](https://drive.google.com/file/d/1C_Vn3iiSnSW1Dw1r0DkjJMwvHC5Y3zTN/view) |\n| [r101_nuimg_1408x512](configs/r101_nuimg_1408x512.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r101_fpn_1x_nuim/cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth) | 2d8h (8xV100) | 59.2 | - | 6.5 | [gdrive](https://drive.google.com/file/d/1dKu5cR1fuo-O0ynyBh-RCPtHrgut29mN/view) |\n| [vov99_dd3d_1600x640_trainval_future](configs/vov99_dd3d_1600x640_trainval_future.py) | [DD3D](https://drive.google.com/file/d/1gQkhWERCzAosBwG5bh2BKkt1k0TJZt-A/view) | 4d1h (8xA100) | 84.9 | 67.5 | - | [gdrive](https://drive.google.com/file/d/1TL0QoCiWD5uq8PCAWWE3A-g73ibK1R0S/view) |\n| 
[vit_eva02_1600x640_trainval_future](configs/vit_eva02_1600x640_trainval_future.py) | [EVA02](https://huggingface.co/Yuxin-CV/EVA-02/blob/main/eva02/det/eva02_L_coco_seg_sys_o365.pth) | 11d (8xA100) | 85.3 | 70.2 | - | [gdrive](https://drive.google.com/file/d/1cx7h6PUqiaVWPixpcuB9AhsX3Sx4n0q_/view) |\n\n* We use `r50_nuimg_704x256` for ablation studies and `r50_nuimg_704x256_400q_36ep` for comparison with others.\n* We recommend using `r50_nuimg_704x256` to validate new ideas since it trains faster and the result is more stable.\n* FPS is measured with AMD 5800X CPU and RTX 3090 GPU (without `fp16`).\n* The noise is around 0.3 NDS.\n\n## Environment\n\nInstall PyTorch 2.0 + CUDA 11.8:\n\n```\nconda create -n sparsebev python=3.8\nconda activate sparsebev\nconda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia\n```\n\nor PyTorch 1.10.2 + CUDA 10.2 for older GPUs:\n\n```\nconda create -n sparsebev python=3.8\nconda activate sparsebev\nconda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch\n```\n\nInstall other dependencies:\n\n```\npip install openmim\nmim install mmcv-full==1.6.0\nmim install mmdet==2.28.2\nmim install mmsegmentation==0.30.0\nmim install mmdet3d==1.0.0rc6\npip install setuptools==59.5.0\npip install numpy==1.23.5\n```\n\nInstall turbojpeg and pillow-simd to speed up data loading (optional but important):\n\n```\nsudo apt-get update\nsudo apt-get install -y libturbojpeg\npip install pyturbojpeg\npip uninstall pillow\npip install pillow-simd==9.0.0.post1\n```\n\nCompile CUDA extensions:\n\n```\ncd models/csrc\npython setup.py build_ext --inplace\n```\n\n## Prepare Dataset\n\n1. Download nuScenes from [https://www.nuscenes.org/nuscenes](https://www.nuscenes.org/nuscenes) and put it in `data/nuscenes`.\n2. Download the generated info file from [Google Drive](https://drive.google.com/file/d/1uyoUuSRIVScrm_CUpge6V_UzwDT61ODO/view?usp=sharing) and unzip it.\n3. 
Folder structure:\n\n```\ndata/nuscenes\n├── maps\n├── nuscenes_infos_test_sweep.pkl\n├── nuscenes_infos_train_sweep.pkl\n├── nuscenes_infos_train_mini_sweep.pkl\n├── nuscenes_infos_val_sweep.pkl\n├── nuscenes_infos_val_mini_sweep.pkl\n├── samples\n├── sweeps\n├── v1.0-test\n└── v1.0-trainval\n```\n\nThese `*.pkl` files can also be generated with our script: `gen_sweep_info.py`.\n\n## Training\n\nDownload the pretrained weights and put them in the `pretrain/` directory:\n\n```\npretrain\n├── cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth\n├── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth\n```\n\nTrain SparseBEV with 8 GPUs:\n\n```\ntorchrun --nproc_per_node 8 train.py --config configs/r50_nuimg_704x256.py\n```\n\nTrain SparseBEV with 4 GPUs (i.e., the last four GPUs):\n\n```\nexport CUDA_VISIBLE_DEVICES=4,5,6,7\ntorchrun --nproc_per_node 4 train.py --config configs/r50_nuimg_704x256.py\n```\n\nThe batch size for each GPU is scaled automatically, so there is no need to modify `batch_size` in the config files.\n\n## Evaluation\n\nSingle-GPU evaluation:\n\n```\nexport CUDA_VISIBLE_DEVICES=0\npython val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth\n```\n\nMulti-GPU evaluation:\n\n```\nexport CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7\ntorchrun --nproc_per_node 8 val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth\n```\n\n## Timing\n\nFPS is measured with a single GPU:\n\n```\nexport CUDA_VISIBLE_DEVICES=0\npython timing.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth\n```\n\n## Visualization\n\nVisualize the predicted bounding boxes:\n\n```\npython viz_bbox_predictions.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth\n```\n\nVisualize the sampling points (like Fig. 
6 in the paper):\n\n```\npython viz_sample_points.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth\n```\n\n## Acknowledgements\n\nMany thanks to these excellent open-source projects:\n\n* 3D Detection: [DETR3D](https://github.com/WangYueFt/detr3d), [PETR](https://github.com/megvii-research/PETR), [BEVFormer](https://github.com/fundamentalvision/BEVFormer), [BEVDet](https://github.com/HuangJunJie2017/BEVDet), [StreamPETR](https://github.com/exiawsh/StreamPETR)\n* 2D Detection: [AdaMixer](https://github.com/MCG-NJU/AdaMixer), [DN-DETR](https://github.com/IDEA-Research/DN-DETR)\n* Codebase: [MMDetection3D](https://github.com/open-mmlab/mmdetection3d), [CamLiFlow](https://github.com/MCG-NJU/CamLiFlow)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMCG-NJU%2FSparseBEV","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMCG-NJU%2FSparseBEV","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMCG-NJU%2FSparseBEV/lists"}