{"id":13442227,"url":"https://github.com/JiayuZou2020/DiffBEV","last_synced_at":"2025-03-20T13:33:05.445Z","repository":{"id":142829455,"uuid":"613895691","full_name":"JiayuZou2020/DiffBEV","owner":"JiayuZou2020","description":"Official PyTorch implementation for a conditional diffusion probability model in BEV perception","archived":false,"fork":false,"pushed_at":"2023-04-04T02:53:06.000Z","size":5243,"stargazers_count":239,"open_issues_count":12,"forks_count":10,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-08-01T03:39:13.254Z","etag":null,"topics":["3d-detection","bev-perception","diffusion-models","semantic-segmentation"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2303.08333","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JiayuZou2020.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-14T13:41:19.000Z","updated_at":"2024-07-18T05:37:29.000Z","dependencies_parsed_at":"2023-04-09T02:54:45.810Z","dependency_job_id":null,"html_url":"https://github.com/JiayuZou2020/DiffBEV","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiayuZou2020%2FDiffBEV","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiayuZou2020%2FDiffBEV/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiayuZou2020%2FDiffBEV/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiayuZou2020%2FDiffBEV/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JiayuZou2020","download_url":"https://codeload.github.com/JiayuZou2020/DiffBEV/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221768500,"owners_count":16877650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-detection","bev-perception","diffusion-models","semantic-segmentation"],"created_at":"2024-07-31T03:01:43.225Z","updated_at":"2024-10-28T02:31:44.657Z","avatar_url":"https://github.com/JiayuZou2020.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# [DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception](https://arxiv.org/abs/2303.08333)\n\u003cdiv align=\"center\"\u003e\nConditional diffusion probability model for BEV perception\n\u003c/div\u003e\n\n---\n\u003cdiv align=\"center\"\u003e\n    \u003ca href='https://arxiv.org/abs/2303.08333'\u003eArxiv\n    \u003ca href='https://github.com/JiayuZou2020/DiffBEV#abstract'\u003eAbstract\n    \u003ca href='https://github.com/JiayuZou2020/DiffBEV#dataset'\u003eDataset\n    \u003ca href='https://github.com/JiayuZou2020/DiffBEV#installation'\u003eInstallation  \n    \u003ca href='https://github.com/JiayuZou2020/DiffBEV#visualization'\u003eVisualization \n    \u003ca href='https://github.com/JiayuZou2020/DiffBEV#citation'\u003eCitation\n    \u003ca href='https://github.com/JiayuZou2020/DiffBEV#acknowledgement'\u003eAcknowledgement\n\u003c/div\u003e\n\n---\n\n## Arxiv\n[https://arxiv.org/abs/2303.08333](https://arxiv.org/abs/2303.08333)\n\n## 
Abstract\nBEV perception is of great importance in the field of autonomous driving, serving as the cornerstone of planning, controlling, and motion prediction. The quality of the BEV feature highly affects the performance of BEV perception. However, taking the noises in camera parameters and LiDAR scans into consideration, we usually obtain BEV representation with harmful noises. Diffusion models naturally have the ability to denoise noisy samples to the ideal data, which motivates us to utilize the diffusion model to get a better BEV representation. In this work, we propose an end-to-end framework, named DiffBEV, to exploit the potential of diffusion model to generate a more comprehensive BEV representation. To the best of our knowledge, we are the first to apply diffusion model to BEV perception. In practice, we design three types of conditions to guide the training of the diffusion model which denoises the coarse samples and refines the semantic feature in a progressive way. What's more, a cross-attention module is leveraged to fuse the context of BEV feature and the semantic content of conditional diffusion model. DiffBEV achieves a 25.9% mIoU on the nuScenes dataset, which is 6.2% higher than the best-performing existing approach. Quantitative and qualitative results on multiple benchmarks demonstrate the effectiveness of DiffBEV in BEV semantic segmentation and 3D object detection tasks.\n![framework](figures/framework.png)\n        \n## Dataset\n### Download Datasets From Official Websites\nExtensive experiments are conducted on the _[nuScenes](https://www.nuscenes.org/download)_, _[KITTI Raw](https://www.cvlibs.net/datasets/kitti/raw_data.php)_, _[KITTI Odometry](https://www.cvlibs.net/datasets/kitti/eval_odometry.php)_, and _[KITTI 3D Object](https://www.cvlibs.net/datasets/kitti/eval_3dobject.php)_ benchmarks. 
\n\n### Prepare Depth Maps\nFollow the [script](https://github.com/TRAILab/CaDDN/blob/master/pcdet/datasets/kitti/kitti_dataset.py) to generate depth maps for the KITTI datasets. The depth maps of the KITTI datasets are available at _[Google Drive](https://drive.google.com/drive/folders/1FR7SRXQw2TxVsjMn5SWel8sdfclWAtR8?usp=sharing)_ and _[Baidu Net Disk](https://pan.baidu.com/s/1ZCWPMSdmp33LsTBxZaETZQ?pwd=2023)_. We also provide a [script](https://github.com/JiayuZou2020/DiffBEV/blob/main/scripts/nuS2depth.py) to generate depth maps for the nuScenes dataset. Replace the dataset path in the script according to your dataset directory.\n\n### Dataset Processing\nAfter downloading these datasets, we need to generate the annotations in BEV. Follow the instructions below to get the corresponding annotations. \n#### nuScenes\nRun the script [make_nuscenes_labels](https://github.com/tom-roddick/mono-semantic-maps/blob/master/scripts/make_nuscenes_labels.py) to get the BEV annotation for the nuScenes benchmark. For the KITTI datasets, follow the instructions in the next section to generate the BEV annotation (ann_bev_dir). 
\n\n#### KITTI Datasets\nFollow the _[instruction](https://github.com/manila95/monolayout#datasets)_ to get the BEV annotations for KITTI Raw, KITTI Odometry, and KITTI 3D Object datasets.\n\nThe datasets' structure is organized as follows.\n```\ndata\n├── nuscenes\n    ├── img_dir\n        ├── train\n        ├── val\n    ├── ann_bev_dir\n        ├── train\n        ├── val\n        ├── train_depth\n        ├── val_depth\n    ├── calib.json\n├── kitti_processed\n    ├── kitti_raw\n        ├── img_dir\n            ├── train\n            ├── val\n        ├── ann_bev_dir\n            ├── train\n            ├── val\n            ├── train_depth\n            ├── val_depth\n        ├── calib.json\n    ├── kitti_odometry\n        ├── img_dir\n            ├── train\n            ├── val\n        ├── ann_bev_dir\n            ├── train\n            ├── val\n            ├── train_depth\n            ├── val_depth\n        ├── calib.json\n    ├── kitti_object\n        ├── img_dir\n            ├── train\n            ├── val\n        ├── ann_bev_dir\n            ├── train\n            ├── val\n            ├── train_depth\n            ├── val_depth\n        ├── calib.json\n```\n\n### Prepare Calibration Files\nFor the camera parameters on each dataset, we write them into the corresponding _calib.json file. For each dataset, we upload the _calib.json to _[Google Drive](https://drive.google.com/drive/folders/1Ahaed1OsA1EqlJOCHHN-MQQr2VpF8H7U)_ and _[Baidu Net Disk](https://pan.baidu.com/s/1wEzHWkazS5vLPZJVjpzHMw?pwd=2022#list/path=%2F)_.\n\nPlease change the dataset path according to the real data directory in the _[nuScenes, KITTI Raw, KITTI Odometry, and KITTI 3D Object dataset configurations](https://github.com/JiayuZou2020/DiffBEV/tree/main/configs/_base_/datasets)_. 
Modify the path of the pretrained model in _model configurations_.\n\n## Installation\nDiffBEV is tested on:\n* Python 3.7/3.8\n* CUDA 11.1\n* Torch 1.9.1\n\nPlease see the [mmsegmentation installation guide](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/get_started.md#installation) for installation details.\n* Create a conda environment for the project.\n```shell\nconda create -n diffbev python=3.7\nconda activate diffbev\n```\n* Install PyTorch following the [instruction](https://pytorch.org/get-started/locally/).\n`conda install pytorch torchvision -c pytorch`\n* Install [mmcv](https://github.com/open-mmlab/mmcv)\n\n```shell\npip install -U openmim\nmim install mmcv-full\n```\n* Git clone this repository\n        \n```shell\ngit clone https://github.com/JiayuZou2020/DiffBEV.git\n```\n\n* Install and compile the required packages.\n```shell\ncd DiffBEV\npip install -v -e .\n```\n\n## Visualization\n![vis](figures/vis.png)\n\n## Citation\nIf you find our work helpful for your research, please consider citing it as follows.\n```\n@article{zou2023diffbev,\n      title={DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception}, \n      author={Jiayu, Zou and Zheng, Zhu and Yun, Ye and Xingang, Wang},\n      journal={arXiv preprint arXiv:2303.08333},\n      year={2023}\n}\n```\n## Acknowledgement\nOur work is partially based on the following open-source projects: [mmsegmentation](https://github.com/open-mmlab/mmsegmentation), [VPN](https://github.com/pbw-Berwin/View-Parsing-Network), [PYVA](https://github.com/JonDoe-297/cross-view), [PON](https://github.com/tom-roddick/mono-semantic-maps), [LSS](https://github.com/nv-tlabs/lift-splat-shoot). 
\nThanks for their contribution to the research community of BEV perception.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJiayuZou2020%2FDiffBEV","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJiayuZou2020%2FDiffBEV","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJiayuZou2020%2FDiffBEV/lists"}