https://github.com/DerrickXuNu/CoBEVT

[CoRL2022] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

Host: GitHub
URL: https://github.com/DerrickXuNu/CoBEVT
Owner: DerrickXuNu
License: apache-2.0
Created: 2022-09-10T21:25:32.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-08-18T01:06:04.000Z (about 1 year ago)
Last Synced: 2024-10-28T06:57:45.696Z (about 1 year ago)
Topics: autonomous-driving, autonomous-vehicles, bev-perception, collaborative-perception, computer-vision, multi-agent-perception, nuscenes, segmentation, semantic, semantic-segmentation, v2v, v2x
Language: Python
Homepage:
Size: 60 MB
Stars: 201
Watchers: 9
Forks: 17
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-object-detection-datasets - CoBEVT

README

# CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [CoRL2022]

[![paper](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/pdf/2207.02202.pdf)

[![supplement](https://img.shields.io/badge/Supplementary-Material-red)](https://arxiv.org/pdf/2207.02202.pdf)

[![video](https://img.shields.io/badge/Video-Presentation-F9D371)]()

This is the official implementation of the CoRL 2022 paper "CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers".

[Runsheng Xu](https://derrickxunu.github.io/), [Zhengzhong Tu](https://github.com/vztu), [Hao Xiang](https://xhwind.github.io/), [Wei Shao](https://www.linkedin.com/in/wei-shao-94972295/), [Bolei Zhou](https://boleizhou.github.io/), [Jiaqi Ma](https://mobility-lab.seas.ucla.edu/)

UCLA, UT-Austin








Overview of CoBEVT






## Introduction

CoBEVT is the first generic multi-agent multi-camera perception framework that can cooperatively generate BEV map predictions. The core component of CoBEVT, the fused axial attention (FAX) module, captures sparse local and global spatial interactions across views and agents. We achieve SOTA performance on both the [OPV2V](https://mobility-lab.seas.ucla.edu/opv2v/) and [nuScenes](https://www.nuscenes.org/) datasets with **real-time performance**.
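To make "sparse local and global interactions" concrete, the sketch below groups BEV grid cells in the two ways used by MaxViT-style multi-axis attention, which this README lists as an inspiration. It is an illustration under stated assumptions, not the repository's FAX code: the function names `window_groups` and `grid_groups` are hypothetical, the grid is 2D only, and the cross-agent and cross-view dimensions handled by FAX are left out. Cells in the same group would attend to each other.

```python
def window_groups(height, width, p):
    """Local attention (assumed scheme): contiguous p x p windows."""
    assert height % p == 0 and width % p == 0, "grid must divide evenly"
    groups = {}
    for r in range(height):
        for c in range(width):
            # Cells sharing the same window index land in the same group.
            groups.setdefault((r // p, c // p), []).append((r, c))
    return list(groups.values())


def grid_groups(height, width, g):
    """Sparse global attention (assumed scheme): g x g cells sampled
    with a uniform stride, so each group spans the whole map."""
    assert height % g == 0 and width % g == 0, "grid must divide evenly"
    stride_r, stride_c = height // g, width // g
    groups = {}
    for r in range(height):
        for c in range(width):
            # Cells sharing the same offset within a stride are grouped.
            groups.setdefault((r % stride_r, c % stride_c), []).append((r, c))
    return list(groups.values())


local = window_groups(4, 4, 2)
sparse_global = grid_groups(4, 4, 2)
print(local[0])          # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sparse_global[0])  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

Both partitions produce groups of the same size, so attention cost per group is identical; only the spatial reach differs. Alternating the two gives every cell a path to every other cell in two steps without dense global attention.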








nuScenes demo:

CoBEVT can be used for single-vehicle multi-camera semantic BEV segmentation.













OPV2V demo:

Our CoBEVT can also be used for multi-agent BEV map prediction.






## Installation

The pipelines for the nuScenes and OPV2V datasets are different. Refer to the folder that matches your research purpose for details.

:point_right: [nuScenes Users](nuscenes) 


:point_right: [OPV2V Users](opv2v)

## Models

Fused Axial Attention Module (FAX)

SinBEVT (single-agent multi-view fusion) and FuseBEVT (multi-agent BEV fusion)

CoBEVT Architecture



## Results

Main results (OPV2V-camera, OPV2V-LiDAR, and nuScenes)

Qualitative results on OPV2V-camera

Qualitative results on OPV2V-LiDAR

Qualitative results on nuScenes

Ablation study



## Citation

```bibtex
@inproceedings{xu2022cobevt,
  author = {Runsheng Xu and Zhengzhong Tu and Hao Xiang and Wei Shao and Bolei Zhou and Jiaqi Ma},
  title = {CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers},
  booktitle = {Conference on Robot Learning (CoRL)},
  year = {2022}
}

@article{xu2022v2x,
  title={V2X-ViT: Vehicle-to-everything cooperative perception with vision transformer},
  author={Xu, Runsheng and Xiang, Hao and Tu, Zhengzhong and Xia, Xin and Yang, Ming-Hsuan and Ma, Jiaqi},
  journal={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}

@inproceedings{tu2022maxim,
  title={Maxim: Multi-axis mlp for image processing},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5769--5780},
  year={2022}
}

@article{tu2022maxvit,
  title={Maxvit: Multi-axis vision transformer},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  journal={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}
```

## Acknowledgement

CoBEVT is built upon [OpenCOOD](https://github.com/DerrickXuNu/OpenCOOD), which is the first Open Cooperative Detection framework for autonomous driving.

Our nuScenes experiments use the training pipeline from [CVT (CVPR 2022)](https://github.com/bradyz/cross_view_transformers).

CoBEVT is partly inspired by [V2X-ViT](https://github.com/DerrickXuNu/v2x-vit), [MAXIM](https://github.com/google-research/maxim) and [MaxViT](https://github.com/google-research/maxvit).
