# Cross View Transformers

Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)



This repository contains the source code and data for our paper:

> [**Cross-view Transformers for real-time Map-view Semantic Segmentation**](http://www.philkr.net/media/zhou2022crossview.pdf)
> [Brady Zhou](https://www.bradyzhou.com/), [Philipp Krähenbühl](http://www.philkr.net/)
> [*CVPR 2022*](https://cvpr2022.thecvf.com/)

## Demos
**Map-view Segmentation:** the model uses multi-view images to produce a map-view segmentation at 45 FPS.

**Map Making:** with the vehicle pose, we can construct a map by fusing model predictions over time.

**Cross-view Attention:** for a given map-view location, we show which image patches are being attended to.
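The idea behind cross-view attention can be sketched in a few lines. The snippet below is a toy pure-Python illustration, not the model's implementation (the real network uses learned embeddings and multi-head attention in PyTorch): a map-view query is scored against each image-patch feature, and a softmax over the scores gives the attention weights used to pool the patches.

```python
import math

def attend(query, patches):
    """Toy single-query attention: dot-product scores -> softmax weights
    -> weighted average of patch features. Illustrative only."""
    scores = [sum(q * p for q, p in zip(query, patch)) for patch in patches]
    # Numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of patch features yields the map-view feature
    dim = len(patches[0])
    feature = [sum(w * patch[d] for w, patch in zip(weights, patches))
               for d in range(dim)]
    return weights, feature

# One map-view query attends over three image patches (made-up features)
weights, feature = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Here the query aligns best with the first patch, so that patch receives the largest attention weight.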


## Installation

```bash
# Clone repo
git clone https://github.com/bradyz/cross_view_transformers.git

cd cross_view_transformers

# Setup conda environment
conda create -y --name cvt python=3.8

conda activate cvt
conda install -y pytorch torchvision cudatoolkit=11.3 -c pytorch

# Install dependencies
pip install -r requirements.txt
pip install -e .
```

## Data



Documentation:
* [Dataset setup](docs/dataset_setup.md)
* [Label generation](docs/label_generation.md) (optional)


Download the original datasets and our generated map-view labels:

| | Dataset | Labels |
| :-- | :-- | :-- |
| nuScenes | [keyframes + map expansion](https://www.nuscenes.org/nuscenes#download) (60 GB) | [cvt_labels_nuscenes.tar.gz](https://www.cs.utexas.edu/~bzhou/cvt/cvt_labels_nuscenes.tar.gz) (361 MB) |
| Argoverse 1.1 | [3D tracking](https://www.argoverse.org/av1.html#download-link) | coming soon™ |


The structure of the extracted data should look like the following:

```
/datasets/
├─ nuscenes/
│  ├─ v1.0-trainval/
│  ├─ v1.0-mini/
│  ├─ samples/
│  ├─ sweeps/
│  └─ maps/
│     ├─ basemap/
│     └─ expansion/
└─ cvt_labels_nuscenes/
   ├─ scene-0001/
   ├─ scene-0001.json
   ├─ ...
   ├─ scene-1000/
   └─ scene-1000.json
```
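Before running the scripts it can help to verify this layout programmatically. The helper below is a hypothetical sketch (not part of the repo); the `REQUIRED` paths mirror the tree above:

```python
from pathlib import Path

# Expected top-level sub-directories, as sketched in the tree above
REQUIRED = [
    "nuscenes/v1.0-trainval",
    "nuscenes/samples",
    "nuscenes/sweeps",
    "nuscenes/maps/expansion",
    "cvt_labels_nuscenes",
]

def check_layout(root):
    """Return the list of expected sub-directories missing under `root`."""
    return [rel for rel in REQUIRED if not (Path(root) / rel).is_dir()]

missing = check_layout("/datasets")
if missing:
    print("missing:", ", ".join(missing))
```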

When everything is set up correctly, check out the dataset with

```bash
python3 scripts/view_data.py \
    data=nuscenes \
    data.dataset_dir=/media/datasets/nuscenes \
    data.labels_dir=/media/datasets/cvt_labels_nuscenes \
    data.version=v1.0-mini \
    visualization=nuscenes_viz \
    +split=val
```

## Training

An average job of 50k training iterations takes ~8 hours.
Our models were trained with 4-GPU jobs, but they can also be trained on a single GPU.
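As a rough sanity check on that figure, 50k iterations in about 8 hours corresponds to roughly 1.7 iterations per second:

```python
iterations = 50_000
hours = 8
its_per_second = iterations / (hours * 3600)
print(f"~{its_per_second:.2f} it/s")  # ~1.74 it/s
```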

To train a model,

```bash
python3 scripts/train.py \
    +experiment=cvt_nuscenes_vehicle \
    data.dataset_dir=/media/datasets/nuscenes \
    data.labels_dir=/media/datasets/cvt_labels_nuscenes
```

For more information, see

* `config/config.yaml` - base config
* `config/model/cvt.yaml` - model architecture
* `config/experiment/cvt_nuscenes_vehicle.yaml` - additional overrides
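Hydra composes these files at launch time, with experiment overrides taking precedence over the base config. Conceptually the composition behaves like a recursive dictionary merge; the sketch below (plain Python with made-up values, not Hydra's actual API) shows the precedence rule:

```python
def merge(base, override):
    """Recursively merge `override` into `base`; override wins on conflicts."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical config values, for illustration only
base = {"data": {"version": "v1.0-trainval"}, "lr": 1e-3}
experiment = {"lr": 4e-3, "data": {"version": "v1.0-mini"}}
config = merge(base, experiment)
print(config["lr"])  # 0.004
```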

## Additional Information

### **Awesome Related Repos**

* https://github.com/wayveai/fiery
* https://github.com/nv-tlabs/lift-splat-shoot
* https://github.com/tom-roddick/mono-semantic-maps

### **License**

This project is released under the [MIT license](LICENSE).

### **Citation**

If you find this project useful for your research, please use the following BibTeX entry.

```bibtex
@inproceedings{zhou2022cross,
    title={Cross-view Transformers for real-time Map-view Semantic Segmentation},
    author={Zhou, Brady and Kr{\"a}henb{\"u}hl, Philipp},
    booktitle={CVPR},
    year={2022}
}
```