Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)
- Host: GitHub
- URL: https://github.com/bradyz/cross_view_transformers
- Owner: bradyz
- License: mit
- Created: 2022-03-28T17:39:38.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-11-06T08:29:41.000Z (about 1 year ago)
- Last Synced: 2024-08-01T03:42:31.825Z (3 months ago)
- Topics: cvpr2022, deep-learning, pytorch, transformer
- Language: Python
- Homepage:
- Size: 13 MB
- Stars: 519
- Watchers: 14
- Forks: 79
- Open Issues: 44
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# **Cross View Transformers**
This repository contains the source code and data for our paper:
> [**Cross-view Transformers for real-time Map-view Semantic Segmentation**](http://www.philkr.net/media/zhou2022crossview.pdf)
> [Brady Zhou](https://www.bradyzhou.com/), [Philipp Krähenbühl](http://www.philkr.net/)
> [*CVPR 2022*](https://cvpr2022.thecvf.com/)

## **Demos**
**Map-view Segmentation:** The model uses multi-view images to produce a map-view segmentation at 45 FPS.

**Map Making:** With vehicle pose, we can construct a map by fusing model predictions over time (a fusion sketch follows these demos).

**Cross-view Attention:** For a given map-view location, we show which image patches are being attended to.
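The README does not spell out how the map-making demo fuses predictions over time. As a rough illustration only, here is a minimal sketch assuming a planar (x, y, yaw) vehicle pose, a fixed metric grid resolution, and max-fusion of per-frame probabilities; none of these choices are taken from the repo.

```python
# Sketch only (not from this repo): fuse per-frame map-view predictions into a
# global map using the vehicle pose. Resolution, map size, and pose format are assumed.
import numpy as np

RESOLUTION = 0.5                             # meters per BEV cell (assumed)
GLOBAL_SHAPE = (2000, 2000)                  # global map size in cells (assumed)
GLOBAL_ORIGIN = np.array([-500.0, -500.0])   # world coords of global cell (0, 0)

def fuse(global_map, bev_pred, pose_xy, yaw):
    """Splat one ego-frame BEV prediction (H, W) into the global map (max-fusion)."""
    h, w = bev_pred.shape
    # Metric (x, y) of every BEV cell center in the ego frame, ego at the grid center.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ego = np.stack([(xs - w / 2) * RESOLUTION, (ys - h / 2) * RESOLUTION], axis=-1)
    # Rotate by yaw and translate by the vehicle position: ego frame -> world frame.
    c, s = np.cos(yaw), np.sin(yaw)
    world = ego @ np.array([[c, s], [-s, c]]) + np.asarray(pose_xy)
    # Convert world coordinates to global-map indices and keep the max per cell.
    idx = np.floor((world - GLOBAL_ORIGIN) / RESOLUTION).astype(int)
    valid = (idx >= 0).all(-1) & (idx[..., 0] < GLOBAL_SHAPE[1]) & (idx[..., 1] < GLOBAL_SHAPE[0])
    cells = idx[valid]
    np.maximum.at(global_map, (cells[:, 1], cells[:, 0]), bev_pred[valid])
    return global_map

global_map = np.zeros(GLOBAL_SHAPE, dtype=np.float32)
# For each frame t: global_map = fuse(global_map, pred_t, pose_xy_t, yaw_t)
```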
## **Installation**

```bash
# Clone repo
git clone https://github.com/bradyz/cross_view_transformers.git
cd cross_view_transformers

# Setup conda environment
conda create -y --name cvt python=3.8
conda activate cvt
conda install -y pytorch torchvision cudatoolkit=11.3 -c pytorch

# Install dependencies
pip install -r requirements.txt
pip install -e .
```
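As an optional sanity check after installation (not part of the repo's instructions), you can confirm the environment sees a GPU and that the editable install worked; the package name `cross_view_transformer` is an assumption based on the repository layout.

```python
# Optional post-install check (assumed package name: cross_view_transformer).
import torch
import cross_view_transformer  # noqa: F401  (should import after `pip install -e .`)

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```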
## **Data**
Documentation:
* [Dataset setup](docs/dataset_setup.md)
* [Label generation](docs/label_generation.md) (optional)
Download the original datasets and our generated map-view labels (a download sketch for the labels follows the table).
| | Dataset | Labels |
| :-- | :-- | :-- |
| nuScenes | [keyframes + map expansion](https://www.nuscenes.org/nuscenes#download) (60 GB) | [cvt_labels_nuscenes.tar.gz](https://www.cs.utexas.edu/~bzhou/cvt/cvt_labels_nuscenes.tar.gz) (361 MB) |
| Argoverse 1.1 | [3D tracking](https://www.argoverse.org/av1.html#download-link) | coming soon™ |
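If you prefer scripting the labels download, a minimal sketch using only the Python standard library is below; the destination directory is just an example and should match wherever you keep your datasets.

```python
# Sketch only: download and extract the generated nuScenes labels linked in the table above.
import tarfile
import urllib.request
from pathlib import Path

URL = "https://www.cs.utexas.edu/~bzhou/cvt/cvt_labels_nuscenes.tar.gz"
dest = Path("/datasets")                       # example destination, adjust as needed
dest.mkdir(parents=True, exist_ok=True)

archive = dest / "cvt_labels_nuscenes.tar.gz"
urllib.request.urlretrieve(URL, archive)       # ~361 MB

with tarfile.open(archive) as tar:
    tar.extractall(dest)                       # expected to yield /datasets/cvt_labels_nuscenes/
```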
The structure of the extracted data should look like the following:
```
/datasets/
├─ nuscenes/
│ ├─ v1.0-trainval/
│ ├─ v1.0-mini/
│ ├─ samples/
│ ├─ sweeps/
│ └─ maps/
│ ├─ basemap/
│ └─ expansion/
└─ cvt_labels_nuscenes/
├─ scene-0001/
├─ scene-0001.json
├─ ...
├─ scene-1000/
└─ scene-1000.json
```

When everything is set up correctly, check out the dataset with:
```bash
python3 scripts/view_data.py \
data=nuscenes \
data.dataset_dir=/media/datasets/nuscenes \
data.labels_dir=/media/datasets/cvt_labels_nuscenes \
data.version=v1.0-mini \
visualization=nuscenes_viz \
+split=val
```
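If `view_data.py` cannot find the data, a quick path check like the following (not part of the repo; paths are the examples used in the commands above) can help narrow the problem down.

```python
# Quick path check against the expected layout, using the example paths from above.
from pathlib import Path

dataset_dir = Path("/media/datasets/nuscenes")
labels_dir = Path("/media/datasets/cvt_labels_nuscenes")

for path in [
    dataset_dir / "v1.0-mini",
    dataset_dir / "samples",
    dataset_dir / "sweeps",
    dataset_dir / "maps" / "expansion",
    labels_dir,
]:
    status = "ok" if path.is_dir() else "MISSING"
    print(f"{status:>7}  {path}")
```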
## **Training**
An average job of 50k training iterations takes ~8 hours.
Our models were trained using 4 GPU jobs, but they can also be trained on a single GPU.

To train a model,
```bash
python3 scripts/train.py \
+experiment=cvt_nuscenes_vehicle \
data.dataset_dir=/media/datasets/nuscenes \
data.labels_dir=/media/datasets/cvt_labels_nuscenes
```

For more information, see the config files below; a sketch that composes them with Hydra follows the list:
* `config/config.yaml` - base config
* `config/model/cvt.yaml` - model architecture
* `config/experiment/cvt_nuscenes_vehicle.yaml` - additional overrides
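Since training is driven by Hydra-style overrides, the merged configuration can also be composed and inspected from Python. This is a sketch, not repo code: it assumes it is run from the repository root (where `config/` lives) and may not reproduce every default that `scripts/train.py` sets up.

```python
# Sketch only: compose and inspect the same config that scripts/train.py receives.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="config"):
    cfg = compose(
        config_name="config",
        overrides=[
            "+experiment=cvt_nuscenes_vehicle",
            "data.dataset_dir=/media/datasets/nuscenes",
            "data.labels_dir=/media/datasets/cvt_labels_nuscenes",
        ],
    )

print(OmegaConf.to_yaml(cfg))  # the fully merged configuration
```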
## **Additional Information**

### **Awesome Related Repos**
* https://github.com/wayveai/fiery
* https://github.com/nv-tlabs/lift-splat-shoot
* https://github.com/tom-roddick/mono-semantic-maps

### **License**
This project is released under the [MIT license](LICENSE)
### **Citation**
If you find this project useful for your research, please use the following BibTeX entry.
```bibtex
@inproceedings{zhou2022cross,
title={Cross-view Transformers for real-time Map-view Semantic Segmentation},
author={Zhou, Brady and Kr{\"a}henb{\"u}hl, Philipp},
booktitle={CVPR},
year={2022}
}
```