Dense Prediction Transformers

Host: GitHub
URL: https://github.com/isl-org/DPT
Owner: isl-org
License: mit
Created: 2021-03-22T16:17:15.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2024-07-25T11:16:17.000Z (5 months ago)
Last Synced: 2024-11-04T19:52:23.076Z (about 2 months ago)
Language: Python
Homepage:
Size: 424 KB
Stars: 2,015
Watchers: 43
Forks: 258
Open Issues: 38
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

StarryDivineSky - isl-org/DPT

README

## Vision Transformers for Dense Prediction

This repository contains code and models for our [paper](https://arxiv.org/abs/2103.13413):

> Vision Transformers for Dense Prediction  

> René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

### Changelog 

* [March 2021] Initial release of inference code and models

### Setup 

1) Download the model weights and place them in the `weights` folder:

Monodepth:

- [dpt_hybrid-midas-501f0c75.pt](https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid-midas-501f0c75.pt), [Mirror](https://drive.google.com/file/d/1dgcJEYYw1F8qirXhZxgNK8dWWz_8gZBD/view?usp=sharing)

- [dpt_large-midas-2f21e586.pt](https://github.com/intel-isl/DPT/releases/download/1_0/dpt_large-midas-2f21e586.pt), [Mirror](https://drive.google.com/file/d/1vnuhoMc6caF-buQQ4hK0CeiMk9SjwB-G/view?usp=sharing)

Segmentation:

 - [dpt_hybrid-ade20k-53898607.pt](https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid-ade20k-53898607.pt), [Mirror](https://drive.google.com/file/d/1zKIAMbltJ3kpGLMh6wjsq65_k5XQ7_9m/view?usp=sharing)

 - [dpt_large-ade20k-b12dca68.pt](https://github.com/intel-isl/DPT/releases/download/1_0/dpt_large-ade20k-b12dca68.pt), [Mirror](https://drive.google.com/file/d/1foDpUM7CdS8Zl6GPdkrJaAOjskb7hHe-/view?usp=sharing)

  

2) Set up dependencies: 

    ```shell
    pip install -r requirements.txt
    ```

   The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.
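
If you prefer to script step 1 rather than download by hand, the following is a minimal sketch that fetches the weight files into the `weights` folder. The base URL and file names are copied from the links above; the helper names `weight_url` and `download_weights` are hypothetical and not part of this repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the download links in step 1.
RELEASE_URL = "https://github.com/intel-isl/DPT/releases/download/1_0/"
WEIGHT_FILES = {
    "monodepth": ["dpt_hybrid-midas-501f0c75.pt", "dpt_large-midas-2f21e586.pt"],
    "segmentation": ["dpt_hybrid-ade20k-53898607.pt", "dpt_large-ade20k-b12dca68.pt"],
}


def weight_url(filename):
    """Return the release URL for one weight file (hypothetical helper)."""
    return RELEASE_URL + filename


def download_weights(task, folder="weights"):
    """Download all weights for `task` into `folder`, skipping existing files."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    for filename in WEIGHT_FILES[task]:
        target = target_dir / filename
        if target.exists():
            continue  # already present, do not download again
        urlretrieve(weight_url(filename), str(target))


# Example (performs network access): download_weights("monodepth")
```

The files are large, so the sketch skips anything already present in the folder. If the release download is unavailable, the mirror links above remain the fallback.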

### Usage 

1) Place one or more input images in the folder `input`.

2) Run a monocular depth estimation model:

    ```shell
    python run_monodepth.py
    ```

    Or run a semantic segmentation model:

    ```shell
    python run_segmentation.py
    ```

3) The results are written to the folders `output_monodepth` and `output_semseg`, respectively.

Use the flag `-t` to switch between different models. Possible options are `dpt_hybrid` (default) and `dpt_large`.
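
To run the scripts from Python, for example to try both model types in one go, a small wrapper along these lines may help. It is only a sketch: it assumes it is started from the repository root, it uses only the `-t` flag described above, and the helper names `build_command` and `run_model` are hypothetical.

```python
import subprocess
import sys

# Script names as used in the Usage steps above.
SCRIPTS = {
    "monodepth": "run_monodepth.py",
    "segmentation": "run_segmentation.py",
}


def build_command(task, model_type="dpt_hybrid"):
    """Build the command line for one task and model type (hypothetical helper)."""
    return [sys.executable, SCRIPTS[task], "-t", model_type]


def run_model(task, model_type="dpt_hybrid"):
    """Run the script and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(task, model_type), check=True)


# Example (requires weights and input images to be in place):
# for model_type in ("dpt_hybrid", "dpt_large"):
#     run_model("monodepth", model_type)
```

Using `sys.executable` keeps the child process in the same Python environment in which the dependencies were installed.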

**Additional models:**

- Monodepth finetuned on KITTI: [dpt_hybrid_kitti-cb926ef4.pt](https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid_kitti-cb926ef4.pt), [Mirror](https://drive.google.com/file/d/1-oJpORoJEdxj4LTV-Pc17iB-smp-khcX/view?usp=sharing)

- Monodepth finetuned on NYUv2: [dpt_hybrid_nyu-2ce69ec7.pt](https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid_nyu-2ce69ec7.pt), [Mirror](https://drive.google.com/file/d/1NjiFw1Z9lUAfTPZu4uQ9gourVwvmd58O/view?usp=sharing)

Run with:

```shell
python run_monodepth.py -t [dpt_hybrid_kitti|dpt_hybrid_nyu]
```
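
Before running a finetuned model, it can be useful to confirm that its weight file has been downloaded. The sketch below assumes the files keep their release names and sit in the `weights` folder, as in the Setup section; the helper name `missing_weights` is hypothetical.

```python
from pathlib import Path

# Model types and weight file names as listed under "Additional models".
FINETUNED_WEIGHTS = {
    "dpt_hybrid_kitti": "dpt_hybrid_kitti-cb926ef4.pt",
    "dpt_hybrid_nyu": "dpt_hybrid_nyu-2ce69ec7.pt",
}


def missing_weights(folder="weights"):
    """Return the model types whose weight file is not present in `folder`."""
    root = Path(folder)
    return sorted(
        model_type
        for model_type, filename in FINETUNED_WEIGHTS.items()
        if not (root / filename).is_file()
    )


# Example: print(missing_weights()) lists the model types that still need a download.
```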

### Evaluation

Hints on how to evaluate monodepth models can be found here: https://github.com/intel-isl/DPT/blob/main/EVALUATION.md

### Citation

Please cite our papers if you use this code or any of the models. 

```
@article{Ranftl2021,
	author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
	title     = {Vision Transformers for Dense Prediction},
	journal   = {ArXiv preprint},
	year      = {2021},
}
```

```
@article{Ranftl2020,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
	year      = {2020},
}
```

### Acknowledgements

Our work builds on and uses code from [timm](https://github.com/rwightman/pytorch-image-models) and [PyTorch-Encoding](https://github.com/zhanghang1989/PyTorch-Encoding). We'd like to thank the authors for making these libraries available.

### License 

MIT License