https://github.com/JiaRenChang/PSMNet

Pyramid Stereo Matching Network (CVPR2018)

psmnet pytorch stereo-matching stereo-vision

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/JiaRenChang/PSMNet
Owner: JiaRenChang
License: mit
Created: 2018-03-19T08:06:18.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2021-09-22T09:09:08.000Z (over 3 years ago)
Last Synced: 2024-04-26T15:20:45.849Z (9 months ago)
Topics: psmnet, pytorch, stereo-matching, stereo-vision
Language: Python
Size: 106 KB
Stars: 1,387
Watchers: 33
Forks: 421
Open Issues: 162
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Pyramid Stereo Matching Network

This repository contains the PyTorch code for the paper "[Pyramid Stereo Matching Network](https://arxiv.org/abs/1803.08669)" (CVPR 2018) by [Jia-Ren Chang](https://jiarenchang.github.io/) and [Yong-Sheng Chen](https://people.cs.nctu.edu.tw/~yschen/).

#### Changelog

2020/12/20: Updated PSMNet: it now supports torch 1.6.0 / torchvision 0.5.0 and Python 3.7; inconsistent indentation was removed.

2020/12/20: Our proposed real-time stereo network is available at [Real-time Stereo](https://github.com/JiaRenChang/RealtimeStereo).

### Citation

```
@inproceedings{chang2018pyramid,
  title={Pyramid Stereo Matching Network},
  author={Chang, Jia-Ren and Chen, Yong-Sheng},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={5410--5418},
  year={2018}
}
```

## Contents

1. [Introduction](#introduction)

2. [Usage](#usage)

3. [Results](#results)

4. [Contacts](#contacts)

## Introduction

Recent work has shown that depth estimation from a stereo pair of images can be formulated as a supervised learning task to be solved with convolutional neural networks (CNNs). However, current architectures rely on patch-based Siamese networks and lack the means to exploit context information for finding correspondences in ill-posed regions. To tackle this problem, we propose PSMNet, a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling and a 3D CNN. The spatial pyramid pooling module exploits global context information by aggregating context at different scales and locations to form a cost volume. The 3D CNN learns to regularize the cost volume using multiple stacked hourglass networks in conjunction with intermediate supervision.
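As a rough illustration of the cost volume described above, the following NumPy sketch pairs left-image features with right-image features shifted by each candidate disparity and concatenates them along the channel axis. It assumes unbatched feature maps of shape (C, H, W) and uses a hypothetical function name, `build_concat_cost_volume`; it is not the repository's PyTorch code. In the paper the volume is built on feature maps at one quarter of the input resolution, so the number of disparity levels there is correspondingly reduced.

```python
import numpy as np


def build_concat_cost_volume(left_feat: np.ndarray, right_feat: np.ndarray, max_disp: int) -> np.ndarray:
    """Concatenate left features with disparity-shifted right features (illustrative sketch).

    left_feat, right_feat: arrays of shape (C, H, W).
    Returns an array of shape (2C, max_disp, H, W). Entries where the shifted
    right feature would fall outside the image are left at zero.
    """
    if left_feat.shape != right_feat.shape:
        raise ValueError("left and right features must have the same shape")
    c, h, w = left_feat.shape
    cost = np.zeros((2 * c, max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        if d == 0:
            cost[:c, d] = left_feat
            cost[c:, d] = right_feat
        else:
            # Left pixel x is compared with right pixel x - d, so only columns x >= d are filled.
            cost[:c, d, :, d:] = left_feat[:, :, d:]
            cost[c:, d, :, d:] = right_feat[:, :, :-d]
    return cost


# Toy example with random features: 4 channels, 8 x 16 feature map, 5 disparity levels.
left = np.random.rand(4, 8, 16).astype(np.float32)
right = np.random.rand(4, 8, 16).astype(np.float32)
vol = build_concat_cost_volume(left, right, max_disp=5)
print(vol.shape)  # (8, 5, 8, 16)
```

A concatenation volume keeps the full feature vectors for every disparity hypothesis instead of collapsing them into a single matching score, leaving it to the subsequent 3D convolutions to learn how to compare them.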



## Usage

### Dependencies

- [Python 3.7](https://www.python.org/downloads/)

- [PyTorch (1.6.0+)](http://pytorch.org)

- torchvision 0.5.0

- [KITTI Stereo](http://www.cvlibs.net/datasets/kitti/eval_stereo.php)

- [Scene Flow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)

Usage of the Scene Flow dataset:

1. Download the RGB cleanpass images and their disparity maps for the three subsets: FlyingThings3D, Driving, and Monkaa.

2. Put them all in the same folder.

3. Rename the folders to "driving_frames_cleanpass", "driving_disparity", "monkaa_frames_cleanpass", "monkaa_disparity", "frames_cleanpass", and "frames_disparity".
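Because the data folder passed to the training script is expected to contain the renamed Scene Flow folders listed above, a quick pre-flight check can save a failed run. The helper below, `missing_sceneflow_dirs`, is a hypothetical sketch that only checks for those six top-level folder names; it does not verify any nested structure the data loader may expect.

```python
import sys
from pathlib import Path
from typing import List

# The six folder names from the Scene Flow instructions above.
EXPECTED_SCENEFLOW_DIRS = [
    "driving_frames_cleanpass",
    "driving_disparity",
    "monkaa_frames_cleanpass",
    "monkaa_disparity",
    "frames_cleanpass",
    "frames_disparity",
]


def missing_sceneflow_dirs(datapath: str) -> List[str]:
    """Return the expected folders that do not exist directly under `datapath`."""
    root = Path(datapath)
    return [name for name in EXPECTED_SCENEFLOW_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Usage: python check_sceneflow.py /path/to/sceneflow (hypothetical script name)
    missing = missing_sceneflow_dirs(sys.argv[1] if len(sys.argv) > 1 else ".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected Scene Flow folders are present.")
```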

### Notice

1. Upsampling in PyTorch 0.4.1+: PyTorch warns when upsampling functions are called without `align_corners`; pass `align_corners=True` to them.

2. The output disparity may be more accurate when multiplied by 1.17, as reported in issues [#135](https://github.com/JiaRenChang/PSMNet/issues/135) and [#113](https://github.com/JiaRenChang/PSMNet/issues/113).
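The `align_corners` flag changes which input coordinate each output pixel samples from, so weights trained under one convention can behave differently under the other. The pure-Python sketch below computes the 1-D source coordinate for linear upsampling under both settings, following the convention described in the PyTorch interpolation documentation; the function name `source_coord` is hypothetical, and any clamping of negative coordinates is omitted.

```python
def source_coord(out_index: int, in_size: int, out_size: int, align_corners: bool) -> float:
    """Input coordinate sampled by output pixel `out_index` in 1-D linear upsampling.

    align_corners=True: the first and last pixel centres of input and output coincide.
    align_corners=False: pixels are unit-area cells and their centres are mapped,
    which produces a half-pixel offset. Negative results are returned unclamped.
    """
    if align_corners:
        if out_size == 1:
            return 0.0
        return out_index * (in_size - 1) / (out_size - 1)
    return (out_index + 0.5) * in_size / out_size - 0.5


# Upsampling 2 pixels to 4 pixels under both conventions.
for flag in (True, False):
    coords = [source_coord(i, in_size=2, out_size=4, align_corners=flag) for i in range(4)]
    print(f"align_corners={flag}: {coords}")
# align_corners=True:  0.0, 1/3, 2/3, 1.0
# align_corners=False: -0.25, 0.25, 0.75, 1.25
```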

### Train

For example, the following command trains PSMNet on Scene Flow:

```
python main.py --maxdisp 192 \
               --model stackhourglass \
               --datapath (your scene flow data folder) \
               --epochs 10 \
               --loadmodel (optional) \
               --savemodel (path for saving model)
```
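To make the training flags easier to read, here is a hypothetical `argparse` parser that accepts the same options as the command above. The types, the defaults (taken from the example values), and the help strings are assumptions for illustration; main.py may define them differently and may accept further options.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags of the training command (not main.py itself)."""
    parser = argparse.ArgumentParser(description="PSMNet training (illustrative sketch)")
    parser.add_argument("--maxdisp", type=int, default=192, help="maximum disparity in pixels")
    parser.add_argument("--model", type=str, default="stackhourglass", help="model variant")
    parser.add_argument("--datapath", type=str, default=None, help="Scene Flow data folder")
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    parser.add_argument("--loadmodel", type=str, default=None, help="optional checkpoint to start from")
    parser.add_argument("--savemodel", type=str, default=None, help="path for saving the model")
    return parser


# Parsing an explicit argument list (hypothetical paths) instead of sys.argv.
args = build_train_parser().parse_args(["--datapath", "/data/sceneflow", "--savemodel", "./checkpoints"])
print(args.maxdisp, args.model, args.epochs, args.loadmodel)  # 192 stackhourglass 10 None
```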

Similarly, the following command finetunes PSMNet on KITTI 2015:

```
python finetune.py --maxdisp 192 \
                   --model stackhourglass \
                   --datatype 2015 \
                   --datapath (KITTI 2015 training data folder) \
                   --epochs 300 \
                   --loadmodel (pretrained PSMNet) \
                   --savemodel (path for saving model)
```

These examples are also included in run.sh.

### Evaluation

Use the following command to evaluate the trained PSMNet on the KITTI 2015 test data:

```
python submission.py --maxdisp 192 \
                     --model stackhourglass \
                     --KITTI 2015 \
                     --datapath (KITTI 2015 test data folder) \
                     --loadmodel (finetuned PSMNet)
```
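Submissions to the KITTI benchmark use 16-bit PNG disparity maps in which the stored value is the disparity multiplied by 256, and a stored value of 0 marks a pixel without a valid disparity. The NumPy sketch below shows that conversion in both directions; the function names are hypothetical, and writing the array to a 16-bit PNG file requires an image library that is not shown here.

```python
from typing import Tuple

import numpy as np


def encode_kitti_disparity(disp: np.ndarray) -> np.ndarray:
    """Convert a float disparity map (in pixels) to KITTI's uint16 encoding (value = disparity * 256)."""
    scaled = np.round(np.clip(disp, 0.0, None) * 256.0)
    # Values above 65535 / 256 (about 256 px) cannot be represented and are clipped.
    return np.clip(scaled, 0, 65535).astype(np.uint16)


def decode_kitti_disparity(png_values: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Return (disparity in pixels, validity mask); a stored 0 marks an invalid pixel."""
    disp = png_values.astype(np.float32) / 256.0
    valid = png_values > 0
    return disp, valid
```

Rounding to the nearest step keeps the quantization error at or below 1/512 pixel. Note that a predicted disparity of exactly 0 would be read back as invalid.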

### Pretrained Model

※NOTE: The pretrained models were saved as .tar files, but you don't need to untar them; load them directly with torch.load().

Update 2018/9/6: we released the pretrained KITTI 2012 model.

Update 2021/9/22: we added a pretrained model trained with torch 1.8.1 (the previous model weights were trained with torch 0.4.1).

| KITTI 2015 | Scene Flow | KITTI 2012 | Scene Flow (torch 1.8.1) |
|---|---|---|---|
| [Google Drive](https://drive.google.com/file/d/1pHWjmhKMG4ffCrpcsp_MTXMJXhgl3kF9/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1xoqkQ2NXik1TML_FMUTNZJFAHrhLdKZG/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1p4eJ2xDzvQxaqB20A_MmSP9-KORBX1pZ/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1NDKrWHkwgMKtDwynXVU12emK3G5d5kkp/view?usp=sharing) |
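Since the .tar files can be opened directly with torch.load(), the sketch below loads a checkpoint into a model that is not wrapped in `nn.DataParallel`. It assumes, as a hypothesis, that the file holds a dict with a `state_dict` entry whose keys may carry the `module.` prefix that `nn.DataParallel` adds; the helper names are hypothetical, and PyTorch is imported inside the loader so the key-renaming helper can be used on its own.

```python
from collections import OrderedDict
from typing import Dict, Mapping


def strip_module_prefix(state_dict: Mapping[str, object]) -> Dict[str, object]:
    """Drop the 'module.' prefix that nn.DataParallel adds to parameter names."""
    prefix = "module."
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


def load_psmnet_weights(model, checkpoint_path: str):
    """Load a downloaded .tar checkpoint into `model` (assumed checkpoint layout, see above)."""
    import torch  # imported lazily so strip_module_prefix works without PyTorch installed

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Fall back to treating the file as a bare state dict if there is no 'state_dict' entry.
    state_dict = checkpoint.get("state_dict", checkpoint)
    model.load_state_dict(strip_module_prefix(state_dict))
    return checkpoint
```

If your model is wrapped in `nn.DataParallel`, the keys already match and stripping the prefix is unnecessary.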

### Test on your own stereo pair

```
python Test_img.py --loadmodel (finetuned PSMNet) --leftimg ./left.png --rightimg ./right.png
```
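Stereo networks that downsample their inputs several times generally need image sizes that divide evenly by the total stride. The NumPy sketch below pads an image at the top and right to the next multiple of 16 and crops the predicted disparity back to the original size; the multiple of 16, the padding placement, and the function names are assumptions for illustration and may not match what Test_img.py does.

```python
from typing import Tuple

import numpy as np


def pad_to_multiple(img: np.ndarray, multiple: int = 16) -> Tuple[np.ndarray, int, int]:
    """Zero-pad an (H, W, ...) image at the top and right so H and W are multiples of `multiple`.

    Returns the padded image and (top_pad, right_pad) so the prediction can be cropped back.
    """
    h, w = img.shape[:2]
    top_pad = (-h) % multiple
    right_pad = (-w) % multiple
    pad_width = [(top_pad, 0), (0, right_pad)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_width, mode="constant"), top_pad, right_pad


def crop_prediction(disp: np.ndarray, top_pad: int, right_pad: int) -> np.ndarray:
    """Undo pad_to_multiple on an (H, W) disparity map."""
    h, w = disp.shape
    return disp[top_pad:, : w - right_pad]
```

For a typical KITTI frame of 375 x 1242 pixels, this adds 9 rows at the top and 6 columns on the right.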

## Results

### Evaluation of PSMNet with different settings



※Note that the reported 3-px validation errors were calculated using KITTI's official MATLAB code, not our code.

### Results on KITTI 2015 leaderboard

[Leaderboard Link](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)

| Method | D1-all (All) | D1-all (Noc) | Runtime (s) |
|---|---|---|---|
| PSMNet | 2.32 % | 2.14 % | 0.41 |
| [iResNet-i2](https://arxiv.org/abs/1712.01039) | 2.44 % | 2.19 % | 0.12 |
| [GC-Net](https://arxiv.org/abs/1703.04309) | 2.87 % | 2.61 % | 0.90 |
| [MC-CNN](https://github.com/jzbontar/mc-cnn) | 3.89 % | 3.33 % | 67 |
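The D1 metric in the table counts a pixel as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity; the All column covers all pixels with ground truth, and the Noc column only non-occluded ones. The NumPy sketch below computes that outlier rate for a single image, assuming a hypothetical function name and that pixels without ground truth are stored as 0; the official KITTI development kit remains the reference implementation.

```python
import numpy as np


def d1_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fraction of valid pixels whose error exceeds both 3 px and 5% of the ground truth.

    Pixels with gt <= 0 are treated as having no ground truth and are ignored.
    """
    valid = gt > 0
    err = np.abs(pred - gt)[valid]
    outliers = (err > 3.0) & (err > 0.05 * gt[valid])
    return float(outliers.mean()) if outliers.size else 0.0
```

For example, with a ground-truth disparity of 100 px, an error of 4 px is not an outlier (it is below 5 px), while the same 4 px error at a ground truth of 10 px is.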

### Qualitative results

#### Left image



#### Predicted disparity



#### Error



### Visualization of Receptive Field

We visualize the receptive fields of two PSMNet settings: the full setting and the baseline.

- Full setting: dilated convolutions, SPP, stacked hourglass

- Baseline: no dilated convolutions, no SPP, no stacked hourglass

The receptive fields were calculated for the pixel at the image center, indicated by the red cross.
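One way to see why dilated convolutions enlarge the receptive field is the standard recursive formula for the theoretical receptive field of stacked convolutions. The sketch below applies it to layer lists given as (kernel size, stride, dilation); the example stacks are toy configurations, not PSMNet's actual layers, and the theoretical value may differ from the receptive fields visualized above, whose computation method is not described here.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Theoretical receptive field of a stack of convolutions.

    Each layer is (kernel_size, stride, dilation). A dilated kernel of size k
    spans (k - 1) * dilation + 1 positions of its input; `jump` is the distance,
    in input-image pixels, between neighbouring positions of the current layer.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1, 1)] * 3))  # 7: three plain 3x3 convolutions
print(receptive_field([(3, 1, 2)] * 3))  # 13: the same stack with dilation 2
```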



## Contacts

[email protected]

Discussions and concerns are welcome!