{"id":13427645,"url":"https://github.com/JiaRenChang/PSMNet","last_synced_at":"2025-03-16T00:31:50.722Z","repository":{"id":40523866,"uuid":"125822629","full_name":"JiaRenChang/PSMNet","owner":"JiaRenChang","description":"Pyramid Stereo Matching Network (CVPR2018)","archived":false,"fork":false,"pushed_at":"2021-09-22T09:09:08.000Z","size":109,"stargazers_count":1387,"open_issues_count":162,"forks_count":421,"subscribers_count":33,"default_branch":"master","last_synced_at":"2024-04-26T15:20:45.849Z","etag":null,"topics":["psmnet","pytorch","stereo-matching","stereo-vision"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JiaRenChang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-19T08:06:18.000Z","updated_at":"2024-04-23T05:50:56.000Z","dependencies_parsed_at":"2022-07-12T18:02:40.278Z","dependency_job_id":null,"html_url":"https://github.com/JiaRenChang/PSMNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiaRenChang%2FPSMNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiaRenChang%2FPSMNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiaRenChang%2FPSMNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiaRenChang%2FPSMNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JiaRenChang","download_url":"https://codeload.github.com/JiaRenChang/PSMNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url"
:"https://github.com","kind":"github","repositories_count":221631802,"owners_count":16855011,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["psmnet","pytorch","stereo-matching","stereo-vision"],"created_at":"2024-07-31T01:00:35.464Z","updated_at":"2024-10-27T05:30:17.446Z","avatar_url":"https://github.com/JiaRenChang.png","language":"Python","readme":"# Pyramid Stereo Matching Network\n\nThis repository contains the code (in PyTorch) for the \"[Pyramid Stereo Matching Network](https://arxiv.org/abs/1803.08669)\" paper (CVPR 2018) by [Jia-Ren Chang](https://jiarenchang.github.io/) and [Yong-Sheng Chen](https://people.cs.nctu.edu.tw/~yschen/).\n\n#### changelog\n2020/12/20: Update PSMNet: now supports torch 1.6.0 / torchvision 0.5.0 and Python 3.7; removed inconsistent indentation.\n\n2020/12/20: Our proposed real-time stereo model can be found here: [Real-time Stereo](https://github.com/JiaRenChang/RealtimeStereo).\n\n### Citation\n```\n@inproceedings{chang2018pyramid,\n  title={Pyramid Stereo Matching Network},\n  author={Chang, Jia-Ren and Chen, Yong-Sheng},\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n  pages={5410--5418},\n  year={2018}\n}\n```\n\n## Contents\n\n1. [Introduction](#introduction)\n2. [Usage](#usage)\n3. [Results](#results)\n4. [Contacts](#contacts)\n\n## Introduction\n\nRecent work has shown that depth estimation from a stereo pair of images can be formulated as a supervised learning task to be resolved with convolutional neural networks (CNNs). 
However, current architectures rely on patch-based Siamese networks, lacking the means to exploit context information for finding correspondence in ill-posed regions. To tackle this problem, we propose PSMNet, a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling and 3D CNN. The spatial pyramid pooling module exploits global context information by aggregating context at different scales and locations to form a cost volume. The 3D CNN learns to regularize the cost volume using multiple stacked hourglass networks in conjunction with intermediate supervision.\n\n\u003cimg align=\"center\" src=\"https://user-images.githubusercontent.com/11732099/43501836-1d32897c-958a-11e8-8083-ad41ec26be17.jpg\"\u003e\n\n## Usage\n\n### Dependencies\n\n- [Python 3.7](https://www.python.org/downloads/)\n- [PyTorch (1.6.0+)](http://pytorch.org)\n- torchvision 0.5.0\n- [KITTI Stereo](http://www.cvlibs.net/datasets/kitti/eval_stereo.php)\n- [Scene Flow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)\n\n```\nUsage of the Scene Flow dataset\nDownload the RGB cleanpass images and their disparity maps for the three subsets: FlyingThings3D, Driving, and Monkaa.\nPut them in the same folder.\nThen rename the folders as: \"driving_frames_cleanpass\", \"driving_disparity\", \"monkaa_frames_cleanpass\", \"monkaa_disparity\", \"frames_cleanpass\", \"frames_disparity\".\n```\n\n### Notice\n1. Warning about the upsample function in PyTorch 0.4.1+: add \"align_corners=True\" to upsample calls.\n2. The output disparity may be better when multiplied by 1.17. 
Reported from issues [#135](https://github.com/JiaRenChang/PSMNet/issues/135) and [#113](https://github.com/JiaRenChang/PSMNet/issues/113).\n\n### Train\nAs an example, use the following command to train PSMNet on Scene Flow:\n\n```\npython main.py --maxdisp 192 \\\n               --model stackhourglass \\\n               --datapath (your scene flow data folder) \\\n               --epochs 10 \\\n               --loadmodel (optional) \\\n               --savemodel (path for saving model)\n```\n\nAs another example, use the following command to finetune PSMNet on KITTI 2015:\n\n```\npython finetune.py --maxdisp 192 \\\n                   --model stackhourglass \\\n                   --datatype 2015 \\\n                   --datapath (KITTI 2015 training data folder) \\\n                   --epochs 300 \\\n                   --loadmodel (pretrained PSMNet) \\\n                   --savemodel (path for saving model)\n```\n\nYou can also find these examples in run.sh.\n\n### Evaluation\nUse the following command to evaluate the trained PSMNet on KITTI 2015 test data:\n\n```\npython submission.py --maxdisp 192 \\\n                     --model stackhourglass \\\n                     --KITTI 2015 \\\n                     --datapath (KITTI 2015 test data folder) \\\n                     --loadmodel (finetuned PSMNet)\n```\n\n### Pretrained Model\n※NOTE: The pretrained model was saved as a .tar file; however, you don't need to untar it. 
Use torch.load() to load it.\n\nUpdate: 2018/9/6 We released the pre-trained KITTI 2012 model.\n\nUpdate: 2021/9/22 We released a pretrained model using torch 1.8.1 (the previous model weights were trained with torch 0.4.1).\n\n| KITTI 2015 | Scene Flow | KITTI 2012 | Scene Flow (torch 1.8.1) |\n|---|---|---|---|\n|[Google Drive](https://drive.google.com/file/d/1pHWjmhKMG4ffCrpcsp_MTXMJXhgl3kF9/view?usp=sharing)|[Google Drive](https://drive.google.com/file/d/1xoqkQ2NXik1TML_FMUTNZJFAHrhLdKZG/view?usp=sharing)|[Google Drive](https://drive.google.com/file/d/1p4eJ2xDzvQxaqB20A_MmSP9-KORBX1pZ/view?usp=sharing)|[Google Drive](https://drive.google.com/file/d/1NDKrWHkwgMKtDwynXVU12emK3G5d5kkp/view?usp=sharing)|\n\n### Test on your own stereo pair\n```\npython Test_img.py --loadmodel (finetuned PSMNet) --leftimg ./left.png --rightimg ./right.png\n```\n\n## Results\n\n### Evaluation of PSMNet with different settings\n\u003cimg align=\"center\" src=\"https://user-images.githubusercontent.com/11732099/37817886-45a12ece-2eb3-11e8-8254-ae92c723b2f6.png\"\u003e\n\n※Note that the reported 3-px validation errors were calculated using KITTI's official MATLAB code, not our code.\n\n### Results on KITTI 2015 leaderboard\n[Leaderboard Link](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)\n\n| Method | D1-all (All) | D1-all (Noc) | Runtime (s) |\n|---|---|---|---|\n| PSMNet | 2.32 % | 2.14 % | 0.41 |\n| [iResNet-i2](https://arxiv.org/abs/1712.01039) | 2.44 % | 2.19 % | 0.12 |\n| [GC-Net](https://arxiv.org/abs/1703.04309) | 2.87 % | 2.61 % | 0.90 |\n| [MC-CNN](https://github.com/jzbontar/mc-cnn) | 3.89 % | 3.33 % | 67 |\n\n### Qualitative results\n#### Left image\n\u003cimg align=\"center\" src=\"http://www.cvlibs.net/datasets/kitti/results/efb9db97938e12a20b9c95ce593f633dd63a2744/image_0/000004_10.png\"\u003e\n\n#### Predicted disparity\n\u003cimg align=\"center\" 
src=\"http://www.cvlibs.net/datasets/kitti/results/efb9db97938e12a20b9c95ce593f633dd63a2744/result_disp_img_0/000004_10.png\"\u003e\n\n#### Error\n\u003cimg align=\"center\" src=\"http://www.cvlibs.net/datasets/kitti/results/efb9db97938e12a20b9c95ce593f633dd63a2744/errors_disp_img_0/000004_10.png\"\u003e\n\n### Visualization of Receptive Field\nWe visualize the receptive fields of two settings of PSMNet: the full setting and the baseline.\n\nFull setting: dilated conv, SPP, stacked hourglass\n\nBaseline: no dilated conv, no SPP, no stacked hourglass\n\nThe receptive fields were calculated for the pixel at the image center, indicated by the red cross.\n\n\u003cimg align=\"center\" src=\"https://user-images.githubusercontent.com/11732099/37876179-6d6dd97e-307b-11e8-803e-bcdbec29fb94.png\"\u003e\n\n## Contacts\nfollowwar@gmail.com\n\nAny discussions or concerns are welcome!\n","funding_links":[],"categories":["Supervised Stereo vision","Python","Topics"],"sub_categories":["Perception"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJiaRenChang%2FPSMNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJiaRenChang%2FPSMNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJiaRenChang%2FPSMNet/lists"}