{"id":13738515,"url":"https://github.com/FangjinhuaWang/PatchmatchNet","last_synced_at":"2025-05-08T16:34:27.591Z","repository":{"id":37337676,"uuid":"307415299","full_name":"FangjinhuaWang/PatchmatchNet","owner":"FangjinhuaWang","description":"Official code of PatchmatchNet (CVPR 2021 Oral)","archived":false,"fork":false,"pushed_at":"2022-05-28T05:42:22.000Z","size":381075,"stargazers_count":495,"open_issues_count":9,"forks_count":70,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-08-04T03:12:45.135Z","etag":null,"topics":["3d-reconstruction","deep-learning","multi-view-stereo"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FangjinhuaWang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-26T15:19:41.000Z","updated_at":"2024-07-27T16:24:31.000Z","dependencies_parsed_at":"2022-07-12T12:31:24.435Z","dependency_job_id":null,"html_url":"https://github.com/FangjinhuaWang/PatchmatchNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FangjinhuaWang%2FPatchmatchNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FangjinhuaWang%2FPatchmatchNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FangjinhuaWang%2FPatchmatchNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FangjinhuaWang%2FPatchmatchNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FangjinhuaWang","download_url":"https://codeload.gith
ub.com/FangjinhuaWang/PatchmatchNet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224746839,"owners_count":17363126,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-reconstruction","deep-learning","multi-view-stereo"],"created_at":"2024-08-03T03:02:24.784Z","updated_at":"2024-11-15T07:31:18.186Z","avatar_url":"https://github.com/FangjinhuaWang.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# PatchmatchNet (CVPR2021 Oral)\nofficial source code of paper 'PatchmatchNet: Learned Multi-View Patchmatch Stereo'\n![](imgs/structure_teaser.jpg)\n\n## Updates\n- 13.12.2021: New unified format for training and evaluation datasets, support for arbitrary image sizes\n  and multi-camera setups, and new names for script parameters.\n- 27.09.2021: The code now allows for Torchscript export and includes a pre-trained TorchScript module.\n\n## Introduction\nPatchmatchNet is a novel cascade formulation of learning-based Patchmatch which aims at decreasing memory consumption and\ncomputation time for high-resolution multi-view stereo. 
If you find this project useful for your research, please cite:\n```\n@misc{wang2020patchmatchnet,\n      title={PatchmatchNet: Learned Multi-View Patchmatch Stereo}, \n      author={Fangjinhua Wang and Silvano Galliani and Christoph Vogel and Pablo Speciale and Marc Pollefeys},\n      journal={CVPR},\n      year={2021}\n}\n```\n\n## Installation\n### Requirements\n* python 3.8\n* CUDA \u003e= 10.1\n\n```\npip install -r requirements.txt\n```\n\n## Reproducing Results\n* Download our pre-processed dataset:\n  [DTU's evaluation set](https://drive.google.com/file/d/1jN8yEQX0a-S22XwUjISM8xSJD39pFLL_/view?usp=sharing),\n  [Tanks \u0026 Temples](https://drive.google.com/file/d/1gAfmeoGNEFl9dL4QcAU4kF0BAyTd-r8Z/view?usp=sharing) and\n  [ETH3D benchmark](https://polybox.ethz.ch/index.php/s/pmTGWobErOnhEg0). Each dataset is already organized as follows:\n```\nroot_directory\n├──scan1 (scene_name1)\n├──scan2 (scene_name2) \n      ├── images                 \n      │   ├── 00000000.jpg       \n      │   ├── 00000001.jpg       \n      │   └── ...                \n      ├── cams                   \n      │   ├── 00000000_cam.txt   \n      │   ├── 00000001_cam.txt   \n      │   └── ...                \n      └── pair.txt  \n```\nNote: \n- The subfolders for Tanks \u0026 Temples and ETH3D will not be named `scanN` but the lists included under\n  `./lists/eth3d` and `./lists/tanks` will have the correct naming conventions.\n- If the folders for images and cameras, and the pair file don't follow the standard naming conventions you can modify\n  the settings of `MVSDataset` in `datasets/mvs.py` to specify the custom `image_folder`, `cam_folder`, and `pair_path`\n- The `MVSDataset` is configured by default for JPEG images. 
If you're using a different format (e.g., PNG) you can change\n  the `image_extension` parameter of `MVSDataset` accordingly.\n\nCamera file `cam.txt` stores the camera parameters, which includes extrinsic, intrinsic, minimum depth and maximum depth:\n```\nextrinsic\nE00 E01 E02 E03\nE10 E11 E12 E13\nE20 E21 E22 E23\nE30 E31 E32 E33\n\nintrinsic\nK00 K01 K02\nK10 K11 K12\nK20 K21 K22\n\nDEPTH_MIN DEPTH_MAX \n```\n\n`pair.txt ` stores the view selection result. For each reference image, N (10 or more) best source views are stored in the file:\n```\nTOTAL_IMAGE_NUM\nIMAGE_ID0                       # index of reference image 0 \n10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 0 \nIMAGE_ID1                       # index of reference image 1\n10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 1 \n...\n``` \n\n* In `eval.sh`, set `DTU_TESTING`, `ETH3D_TESTING` or `TANK_TESTING` as the root directory of corresponding dataset\n  and uncomment the evaluation command for corresponding dataset (default is to evaluate on DTU's evaluation set).\n  If you want to change the output location (default is same as input one), modify the `--output_folder` parameter.\n  For Tanks the `--scan_list` can be intermediate or advanced and for ETH3D it can be test or train.\n* `CKPT_FILE` is the checkpoint file (our pretrained model is `./checkpoints/params_000007.ckpt`), change it if you want\n  to use your own model. If you want to use the model from the TorchScript module instead, you can specify the checkpoint\n  file as `./checkpoints/module_000007.pt` and set the option `--input_type module`.\n* Test on GPU by running `sh eval.sh`. The code includes depth map estimation and depth fusion. The outputs are the\n  point clouds in `ply` format. \n* For quantitative evaluation on DTU dataset, download [SampleSet](http://roboimagedata.compute.dtu.dk/?page_id=36) and\n  [Points](http://roboimagedata.compute.dtu.dk/?page_id=36). 
Unzip them and place the `Points` folder in `SampleSet/MVS Data/`.\n  The structure looks like:\n```\nSampleSet\n├──MVS Data\n      └──Points\n```\n\nIn `evaluations/dtu/BaseEvalMain_web.m`, set `dataPath` as the path to `SampleSet/MVS Data/`, `plyPath` as the directory that\nstores the reconstructed point clouds and `resultsPath` as the directory in which to store the evaluation results. Then run\n`evaluations/dtu/BaseEvalMain_web.m` in MATLAB.\n\nThe results look like:\n\n| Acc. (mm) | Comp. (mm) | Overall (mm) |\n|-----------|------------|--------------|\n| 0.427     | 0.277      | 0.352        |\n\n* For detailed quantitative results on Tanks \u0026 Temples and ETH3D, please check the leaderboards\n  ([Tanks \u0026 Temples](https://www.tanksandtemples.org/details/1170/), [ETH3D](https://www.eth3d.net/result_details?id=216))\n\n## Evaluation on Custom Dataset\n* For evaluation, we support preparing the custom dataset from COLMAP's results. The script `colmap_input.py`\n  (modified based on the script from [MVSNet](https://github.com/YoYo000/MVSNet)) converts COLMAP's sparse reconstruction\n  results into the same format as the datasets that we provide. After reconstruction, COLMAP will generate a folder\n  `COLMAP/dense/`, which contains `COLMAP/dense/images/` and `COLMAP/dense/sparse`. Then run the converter as follows:\n```\npython colmap_input.py --input_folder COLMAP/dense/ \n```\n* The default output location is the same as the input one. If you want to change that, set the `--output_folder` parameter.\n* By default, the converter finds all possible related images for each source image. 
If you want to constrain\n  the max number of related images set the `--num_src_images` parameter.\n* In `eval.sh`, set `CUSTOM_TESTING` as the root directory of the dataset, set `--output_folder` as the directory to store\n  the reconstructed point clouds (default is same as input directory), set `--image_max_dim` to an appropriate size (this\n  is determined by the available GPU memory and the desired processing speed) or use the native size by removing the\n  parameter, and uncomment the evaluation command. Test on GPU by running `sh eval.sh`.\n\n## Training\nDownload pre-processed [DTU's training set](https://polybox.ethz.ch/index.php/s/ugDdJQIuZTk4S35). The dataset is already\norganized as follows:\n```\nroot_directory\n├── Cameras_1\n│    ├── train\n│    │    ├── 00000000_cam.txt\n│    │    ├── 00000000_cam.txt\n│    │    └── ...\n│    └── pair.txt\n├── Depths_raw\n│    ├── scan1\n│    │    ├── depth_map_0000.pfm\n│    │    ├── depth_visual_0000.png\n│    │    ├── depth_map_0001.pfm\n│    │    ├── depth_visual_0001.png\n│    │    └── ...\n│    ├── scan2\n│    └── ...\n└── Rectified\n     ├── scan1_train\n     │    ├── rect_001_0_r5000.png\n     │    ├── rect_001_1_r5000.png\n     │    ├── ...\n     │    ├── rect_001_6_r5000.png\n     │    ├── rect_002_0_r5000.png\n     │    ├── rect_002_1_r5000.png\n     │    ├── ...\n     │    ├── rect_002_6_r5000.png\n     │    └── ...\n     ├── scan2_train\n     └── ...\n```\nTo use this dataset directly look into the [Legacy Training](#legacy-training) section below. 
For the current version of training the\ndataset needs to be converted to a format compatible with `MVSDataset` in `./datasets/mvs.py` using the script\n`convert_dtu_dataset.py` as follows:\n```\npython convert_dtu_dataset.py --input_folder \u003coriginal_dataset\u003e --output_folder \u003cconverted_dataset\u003e --scan_list ./lists/dtu/all.txt\n```\nThe converted dataset will now be in a format similar to the evaluation datasets:\n```\nroot_directory\n├── scan1 (scene_name1)\n├── scan2 (scene_name2) \n│     ├── cams (camera parameters)\n│     │   ├── 00000000_cam.txt   \n│     │   ├── 00000001_cam.txt   \n│     │   └── ...                \n│     ├── depth_gt (ground truth depth maps)\n│     │   ├── 00000000.pfm   \n│     │   ├── 00000001.pfm   \n│     │   └── ...                \n│     ├── images (images at 7 light indexes) \n│     │   ├── 0 (light index 0)\n│     │   │   ├── 00000000.jpg       \n│     │   │   ├── 00000001.jpg\n│     │   │   └── ...\n│     │   ├── 1 (light index 1)\n│     │   └── ...                \n│     ├── masks (depth map masks) \n│     │   ├── 00000000.png       \n│     │   ├── 00000001.png       \n│     │   └── ...                
\n│     └── pair.txt\n└── ...\n```\n* In `train.sh`, set `MVS_TRAINING` as the root directory of the converted dataset; set `--output_path` as the directory\n  to store the checkpoints.\n* Train the model by running `sh train.sh`.\n* The output consists of one checkpoint (model parameters) and one TorchScript module per epoch named as\n  `params_\u003cepoch_id\u003e.ckpt` and `module_\u003cepoch_id\u003e.pt` respectively.\n\n### Legacy Training\nTo train directly on the [original DTU dataset](https://polybox.ethz.ch/index.php/s/ugDdJQIuZTk4S35) the legacy training\nscript `train_dtu.py` (using the legacy `MVSDataset` from `datasets/dtu_yao.py`) needs to be called from the `train.sh`\nscript.\n* In `train.sh`, set `MVS_TRAINING` as the root directory of the original dataset; set `--logdir` as the directory to\n  store the checkpoints. \n* Uncomment the appropriate section for legacy training and comment out the other entry.\n* Train the model by running `sh train.sh`.\n\n### Note:\n`--patchmatch_iteration` represents the number of Patchmatch iterations at each stage (e.g., the default value `1,2,2`\nmeans 1 iteration on stage 1, 2 iterations on stage 2 and 2 iterations on stage 3). `--propagate_neighbors` represents the\nnumber of neighbors for adaptive propagation (e.g., the default value `0,8,16` means no propagation for Patchmatch on\nstage 1, using 8 neighbors for propagation on stage 2 and using 16 neighbors for propagation on stage 3). As explained in\nour paper, we do not include adaptive propagation for the last iteration of Patchmatch on stage 1 due to the requirement\nof photometric consistency filtering. So in our default case (also for our pretrained model), we set the number of propagation\nneighbors on stage 1 as `0` since the number of iterations on stage 1 is `1`. 
If you want to train the model with more\niterations on stage 1, change the corresponding number in `--propagate_neighbors` to include adaptive propagation for\nPatchmatch except for the last iteration.\n\n## Acknowledgements\nThis project is done in collaboration with \"Microsoft Mixed Reality \u0026 AI Zurich Lab\".\n\nThanks to Yao Yao for open-sourcing his excellent work [MVSNet](https://github.com/YoYo000/MVSNet). Thanks to Xiaoyang Guo\nfor open-sourcing his PyTorch implementation of MVSNet [MVSNet-pytorch](https://github.com/xy-guo/MVSNet_pytorch).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFangjinhuaWang%2FPatchmatchNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFangjinhuaWang%2FPatchmatchNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFangjinhuaWang%2FPatchmatchNet/lists"}