{"id":15033302,"url":"https://github.com/compphoto/boostingmonoculardepth","last_synced_at":"2025-05-15T18:11:11.586Z","repository":{"id":40625952,"uuid":"372716052","full_name":"compphoto/BoostingMonocularDepth","owner":"compphoto","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-06T21:34:51.000Z","size":56840,"stargazers_count":1499,"open_issues_count":0,"forks_count":207,"subscribers_count":28,"default_branch":"main","last_synced_at":"2025-04-07T23:08:43.467Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/compphoto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-01T05:52:46.000Z","updated_at":"2025-04-02T09:02:09.000Z","dependencies_parsed_at":"2024-08-07T00:27:57.690Z","dependency_job_id":null,"html_url":"https://github.com/compphoto/BoostingMonocularDepth","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/compphoto%2FBoostingMonocularDepth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/compphoto%2FBoostingMonocularDepth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/compphoto%2FBoostingMonocularDepth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/compphoto%2FBoostingMonocularDepth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/compphoto","download_url":"https://c
odeload.github.com/compphoto/BoostingMonocularDepth/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247744335,"owners_count":20988783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-24T20:20:39.634Z","updated_at":"2025-04-07T23:09:20.838Z","avatar_url":"https://github.com/compphoto.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"### **(NEW!)** We released our recent work on SI depth estimation, published at SIGGRAPH 2024. Check out the [project webpage](https://yaksoy.github.io/sidepth/)!\nIn our recent publication \"Scale-Invariant Monocular Depth Estimation via SSI Depth\" we estimate Scale-Invariant (SI) depth. SI depth provides a geometric depth representation that can be projected to 3D without distortions.\n\nWe also introduce a new scale- and shift-invariant (SSI) depth model that, although not geometric, generates a significant amount of detail compared to very recent state-of-the-art methods.\n\nCheck out our new [project](https://yaksoy.github.io/sidepth/) and its [implementation](https://github.com/compphoto/sidepth).\n\n\n### My thesis provides a thorough explanation of this work. Check out my [video presentation](https://youtu.be/DZ0ft1l50KY)!\n\nI recently graduated from the Computing Science Master's Program at Simon Fraser University. My thesis, \"Boosting Monocular Depth Estimation to High Resolution\", includes **a more detailed explanation of our paper**. Check out the thesis webpage [here](http://yaksoy.github.io/bmd-msc/). 
\n\n### [Boost Your Own depth](https://github.com/compphoto/BoostYourOwnDepth) with our new repo\n\nWe present a stand-alone implementation of our [Merging Operator](#method). This new repo allows using any pair of monocular depth estimations in our double estimation. This includes using separate networks for the base and high-res estimations, using networks not supported by this repo (such as [MiDaS-v3](https://github.com/isl-org/MiDaS)), or using manually edited depth maps for artistic purposes. It is also useful for scientists developing CNN-based monocular depth estimation (MDE) networks as a quick way to apply double estimation to their own network. For more details, please take a look [here](https://github.com/compphoto/BoostYourOwnDepth).\n\n| Input | Original result | After manual editing of base|\n|----|------------|------------|\n|![patchselection](./figures/lunch_rgb.jpg)|![patchselection](./figures/lunch_orig.png)|![patchselection](./figures/lunch_edited.png)|\n\n\n### [LeReS][2] is now supported within our method.\n\nHere is a visualization of the improvement gained by using [LeReS][2] instead of [MiDaS][1].\n\n|RGB | Our method using [MiDaS][1] | Our method using [LeReS][2] (**NEW**!) |\n|----|------------|-----------|\n|![patchselection](./inputs/sample2.jpg)|![Patchexpand](./figures/sample2_midas.png)|![Patchexpand](./figures/sample2_leres.jpg)|\n\n\n\n### Maximum resolution can be set for a faster run time.\n\nUse **\\--max_res** as an input argument to run.py, together with **--Final**, to set a limit on the resolution of the results our method generates.\n\nThis parameter is a trade-off between run-time and resolution: it reduces the run-time when you only need a result up to a *specific number of megapixels*.\n\nThe parameter limits the larger dimension of the result, in pixels, while keeping the aspect ratio. 
For example, to limit the larger dimension of the results to 2000 pixels, use:\n\n```sh\npython run.py --Final --max_res 2000 --data_dir PATH_TO_INPUT --output_dir PATH_TO_RESULT --depthNet 0\n```\n\n\n### Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging \n\n\u003e S. Mahdi H. Miangoleh\\*, Sebastian Dille\\*, Long Mai, Sylvain Paris, Yağız Aksoy.\n\u003e [Main pdf](http://yaksoy.github.io/papers/CVPR21-HighResDepth.pdf),\n\u003e [Supplementary pdf](http://yaksoy.github.io/papers/CVPR21-HighResDepth-Supp.pdf),\n\u003e [Project Page](http://yaksoy.github.io/highresdepth/).\n\n[![video](./figures/video_thumbnail.jpg)](https://www.youtube.com/watch?v=lDeI17pHlqo)\n\nWe propose a method that can generate highly detailed high-resolution depth estimations from a single image. Our method improves the output of a pre-trained network by merging its estimations at different resolutions and for different patches into a high-resolution estimate. \n\nTry our model on Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/compphoto/BoostingMonocularDepth/blob/main/Boostmonoculardepth.ipynb)\n\n\n\n### Change log:\n\n* (**NEW!**) Now you can set the maximum resolution of the results to reduce runtime.  \n* **(NEW!)** An implementation of our method using [LeReS][2] is now available. [July 2021]\n* A quick overview of the method is now presented in README.md. [July 2021]\n* A [Google Colaboratory notebook](./Boostmonoculardepth.ipynb) is now available.  [June 2021]   [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/compphoto/BoostingMonocularDepth/blob/main/Boostmonoculardepth.ipynb)\n* Merge net training dataset generation [instructions](./dataset_prepare/mergenet_dataset_prepare.md) are now available. [June 2021] \n* Bug fix. 
[June 2021, July 2021]\n\n\n\n## Method\n\nWe use **existing** *monocular depth estimation* networks to generate highly detailed estimations **without re-training**.\n\nWe first obtain several estimations at different resolutions. We then merge them into a structurally consistent high-resolution depth map and apply local boosting to enhance the details and generate our final result.\n\n![Overview](./figures/overview.png)\n\n\n\n##### Observations\n\nMonocular depth estimation uses contextual cues such as occlusions or the relative sizes of objects to estimate the structure of the scene.\n\nWe use a pre-trained [MiDaS-v2][1] here, but our analysis with the [SGR][3] network also supports our claims.\n\nWhen we feed the image to the network at different resolutions, some interesting patterns arise.\nAt lower resolutions, many details in the scene are missing, such as the birds in this example. At high resolutions, however, the overall structure becomes inconsistent, and the flat board gets significantly less flat. The advantage is that the network is able to generate high-frequency details.\nThis shows that there is a trade-off between structural consistency and high-frequency details with respect to input resolution.\n\n![Observations](./figures/observation.png)\n\n\n\n##### Explanations\n\nWe explain this behavior through two properties of convolutional neural networks: limited **receptive field** size and **network capacity**.\nThe lack of high-frequency details at low resolutions is due to limited network capacity: a small network that generates the structure of a complex scene cannot also generate fine details.\n\nThe loss of structure at high resolutions comes from a limited receptive field size. The receptive field is the region around a pixel that contributes to the estimation at that pixel. It is set by the network configuration and training resolution, so relative to the image content it effectively gets smaller as the input resolution increases. 
At a low resolution, every pixel can see the edges of the board, so the network judges that this is a flat wall. At a high resolution, however, some pixels do not receive any contextual information, which results in large structural inconsistencies.\n\n![Explanations](./figures/explanation.png)\n\n##### Best resolution search \n\nFor any given image, we determine the highest resolution that still yields a consistent structure by making sure that every pixel has contextual information. For this, we need the distribution of contextual cues in the image, which we approximate with a simple edge map. \n\nWe call the resolution at which every pixel is at most half a receptive field size away from a context edge R_0.\nIncreasing the resolution further generates more details, but structural inconsistencies start to arise. We call the resolution at which 20% of the pixels do not receive any context R_20.\n**Note that R_0 and R_20 depend on the image content!**\n\n![Resolutionsearch](./figures/ressearch.png)\n\n##### Double Estimation\n\nWe can still go beyond R_0 by merging the high-frequency details of the R_20 estimation onto the structure of the base estimation. We call this **Double Estimation**.\nWe train an image-to-image translation network to merge the low-resolution depth range of the base with the high-resolution details of R_20, without inheriting the structural inconsistencies of the high-res input. In fact, the network is so robust against low-frequency artifacts that we can use R_20, which lies beyond R_0, as our high-resolution input and thus generate more details.\n\n![merge](./figures/merge.png)\n\n##### Local Boosting\n\nNote that R_20 is *bounded by the smoothest regions* in the image, while some image patches could support a higher resolution.\nWe choose candidate patches by tiling the image and discarding all patches without useful details (step 1). 
The remaining patches are expanded until their edge density matches that of the whole image (step 2). Finally, we merge a double estimation for each patch onto our R_20 result to generate our final result (step 3).\n\n\n|Step 1: Tile and discard | Step 2: Expand | Step 3: Merge|\n|-------------------------|----------------|--------------|\n|![patchselection](./figures/patchselection.gif)|![Patchexpand](./figures/patchexpand.gif)|![Patchexpand](./figures/patchmerge.gif)|\n\n## Setup\n\nWe provide implementations of our method using [MiDaS-v2][1], [LeReS][2] and [SGRnet][3] as the base network. Note that [MiDaS-v2][1] and [SGRnet][3] estimate inverse depth, while [LeReS][2] estimates depth. \n\n### Environments\nOur merge network (mergenet) was trained using torch 0.4.1 and Python 3.7 and has been tested with torch\u003c=1.8.\n\nDownload our mergenet model weights from [here](https://sfu.ca/~yagiz/CVPR21/latest_net_G.pth) and put them in \n\u003e ./pix2pix/checkpoints/mergemodel/latest_net_G.pth\n\nTo use [MiDaS-v2][1] or [LeReS][2] as the base, install the dependencies as follows:\n```sh\nconda install pytorch torchvision opencv cudatoolkit=10.2 -c pytorch\nconda install matplotlib\nconda install scipy\nconda install scikit-image\n```\nFor MiDaS-v2, download the model weights from [MiDaS-v2][1] and put them in \n\u003e ./midas/model.pt\n\n```sh\n# activate the environment first\npython run.py --Final --data_dir PATH_TO_INPUT --output_dir PATH_TO_RESULT --depthNet 0\n```\n\nFor LeReS, download the model weights from [LeReS][2] (ResNeXt-101) and put them in the repository root:\n\u003e ./res101.pth\n\n**(Feb 2024):** The model weight link on the LeReS website has expired. You can download the weights from this [HuggingFace mirror](https://huggingface.co/lllyasviel/Annotators/resolve/850be791e8f704b2fa2e55ec9cc33a6ae3e28832/res101.pth) instead.   
\n\n```sh\n# activate the environment first\npython run.py --Final --data_dir PATH_TO_INPUT --output_dir PATH_TO_RESULT --depthNet 2\n```\n\nTo use [SGRnet][3] as the base, install the dependencies as follows:\n```sh\nconda install pytorch=0.4.1 cuda92 -c pytorch\nconda install torchvision\nconda install matplotlib\nconda install scikit-image\npip install opencv-python\n```\nFollow the official [SGRnet][3] repository to compile the syncbn module in ./structuredrl/models/syncbn.\nDownload the model weights from [SGRnet][3] and put them in \n\u003e ./structuredrl/model.pth.tar\n\n```sh\n# activate the environment first\npython run.py --Final --data_dir PATH_TO_INPUT --output_dir PATH_TO_RESULT --depthNet 1\n```\n\nDifferent input arguments can be used to generate the R_0 and R_20 results discussed in the paper. \n\n```sh\n# DEPTH_NET: 0 = MiDaS-v2, 1 = SGRnet, 2 = LeReS\npython run.py --R0 --data_dir PATH_TO_INPUT --output_dir PATH_TO_RESULT --depthNet DEPTH_NET\npython run.py --R20 --data_dir PATH_TO_INPUT --output_dir PATH_TO_RESULT --depthNet DEPTH_NET\n```\n\nTo colorize the results with the OpenCV *INFERNO* colormap, use **--colorize_results** as in the example below:\n\n```sh\npython run.py --colorize_results --Final --data_dir PATH_TO_INPUT --output_dir PATH_TO_RESULT --depthNet DEPTH_NET\n```\n\n### Evaluation\nFill in the required variables in the following MATLAB file and run it:\n\u003e./evaluation/evaluatedataset.m\n\n* **estimation_path** : path to the estimated disparity maps\n* **gt_depth_path** : path to the ground-truth depth/disparity maps\n* **dataset_disp_gttype** : (true) if the ground truth is disparity and (false) if it is depth.\n* **evaluation_matfile_save_dir** : directory in which to save the evaluation results as a .mat file. \n* **superpixel_scale** : scale parameter for running the superpixels on a scaled version of the ground-truth images to accelerate the evaluation. 
Use 1 for small ground-truth images.\n\n\n### Training\n\nFollow the [dataset preparation instructions](./dataset_prepare/mergenet_dataset_prepare.md) to download and prepare the training dataset. Then train and test the merge network with:\n\n```sh\npython ./pix2pix/train.py --dataroot DATASETDIR --name mergemodeltrain --model pix2pix4depth --no_flip --no_dropout\n```\n```sh\npython ./pix2pix/test.py --dataroot DATASETDIR --name mergemodeleval --model pix2pix4depth --no_flip --no_dropout\n```\n\n\n## Citation\n\nThis implementation is provided for academic use only. Please cite our paper if you use this code or any of the models.\n```\n@INPROCEEDINGS{Miangoleh2021Boosting,\nauthor={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\\u{g}{\\i}z Aksoy},\ntitle={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},\nbooktitle={Proc. CVPR},\nyear={2021},\n}\n```\n\n## Credits\n\nThe \"Merge model\" code skeleton (./pix2pix folder) was adapted from the [pytorch-CycleGAN-and-pix2pix][4] repository. \n\nFor MiDaS, LeReS and SGR inference, we used the scripts and models from [MiDaS-v2][1], [LeReS][2] and [SGRnet][3], respectively (./midas, ./lib and ./structuredrl folders). \n\nThanks to [k-washi](https://github.com/k-washi) for providing us with a Google Colaboratory notebook implementation.\n\n[1]: https://github.com/intel-isl/MiDaS/tree/v2\n[2]: https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS\n[3]: https://github.com/KexianHust/Structure-Guided-Ranking-Loss\n[4]: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompphoto%2Fboostingmonoculardepth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcompphoto%2Fboostingmonoculardepth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompphoto%2Fboostingmonoculardepth/lists"}