# [SIGGRAPH 2020] Consistent Video Depth Estimation

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1i5_uVHWOJlh2adRFT5BuDhoRftq9Oosx#scrollTo=lNc6HHfHDfnE)

### [[Paper](https://arxiv.org/abs/2004.15021)] [[Project Website](https://roxanneluo.github.io/Consistent-Video-Depth-Estimation/)] [[Google Colab](https://colab.research.google.com/drive/1i5_uVHWOJlh2adRFT5BuDhoRftq9Oosx#scrollTo=lNc6HHfHDfnE)]

<p align='center'>
<img src="thumbnail.gif" width='100%'/>
</p>

We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video. We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we use a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation.
At test time, we fine-tune this network to satisfy the geometric constraints of a particular input video, while retaining its ability to synthesize plausible depth details in parts of the video that are less constrained. We show through quantitative validation that our method achieves higher accuracy and a higher degree of geometric consistency than previous monocular reconstruction methods. Visually, our results appear more stable. Our algorithm can handle challenging hand-held captured input videos with a moderate degree of dynamic motion. The improved quality of the reconstruction enables several applications, such as scene reconstruction and advanced video-based visual effects.
<br/>

**Consistent Video Depth Estimation**
<br/>
[Xuan Luo](https://roxanneluo.github.io),
[Jia-Bin Huang](https://filebox.ece.vt.edu/~jbhuang/),
[Richard Szeliski](http://szeliski.org/RichardSzeliski.htm),
[Kevin Matzen](https://www.linkedin.com/in/kevin-matzen-b3714414/), and
[Johannes Kopf](https://johanneskopf.de/)
<br/>
In SIGGRAPH 2020.

# Prerequisites
- Pull third-party packages.
  ```
  git submodule update --init --recursive
  ```
- Install Python packages.
  ```
  conda create -n consistent_depth python=3.6
  conda activate consistent_depth
  ./scripts/install.sh
  ```
- [FFmpeg](http://ffmpeg.org)
- Install COLMAP following https://colmap.github.io/install.html. Note that **[COLMAP >= 3.6](https://github.com/colmap/colmap/releases)** is required to exclude [extracting features](https://colmap.github.io/faq.html#mask-image-regions) on dynamic objects.
  If you are using Ubuntu, you can install COLMAP with [`./scripts/install_colmap_ubuntu.sh`](scripts/install_colmap_ubuntu.sh).

# Quick Start
You can run the following demo **without** installing **COLMAP**.
The demo takes 37 minutes when tested on one NVIDIA GeForce RTX 2080 GPU.
- Download the models and the demo video together with its precomputed COLMAP results.
  ```
  ./scripts/download_model.sh
  ./scripts/download_demo.sh results/ayush
  ```
- Run
  ```
  python main.py --video_file data/videos/ayush.mp4 --path results/ayush \
    --camera_params "1671.770118, 540, 960" --camera_model "SIMPLE_PINHOLE" \
    --make_video
  ```
  where `1671.770118, 540, 960` are the camera intrinsics (`f, cx, cy`) and `SIMPLE_PINHOLE` is the [camera model](https://colmap.github.io/cameras.html).
- You can inspect the test-time training process with
  ```
  tensorboard --logdir results/ayush/R_hierarchical2_mc/B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam/tensorboard/
  ```
- You can find your results as below.
  ```
  results/ayush/R_hierarchical2_mc
    videos/
      color_depth_mc_depth_colmap_dense_B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam.mp4    # comparison of disparity maps from Mannequin Challenge, COLMAP, and ours
    B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam/
      depth/                      # final disparity maps
      checkpoints/0020.pth        # final checkpoint
      eval/                       # disparity maps and losses after each epoch of training
  ```
  Expected output can be found [here](https://www.dropbox.com/sh/zsvmbc5iy2br8ol/AAAcEo5M9KYBSN7aiAuSPttka?dl=0).
  Your results may differ due to randomness in the test-time training process.

The demo runs everything, including flow estimation and test-time training, except the COLMAP part, for quick demonstration and ease of installation.
To also test the COLMAP part, delete `results/ayush/colmap_dense` and `results/ayush/depth_colmap_dense`,
and then run the Python command above again.

# Customized Runs
Please refer to [`params.py`](params.py) or run `python main.py --help` for the full list of parameters.
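The `--camera_params` string is an ordered, comma-separated list whose meaning depends on the chosen COLMAP camera model. As a minimal illustration for the two pinhole models used in this README (`parse_camera_params` is a hypothetical helper, not part of this repository):

```python
# Hypothetical sketch: interpret a --camera_params string according to
# the COLMAP camera model it accompanies. Not part of this repository.

def parse_camera_params(model: str, params: str) -> dict:
    values = [float(v) for v in params.split(",")]
    if model == "SIMPLE_PINHOLE":   # f, cx, cy (shared focal length)
        f, cx, cy = values
        return {"fx": f, "fy": f, "cx": cx, "cy": cy}
    if model == "PINHOLE":          # fx, fy, cx, cy
        fx, fy, cx, cy = values
        return {"fx": fx, "fy": fy, "cx": cx, "cy": cy}
    raise ValueError(f"unsupported camera model: {model}")

# The Quick Start intrinsics above: a single focal length plus principal point.
intr = parse_camera_params("SIMPLE_PINHOLE", "1671.770118, 540, 960")
```

Other COLMAP camera models (e.g., ones with distortion parameters) have their own parameter orders; see the COLMAP camera model documentation linked above.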
Here we demonstrate some examples of common usage of the system.

### Run on Your Own Videos
- Place your video file at `$video_file_path`.
- [Optional] Calibrate the camera using the [`PINHOLE` (fx, fy, cx, cy) or `SIMPLE_PINHOLE` (f, cx, cy) model](https://colmap.github.io/cameras.html).
Camera intrinsics calibration is optional but suggested for more accurate and faster camera registration.
We typically calibrate the camera by capturing a video of a textured plane with very slow camera motion while trying to let the target features
cover the full field of view, selecting non-blurry frames, and running **COLMAP** on these images.
- Run
    - Run without camera calibration.
      ```
      python main.py --video_file $video_file_path --path $output_path --make_video
      ```
    - Run with camera calibration. For instance, run with the `PINHOLE` model and `fx, fy, cx, cy = 1660.161322, 1600, 540, 960`:
      ```
      python main.py --video_file $video_file_path --path $output_path \
        --camera_model "PINHOLE" --camera_params "1660.161322, 1600, 540, 960" \
        --make_video
      ```
    - You can also specify the backend monocular depth estimation network with
      ```
      python main.py --video_file $video_file_path --path $output_path \
        --camera_model "PINHOLE" --camera_params "1660.161322, 1600, 540, 960" \
        --make_video --model_type "${model_type}"
      ```
      The supported model types are `mc` ([Mannequin Challenge by Zhang et al. 2019](https://github.com/google/mannequinchallenge)),
      `midas2` ([MiDaS by Ranftl et al. 2019](https://github.com/intel-isl/MiDaS)),
      and `monodepth2` ([Monodepth2 by Godard et al. 2019](https://github.com/nianticlabs/monodepth2)).

### Run with Precomputed Camera Poses
We rely on **COLMAP** for camera pose registration.
If you have precomputed camera poses instead,
you can provide them to the system in the folder `$path` as follows
(see [here](https://www.dropbox.com/sh/tdmhdesotk8ph4w/AAAV3wQodMMYjJ0NaJXwkWh1a?dl=0) for an example file structure of `$path`).
- Save your color images as [`color_full/frame_%06d.png`](https://www.dropbox.com/sh/5zsmtity0punwjp/AABN4WdU2H2PVgjUfy3Ehwura?dl=0).
- Create `frames.txt` of the following format (see an example [here](https://www.dropbox.com/s/1hmuvm4njledahx/frames.txt?dl=0)):
  ```
  number_of_frames
  width
  height
  frame_000000_timestamp_in_seconds
  frame_000001_timestamp_in_seconds
  ...
  ```
- Convert your camera poses to the COLMAP sparse reconstruction format following [this](https://colmap.github.io/format.html#text-format).
  Put your `images.txt`, `cameras.txt` and `points3D.txt` (or `.bin`) under [`colmap_dense/pose_init/`](https://www.dropbox.com/sh/4f5t0tlvvmay9a3/AABVO1zNCPf7OQn3yDqdAoO3a?dl=0).
  Note that the `POINTS2D` in `images.txt` and the `points3D.txt` can be empty.
- Run.
  ```
  python main.py --path $path --initialize_pose
  ```

### Mask out Dynamic Objects for Camera Pose Estimation
To get better poses for dynamic scenes, you can mask out dynamic objects when extracting features with **COLMAP**.
Note that **[COLMAP >= 3.6](https://github.com/colmap/colmap/releases)** is required to [extract features in masked regions](https://colmap.github.io/faq.html#mask-image-regions).
- Extract frames.
  ```
  python main.py --video_file $video_file_path --path $output_path --op extract_frames
  ```

- Run your favourite segmentation method (e.g., [Mask-RCNN](https://github.com/facebookresearch/detectron2))
on the images in `$output_path/color_full` to extract binary masks for dynamic objects (e.g., humans).
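A sketch of turning such segmentation output into COLMAP-style mask files (hypothetical helper names, not part of this repository; per the COLMAP FAQ, the mask for a frame is the frame's file name with an extra `.png` appended, and black pixels mark regions to ignore):

```python
import os

# Hypothetical helpers for preparing COLMAP feature-extraction masks.
# COLMAP looks for the mask of color_full/frame_000010.png at
# mask/frame_000010.png.png and skips features where the mask is black.

def colmap_mask_path(mask_dir: str, frame_file: str) -> str:
    # Append an extra ".png" to the color frame's file name.
    return os.path.join(mask_dir, frame_file + ".png")

def to_colmap_mask(dynamic_mask):
    # dynamic_mask: 2D nested list of booleans, True = dynamic object.
    # Dynamic pixels become black (0) so COLMAP ignores them;
    # static pixels become white (255).
    return [[0 if dyn else 255 for dyn in row] for row in dynamic_mask]

path = colmap_mask_path("results/ayush/mask", "frame_000010.png")
# -> results/ayush/mask/frame_000010.png.png (on POSIX paths)
```

Writing the resulting arrays out as grayscale PNGs (e.g., with Pillow or OpenCV) is left to whichever segmentation toolchain you use.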
No features will be extracted in regions where the mask image is black (pixel intensity value 0 in grayscale).
Following the [COLMAP documentation](https://colmap.github.io/faq.html#mask-image-regions),
save the mask of frame `$output_path/color_full/frame_000010.png`, for instance, at `$output_path/mask/frame_000010.png.png`.

- Run the rest of the pipeline.
  ```
  python main.py --path $output_path --mask_path $output_path/mask \
    --camera_model "${camera_model}" --camera_params "${camera_intrinsics}" \
    --make_video
  ```

# Result Folder Structure
The result folder has the following structure. Many files are saved only for debugging purposes.
```
frames.txt              # metadata: number of frames, image resolution, and timestamps for each frame
color_full/             # extracted frames in the original resolution
color_down/             # extracted frames in the resolution for disparity estimation
color_down_png/
color_flow/             # extracted frames in the resolution for flow estimation
flow_list.json          # indices of frame pairs to finetune the model with
flow/                   # optical flow
mask/                   # masks of consistent flow estimation between frame pairs
vis_flow/               # optical flow visualization. Green regions contain inconsistent flow.
vis_flow_warped/        # visualizing flow accuracy by warping one frame to another using the estimated flow,
                        # e.g., frame_000000_000032_warped.png warps frame_000032 to frame_000000.
colmap_dense/           # COLMAP results
    metadata.npz        # camera intrinsics and extrinsics converted from the COLMAP sparse reconstruction
    sparse/             # COLMAP sparse reconstruction
    dense/              # COLMAP dense reconstruction
depth_colmap_dense/     # COLMAP dense depth maps converted to disparity maps in .raw format
depth_${model_type}/    # initial disparity estimation using the original monocular depth model before test-time training
R_hierarchical2_${model_type}/
    flow_list_0.20.json                 # indices of frame pairs passing the overlap ratio test with threshold 0.2. Same content as ../flow_list.json.
    metadata_scaled.npz                 # camera intrinsics and extrinsics after scale calibration; the camera parameters used in the test-time training process
    scales.csv                          # frame indices and the corresponding scales between the initial monocular disparity estimation and the COLMAP dense disparity maps
    depth_scaled_by_colmap_dense/       # monocular disparity estimation scaled to match the COLMAP disparity results
    vis_calibration_dense/              # for debugging scale calibration;
                                        # frame_000000_warped_to_000029.png warps frame_000000 to frame_000029 using the scaled camera translations and the disparity maps from the initial monocular depth estimation
    videos/                             # video visualization of results
    B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam/
        checkpoints/                    # checkpoint after each epoch
        depth/                          # final disparity map results after test-time training finishes
        eval/                           # intermediate losses and disparity maps after each epoch
        tensorboard/                    # tensorboard log for the test-time training process
```

# Citation
If you find our code useful, please consider citing our paper:
```
@article{Luo-VideoDepth-2020,
  author    = {Luo, Xuan and Huang, Jia{-}Bin and Szeliski, Richard and Matzen, Kevin and Kopf, Johannes},
  title     = {Consistent Video Depth Estimation},
  journal   = {ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH)},
  publisher = {ACM},
  volume    = {39},
  number    = {4},
  year      = {2020}
}
```

# License
This work is licensed under the MIT License. See [LICENSE](LICENSE) for details.

# Acknowledgments
We would like to thank Patricio Gonzales Vivo, Dionisio Blanco, and Ocean Quigley for creating the artistic effects in the accompanying video.
We thank True Price for his practical and insightful advice on reconstruction and Ayush Saraf for his suggestions in engineering.