{"id":13869840,"url":"https://github.com/nianticlabs/manydepth","last_synced_at":"2025-04-04T21:08:56.424Z","repository":{"id":43056155,"uuid":"351760475","full_name":"nianticlabs/manydepth","owner":"nianticlabs","description":"[CVPR 2021] Self-supervised depth estimation from short sequences","archived":false,"fork":false,"pushed_at":"2023-08-09T11:04:28.000Z","size":9947,"stargazers_count":640,"open_issues_count":36,"forks_count":85,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-03-28T20:08:45.413Z","etag":null,"topics":["cityscapes","cost-volumes","cvpr","cvpr2021","depth-estimation","depths","estimating-depths","kitti","monodepth","pytorch","self-supervised","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nianticlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-03-26T11:33:03.000Z","updated_at":"2025-03-24T09:09:21.000Z","dependencies_parsed_at":"2024-01-16T07:23:29.783Z","dependency_job_id":"8e49ec62-913b-4217-a929-84dbd42bc7cc","html_url":"https://github.com/nianticlabs/manydepth","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmanydepth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmanydepth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmanydepth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmanydepth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nianticlabs","download_url":"https://codeload.github.com/nianticlabs/manydepth/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247249527,"owners_count":20908212,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cityscapes","cost-volumes","cvpr","cvpr2021","depth-estimation","depths","estimating-depths","kitti","monodepth","pytorch","self-supervised","self-supervised-learning"],"created_at":"2024-08-05T20:01:19.248Z","updated_at":"2025-04-04T21:08:56.402Z","avatar_url":"https://github.com/nianticlabs.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth\n\n[Jamie Watson](https://scholar.google.com/citations?user=5pC7fw8AAAAJ\u0026hl=en),\n[Oisin Mac Aodha](https://homepages.inf.ed.ac.uk/omacaod/),\n[Victor Prisacariu](https://www.robots.ox.ac.uk/~victor/),\n[Gabriel J. Brostow](http://www0.cs.ucl.ac.uk/staff/g.brostow/) and\n[Michael Firman](http://www.michaelfirman.co.uk) – **CVPR 2021**\n\n[[Link to paper]](https://arxiv.org/abs/2104.14540)\n\nWe introduce ***ManyDepth***, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available.\n\n* ✅ **Self-supervised**: We train from monocular video only. 
No depths or poses are needed at training or test time.\n* ✅ Good depths from single frames; even better depths from **short sequences**.\n* ✅ **Efficient**: Only one forward pass at test time. No test-time optimization needed.\n* ✅ **State-of-the-art** self-supervised monocular-trained depth estimation on KITTI and Cityscapes.\n\n\u003cp align=\"center\"\u003e\n  \u003ca\nhref=\"https://storage.googleapis.com/niantic-lon-static/research/manydepth/manydepth_cvpr_cc.mp4\"\u003e\n  \u003cimg src=\"assets/video_thumbnail.png\" alt=\"5-minute CVPR presentation video link\" width=\"400\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n## Overview\n\nCost volumes are commonly used for estimating depths from multiple input views:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/cost_volume.jpg\" alt=\"Cost volume used for aggregating sequences of frames\" width=\"700\" /\u003e\n\u003c/p\u003e\n\nHowever, cost volumes do not easily work with self-supervised training.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/baseline.gif\" alt=\"Baseline: Depth from cost volume input without our contributions\" width=\"700\" /\u003e\n\u003c/p\u003e\n\nIn our paper, we:\n\n* Introduce an adaptive cost volume to deal with unknown scene scales\n* Fix problems with moving objects\n* Introduce augmentations to deal with static cameras and start-of-sequence frames\n\nThese contributions enable cost volumes to work with self-supervised training:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/ours.gif\" alt=\"ManyDepth: Depth from cost volume input with our contributions\" width=\"700\" /\u003e\n\u003c/p\u003e\n\nWith our contributions, short test-time sequences give better predictions than methods that predict depth from just a single frame.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/manydepth_vs_monodepth2.jpg\" alt=\"ManyDepth vs Monodepth2 depths and error maps\" width=\"700\" /\u003e\n\u003c/p\u003e\n\n## ✏️ 📄 Citation\n\nIf you find 
our work useful or interesting, please cite our paper:\n\n```bibtex\n@inproceedings{watson2021temporal,\n    author = {Jamie Watson and\n              Oisin Mac Aodha and\n              Victor Prisacariu and\n              Gabriel Brostow and\n              Michael Firman},\n    title = {{The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth}},\n    booktitle = {Computer Vision and Pattern Recognition (CVPR)},\n    year = {2021}\n}\n```\n\n## 📈 Results\n\nOur **ManyDepth** method outperforms all previous methods in all subsections across most metrics, whether or not the baselines use multiple frames at test time.\nSee our paper for full details.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/results_table.png\" alt=\"KITTI results table\" width=\"700\" /\u003e\n\u003c/p\u003e\n\n## 👀 Reproducing Paper Results\n\nTo recreate the results from our paper, run:\n\n```bash\nCUDA_VISIBLE_DEVICES=\u003cyour_desired_GPU\u003e \\\npython -m manydepth.train \\\n    --data_path \u003cyour_KITTI_path\u003e \\\n    --log_dir \u003cyour_save_path\u003e \\\n    --model_name \u003cyour_model_name\u003e\n```\n\nDepending on the size of your GPU, you may need to set `--batch_size` to a value lower than 12. 
Additionally, you can train a high-resolution model by adding `--height 320 --width 1024`.\n\nFor instructions on downloading the KITTI dataset, see [Monodepth2](https://github.com/nianticlabs/monodepth2).\n\nTo train a Cityscapes model, run:\n\n```bash\nCUDA_VISIBLE_DEVICES=\u003cyour_desired_GPU\u003e \\\npython -m manydepth.train \\\n    --data_path \u003cyour_preprocessed_cityscapes_path\u003e \\\n    --log_dir \u003cyour_save_path\u003e \\\n    --model_name \u003cyour_model_name\u003e \\\n    --dataset cityscapes_preprocessed \\\n    --split cityscapes_preprocessed \\\n    --freeze_teacher_epoch 5 \\\n    --height 192 --width 512\n```\n\nNote the `--freeze_teacher_epoch 5` flag: we found this to be important for Cityscapes models because of the large number of images in the training set.\n\nThis assumes you have already preprocessed the Cityscapes dataset using SfMLearner's [prepare_train_data.py](https://github.com/tinghuiz/SfMLearner/blob/master/data/prepare_train_data.py) script.\nWe used the following command:\n\n```bash\npython prepare_train_data.py \\\n    --img_height 512 \\\n    --img_width 1024 \\\n    --dataset_dir \u003cpath_to_downloaded_cityscapes_data\u003e \\\n    --dataset_name cityscapes \\\n    --dump_root \u003cyour_preprocessed_cityscapes_path\u003e \\\n    --seq_length 3 \\\n    --num_threads 8\n```\n\nNote that although we pass `--img_height 512`, the `prepare_train_data.py` script saves images that are `1024x384`, because it also crops off the bottom portion of the image.\nYou could probably save disk space without a loss of accuracy by preprocessing with `--img_height 256 --img_width 512` (to create `512x192` images), but this isn't what we did for our experiments.\n\n## 💾 Pretrained weights and evaluation\n\nYou can download weights for some pretrained models here:\n\n* [KITTI MR (640x192)](https://storage.googleapis.com/niantic-lon-static/research/manydepth/models/KITTI_MR.zip)\n* [KITTI HR 
(1024x320)](https://storage.googleapis.com/niantic-lon-static/research/manydepth/models/KITTI_HR.zip)\n* [Cityscapes (512x192)](https://storage.googleapis.com/niantic-lon-static/research/manydepth/models/CityScapes_MR.zip)\n\nTo evaluate a model on KITTI, run:\n\n```bash\nCUDA_VISIBLE_DEVICES=\u003cyour_desired_GPU\u003e \\\npython -m manydepth.evaluate_depth \\\n    --data_path \u003cyour_KITTI_path\u003e \\\n    --load_weights_folder \u003cyour_model_path\u003e \\\n    --eval_mono\n```\n\nMake sure you have first run `export_gt_depth.py` to extract the ground-truth files.\n\nTo evaluate a model on Cityscapes, run:\n\n```bash\nCUDA_VISIBLE_DEVICES=\u003cyour_desired_GPU\u003e \\\npython -m manydepth.evaluate_depth \\\n    --data_path \u003cyour_cityscapes_path\u003e \\\n    --load_weights_folder \u003cyour_model_path\u003e \\\n    --eval_mono \\\n    --eval_split cityscapes\n```\n\nDuring evaluation, we crop and evaluate on the middle 50% of the images.\n\nWe provide ground-truth depth files [here](https://storage.googleapis.com/niantic-lon-static/research/manydepth/gt_depths_cityscapes.zip), which were converted from pixel disparities using the camera intrinsics and the known stereo baseline. Download and unzip them into `splits/cityscapes`.\n\nTo evaluate a teacher network (i.e. the monocular network used for the consistency loss), add the flag `--eval_teacher`. This will load the weights from `mono_encoder.pth` and `mono_depth.pth`, which are provided for our KITTI models. 
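\n\nFor reference, converting the Cityscapes disparities described above into depth relies on the standard relation for a rectified stereo pair. Here $Z$ is the depth in metres, $f_x$ the horizontal focal length in pixels, $B$ the stereo baseline in metres, and $d$ the disparity in pixels. This is the textbook formula, not a record of the exact script used to produce the provided files. For example, it does not cover how invalid or zero disparities were handled.\n\n```math\nZ = \\frac{f_x \\, B}{d}\n```\n\nAs an illustration with made-up values, $f_x = 1000$ px, $B = 0.2$ m and $d = 10$ px give $Z = 1000 \\times 0.2 / 10 = 20$ m. Because $Z$ is inversely proportional to $d$, a fixed one-pixel disparity error produces a much larger depth error for distant points than for nearby ones.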
\n\n## 🖼 Running on your own images\n\nWe provide some sample code in `test_simple.py` which demonstrates multi-frame inference.\nThis predicts depth for a sequence of two images cropped from a [dashcam video](https://www.youtube.com/watch?v=sF0wXxZwISw).\nPrediction also requires an estimate of the intrinsics matrix, in json format.\nFor the provided test images, we have estimated the intrinsics to be equivalent to those of the KITTI dataset.\nNote that the intrinsics provided in the json file are expected to be in [normalised coordinates](https://github.com/nianticlabs/monodepth2/issues/6#issuecomment-494407590).\n\nDownload and unzip model weights from one of the links above, and then run the following command:\n\n```bash\npython -m manydepth.test_simple \\\n    --target_image_path assets/test_sequence_target.jpg \\\n    --source_image_path assets/test_sequence_source.jpg \\\n    --intrinsics_json_path assets/test_sequence_intrinsics.json \\\n    --model_path path/to/weights\n```\n\nA predicted depth map rendering will be saved to `assets/test_sequence_target_disp.jpeg`.\n\n## 👩‍⚖️ License\n\nCopyright © Niantic, Inc. 2021. Patent Pending.\nAll rights reserved.\nPlease see the [license file](LICENSE) for terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fmanydepth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnianticlabs%2Fmanydepth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fmanydepth/lists"}