{"id":13737218,"url":"https://github.com/nianticlabs/monodepth2","last_synced_at":"2025-05-14T14:07:53.581Z","repository":{"id":37385113,"uuid":"184593965","full_name":"nianticlabs/monodepth2","owner":"nianticlabs","description":"[ICCV 2019] Monocular depth estimation from a single image","archived":false,"fork":false,"pushed_at":"2024-08-10T22:20:36.000Z","size":10520,"stargazers_count":4295,"open_issues_count":33,"forks_count":978,"subscribers_count":66,"default_branch":"master","last_synced_at":"2025-04-09T03:11:33.806Z","etag":null,"topics":["computer-vision","deep-learning","depth-estimation","monodepth","neural-network","pytorch","self-supervision"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nianticlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-02T14:17:17.000Z","updated_at":"2025-04-09T02:31:18.000Z","dependencies_parsed_at":"2023-02-12T00:30:33.995Z","dependency_job_id":"3eec4414-d387-4a40-bfdc-0649831ae2e2","html_url":"https://github.com/nianticlabs/monodepth2","commit_stats":{"total_commits":31,"total_committers":5,"mean_commits":6.2,"dds":"0.32258064516129037","last_synced_commit":"b676244e5a1ca55564eb5d16ab521a48f823af31"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmonodepth2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmonodepth2/tags","releases_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmonodepth2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fmonodepth2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nianticlabs","download_url":"https://codeload.github.com/nianticlabs/monodepth2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254159194,"owners_count":22024558,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","depth-estimation","monodepth","neural-network","pytorch","self-supervision"],"created_at":"2024-08-03T03:01:37.768Z","updated_at":"2025-05-14T14:07:53.558Z","avatar_url":"https://github.com/nianticlabs.png","language":"Jupyter Notebook","funding_links":[],"categories":["Topics","Jupyter Notebook","Awesome Self-Supervised Depth Estimation","Depth Estimation"],"sub_categories":["Perception","Classic Models"],"readme":"# Monodepth2\n\nThis is the reference PyTorch implementation for training and testing depth estimation models using the method described in\n\n\u003e **Digging into Self-Supervised Monocular Depth Prediction**\n\u003e\n\u003e [Clément Godard](http://www0.cs.ucl.ac.uk/staff/C.Godard/), [Oisin Mac Aodha](http://vision.caltech.edu/~macaodha/), [Michael Firman](http://www.michaelfirman.co.uk) and [Gabriel J. 
Brostow](http://www0.cs.ucl.ac.uk/staff/g.brostow/)\n\u003e\n\u003e [ICCV 2019 (arXiv pdf)](https://arxiv.org/abs/1806.01260)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/teaser.gif\" alt=\"example input output gif\" width=\"600\" /\u003e\n\u003c/p\u003e\n\nThis code is for non-commercial use; please see the [license file](LICENSE) for terms.\n\nIf you find our work useful in your research, please consider citing our paper:\n\n```\n@article{monodepth2,\n  title     = {Digging into Self-Supervised Monocular Depth Prediction},\n  author    = {Cl{\\'{e}}ment Godard and\n               Oisin {Mac Aodha} and\n               Michael Firman and\n               Gabriel J. Brostow},\n  booktitle = {The International Conference on Computer Vision (ICCV)},\n  month     = {October},\n  year      = {2019}\n}\n```\n\n\n\n## ⚙️ Setup\n\nAssuming a fresh [Anaconda](https://www.anaconda.com/download/) distribution, you can install the dependencies with:\n```shell\nconda install pytorch=0.4.1 torchvision=0.2.1 -c pytorch\npip install tensorboardX==1.4\nconda install opencv=3.3.1   # just needed for evaluation\n```\nWe ran our experiments with PyTorch 0.4.1, CUDA 9.1, Python 3.6.6 and Ubuntu 18.04.\nWe have also successfully trained models with PyTorch 1.0, and our code is compatible with Python 2.7. You may have issues installing OpenCV version 3.3.1 if you use Python 3.7; in that case, we recommend creating a virtual environment with Python 3.6.6: `conda create -n monodepth2 python=3.6.6 anaconda`.\n\n\u003c!-- We recommend using a [conda environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) to avoid dependency conflicts.\n\nWe also recommend using `pillow-simd` instead of `pillow` for faster image preprocessing in the dataloaders. 
--\u003e\n\n\n## 🖼️ Prediction for a single image\n\nYou can predict scaled disparity for a single image with:\n\n```shell\npython test_simple.py --image_path assets/test_image.jpg --model_name mono+stereo_640x192\n```\n\nor, if you are using a stereo-trained model, you can estimate metric depth with\n\n```shell\npython test_simple.py --image_path assets/test_image.jpg --model_name mono+stereo_640x192 --pred_metric_depth\n```\n\nOn its first run either of these commands will download the `mono+stereo_640x192` pretrained model (99MB) into the `models/` folder.\nWe provide the following  options for `--model_name`:\n\n| `--model_name`          | Training modality | Imagenet pretrained? | Model resolution  | KITTI abs. rel. error |  delta \u003c 1.25  |\n|-------------------------|-------------------|--------------------------|-----------------|------|----------------|\n| [`mono_640x192`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_640x192.zip)          | Mono              | Yes | 640 x 192                | 0.115                 | 0.877          |\n| [`stereo_640x192`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_640x192.zip)        | Stereo            | Yes | 640 x 192                | 0.109                 | 0.864          |\n| [`mono+stereo_640x192`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_640x192.zip)   | Mono + Stereo     | Yes | 640 x 192                | 0.106                 | 0.874          |\n| [`mono_1024x320`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_1024x320.zip)         | Mono              | Yes | 1024 x 320               | 0.115                 | 0.879          |\n| [`stereo_1024x320`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_1024x320.zip)       | Stereo            | Yes | 1024 x 320               | 0.107                 | 0.874          |\n| 
[`mono+stereo_1024x320`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_1024x320.zip)  | Mono + Stereo     | Yes | 1024 x 320               | 0.106                 | 0.876          |\n| [`mono_no_pt_640x192`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_no_pt_640x192.zip)          | Mono              | No | 640 x 192                | 0.132                 | 0.845          |\n| [`stereo_no_pt_640x192`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_no_pt_640x192.zip)        | Stereo            | No | 640 x 192                | 0.130                 | 0.831          |\n| [`mono+stereo_no_pt_640x192`](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_no_pt_640x192.zip)   | Mono + Stereo     | No | 640 x 192                | 0.127                 | 0.836          |\n\nYou can also download models trained on the odometry split with [monocular](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_odom_640x192.zip) and [mono+stereo](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_odom_640x192.zip) training modalities.\n\nFinally, we provide resnet 50 depth estimation models trained with [ImageNet pretrained weights](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_resnet50_640x192.zip) and [trained from scratch](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_resnet50_no_pt_640x192.zip).\nMake sure to set `--num_layers 50` if using these.\n\n## 💾 KITTI training data\n\nYou can download the entire [raw KITTI dataset](http://www.cvlibs.net/datasets/kitti/raw_data.php) by running:\n```shell\nwget -i splits/kitti_archives_to_download.txt -P kitti_data/\n```\nThen unzip with\n```shell\ncd kitti_data\nunzip \"*.zip\"\ncd ..\n```\n**Warning:** it weighs about **175GB**, so make sure you have enough space to unzip too!\n\nOur default settings 
expect that you have converted the png images to jpeg with this command, **which also deletes the raw KITTI `.png` files**:\n```shell\nfind kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg \u0026\u0026 rm {}'\n```\n**or** you can skip this conversion step and train from raw png files by adding the flag `--png` when training, at the expense of slower load times.\n\nThe above conversion command creates images which match our experiments, where KITTI `.png` images were converted to `.jpg` on Ubuntu 16.04 with default chroma subsampling `2x2,1x1,1x1`.\nWe found that Ubuntu 18.04 defaults to `2x2,2x2,2x2`, which gives different results, hence the explicit parameter in the conversion command.\n\nYou can also place the KITTI dataset wherever you like and point towards it with the `--data_path` flag during training and evaluation.\n\n**Splits**\n\nThe train/test/validation splits are defined in the `splits/` folder.\nBy default, the code will train a depth model using [Zhou's subset](https://github.com/tinghuiz/SfMLearner) of the standard Eigen split of KITTI, which is designed for monocular training.\nYou can also train a model using the new [benchmark split](http://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_prediction) or the [odometry split](http://www.cvlibs.net/datasets/kitti/eval_odometry.php) by setting the `--split` flag.\n\n\n**Custom dataset**\n\nYou can train on a custom monocular or stereo dataset by writing a new dataloader class which inherits from `MonoDataset` – see the `KITTIDataset` class in `datasets/kitti_dataset.py` for an example.\n\n\n## ⏳ Training\n\nBy default models and tensorboard event files are saved to `~/tmp/\u003cmodel_name\u003e`.\nThis can be changed with the `--log_dir` flag.\n\n\n**Monocular training:**\n```shell\npython train.py --model_name mono_model\n```\n\n**Stereo training:**\n\nOur code defaults to using Zhou's subsampled Eigen training data. 
For stereo-only training we have to specify that we want to use the full Eigen training set – see paper for details.\n```shell\npython train.py --model_name stereo_model \\\n  --frame_ids 0 --use_stereo --split eigen_full\n```\n\n**Monocular + stereo training:**\n```shell\npython train.py --model_name mono+stereo_model \\\n  --frame_ids 0 -1 1 --use_stereo\n```\n\n\n### GPUs\n\nThe code can only be run on a single GPU.\nYou can specify which GPU to use with the `CUDA_VISIBLE_DEVICES` environment variable:\n```shell\nCUDA_VISIBLE_DEVICES=2 python train.py --model_name mono_model\n```\n\nAll our experiments were performed on a single NVIDIA Titan Xp.\n\n| Training modality | Approximate GPU memory  | Approximate training time   |\n|-------------------|-------------------------|-----------------------------|\n| Mono              | 9GB                     | 12 hours                    |\n| Stereo            | 6GB                     | 8 hours                     |\n| Mono + Stereo     | 11GB                    | 15 hours                    |\n\n\n\n### 💽 Finetuning a pretrained model\n\nAdd the following to the training command to load an existing model for finetuning:\n```shell\npython train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19\n```\n\n\n### 🔧 Other training options\n\nRun `python train.py -h` (or look at `options.py`) to see the range of other training options, such as learning rates and ablation settings.\n\n\n## 📊 KITTI evaluation\n\nTo prepare the ground truth depth maps run:\n```shell\npython export_gt_depth.py --data_path kitti_data --split eigen\npython export_gt_depth.py --data_path kitti_data --split eigen_benchmark\n```\n...assuming that you have placed the KITTI dataset in the default location of `./kitti_data/`.\n\nThe following example command evaluates the epoch 19 weights of a model named `mono_model`:\n```shell\npython evaluate_depth.py --load_weights_folder ~/tmp/mono_model/models/weights_19/ 
--eval_mono\n```\nFor stereo models, you must use the `--eval_stereo` flag (see note below):\n```shell\npython evaluate_depth.py --load_weights_folder ~/tmp/stereo_model/models/weights_19/ --eval_stereo\n```\nIf you train your own model with our code you are likely to see slight differences to the publication results due to randomization in the weights initialization and data loading.\n\nAn additional parameter `--eval_split` can be set.\nThe three different values possible for `eval_split` are explained here:\n\n| `--eval_split`        | Test set size | For models trained with... | Description  |\n|-----------------------|---------------|----------------------------|--------------|\n| **`eigen`**           | 697           | `--split eigen_zhou` (default) or `--split eigen_full` | The standard Eigen test files |\n| **`eigen_benchmark`** | 652           | `--split eigen_zhou` (default) or `--split eigen_full`  | Evaluate with the improved ground truth from the [new KITTI depth benchmark](http://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_prediction) |\n| **`benchmark`**       | 500           | `--split benchmark`        | The [new KITTI depth benchmark](http://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_prediction) test files. |\n\nBecause no ground truth is available for the new KITTI depth benchmark, no scores will be reported  when `--eval_split benchmark` is set.\nInstead, a set of `.png` images will be saved to disk ready for upload to the evaluation server.\n\n\n**External disparities evaluation**\n\nFinally you can also use `evaluate_depth.py` to evaluate raw disparities (or inverse depth) from other methods by using the `--ext_disp_to_eval` flag:\n\n```shell\npython evaluate_depth.py --ext_disp_to_eval ~/other_method_disp.npy\n```\n\n\n**📷📷 Note on stereo evaluation**\n\nOur stereo models are trained with an effective baseline of `0.1` units, while the actual KITTI stereo rig has a baseline of `0.54m`. 
This means a scaling of `5.4` must be applied for evaluation.\nIn addition, for models trained with stereo supervision we disable median scaling.\nSetting the `--eval_stereo` flag when evaluating will automatically disable median scaling and scale predicted depths by `5.4`.\n\n\n**⤴️⤵️ Odometry evaluation**\n\nWe include code for evaluating poses predicted by models trained with `--split odom --dataset kitti_odom --data_path /path/to/kitti/odometry/dataset`.\n\nFor this evaluation, the [KITTI odometry dataset](http://www.cvlibs.net/datasets/kitti/eval_odometry.php) **(color, 65GB)** and **ground truth poses** zip files must be downloaded.\nAs above, we assume that the pngs have been converted to jpgs.\n\nIf this data has been unzipped to folder `kitti_odom`, a model can be evaluated with:\n```shell\npython evaluate_pose.py --eval_split odom_9 --load_weights_folder ./odom_split.M/models/weights_29 --data_path kitti_odom/\npython evaluate_pose.py --eval_split odom_10 --load_weights_folder ./odom_split.M/models/weights_29 --data_path kitti_odom/\n```\n\n\n## 📦 Precomputed results\n\nYou can download our precomputed disparity predictions from the following links:\n\n\n| Training modality | Input size  | `.npy` filesize | Eigen disparities                                                                             |\n|-------------------|-------------|-----------------|-----------------------------------------------------------------------------------------------|\n| Mono              | 640 x 192   | 343 MB          | [Download 🔗](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_640x192_eigen.npy)           |\n| Stereo            | 640 x 192   | 343 MB          | [Download 🔗](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_640x192_eigen.npy)         |\n| Mono + Stereo     | 640 x 192   | 343 MB          | [Download 
🔗](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_640x192_eigen.npy)  |\n| Mono              | 1024 x 320  | 914 MB          | [Download 🔗](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_1024x320_eigen.npy)          |\n| Stereo            | 1024 x 320  | 914 MB          | [Download 🔗](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_1024x320_eigen.npy)        |\n| Mono + Stereo     | 1024 x 320  | 914 MB          | [Download 🔗](https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_1024x320_eigen.npy) |\n\n\n\n## 👩‍⚖️ License\nCopyright © Niantic, Inc. 2019. Patent Pending.\nAll rights reserved.\nPlease see the [license file](LICENSE) for terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fmonodepth2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnianticlabs%2Fmonodepth2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fmonodepth2/lists"}