{"id":25440231,"url":"https://github.com/jiawangbian/sc_depth_pl","last_synced_at":"2025-04-05T15:07:23.182Z","repository":{"id":38408721,"uuid":"410431484","full_name":"JiawangBian/sc_depth_pl","owner":"JiawangBian","description":"SC-Depth (V1, V2, and V3) for Unsupervised Monocular Depth Estimation  Webpage:https://jiawangbian.github.io/sc_depth_pl/","archived":false,"fork":false,"pushed_at":"2023-10-06T08:29:41.000Z","size":92608,"stargazers_count":458,"open_issues_count":28,"forks_count":72,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-29T14:09:00.967Z","etag":null,"topics":["computer-vision","deep-learning","depth-estimation","pose-estimation","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JiawangBian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-26T02:40:07.000Z","updated_at":"2025-03-29T04:56:10.000Z","dependencies_parsed_at":"2025-02-17T11:40:45.432Z","dependency_job_id":null,"html_url":"https://github.com/JiawangBian/sc_depth_pl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiawangBian%2Fsc_depth_pl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiawangBian%2Fsc_depth_pl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JiawangBian%2Fsc_depth_pl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/JiawangBian%2Fsc_depth_pl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JiawangBian","download_url":"https://codeload.github.com/JiawangBian/sc_depth_pl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247353745,"owners_count":20925329,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","depth-estimation","pose-estimation","self-supervised-learning"],"created_at":"2025-02-17T11:29:59.267Z","updated_at":"2025-04-05T15:07:23.149Z","avatar_url":"https://github.com/JiawangBian.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SC_Depth:\n\nThis repo provides the pytorch lightning implementation of **SC-Depth** (V1, V2, and V3) for **self-supervised learning of monocular depth from video**.\n\nIn the SC-DepthV1 ([IJCV 2021](https://jwbian.net/Papers/SC_Depth_IJCV_21.pdf) \u0026 [NeurIPS 2019](https://papers.nips.cc/paper/2019/file/6364d3f0f495b6ab9dcf8d3b5c6e0b01-Paper.pdf)), we propose (i) **geometry consistency loss** for scale-consistent depth prediction over time and (ii) **self-discovered mask** for detecting and removing dynamic regions and occlusions during training towards higher accuracy. The predicted depth is sufficiently accurate and consistent for use in the ORB-SLAM2 system. 
The video below shows the estimated depth as a point cloud (top) and a color map (bottom right).\n\n[\u003cimg src=\"https://jwbian.net/wp-content/uploads/2020/06/77CXZX@H37PIWDBX0R7T.png\" width=\"600\"\u003e](https://www.youtube.com/watch?v=OkfK3wmMnpo)\n\nIn the SC-DepthV2 ([TPAMI 2022](https://arxiv.org/abs/2006.02708v2)), we prove that the large relative rotational motion in videos captured by hand-held cameras is the main challenge for unsupervised monocular depth estimation in indoor scenes. Based on this finding, we propose the Auto-Rectify Network (**ARN**) to handle the large relative rotation between consecutive video frames. It is integrated into SC-DepthV1 and jointly trained with self-supervised losses, greatly boosting the performance.\n\n\u003cimg src=\"https://jwbian.net/wp-content/uploads/2020/06/vis_depth.png\" width=\"600\"\u003e\n\nIn the SC-DepthV3 ([TPAMI 2023](https://arxiv.org/abs/2211.03660)), we propose a robust learning framework for accurate and sharp monocular depth estimation in (highly) dynamic scenes. Because the photometric loss, which is the main loss in self-supervised methods, is not valid in dynamic object regions and occlusions, previous methods show poor accuracy in dynamic scenes and blurred depth predictions at object boundaries. We leverage an external pretrained depth estimation network to generate a single-image depth prior, based on which we propose effective losses to constrain self-supervised depth learning. 
The evaluation results on six challenging datasets including both static and dynamic scenes demonstrate the efficacy of the proposed method.\n\nQualitative depth estimation results: DDAD, BONN, TUM, IBIMS-1\n\n\u003cimg src=\"https://jwbian.net/Demo/vis_ddad.jpg\" width=\"400\"\u003e \u003cimg src=\"https://jwbian.net/Demo/vis_bonn.jpg\" width=\"400\"\u003e \u003cimg src=\"https://jwbian.net/Demo/vis_tum.jpg\" width=\"400\"\u003e \u003cimg src=\"https://jwbian.net/Demo/vis_ibims.jpg\" width=\"400\"\u003e\n\nDemo Videos\n\n\n\nhttps://user-images.githubusercontent.com/11647217/201716221-94fb20ec-0947-4ea0-b83e-572ffa9a46b5.mp4\n\n\n\n\u003cimg align=\"left\" src=\"https://user-images.githubusercontent.com/11647217/201711956-7d2c2f48-8d3c-4c05-9402-9e4115e4b5d7.mp4\" width=\"400\"\u003e \n\u003cimg align=\"left\" src=\"https://user-images.githubusercontent.com/11647217/201712014-decd56ba-16eb-4772-90fb-200d489c309c.mp4\" width=\"400\"\u003e \n\n\n\n\n\n\n\n## Install\n```\nconda create -n sc_depth_env python=3.8\nconda activate sc_depth_env\nconda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia\npip install -r requirements.txt\n```\n\n## Dataset\n\nWe organize the video datasets into the following format for training and testing models:\n\n    Dataset\n      -Training\n        --Scene0000\n          ---*.jpg (list of color images)\n          ---cam.txt (3x3 camera intrinsic matrix)\n          ---depth (a folder containing ground-truth depth maps, optional for validation)\n          ---leres_depth (a folder containing pseudo-depth generated by LeReS, required for training SC-DepthV3)\n        --Scene0001\n        ...\n        train.txt (containing training scene names)\n        val.txt (containing validation scene names)\n      -Testing\n        --color (containing testing images)\n        --depth (containing ground-truth depths)\n        --seg_mask (containing semantic segmentation masks for depth evaluation on dynamic/static regions)\n\nWe 
provide pre-processed datasets:\n\n[**[kitti, nyu, ddad, bonn, tum]**](https://1drv.ms/u/s!AiV6XqkxJHE2mUFwH6FrHGCuh_y6?e=RxOheF) \n\n\n## Training\n\nWe provide a bash script (\"scripts/run_train.sh\"), which shows how to train on kitti, nyu, and other datasets. Generally, you need to edit the config file (e.g., \"configs/v1/kitti.txt\") based on your devices and run\n```bash\npython train.py --config $CONFIG --dataset_dir $DATASET\n```\nThen you can start a `tensorboard` session in this folder by running\n```bash\ntensorboard --logdir=ckpts/\n```\nBy opening [http://localhost:6006](http://localhost:6006) in your browser, you can watch the training progress.  \n\n\n## Train on Your Own Data\n\nYou need to reorganize your own video datasets according to the above-mentioned format for training. You may then run into three problems: (1) no ground-truth depth for validation; (2) difficulty choosing an appropriate frame rate (FPS) to subsample videos; (3) no pseudo-depth for training V3.\n\n### No GT depth for validation\nAdd \"--val_mode photo\" in the training script or the config file, which uses the photometric loss for validation. \n```bash\npython train.py --config $CONFIG --dataset_dir $DATASET --val_mode photo\n```\n\n### Subsample video frames (to have sufficient motion) for training \nWe provide a script (\"generate_valid_frame_index.py\"), which computes and saves a \"frame_index.txt\" in each training scene. It uses an OpenCV-based optical flow method to compute the camera shift between consecutive frames. You might need to change the parameters so that enough keypoints are detected in your images (usually this is not necessary). 
Once you have prepared your dataset in the above-mentioned format, you can run the script with\n```bash\npython generate_valid_frame_index.py --dataset_dir $DATASET\n```\nThen, you can add \"--use_frame_index\" in the training script or the config file to train models on the filtered frames.\n```bash\npython train.py --config $CONFIG --dataset_dir $DATASET --use_frame_index\n```\n\n### Generating Pseudo-depth for training V3\n\nWe use [LeReS](https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS) to generate pseudo-depth in this project. You need to install it and generate pseudo-depth for your own images (the pseudo-depth for standard datasets has been provided above). More specifically, you can refer to the code in [this line](https://github.com/aim-uofa/AdelaiDepth/blob/803abcfc186b5cda73c5ca4c369f350e44a8ae1b/LeReS/Minist_Test/tools/test_shape.py#L134) for saving the pseudo-depth.\n\nIt is also possible to use other state-of-the-art monocular depth estimation models to generate pseudo-depth, such as [DPT](https://github.com/isl-org/DPT).\n\n\n## Pretrained models\n\n[**[Models]**](https://1drv.ms/u/s!AiV6XqkxJHE2mULfSmi4yy-_JHSm?e=s97YRM) \n\nYou need to uncompress the download and put it into the \"ckpts\" folder. Then you can run \"scripts/run_test.sh\" or \"scripts/run_inference.sh\" with the pretrained model. \n\nFor v1, we provide models trained on KITTI and DDAD.\n\nFor v2, we provide models trained on NYUv2.\n\nFor v3, we provide models trained on KITTI, NYUv2, DDAD, BONN, and TUM.\n\n\n## Testing (Evaluation on Full Images)\n\nWe provide the script (\"scripts/run_test.sh\"), which shows how to test on kitti, nyu, and ddad datasets. The script only evaluates depth accuracy on full images. See the \"Evaluation on dynamic/static regions\" section below for a separate evaluation of depth estimation on dynamic and static regions.\n\n    python test.py --config $CONFIG --dataset_dir $DATASET --ckpt_path $CKPT\n    \n\n## Demo\n\nA simple demo is given here. 
You can put your images in the \"demo/input/\" folder and run\n```bash\npython inference.py --config configs/v3/nyu.txt \\\n--input_dir demo/input/ \\\n--output_dir demo/output/ \\\n--ckpt_path ckpts/nyu_scv3/epoch=93-val_loss=0.1384.ckpt \\\n--save-vis --save-depth\n```\nThe results will be saved in the \"demo/output/\" folder.\n\n\n## Evaluation on dynamic/static regions\n\nFirst use \"scripts/run_inference.sh\" to save the predicted depth, and then use \"scripts/run_evaluation.sh\" to run the evaluation. A demo on the DDAD dataset is provided in these files. In general, the two steps are:\n\n### Inference\n```bash\npython inference.py --config $YOUR_CONFIG \\\n--input_dir $TESTING_IMAGE_FOLDER \\\n--output_dir $RESULTS_FOLDER \\\n--ckpt_path $YOUR_CKPT \\\n--save-vis --save-depth\n```\n\n### Evaluation\n```bash\npython eval_depth.py \\\n--dataset $DATASET_FOLDER \\\n--pred_depth=$RESULTS_FOLDER \\\n--gt_depth=$GT_FOLDER \\\n--seg_mask=$SEG_MASK_FOLDER\n```\n\n\n## References\n\n#### SC-DepthV1:\n**Unsupervised Scale-consistent Depth Learning from Video (IJCV 2021)** \\\n*Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid* \n[**[paper]**](https://jwbian.net/Papers/SC_Depth_IJCV_21.pdf)\n```\n@article{bian2021ijcv, \n  title={Unsupervised Scale-consistent Depth Learning from Video}, \n  author={Bian, Jia-Wang and Zhan, Huangying and Wang, Naiyan and Li, Zhichao and Zhang, Le and Shen, Chunhua and Cheng, Ming-Ming and Reid, Ian}, \n  journal= {International Journal of Computer Vision (IJCV)}, \n  year={2021} \n}\n```\nwhich is an extension of the previous conference version:\n\n**Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video (NeurIPS 2019)** \\\n*Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid* 
\n[**[paper]**](https://papers.nips.cc/paper/2019/file/6364d3f0f495b6ab9dcf8d3b5c6e0b01-Paper.pdf)\n```\n@inproceedings{bian2019neurips,\n  title={Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video},\n  author={Bian, Jiawang and Li, Zhichao and Wang, Naiyan and Zhan, Huangying and Shen, Chunhua and Cheng, Ming-Ming and Reid, Ian},\n  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},\n  year={2019}\n}\n```\n\n#### SC-DepthV2:\n**Auto-Rectify Network for Unsupervised Indoor Depth Estimation (TPAMI 2022)** \\\n*Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid*\n[**[paper]**](https://arxiv.org/abs/2006.02708v2)\n```\n@article{bian2021tpami, \n  title={Auto-Rectify Network for Unsupervised Indoor Depth Estimation}, \n  author={Bian, Jia-Wang and Zhan, Huangying and Wang, Naiyan and Chin, Tat-Jun and Shen, Chunhua and Reid, Ian}, \n  journal= {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)}, \n  year={2021} \n}\n```\n\n#### SC-DepthV3:\n**SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes (TPAMI 2023)** \\\n*Libo Sun\\*, Jia-Wang Bian\\*, Huangying Zhan, Wei Yin, Ian Reid, Chunhua Shen*\n[**[paper]**](https://arxiv.org/abs/2211.03660) \\\n\\* denotes equal contribution and joint first author\n```\n@article{sc_depthv3, \n  title={SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes}, \n  author={Sun, Libo and Bian, Jia-Wang and Zhan, Huangying and Yin, Wei and Reid, Ian and Shen, Chunhua}, \n  journal= {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)}, \n  year={2023} \n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiawangbian%2Fsc_depth_pl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjiawangbian%2Fsc_depth_pl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiawangbian%2Fsc_depth_pl/lists"}