{"id":13442422,"url":"https://github.com/megvii-research/OccDepth","last_synced_at":"2025-03-20T13:33:41.079Z","repository":{"id":66226259,"uuid":"601909043","full_name":"megvii-research/OccDepth","owner":"megvii-research","description":"Maybe the first academic open work on stereo 3D SSC method with vision-only input.","archived":false,"fork":false,"pushed_at":"2023-04-11T06:17:59.000Z","size":68152,"stargazers_count":281,"open_issues_count":11,"forks_count":23,"subscribers_count":17,"default_branch":"main","last_synced_at":"2024-10-28T05:13:22.181Z","etag":null,"topics":["camera-based","occupancy","semantic-scene-completion","stereo-camera"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/megvii-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-15T04:41:59.000Z","updated_at":"2024-10-23T13:05:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"f6b6ca23-5001-4054-807c-a1048ba5a159","html_url":"https://github.com/megvii-research/OccDepth","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2FOccDepth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2FOccDepth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2FOccDepth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2FOccDepth/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/megvii-research","download_url":"https://codeload.github.com/megvii-research/OccDepth/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244619322,"owners_count":20482399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["camera-based","occupancy","semantic-scene-completion","stereo-camera"],"created_at":"2024-07-31T03:01:45.541Z","updated_at":"2025-03-20T13:33:41.065Z","avatar_url":"https://github.com/megvii-research.png","language":"Python","funding_links":[],"categories":["Python","3. Perception"],"sub_categories":["3.1.1 Vision based"],"readme":"# OccDepth: A Depth-aware Method for 3D Semantic Occupancy Network \n\n![](https://img.shields.io/badge/Ranked%20%231-Camera--Only%203D%20Semantic%20Scene%20Completion%20on%20SemanticKITTI-blue \"\")\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/occdepth-a-depth-aware-method-for-3d-semantic/3d-semantic-scene-completion-on-semantickitti)](https://paperswithcode.com/sota/3d-semantic-scene-completion-on-semantickitti?p=occdepth-a-depth-aware-method-for-3d-semantic)\n\t\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/occdepth-a-depth-aware-method-for-3d-semantic/3d-semantic-scene-completion-on-nyuv2)](https://paperswithcode.com/sota/3d-semantic-scene-completion-on-nyuv2?p=occdepth-a-depth-aware-method-for-3d-semantic)\n# News\n- **2023/03/30** Release trained models on GeForce RTX 2080 Ti.\n- **2023/02/28** Initial code release. 
Both stereo image and RGB-D image inputs are supported.\n- **2023/02/28** Paper released on [arXiv](https://arxiv.org/abs/2302.13540).\n- **2023/02/17** Demo release.\n\n# Abstract\nIn this paper, we propose the first stereo SSC method named OccDepth, which fully exploits implicit depth information from stereo images (or RGBD images) to help the recovery of 3D geometric structures. The Stereo Soft Feature Assignment (Stereo-SFA) module is proposed to better fuse 3D depth-aware features by implicitly learning the correlation between stereo images. In particular, when the input is an RGBD image, a virtual stereo image can be generated from the original RGB image and depth map. Besides, the Occupancy Aware Depth (OAD) module is used to obtain geometry-aware 3D features by knowledge distillation using pre-trained depth models.\n\n# Video Demo\n\nMesh results compared with ground truth on KITTI-08:\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./assets/demo.gif\" alt=\"video loading...\" /\u003e\n\u003c/p\u003e\nVoxel results compared with ground truth on KITTI-08:\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./assets/demo_voxel.gif\" alt=\"video loading...\" /\u003e\n\u003c/p\u003e\nFull demo videos can be downloaded via `git lfs pull`; they are saved as \"assets/demo.mp4\" and \"assets/demo_voxel.mp4\". 
\n\n# Results\n## Trained models\n\nThe trained models on GeForce RTX 2080 Ti are provided:\n| Config | Dataset | IoU | mIoU | Download |\n| :---: | :---: | :---: | :---: | :---:|\n| [config](occdepth/config/semantic_kitti/multicam_flospdepth_crp_stereodepth_cascadecls_2080ti.yaml) | SemanticKITTI | 41.60| 12.84|[model](https://drive.google.com/file/d/1MGJ_HZcuW5UpULpOeJV0M5ZrT-98j7OE/view?usp=share_link) |\n| [config](occdepth/config/NYU/multicam_flosp_crp_stereodepth_cascadecls_2080ti.yaml) | NYUv2 | 49.23| 29.34|[model](https://drive.google.com/file/d/1tBKB-J6NAxDTRTOE1hwAacRmwu57q8L8/view?usp=share_link)|\n\nNote: For better results, set `share_2d_backbone_gradient = false`, `backbone_2d_name = tf_efficientnet_b7_ns` and `feature = feature_2d_oc = 64` (SemanticKITTI); this needs more GPU memory.\n## Qualitative Results\n\u003cdiv align=\"center\"\u003e\n\u003cimg width=374 src=\"./assets/result1-1.png\"/\u003e\u003cimg width=400 src=\"./assets/result1-2.png\"/\u003e\n\n\nFig. 1: RGB-based Semantic Scene Completion with/without depth awareness. (a) Our proposed OccDepth method can detect smaller and farther objects. (b) Our proposed OccDepth method completes the road better.\n\u003c/div\u003e\n\n## Quantitative results on SemanticKITTI\n\n\u003cdiv align=\"center\"\u003e\nTable 1. Performance on SemanticKITTI (hidden test set). 
\n\n|Method            |Input        | SC IoU        | SSC mIoU       |\n|:----------------:|:----------:|:--------------:|:--------------:|\n| **2.5D/3D**      |            |                |                |\n| LMSCNet(st)   | OCC        | 33.00          | 5.80           |\n| AICNet(st)    | RGB, DEPTH | 32.8           | 6.80           |\n| JS3CNet(st)   | PTS        | 39.30          | 9.10           |\n| **2D**           |            |                |                |\n| MonoScene        | RGB        | 34.16          | 11.08          |\n| MonoScene(st) | Stereo RGB | 40.84          | 13.57          |\n| OccDepth (ours)  | Stereo RGB | **45.10**      | **15.90**      |\n\u003c/div\u003e\nThe scene completion (SC IoU) and semantic scene completion (SSC mIoU) metrics are reported for modified baselines (marked with \"st\") and our OccDepth.\n\n## Detailed results on SemanticKITTI\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./assets/result2.png\"/\u003e\n\u003c/div\u003e\n\n## Comparison with baselines\n\u003cdiv align=\"center\"\u003e\n\u003cimg width=400 src=\"./assets/result3.png\"/\u003e\n\u003c/div\u003e\nBaselines of 2.5D/3D-input methods. \"∗\" means results are cited from MonoScene. \"/\" means missing results.\n\n# Usage\n\n## Environment\n1. Create conda environment:\n``` bash\nconda create -y -n occdepth python=3.7\nconda activate occdepth\nconda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia\n```\n2. 
Install dependencies:\n``` bash\npip install -r requirements.txt\nconda install -c bioconda tbb=2020.2\n```\n\n## Preparing\n\n### SemanticKITTI\n- Download the KITTI Odometry and SemanticKITTI datasets\n    - [SemanticKITTI voxel data (700 MB)](http://www.semantic-kitti.org/assets/data_odometry_voxels.zip)\n    - [KITTI Odometry Benchmark RGB images (color, 65 GB) and KITTI Odometry Benchmark calibration data (calibration files, 1 MB)](https://www.cvlibs.net/datasets/kitti/eval_odometry.php)\n\n- Download preprocessed depth\n   - [KITTI_Odometry_Stereo_Depth](https://drive.google.com/file/d/1eJPJ1niczagkJfEv21_RdvYBDUbpaQ0w/view?usp=sharing)\n\n- Preprocess the SemanticKITTI data\n    ``` bash\n    cd OccDepth/\n    python occdepth/data/semantic_kitti/preprocess.py data_root=\"/path/to/semantic_kitti\" data_preprocess_root=\"/path/to/kitti/preprocess/folder\"\n    ```\n\n\n### NYUv2\n- Download NYUv2 dataset\n    - [NYUv2](https://www.rocq.inria.fr/rits_files/computer-vision/monoscene/nyu.zip)\n\n- Preprocess the NYUv2 data\n    ``` bash\n    cd OccDepth/\n    python occdepth/data/NYU/preprocess.py data_root=\"/path/to/NYU/depthbin\" data_preprocess_root=\"/path/to/NYU/preprocess/folder\"\n    ```\n### Settings\n1. Set `DATA_LOG` and `DATA_CONFIG` in `env_{dataset}.sh`, for example:\n    ``` bash\n    ##examples\n    export DATA_LOG=$workdir/logdir/semanticKITTI\n    export DATA_CONFIG=$workdir/occdepth/config/semantic_kitti/multicam_flospdepth_crp_stereodepth_cascadecls_2080ti.yaml\n    ```\n2. 
Set `data_root`, `data_preprocess_root` and `data_stereo_depth_root` in the config file (occdepth/config/xxxx.yaml), for example:\n    ``` yaml\n    ##examples\n    data_root: '/data/dataset/KITTI_Odometry_Semantic'\n    data_preprocess_root: '/data/dataset/kitti_semantic_preprocess'\n    data_stereo_depth_root: '/data/dataset/KITTI_Odometry_Stereo_Depth'\n    ```\n\n## Inference\n\n``` bash\ncd OccDepth/\nsource env_{dataset}.sh\n## move the trained model to OccDepth/trained_models/occdepth.ckpt\n## 4 GPUs, batch size 1 per GPU\npython occdepth/scripts/generate_output.py n_gpus=4 batch_size_per_gpu=1\n```\n\n## Evaluation\n``` bash\ncd OccDepth/\nsource env_{dataset}.sh\n## move the trained model to OccDepth/trained_models/occdepth.ckpt\n## 1 GPU, batch size 1 per GPU\npython occdepth/scripts/eval.py n_gpus=1 batch_size_per_gpu=1\n```\n## Training\n``` bash\ncd OccDepth/\nsource env_{dataset}.sh\n## 4 GPUs, batch size 1 per GPU\npython occdepth/scripts/train.py logdir=${DATA_LOG} n_gpus=4 batch_size_per_gpu=1\n```\n# License\nThis repository is released under the Apache 2.0 license as found in the [LICENSE](LICENSE) file.\n\n# Acknowledgements\nOur code is based on these excellent open source projects:\n- [MonoScene](https://github.com/astra-vision/MonoScene)\n- [SSC](https://github.com/waterljwant/SSC)\n- [UVTR](https://github.com/dvlab-research/UVTR)\n- [CaDDN](https://github.com/TRAILab/CaDDN)\n- [BEVDepth](https://github.com/Megvii-BaseDetection/BEVDepth)\n\nMany thanks to them!\n\n# Related Repos\n* https://github.com/wzzheng/TPVFormer\n* https://github.com/FANG-MING/occupancy-for-nuscenes\n* https://github.com/nvlabs/voxformer\n\n# Citation\nIf you find this project useful in your research, please consider citing:\n```\n@article{miao2023occdepth,\nAuthor = {Ruihang Miao and Weizhou Liu and Mingrui Chen and Zheng Gong and Weixin Xu and Chen Hu and Shuchang Zhou},\nTitle = {OccDepth: A Depth-Aware Method for 3D Semantic Scene 
Completion},\njournal = {arXiv:2302.13540},\nYear = {2023},\n}\n```\n# Contact\nIf you have any questions, feel free to open an issue or contact us at miaoruihang@megvii.com, huchen@megvii.com.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmegvii-research%2FOccDepth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmegvii-research%2FOccDepth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmegvii-research%2FOccDepth/lists"}