{"id":18600794,"url":"https://github.com/autonomousvision/monosdf","last_synced_at":"2025-04-05T06:09:48.904Z","repository":{"id":44888508,"uuid":"499115480","full_name":"autonomousvision/monosdf","owner":"autonomousvision","description":"[NeurIPS'22] MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction","archived":false,"fork":false,"pushed_at":"2023-05-07T00:14:27.000Z","size":28537,"stargazers_count":583,"open_issues_count":28,"forks_count":53,"subscribers_count":33,"default_branch":"main","last_synced_at":"2025-03-29T05:11:14.330Z","etag":null,"topics":["3d-reconstruction","implicit-neural-representation","multi-resolution-grids","multi-view-reconstruction","scene-representations","sdf","surface-reconstruction"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/autonomousvision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-02T11:53:42.000Z","updated_at":"2025-03-26T13:02:44.000Z","dependencies_parsed_at":"2025-03-29T05:10:28.465Z","dependency_job_id":null,"html_url":"https://github.com/autonomousvision/monosdf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Fmonosdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Fmonosdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Fmonosdf/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autonomousvision%2Fmonosdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/autonomousvision","download_url":"https://codeload.github.com/autonomousvision/monosdf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294541,"owners_count":20915340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-reconstruction","implicit-neural-representation","multi-resolution-grids","multi-view-reconstruction","scene-representations","sdf","surface-reconstruction"],"created_at":"2024-11-07T02:05:36.999Z","updated_at":"2025-04-05T06:09:48.868Z","avatar_url":"https://github.com/autonomousvision.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n\n  \u003ch1 align=\"center\"\u003eMonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://niujinshuchong.github.io/\"\u003eZehao Yu\u003c/a\u003e\n    ·\n    \u003ca href=\"https://pengsongyou.github.io/\"\u003eSongyou Peng\u003c/a\u003e\n    ·\n    \u003ca href=\"https://m-niemeyer.github.io/\"\u003eMichael Niemeyer\u003c/a\u003e\n    ·\n    \u003ca href=\"https://tsattler.github.io/\"\u003eTorsten Sattler\u003c/a\u003e\n    ·\n    \u003ca href=\"http://www.cvlibs.net/\"\u003eAndreas Geiger\u003c/a\u003e\n\n  \u003c/p\u003e\n  \u003ch2 align=\"center\"\u003eNeurIPS 2022\u003c/h2\u003e\n  \u003ch3 
align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2206.00665\"\u003ePaper\u003c/a\u003e | \u003ca href=\"https://niujinshuchong.github.io/monosdf/\"\u003eProject Page\u003c/a\u003e | \u003ca href=\"https://autonomousvision.github.io/sdfstudio/\"\u003eSDFStudio\u003c/a\u003e \u003c/h3\u003e\n  \u003cdiv align=\"center\"\u003e\u003c/div\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003e\n    \u003cimg src=\"./media/teaser.gif\" alt=\"Logo\" width=\"95%\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\nWe demonstrate that state-of-the-art depth and normal cues extracted from monocular images are complementary to reconstruction cues and hence significantly improve the performance of implicit surface reconstruction methods. \n\u003c/p\u003e\n\u003cbr\u003e\n\n# Update\nMonoSDF is integrated to [SDFStudio](https://github.com/autonomousvision/sdfstudio), where monocular depth and normal cues can be applied to [UniSurf](https://github.com/autonomousvision/unisurf/tree/main/model) and [NeuS](https://github.com/Totoro97/NeuS/tree/main/models). Please check it out.\n\n# Setup\n\n## Installation\nClone the repository and create an anaconda environment called monosdf using\n```\ngit clone git@github.com:autonomousvision/monosdf.git\ncd monosdf\n\nconda create -y -n monosdf python=3.8\nconda activate monosdf\n\nconda install pytorch torchvision cudatoolkit=11.3 -c pytorch\nconda install cudatoolkit-dev=11.3 -c conda-forge\n\npip install -r requirements.txt\n```\nThe hash encoder will be compiled on the fly when running the code.\n\n## Dataset\nFor downloading the preprocessed data, run the following script. 
The data for DTU, Replica, and Tanks and Temples is adapted from [VolSDF](https://github.com/lioryariv/volsdf), [Nice-SLAM](https://github.com/cvg/nice-slam), and [Vis-MVSNet](https://github.com/jzhangbs/Vis-MVSNet), respectively.\n```\nbash scripts/download_dataset.sh\n```\n# Training\n\nRun the following command to train monosdf:\n```\ncd ./code\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf CONFIG  --scan_id SCAN_ID\n```\nwhere CONFIG is a config file in `code/confs`, and SCAN_ID is the id of the scene to reconstruct.\n\nWe provide example commands for training on the DTU, ScanNet, and Replica datasets as follows:\n```\n# DTU scan65\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/dtu_mlp_3views.conf  --scan_id 65\n\n# ScanNet scan 1 (scene_0050_00)\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/scannet_mlp.conf  --scan_id 1\n\n# Replica scan 1 (room0)\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/replica_mlp.conf  --scan_id 1\n```\n\nWe created an individual config file for each Tanks and Temples scene, so you don't need to set the scan_id. Run training on the courtroom scene as:\n```\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_mlp_1.conf\n```\n\nWe also generated high-resolution monocular cues for the courtroom scene; training with them works better with more GPUs. 
First, download the dataset:\n```\nbash scripts/download_highres_TNT.sh\n```\n\nThen run training with 8 GPUs:\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_highres_grids_courtroom.conf\n```\nOf course, you can also train on all other scenes with multiple GPUs.\n\n# Evaluations\n\n## DTU\nFirst, download the ground truth DTU point clouds:\n```\nbash scripts/download_dtu_ground_truth.sh\n```\nThen you can evaluate the quality of extracted meshes (take scan 65 for example):\n```\npython evaluate_single_scene.py --input_mesh scan65_mesh.ply --scan_id 65 --output_dir dtu_scan65\n```\n\nWe also provide a script for evaluating all DTU scenes:\n```\npython evaluate.py\n```\nEvaluation results will be saved to ```evaluation/DTU.csv``` by default; please check the script for more details.\n\n## Replica\nEvaluate on one scene (take scan 1 room0 for example):\n```\ncd replica_eval\npython evaluate_single_scene.py --input_mesh replica_scan1_mesh.ply --scan_id 1 --output_dir replica_scan1\n```\n\nWe also provide a script for evaluating all Replica scenes:\n```\ncd replica_eval\npython evaluate.py\n```\nPlease check the script for more details.\n\n## ScanNet\n```\ncd scannet_eval\npython evaluate.py\n```\nPlease check the script for more details.\n\n## Tanks and Temples\nYou need to submit the reconstruction results to the [official evaluation server](https://www.tanksandtemples.org); please follow their guidance. We also provide an example of our submission [here](https://drive.google.com/file/d/1Cr-UVTaAgDk52qhVd880Dd8uF74CzpcB/view?usp=sharing) for reference.\n\n# Custom dataset\nWe provide an example of how to train monosdf on custom data (Apartment scene from nice-slam). 
First, download the dataset and run the script to subsample training images, normalize camera poses, etc.\n```\nbash scripts/download_apartment.sh \ncd preprocess\npython nice_slam_apartment_to_monosdf.py\n```\n\nThen, we can extract monocular depths and normals (please install the [omnidata model](https://github.com/EPFL-VILAB/omnidata) before running the command):\n```\npython extract_monocular_cues.py --task depth --img_path ../data/Apartment/scan1/image --output_path ../data/Apartment/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS\npython extract_monocular_cues.py --task normal --img_path ../data/Apartment/scan1/image --output_path ../data/Apartment/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS\n```\n\nFinally, we train monosdf as\n```\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/nice_slam_grids.conf\n```\n\n# Pretrained Models\nFirst download the pretrained models with\n```\nbash scripts/download_pretrained.sh\n```\nThen you can run inference with (DTU for example)\n```\ncd code\npython evaluation/eval.py --conf confs/dtu_mlp_3views.conf --checkpoint ../pretrained_models/dtu_3views_mlp/scan65.pth --scan_id 65 --resolution 512 --eval_rendering --evals_folder ../pretrained_results\n```\n\nYou can also run the following script to extract all the meshes:\n```\npython scripts/extract_all_meshes_from_pretrained_models.py\n```\n\n# High-resolution Cues\nHere we provide a script to generate high-resolution cues and to train with them. Please refer to our supplementary material for more details.\n\nFirst you need to download the Tanks and Temples dataset from [here](https://drive.google.com/file/d/1YArOJaX9WVLJh4757uE8AEREYkgszrCo/view) and unzip it to ```data/tanksandtemples```. 
Then you can run the script to create overlapping patches \n```\ncd preprocess\npython generate_high_res_map.py --mode create_patches\n```\n\nand run the Omnidata model to predict monocular cues for each patch \n```\npython extract_monocular_cues.py --task depth --img_path ./highres_tmp/scan1/image/ --output_path ./highres_tmp/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS\npython extract_monocular_cues.py --task normal --img_path ./highres_tmp/scan1/image/ --output_path ./highres_tmp/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS\n```\nThis step will take a long time (~2 hours) since there are many patches and the model only uses a batch size of 1. \n\nThen run the script again to merge the output of Omnidata.\n```\npython generate_high_res_map.py --mode merge_patches\n```\n\nNow you can train the model with\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_highres_grids_courtroom.conf\n```\n\nPlease note that the script for generating high-resolution cues only works for the Tanks and Temples dataset. You need to adapt it if you want to apply it to other datasets.\n\n# Acknowledgements\nThis project is built upon [VolSDF](https://github.com/lioryariv/volsdf). We use pretrained [Omnidata](https://omnidata.vision) for monocular depth and normal extraction. The CUDA implementation of multi-resolution hash encoding is based on [torch-ngp](https://github.com/ashawkey/torch-ngp). Evaluation scripts for DTU, Replica, and ScanNet are taken from [DTUeval-python](https://github.com/jzhangbs/DTUeval-python), [Nice-SLAM](https://github.com/cvg/nice-slam) and [manhattan-sdf](https://github.com/zju3dv/manhattan_sdf) respectively. We thank all the authors for their great work and repos. 
\n\n\n# Citation\nIf you find our code or paper useful, please cite\n```bibtex\n@article{Yu2022MonoSDF,\n  author    = {Yu, Zehao and Peng, Songyou and Niemeyer, Michael and Sattler, Torsten and Geiger, Andreas},\n  title     = {MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction},\n  journal   = {Advances in Neural Information Processing Systems (NeurIPS)},\n  year      = {2022},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautonomousvision%2Fmonosdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fautonomousvision%2Fmonosdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautonomousvision%2Fmonosdf/lists"}