{"id":15032994,"url":"https://github.com/nianticlabs/simplerecon","last_synced_at":"2025-05-16T09:04:38.839Z","repository":{"id":58368555,"uuid":"513074143","full_name":"nianticlabs/simplerecon","owner":"nianticlabs","description":"[ECCV 2022] SimpleRecon: 3D Reconstruction Without 3D Convolutions","archived":false,"fork":false,"pushed_at":"2025-05-09T13:05:38.000Z","size":13307,"stargazers_count":1352,"open_issues_count":10,"forks_count":124,"subscribers_count":34,"default_branch":"main","last_synced_at":"2025-05-09T14:02:03.999Z","etag":null,"topics":["computer-vision","cost-volume","depth","depth-estimation","eccv2022","multi-view-stereo","mvs","pytorch","scannet","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nianticlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-12T09:12:15.000Z","updated_at":"2025-05-09T13:05:41.000Z","dependencies_parsed_at":"2024-10-29T18:35:37.179Z","dependency_job_id":null,"html_url":"https://github.com/nianticlabs/simplerecon","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fsimplerecon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fsimplerecon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fsimplerecon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nian
ticlabs%2Fsimplerecon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nianticlabs","download_url":"https://codeload.github.com/nianticlabs/simplerecon/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254501556,"owners_count":22081528,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cost-volume","depth","depth-estimation","eccv2022","multi-view-stereo","mvs","pytorch","scannet","visualization"],"created_at":"2024-09-24T20:19:53.649Z","updated_at":"2025-05-16T09:04:33.830Z","avatar_url":"https://github.com/nianticlabs.png","language":"Python","readme":"# SimpleRecon: 3D Reconstruction Without 3D Convolutions\n\nThis is the reference PyTorch implementation for training and testing MVS depth estimation models using the method described in\n\n\u003e **SimpleRecon: 3D Reconstruction Without 3D Convolutions**\n\u003e\n\u003e [Mohamed Sayed](https://masayed.com), [John Gibson](https://www.linkedin.com/in/john-e-gibson-ii/), [Jamie Watson](https://www.linkedin.com/in/jamie-watson-544825127/), [Victor Adrian Prisacariu](https://www.robots.ox.ac.uk/~victor/), [Michael Firman](http://www.michaelfirman.co.uk), and [Clément Godard](http://www0.cs.ucl.ac.uk/staff/C.Godard/)\n\u003e\n\u003e [Paper, ECCV 2022 (arXiv pdf)](https://arxiv.org/abs/2208.14743), [Supplemental Material](https://nianticlabs.github.io/simplerecon/resources/SimpleRecon_supp.pdf), [Project Page](https://nianticlabs.github.io/simplerecon/), [Video](https://youtu.be/3LP8jp45Ef8)\n\n\u003cp 
align=\"center\"\u003e\n  \u003cimg src=\"media/teaser.jpeg\" alt=\"example output\" width=\"720\" /\u003e\n\u003c/p\u003e\n\nhttps://github.com/nianticlabs/simplerecon/assets/14994206/ae5074c2-6537-45f1-9f5e-0b3646a96dcb\n\nhttps://user-images.githubusercontent.com/14994206/189788536-5fa8a1b5-ae8b-4f64-92d6-1ff1abb03eaf.mp4\n\nThis code is for non-commercial use; please see the [license file](LICENSE) for terms. If you do find any part of this codebase helpful, please cite our paper using the BibTeX below and link this repo. Thanks!\n\n## 🆕 Updates\n\n25/05/2023: Fixed package versions for `llvm-openmp`, `clang`, and `protobuf`. Do use this new environment file if you have trouble running the code and/or if dataloading is being limited to a single thread.\n\n09/03/2023: Added a kornia version to the environment file to fix a kornia typing issue. (thanks @natesimon!)\n\n26/01/2023: The license has been modified to make running the model for academic reasons easier. Please see the LICENSE file for the exact details.\n\nThere is an update as of 31/12/2022 that fixes slightly wrong intrinsics, flip augmentation for the cost volume, and a \nnumerical precision bug in projection. All scores improve. You will need to update your forks and use new weights. 
See [Bug Fixes](#-bug-fixes).\n\nPrecomputed scans for online default frames are here: https://drive.google.com/drive/folders/1dSOFI9GayYHQjsx4I_NG0-3ebCAfWXjV?usp=share_link \n\n## Table of Contents\n\n  * [🗺️ Overview](#%EF%B8%8F-overview)\n  * [⚙️ Setup](#%EF%B8%8F-setup)\n  * [📦 Models](#-models)\n  * [🚀 Speed](#-speed)\n  * [📝 TODOs:](#-todos)\n  * [🏃 Running out of the box!](#-running-out-of-the-box)\n  * [💾 ScanNetv2 Dataset](#-scannetv2-dataset)\n  * [🖼️🖼️🖼️ Frame Tuples](#%EF%B8%8F%EF%B8%8F%EF%B8%8F-frame-tuples)\n  * [📊 Testing and Evaluation](#-testing-and-evaluation)\n  * [👉☁️ Point Cloud Fusion](#%EF%B8%8F-point-cloud-fusion)\n  * [📊 Mesh Metrics](#-mesh-metrics)\n  * [⏳ Training](#-training)\n    + [🎛️ Finetuning a pretrained model](#%EF%B8%8F-finetuning-a-pretrained-model)\n  * [🔧 Other training and testing options](#-other-training-and-testing-options)\n  * [✨ Visualization](#-visualization)\n  * [📝🧮👩‍💻 Notation for Transformation Matrices](#-notation-for-transformation-matrices)\n  * [🗺️ World Coordinate System](#%EF%B8%8F-world-coordinate-system)\n  * [🐜🔧 Bug Fixes](#-bug-fixes)\n  * [🗺️💾 COLMAP Dataset](#%EF%B8%8F-colmap-dataset)\n  * [🙏 Acknowledgements](#-acknowledgements)\n  * [📜 BibTeX](#-bibtex)\n  * [👩‍⚖️ License](#%EF%B8%8F-license)\n\n## 🗺️ Overview\n\nSimpleRecon takes as input posed RGB images, and outputs a depth map for a target image.\n\n## ⚙️ Setup\n\nAssuming a fresh [Anaconda](https://www.anaconda.com/download/) distribution, you can install dependencies with:\n```shell\nconda env create -f simplerecon_env.yml\n```\nWe ran our experiments with PyTorch 1.10, CUDA 11.3, Python 3.9.7 and Debian GNU/Linux 10.\n\n## 📦 Models\n\nDownload a pretrained model into the `weights/` folder.\n\nWe provide the following models (scores are with online default keyframes):\n\n| `--config`  | Model  | Abs Diff↓| Sq Rel↓ | delta \u003c 1.05↑| Chamfer↓ | F-Score↑ 
|\n|-------------|----------|--------------------|---------|---------|--------------|----------|\n| [`hero_model.yaml`](https://drive.google.com/file/d/1hCuKZjEq-AghrYAmFxJs_4eeixIlP488/view?usp=sharing) | Metadata + Resnet Matching | 0.0868 | 0.0127 | 74.26 | 5.69 | 0.680 |\n| [`dot_product_model.yaml`](https://drive.google.com/file/d/13lW-VPgsl2eAo95E87RKWoK8KUZelkUK/view?usp=sharing) | Dot Product + Resnet Matching | 0.0910 | 0.0134 | 71.90 | 5.92 | 0.667 |\n\n`hero_model` is the one we use in the paper as **Ours**.\n\n## 🚀 Speed\n\n| `--config` |  Model | Inference Speed (`--batch_size 1`) | Inference GPU memory  | Approximate training time   |\n|------------|------------|------------|-------------------------|-----------------------------|\n| `hero_model` | Hero, Metadata + Resnet | 130ms / 70ms (speed optimized) | 2.6GB / 5.7GB (speed optimized)        | 36 hours                    |\n| `dot_product_model` | Dot Product + Resnet | 80ms | 2.6GB        | 36 hours                    |\n\nWith larger batches, speed increases considerably. With batch size 8 on the non-speed-optimized model, the latency drops to \n~40ms.\n\n## 📝 TODOs:\n- [x] Simple scan for folks to quickly try the code, instead of downloading the ScanNetv2 test scenes. DONE\n- [x] ScanNetv2 extraction, ~~ETA 10th October~~ DONE\n- [ ] FPN model weights.\n- ~~[ ] Tutorial on how to use Scanniverse data, ETA 5th October 10th October 20th October~~ At present there is no publicly available way of exporting scans from Scanniverse. You'll have to use ios-logger; NeuralRecon have a good tutorial on [this](https://github.com/zju3dv/NeuralRecon/blob/master/DEMO.md), and a dataloader that accepts the processed format is at ```datasets/arkit_dataset.py```. 
UPDATE: There is now a quick readme [data_scripts/IOS_LOGGER_ARKIT_README.md](data_scripts/IOS_LOGGER_ARKIT_README.md) for how to process and run inference on an ios-logger scan using the script at ```data_scripts/ios_logger_preprocessing.py```.\n\n## 🏃 Running out of the box!\n\nWe've now included two scans for people to try out immediately with the code. You can download these scans [from here](https://drive.google.com/file/d/1x-auV7vGCMdu5yZUMPcoP83p77QOuasT/view?usp=sharing).\n\nSteps:\n1. Download weights for the `hero_model` into the weights directory.\n2. Download the scans and unzip them to a directory of your choosing.\n3. Modify the value for the option `dataset_path` in `configs/data/vdr_dense.yaml` to the base path of the unzipped vdr folder.\n4. You should be able to run it! Something like this will work:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/vdr_dense.yaml \\\n            --num_workers 8 \\\n            --batch_size 2 \\\n            --fast_cost_volume \\\n            --run_fusion \\\n            --depth_fuser open3d \\\n            --fuse_color \\\n            --dump_depth_visualization;\n```\n\nThis will output meshes, quick depth viz, and scores when benchmarked against LiDAR depth under `OUTPUT_PATH`. \n\nThis command uses `vdr_dense.yaml`, which will generate depths for every frame and fuse them into a mesh. In the paper we report scores with fused keyframes instead, and you can run those using `vdr_default.yaml`. You can also use `dense_offline` tuples by instead using `vdr_dense_offline.yaml`.\n\n\n\nSee the section below on testing and evaluation. Make sure to use the correct config flags for datasets. 
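Step 3 above can also be scripted. A minimal stdlib sketch (this helper is hypothetical and not part of the repo; only the `dataset_path` key name comes from the config files under `configs/data/`):

```python
from pathlib import Path

def set_dataset_path(config_file: str, new_path: str) -> None:
    """Rewrite the dataset_path entry in a SimpleRecon data config.

    Hypothetical helper: does a line-based replace so the rest of the
    YAML file is left untouched.
    """
    lines = Path(config_file).read_text().splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("dataset_path"):
            lines[i] = f"dataset_path: {new_path}"
    Path(config_file).write_text("\n".join(lines) + "\n")
```

For example, `set_dataset_path("configs/data/vdr_dense.yaml", "/path/to/vdr")`, with `/path/to/vdr` replaced by wherever you unzipped the scans.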
\n\n## 💾 ScanNetv2 Dataset\n\n~~Please follow the instructions [here](https://github.com/ScanNet/ScanNet) to download the dataset. This dataset is quite big (\u003e2TB), so make sure you have enough space, especially for extracting files.~~\n\n~~Once downloaded, use this [script](https://github.com/ScanNet/ScanNet/tree/master/SensReader/python) to export raw sensor data to images and depth files.~~\n\nWe've written a quick tutorial and included modified scripts to help you with downloading and extracting ScanNetv2. You can find them at [data_scripts/scannet_wrangling_scripts/](data_scripts/scannet_wrangling_scripts)\n\nYou should change the `dataset_path` config argument for ScanNetv2 data configs at `configs/data/` to match where your dataset is.\n\nThe codebase expects ScanNetv2 to be in the following format:\n\n    dataset_path\n        scans_test (test scans)\n            scene0707\n                scene0707_00_vh_clean_2.ply (gt mesh)\n                sensor_data\n                    frame-000261.pose.txt\n                    frame-000261.color.jpg \n                    frame-000261.color.512.png (optional, image at 512x384)\n                    frame-000261.color.640.png (optional, image at 640x480)\n                    frame-000261.depth.png (full res depth, stored scale *1000)\n                    frame-000261.depth.256.png (optional, depth at 256x192 also\n                                                scaled)\n                scene0707.txt (scan metadata and image sizes)\n                intrinsic\n                    intrinsic_depth.txt\n                    intrinsic_color.txt\n            ...\n        scans (val and train scans)\n            scene0000_00\n                (see above)\n            scene0000_01\n            ....\n\nIn this example `scene0707.txt` should contain the scan's metadata:\n\n        colorHeight = 968\n        colorToDepthExtrinsics = 0.999263 -0.010031 0.037048 ........\n        colorWidth = 1296\n        depthHeight = 480\n     
   depthWidth = 640\n        fx_color = 1170.187988\n        fx_depth = 570.924255\n        fy_color = 1170.187988\n        fy_depth = 570.924316\n        mx_color = 647.750000\n        mx_depth = 319.500000\n        my_color = 483.750000\n        my_depth = 239.500000\n        numColorFrames = 784\n        numDepthFrames = 784\n        numIMUmeasurements = 1632\n\n`frame-000261.pose.txt` should contain pose in the form:\n\n        -0.384739 0.271466 -0.882203 4.98152\n        0.921157 0.0521417 -0.385682 1.46821\n        -0.0587002 -0.961035 -0.270124 1.51837\n\n`frame-000261.color.512.png` and `frame-000261.color.640.png` are precached resized versions of the original image to save load and compute time during training and testing. `frame-000261.depth.256.png` is also a \nprecached resized version of the depth map. \n\nAll resized precached versions of depth and images are nice to have but not \nrequired. If they don't exist, the full resolution versions will be loaded, and downsampled on the fly.\n\n\n## 🖼️🖼️🖼️ Frame Tuples\n\nBy default, we estimate a depth map for each keyframe in a scan. We use DeepVideoMVS's heuristic for keyframe separation and construct tuples to match. We use the depth maps at these keyframes for depth fusion. For each keyframe, we associate a list of source frames that will be used to build the cost volume. We also use dense tuples, where we predict a depth map for each frame in the data, and not just at specific keyframes; these are mostly used for visualization.\n\nWe generate and export a list of tuples across all scans that act as the dataset's elements. We've precomputed these lists and they are available at `data_splits` under each dataset's split. For ScanNet's test scans they are at `data_splits/ScanNetv2/standard_split`. 
Our core depth numbers are computed using `data_splits/ScanNetv2/standard_split/test_eight_view_deepvmvs.txt`.\n\n\n\nHere's a quick taxonomy of the types of tuples for test:\n\n- `default`: a tuple for every keyframe following DeepVideoMVS where all source frames are in the past. Used for all depth and mesh evaluation unless stated otherwise. For ScanNet use `data_splits/ScanNetv2/standard_split/test_eight_view_deepvmvs.txt`.\n- `offline`: a tuple for every keyframe in the scan where source frames can be both in the past and future relative to the current frame. These are useful when a scene is captured offline, and you want the best accuracy possible. With online tuples, the cost volume will contain empty regions as the camera moves away and all source frames lag behind; however with offline tuples, the cost volume is full on both ends, leading to a better scale (and metric) estimate.\n- `dense`: an online tuple (like default) for every frame in the scan where all source frames are in the past. For ScanNet this would be `data_splits/ScanNetv2/standard_split/test_eight_view_deepvmvs_dense.txt`.\n- `dense_offline`: an offline tuple for every frame in the scan.\n\n\nFor the train and validation sets, we follow the same tuple augmentation strategy as in DeepVideoMVS and use the same core generation script.\n\nIf you'd like to generate these tuples yourself, you can use the scripts at `data_scripts/generate_train_tuples.py` for train tuples and `data_scripts/generate_test_tuples.py` for test tuples. 
These follow the same config format as `test.py` and will use whatever dataset class you build to read pose information.\n\nExample for test:\n\n```bash\n# default tuples\npython ./data_scripts/generate_test_tuples.py \\\n    --data_config configs/data/scannet_default_test.yaml \\\n    --num_workers 16\n\n# dense tuples\npython ./data_scripts/generate_test_tuples.py \\\n    --data_config configs/data/scannet_dense_test.yaml \\\n    --num_workers 16\n```\n\nExamples for train:\n\n```bash\n# train\npython ./data_scripts/generate_train_tuples.py \\\n    --data_config configs/data/scannet_default_train.yaml \\\n    --num_workers 16\n\n# val\npython ./data_scripts/generate_val_tuples.py \\\n    --data_config configs/data/scannet_default_val.yaml \\\n    --num_workers 16\n```\n\nThese scripts will first check each frame in the dataset to make sure it has an existing RGB frame, an existing depth frame (if appropriate for the dataset), and also an existing and valid pose file. They will save these `valid_frames` in a text file in each scan's folder, but if the directory is read-only, they will skip saving a `valid_frames` file and generate tuples anyway.\n\n\n## 📊 Testing and Evaluation\n\nYou can use `test.py` for inferring and evaluating depth maps and fusing meshes. \n\nAll results will be stored at a base results folder (results_path) at:\n\n    opts.output_base_path/opts.name/opts.dataset/opts.frame_tuple_type/\n\nwhere opts is the `options` class. For example, when `opts.output_base_path` is `./results`, `opts.name` is `HERO_MODEL`,\n`opts.dataset` is `scannet`, and `opts.frame_tuple_type` is `default`, the output directory will be \n\n    ./results/HERO_MODEL/scannet/default/\n\nMake sure to set `--output_base_path` to a directory suitable for you to store results.\n\n`--frame_tuple_type` is the type of image tuple used for MVS. A selection should \nbe provided in the `data_config` file you used. 
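The path rule above can be sketched as follows (a hypothetical helper for illustration; the real path is assembled inside `test.py` from the options class):

```python
from pathlib import Path

def results_path(output_base_path: str, name: str, dataset: str,
                 frame_tuple_type: str) -> Path:
    """Mirror test.py's output layout: base/name/dataset/frame_tuple_type."""
    return Path(output_base_path) / name / dataset / frame_tuple_type

print(results_path("./results", "HERO_MODEL", "scannet", "default").as_posix())
# → results/HERO_MODEL/scannet/default
```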
\n\nBy default `test.py` will attempt to compute depth scores for each frame and provide both frame-averaged and scene-averaged metrics. The script will save these scores (per scene and totals) under `results_path/scores`.\n\nWe've done our best to ensure that a torch batching bug through the matching \nencoder is fixed (testing is accurate to within 10^-4) by disabling image batching \nthrough that encoder. Run `--batch_size 4` at most if in doubt, and if \nyou're looking to get as stable as possible numbers and avoid PyTorch \ngremlins, use `--batch_size 1` for comparison evaluation.\n\nIf you want to use this for speed, set `--fast_cost_volume` to True. This will\nenable batching through the matching encoder and will enable an einops \noptimized feature volume.\n\n\n```bash\n# Example command to just compute scores \nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_default_test.yaml \\\n            --num_workers 8 \\\n            --batch_size 4;\n\n# If you'd like to get a super fast version use:\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_default_test.yaml \\\n            --num_workers 8 \\\n            --fast_cost_volume \\\n            --batch_size 2;\n```\n\nThis script can also be used to perform a few different auxiliary tasks, \nincluding:\n\n**TSDF Fusion**\n\nTo run TSDF fusion, provide the `--run_fusion` flag. You have two choices for \nfusers:\n1) `--depth_fuser ours` (default) will use our fuser, whose meshes are used \n    in most visualizations and for scores. 
This fuser does not support \n    color. We've provided a custom branch of scikit-image with our custom\n    implementation of `measure.marching_cubes` that allows single-walled meshes. We use \n    single-walled meshes for evaluation. If this isn't important to you, you\n    can set `export_single_mesh` to `False` for the call to `export_mesh` in `test.py`.\n2) `--depth_fuser open3d` will use the open3d depth fuser. This fuser \n    supports color and you can enable this by using the `--fuse_color` flag. \n\nBy default, depth maps will be clipped to 3m for fusion and a TSDF \nresolution of 0.04m\u003csup\u003e3\u003c/sup\u003e will be used, but you can change that by changing both \n`--max_fusion_depth` and `--fusion_resolution`.\n\nYou can optionally ask for predicted depths used for fusion to be masked \nwhen no valid MVS information exists using `--mask_pred_depths`. This is not \nenabled by default.\n\nYou can also fuse the best-guess depths from the cost volume before the \ncost volume encoder-decoder that introduces a strong image prior. You can do this by using \n`--fusion_use_raw_lowest_cost`.\n\nMeshes will be stored under `results_path/meshes/`.\n\n```bash\n# Example command to fuse depths to get meshes\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_default_test.yaml \\\n            --num_workers 8 \\\n            --run_fusion \\\n            --batch_size 8;\n```\n\n**Cache depths**\n\nYou can optionally store depths by providing the `--cache_depths` flag. 
\nThey will be stored at `results_path/depths`.\n\n```bash\n# Example command to compute scores and cache depths\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_default_test.yaml \\\n            --num_workers 8 \\\n            --cache_depths \\\n            --batch_size 8;\n\n# Example command to fuse depths to get color meshes\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_default_test.yaml \\\n            --num_workers 8 \\\n            --run_fusion \\\n            --depth_fuser open3d \\\n            --fuse_color \\\n            --batch_size 4;\n```\n**Quick viz**\n\nThere are other scripts for deeper visualizations of output depths and \nfusion, but for quick export of depth map visualization you can use \n`--dump_depth_visualization`. Visualizations will be stored at `results_path/viz/quick_viz/`.\n\n\n```bash\n# Example command to output quick depth visualizations\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_default_test.yaml \\\n            --num_workers 8 \\\n            --dump_depth_visualization \\\n            --batch_size 4;\n```\n## 👉☁️ Point Cloud Fusion\n\nWe also allow point cloud fusion of depth maps using the fuser from 3DVNet's [repo](https://github.com/alexrich021/3dvnet/blob/main/mv3d/eval/pointcloudfusion_custom.py). 
\n\n```bash\n# Example command to fuse depths into point clouds.\nCUDA_VISIBLE_DEVICES=0 python pc_fusion.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_dense_test.yaml \\\n            --num_workers 8 \\\n            --batch_size 4;\n```\n\nChange `configs/data/scannet_dense_test.yaml` to `configs/data/scannet_default_test.yaml` to use only keyframes, if you don't want to wait too long.\n\n## 📊 Mesh Metrics\n\nWe use TransformerFusion's [mesh evaluation](https://github.com/AljazBozic/TransformerFusion/blob/main/src/evaluation/eval.py) for our main results table but set the seed to a fixed value for consistency when randomly sampling meshes. We also report mesh metrics using NeuralRecon's [evaluation](https://github.com/zju3dv/NeuralRecon/blob/master/tools/evaluation.py) in the supplemental material.\n\nFor point cloud evaluation, we use TransformerFusion's code but load in a point cloud in place of sampling a mesh's surface.\n\n\n\n## ⏳ Training\n\nBy default, models and tensorboard event files are saved to `~/tmp/tensorboard/\u003cmodel_name\u003e`.\nThis can be changed with the `--log_dir` flag.\n\nWe train with a batch_size of 16 at 16-bit precision on two A100s on the default ScanNetv2 split.\n\nExample command to train with two GPUs:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python train.py --name HERO_MODEL \\\n            --log_dir logs \\\n            --config_file configs/models/hero_model.yaml \\\n            --data_config configs/data/scannet_default_train.yaml \\\n            --gpus 2 \\\n            --batch_size 16;\n```\n\n\nThe code supports any number of GPUs for training.\nYou can specify which GPUs to use with the `CUDA_VISIBLE_DEVICES` environment variable.\n\nAll our training runs were performed on two NVIDIA A100s.\n\n**Different dataset**\n\nYou can train on 
a custom MVS dataset by writing a new dataloader class which inherits from `GenericMVSDataset` at `datasets/generic_mvs_dataset.py`. See the `ScannetDataset` class in `datasets/scannet_dataset.py` or indeed any other class in `datasets` for an example.\n\n\n### 🎛️ Finetuning a pretrained model\n\nTo finetune, simply load a checkpoint (not resume!) and train from there:\n```shell\nCUDA_VISIBLE_DEVICES=0 python train.py --config_file configs/models/hero_model.yaml \\\n                --data_config configs/data/scannet_default_train.yaml \\\n                --load_weights_from_checkpoint weights/hero_model.ckpt\n```\n\nChange the data configs to whatever dataset you want to finetune on. \n\n## 🔧 Other training and testing options\n\nSee `options.py` for the range of other training options, such as learning rates and ablation settings, and testing options.\n\n## ✨ Visualization\n\nOther than quick depth visualization in the `test.py` script, there are two scripts for visualizing depth output. \n\nThe first is `visualization_scripts/visualize_scene_depth_output.py`. This will produce a video with color images of the reference and source frames, depth prediction, cost volume estimate, GT depth, and estimated normals from depth. The script assumes you have cached depth output using `test.py` and accepts the same command template format as `test.py`:\n\n```shell\n# Example command to get visualizations for dense frames\nCUDA_VISIBLE_DEVICES=0 python ./visualization_scripts/visualize_scene_depth_output.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --data_config configs/data/scannet_dense_test.yaml \\\n            --num_workers 8;\n```\n\nwhere `OUTPUT_PATH` is the base results directory for SimpleRecon (what you used for test to begin with). 
You could optionally run `./visualization_scripts/generate_gt_min_max_cache.py` before this script to get a scene average for the min and max depth values used for colormapping; if those aren't available, the script will use 0m and 5m for the colormapping min and max.\n\nThe second script allows a live visualization of meshing. This script will use cached depth maps if available; otherwise it will use the model to predict them before fusion. The script will iteratively load in a depth map, fuse it, save a mesh file at this step, and render this mesh alongside a camera marker for the bird's-eye video, and from the point of view of the camera for the fpv video. \n\n```shell\n# Example command to get live visualizations for mesh reconstruction\nCUDA_VISIBLE_DEVICES=0 python visualize_live_meshing.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs/models/hero_model.yaml \\\n            --load_weights_from_checkpoint weights/hero_model.ckpt \\\n            --data_config configs/data/scannet_dense_test.yaml \\\n            --num_workers 8;\n```\n\nBy default the script will save meshes to an intermediate location, and you can optionally load those meshes to save time when visualizing the same meshes again by passing `--use_precomputed_partial_meshes`. All intermediate meshes must have been computed on a previous run for this to work.\n\n## 📝🧮👩‍💻 Notation for Transformation Matrices\n\n__TL;DR:__ `world_T_cam == world_from_cam`  \nThis repo uses the notation \"cam_T_world\" to denote a transformation from world to camera points (extrinsics). The intention is to make it so that the coordinate frame names would match on either side of the variable when used in multiplication from *right to left*:\n\n    cam_points = cam_T_world @ world_points\n\n`world_T_cam` denotes camera pose (from cam to world coords). `ref_T_src` denotes a transformation from a source to a reference view.  
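As a toy NumPy illustration of the convention (not repo code), frame names "cancel" when composing right to left, e.g. `ref_T_src = ref_T_world @ world_T_src`; the pose and point values below are made up:

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# A camera rotated 90 degrees about z and positioned at (1, 2, 3) in world coords.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
world_T_cam = make_T(Rz, [1.0, 2.0, 3.0])   # camera pose (cam to world)
cam_T_world = np.linalg.inv(world_T_cam)    # extrinsics (world to cam)

world_point = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous world point
cam_point = cam_T_world @ world_point         # cam_points = cam_T_world @ world_points

# The camera centre maps to the camera-frame origin, and the round trip is identity.
assert np.allclose(cam_point[:3], 0.0)
assert np.allclose(world_T_cam @ cam_point, world_point)
```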
\nFinally, this notation allows for representing both rotations and translations, such as `world_R_cam` and `world_t_cam`.\n\n## 🗺️ World Coordinate System\n\nThis repo is geared towards ScanNet, so while its functionality should allow for any coordinate system (signaled via input flags), the model weights we provide assume a ScanNet coordinate system. This is important since we include ray information as part of metadata. Other datasets used with these weights should be transformed to the ScanNet system. The dataset classes we include will perform the appropriate transforms. \n\n## 🐜🔧 Bug Fixes\n\n### **Update 31/12/2022:**\n\nThere are a few bugs addressed in this update; you will need to update your forks and use the new weights from the table near the beginning of this README. You will also need to make sure you have the correct intrinsics files extracted using the reader.\n- We were initially using a slightly incorrect set of intrinsics in ScanNet. The repo now uses intrinsics from the intrinsics folder.\n- The MLP in the cost volume wasn't seeing any flip augmentation, which led to biases around edges, so we've now included a geometry-based flip in the base dataset class. It is enabled only for the train split.\n- We had a bug in projection that never allowed the mask in the cost volume to properly function, so we've now switched to using the same normalization as in OpenCV and Kornia.\n\nThanks to all those who pointed these out and were patient while we worked on fixes. \n\nAll scores improve with these fixes, and the associated weights are uploaded here. 
For old scores, code, and weights, check this commit hash: 7de5b451e340f9a11c7fd67bd0c42204d0b009a9\n\nFull scores for models with bug fixes:\n\n_Depth_\n| `--config`  | Abs Diff↓ | Abs Rel↓ | Sq Rel↓ |  RMSE↓  |  log RMSE↓  | delta \u003c 1.05↑ | delta \u003c 1.10↑ |\n|-------------|-----------|----------|---------|---------|-------------|--------------|---------------|\n| `hero_model.yaml`, Metadata + Resnet  | 0.0868 | 0.0428 | 0.0127 | 0.1472 |  0.0681 | 74.26 | 90.88 |\n| `dot_product_model.yaml`, dot product + Resnet | 0.0910 | 0.0453 | 0.0134 | 0.1509 | 0.0704 | 71.90 | 89.75 | \n\n_Mesh Fusion_\n| `--config`  | Acc↓ | Comp↓ | Chamfer↓ | Recall↑ | Precision↑ | F-Score↑ |\n|-------------|------|-------|----------|---------|------------|----------|\n| `hero_model.yaml`, Metadata + Resnet | 5.41 | 5.98 | 5.69 | 0.695 | 0.668 | 0.680 |\n| `dot_product_model.yaml`, dot product + Resnet | 5.66 | 6.18 | 5.92 | 0.682 | 0.655 | 0.667 | \n\n\n_Comparison:_\n| `--config`  | Model  | Abs Diff↓| Sq Rel↓ | delta \u003c 1.05↑| Chamfer↓ | F-Score↑ |\n|-------------|----------|--------------------|---------|---------|--------------|----------|\n| `hero_model.yaml` | Metadata + Resnet Matching | 0.0868 | 0.0127 | 74.26 | 5.69 | 0.680 |\n| OLD `hero_model.yaml` | Metadata + Resnet Matching | 0.0885 | 0.0125 | 73.16 | 5.81 | 0.671 |\n| `dot_product_model.yaml` | Dot Product + Resnet Matching | 0.0910 | 0.0134 | 71.90 | 5.92 | 0.667 |\n| OLD `dot_product_model.yaml` | Dot Product + Resnet Matching | 0.0941 | 0.0139 | 70.48 | 6.29 | 0.642 |\n\n\n### **Tiny bug with frame count:**\n\nInitially this repo spat out tuple files for default DVMVS-style keyframes with 9 extra frames (25599 instead of 25590) for the ScanNetv2 test set. There was a minor bug with handling lost tracking that's now fixed. This repo should now mimic the DVMVS keyframe buffer exactly, with 25590 keyframes for testing. 
The only effect of this bug was the inclusion of 9 extra frames; all the other tuples were exactly the same as those of DVMVS. The offending frames are in these scans:

```
scan         previous count  new count
--------------------------------------
scene0711_00 393             392
scene0727_00 209             208
scene0736_00 1023            1022
scene0737_00 408             407
scene0751_00 165             164
scene0775_00 220             219
scene0791_00 227             226
scene0794_00 141             140
scene0795_00 102             101
```

The tuple files for the default test have been updated. Since the extra frames are only a small (~3e-4) fraction of the frames scored, the scores are unchanged.

## 🗺️💾 COLMAP Dataset

__TL;DR:__ Scale your poses and crop your images.

We provide a dataloader for loading images from a COLMAP sparse reconstruction. For this to work with SimpleRecon, you'll need to crop your images to match the FOV of ScanNet (roughly similar to an iPhone's FOV in video mode), and scale your poses' locations using known real-world measurements.
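The two preprocessing steps could be sketched roughly as below. The helper names (`scale_translation`, `centre_crop_intrinsics`) and the example crop size are hypothetical assumptions for illustration, not part of the repo's dataloader:

```python
import numpy as np

def scale_translation(world_T_cam, metres_per_unit):
    """Rescale the translation of a 4x4 pose from COLMAP units to metres.

    COLMAP reconstructions have an arbitrary scale; metres_per_unit would
    come from a known real-world distance measured in the scene.
    """
    T = world_T_cam.copy()
    T[:3, 3] *= metres_per_unit
    return T

def centre_crop_intrinsics(K, old_wh, new_wh):
    """Shift the principal point for a centre crop to narrow the FOV.

    Focal lengths are unchanged; only cx/cy move by half the cropped margin.
    """
    K = K.copy()
    K[0, 2] -= (old_wh[0] - new_wh[0]) / 2.0
    K[1, 2] -= (old_wh[1] - new_wh[1]) / 2.0
    return K
```

For example, centre-cropping a 1920x1080 frame to 1440x1080 shifts the principal point left by 240 px, while a pose scale factor rescales only the translation column of each `world_T_cam`.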
If these steps aren't taken, the cost volume won't be built correctly, and the network will not estimate depth properly.

## 🙏 Acknowledgements

We thank Aljaž Božič of [TransformerFusion](https://github.com/AljazBozic/TransformerFusion), Jiaming Sun of [Neural Recon](https://zju3dv.github.io/neuralrecon/), and Arda Düzçeker of [DeepVideoMVS](https://github.com/ardaduz/deep-video-mvs) for quickly providing useful information to help with baselines and for making their codebases readily available, especially on short notice.

The tuple generation scripts make heavy use of a modified version of DeepVideoMVS's [Keyframe buffer](https://github.com/ardaduz/deep-video-mvs/blob/master/dvmvs/keyframe_buffer.py) (thanks again Arda and co!).

The PyTorch point cloud fusion code at `torch_point_cloud_fusion` is borrowed from 3DVNet's [repo](https://github.com/alexrich021/3dvnet/blob/main/mv3d/eval/pointcloudfusion_custom.py). Thanks Alexander Rich!

We'd also like to thank Niantic's infrastructure team for quick actions when we needed them. Thanks folks!

Mohamed is funded by a Microsoft Research PhD Scholarship (MRL 2018-085).

## 📜 BibTeX

If you find our work useful in your research, please consider citing our paper:

```
@inproceedings{sayed2022simplerecon,
  title={SimpleRecon: 3D Reconstruction Without 3D Convolutions},
  author={Sayed, Mohamed and Gibson, John and Watson, Jamie and Prisacariu, Victor and Firman, Michael and Godard, Cl{\'e}ment},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022},
}
```

## 👩‍⚖️ License

Copyright © Niantic, Inc. 2022.
Patent Pending.
All rights reserved.
Please see the [license file](LICENSE) for terms.