{"id":14958101,"url":"https://github.com/nianticlabs/doubletake","last_synced_at":"2025-04-30T14:10:36.897Z","repository":{"id":254981583,"uuid":"846159875","full_name":"nianticlabs/doubletake","owner":"nianticlabs","description":"[ECCV 2024] DoubleTake: Geometry Guided Depth Estimation","archived":false,"fork":false,"pushed_at":"2024-09-12T08:43:28.000Z","size":17255,"stargazers_count":172,"open_issues_count":3,"forks_count":12,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-02T07:38:59.094Z","etag":null,"topics":["ai","computer-vision","cost-volume","depth","depth-estimation","eccv2024","machine-learning","multiview-stereo","mvs","python","pytorch","visualization"],"latest_commit_sha":null,"homepage":"https://nianticlabs.github.io/doubletake/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nianticlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-22T16:39:44.000Z","updated_at":"2025-02-26T02:52:51.000Z","dependencies_parsed_at":"2024-09-22T06:11:42.301Z","dependency_job_id":null,"html_url":"https://github.com/nianticlabs/doubletake","commit_stats":{"total_commits":16,"total_committers":2,"mean_commits":8.0,"dds":0.25,"last_synced_commit":"925bde2a9e5913132cb94812e337f3968d6063a1"},"previous_names":["nianticlabs/doubletake"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fdoubletake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fdoubleta
ke/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fdoubletake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fdoubletake/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nianticlabs","download_url":"https://codeload.github.com/nianticlabs/doubletake/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243510126,"owners_count":20302295,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","computer-vision","cost-volume","depth","depth-estimation","eccv2024","machine-learning","multiview-stereo","mvs","python","pytorch","visualization"],"created_at":"2024-09-24T13:16:13.920Z","updated_at":"2025-03-14T02:07:36.019Z","avatar_url":"https://github.com/nianticlabs.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# DoubleTake: Geometry Guided Depth Estimation\n\nThis is the reference PyTorch implementation for training and testing MVS depth estimation models using the method described in\n\n\u003e **DoubleTake: Geometry Guided Depth Estimation**\n\u003e\n\u003e [Mohamed Sayed](https://masayed.com), [Filippo Aleotti](https://filippoaleotti.github.io/website/), [Jamie Watson](https://www.linkedin.com/in/jamie-watson-544825127/), [Zawar Qureshi](https://qureshizawar.github.io/), [Guillermo Garcia-Hernando](), [Gabriel Brostow](http://www0.cs.ucl.ac.uk/staff/g.brostow/), [Sara Vicente](https://scholar.google.co.uk/citations?user=7wWsNNcAAAAJ\u0026hl=en) 
and [Michael Firman](http://www.michaelfirman.co.uk).\n\u003e\n\u003e [Paper, ECCV 2024 (arXiv pdf)](https://nianticlabs.github.io/doubletake/resources/DoubleTake.pdf), [Supplemental Material](https://nianticlabs.github.io/doubletake/resources/DoubleTakeSupplemental.pdf), [Project Page](https://nianticlabs.github.io/doubletake/), [Video](https://www.youtube.com/watch?v=IklQ5AHNdI8\u0026feature=youtu.be)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"media/teaser.png\" alt=\"example output\" width=\"720\" /\u003e\n\u003c/p\u003e\n\n\n\nhttps://github.com/user-attachments/assets/aa2052df-79f4-43a8-ab24-d704660f228a\n\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"media/mesh_teaser.png\" alt=\"example output\" width=\"720\" /\u003e\n\u003c/p\u003e\n\nPlease refer to the [license file](LICENSE) for terms of usage. If you use this codebase in your research, please consider citing our paper using the BibTeX below and linking this repo. Thanks!\n\n## Table of Contents\n\n  * [🗺️ Overview](#%EF%B8%8F-overview)\n  * [⚙️ Setup](#%EF%B8%8F-setup)\n  * [📦 Trained Models and Precomputed Meshes/Scores](#-pretrained-models)\n  * [🚀 Speed](#-speed)\n  * [🏃 Running out of the box!](#-running-out-of-the-box)\n  * [💾 ScanNetv2 Dataset](#-scannetv2-dataset)\n  * [💾 SimpleRecon ScanNet Training Depth Renders](#-simplerecon-scannet-training-depth-renders)\n  * [💾 3RScan Dataset](#-3rscan-dataset)\n  * [📊 Testing and Evaluation](#-testing-and-evaluation)\n  * [📊 Mesh Metrics](#-mesh-metrics)\n  * [📝🧮👩‍💻 Notation for Transformation Matrices](#-notation-for-transformation-matrices)\n  * [🗺️ World Coordinate System](#%EF%B8%8F-world-coordinate-system)\n  * [🔨💾 Training Data Preperation](#-training-data-preperation)\n  * [🙏 Acknowledgements](#-acknowledgements)\n  * [📜 BibTeX](#-bibtex)\n  * [👩‍⚖️ License](#%EF%B8%8F-license)\n\n## 🗺️ Overview\n\nDoubleTake takes as input posed RGB images, and outputs a depth map for a target image. 
Under the hood, it uses a mesh that it builds itself, either online (incrementally) or offline (the mesh is built in a first pass and used for better depth in the second pass), to improve its own depth estimates.\n\nhttps://github.com/user-attachments/assets/269c658a-7325-4b52-98ab-bd3505f045db\n\n## ⚙️ Setup\n\nWe are going to create a new Mamba environment called `doubletake`. If you don't have Mamba, you can install it with:\n\n```shell\nmake install-mamba\n```\n\nThen set up the environment with:\n```shell\nmake create-mamba-env\nmamba activate doubletake\n```\n\nIn the code directory, install the repo as a pip package:\n```shell\npip install -e .\n```\n\nSome C++ code will compile just-in-time (JIT) using ninja the first time you use any of the fusers. This should be quick.\n\nIn case you don't have this in your `~/.bashrc` already, you should run:\n```shell\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/\n```\nIf you get a `GLIBCXX_3.4.29` not found error, a missing export is very likely the cause. \n\n## 📦 Trained Models and Precomputed Meshes/Scores\nWe provide three models: the standard DoubleTake model used for incremental, offline, and revisit evaluation on all datasets and figures in the paper, a slimmed-down, faster version of DoubleTake, and the vanilla SimpleRecon model we used for SimpleRecon scores. Use the links in the table to access the weights for each. 
The scores here are very slightly different (better) than those in the paper due to a slight bug fix in training data renders.\n\nDownload a pretrained model into the `weights/` folder.\n\nScores on ScanNet:\n| Model         | Config                                | Weights                                                               | Notes         |\n|---------------|---------------------------------------|-----------------------------------------------------------------------|---------------|\n| SimpleRecon   | configs/models/simplerecon_model.yaml | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/simplerecon_model.ckpt)   |               |\n| DoubleTake Small | configs/models/doubletake_small_model.yaml | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_small_model.ckpt) |               |\n| DoubleTake    | configs/models/doubletake_model.yaml  | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_model.ckpt)      | ours in the paper |\n\n| Offline/Two Pass using `test_offline_two_pass` | Abs Diff↓ | Sq Rel↓ | delta \u003c 1.05↑ | Chamfer↓ | F-Score↑ | Meshes and Full Scores |\n|-------------------------------------------------|-----------|---------|---------------|----------|----------|-------------------------|\n| SimpleRecon (Offline Tuples w/ `test_no_hint` ) | .0873     | .0128   | 74.12         | 5.29     | .668     | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/simplerecon_offline.tar) |\n| DoubleTake Small                                | .0631     | .0097   | 86.36         | 4.64     | .723     | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_small_offline.tar) |\n| DoubleTake                                      | .0624     | .0092   | 86.64         | 4.42     | .742     | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_offline.tar) 
|\n\n| Incremental using `test_incremental` | Abs Diff↓ | Sq Rel↓ | delta \u003c 1.05↑ | Chamfer↓ | F-Score↑ | Meshes and Full Scores |\n|----------------------------------------|-----------|---------|---------------|----------|----------|-------------------------|\n| DoubleTake Small                        | .0825     | .0124   | 76.75         | 5.53     | .649     |[Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_small_incremental.tar) |\n| DoubleTake                              | .0754     | .0109   | 80.29         | 5.03     | .689     |[Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_incremental.tar) |\n\n| No hint and online using `test_no_hint` | Abs Diff↓ | Sq Rel↓ | delta \u003c 1.05↑ | Chamfer↓ | F-Score↑ | Meshes and Full Scores |\n|----------------------------------------|-----------|---------|---------------|----------|----------|-------------------------|\n| SimpleRecon (Online Tuples)            | .0873     | .0128   | 74.12         | 5.29     | .668     | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/simplerecon_online.tar) | \n| DoubleTake Small                       | .0938     | .0148   | 72.02         | 5.50     | .650     | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_small_no_hint.tar) |\n| DoubleTake                             | .0863     | .0127   | 74.64         | 5.22     | .672     | [Link](https://storage.googleapis.com/niantic-lon-static/research/doubletake/doubletake_no_hint.tar) |\n\n\n## 🚀 Speed\nPlease see the paper and supplemental material for details on runtime. We do not include the first-pass feature caching step in this code release.\n\n\n## 🏃 Running out of the box!\n\nWe've included two scans for people to try out immediately with the code. You can download these scans [from here](https://storage.googleapis.com/niantic-lon-static/research/doubletake/vdr.zip).\n\nSteps:\n1. 
Download weights for the `hero_model` into the weights directory.\n2. Download the scans and unzip them into `datasets/`\n3. If you've unzipped into a different folder, modify the value for the option `dataset_path` in `configs/data/vdr/vdr_default_offline.yaml` to the base path of the unzipped vdr folder.\n4. You should be able to run it! Something like this will work:\n\nFor offline depth estimation and fusion:\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_offline_two_pass --name doubletake_offline \\\n            --output_base_path $OUTPUT_PATH \\\n            --config_file configs/models/doubletake_model.yaml \\\n            --load_weights_from_checkpoint weights/doubletake_model.ckpt \\\n            --data_config configs/data/vdr/vdr_default_offline.yaml \\\n            --num_workers 8 \\\n            --batch_size 2 \\\n            --fast_cost_volume \\\n            --run_fusion \\\n            --depth_fuser custom_open3d \\\n            --fuse_color \\\n            --fusion_max_depth 3.5 \\\n            --fusion_resolution 0.02 \\\n            --trim_tsdf_using_confience \\\n            --extended_neg_truncation \\\n            --dump_depth_visualization;\n```\n\nThis will output meshes, quick depth viz, and scores when benchmarked against LiDAR depth under `OUTPUT_PATH`. \n\nThis command uses `vdr_default_offline.yaml` which will generate a depth map for every keyframe and fuse them into a mesh. You can also use `dense_offline` tuples by instead using `vdr_dense_offline.yaml` for a depth map for every frame.\n\n\nSee the section below on testing and evaluation. Make sure to use the correct config flags for datasets. \n\n## 💾 ScanNetv2 Dataset\nWe've written a quick tutorial and included modified scripts to help you with downloading and extracting ScanNetv2. 
You can find them at [data_scripts/scannet_wrangling_scripts/](data_scripts/scannet_wrangling_scripts)\n\nYou should change the `dataset_path` config argument for ScanNetv2 data configs at `configs/data/` to match where your dataset is.\n\nThe codebase expects ScanNetv2 to be in the following format:\n\n    dataset_path\n        scans_test (test scans)\n            scene0707\n                scene0707_00_vh_clean_2.ply (gt mesh)\n                sensor_data\n                    frame-000261.pose.txt\n                    frame-000261.color.jpg \n                    frame-000261.color.512.png (optional, image at 512x384)\n                    frame-000261.color.640.png (optional, image at 640x480)\n                    frame-000261.depth.png (full res depth, stored scale *1000)\n                    frame-000261.depth.256.png (optional, depth at 256x192 also\n                                                scaled)\n                scene0707.txt (scan metadata and image sizes)\n                intrinsic\n                    intrinsic_depth.txt\n                    intrinsic_color.txt\n            ...\n        scans (val and train scans)\n            scene0000_00\n                (see above)\n            scene0000_01\n            ....\n\nIn this example `scene0707.txt` should contain the scan's metadata:\n\n        colorHeight = 968\n        colorToDepthExtrinsics = 0.999263 -0.010031 0.037048 ........\n        colorWidth = 1296\n        depthHeight = 480\n        depthWidth = 640\n        fx_color = 1170.187988\n        fx_depth = 570.924255\n        fy_color = 1170.187988\n        fy_depth = 570.924316\n        mx_color = 647.750000\n        mx_depth = 319.500000\n        my_color = 483.750000\n        my_depth = 239.500000\n        numColorFrames = 784\n        numDepthFrames = 784\n        numIMUmeasurements = 1632\n\n`frame-000261.pose.txt` should contain pose in the form:\n\n        -0.384739 0.271466 -0.882203 4.98152\n        0.921157 0.0521417 -0.385682 1.46821\n 
       -0.0587002 -0.961035 -0.270124 1.51837\n\n`frame-000261.color.512.png` and `frame-000261.color.640.png` are precached resized versions of the original image to save load and compute time during training and testing. `frame-000261.depth.256.png` is also a \nprecached resized version of the depth map. \n\nAll resized precached versions of depth and images are nice to have but not \nrequired. If they don't exist, the full resolution versions will be loaded, and downsampled on the fly.\n\n\n## 💾 SimpleRecon ScanNet Training Depth Renders\nDoubleTake is trained using depth and confidence renders of partial and full meshes of the train and validation ScanNet splits. We've provided these [here](https://storage.googleapis.com/niantic-lon-static/research/doubletake/renders.tar) if you'd like to train a DoubleTake model.\n\n## 💾 3RScan Dataset\n\nThis section explains how to prepare 3RScan for testing:\n\nPlease download and extract the dataset by following the instructions [here](https://github.com/WaldJohannaU/3RScan).\n\nThe dataset should be formatted like so:\n\n```\n\u003cdataset_path\u003e\n  \u003cscanId\u003e\n  |-- mesh.refined.v2.obj\n      Reconstructed mesh\n  |-- mesh.refined.mtl\n      Corresponding material file\n  |-- mesh.refined_0.png\n      Corresponding mesh texture\n  |-- sequence.zip\n      Calibrated RGB-D sensor stream with color and depth frames, camera poses\n  |-- labels.instances.annotated.v2.ply\n      Visualization of semantic segmentation\n  |-- mesh.refined.0.010000.segs.v2.json\n      Over-segmentation of annotation mesh\n  |-- semseg.v2.json\n            Instance segmentation of the mesh (contains the labels)\n```\n\nPlease make sure to extract each `sequence.zip` inside every `scanId` folder.\n\nWe provide the frame tuple files for this dataset (see for eg. 
`data_splits/3rscan/test_eight_view_deepvmvs.txt`) but if you need to recreate them, you can do so by following the instructions [here](https://github.com/nianticlabs/simplerecon/tree/main?tab=readme-ov-file#%EF%B8%8F%EF%B8%8F%EF%B8%8F-frame-tuples).\n\nNOTE: we only use the 3RScan dataset for testing, and the data split used (`data_splits/3rscan/3rscan_test.txt`) corresponds to the validation split in the original dataset repo (`splits/val.txt`). We use the val split because the transformations that align the reference scan to the rescans are readily available for the train and val splits. \n\n\n## 🖼️🖼️🖼️ Frame Tuples\n\nBy default, we estimate a depth map for each keyframe in a scan. We use DeepVideoMVS's heuristic for keyframe separation and construct tuples to match. We use the depth maps at these keyframes for depth fusion. For each keyframe, we associate a list of source frames that will be used to build the cost volume. We also use dense tuples, where we predict a depth map for each frame in the data, and not just at specific keyframes; these are mostly used for visualization.\n\nWe generate and export a list of tuples across all scans that act as the dataset's elements. We've precomputed these lists and they are available at `data_splits` under each dataset's split. For ScanNet's test scans they are at `data_splits/ScanNetv2/standard_split`. Our core depth numbers are computed using `data_splits/ScanNetv2/standard_split/test_eight_view_deepvmvs.txt`.\n\n\n\nHere's a quick taxonomy of the types of tuples for test:\n\n- `default`: a tuple for every keyframe following DeepVideoMVS where all source frames are in the past. Used for all depth and mesh evaluation unless stated otherwise. For ScanNet use `data_splits/ScanNetv2/standard_split/test_eight_view_deepvmvs.txt`.\n- `offline`: a tuple for every frame in the scan where source frames can be both in the past and future relative to the current frame. 
These are useful when a scene is captured offline, and you want the best accuracy possible. With online tuples, the cost volume will contain empty regions as the camera moves away and all source frames lag behind; however, with offline tuples, the cost volume is full on both ends, leading to a better scale (and metric) estimate.\n- `dense`: an online tuple (like default) for every frame in the scan where all source frames are in the past. For ScanNet this would be `data_splits/ScanNetv2/standard_split/test_eight_view_deepvmvs_dense.txt`.\n- `dense_offline`: an offline tuple for every frame in the scan.\n\nFor the train and validation sets, we follow the same tuple augmentation strategy as in DeepVideoMVS and use the same core generation script.\n\nIf you'd like to generate these tuples yourself, you can use the scripts at `data_scripts/generate_train_tuples.py` for train tuples and `data_scripts/generate_test_tuples.py` for test tuples. These follow the same config format as `test.py` and will use whatever dataset class you build to read pose information.\n\nExample for test:\n\n```bash\n# default tuples\npython ./scripts/data_scripts/generate_test_tuples.py \\\n    --data_config configs/data/scannet/scannet_default_test.yaml \\\n    --num_workers 16\n\n# dense tuples\npython ./scripts/data_scripts/generate_test_tuples.py \\\n    --data_config configs/data/scannet/scannet_dense_test.yaml \\\n    --num_workers 16\n```\n\nExamples for train:\n\n```bash\n# train\npython ./scripts/data_scripts/generate_train_tuples.py \\\n    --data_config configs/data/scannet/scannet_default_train.yaml \\\n    --num_workers 16\n\n# val\npython ./scripts/data_scripts/generate_val_tuples.py \\\n    --data_config configs/data/scannet/scannet_default_val.yaml \\\n    --num_workers 16\n```\n\nThese scripts will first check each frame in the dataset to make sure it has an existing RGB frame, an existing depth frame (if appropriate for the dataset), and also an existing and valid pose file. 
It will save these `valid_frames` in a text file in each scan's folder, but if the directory is read only, it will ignore saving a `valid_frames` file and generate tuples anyway.\n\n\n## 📊 Testing and Evaluation\n\n### Depth Evaluation\n\nYou can evaluate our model on the depth benchmark of ScanNetv2 using the following commands:\n\nFor online incremental depth estimation, use this command.\n```shell\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_incremental \\\n    --name doubletake_incremental \\\n    --config_file configs/models/doubletake_model.yaml \\\n    --load_weights_from_checkpoint weights/doubletake_model.ckpt \\\n    --data_config  configs/data/scannet/scannet_default_test.yaml \\\n    --num_workers 12 \\\n    --batch_size 1 \\\n    --fast_cost_volume \\\n    --output_base_path $OUTPUT_DIR \\\n    --load_empty_hint \\\n    --fusion_resolution 0.02 \\\n    --extended_neg_truncation \\\n    --fusion_max_depth 3.5 \\\n    --depth_fuser ours;\n```\n\nFor offline depth estimation, use this command. Note this will generate meshes. \nRemove `--run_fusion` if you don't want to generate meshes for the second pass. 
 \n```shell\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_offline_two_pass \\\n    --name doubletake_offline \\\n    --config_file configs/models/doubletake_model.yaml \\\n    --load_weights_from_checkpoint weights/doubletake_model.ckpt \\\n    --data_config  configs/data/scannet/scannet_offline_test.yaml \\\n    --num_workers 12 \\\n    --batch_size 4 \\\n    --fast_cost_volume \\\n    --output_base_path $OUTPUT_DIR \\\n    --load_empty_hint \\\n    --fusion_resolution 0.02 \\\n    --extended_neg_truncation \\\n    --fusion_max_depth 3.5 \\\n    --depth_fuser ours;\n```\n\nIf you want to see the performance of a DoubleTake model without hints (no depth hint and online), use:\n```shell\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_no_hint \\\n    --name doubletake_no_hint \\\n    --config_file configs/models/doubletake_model.yaml \\\n    --load_weights_from_checkpoint weights/doubletake_model.ckpt \\\n    --data_config  configs/data/scannet/scannet_default_test.yaml \\\n    --num_workers 12 \\\n    --batch_size 4 \\\n    --fast_cost_volume \\\n    --output_base_path $OUTPUT_DIR \\\n    --load_empty_hint \\\n    --fusion_resolution 0.02 \\\n    --extended_neg_truncation \\\n    --fusion_max_depth 3.5 \\\n    --depth_fuser ours;\n```\nYou can use `test_no_hint` for the provided SimpleRecon model as well.\n\n**TSDF Fusion**\nTL;DR: use `ours` for ScanNet, 7Scenes, and 3RScan, and for anything to do with scores. Use `custom_open3d` for anything else.\n\n`ours` and `custom_open3d` give almost identical scores on ScanNet given the same fusion flags.\n\nTo run TSDF fusion provide the `--run_fusion` flag. This is mandatory for incremental running. You have three choices for \nfusers:\n1) `--depth_fuser ours` (default) will use our fuser, whose meshes are used \n    in most visualizations and for scores. This fuser does not support \n    color. 
We've provided a custom branch of scikit-image with our custom\n    implementation of `measure.marching_cubes` that allows single-walled meshes. We use \n    single-walled meshes for evaluation. If this isn't important to you, you\n    can set `export_single_mesh` to `False` in the call to `export_mesh` in `test.py`.\n    This fuser's TSDF volume is not sparse, and ScanNet/7Scenes/3RScan meshes will \n    fit in memory on an A100 given we have known mesh bounds.\n2) `--depth_fuser custom_open3d` will use a custom version of the open3d fuser that\n    supports our confidence mapping and confidence sampling. There is currently a\n    memory leak in open3d core. We will post an updated version of the fuser if resolved.\n    This fuser supports a sparse volume and our free space cleanup. This fuser supports color \n    via the `--fuse_color` flag.\n3) `--depth_fuser open3d` will use the default open3d depth fuser. This fuser \n    supports color and you can enable this by using the `--fuse_color` flag. This fuser does \n    not support confidences. This fuser cannot be used as a hint fuser.\n\nFor `ours` and `custom_open3d` you can pass `--extended_neg_truncation` for more complete meshes. \nScores in the paper are computed with this.\n\nFor `custom_open3d` you can pass `--trim_tsdf_using_confience` to remove potential floaters, especially in outdoor scenes.\n\nBy default, depth maps will be clipped to 3.5m for fusion and a tsdf \nresolution of 0.02m\u003csup\u003e3\u003c/sup\u003e will be used, but you can change that by changing both \n`--fusion_max_depth` and `--fusion_resolution`.\n\nHint fusers are locked to 3.0m and 0.04m resolution.  \n\nMeshes will be stored under `results_path/meshes/{scan name}_{mesh params}`.\n\n**Cache depths**\n\nYou can optionally store depths by providing the `--cache_depths` flag. 
\nThey will be stored at `results_path/depths`.\n\n```bash\n# Example command to compute scores and cache depths\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_offline_two_pass \\\n    --name doubletake_offline \\\n    --config_file configs/models/doubletake_model.yaml \\\n    --load_weights_from_checkpoint weights/doubletake_model.ckpt \\\n    --data_config  configs/data/scannet/scannet_offline_test.yaml \\\n    --num_workers 12 \\\n    --batch_size 4 \\\n    --fast_cost_volume \\\n    --output_base_path $OUTPUT_DIR \\\n    --load_empty_hint \\\n    --cache_depths;\n```\n\n**Quick viz**\n\nThere are other scripts for deeper visualizations of output depths and \nfusion, but for quick export of depth map visualization you can use \n`--dump_depth_visualization`. Visualizations will be stored at `results_path/viz/quick_viz/`.\n\n\n```bash\n# Example command to output quick depth visualizations\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_offline_two_pass \\\n    --name doubletake_offline \\\n    --config_file configs/models/doubletake_model.yaml \\\n    --load_weights_from_checkpoint weights/doubletake_model.ckpt \\\n    --data_config  configs/data/scannet/scannet_offline_test.yaml \\\n    --num_workers 12 \\\n    --batch_size 4 \\\n    --fast_cost_volume \\\n    --output_base_path $OUTPUT_DIR \\\n    --load_empty_hint \\\n    --dump_depth_visualization;\n```\n\n### Revisit Evaluation\n\nYou can evaluate our model in the revisit scenario (i.e., using the geometry from a previous visit as ‘hints’ for our current depth estimates) on the 3RScan dataset by running the following command:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_revisit \\\n            --config_file configs/models/doubletake_model.yaml \\\n            --load_weights_from_checkpoint ./models/doubletake_model.ckpt \\\n            --data_config configs/data/3rscan/3rscan_test.yaml \\\n            --dataset_path PATH/TO/3RScan_dataset \\\n            --num_workers 12 \\\n            
--batch_size 6 \\\n            --output_base_path ./outputs/ \\\n            --depth_hint_aug 0.0 \\\n            --load_empty_hint \\\n            --name final_model_3rscan_revisit \\\n            --run_fusion \\\n            --rotate_images;\n```\n\n## 📊 Mesh Metrics\n\nWe use a mesh evaluation protocol similar to TransformerFusion's, but use occlusion masks that better fit available geometry in the ground truth.\nThe masks can be found [here](https://storage.googleapis.com/niantic-lon-static/research/doubletake/scannet_test_visibility_masks.tar).\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python scripts/evals/mesh_eval.py \\\n    --groundtruth_dir SCANNET_TEST_FOLDER_PATH  \\\n    --prediction_dir ROOT_PRED_DIRECTORY/SCAN_NAME.ply \\\n    --visibility_volume_path UNTARED_VISIBILITY_MASK_PATH \\\n    --wait_for_scan;\n```\n\nUse `--wait_for_scan` if the prediction is still being generated and you want the script to wait until a scan's mesh is available before proceeding.\n\n\n## 📝🧮👩‍💻 Notation for Transformation Matrices\n\n__TL;DR:__ `world_T_cam == world_from_cam`  \nThis repo uses the notation \"cam_T_world\" to denote a transformation from world to camera points (extrinsics). The intention is to make it so that the coordinate frame names would match on either side of the variable when used in multiplication from *right to left*:\n\n    cam_points = cam_T_world @ world_points\n\n`world_T_cam` denotes camera pose (from cam to world coords). `ref_T_src` denotes a transformation from a source to a reference view.  \nFinally this notation allows for representing both rotations and translations such as: `world_R_cam` and `world_t_cam`\n\n## 🗺️ World Coordinate System\n\nThis repo is geared towards ScanNet, so while its functionality should allow for any coordinate system (signaled via input flags), the model weights we provide assume a ScanNet coordinate system. This is important since we include ray information as part of metadata. 
Other datasets used with these weights should be transformed to the ScanNet system. The dataset classes we include will perform the appropriate transforms. \n\n\n## 🔨💾 Training Data Preperation\nTo train a DoubleTake model you'll need the ScanNetv2 dataset and renders of a mesh from a SimpleRecon (SR) model. We provide these\nrenders.\n\nTo generate mesh renders, you'll first need to run a SimpleRecon model and cache those depths to disk. You should\nuse `scannet_default_train_inference_style.yaml` and `scannet_default_val_inference_style.yaml` for this. These configs run the model on test-style keyframes \non both train and val splits. Something like this:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_no_hint \\\n    --config_file configs/models/simplerecon_model.yaml \\\n    --load_weights_from_checkpoint simplerecon_model_weights.ckpt \\\n    --data_config configs/data/scannet/scannet_default_train_inference_style.yaml \\\n    --num_workers 8 \\\n    --batch_size 8 \\\n    --cache_depths \\\n    --run_fusion \\\n    --output_base_path YOUR_OUTPUT_DIR \\\n    --dataset_path SCANNET_DIR;\n```\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m doubletake.test_no_hint \\\n    --config_file configs/models/simplerecon_model.yaml \\\n    --load_weights_from_checkpoint simplerecon_model_weights.ckpt \\\n    --data_config configs/data/scannet/scannet_default_val_inference_style.yaml \\\n    --num_workers 8 \\\n    --batch_size 8 \\\n    --cache_depths \\\n    --run_fusion \\\n    --output_base_path YOUR_OUTPUT_DIR \\\n    --dataset_path SCANNET_DIR;\n```\n\nWith these cached depths, you can generate mesh renders for training:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python ./scripts/render_scripts/render_meshes.py \\\n    --data_config configs/data/scannet/scannet_default_train.yaml \\\n    --cached_depth_path YOUR_OUTPUT_DIR/simplerecon_model/scannet/default/depths \\\n    --output_root renders/partial_renders \\\n    --dataset_path SCANNET_DIR \\\n    --batch_size 4 \\\n    --data_to_render both \\\n    --partial 
1;\n\nCUDA_VISIBLE_DEVICES=0 python ./scripts/render_scripts/render_meshes.py \\\n    --data_config configs/data/scannet/scannet_default_train.yaml \\\n    --cached_depth_path YOUR_OUTPUT_DIR/simplerecon_model/scannet/default/depths \\\n    --output_root renders/renders \\\n    --dataset_path /mnt/scannet/ \\\n    --batch_size 4 \\\n    --data_to_render both \\\n    --partial 0;\n\nCUDA_VISIBLE_DEVICES=0 python ./scripts/render_scripts/render_meshes.py \\\n    --data_config configs/data/scannet/scannet_default_val.yaml \\\n    --cached_depth_path YOUR_OUTPUT_DIR/simplerecon_model/scannet/default/depths \\\n    --output_root renders/partial_renders \\\n    --dataset_path SCANNET_DIR \\\n    --batch_size 4 \\\n    --data_to_render both \\\n    --partial 1;\n\nCUDA_VISIBLE_DEVICES=0 python ./scripts/render_scripts/render_meshes.py \\\n    --data_config configs/data/scannet/scannet_default_val.yaml \\\n    --cached_depth_path YOUR_OUTPUT_DIR/simplerecon_model/scannet/default/depths \\\n    --output_root renders/renders \\\n    --dataset_path /mnt/scannet/ \\\n    --batch_size 4 \\\n    --data_to_render both \\\n    --partial 0;\n```\n\n## 🙏 Acknowledgements\n\nThe tuple generation scripts make heavy use of a modified version of DeepVideoMVS's [Keyframe buffer](https://github.com/ardaduz/deep-video-mvs/blob/master/dvmvs/keyframe_buffer.py) (thanks Arda and co!).\n\nWe'd like to thank the Niantic Raptor R\\\u0026D infrastructure team - Saki Shinoda, Jakub Powierza, and Stanimir Vichev - for their valuable infrastructure support.\n\n## 📜 BibTeX\n\nIf you find our work useful in your research please consider citing our paper:\n\n```\n@inproceedings{sayed2022simplerecon,\n  title={DoubleTake: Geometry Guided Depth Estimation},\n  author={Sayed, Mohamed and Aleotti, Filippo and Watson, Jamie and Qureshi, Zawar and Garcia-Hernando, Guillermo and Brostow, Gabriel and Vicente, Sara and Firman, Michael},\n  booktitle={Proceedings of the European Conference on Computer Vision 
(ECCV)},\n  year={2024},\n}\n```\n\n## 👩‍⚖️ License\n\nCopyright © Niantic, Inc. 2024. Patent Pending.\nAll rights reserved.\nPlease see the [license file](LICENSE) for terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fdoubletake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnianticlabs%2Fdoubletake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fdoubletake/lists"}