{"id":20516355,"url":"https://github.com/nianticlabs/airplanes","last_synced_at":"2025-09-14T19:58:16.254Z","repository":{"id":244397874,"uuid":"814200404","full_name":"nianticlabs/airplanes","owner":"nianticlabs","description":"[CVPR 2024] AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings","archived":false,"fork":false,"pushed_at":"2024-06-14T12:15:29.000Z","size":8322,"stargazers_count":54,"open_issues_count":3,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-10T19:37:20.745Z","etag":null,"topics":["computer-vision","cvpr2024","machine-learning","plane-detection"],"latest_commit_sha":null,"homepage":"https://nianticlabs.github.io/airplanes/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nianticlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-12T14:27:57.000Z","updated_at":"2024-12-04T14:43:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"59d42530-bd1b-4d70-9d0c-419a8af51ca3","html_url":"https://github.com/nianticlabs/airplanes","commit_stats":null,"previous_names":["nianticlabs/airplanes"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fairplanes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fairplanes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nianticlabs%2Fairplanes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ni
anticlabs%2Fairplanes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nianticlabs","download_url":"https://codeload.github.com/nianticlabs/airplanes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230438191,"owners_count":18225871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cvpr2024","machine-learning","plane-detection"],"created_at":"2024-11-15T21:28:23.966Z","updated_at":"2024-12-19T13:09:00.935Z","avatar_url":"https://github.com/nianticlabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings\n\nThis is the reference PyTorch implementation for training and testing AirPlanes, the method described in the following CVPR 2024 publication:\n\u003e **AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings**\n\n\u003e [Jamie Watson](https://scholar.google.com/citations?view_op=list_works\u0026hl=en\u0026user=5pC7fw8AAAAJ), [Filippo Aleotti](https://filippoaleotti.github.io/website/), [Mohamed Sayed](https://masayed.com/), [Zawar Qureshi](https://qureshizawar.github.io/), [Oisin Mac Aodha](https://www.homepages.inf.ed.ac.uk/omacaod), [Gabriel Brostow](http://www0.cs.ucl.ac.uk/staff/g.brostow/), [Michael Firman](http://www.michaelfirman.co.uk/), and [Sara Vicente](https://scholar.google.co.uk/citations?user=7wWsNNcAAAAJ\u0026hl=en)\n\n\u003e [Project Page](https://nianticlabs.github.io/airplanes/), [Paper 
(pdf)](https://nianticlabs.github.io/airplanes/resources/airplanes_cvpr2024.pdf), [Supplementary Material (pdf)](https://nianticlabs.github.io/airplanes/resources/airplanes_cvpr2024_supp.pdf), [Video](https://www.youtube.com/watch?v=HnGAORJ8JEI)\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"media/teaser.jpg\" alt=\"teaser\" width=\"720\" /\u003e\n\u003c/p\u003e\n\n\n\n\nThis code is for non-commercial use only; please see the license file for terms. If you do find any part of this codebase helpful, please cite our paper using the BibTex below. Thanks!\n\n## Table of Contents\n\n  * [🗺️ Overview](#🗺️-overview)\n  * [⚙️ Setup](#⚙️-setup)\n  * [📦 Pretrained models](#📦-pretrained-models)\n  * [🏃 Running out of the box!](#🏃-running-out-of-the-box)\n  * [🗃️ Data preparation](#🗃️-data-preparation)\n  * [🧪 Running and Evaluating on ScanNetV2](#🧪-running-and-evaluating-on-scannetv2)\n  * [⏳ Optional: Training the 2D Network on ScanNetV2](#⏳-optional-training-the-2d-network-on-scannetv2)\n  * [🙏 Acknowledgements](#🙏-acknowledgements)\n  * [📜 BibTeX](#📜-bibtex)\n  * [👩‍⚖️ License](#👩‍⚖️-license)\n\n## 🗺️ Overview\n\nAirPlanes takes as input posed RGB images, and outputs a 3D planar representation of the scene.\n\n### Pipeline overview\nOur pipeline consists of the following steps:\n- **2D Network inference**, where the network predicts for each image:\n  * A *depth map*.\n  * A per-pixel *planar probability score*, indicating if a pixel is part of a planar or of a non-planar region.\n  * A per-pixel *planar embedding*, where pixels belonging to the same plane should have a similar embedding vector.\n\n  This network extends [SimpleRecon](https://nianticlabs.github.io/simplerecon/), a popular network for depth estimation from posed images. \n- **Scene Optimisation**, where a scene-specific MLP is trained for each scene. 
This step tackles the main limitation of the previous step: planar embeddings predicted from single images are not multi-view consistent.\nThe MLP takes as input the coordinates of a 3D point (x,y,z) and predicts an embedding vector. A `push-pull` loss enforces that embeddings that are similar/different in an image must also be similar/different in 3D. After training, each per-scene MLP can predict a multi-view consistent embedding vector for any 3D point in the scene.\n- **3D Plane Clustering**, where a clustering algorithm is used to group mesh vertices into planes. Our custom sequential RANSAC implementation uses the 3D embeddings from the MLP to identify the inlier set of each 3D plane hypothesis. \n\n\n## ⚙️ Setup\n\nYou can install a new environment using mamba. To install mamba:\n\n```shell\nmake install-mamba\n```\n\nTo create the environment:\n```shell\nmake create-mamba-env\n```\n\nNext, please activate the environment and install the project as a module:\n```shell\nconda activate airplanes\npip install -e .\n```\n\nWe ran our experiments with PyTorch 2.0.1, CUDA 11.7 and Python 3.9.7.\n\n\n## 📦 Pretrained models\nOur 2D network pretrained on ScanNetv2 is available [online here](https://storage.googleapis.com/niantic-lon-static/research/airplanes/airplanes_model.ckpt).\n\nWe suggest you place the downloaded model in a new `checkpoints` folder.\n\n## 🏃 Running out of the box!\n\nWe made two captures available to make it easier to quickly try the code.\n\nFirst, [download these scans](https://storage.googleapis.com/niantic-lon-static/research/airplanes/vdr.zip), place them in a new folder called `arbitrary_captures` and unzip them.\n\nThen, run the full inference pipeline with our model:\n```shell\necho \"📸 Preparing keyframes for the sequences\"\npython -m scripts.inference_on_captures prepare-captures [--captures /path/to/captures]\n\necho \"🚀 Generating TSDFs and 2D embeddings using our model for each scan\"\npython -m scripts.inference_on_captures 
model-inference [--checkpoint /path/to/model/checkpoint --output-dir  /path/to/predicted/meshes]\n\necho \"🚀 Training 3D embeddings (MLPs)...\"\npython -m scripts.inference_on_captures train-embeddings [--pred-root  /path/to/predicted/meshes --captures /path/to/captures]\n\necho \"🚀 Running RANSAC!\"\npython -m scripts.inference_on_captures run-ransac-ours [--pred-root  /path/to/predicted/meshes --dest-planar-meshes /path/to/planar/meshes --captures /path/to/captures]\n```\n\nThis script will infer per-image embeddings and scene geometry using the 2D network, optimise 3D embeddings and, finally, fit planar meshes using 3D embeddings in the RANSAC loop.\n\n\n## 🗃️ Data preparation\nThis section explains how to prepare ScanNetv2 for training and testing: in fact, ScanNetv2 only provides meshes with semantic labels, but not with planes.\nFollowing previous works, we process the dataset extracting planar information with RANSAC.\n\n🕒 Please note that the data preparation scripts will take a few hours to run.\n\n\u003cdetails\u003e\n\u003csummary\u003eScanNetv2 download (training \u0026 testing)\u003c/summary\u003e\n\n  Please follow instructions reported in [SimpleRecon](https://github.com/nianticlabs/simplerecon/tree/main/data_scripts/scannet_wrangling_scripts)\n\nYou should get at the end a ScanNetv2 root folder that looks like:\n```shell\nSCANNET_ROOT\n├── scans_test (test scans)\n│   ├── scene0707\n│   │   ├── scene0707_00_vh_clean_2.ply (gt mesh)\n│   │   ├── sensor_data\n│   │   │   ├── frame-000261.pose.txt\n│   │   │   ├── frame-000261.color.jpg \n│   │   │   └── frame-000261.depth.png (full res depth, stored scale *1000)\n│   │   ├── scene0707.txt (scan metadata and image sizes)\n│   │   └── intrinsic\n│   │       ├── intrinsic_depth.txt\n│   │       └── intrinsic_color.txt\n│   └── ...\n└── scans (val and train scans)\n    ├── scene0000_00\n    │   └── (see above)\n    ├── scene0000_01\n    └── 
....\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eGround-truth generation from ScanNetv2 meshes (training \u0026 testing)\u003c/summary\u003e\n  \n  Our ground-truth generation script is adapted from the one released by [PlaneRCNN](https://github.com/NVlabs/planercnn/blob/master/data_prep/parse.py).\n\nRun:\n```shell\npython -m scripts.run_scannet_processing --scannet path/to/ScanNetv2 --output-dir destination/path\n```\n\n*NOTE:* PlaneRCNN saves planes as `(nx*d,ny*d,nz*d)`, while our project represents planes as `(nx/d,ny/d,nz/d)`.\nWe will manage this difference later in our dataloaders.\n\nThe code will generate in the root folder a structure as:\n\n```\nROOT_FOLDER\n├── scene0000_00\n│   ├── planes.npy  (array with planes parameters. Each plane is encoded as (nx*d,ny*d,nz*d))\n│   └── planes.ply  (unsquashed mesh of the scene, where each colour encodes a plane ID)\n├── scene0000_01\n│   └── annotation\n│       ├── planes.npy\n│       └── planes.ply\n└── ...\n```\n\nUnsquashed means that the geometry of the mesh is unchanged with respect to the ground truth mesh provided by ScanNetv2, i.e. the mesh is not composed of planes. The colour of each vertex in the mesh encodes the plane ID.\n\nRGB channels encode the plane ids, which can be obtained using:\n```python\n    plane_ids = plane_mesh.visual.vertex_colors.copy().astype(\"int32\")\n    plane_ids = (\n        plane_ids[:, 0] * 256 * 256 + plane_ids[:, 1] * 256 + plane_ids[:, 2]\n    ) // 100 - 1\n```\n\nThe alpha channel stores `0` if the vertex is an edge between two planes.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eRender ground truth plane images (training \u0026 testing)\u003c/summary\u003e\n\n  At this point, we have RGB images, depth maps and plane-annotated ground-truth meshes. 
We are missing images annotated with plane IDs.\n  To get those images, we have to render ground-truth meshes at different camera locations.\n\n```shell\npython -m scripts.run_rendering render-train-split --scannet /path/to/scannet --planar-meshes /path/to/mesh/with/planes/ids --output-dir /path/to/render/folder\n\npython -m scripts.run_rendering render-val-split --scannet /path/to/scannet --planar-meshes /path/to/mesh/with/planes/ids --output-dir /path/to/render/folder\n\npython -m scripts.run_rendering render-test-split --scannet /path/to/scannet --planar-meshes /path/to/mesh/with/planes/ids --output-dir /path/to/render/folder\n```\n\nThe script generates the following structure in the `DEST` folder:\n```shell\nDEST\n├── scene0000_00\n│   └── frames\n│       ├── 000000_planes.png   (image where pixels encode plane IDs)\n│       ├── 000000_depth.npy    (depth map rendered from the mesh)\n│       ├── 000001_planes.png\n│       ├── 000001_depth.npy\n│       └── ...\n├── scene0000_01\n│   └── frames\n│       ├── 000000_planes.png\n│       ├── 000000_depth.npy\n│       ├── 000001_planes.png\n│       ├── 000001_depth.npy\n│       └── ...\n└── ...   \n```\n\nAs before, we can get plane IDs from the colours in `planes.png` as:\n\n```python\n# Cast to int32 first so the channel arithmetic does not overflow uint8 pixel values.\nplane_ids = your_custom_image_reader(\"000000_planes.png\").astype(\"int32\")\nplane_ids = (plane_ids[:, :, 0] * 256 * 256 + plane_ids[:, :, 1] * 256 + plane_ids[:, :, 2]) // 100 - 1\n```\n\nNote that rendered depth maps are not needed for training and validation. For this reason, we do not save depth maps for these two splits.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eGenerate visibility volumes (testing)\u003c/summary\u003e\n\n  We rely on visibility volumes for benchmarking. This is because the ground-truth mesh in ScanNetv2 has been obtained by post-processing sensor data. This mask allows us to remove voxels that are not visible in the scan. 
Please see more details in the paper and supplement.\n\nThe following script can generate such volumes using depth renders and ScanNetv2 data.\n\n```shell\npython -m scripts.run_creation_of_visibility_volumes --scannet /path/to/scannet --output-dir /path/to/planar/meshes --renders /path/to/render/folder\n```\n\nHere, `--renders` points to the directory with the ground truth plane images created in the previous step.\nThe script adds the visibility volumes to `output-dir`, so that the final structure is:\n\n```shell\n├── scenexxxx_yy\n│   ├── mesh_with_planes.ply\n│   ├── scenexxxx_yy_planes.npy\n│   ├── scenexxxx_yy_visibility_pcd.ply (a point cloud for debugging purposes only)\n│   ├── scenexxxx_yy_volume.npz (visibility volume!)\n└── ...  \n```\n**NOTE:** our benchmarks expect to find volumes in the same folder as the ground-truth meshes with planes. Please set `output-dir` accordingly.\n\nAt the end of the process you can inspect the sampled visibility volume, saved as a point cloud. It should look like the following figure:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"media/visibility_pcd.jpg\" alt=\"sampled visibility volume\" width=\"720\" /\u003e\n\u003c/p\u003e\n\n\u003c/details\u003e\n\n\n## 🧪 Running and Evaluating on ScanNetV2\n\n🚦 __We need ground-truth meshes with planes and visibility volumes. Please run all the data preparation steps if you haven't already. 🙏__\n\nTo evaluate AirPlanes, we first run inference with our 2D network and cache the intermediate results. Then, we use these outputs to train our 3D embedding MLPs, one per scene. These embeddings can be used in the RANSAC loop to extract a planar representation for the scene (saved as a planar mesh, one per scene). Once we have all the planar meshes, we can evaluate them using our meshing, segmentation and planar benchmarks.\n\nFirst, we have to update the testing configuration `configs/data/scannet_test.yaml`. 
Specifically:\n- `dataset_path`: path to ScanNetv2 data\n\nThe following script summarises all the steps. You can run (and customise) these steps individually if needed.\n\n```shell\necho \"🚀 Generating TSDFs and 2D embeddings using our model for each scan\"\npython -m scripts.evaluation model-inference [--output-dir /path/to/results/folder]\n\necho \"🚀 Training 3D embeddings (MLPs)...\"\npython -m scripts.evaluation train-embeddings [--pred-root /path/to/predicted/meshes]\n\necho \"🚀 Running RANSAC!\"\npython -m scripts.evaluation run-ransac-ours [--pred-root /path/to/predicted/meshes --dest-planar-meshes /path/to/planar/meshes]\n\necho \"🧪 Benchmarking Geometry\"\npython -m scripts.evaluation meshing-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth --output-score-dir /path/to/scores/folder]\n\necho \"🧪 Benchmarking Segmentation\"\npython -m scripts.evaluation segmentation-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth --output-score-dir /path/to/scores/folder]\n\necho \"🧪 Benchmarking Planar\"\npython -m scripts.evaluation planar-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth --output-score-dir /path/to/scores/folder]\n```\n\nSimilarly, you can evaluate the baseline model, i.e. SimpleRecon without embeddings. You can use the meshes saved by our model. 
The baseline doesn't need per-scene optimisation ⏩.\n\n```shell\necho \"🚀 Running RANSAC!\"\npython -m scripts.evaluation run-ransac-baseline [--pred-root /path/to/predicted/meshes --dest-planar-meshes /path/to/planar/meshes]\n\necho \"🧪 Benchmarking Geometry\"\npython -m scripts.evaluation meshing-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth]\n...\n```\n\n## ⏳ Optional: Training the 2D Network on ScanNetV2\n\n**We recommend using our pretrained 2D network when evaluating our method.**\n\nTo retrain the 2D network, please first [download the initial weights from here](https://storage.googleapis.com/niantic-lon-static/research/airplanes/simplerecon_starting_weights.ckpt) and place it in the \"checkpoints\" folder. You also need to generate the training data, following the instructions in [data preparation](#🗃️-data-preparation).\n\nThis network is in charge of estimating depth maps, 2D plane embeddings and per-pixel plane probabilities from posed images.\n\nFirst, we have to update the training configuration `configs/data/scannet_train.yaml`. 
Specifically:\n- `dataset_path`: path to ScanNetv2 data\n- `planes_path`: path to meshes with plane IDs generated during the step `Ground-truth generation from ScanNetv2 meshes`\n- `renders_path`: path to the training renders generated during the step `Render ground truth plane images (training \u0026 testing)`\n\nWhen the config is ready, please run:\n```shell\npython -m airplanes.train_2D_network\n```\n\n\n## 🙏 Acknowledgements\n\nWe thank the PlaneRCNN authors for sharing their data processing code.\n\n## 📜 BibTeX\nIf you find our work useful in your research, please consider citing our paper:\n\n```\n@inproceedings{watson2024airplanes,\n  title={AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings},\n  author={Watson, Jamie and Aleotti, Filippo and Sayed, Mohamed and Qureshi, Zawar and Brostow, Gabriel and Mac Aodha, Oisin and Firman, Michael and Vicente, Sara},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year={2024},\n}\n```\n\n## 👩‍⚖️ License\nCopyright © Niantic, Inc. 2024. Patent Pending.\nAll rights reserved.\nPlease see the [license file](LICENSE) for terms.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fairplanes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnianticlabs%2Fairplanes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnianticlabs%2Fairplanes/lists"}