{"id":18930926,"url":"https://github.com/pointcept/openins3d","last_synced_at":"2025-04-05T02:03:43.634Z","repository":{"id":192525269,"uuid":"686811735","full_name":"Pointcept/OpenIns3D","owner":"Pointcept","description":"[ECCV'24] OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation","archived":false,"fork":false,"pushed_at":"2024-10-19T00:40:59.000Z","size":70682,"stargazers_count":183,"open_issues_count":3,"forks_count":9,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-29T01:02:54.668Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Pointcept.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-04T01:47:45.000Z","updated_at":"2025-03-26T03:33:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"28def93c-9186-4c55-a4c3-541c7a8b0ecb","html_url":"https://github.com/Pointcept/OpenIns3D","commit_stats":null,"previous_names":["pointcept/openins3d"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FOpenIns3D","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FOpenIns3D/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FOpenIns3D/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Pointcept%2FOpenIns3D/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Pointcept","download_u
rl":"https://codeload.github.com/Pointcept/OpenIns3D/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247276159,"owners_count":20912288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T11:39:37.550Z","updated_at":"2025-04-05T02:03:38.621Z","avatar_url":"https://github.com/Pointcept.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp align=\"center\"\u003e\n\n  \u003ch1 align=\"center\"\u003e OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://zheninghuang.github.io/\"\u003e\u003cstrong\u003eZhening Huang\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://xywu.me\"\u003e\u003cstrong\u003eXiaoyang Wu\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://xavierchen34.github.io/\"\u003e\u003cstrong\u003eXi Chen\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://hszhao.github.io\"\u003e\u003cstrong\u003eHengshuang Zhao\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://sites.google.com/site/indexlzhu/home\"\u003e\u003cstrong\u003eLei Zhu\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"http://sigproc.eng.cam.ac.uk/Main/JL\"\u003e\u003cstrong\u003eJoan Lasenby\u003c/strong\u003e\u003c/a\u003e\n  \u003c/p\u003e\n  \n  \u003ch3 align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2309.00616\"\u003ePaper\u003c/a\u003e | \u003ca 
href=\"https://www.youtube.com/watch?v=kwlMJkEfTyY\"\u003eVideo\u003c/a\u003e | \u003ca href=\"https://zheninghuang.github.io/OpenIns3D/\"\u003eProject Page\u003c/a\u003e\u003c/h3\u003e\n  \u003cdiv align=\"center\"\u003e\u003c/div\u003e\n\u003c/p\u003e\n\n   \n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/openins3d-snap-and-lookup-for-3d-open/zero-shot-3d-point-cloud-classification-on-1)](https://paperswithcode.com/sota/zero-shot-3d-point-cloud-classification-on-1?p=openins3d-snap-and-lookup-for-3d-open) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/openins3d-snap-and-lookup-for-3d-open/3d-open-vocabulary-instance-segmentation-on-3)](https://paperswithcode.com/sota/3d-open-vocabulary-instance-segmentation-on-3?p=openins3d-snap-and-lookup-for-3d-open)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/openins3d-snap-and-lookup-for-3d-open/3d-open-vocabulary-object-detection-on-1)](https://paperswithcode.com/sota/3d-open-vocabulary-object-detection-on-1?p=openins3d-snap-and-lookup-for-3d-open)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/openins3d-snap-and-lookup-for-3d-open/3d-open-vocabulary-instance-segmentation-on-1)](https://paperswithcode.com/sota/3d-open-vocabulary-instance-segmentation-on-1?p=openins3d-snap-and-lookup-for-3d-open)\n\u003cp align=\"center\"\u003e\n\u003cstrong\u003e TL;DR: OpenIns3D proposes a \"mask-snap-lookup\" scheme to achieve 2D-input-free 3D open-world scene understanding, which attains SOTA performance across datasets, even with fewer input prerequisites. 
🚀✨\n\u003c/p\u003e\n\n\n\u003ctable\u003e\n\u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"assets/demo_1.gif\" width=\"100%\"/\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"assets/demo_2.gif\" width=\"100%\"/\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"assets/demo_3.gif\" width=\"100%\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd align='center' width='24%'\u003edevice to watch BBC news\u003c/td\u003e\n    \u003ctd align='center' width='24%'\u003efurniture that is capable of producing music\u003c/td\u003e\n    \u003ctd align='center' width='24%'\u003eMa Long's domain of excellence\u003c/td\u003e\n\u003ctr\u003e\n\u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"assets/demo_4.gif\" width=\"100%\"/\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"assets/demo_5.gif\" width=\"100%\"/\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"assets/demo_6.gif\" width=\"100%\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n    \u003ctd align='center' width='24%'\u003emost comfortable area to sit in the room\u003c/td\u003e\n    \u003ctd align='center' width='24%'\u003epenciling down ideas during brainstorming\u003c/td\u003e\n    \u003ctd align='center' width='24%'\u003efurniture offers recreational enjoyment with friends\u003c/td\u003e\n\u003ctr\u003e\n\u003c/table\u003e\n\n\n\u003cbr\u003e\n\n\u003c!-- # OpenIns3D pipeline\n\n\u003cimg src=\"assets/general_pipeline_updated.png\" width=\"100%\"/\u003e --\u003e\n\n\n# Highlights\n- *2 Aug, 2024*: Major update 🔥: We have released optimized and easy-to-use code for OpenIns3D to [reproduce all the results in the paper](#Reproducing-Results) and [demo](#Zero-Shot-Inference-with-Single-Vocabulary).\n- *1 Jul, 2024*: OpenIns3D has been accepted at ECCV 2024 🎉. We will release more code on various experiments soon.\n- *6 Jan, 2024*: We have released a major revision, incorporating S3DIS and ScanNet benchmark code. 
Try out the latest version.\n- *31 Dec, 2023*: We released the batch inference code on ScanNet.\n- *31 Dec, 2023*: We released the zero-shot inference code; test it on your own data!\n- *Sep, 2023*: **OpenIns3D** is released on [arXiv](https://arxiv.org/abs/2309.00616), alongside an [explanatory video](https://www.youtube.com/watch?v=kwlMJkEfTyY) and a [project page](https://zheninghuang.github.io/OpenIns3D/). We will release the code at the end of this year.\n\n# Overview\n\n- [Installation](#installation)\n- [Reproducing All Benchmark Results](#reproducing-results)\n- [Replacing Snap with RGBD](#Replacing-Snap-with-RGBD)\n- [Zero-Shot Inference with Single Vocabulary](#Zero-Shot-Inference-with-Single-Vocabulary)\n- [Zero-Shot Inference with Multiple Vocabulary](#Zero-Shot-Inference-with-Multiple-Vocabulary)\n- [Citation](#citation)\n- [Acknowledgement](#acknowledgement)\n\n# Installation\n\nPlease check the [installation file](installation.md) to install OpenIns3D for:\n1. [reproducing all results in the paper](#reproducing-results),\n2. [testing on your own dataset](#Zero-Shot-Inference-with-Multiple-Vocabulary)\n\n---\n\n# Reproducing Results\n\n### 🗂️ Replica\n\n**🔧 Data Preparation**: \n1. Execute the following command to set up the Replica dataset, including scene `.ply` files, predicted masks, and ground truth:\n```sh\nsh scripts/prepare_replica.sh\nsh scripts/prepare_yoloworld.sh \n```\n\n**📊 Open Vocabulary Instance Segmentation**:\n```sh\npython openins3d/main.py --dataset replica --task OVIS --detector yoloworld\n```\n**📈 Results Log**: \n| Task                        |  AP  | AP50 | AP25 | Log |\n|-----------------------------|:----:|:----:|:----:|:----:|\n| Replica OVIS (in paper)      | 13.6 | 18.0 | 19.7 |      |\n| Replica OVIS (this Code)     | 15.4 | 19.5 | 25.2 | [log](assets/logs/log_replica_ovis.txt)  |\n---\n\n### 🗂️ ScanNet\n\n**🔧 Data Preparation**: \n1. 
Make sure you have completed the form on [ScanNet](http://www.scan-net.org/) to obtain access.\n2. Place the `download-scannet.py` script into the `scripts` directory.\n3. Run the following command to download all `_vh_clean_2.ply` files for validation sets, as well as instance ground truth, GT-masks, and detected masks:\n\n```sh\nsh scripts/prepare_scannet.sh\n```\n\n**📊 Open Vocabulary Object Recognition**: \n```sh\npython openins3d/main.py --dataset scannet --task OVOR --detector odise\n```\n\n**📈 Results Log**: \n| Task                        | Top-1 Accuracy | Log |\n|-----------------------------|:--------------:|:----:|\n| ScanNet_OVOR (in paper)      |     60.4       |      |\n| ScanNet_OVOR (this Code)     |     64.2       | [log](assets/logs/log_scannet_classfication.txt)  |\n\n**📊 Open Vocabulary Object Detection**:\n```sh\npython openins3d/main.py --dataset scannet --task OVOD --detector odise\n```\n\n**📊 Open Vocabulary Instance Segmentation**:\n```sh\npython openins3d/main.py --dataset scannet --task OVIS --detector odise\n```\n**📈 Results Log**: \n| Task                        |  AP  | AP50 | AP25 | Log |\n|-----------------------------|:----:|:----:|:----:|:----:|\n| ScanNet_OVOD (in paper)      | 17.8 | 28.3 | 36.0 |      |\n| ScanNet_OVOD (this Code)     | 20.7 | 29.9 | 39.7 | [log](assets/logs/log_scannet_ovod.txt)  |\n| ScanNet_OVIS (in paper)      | 19.9 | 28.7 | 38.9 |      |\n| ScanNet_OVIS (this Code)     | 23.3 | 34.6 | 42.6 | [log](assets/logs/log_scannet_ovis.txt)  |\n\n---\n\n### 🗂️ S3DIS\n\n**🔧 Data Preparation**: \n1. Make sure you have completed the form on [S3DIS](https://redivis.com/datasets/9q3m-9w5pa1a2h/files) to obtain access. \n2. 
Then, run the following command to acquire scene `.ply` files, predicted masks, and ground truth:\n```sh\nsh scripts/prepare_s3dis.sh\n```\n\n**📊 Open Vocabulary Instance Segmentation**:\n```sh\npython openins3d/main.py --dataset s3dis --task OVIS --detector odise\n```\n**📈 Results Log**: \n| Task                        |  AP  | AP50 | AP25 | Log |\n|-----------------------------|:----:|:----:|:----:|:----:|\n| S3DIS OVIS (in paper)        | 21.1 | 28.3 | 29.5 |      |\n| S3DIS OVIS (this Code)       | 22.9 | 29.0 | 31.4 | [log](assets/logs/log_s3dis_ovis.txt)  |\n\n---\n\n### 🗂️ STPLS3D\n\n**🔧 Data Preparation**: \n1. Make sure you have completed the form [STPLS3D](https://www.stpls3d.com/data) to gain access. \n2. Then, run the following command to obtain scene `.ply` files, predicted masks, and ground truth:\n```sh\nsh scripts/prepare_stpls3d.sh\n```\n\n**📊 Open Vocabulary Instance Segmentation**:\n```sh\npython openins3d/main.py --dataset stpls3d --task OVIS --detector odise\n```\n**📈 Results Log**: \n| Task                        |   AP   | AP50  | AP25  | Log |\n|-----------------------------|:------:|:-----:|:-----:|:----:|\n| STPLS3D OVIS (in paper)      | 11.4  | 14.2 | 17.2 |      |\n| STPLS3D OVIS (this Code)     |  15.3      | 17.3   | 17.4      | [log](assets/logs/log_stpls3d_ovis.txt)  |\n\n---\n\n\n# Replacing Snap with RGBD\n\nWe also evaluate the performance of OpenIns3D when the Snap module is replaced with original RGBD images while keeping the other design intact.\n\n### 🗂️ Replica\n\n**🔧 Data Preparation**  \n\n1. 
Download the Replica dataset and RGBD images:\n\n```sh\nsh scripts/prepare_replica.sh\nsh scripts/prepare_replica2d.sh\nsh scripts/prepare_yoloworld.sh \n```\n\n\n**📊 Open Vocabulary Instance Segmentation**\n\n```sh\npython openins3d/main.py --dataset replica --task OVIS --detector yoloworld --use_2d true\n```\n\n**📈 Results Log**  \n| Task           |  AP  | AP50 | AP25 | Log                                      |\n|----------------|:----:|:----:|:----:|:----------------------------------------:|\n| OpenMask3D     | 13.1 | 18.4 | 24.2 |                                          |\n| Open3DIS       | 18.5 | 24.5 | 28.2 |                                          |\n| OpenIns3D      | 21.1 | 26.2 | 30.6 | [log](assets/logs/log_replica_use2d_ovis.txt) |\n\n\n# Zero-Shot Inference with Single Vocabulary\n\nWe demonstrate how to perform single-vocabulary instance segmentation similar to the teaser image in the paper. The key new feature is the introduction of a CLIP ranking and filtering module to reduce false-positive results. (Works best with RGBD but is also fine with SNAP.)\n\nQuick Start: \n\n1. 📥 **Download the demo dataset** by running:\n\n   ```sh\n   sh scripts/prepare_demo_single.sh \n   ```\n\n2. 🚀 **Run the model** by executing:\n\n   ```sh\n   python zero_shot_single_voc.py\n   ```\n\n\nYou can now view results like teaser images in 2D or 3D.\n\n\n---\n\n# Zero-Shot Inference with Multiple Vocabulary\n\nℹ️ **Note**: Ensure you have installed the mask module according to the installation guide, as it is not required for reproducing results.\n\nTo perform zero-shot scene understanding:\n\n1. 📥 **Download** the `scannet200_val.ckpt` checkpoint from [this link](https://drive.google.com/file/d/1emtZ9xCiCuXtkcGO3iIzIRzcmZAFfI_B/view) and place it in the `third_party/` directory.\n\n2. 
🚀 **Run the model** by executing `python zero_shot.py` and specify:\n   - 🗂️ `pcd_path`: The path to the colored point cloud file.\n   - 📝 `vocab`: A list of vocabulary terms to search for.\n\nYou can also use the following script to automatically set up the `scannet200_val.ckpt` checkpoint and download some sample 3D scans:\n\n```bash\nsh scripts/prepare_zero_shot.sh\n```\n\n### 🚀 Running a Zero-Shot Inference\n\nTo perform zero-shot inference using the sample dataset (default with Replica vocabulary), run:\n\n```bash\npython zero_shot_multi_vocs.py --pcd_path data/demo_scenes/demo_scene_1.ply\n```\n\n📂 **Results** are saved under `output/snap_demo/demo_scene_1_vis/image`.\n\nTo use a different 2D detector (🔍 **ODISE works better on pcd-rendered images**):\n\n```bash\npython zero_shot_multi_vocs.py --pcd_path data/demo_scenes/demo_scene_2.ply --detector yoloworld\n```\n\n📝 **Custom Vocabulary**: If you want to specify your own vocabulary list, add it with the `--vocab` flag as follows:\n\n```bash\npython zero_shot_multi_vocs.py \\\n--pcd_path 'data/demo_scenes/demo_scene_4.ply' \\\n--vocab \"drawers\" \"lower table\"\n```\n\n\n# Citation\n\nIf you find OpenIns3D and this codebase useful for your research, please cite our work as a form of encouragement. 😊\n\n```\n@article{huang2024openins3d,\n      title={OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation}, \n      author={Zhening Huang and Xiaoyang Wu and Xi Chen and Hengshuang Zhao and Lei Zhu and Joan Lasenby},\n      journal={European Conference on Computer Vision},\n      year={2024}\n    }\n\n```\n\n# Acknowledgement\n\nThe mask proposal model is modified from [Mask3D](https://jonasschult.github.io/Mask3D/), and we heavily used the [easy setup](https://github.com/cvg/Mask3D) version of it for MPM. Thanks again for the great work! 
🙌 We also drew inspiration from [LAR](https://github.com/eslambakr/LAR-Look-Around-and-Refer) and [ContrastiveSceneContexts](https://github.com/facebookresearch/ContrastiveSceneContexts) when developing the code. 🚀\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpointcept%2Fopenins3d","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpointcept%2Fopenins3d","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpointcept%2Fopenins3d/lists"}