{"id":13442005,"url":"https://github.com/pengsongyou/openscene","last_synced_at":"2025-03-20T13:31:53.579Z","repository":{"id":150298522,"uuid":"615653374","full_name":"pengsongyou/openscene","owner":"pengsongyou","description":"[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies","archived":false,"fork":false,"pushed_at":"2023-10-27T15:57:18.000Z","size":21173,"stargazers_count":653,"open_issues_count":11,"forks_count":46,"subscribers_count":19,"default_branch":"main","last_synced_at":"2024-10-28T05:12:31.507Z","etag":null,"topics":["3d-scene-understanding","clip","cvpr2023","llm","matterport3d","nuscenes","point-cloud-segmentation","point-clouds","scannet","semantic-segmentation"],"latest_commit_sha":null,"homepage":"https://pengsongyou.github.io/openscene","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pengsongyou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-18T09:29:23.000Z","updated_at":"2024-10-24T06:14:23.000Z","dependencies_parsed_at":"2023-04-14T21:01:12.687Z","dependency_job_id":"0efc8a98-2182-43c0-ab61-ac20ae270063","html_url":"https://github.com/pengsongyou/openscene","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pengsongyou%2Fopenscene","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pengsongyou%2Fopenscene/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pengsongyou%2Fopenscene/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/pengsongyou%2Fopenscene/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pengsongyou","download_url":"https://codeload.github.com/pengsongyou/openscene/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244619148,"owners_count":20482369,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-scene-understanding","clip","cvpr2023","llm","matterport3d","nuscenes","point-cloud-segmentation","point-clouds","scannet","semantic-segmentation"],"created_at":"2024-07-31T03:01:40.570Z","updated_at":"2025-03-20T13:31:53.573Z","avatar_url":"https://github.com/pengsongyou.png","language":"Python","funding_links":[],"categories":["Python","Scene Understanding \u0026 Semantic Mapping"],"sub_categories":["3D Scene Graphs"],"readme":"\u003c!-- PROJECT LOGO --\u003e\n\n\u003cp align=\"center\"\u003e\n\n  \u003ch1 align=\"center\"\u003e\u003cimg src=\"https://pengsongyou.github.io/media/openscene/logo.png\" width=\"40\"\u003eOpenScene: 3D Scene Understanding with Open Vocabularies\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://pengsongyou.github.io\"\u003e\u003cstrong\u003eSongyou Peng\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://www.kylegenova.com/\"\u003e\u003cstrong\u003eKyle Genova\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://www.maxjiang.ml/\"\u003e\u003cstrong\u003eChiyu \"Max\" Jiang\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca 
href=\"https://taiya.github.io/\"\u003e\u003cstrong\u003eAndrea Tagliasacchi\u003c/strong\u003e\u003c/a\u003e\n    \u003cbr\u003e\n    \u003ca href=\"https://people.inf.ethz.ch/pomarc/\"\u003e\u003cstrong\u003eMarc Pollefeys\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://www.cs.princeton.edu/~funk/\"\u003e\u003cstrong\u003eThomas Funkhouser\u003c/strong\u003e\u003c/a\u003e\n  \u003c/p\u003e\n  \u003ch2 align=\"center\"\u003eCVPR 2023\u003c/h2\u003e\n  \u003ch3 align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2211.15654\"\u003ePaper\u003c/a\u003e | \u003ca href=\"https://youtu.be/jZxCLHyDJf8\"\u003eVideo\u003c/a\u003e | \u003ca href=\"https://pengsongyou.github.io/openscene\"\u003eProject Page\u003c/a\u003e\u003c/h3\u003e\n  \u003cdiv align=\"center\"\u003e\u003c/div\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003e\n    \u003cimg src=\"https://pengsongyou.github.io/media/openscene/teaser.jpg\" alt=\"Logo\" width=\"100%\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n\u003cstrong\u003eOpenScene\u003c/strong\u003e is a zero-shot approach to perform a series of novel 3D scene understanding tasks using open-vocabulary queries.\n\u003c/p\u003e\n\u003cbr\u003e\n\n\u003c!-- TABLE OF CONTENTS --\u003e\n\u003cdetails open=\"open\" style='padding: 10px; border-radius:5px 30px 30px 5px; border-style: solid; border-width: 1px;'\u003e\n  \u003csummary\u003eTable of Contents\u003c/summary\u003e\n  \u003col\u003e\n    \u003cli\u003e\n      \u003ca href=\"#interactive-demo\"\u003eInteractive Demo\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#installation\"\u003eInstallation\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#data-preparation\"\u003eData Preparation\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#run\"\u003eRun\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca 
href=\"#applications\"\u003eApplications\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#todo\"\u003eTODO\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#acknowledgement\"\u003eAcknowledgement\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#citation\"\u003eCitation\u003c/a\u003e\n    \u003c/li\u003e\n  \u003c/ol\u003e\n\u003c/details\u003e\n\n## News :triangular_flag_on_post:\n\n- [2023/10/27] Add the code for LSeg per-pixel feature extraction and multi-view fusion. Check [this repo](https://github.com/pengsongyou/lseg_feature_extraction).\n- [2023/03/31] Code is released.\n\n## Interactive Demo\n### No GPU is needed! Follow **[this instruction](./demo)** to set up and play with the real-time demo yourself.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./media/demo.gif\" width=\"75%\" /\u003e\n\u003c/p\u003e\n\n\nHere we present a **real-time**, **interactive**, **open-vocabulary** scene understanding tool. A user can type in an arbitrary query phrase like **`snoopy`** (rare object), **`somewhere soft`** (property), **`made of metal`** (material), **`where can I cook?`** (activity), **`festive`** (abstract concept), etc., and the corresponding regions are highlighted.\n\n\n## Installation\nFollow [installation.md](installation.md) to install all required packages so you can run the evaluation \u0026 distillation afterwards.\n\n## Data Preparation\n\nWe provide the **pre-processed 3D\u00262D data** and **multi-view fused features** for the following datasets:\n- ScanNet\n- Matterport3D\n- nuScenes\n- Replica\n### Pre-processed 3D\u00262D Data\nYou can preprocess the datasets yourself; see the [data pre-processing instruction](scripts/preprocess/README.md).\n\n\nAlternatively, we have provided the preprocessed datasets. 
One can download the pre-processed datasets by running the script below, and following the command line instruction to download the corresponding datasets:\n```bash\nbash scripts/download_dataset.sh\n```\nThe script will download and unpack data into the folder `data/`. One can also download the dataset somewhere else, but link to the corresponding folder with the symbolic link:\n```bash\nln -s /PATH/TO/DOWNLOADED/FOLDER data\n```\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cstrong\u003eList of provided processed data\u003c/strong\u003e (click to expand):\u003c/summary\u003e\n  \n  - ScanNet 3D (point clouds with GT semantic labels)\n  - ScanNet 2D (RGB-D images with camera poses)\n  - Matterport 3D (point clouds with GT semantic labels)\n  - Matterport 2D (RGB-D images with camera poses)\n  - nuScenes 3D (lidar point clouds with GT semantic labels)\n  - nuScenes 2D (RGB images with camera poses)\n  - Replica 3D (point clouds)\n  - Replica 2D (RGB-D images)\n  - Matterport 3D with top 40 NYU classes\n  - Matterport 3D with top 80 NYU classes\n  - Matterport 3D with top 160 NYU classes\n\u003c/details\u003e\n\n**Note**: 2D processed datasets (e.g. `scannet_2d`) are only needed if you want to do multi-view feature fusion on your own. If so, please follow the [instruction for multi-view fusion](./scripts/feature_fusion/README.md).\n\n### Multi-view Fused Features\nTo evaluate our OpenScene model or distill a 3D model, one needs to have the multi-view fused image feature for each 3D point (see method in Sec. 
3.1 in the paper).\n\nYou can run the following to directly download provided fused features:\n\n```bash\nbash scripts/download_fused_features.sh\n```\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cstrong\u003eList of provided fused features\u003c/strong\u003e (click to expand):\u003c/summary\u003e\n  \n  - ScanNet - Multi-view fused OpenSeg features, train/val (234.8G)\n  - ScanNet - Multi-view fused LSeg features, train/val (175.8G)\n  - Matterport - Multi-view fused OpenSeg features, train/val (198.3G)\n  - Matterport - Multi-view fused OpenSeg features, test set (66.7G)\n  - Replica - Multi-view fused OpenSeg features (9.0G)\n  - Matterport - Multi-view fused LSeg features (coming)\n  - nuScenes - Multi-view fused OpenSeg features (coming)\n  - nuScenes - Multi-view fused LSeg features (coming)\n\u003c/details\u003e\n\n\nAlternatively, you can also generate multi-view features yourself following the [instruction](./scripts/feature_fusion/README.md).\n\n\n## Run\nWhen you have installed the environment and obtained the **processed 3D data** and **multi-view fused features**, you are ready to run our OpenScene distilled/ensemble model for 3D semantic segmentation, or distill your own model from scratch.\n\n### Evaluation for 3D Semantic Segmentation with Pre-defined Labelsets\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./media/benchmark_screenshot.jpg\" width=\"80%\" /\u003e\n\u003c/p\u003e\n\nHere you can evaluate OpenScene features on different datasets (ScanNet/Matterport3D/nuScenes/Replica) that have pre-defined labelsets.\nWe already include the following labelsets in [label_constants.py](dataset/label_constants.py):\n- ScanNet 20 classes (`wall`, `door`, `chair`, ...)\n- Matterport3D 21 classes (ScanNet 20 classes + `floor`)\n- Matterport top 40, 80, 160 NYU classes (more rare object classes)\n- nuScenes 16 classes (`road`, `bicycle`, `sidewalk`, ...)\n\nThe general command to run evaluation:\n```bash\nsh run/eval.sh EXP_DIR CONFIG.yaml 
feature_type\n```\nwhere you specify your experiment directory `EXP_DIR`, and replace `CONFIG.yaml` with the correct config file under [`config/`](./config/). **`feature_type`** corresponds to per-point OpenScene features:\n- `fusion`: The 2D multi-view fused features\n- `distill`: Features from the 3D distilled model \n- `ensemble`: Our 2D-3D ensemble features\n\nTo evaluate with `distill` and `ensemble`, the easiest way is to use a pre-trained 3D distilled model. You can do this by using one of the config files with the suffix `_pretrained`. \n\nFor example, to evaluate the semantic segmentation on Replica, you can simply run:\n```bash\n# 2D-3D ensemble\nsh run/eval.sh out/replica_openseg config/replica/ours_openseg_pretrained.yaml ensemble\n\n# Run 3D distilled model\nsh run/eval.sh out/replica_openseg config/replica/ours_openseg_pretrained.yaml distill\n\n# Evaluate with 2D fused features\nsh run/eval.sh out/replica_openseg config/replica/ours_openseg_pretrained.yaml fusion\n```\nThe script will automatically download the pretrained 3D model and run the evaluation for Matterport 21 classes.\nYou can find all outputs in `out/replica_openseg`.\n\nFor evaluation options, see under `TEST` inside `config/replica/ours_openseg_pretrained.yaml`. Below are important evaluation options that you might want to modify:\n- `labelset` (default: None, `scannet`| `matterport` | `matterport40`| `matterport80`|`matterport160`): Evaluate on a specific pre-defined labelset in [label_constants.py](./dataset/label_constants.py). If not specified, it defaults to the name of your 3D point cloud folder\n- `eval_iou` (default: True): whether to evaluate the mIoU. 
Set to `False` if there are no GT labels\n- `save_feature_as_numpy` (default: False): save the per-point features as `.npy`\n- `prompt_eng` (default: True): input class name X -\u003e \"a X in a scene\"\n- `vis_gt` (default: True):  visualize point clouds with GT semantic labels\n- `vis_pred` (default: True): visualize point clouds with our predicted semantic labels\n- `vis_input` (default: True): visualize input point clouds\n\nIf you want to use a 3D model distilled from scratch, set `model_path` to the corresponding checkpoint `EXP/model/model_best.pth.tar`.\n\n\n### Distillation\nFinally, if you want to distill a new 3D model from scratch, run:\n\n- Start distilling:\n```sh run/distill.sh EXP_NAME CONFIG.yaml```\n\n- Resume: \n```sh run/resume_distill.sh EXP_NAME CONFIG.yaml```\n\nFor available distillation options, please take a look at `DISTILL` inside `config/matterport/ours_openseg.yaml`\n\n\n### Using Your Own Datasets\n1. Follow the [data preprocessing instruction](./scripts/preprocess/README.md) and modify the code accordingly to obtain the processed 2D\u00263D data.\n2. Follow the [feature fusion instruction](./scripts/feature_fusion/README.md) and modify the code to obtain multi-view fused features.\n3. You can distill a model on your own, or take our provided 3D distilled model weights (e.g. our 3D model for ScanNet or Matterport3D), and modify the `model_path` accordingly.\n4. If you want to evaluate on a specific labelset, change the `labelset` in config.\n\n\n## Applications\nBesides zero-shot 3D semantic segmentation, we can also perform the following tasks:\n- **Open-vocabulary 3D scene understanding and exploration**: query a 3D scene to understand properties that extend beyond fixed category labels, e.g. 
materials, activity, affordances, room type, abstract concepts...\n- **Rare object search**: query a 3D scene database to find rare examples based on their names\n- **Image-based 3D object detection**: query a 3D scene database to retrieve examples based on similarities to a given input image\n\n## Acknowledgement\nWe sincerely thank Golnaz Ghiasi for providing guidance on using the OpenSeg model. Our appreciation extends to Huizhong Chen, Yin Cui, Tom Duerig, Dan Gnanapragasam, Xiuye Gu, Leonidas Guibas, Nilesh Kulkarni, Abhijit Kundu, Hao-Ning Wu, Louis Yang, Guandao Yang, Xiaoshuai Zhang, Howard Zhou, and Zihan Zhu for helpful discussions. We are also grateful to Charles R. Qi and Paul-Edouard Sarlin for their proofreading.\n\nWe build some parts of our code on top of the [BPNet repository](https://github.com/wbhu/BPNet).\n\n\n## TODO\n- [ ] Support demo for arbitrary scenes\n- [ ] Support in-website demo\n- [x] Support multi-view feature fusion with LSeg\n- [x] Add missing multi-view fusion LSeg feature for Matterport \u0026 nuScenes\n- [x] Add missing multi-view fusion OpenSeg feature for nuScenes\n- [x] Multi-view feature fusion code for nuScenes\n- [ ] Support the latest PyTorch version\n\nWe very much welcome all kinds of contributions to the project.\n\n## Citation\nIf you find our code or paper useful, please cite\n```bibtex\n@inproceedings{Peng2023OpenScene,\n  title     = {OpenScene: 3D Scene Understanding with Open Vocabularies},\n  author    = {Peng, Songyou and Genova, Kyle and Jiang, Chiyu \"Max\" and Tagliasacchi, Andrea and Pollefeys, Marc and Funkhouser, Thomas},\n  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year      = 
{2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpengsongyou%2Fopenscene","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpengsongyou%2Fopenscene","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpengsongyou%2Fopenscene/lists"}