{"id":31653090,"url":"https://github.com/boschresearch/open3dsg","last_synced_at":"2025-10-07T10:41:03.431Z","repository":{"id":253745437,"uuid":"813506107","full_name":"boschresearch/Open3DSG","owner":"boschresearch","description":"[CVPR 2024] Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships","archived":false,"fork":false,"pushed_at":"2024-09-16T06:54:29.000Z","size":143,"stargazers_count":43,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-09-16T08:20:41.020Z","etag":null,"topics":["3d-scene-graph","3d-scene-understanding","bcai","open-vocabulary","paper-resource","scene-graph"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/boschresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-11T08:10:17.000Z","updated_at":"2024-09-16T06:54:32.000Z","dependencies_parsed_at":"2024-08-20T22:33:17.300Z","dependency_job_id":null,"html_url":"https://github.com/boschresearch/Open3DSG","commit_stats":null,"previous_names":["boschresearch/open3dsg"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/boschresearch/Open3DSG","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boschresearch%2FOpen3DSG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boschresearch%2FOpen3DSG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boschresearch%2FO
pen3DSG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boschresearch%2FOpen3DSG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/boschresearch","download_url":"https://codeload.github.com/boschresearch/Open3DSG/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boschresearch%2FOpen3DSG/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278762925,"owners_count":26041444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-scene-graph","3d-scene-understanding","bcai","open-vocabulary","paper-resource","scene-graph"],"created_at":"2025-10-07T10:41:01.201Z","updated_at":"2025-10-07T10:41:03.426Z","avatar_url":"https://github.com/boschresearch.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- PROJECT LOGO --\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ch1\u003e\n  Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships\n\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://kochsebastian.com/\"\u003e\u003cstrong\u003eSebastian Koch\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca 
href=\"https://scholar.google.com/citations?user=U3KSTwkAAAAJ\u0026hl=en\"\u003e\u003cstrong\u003eNarunas Vaskevicius\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://scholar.google.com/citations?hl=en\u0026user=k4m1c6EAAAAJ\"\u003e\u003cstrong\u003eMirco Colosi\u003c/strong\u003e\u003c/a\u003e\n    \u003cbr\u003e\n    \u003ca href=\"https://phermosilla.github.io/\"\u003e\u003cstrong\u003ePedro Hermosilla\u003c/strong\u003e\u003c/a\u003e\n    ·\n    \u003ca href=\"https://viscom.uni-ulm.de/members/timo-ropinski/\"\u003e\u003cstrong\u003eTimo Ropinski\u003c/strong\u003e\u003c/a\u003e\n  \u003c/p\u003e\n  \u003ch2 align=\"center\"\u003eCVPR 2024\u003c/h2\u003e\n  \u003ch3 align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2402.12259\"\u003ePaper\u003c/a\u003e | \u003ca href=\"https://kochsebastian.com/open3dsg\"\u003eProject Page\u003c/a\u003e\u003c/h3\u003e\n  \u003cdiv align=\"center\"\u003e\u003c/div\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003e\n    \u003cimg src=\"https://github.com/kochsebastian/kochsebastian.github.io/blob/master/media/open3dsg/teaser.png?raw=true\" alt=\"Logo\" width=\"85%\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\nWe present \u003cstrong\u003eOpen3DSG\u003c/strong\u003e, the first approach\nfor learning to predict open-vocabulary 3D scene graphs from\n3D point clouds. 
The advantage of our method is that it can be\nqueried and prompted for any instance in the scene, such as the\nTV and the wall, to predict fine-grained semantic descriptions of objects and relationships.\n\u003c/p\u003e\n\u003cbr\u003e\n\n## Setup\n\n```bash\nconda create --name open3dsg python=3.9\nconda activate open3dsg\nconda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia\npip install -r requirements.txt\npip install -e .\n```\n\n\u003e **Note**: This software was developed and tested with CUDA 11.8 on an NVIDIA V100 32GB GPU.\n\n### Data Preparation\n\n1. Download [3RScan](https://github.com/WaldJohannaU/3RScan) and [3DSSG](https://3dssg.github.io/). Unpack the image sequences for each scan and place the 3DSSG files in a subdirectory of the 3RScan directory.\n2. Download [ScanNet](http://www.scan-net.org/ScanNet/) and split the scans into ```scannet_2d``` and ```scannet_3d```. We use the preprocessed data from [ScanNet ETH preprocessed 3D](https://cvg-data.inf.ethz.ch/openscene/data/scannet_processed/scannet_3d.zip) \u0026 [ScanNet ETH preprocessed 2D](https://cvg-data.inf.ethz.ch/openscene/data/scannet_processed/scannet_2d.zip). If you use the preprocessed version, make sure that you have agreed to the ScanNet license. When using the ScanNet ETH preprocessed 2D frames, use the matching [intrinsics](https://drive.google.com/drive/folders/1rlzUS1d5cYo5lJCNl1G81x9HmYtn5NB5?usp=drive_link).\n3. Download [3DSSG_subset.zip](http://campar.in.tum.de/public_datasets/3DSSG/3DSSG_subset.zip) and extract the files into the 3RScan directory for training and evaluation. Additional meta files can be found [here](https://drive.google.com/drive/folders/1rlzUS1d5cYo5lJCNl1G81x9HmYtn5NB5?usp=drive_link).\n4. Download the 3RScan \u0026 ScanNet metadata files using the download scripts in ```scripts/``` (e.g. ```scripts/download_scannet_meta.sh```) and place them in their respective data directories.\n5. 
Set the path to your data in ```config/config.py```.\n\n### Data Preprocessing\n\n3DSSG provides pre-constructed scene graphs with ground-truth labels for training and validation; ScanNet does not. To train our model on ScanNet, we first build a similar graph structure for it. Use the following command to generate the ScanNet graphs:\n\n```bash\npython open3dsg/data/gen_scannet_subgraphs.py --type [train/test/validation]\n```\n\nFor 2D-3D distillation training, the 2D frames must be aligned to the 3D point clouds. The following script generates matching frames for each 3D instance:\n\n```bash\npython open3dsg/data/get_object_frame.py --mode [train/test] --dataset [R3SCAN/SCANNET]\n```\n\nWe preprocess the data before training so that data processing in the training loop is faster:\n\n```bash\npython open3dsg/data/preprocess_3rscan.py\npython open3dsg/data/preprocess_scannet.py\n```\n\nThe preprocessed features can be used directly for training and testing.\n\n### Model Downloads\n\nDownload the [OpenSeg Checkpoint](https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/openseg), [BLIP2 Positional Embedding](https://drive.google.com/file/d/1BfvxB6eo3XksE6AfMUgoBHwzVYce1ed1/view?usp=sharing) \u0026 pre-trained [PointNet/PointNet2 weights](https://drive.google.com/drive/folders/1PrnJVMpJVVh4MAV4yPRuRByhBu-DuXwH?usp=sharing) and put them in the checkpoints directory specified in the config file.\n\n## Precompute 2D features\n\nThis is an **optional** step to accelerate the forward pass in the training loop. This command dumps the VLM features for each training sample to disk. 
Storing the features requires about 300 GB per dataset.\n\n```bash\npython open3dsg/scripts/run.py --dump_features --dataset [scannet/3rscan] --scales 3 --top_k_frames 5 --clip_model OpenSeg --blip\n```\n\nIf you run out of memory, run the BLIP export and the OpenSeg export separately.\n\n## Train\n\nTo train Open3DSG on ScanNet, use:\n\n```bash\npython open3dsg/scripts/run.py --epochs 100 --batch_size 4 --gpus 4 --workers 8 --use_rgb --dataset scannet --clip_model OpenSeg --blip --load_features [path to precomputed 2D features]\n```\n\nAdjust the hyperparameters to your available hardware. More model and data hyperparameters are listed in [run.py](open3dsg/scripts/run.py).\nUse ```--mixed_precision``` to reduce GPU memory usage during training.\n\n## Test\n\nTo evaluate a trained model on the 3RScan dataset with ground-truth labels, use the following command:\n\n```bash\npython open3dsg/scripts/run.py --test --dataset 3rscan --checkpoint [path to checkpoint] --n_beams 5 --weight_2d 0.5 --clip_model OpenSeg --node_model ViT-L/14@336px --blip\n```\n\nWe use ```CLIP ViT-L/14@336px``` to query object classes from the node embeddings. Use ```--n_beams``` to set the number of beams in the beam search for the LLM relationship output, and ```--weight_2d``` to adjust the 2D-3D feature fusion; a value of 0.0 means the prediction uses 3D features only.\n\n## Citation\n\nIf you find our code or paper useful, please cite:\n\n```bibtex\n@inproceedings{koch2024open3dsg,\n      title={Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships},\n      author={Koch, Sebastian and Vaskevicius, Narunas and Colosi, Mirco and Hermosilla, Pedro and Ropinski, Timo},\n      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n      month={June},\n      year={2024},\n  }\n```\n\n## License\n\nOpen3DSG is open-sourced under the AGPL-3.0 license. 
See the LICENSE file for details.\n\nFor a list of other open source components included in Open3DSG, see the file 3rd-party-licenses.txt.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboschresearch%2Fopen3dsg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboschresearch%2Fopen3dsg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboschresearch%2Fopen3dsg/lists"}