{"id":26645897,"url":"https://github.com/manycore-research/SpatialLM","last_synced_at":"2025-03-24T22:01:55.483Z","repository":{"id":282891289,"uuid":"948245457","full_name":"manycore-research/SpatialLM","owner":"manycore-research","description":"SpatialLM: Large Language Model for Spatial Understanding","archived":false,"fork":false,"pushed_at":"2025-03-17T13:22:49.000Z","size":6525,"stargazers_count":42,"open_issues_count":2,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-17T14:32:16.249Z","etag":null,"topics":["mllm","point-clouds","scene-understanding","spatial-intelligence"],"latest_commit_sha":null,"homepage":"https://manycore-research.github.io/SpatialLM","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/manycore-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-14T01:50:51.000Z","updated_at":"2025-03-17T14:15:18.000Z","dependencies_parsed_at":"2025-03-17T22:01:21.439Z","dependency_job_id":null,"html_url":"https://github.com/manycore-research/SpatialLM","commit_stats":null,"previous_names":["manycore-research/spatiallm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manycore-research%2FSpatialLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manycore-research%2FSpatialLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manycore-research%2FSpatialLM/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/manycore-research%2FSpatialLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/manycore-research","download_url":"https://codeload.github.com/manycore-research/SpatialLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245359285,"owners_count":20602338,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mllm","point-clouds","scene-understanding","spatial-intelligence"],"created_at":"2025-03-24T22:01:54.611Z","updated_at":"2025-03-24T22:01:55.444Z","avatar_url":"https://github.com/manycore-research.png","language":"Python","funding_links":[],"categories":["Python","Repos","largemodel"],"sub_categories":[],"readme":"# SpatialLM\n\n\u003c!-- markdownlint-disable first-line-h1 --\u003e\n\u003c!-- markdownlint-disable html --\u003e\n\u003c!-- markdownlint-disable no-duplicate-header --\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"figures/logo_light.png#gh-light-mode-only\" width=\"60%\" alt=\"SpatialLM\" /\u003e\n  \u003cimg src=\"figures/logo_dark.png#gh-dark-mode-only\" width=\"60%\" alt=\"SpatialLM\" /\u003e\n\u003c/div\u003e\n\u003chr style=\"margin-top: 0; margin-bottom: 8px;\"\u003e\n\u003cdiv align=\"center\" style=\"margin-top: 0; padding-top: 0; line-height: 1;\"\u003e\n    \u003ca href=\"https://manycore-research.github.io/SpatialLM\" target=\"_blank\" style=\"margin: 2px;\"\u003e\u003cimg alt=\"Project\"\n    src=\"https://img.shields.io/badge/🌐%20Website-SpatialLM-ffc107?color=42a5f5\u0026logoColor=white\" style=\"display: 
inline-block; vertical-align: middle;\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/manycore-research/SpatialLM\" target=\"_blank\" style=\"margin: 2px;\"\u003e\u003cimg alt=\"GitHub\"\n    src=\"https://img.shields.io/badge/GitHub-SpatialLM-24292e?logo=github\u0026logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003cdiv align=\"center\" style=\"line-height: 1;\"\u003e\n    \u003ca href=\"https://huggingface.co/manycore-research/SpatialLM-Llama-1B\" target=\"_blank\" style=\"margin: 2px;\"\u003e\u003cimg alt=\"Hugging Face\"\n    src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialLM%201B-ffc107?color=ffc107\u0026logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://huggingface.co/datasets/manycore-research/SpatialLM-Testset\" target=\"_blank\" style=\"margin: 2px;\"\u003e\u003cimg alt=\"Dataset\"\n    src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-SpatialLM-ffc107?color=ffc107\u0026logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n## Introduction\n\nSpatialLM is a 3D large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements like walls, doors, windows, and oriented object bounding boxes with their semantic categories. Unlike previous methods that require specialized equipment for data collection, SpatialLM can handle point clouds from diverse sources such as monocular video sequences, RGBD images, and LiDAR sensors. This multimodal architecture effectively bridges the gap between unstructured 3D geometric data and structured 3D representations, offering high-level semantic understanding. 
It enhances spatial reasoning capabilities for applications in embodied robotics, autonomous navigation, and other complex 3D scene analysis tasks.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cvideo src=\"https://github.com/user-attachments/assets/c0218d6a-f676-41f8-ae76-bba228866306\" poster=\"figures/cover.png\"\u003e \u003c/video\u003e\n  \u003cp\u003e\u003ci\u003eSpatialLM reconstructs 3D layout from a monocular RGB video with MASt3R-SLAM. Results are aligned to the video with GT cameras for visualization.\u003c/i\u003e\u003c/p\u003e\n\u003c/div\u003e\n\n## SpatialLM Models\n\n\u003cdiv align=\"center\"\u003e\n\n|      **Model**      | **Download**                                                                   |\n| :-----------------: | ------------------------------------------------------------------------------ |\n| SpatialLM-Llama-1B  | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialLM-Llama-1B)  |\n| SpatialLM-Qwen-0.5B | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialLM-Qwen-0.5B) |\n\n\u003c/div\u003e\n\n## Usage\n\n### Installation\n\nTested with the following environment:\n\n- Python 3.11\n- PyTorch 2.4.1\n- CUDA Version 12.4\n\n```bash\n# Clone the repository\ngit clone https://github.com/manycore-research/SpatialLM.git\ncd SpatialLM\n\n# Create a conda environment with CUDA 12.4\nconda create -n spatiallm python=3.11\nconda activate spatiallm\nconda install -y nvidia/label/cuda-12.4.0::cuda-toolkit conda-forge::sparsehash\n\n# Install dependencies with poetry\npip install poetry \u0026\u0026 poetry config virtualenvs.create false --local\npoetry install\npoe install-torchsparse # Building wheel for torchsparse will take a while\n```\n\n### Inference\n\nIn the current version of SpatialLM, input point clouds are assumed to be axis-aligned, with the z-axis as the up axis. 
This orientation is crucial for maintaining consistency in spatial understanding and scene interpretation across different datasets and applications.\nExample preprocessed point clouds, reconstructed from RGB videos using [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM), are available in [SpatialLM-Testset](#spatiallm-testset).\n\nDownload an example point cloud:\n\n```bash\nhuggingface-cli download manycore-research/SpatialLM-Testset pcd/scene0000_00.ply --repo-type dataset --local-dir .\n```\n\nRun inference:\n\n```bash\npython inference.py --point_cloud pcd/scene0000_00.ply --output scene0000_00.txt --model_path manycore-research/SpatialLM-Llama-1B\n```\n\n### Visualization\n\nUse `rerun` to visualize the point cloud and the predicted structured 3D layout output:\n\n```bash\n# Convert the predicted layout to Rerun format\npython visualize.py --point_cloud pcd/scene0000_00.ply --layout scene0000_00.txt --save scene0000_00.rrd\n\n# Visualize the point cloud and the predicted layout\nrerun scene0000_00.rrd\n```\n\n### Evaluation\n\nTo evaluate the performance of SpatialLM, we provide an `eval.py` script that reproduces the results on the SpatialLM-Testset reported in the table in the [Benchmark Results](#benchmark-results) section.\n\nDownload the testset:\n\n```bash\nhuggingface-cli download manycore-research/SpatialLM-Testset --repo-type dataset --local-dir SpatialLM-Testset\n```\n\nRun evaluation:\n\n```bash\n# Run inference on the PLY point clouds in folder SpatialLM-Testset/pcd with SpatialLM-Llama-1B model\npython inference.py --point_cloud SpatialLM-Testset/pcd --output SpatialLM-Testset/pred --model_path manycore-research/SpatialLM-Llama-1B\n\n# Evaluate the predicted layouts\npython eval.py --metadata SpatialLM-Testset/test.csv --gt_dir SpatialLM-Testset/layout --pred_dir SpatialLM-Testset/pred --label_mapping SpatialLM-Testset/benchmark_categories.tsv\n```\n\n## SpatialLM Testset\n\nWe provide a test set of 107 preprocessed point clouds, 
reconstructed from RGB videos using [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM). SpatialLM-Testset is considerably more challenging than prior datasets of clean RGBD scans because point clouds reconstructed from monocular RGB videos contain noise and occlusions.\n\n\u003cdiv align=\"center\"\u003e\n\n|    **Dataset**    | **Download**                                                                       |\n| :---------------: | ---------------------------------------------------------------------------------- |\n| SpatialLM-Testset | [🤗 Datasets](https://huggingface.co/datasets/manycore-research/SpatialLM-TestSet) |\n\n\u003c/div\u003e\n\n## Benchmark Results\n\nBenchmark results on the challenging SpatialLM-Testset are reported in the following table:\n\n\u003cdiv align=\"center\"\u003e\n\n| **Method**       | **SpatialLM-Llama-1B** | **SpatialLM-Qwen-0.5B** |\n| ---------------- | ---------------------- | ----------------------- |\n| **Floorplan**    | **mean IoU**           |                         |\n| wall             | 78.62                  | 74.81                   |\n|                  |                        |                         |\n| **Objects**      | **F1 @.25 IoU (3D)**   |                         |\n| curtain          | 27.35                  | 28.59                   |\n| nightstand       | 57.47                  | 54.39                   |\n| chandelier       | 38.92                  | 40.12                   |\n| wardrobe         | 23.33                  | 30.60                   |\n| bed              | 95.24                  | 93.75                   |\n| sofa             | 65.50                  | 66.15                   |\n| chair            | 21.26                  | 14.94                   |\n| cabinet          | 8.47                   | 8.44                    |\n| dining table     | 54.26                  | 56.10                   |\n| plants           | 20.68                  | 26.46                   |\n| tv cabinet       | 
33.33                  | 10.26                   |\n| coffee table     | 50.00                  | 55.56                   |\n| side table       | 7.60                   | 2.17                    |\n| air conditioner  | 20.00                  | 13.04                   |\n| dresser          | 46.67                  | 23.53                   |\n|                  |                        |                         |\n| **Thin Objects** | **F1 @.25 IoU (2D)**   |                         |\n| painting         | 50.04                  | 53.81                   |\n| carpet           | 31.76                  | 45.31                   |\n| tv               | 67.31                  | 52.29                   |\n| door             | 50.35                  | 42.15                   |\n| window           | 45.4                   | 45.9                    |\n\n\u003c/div\u003e\n\n## License\n\nSpatialLM-Llama-1B is derived from Llama3.2-1B-Instruct, which is licensed under the Llama3.2 license.\nSpatialLM-Qwen-0.5B is derived from the Qwen-2.5 series, originally licensed under the Apache 2.0 License.\n\nAll models are built upon the SceneScript point cloud encoder, licensed under the CC-BY-NC-4.0 License. 
TorchSparse, utilized in this project, is licensed under the MIT License.\n\n## Citation\n\nIf you find this work useful, please consider citing:\n\n```bibtex\n@misc{spatiallm,\n  title        = {SpatialLM: Large Language Model for Spatial Understanding},\n  author       = {ManyCore Research Team},\n  howpublished = {\\url{https://github.com/manycore-research/SpatialLM}},\n  year         = {2025}\n}\n```\n\n## Acknowledgements\n\nWe would like to thank the following projects that made this work possible:\n\n[Llama3.2](https://github.com/meta-llama) | [Qwen2.5](https://github.com/QwenLM/Qwen2.5) | [Transformers](https://github.com/huggingface/transformers) | [SceneScript](https://github.com/facebookresearch/scenescript) | [TorchSparse](https://github.com/mit-han-lab/torchsparse)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanycore-research%2FSpatialLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanycore-research%2FSpatialLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanycore-research%2FSpatialLM/lists"}