https://github.com/visionxlab/space-10

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence

[Ziyang Gong](https://scholar.google.com/citations?user=cWip8QgAAAAJ&hl=zh-CN&oi=ao)1*,
Wenhao Li2*,
Oliver Ma3,
[Songyuan Li](https://scholar.google.com/citations?user=dVQGfEEAAAAJ&hl=zh-CN&oi=ao)4,
[Jiayi Ji](https://scholar.google.com/citations?user=xp_rICcAAAAJ&hl=zh-CN&oi=ao)5,
[Xue Yang](https://scholar.google.com/citations?user=2xTlvV0AAAAJ&hl=zh-CN)1,
[Gen Luo](https://scholar.google.com/citations?user=EyZqU9gAAAAJ&hl=zh-CN)3,
Junchi Yan1,
Rongrong Ji2

1 Shanghai Jiao Tong University,
2 Xiamen University,
3 Shanghai AI Lab,
4 Sun Yat-sen University,
5 National University of Singapore

\* Equal contribution





---
# 🧠 What is SpaCE-10?

**SpaCE-10** is a **compositional spatial intelligence benchmark** for evaluating **Multimodal Large Language Models (MLLMs)** in indoor environments. Our contributions are as follows:

- 🧬 We define an **Atomic Capability Pool** comprising 10 **atomic spatial capabilities**.
- 🔗 Based on the composition of different atomic capabilities, we design **8 compositional QA types**.
- 📈 SpaCE-10 benchmark contains 5,000+ QA pairs.
- 🏠 All QA pairs come from 811 indoor scenes (ScanNet++, ScanNet, 3RScan, ARKitScenes).
- 🌍 SpaCE-10 spans both 2D and 3D MLLM evaluations and can be seamlessly adapted to MLLMs that accept 3D scan input.
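
To illustrate how compositional QA types can arise from an atomic capability pool, the sketch below enumerates pairwise combinations of a few capabilities. The capability names here are hypothetical placeholders; the actual 10 atomic capabilities and 8 compositional QA types are defined in the paper.

```python
from itertools import combinations

# Hypothetical subset of atomic spatial capabilities (illustrative only;
# see the SpaCE-10 paper for the actual 10-capability pool).
atomic_capabilities = ["counting", "size", "distance", "direction", "localization"]

# A compositional QA type combines two or more atomic capabilities.
pairwise_compositions = [" + ".join(pair) for pair in combinations(atomic_capabilities, 2)]

print(len(pairwise_compositions))  # C(5, 2) = 10 pairwise compositions
for comp in pairwise_compositions[:3]:
    print(comp)
```

Even a small atomic pool yields many candidate compositions, which is why SpaCE-10 curates a fixed set of 8 QA types rather than exhausting all combinations.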









---
# 🔥🔥🔥 News

- 🖼️ [2025/06/11] Scans for 3D MLLMs and our manually collected 3D snapshots are coming soon.
- 💻 [2025/06/10] Evaluation code is released below.
- 📊 [2025/06/09] We have released the benchmark for 2D MLLMs at [Hugging Face](https://huggingface.co/datasets/Cusyoung/SpaCE-10).
- 📚 [2025/06/09] The SpaCE-10 paper is released on [arXiv](https://arxiv.org/abs/2506.07966v1) and will be updated continually!
---

# Environment
SpaCE-10 evaluation is built on lmms-eval, so we follow the lmms-eval environment setup.
```bash
git clone https://github.com/Cuzyoung/SpaCE-10.git
cd SpaCE-10
uv venv dev --python=3.10
source dev/bin/activate
uv pip install -e .
```

# Evaluation
Take InternVL2.5-8B as an example:
```bash
cd lmms-eval/run_bash
bash internvl2.5-8b.sh
```
Note that each time you evaluate a new model, you must first install that model's own environment.
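
Under the hood, multiple-choice QA benchmarks like this are typically scored by exact-match accuracy over the predicted option letters. The sketch below is a minimal, hypothetical version of such a metric; SpaCE-10's official scoring lives in its lmms-eval task configuration.

```python
# Minimal sketch of multiple-choice accuracy scoring (function and field
# names are hypothetical; the official metric is defined in lmms-eval).
def mc_accuracy(predictions, references):
    """Fraction of questions where the predicted option letter matches the answer."""
    assert len(predictions) == len(references)
    correct = sum(p.strip().upper() == r.strip().upper()
                  for p, r in zip(predictions, references))
    return correct / len(references)

print(mc_accuracy(["A", "c", "B"], ["A", "C", "D"]))  # 2 of 3 correct
```

Normalizing case and whitespace before comparison keeps the metric robust to superficial formatting differences in model outputs.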

---

# Citation
```bibtex
@article{gong2025space10,
  title={SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence},
  author={Ziyang Gong and Wenhao Li and Oliver Ma and Songyuan Li and Jiayi Ji and Xue Yang and Gen Luo and Junchi Yan and Rongrong Ji},
  journal={arXiv preprint arXiv:2506.07966},
  year={2025}
}
```