https://github.com/visionxlab/space-10

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence

[Ziyang Gong](https://scholar.google.com/citations?user=cWip8QgAAAAJ&hl=zh-CN&oi=ao)1*,
Wenhao Li2*,
Oliver Ma3,
[Songyuan Li](https://scholar.google.com/citations?user=dVQGfEEAAAAJ&hl=zh-CN&oi=ao)4,
[Jiayi Ji](https://scholar.google.com/citations?user=xp_rICcAAAAJ&hl=zh-CN&oi=ao)5,
[Xue Yang](https://scholar.google.com/citations?user=2xTlvV0AAAAJ&hl=zh-CN)1,
[Gen Luo](https://scholar.google.com/citations?user=EyZqU9gAAAAJ&hl=zh-CN)3,
Junchi Yan1,
Rongrong Ji2

1 Shanghai Jiao Tong University,
2 Xiamen University,
3 Shanghai AI Lab,
4 Sun Yat-sen University,
5 National University of Singapore

\* Equal contribution





---
# 🧠 What is SpaCE-10?

**SpaCE-10** is a **compositional spatial intelligence benchmark** for evaluating **Multimodal Large Language Models (MLLMs)** in indoor environments. Our contributions are as follows:

- 🧬 We define an **Atomic Capability Pool** comprising 10 **atomic spatial capabilities**.
- 🔗 Based on the composition of different atomic capabilities, we design **8 compositional QA types**.
- 📈 SpaCE-10 benchmark contains 5,000+ QA pairs.
- 🏠 All QA pairs come from 811 indoor scenes (ScanNet++, ScanNet, 3RScan, ARKitScenes).
- 🌍 SpaCE-10 spans both 2D and 3D MLLM evaluations and can be seamlessly adapted to MLLMs that accept 3D scan input.
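
To illustrate how compositional QA types can arise from an atomic capability pool, the sketch below enumerates pairwise combinations of a few capabilities. The capability names here are hypothetical placeholders; the actual 10 atomic capabilities and 8 compositional QA types are defined in the paper.

```python
from itertools import combinations

# Hypothetical subset of atomic spatial capabilities (illustrative only;
# see the SpaCE-10 paper for the actual 10-capability pool).
atomic_capabilities = ["counting", "size", "distance", "direction", "localization"]

# A compositional QA type combines two or more atomic capabilities.
pairwise_compositions = [" + ".join(pair) for pair in combinations(atomic_capabilities, 2)]

print(len(pairwise_compositions))  # C(5, 2) = 10 pairwise compositions
for comp in pairwise_compositions[:3]:
    print(comp)
```

Even a small atomic pool yields many candidate compositions, which is why SpaCE-10 curates a fixed set of 8 QA types rather than exhausting all combinations.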









---
# 🔥🔥🔥 News

- 🖼️ [2025/06/11] Scans for 3D MLLMs and our manually collected 3D snapshots are coming soon.
- 💻 [2025/06/10] Evaluation code is released below.
- 📊 [2025/06/09] We have released the benchmark for 2D MLLMs at [Hugging Face](https://huggingface.co/datasets/Cusyoung/SpaCE-10).
- 📚 [2025/06/09] The SpaCE-10 paper is released on [arXiv](https://arxiv.org/abs/2506.07966v1) and will be updated continually!
---

# Environment
SpaCE-10 evaluation is built on lmms-eval, so we follow the lmms-eval environment setup.
```bash
git clone https://github.com/Cuzyoung/SpaCE-10.git
cd SpaCE-10
uv venv dev --python=3.10
source dev/bin/activate
uv pip install -e .
```

# Evaluation
Take InternVL2.5-8B as an example:
```bash
cd lmms-eval/run_bash
bash internvl2.5-8b.sh
```
Note that each time you evaluate a new model, you must first install that model's own environment.
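
Under the hood, multiple-choice QA benchmarks like this are typically scored by exact-match accuracy over the predicted option letters. The sketch below is a minimal, hypothetical version of such a metric; SpaCE-10's official scoring lives in its lmms-eval task configuration.

```python
# Minimal sketch of multiple-choice accuracy scoring (function and field
# names are hypothetical; the official metric is defined in lmms-eval).
def mc_accuracy(predictions, references):
    """Fraction of questions where the predicted option letter matches the answer."""
    assert len(predictions) == len(references)
    correct = sum(p.strip().upper() == r.strip().upper()
                  for p, r in zip(predictions, references))
    return correct / len(references)

print(mc_accuracy(["A", "c", "B"], ["A", "C", "D"]))  # 2 of 3 correct
```

Normalizing case and whitespace before comparison keeps the metric robust to superficial formatting differences in model outputs.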

---

# Citation
```bibtex
@article{gong2025space10,
  title={SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence},
  author={Ziyang Gong and Wenhao Li and Oliver Ma and Songyuan Li and Jiayi Ji and Xue Yang and Gen Luo and Junchi Yan and Rongrong Ji},
  journal={arXiv preprint arXiv:2506.07966},
  year={2025}
}
```