{"id":18883674,"url":"https://github.com/jiayuww/spatialeval","last_synced_at":"2025-09-04T07:31:36.968Z","repository":{"id":261591214,"uuid":"877072051","full_name":"jiayuww/SpatialEval","owner":"jiayuww","description":"[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs","archived":false,"fork":false,"pushed_at":"2025-01-23T02:51:19.000Z","size":4144,"stargazers_count":23,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-05T05:03:56.802Z","etag":null,"topics":["claude","foundation-models","gemini","gpt-4o","gpt-4v","large-language-models","llama3","machine-learning","multimodal-deep-learning","reasoning","spatial-reasoning","vision-language-models"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jiayuww.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-23T03:19:56.000Z","updated_at":"2025-04-03T05:50:12.000Z","dependencies_parsed_at":"2024-11-07T11:42:41.322Z","dependency_job_id":"15f91833-a62e-405c-82fe-c68c8fff3774","html_url":"https://github.com/jiayuww/SpatialEval","commit_stats":null,"previous_names":["jiayuww/spatialeval"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jiayuww/SpatialEval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiayuww%2FSpatialEval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiayuww%2FSpatialEval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/jiayuww%2FSpatialEval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiayuww%2FSpatialEval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jiayuww","download_url":"https://codeload.github.com/jiayuww/SpatialEval/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiayuww%2FSpatialEval/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273573304,"owners_count":25129877,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude","foundation-models","gemini","gpt-4o","gpt-4v","large-language-models","llama3","machine-learning","multimodal-deep-learning","reasoning","spatial-reasoning","vision-language-models"],"created_at":"2024-11-08T07:08:21.710Z","updated_at":"2025-09-04T07:31:36.946Z","avatar_url":"https://github.com/jiayuww.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SpatialEval\n\nWelcome to the official codebase for [Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models](https://arxiv.org/abs/2406.14852). 
\n\n## 📌 Quick Links\n[![Project Page](https://img.shields.io/badge/🌐_Project_Page-blue?style=for-the-badge)](https://spatialeval.github.io/)\n[![Paper](https://img.shields.io/badge/📖_Paper-red?style=for-the-badge)](https://arxiv.org/pdf/2406.14852)\n[![Dataset](https://img.shields.io/badge/🤗_Dataset-green?style=for-the-badge)](https://huggingface.co/datasets/MilaWang/SpatialEval)\n[![Talk](https://img.shields.io/badge/🎤_5_min_Talk-purple?style=for-the-badge)](https://neurips.cc/virtual/2024/poster/94371)\n\n\n## 💥 News 💥\n\n* **[2024.09.25]** 🎉 SpatialEval has been accepted to **NeurIPS 2024**!\n* **[2024.09.16]** 🌟 SpatialEval has been included in [Eureka](https://www.microsoft.com/en-us/research/publication/eureka-evaluating-and-understanding-large-foundation-models/) from **Microsoft Research**!\n\n* **[2024.06.21]** 📢 SpatialEval is now publicly available on [arXiv](https://arxiv.org/abs/2406.14852)\n\n## 🤔 About SpatialEval\n\nSpatialEval is a comprehensive benchmark for evaluating spatial intelligence in LLMs and VLMs across four key dimensions:\n- Spatial relationships\n- Positional understanding\n- Object counting\n- Navigation\n\n### Benchmark Tasks\n1. **Spatial-Map**: Understanding spatial relationships between objects in map-based scenarios\n2. **Maze-Nav**: Testing navigation through complex environments\n3. **Spatial-Grid**: Evaluating spatial reasoning within structured environments\n4. **Spatial-Real**: Assessing real-world spatial understanding\n\nEach task supports three input modalities:\n- Text-only (TQA)\n- Vision-only (VQA)\n- Vision-Text (VTQA)\n\n![SpatialEval Overview](assets/spatialeval_task.png)\n\n\n## 🚀 Quick Start\n\n\n### 📍 Load Dataset\n\nSpatialEval provides three input modalities—TQA (Text-only), VQA (Vision-only), and VTQA (Vision-text)—across four tasks: Spatial-Map, Maze-Nav, Spatial-Grid, and Spatial-Real. Each modality and task is easily accessible via Hugging Face. 
Ensure the required [packages](https://huggingface.co/docs/datasets/en/quickstart) are installed:\n\n```python\nfrom datasets import load_dataset\n\ntqa = load_dataset(\"MilaWang/SpatialEval\", \"tqa\", split=\"test\")\nvqa = load_dataset(\"MilaWang/SpatialEval\", \"vqa\", split=\"test\")\nvtqa = load_dataset(\"MilaWang/SpatialEval\", \"vtqa\", split=\"test\")\n```\n\n\n### 📈 Evaluate SpatialEval\n\nSpatialEval supports any evaluation pipeline compatible with language models and vision-language models. For text-based prompts, use the `text` column with this structure:\n`{text} First, provide a concise answer in one sentence. Then, elaborate on the reasoning behind your answer in a detailed, step-by-step explanation.` The image input is in the `image` column, and the correct answers are available in the `oracle_answer`, `oracle_option`, and `oracle_full_answer` columns.\n\nNext, we provide full scripts for inference and evaluation.\n\n#### Install\n\n1. Clone this repository\n\n```bash\ngit clone git@github.com:jiayuww/SpatialEval.git\n```\n\n2. Install dependencies\n\nTo run models like LLaVA and Bunny, install [LLaVA](https://github.com/haotian-liu/LLaVA) and [Bunny](https://github.com/BAAI-DCAI/Bunny). 
Install [fastchat](https://github.com/lm-sys/FastChat) for language model inference.\nFor Bunny variants, ensure you merge LoRA weights into the base LLMs before initiation.\n\n#### 💬 Running Inference\n\nFor language models, for example, to run Llama-3-8B on all four tasks:\n\n```bash\n# Run on all tasks\npython inference_lm.py \\\n    --task \"all\" \\\n    --mode \"tqa\" \\\n    --w_reason \\\n    --model-path \"meta-llama/Meta-Llama-3-8B-Instruct\" \\\n    --output_folder outputs \\\n    --temperature 0.2 \\\n    --top_p 0.9 \\\n    --repetition_penalty 1.0 \\\n    --max_new_tokens 512 \\\n    --device \"cuda\"\n\n# For specific tasks, replace \"all\" with:\n# - \"spatialmap\"\n# - \"mazenav\"\n# - \"spatialgrid\"\n# - \"spatialreal\"\n```\n\nFor vision-language models, for example, to run LLaVA-1.6-Mistral-7B across all tasks:\n\n```bash\n# VQA mode\npython inference_vlm.py \\\n    --mode \"vqa\" \\\n    --task \"all\" \\\n    --model_path \"liuhaotian/llava-v1.6-mistral-7b\" \\\n    --w_reason \\\n    --temperature 0.2 \\\n    --top_p 0.9 \\\n    --repetition_penalty 1.0 \\\n    --max_new_tokens 512 \\\n    --device \"cuda\"\n\n# For VTQA mode, use --mode \"vtqa\"\n```\n\nExample bash scripts are available in the `scripts/` folder. For more configurations, see `configs/inference_configs.py`. VLMs support `tqa`, `vqa`, and `vtqa` modes, while LMs support `tqa` only. The `--task` option accepts `all` for all four tasks or an individual task: `spatialmap`, `mazenav`, `spatialgrid`, or `spatialreal`.\nWe can also test only the first `k` examples, for example the first 100 samples for each question type in each task, by specifying `--first_k 100`.\n\n#### 📊 Evaluation\n\nWe use exact match for evaluation. 
For example, to evaluate the Spatial-Map task on all three input modalities (TQA, VQA, and VTQA):\n\n```bash\n# For TQA on Spatial-Map\npython evals/evaluation.py --mode 'tqa' --task 'spatialmap' --output_folder 'outputs/' --dataset_id 'MilaWang/SpatialEval' --eval_summary_dir 'eval_summary'\n# For VQA on Spatial-Map\npython evals/evaluation.py --mode 'vqa' --task 'spatialmap' --output_folder 'outputs/' --dataset_id 'MilaWang/SpatialEval' --eval_summary_dir 'eval_summary'\n# For VTQA on Spatial-Map\npython evals/evaluation.py --mode 'vtqa' --task 'spatialmap' --output_folder 'outputs/' --dataset_id 'MilaWang/SpatialEval' --eval_summary_dir 'eval_summary'\n```\n\nEvaluation can also be configured for the other tasks: `mazenav`, `spatialgrid`, and `spatialreal`. Further details are in `evals/evaluation.py`.\n\n### 💡 Dataset Generation Script\n\nStay tuned! The dataset generation script will be released in February 😉\n\n## ⭐ Citation\n\nIf you find our work helpful, please consider citing our paper 😊\n\n```\n@inproceedings{wang2024spatial,\n        title={Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models},\n        author={Wang, Jiayu and Ming, Yifei and Shi, Zhenmei and Vineet, Vibhav and Wang, Xin and Li, Yixuan and Joshi, Neel},\n        booktitle={The Thirty-Eighth Annual Conference on Neural Information Processing Systems},\n        year={2024}\n      }\n```\n\n## 💬 Questions\nHave questions? We're here to help!\n- Open an issue in this repository\n- Contact us through the channels listed on our project page","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiayuww%2Fspatialeval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjiayuww%2Fspatialeval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiayuww%2Fspatialeval/lists"}