{"id":50484763,"url":"https://github.com/aim-uofa/gsi-bench","last_synced_at":"2026-06-01T21:01:55.497Z","repository":{"id":353232896,"uuid":"1217771051","full_name":"aim-uofa/GSI-Bench","owner":"aim-uofa","description":"[CVPR2026] Exploring Spatial Intelligence from a Generative Perspective","archived":false,"fork":false,"pushed_at":"2026-04-23T01:33:59.000Z","size":27525,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T03:29:32.328Z","etag":null,"topics":["spatial-intelligence"],"latest_commit_sha":null,"homepage":"https://aim-uofa.github.io/GSI-Bench/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aim-uofa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-22T07:47:57.000Z","updated_at":"2026-04-23T02:14:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/aim-uofa/GSI-Bench","commit_stats":null,"previous_names":["aim-uofa/gsi-bench"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/aim-uofa/GSI-Bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGSI-Bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGSI-Bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGSI-Bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGSI-Bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aim-uofa","download_url":"https://codeload.github.com/aim-uofa/GSI-Bench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim-uofa%2FGSI-Bench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33793044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["spatial-intelligence"],"created_at":"2026-06-01T21:01:52.762Z","updated_at":"2026-06-01T21:01:55.487Z","avatar_url":"https://github.com/aim-uofa.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GSI-Bench: Exploring Spatial Intelligence from a Generative Perspective\n\nLanguage: English | 中文见 [`README_zh.md`](README_zh.md)\n\n**🎉 Accepted to CVPR 2026**\n\nOfficial implementation of the paper:\n\n\u003e **Exploring Spatial Intelligence from a Generative Perspective**\n\u003e _CVPR 2026_\n\u003e [[Paper]](paper/main.pdf) [[arXiv]](https://arxiv.org/abs/2604.20570) [[Project Page]](https://aim-uofa.github.io/GSI-Bench/)\n\nGSI-Bench evaluates the ability of generative models to understand and manipulate 3D spatial relationships in indoor scenes.\n\n| Metric | Full Name | What It Measures |\n|--------|-----------|------------------|\n| **IC** | Instruction Compliance | Does the output actually perform the requested spatial operation? |\n| **SA** | Spatial Accuracy | Is the 3D displacement, rotation, or scale close to the ground-truth geometry? |\n| **AC** | Appearance Consistency | Are object identity, category, and appearance preserved after editing? |\n| **EL** | Edit Locality | Is the rest of the scene left untouched outside the intended region? |\n\n---\n\n## Quick Navigation\n\n\u003e **If you only want to evaluate your model on GSI-Bench, go directly to [Evaluation](#evaluation).**\n\u003e\n\u003e Steps 1 and 2 document how we constructed the benchmark data. They are open-sourced for transparency and reproducibility, but are **not required** for running evaluations.\n\n```\nGSI-Bench/\n├── evaluation/     # Evaluation framework (IC / SA / EL / AC)  ← start here\n├── robothor/       # [Optional] Data generation pipeline 1: RoboTHOR indoor scenes\n├── mesatask/       # [Optional] Data generation pipeline 2: MesaTask tabletop scenes\n├── paper/          # Paper PDF\n└── tests/          # Unit \u0026 integration tests\n```\n\n---\n\n## Evaluation\n\n### 1. Environment Setup\n\n```bash\nconda create -n gsi-eval python=3.10 -y\nconda activate gsi-eval\n\ncd evaluation\n\n# Install PyTorch matching your CUDA version (example: CUDA 11.8)\npip install torch torchvision --index-url https://download.pytorch.org/whl/cu118\n\n# Install mmcv with C++ ops\npip install -U openmim \u0026\u0026 mim install mmcv\n\n# Install remaining dependencies\npip install -r requirements.txt\n\n# Optional: build GroundingDINO for text-prompt detection\npip install -e ./src/groundingdino --no-build-isolation\n```\n\n### 2. Download Model Weights\n\n| Weight | Size | Source |\n|--------|------|--------|\n| `other_exp_ckpt.pth` (DetAny3D) | ~500MB | [OpenDriveLab/DetAny3D](https://github.com/OpenDriveLab/DetAny3D) |\n| `sam_vit_h_4b8939.pth` (SAM ViT-H) | ~2.4GB | [Meta AI](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) |\n| `dinov2_vitl14_pretrain.pth` (DINOv2) | ~1.1GB | [Meta AI](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth) |\n| `groundingdino_swinb_cogcoor.pth` (optional) | ~690MB | [IDEA-Research](https://github.com/IDEA-Research/GroundingDINO) |\n\nPlace all weights in one directory, then run:\n```bash\nbash prepare_weights.sh \u003cpath_to_weight_directory\u003e\n# Creates symlinks under checkpoints/ and GroundingDINO/weights/\n```\n\n### 3. Download Evaluation Datasets\n\n```bash\n# Download the four GSI-Bench evaluation datasets and place in one directory\nbash prepare_datasets.sh \u003cpath_to_downloaded_datasets\u003e\n# Creates symlinks: fine_dataset/  mesatask_dataset/  bathroom_dataset/  robothor_dataset/\n```\n\n### 4. Generate Edited Images with Your Model\n\nYour model should produce edited images following the naming convention:\n```\neval/\u003cmodel_name\u003e/generated_images_fine/\u003cimg_id\u003e_edit_\u003cquery_id\u003e.png\neval/\u003cmodel_name\u003e/generated_images_mesatask/\u003cimg_id\u003e_edit_\u003cquery_id\u003e.png\neval/\u003cmodel_name\u003e/generated_images_bathroom/\u003cimg_id\u003e_edit_\u003cquery_id\u003e.png\neval/\u003cmodel_name\u003e/generated_images_robothor/\u003cimg_id\u003e_edit_\u003cquery_id\u003e.png\n```\n\nWe provide a BAGEL-based example: `python examples/inference.py` (see [`evaluation/REPRODUCE_BAGEL_RESULTS.md`](evaluation/REPRODUCE_BAGEL_RESULTS.md)).\n\n### 5. Run Evaluation\n\n```bash\ncd evaluation\nexport PYTHONPATH=$PWD:$PYTHONPATH\n\n# IC / SA / EL evaluation (iterates all models × all datasets)\nbash eval.sh\n\n# (Optional) MLLM-based AC scoring — requires serving an LLM\ncd mllm_eval\nbash eval_infer.sh \u003cmodel_path\u003e default \u003cport\u003e\ncd ..\n\n# Aggregate all metrics into a final report\npython -m eval.aggregate \\\n  --root-dir ./eval \\\n  --output-dir ./eval_results \\\n  --mllm-eval-dir \u003cdir_with_mllm_ac_jsons\u003e\n\ncd ..   # back to repo root\n```\n\n**Output:** `eval_results/` with per-model, per-dataset JSON files containing IC/SA/EL/AC scores.\n\nSee [`evaluation/eval/README.md`](evaluation/eval/README.md) for detailed input format and troubleshooting.\n\n---\n\n## Data Generation Pipelines (Optional)\n\n\u003e The following two pipelines document how we constructed the GSI-Bench data. They are **not needed for evaluation** — the evaluation datasets are provided as downloads above.\n\n### Pipeline 1: RoboTHOR Indoor Scenes\n\n**Environment:**\n```bash\nconda create -n gsi-robothor python=3.10 -y\nconda activate gsi-robothor\npip install -r robothor/requirements.txt\n# Dependencies: ai2thor\u003e=5.0.0, numpy, Pillow, matplotlib\n# AI2-THOR downloads scene assets automatically on first run (~2GB)\n# Requires: NVIDIA GPU + CloudRendering (headless) or X server (display)\n```\n\n**Generate data:**\n```bash\ncd robothor\n\n# 1) Generate base views + camera-relative commands for ALL 60 training scenes\n#    Output: data/outputs/train/with_physics/\nbash scripts/generate_train.sh\n\n# 2) Generate additional command types (requires pregenerated views from step 1)\nbash scripts/generate_train_object.sh          # object-relative positioning\nbash scripts/generate_train_rotate.sh           # rotation commands\nbash scripts/generate_train_receptacle.sh       # receptacle placement\nbash scripts/generate_train_spatial_remove.sh    # spatial removal\nbash scripts/generate_train_agent_camera.sh      # agent camera movement\n\n# 3) Generate validation data\nbash scripts/generate_val_agent_camera.sh\n\ncd ..   # back to repo root\n```\n\n**Output:** `data/outputs/{train,val}/` with JSONL records + RGB/depth/segmentation images per view per command.\n\n**Timing:** ~2–5 min per scene depending on GPU. Full 60 scenes: several hours.\n\nSee [`robothor/README.md`](robothor/README.md) for details.\n\n---\n\n### Pipeline 2: MesaTask Tabletop Scenes\n\n**Environment:**\n```bash\nconda create -n gsi-mesatask python=3.10 -y\nconda activate gsi-mesatask\npip install -r mesatask/requirement.txt\n# For inference (optional): pip install torch torchvision\n# For rendering (optional): download Blender 4.3+ from https://www.blender.org/download/\n# For physical optimization (optional): conda install -c conda-forge drake\n```\n\n**Download MesaTask-10K dataset:**\n```bash\ncd mesatask\ngit lfs install\ngit clone https://huggingface.co/datasets/InternRobotics/MesaTask-10K MesaTask-10K\n\n# Prepare asset library (from dataset archives)\ncd MesaTask-10K/Assets_library_archive\ncat Assets_library_backup.tar.gz.* \u003e Assets_library_merged.tar.gz\ntar -xzvf Assets_library_merged.tar.gz -C ../Assets_library/\ncd ../..\n```\n\n**Generate data:**\n```bash\ncd mesatask\n\n# 1) Generate atomic transforms (move, rotate, scale)\npython generate_atomic_transforms.py \\\n  --input-dir MesaTask-10K/Layout_info \\\n  --asset-annotation MesaTask-10K/Asset_annotation.json \\\n  --output-dir transformed_layouts \\\n  --num-variants 10 --seed 42\n\n# 2) Render all layouts (requires Blender)\npython dataset/vis_batch.py transformed_layouts \\\n  --output_dir dataset/vis_final --parallel 4\n\n# 3) Assemble image-editing dataset\npython organize_image_editing_dataset.py \\\n  --transformed-dir transformed_layouts \\\n  --vis-dir dataset/vis_final \\\n  --output-dir dataset/image_editing_dataset\n\ncd ..   # back to repo root\n```\n\n**Timing:** Step 1 takes ~10 min for 10K scenes. Step 2 (rendering) depends on machine and parallelism.\n\nSee [`mesatask/README.md`](mesatask/README.md) for details.\n\n---\n\n## Verify the Repo\n\n```bash\ngit clone \u003cthis-repo-url\u003e GSI-Bench \u0026\u0026 cd GSI-Bench\n\n# Run tests (no GPU or data needed)\npip install pytest\npython -m pytest tests/ -v    # 43 tests should pass\n```\n\n## Environment Requirements Summary\n\n| Component | Python | GPU | Conda Env |\n|-----------|--------|-----|-----------|\n| **tests/** | 3.8+ | No | any |\n| **evaluation/** | 3.10 | NVIDIA (DetAny3D) | `gsi-eval` |\n| **robothor/** | 3.10 | NVIDIA (CloudRendering) | `gsi-robothor` |\n| **mesatask/** | 3.10 | Optional | `gsi-mesatask` |\n\n---\n\n## Citation\n\n```bibtex\n@article{zhu2026exploring,\n  title={Exploring Spatial Intelligence from a Generative Perspective},\n  author={Zhu, Muzhi and Jiang, Shunyao and Zheng, Huanyi and Luo, Zekai and Zhong, Hao and Li, Anzhou and Wang, Kaijun and Rong, Jintao and Liu, Yang and Chen, Hao and Lin, Tao and Shen, Chunhua},\n  journal={arXiv preprint arXiv:2604.20570},\n  year={2026}\n}\n```\n\n## License\n\nGSI-Bench is released under the MIT License — see [`LICENSE`](LICENSE).\n\nSubdirectories containing code derived from third-party projects retain their\nown licenses:\n\n- [`robothor/LICENSE`](robothor/LICENSE)\n- [`mesatask/LICENSE`](mesatask/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faim-uofa%2Fgsi-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faim-uofa%2Fgsi-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faim-uofa%2Fgsi-bench/lists"}