https://github.com/blurgyy/compass
[ICCV 2025] Enhancing spatial understanding in text-to-Image diffusion models
https://github.com/blurgyy/compass
diffusion generation spatial-understanding t2i text-to-image
Last synced: 5 months ago
JSON representation
[ICCV 2025] Enhancing spatial understanding in text-to-Image diffusion models
- Host: GitHub
- URL: https://github.com/blurgyy/compass
- Owner: blurgyy
- License: apache-2.0
- Created: 2024-12-11T11:58:39.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-09-01T03:43:26.000Z (10 months ago)
- Last Synced: 2025-09-09T17:39:15.762Z (10 months ago)
- Topics: diffusion, generation, spatial-understanding, t2i, text-to-image
- Language: Python
- Homepage: https://compass.blurgy.xyz
- Size: 721 KB
- Stars: 83
- Watchers: 6
- Forks: 6
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
**\[[Project Page]\]
\[[arXiv]\]
\[[ComfyUI node]\]**
> [Gaoyang Zhang], Bingtao Fu, [Qingnan Fan], [Qi Zhang], Runxing Liu, Hong Gu, Huaqi Zhang, Xinguo Liu
> ICCV 2025
## TL; DR
CoMPaSS enhances the spatial understanding of existing text-to-image diffusion models, enabling
them to generate images that faithfully reflect spatial configurations specified in the
text prompt.

## Setting up Environment
We manage our python environment with [uv], and provide a convenient script for setting
up the environment at [setup_env.sh](./setup_env.sh).
Running this script will create a subdirectory `.venv/` in the project root. To enable
it, run `source .venv/bin/activate` after the environment is set up:
```bash
# install requirements into .venv/
bash ./setup_env.sh
# activate the environment
source .venv/bin/activate
```
## Trying out CoMPaSS
> [!NOTE]
> For training, SCOP and TENOR are both required.
> For generating images from text, only TENOR and the reference weights are needed.
### ComfyUI
We recommend trying out the FLUX.1-dev LoRA trained via CoMPaSS. Please refer to [the
custom node's repository][ComfyUI node] to get started.
### Reference Weights
We provide the reference weights used to report all metrics in our paper on Hugging
Face 🤗.
We recommend trying out the FLUX.1-dev weights as it is a Rank-16 LoRA which is only
50MB in size.
| Model | Link |
|:-----:|:-----:|
| FLUX.1-dev | |
| SD1.4 | |
| SD1.5 | |
| SD2.1 | |
### The SCOP dataset
We provide full instructions for replicating the SCOP dataset (28,028 object pairs among
15,426 images) in the [SCOP](./SCOP) directory. Check out its [README](./SCOP/README.md)
to get started.
### The TENOR Module
We provide both training and inference instructions for using our TENOR module in the
[TENOR](./TENOR) directory.
MMDiT-based models (e.g., FLUX.1-dev) and UNet-based models (e.g., SD1.5) are both
supported. Check out their respective instructions to get started:
- [Instructions for FLUX.1-dev](./TENOR/flux/README.md)
- [Instructions for SD1.4, SD1.5, and SD2.1](./TENOR/sd/README.md)
## Citation
```bibtex
@inproceedings{zhang2025compass,
title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
booktitle={ICCV},
year={2025}
}
```
[Gaoyang Zhang]:
[Qingnan Fan]:
[Qi Zhang]:
[Project Page]:
[arXiv]:
[ComfyUI node]:
[uv]:
[TokenCompose]:
[x-flux]: