https://github.com/blurgyy/compass

[ICCV 2025] Enhancing spatial understanding in text-to-image diffusion models
diffusion generation spatial-understanding t2i text-to-image

Host: GitHub
URL: https://github.com/blurgyy/compass
Owner: blurgyy
License: apache-2.0
Created: 2024-12-11T11:58:39.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-09-01T03:43:26.000Z (10 months ago)
Last Synced: 2025-09-09T17:39:15.762Z (10 months ago)
Topics: diffusion, generation, spatial-understanding, t2i, text-to-image
Language: Python
Homepage: https://compass.blurgy.xyz
Size: 721 KB
Stars: 83
Watchers: 6
Forks: 6
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models

**\[[Project Page]\] \[[arXiv]\] \[[ComfyUI node]\]**

> [Gaoyang Zhang], Bingtao Fu, [Qingnan Fan], [Qi Zhang], Runxing Liu, Hong Gu, Huaqi Zhang, Xinguo Liu  
> ICCV 2025

## TL;DR

CoMPaSS enhances the spatial understanding of existing text-to-image diffusion models, enabling
them to generate images that faithfully reflect spatial configurations specified in the
text prompt.

![teaser](./assets/teaser.avif)

## Setting up Environment

We manage our Python environment with [uv] and provide a convenience script,
[setup_env.sh](./setup_env.sh), for setting it up.

Running this script creates a `.venv/` subdirectory in the project root. To activate
the environment, run `source .venv/bin/activate` once setup has finished:

```bash
# install requirements into .venv/
bash ./setup_env.sh

# activate the environment
source .venv/bin/activate
```
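To confirm that the environment is actually active before running anything else, a quick check along these lines can help. This is a minimal sketch that relies only on the Python standard library; the helper name `in_virtualenv` is a hypothetical illustration and is not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


# Report whether a virtual environment is active and where it lives.
print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line prints `False`, the activation step was most likely skipped in the current shell.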

## Trying out CoMPaSS

> [!NOTE]
> For training, SCOP and TENOR are both required.  
> For generating images from text, only TENOR and the reference weights are needed.

### ComfyUI

We recommend trying out the FLUX.1-dev LoRA trained via CoMPaSS. Please refer to
[the custom node's repository][ComfyUI node] to get started.

### Reference Weights

We provide the reference weights used to report all metrics in our paper on Hugging
Face 🤗.
We recommend starting with the FLUX.1-dev weights: they are a rank-16 LoRA that is only
50 MB in size.

| Model | Link |
|:-----:|:-----:|
| FLUX.1-dev |  |
| SD1.4 |  |
| SD1.5 |  |
| SD2.1 |  |
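
For intuition on why a rank-16 LoRA stays small, here is a back-of-the-envelope sketch. It assumes the common LoRA formulation, in which each adapted linear layer gains two low-rank matrices of shapes `d_in x rank` and `rank x d_out`, and that weights are stored in 16 bits. The layer shapes and layer count below are hypothetical placeholders chosen for illustration; they are not the actual configuration of the CoMPaSS weights.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A (d_in x rank) plus B (rank x d_out)."""
    return rank * (d_in + d_out)


def lora_size_mb(layer_shapes, rank: int, bytes_per_param: int = 2) -> float:
    """Approximate size in MB (10**6 bytes); 2 bytes per parameter means 16-bit weights."""
    total = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in layer_shapes)
    return total * bytes_per_param / 1e6


# Hypothetical example: 250 adapted square projections of width 3072.
shapes = [(3072, 3072)] * 250
print(f"{lora_size_mb(shapes, rank=16):.1f} MB")
```

Because the parameter count scales with `rank * (d_in + d_out)` rather than `d_in * d_out`, the adapter remains orders of magnitude smaller than the layers it modifies.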

### The SCOP dataset

We provide full instructions for replicating the SCOP dataset (28,028 object pairs across
15,426 images) in the [SCOP](./SCOP) directory. See its [README](./SCOP/README.md)
to get started.

### The TENOR Module

We provide both training and inference instructions for using our TENOR module in the
[TENOR](./TENOR) directory.
MMDiT-based models (e.g., FLUX.1-dev) and UNet-based models (e.g., SD1.5) are both
supported. Check out their respective instructions to get started:

- [Instructions for FLUX.1-dev](./TENOR/flux/README.md)
- [Instructions for SD1.4, SD1.5, and SD2.1](./TENOR/sd/README.md)

## Citation

```bibtex
@inproceedings{zhang2025compass,
  title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
  author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
  booktitle={ICCV},
  year={2025}
}
```

[Gaoyang Zhang]: 
[Qingnan Fan]: 
[Qi Zhang]: 
[Project Page]: 
[arXiv]: 
[ComfyUI node]: 
[uv]: 
[TokenCompose]: 
[x-flux]: 
