https://github.com/vchitect/taca
TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
- Host: GitHub
- URL: https://github.com/vchitect/taca
- Owner: Vchitect
- Created: 2025-05-27T16:02:07.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-10T02:58:30.000Z (about 1 year ago)
- Last Synced: 2025-06-10T03:33:19.507Z (about 1 year ago)
- Language: Python
- Size: 100 MB
- Stars: 6
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
<sup>1</sup>The University of Hong Kong
<sup>2</sup>Nanjing University
<sup>3</sup>University of Chinese Academy of Sciences
<sup>4</sup>Nanyang Technological University
<sup>5</sup>Harbin Institute of Technology
(*Equal Contribution. ‡Project Leader. †Corresponding Author.)
Paper | Project Page | LoRA Weights
# About
We propose **TACA**, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
https://github.com/user-attachments/assets/ae15a853-ee99-4eee-b0fd-8f5f53c308f9
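To make "rebalancing cross-modal attention" concrete, here is a minimal pure-Python sketch. It assumes the rebalancing takes the form of a scalar factor applied to the logits that an image query assigns to text keys before the softmax, switched on only for early (high-noise) timesteps. The function names, the factor `gamma`, the threshold `t_threshold`, and the default values are all hypothetical placeholders for illustration; they are not taken from this repository's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gamma_schedule(t, t_threshold=0.5, gamma_high=1.2):
    """Hypothetical schedule: amplify only at noisy timesteps (t >= t_threshold).

    t is a normalized timestep in [0, 1]; both defaults are illustrative.
    """
    return gamma_high if t >= t_threshold else 1.0


def rebalanced_attention_weights(logits, num_text_tokens, gamma=1.0):
    """Attention weights of one image query over [text keys | image keys].

    logits: raw pre-softmax scores; the first num_text_tokens entries
            belong to text keys, the rest to image keys.
    gamma:  assumed scaling factor applied to the text-key logits only.
    """
    scaled = [
        x * gamma if i < num_text_tokens else x
        for i, x in enumerate(logits)
    ]
    return softmax(scaled)


def text_share(weights, num_text_tokens):
    """Fraction of attention mass that lands on text keys."""
    return sum(weights[:num_text_tokens])


if __name__ == "__main__":
    logits = [1.0, 1.0, 1.0, 1.0]  # 2 text keys followed by 2 image keys
    for t in (0.9, 0.1):
        g = gamma_schedule(t)
        w = rebalanced_attention_weights(logits, num_text_tokens=2, gamma=g)
        print(f"t={t}: gamma={g}, text share={text_share(w, 2):.3f}")
```

With positive text logits, a factor above 1 shifts attention mass from image keys toward text keys; with a factor of 1 the weights are the ordinary softmax. Note that multiplying a negative logit by a factor above 1 lowers it further, so the effect of such a factor depends on the sign of the scores.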
# Usage
For Stable Diffusion 3.5, simply run:
``` sh
python infer/infer_sd3.py
```
For FLUX.1, run:
``` sh
python infer/infer_flux.py
```
# Benchmark
Alignment scores on T2I-CompBench for FLUX.1-Dev-based and SD3.5-Medium-based models. Higher is better; bold marks the best score for each base model.
| Model | Attribute Binding: Color $\uparrow$ | Attribute Binding: Shape $\uparrow$ | Attribute Binding: Texture $\uparrow$ | Object Relationship: Spatial $\uparrow$ | Object Relationship: Non-Spatial $\uparrow$ | Complex $\uparrow$ |
|---|---|---|---|---|---|---|
| FLUX.1-Dev | 0.7678 | 0.5064 | 0.6756 | 0.2066 | 0.3035 | 0.4359 |
| FLUX.1-Dev + TACA ($r = 64$) | **0.7843** | **0.5362** | **0.6872** | **0.2405** | 0.3041 | **0.4494** |
| FLUX.1-Dev + TACA ($r = 16$) | 0.7842 | 0.5347 | 0.6814 | 0.2321 | **0.3046** | 0.4479 |
| SD3.5-Medium | 0.7890 | 0.5770 | 0.7328 | 0.2087 | 0.3104 | 0.4441 |
| SD3.5-Medium + TACA ($r = 64$) | **0.8074** | **0.5938** | **0.7522** | **0.2678** | 0.3106 | 0.4470 |
| SD3.5-Medium + TACA ($r = 16$) | 0.7984 | 0.5834 | 0.7467 | 0.2374 | **0.3111** | **0.4505** |
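
The absolute differences in the table are small, so it can help to read them as relative gains over the baseline. The short sketch below does that for the Spatial column, using only the baseline and $r = 64$ scores copied from the table above; the helper name `relative_gain` is introduced here for illustration.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Spatial scores copied from the benchmark table (baseline, TACA with r = 64).
spatial_scores = {
    "FLUX.1-Dev": (0.2066, 0.2405),
    "SD3.5-Medium": (0.2087, 0.2678),
}

if __name__ == "__main__":
    for model, (base, taca) in spatial_scores.items():
        gain = relative_gain(base, taca)
        print(f"{model}: {base:.4f} -> {taca:.4f} ({gain * 100:+.1f}%)")
```

On these numbers, the Spatial score rises by about 16.4% for FLUX.1-Dev and about 28.3% for SD3.5-Medium.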
# Showcases



