# InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model
[Jiun Tian Hoe](https://jiuntian.com/), [Xudong Jiang](https://personal.ntu.edu.sg/exdjiang/), [Chee Seng Chan](http://cs-chan.com), [Yap Peng Tan](https://personal.ntu.edu.sg/eyptan/), [Weipeng Hu](https://scholar.google.com/citations?user=zo6ni_gAAAAJ)

[Project Page](https://jiuntian.github.io/interactdiffusion) | [Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Hoe_InteractDiffusion_Interaction_Control_in_Text-to-Image_Diffusion_Models_CVPR_2024_paper.html) | [arXiv](https://arxiv.org/abs/2312.05849) | [WebUI](https://github.com/jiuntian/sd-webui-interactdiffusion) | [Demo](https://huggingface.co/spaces/interactdiffusion/interactdiffusion) | [Video](https://www.youtube.com/watch?v=Uunzufq8m6Y) | [Diffusers](https://huggingface.co/interactdiffusion/diffusers-v1-2) | [Colab](https://colab.research.google.com/drive/1Bh9PjfTylxI2rbME5mQJtFqNTGvaghJq?usp=sharing)
- Existing methods lack the ability to control the interactions between objects in the generated content.
- We propose InteractDiffusion, a pluggable interaction control model that extends existing pre-trained T2I diffusion models so that they are better conditioned on interactions.

## News
- **[2025.1.20]** The SDXL version is available on [Hugging Face](https://huggingface.co/jiuntian/interactdiffusion-xl-1024/). We also release the SDXL version of GLIGEN, trained as part of this work, [here](https://github.com/jiuntian/igligen-xl).
- **[2024.3.13]** Diffusers code is available [here](https://huggingface.co/interactdiffusion/diffusers-v1-2).
- **[2024.3.8]** Demo is available at [Hugging Face Spaces](https://huggingface.co/spaces/interactdiffusion/interactdiffusion).
- **[2024.3.6]** Code is released.
- **[2024.2.27]** InteractDiffusion paper is accepted at CVPR 2024.
- **[2023.12.12]** InteractDiffusion paper is released. The WebUI of InteractDiffusion is available as an *alpha* version.

## Results
| Model   | Interaction Controllability (Tiny) | Interaction Controllability (Large) | FID   | KID     |
|---------|------------------------------------|-------------------------------------|-------|---------|
| v1.0    | 29.53                              | 31.56                               | 18.69 | 0.00676 |
| v1.1    | 30.20                              | 31.96                               | 17.90 | 0.00635 |
| v1.2    | 30.73                              | 33.10                               | 17.32 | 0.00585 |
| XL v1.0 | --                                 | 35.60                               | 16.65 | 0.00560 |
Interaction Controllability is measured using the FGAHOI detection score. In this table, we measure the Full subset under the Default setting with Swin-Tiny and Swin-Large backbones. More details on the protocol are in the paper.
## Download InteractDiffusion models
We provide four checkpoints with different training strategies.
| Version | Dataset | Base SD | Download |
|---------|------------|----|---------|
| v1.0 | HICO-DET | v1.4| [HF Hub](https://huggingface.co/jiuntian/interactiondiffusion-weight/blob/main/interact-diffusion-v1.pth) |
| v1.1 | HICO-DET | v1.5| [HF Hub](https://huggingface.co/jiuntian/interactiondiffusion-weight/blob/main/interact-diffusion-v1-1.pth) |
| v1.2 | HICO-DET + VisualGenome | v1.5| [HF Hub](https://huggingface.co/jiuntian/interactiondiffusion-weight/blob/main/interact-diffusion-v1-2.pth) |
| XL v1.0 | HICO-DET | XL | [HF Hub](https://huggingface.co/jiuntian/interactdiffusion-xl-1024/) |

Note that the experimental results in our paper refer to v1.0.
- v1.0 is based on Stable Diffusion v1.4 and GLIGEN. We train at a batch size of 16 for 250k steps on HICO-DET. **Our paper is based on this.**
- v1.1 is based on Stable Diffusion v1.5 and GLIGEN. We train at a batch size of 32 for 250k steps on HICO-DET.
- v1.2 is based on InteractDiffusion v1.1. We train it further at a batch size of 32 for 172.5k steps on HICO-DET and VisualGenome.
- XL v1.0 is based on Stable Diffusion XL v1.0 and GLIGEN-XL (which we trained ourselves). We train InteractDiffusion XL at a batch size of 32 for 250k steps on HICO-DET, at 512x512 resolution. More details are coming soon.
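If you prefer to fetch a checkpoint programmatically, here is a minimal sketch using `huggingface_hub`; the repo id and filename come from the table above (adjust the filename for the other versions):

```python
from huggingface_hub import hf_hub_download

# Download the v1.2 checkpoint into the local HF cache and get its path.
# Swap the filename for interact-diffusion-v1.pth / interact-diffusion-v1-1.pth
# to fetch the other versions listed in the table above.
ckpt_path = hf_hub_download(
    repo_id="jiuntian/interactiondiffusion-weight",
    filename="interact-diffusion-v1-2.pth",
)
print(ckpt_path)  # local path to the downloaded weights
```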
## Extension for AUTOMATIC1111's Stable Diffusion WebUI

We developed an extension for AUTOMATIC1111's Stable Diffusion WebUI to allow the use of InteractDiffusion with existing SD models. Check out the plugin at [sd-webui-interactdiffusion](https://github.com/jiuntian/sd-webui-interactdiffusion). Note that it is still an `alpha` version.
### Gallery
Some examples generated with InteractDiffusion, in combination with various DreamBooth and LoRA models.
| | | | |
| --- | --- | --- | --- |
|  |  |  |  |

## Diffusers
```python
from diffusers import DiffusionPipeline
import torch

# Load the InteractDiffusion pipeline in fp16.
pipeline = DiffusionPipeline.from_pretrained(
    "interactdiffusion/diffusers-v1-2",
    trust_remote_code=True,
    variant="fp16", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

# Specify the interaction as (subject, action, object) phrases,
# each pairing with a bounding box for the subject and the object.
images = pipeline(
    prompt="a person is feeding a cat",
    interactdiffusion_subject_phrases=["person"],
    interactdiffusion_object_phrases=["cat"],
    interactdiffusion_action_phrases=["feeding"],
    interactdiffusion_subject_boxes=[[0.0332, 0.1660, 0.3359, 0.7305]],
    interactdiffusion_object_boxes=[[0.2891, 0.4766, 0.6680, 0.7930]],
    interactdiffusion_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images

images[0].save('out.jpg')
```
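The subject and object boxes above appear to be in normalized `[x0, y0, x1, y1]` form (every value lies in 0-1). As a convenience, here is a small hypothetical helper for converting pixel-space boxes; the 512x512 canvas in the usage line is an assumption:

```python
def to_normalized_box(box_px, width, height):
    """Convert a pixel-space [x0, y0, x1, y1] box into the 0-1
    normalized coordinates used in the pipeline call above."""
    x0, y0, x1, y1 = box_px
    return [x0 / width, y0 / height, x1 / width, y1 / height]

# A box from pixel (17, 85) to (172, 374) on a 512x512 image gives
# roughly the subject box used above: [0.0332, 0.1660, 0.3359, 0.7305].
print(to_normalized_box([17, 85, 172, 374], 512, 512))
```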
## Reproduce & Evaluate

1. Change `ckpt.pth` in `inference_batch.py` to the selected checkpoint.
2. Run inference with InteractDiffusion to synthesize the test set of HICO-DET based on the ground truth:
```bash
python inference_batch.py --batch_size 1 --folder generated_output --seed 489 --scheduled-sampling 1.0 --half
```
3. Set up FGAHOI at `../FGAHOI`. See the [FGAHOI repo](https://github.com/xiaomabufei/FGAHOI) for how to set up FGAHOI and the HICO-DET dataset in `data/hico_20160224_det`.
4. Prepare for evaluation on FGAHOI. See `id_prepare_inference.ipynb`.
5. Evaluate on FGAHOI:
```bash
python main.py --backbone swin_tiny --dataset_file hico --resume weights/FGAHOI_Tiny.pth --num_verb_classes 117 --num_obj_classes 80 --output_dir logs --merge --hierarchical_merge --task_merge --eval --hoi_path data/id_generated_output --pretrain_model_path "" --output_dir logs/id-generated-output-t
```
6. Evaluate FID and KID. For a fair comparison, we recommend resizing the hico_det dataset to 512x512 before performing image quality evaluation (see the resize sketch after this list). We use [torch-fidelity](https://github.com/toshas/torch-fidelity).
```bash
fidelity --gpu 0 --fid --isc --kid --input2 ~/data/hico_det_test_resize --input1 ~/FGAHOI/data/data/id_generated_output/images/test2015
```
7. The steps above give a brief overview of how the evaluation process works.
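A minimal sketch of the resize in step 6 using Pillow; the source directory follows the FGAHOI data layout from step 3 and the destination matches the `--input2` path above (both paths are assumptions to adapt):

```python
from pathlib import Path
from PIL import Image

src = Path("data/hico_20160224_det/images/test2015")    # HICO-DET test images (step 3 layout)
dst = Path("~/data/hico_det_test_resize").expanduser()  # matches --input2 above
dst.mkdir(parents=True, exist_ok=True)

for img_path in src.glob("*.jpg"):
    # Resize to 512x512 so both torch-fidelity inputs share one resolution.
    img = Image.open(img_path).convert("RGB")
    img.resize((512, 512), Image.BICUBIC).save(dst / img_path.name)
```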
## Training
1. Prepare the necessary dataset and pretrained models; see [DATA](DATA/readme.md).
2. Run the following command:
```bash
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 main.py --yaml_file configs/hoi_hico_text.yaml --ckpt --name test --batch_size=4 --gradient_accumulation_step 2 --total_iters 500000 --amp true --disable_inference_in_training true --official_ckpt_name
```
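Assuming `--batch_size` is per GPU, this run has an effective batch size of 2 GPUs × 4 × 2 gradient-accumulation steps = 16, matching the batch size used to train v1.0.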
## TODO

- [x] Code Release
- [x] HuggingFace demo
- [x] WebUI extension
- [x] Diffusers

## Citation
```bibtex
@InProceedings{Hoe_2024_CVPR,
author = {Hoe, Jiun Tian and Jiang, Xudong and Chan, Chee Seng and Tan, Yap-Peng and Hu, Weipeng},
title = {InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {6180-6189}
}
```

## Acknowledgement
This work was developed based on the codebases of [GLIGEN](https://github.com/gligen/GLIGEN) and [LDM](https://github.com/CompVis/latent-diffusion).