
# InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

[Jiun Tian Hoe](https://jiuntian.com/), [Xudong Jiang](https://personal.ntu.edu.sg/exdjiang/),
[Chee Seng Chan](http://cs-chan.com), [Yap Peng Tan](https://personal.ntu.edu.sg/eyptan/),
[Weipeng Hu](https://scholar.google.com/citations?user=zo6ni_gAAAAJ)

[Project Page](https://jiuntian.github.io/interactdiffusion) |
[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Hoe_InteractDiffusion_Interaction_Control_in_Text-to-Image_Diffusion_Models_CVPR_2024_paper.html) |
[arXiv](https://arxiv.org/abs/2312.05849) |
[WebUI](https://github.com/jiuntian/sd-webui-interactdiffusion) |
[Demo](https://huggingface.co/spaces/interactdiffusion/interactdiffusion) |
[Video](https://www.youtube.com/watch?v=Uunzufq8m6Y) |
[Diffuser](https://huggingface.co/interactdiffusion/diffusers-v1-2) |
[Colab](https://colab.research.google.com/drive/1Bh9PjfTylxI2rbME5mQJtFqNTGvaghJq?usp=sharing)

[![Paper](https://img.shields.io/badge/cs.CV-arxiv:2312.05849-B31B1B.svg)](https://arxiv.org/abs/2312.05849)
[![Page Views Count](https://badges.toozhao.com/badges/01HH1JE53YX5TDDDDCG6PXY8WQ/blue.svg)](https://badges.toozhao.com/stats/01HH1JE53YX5TDDDDCG6PXY8WQ "Get your own page views count badge on badges.toozhao.com")
[![Hugging Face](https://img.shields.io/badge/InteractDiffusion-%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/interactdiffusion/interactdiffusion)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Bh9PjfTylxI2rbME5mQJtFqNTGvaghJq?usp=sharing)

![Teaser figure](docs/static/res/teaser.jpg)

- Existing methods lack the ability to control the interactions between objects in the generated content.
- We propose a pluggable interaction control model, InteractDiffusion, that extends existing pre-trained T2I diffusion models so that they can be better conditioned on interactions.

## News

- **[2025.1.20]** The SDXL version is available on [Hugging Face](https://huggingface.co/jiuntian/interactdiffusion-xl-1024/). We also release an SDXL version of GLIGEN, trained as part of this work, [here](https://github.com/jiuntian/igligen-xl).
- **[2024.3.13]** Diffusers code is available [here](https://huggingface.co/interactdiffusion/diffusers-v1-2).
- **[2024.3.8]** Demo is available on [Hugging Face Spaces](https://huggingface.co/spaces/interactdiffusion/interactdiffusion).
- **[2024.3.6]** Code is released.
- **[2024.2.27]** The InteractDiffusion paper is accepted at CVPR 2024.
- **[2023.12.12]** The InteractDiffusion paper is released. The WebUI extension for InteractDiffusion is available as an *alpha* version.

## Results


| Model | Interaction Controllability (Tiny) | Interaction Controllability (Large) | FID | KID |
|---------|:---:|:---:|:---:|:---:|
| v1.0 | 29.53 | 31.56 | 18.69 | 0.00676 |
| v1.1 | 30.20 | 31.96 | 17.90 | 0.00635 |
| v1.2 | 30.73 | 33.10 | 17.32 | 0.00585 |
| XL v1.0 | -- | 35.60 | 16.65 | 0.00560 |

Interaction Controllability is measured using the FGAHOI detection score. In this table, we measure the Full subset under the Default setting on the Swin-Tiny and Swin-Large backbones. More details on the protocol are in the paper.

## Download InteractDiffusion models

We provide three checkpoints with different training strategies.
| Version | Dataset | SD |Download |
|---------|------------|----|---------|
| v1.0 | HICO-DET | v1.4| [HF Hub](https://huggingface.co/jiuntian/interactiondiffusion-weight/blob/main/interact-diffusion-v1.pth) |
| v1.1 | HICO-DET | v1.5| [HF Hub](https://huggingface.co/jiuntian/interactiondiffusion-weight/blob/main/interact-diffusion-v1-1.pth) |
| v1.2 | HICO-DET + VisualGenome | v1.5| [HF Hub](https://huggingface.co/jiuntian/interactiondiffusion-weight/blob/main/interact-diffusion-v1-2.pth) |
| XL v1.0 | HICO-DET | XL | [HF Hub](https://huggingface.co/jiuntian/interactdiffusion-xl-1024/) |

Note that the experimental results in our paper refer to v1.0.

- v1.0 is based on Stable Diffusion v1.4 and GLIGEN. We train at a batch size of 16 for 250k steps on HICO-DET. **Our paper is based on this.**
- v1.1 is based on Stable Diffusion v1.5 and GLIGEN. We train at a batch size of 32 for 250k steps on HICO-DET.
- v1.2 is based on InteractDiffusion v1.1. We train it further at a batch size of 32 for 172.5k steps on HICO-DET and VisualGenome.
- XL v1.0 is based on Stable Diffusion XL v1.0 and GLIGEN-XL (which we trained ourselves). We train InteractDiffusion XL at a batch size of 32 for 250k steps on HICO-DET, at 512x512 resolution. More details are coming soon.

## Extension for AUTOMATIC1111's Stable Diffusion WebUI

We develop an extension for AUTOMATIC1111's Stable Diffusion WebUI that allows the use of InteractDiffusion with existing SD models. Check out the extension at [sd-webui-interactdiffusion](https://github.com/jiuntian/sd-webui-interactdiffusion). Note that it is still an *alpha* version.

### Gallery

Some examples generated with InteractDiffusion, in combination with various DreamBooth and LoRA models.

| | | | |
| --- | --- | --- | --- |
![image (7)](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/e4ff1279-1b08-41c9-9ea3-45ec3667115e) | ![image (5)](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/dfd254ea-f6fb-4fc4-9fe6-8222fe47ee12) | ![image (6)](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/a6df1288-3315-4738-9db8-d9cb9bd01038) | ![image (4)](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/1766e775-ce6c-4705-a376-4aa8e62bcceb)
![cuteyukimix_1](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/1416f2b6-4907-4ac7-bb03-b5d2b5adcd91)|![cuteyukimix_7](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/7b619e4e-7d0b-4989-85f9-422fbd6a6319)|![darksushimix_1](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/2b81abe3-a39a-4db8-9e7a-63336f96d7e3)|![toonyou_6](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/ce027fac-7840-44cc-9f69-0bdeef5da1da)
![image (8)](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/0bc70ee4-9f84-4340-994c-fbde99a17062)|![cuteyukimix_4](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/0d12f242-cc90-4871-8d2c-02f7c36c70cf)|![darksushimix_5](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/cd716268-92d2-48fa-bbc5-a291c80f7f9a)|![rcnzcartoon_1](https://github.com/jiuntian/sd-webui-interactdiffusion/assets/13869695/ce8c33f1-62fd-4c44-ae76-d5b70b1f05f5)

## Diffusers
```python
from diffusers import DiffusionPipeline
import torch

# Load the InteractDiffusion pipeline (custom pipeline code from the Hub).
pipeline = DiffusionPipeline.from_pretrained(
    "interactdiffusion/diffusers-v1-2",
    trust_remote_code=True,
    variant="fp16",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")

# Each interaction is a (subject, action, object) triplet; boxes are
# normalized [xmin, ymin, xmax, ymax] coordinates.
images = pipeline(
    prompt="a person is feeding a cat",
    interactdiffusion_subject_phrases=["person"],
    interactdiffusion_object_phrases=["cat"],
    interactdiffusion_action_phrases=["feeding"],
    interactdiffusion_subject_boxes=[[0.0332, 0.1660, 0.3359, 0.7305]],
    interactdiffusion_object_boxes=[[0.2891, 0.4766, 0.6680, 0.7930]],
    interactdiffusion_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images

images[0].save("out.jpg")
```
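
To condition on more than one interaction, each `interactdiffusion_*` list presumably takes one entry per interaction, in parallel. The sketch below extrapolates from the single-interaction example above; the second triplet and its boxes are illustrative values, not taken from the repo.

```python
# Hypothetical two-interaction call, extrapolated from the example above.
# Lists are parallel: entry i of each list describes interaction i.
images = pipeline(
    prompt="a person is feeding a cat while holding a bowl",
    interactdiffusion_subject_phrases=["person", "person"],
    interactdiffusion_object_phrases=["cat", "bowl"],
    interactdiffusion_action_phrases=["feeding", "holding"],
    interactdiffusion_subject_boxes=[
        [0.0332, 0.1660, 0.3359, 0.7305],
        [0.0332, 0.1660, 0.3359, 0.7305],
    ],
    interactdiffusion_object_boxes=[
        [0.2891, 0.4766, 0.6680, 0.7930],
        [0.6016, 0.3984, 0.8984, 0.6992],  # illustrative box for the bowl
    ],
    interactdiffusion_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("out_two_interactions.jpg")
```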

## Reproduce & Evaluate

1. Change `ckpt.pth` in `inference_batch.py` to the selected checkpoint.
2. Run inference with InteractDiffusion to synthesize the test set of HICO-DET based on the ground-truth annotations.

```bash
python inference_batch.py --batch_size 1 --folder generated_output --seed 489 --scheduled-sampling 1.0 --half
```

3. Set up FGAHOI at `../FGAHOI`. See the [FGAHOI repo](https://github.com/xiaomabufei/FGAHOI) on how to set up FGAHOI, and place the HICO-DET dataset in `data/hico_20160224_det`.
4. Prepare for evaluation on FGAHOI; see `id_prepare_inference.ipynb`.
5. Evaluate on FGAHOI.

```bash
python main.py --backbone swin_tiny --dataset_file hico --resume weights/FGAHOI_Tiny.pth --num_verb_classes 117 --num_obj_classes 80 --output_dir logs --merge --hierarchical_merge --task_merge --eval --hoi_path data/id_generated_output --pretrain_model_path "" --output_dir logs/id-generated-output-t
```

6. Evaluate FID and KID. For a fair comparison, we recommend resizing the HICO-DET dataset to 512x512 before performing image quality evaluation (a resize sketch follows the command below). We use [torch-fidelity](https://github.com/toshas/torch-fidelity).

```bash
fidelity --gpu 0 --fid --isc --kid --input2 ~/data/hico_det_test_resize --input1 ~/FGAHOI/data/data/id_generated_output/images/test2015
```
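
The 512x512 resize recommended in step 6 can be done with a short script. Below is a minimal sketch using Pillow; the source and destination paths are placeholders to adjust to your local HICO-DET layout.

```python
# Resize all images in a directory to 512x512 before computing FID/KID.
# src_dir/dst_dir are placeholder paths; point them at your HICO-DET test set.
import os
from PIL import Image

src_dir = os.path.expanduser("~/data/hico_det_test")
dst_dir = os.path.expanduser("~/data/hico_det_test_resize")
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue  # skip annotation files and other non-images
    with Image.open(os.path.join(src_dir, name)) as im:
        im.convert("RGB").resize((512, 512), Image.LANCZOS).save(
            os.path.join(dst_dir, name)
        )
```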

7. Done. The steps above give a brief overview of how the evaluation process works.

## Training

1. Prepare the necessary dataset and pretrained models; see [DATA](DATA/readme.md).
2. Run the following command:

```bash
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 main.py --yaml_file configs/hoi_hico_text.yaml --ckpt --name test --batch_size=4 --gradient_accumulation_step 2 --total_iters 500000 --amp true --disable_inference_in_training true --official_ckpt_name
```
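
For reference, with 2 GPUs (`--nproc_per_node=2`), a per-GPU batch size of 4, and `--gradient_accumulation_step 2`, the effective batch size is 2 × 4 × 2 = 16, matching the batch size used to train v1.0.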

## TODO

- [x] Code Release
- [x] HuggingFace demo
- [x] WebUI extension
- [x] Diffusers

## Citation

```bibtex
@InProceedings{Hoe_2024_CVPR,
    author    = {Hoe, Jiun Tian and Jiang, Xudong and Chan, Chee Seng and Tan, Yap-Peng and Hu, Weipeng},
    title     = {InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {6180-6189}
}
```

## Acknowledgement

This work is developed based on the codebase of [GLIGEN](https://github.com/gligen/GLIGEN) and [LDM](https://github.com/CompVis/latent-diffusion).