DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
- Host: GitHub
- URL: https://github.com/sungnyun/diffblender
- Owner: sungnyun
- License: apache-2.0
- Created: 2023-05-24T07:42:08.000Z
- Default Branch: main
- Last Pushed: 2023-12-21T13:26:55.000Z
- Last Synced: 2025-03-25T22:21:33.547Z
- Topics: diffusion, generative-model, multimodal, text-to-image
- Language: Python
- Homepage: https://sungnyun.github.io/diffblender
- Size: 9.31 MB
- Stars: 46
- Watchers: 7
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project:
- awesome-diffusion-categorized
README
# DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models 🔥
- **DiffBlender** successfully synthesizes complex combinations of input modalities. It enables flexible manipulation of conditions, providing customized generation aligned with user preferences.
- We designed its structure to extend intuitively to additional modalities, while keeping the training cost low through partial updates of hypernetworks.
## 🗓️ TODOs
- [x] Project page is open: [link](https://sungnyun.github.io/diffblender/)
- [x] DiffBlender model: code & checkpoint
- [x] Release inference code
- [ ] Release training code & pipeline
- [ ] Gradio UI

## 🚀 Getting Started
Install the necessary packages with:
```sh
$ pip install -r requirements.txt
```

Download the DiffBlender model checkpoint from this [Hugging Face model](https://huggingface.co/sungnyun/diffblender), and place it under `./diffblender_checkpoints/`.
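A minimal command-line sketch of this step, assuming a recent `huggingface_hub` (which provides the `huggingface-cli download` command); the repo id comes from the link above, and the exact checkpoint filenames are whatever the Hugging Face repo contains:

```sh
# Sketch only: fetch the DiffBlender checkpoint into ./diffblender_checkpoints/.
# Assumes `pip install -U huggingface_hub`; filenames in the repo may vary.
$ mkdir -p diffblender_checkpoints
$ huggingface-cli download sungnyun/diffblender --local-dir ./diffblender_checkpoints
```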
Also, prepare the SD model from this [link](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) (we used CompVis/sd-v1-4.ckpt).

## ⚡️ Try Multimodal T2I Generation with DiffBlender
```sh
$ python inference.py --ckpt_path=./diffblender_checkpoints/{CKPT_NAME}.pth \
--official_ckpt_path=/path/to/sd-v1-4.ckpt \
--save_name={SAVE_NAME}
```

Results will be saved under `./inference/{SAVE_NAME}/`, formatted as {conditions + generated image}.
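As a concrete example, a run might look like the following; the checkpoint filename and save name here are placeholders of ours, not official names from the repository:

```sh
# Hypothetical invocation; substitute the checkpoint filenames you downloaded.
$ python inference.py --ckpt_path=./diffblender_checkpoints/diffblender.pth \
    --official_ckpt_path=./sd-v1-4.ckpt \
    --save_name=demo
$ ls ./inference/demo/   # conditions + generated images are saved here
```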
## BibTeX
```bibtex
@article{kim2023diffblender,
title={DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models},
author={Kim, Sungnyun and Lee, Junsoo and Hong, Kibeom and Kim, Daesik and Ahn, Namhyuk},
journal={arXiv preprint arXiv:2305.15194},
year={2023}
}
```