DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
- Host: GitHub
- URL: https://github.com/sungnyun/diffblender
- Owner: sungnyun
- License: apache-2.0
- Created: 2023-05-24T07:42:08.000Z
- Default Branch: main
- Last Pushed: 2023-12-21T13:26:55.000Z
- Last Synced: 2025-03-25T22:21:33.547Z
- Topics: diffusion, generative-model, multimodal, text-to-image
- Language: Python
- Homepage: https://sungnyun.github.io/diffblender
- Size: 9.31 MB
- Stars: 46
- Watchers: 7
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project:
- awesome-diffusion-categorized
README
# DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models 🔥
- **DiffBlender** successfully synthesizes complex combinations of input modalities. It enables flexible manipulation of conditions, providing customized generation aligned with user preferences.
- We designed its structure to extend intuitively to additional modalities, while keeping the training cost low through partial updates of hypernetworks.
## 🗓️ TODOs
- [x] Project page is open: [link](https://sungnyun.github.io/diffblender/)
- [x] DiffBlender model: code & checkpoint
- [x] Release inference code
- [ ] Release training code & pipeline
- [ ] Gradio UI

## 🚀 Getting Started
Install the necessary packages with:
```sh
$ pip install -r requirements.txt
```

Download the DiffBlender model checkpoint from this [Hugging Face model](https://huggingface.co/sungnyun/diffblender), and place it under `./diffblender_checkpoints/`.
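A minimal command-line sketch of this step, assuming a recent `huggingface_hub` (which provides the `huggingface-cli download` command); the repo id comes from the link above, and the exact checkpoint filenames are whatever the Hugging Face repo contains:

```sh
# Sketch only: fetch the DiffBlender checkpoint into ./diffblender_checkpoints/.
# Assumes `pip install -U huggingface_hub`; filenames in the repo may vary.
$ mkdir -p diffblender_checkpoints
$ huggingface-cli download sungnyun/diffblender --local-dir ./diffblender_checkpoints
```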
Also, prepare the SD model from this [link](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) (we used CompVis/sd-v1-4.ckpt).

## ⚡️ Try Multimodal T2I Generation with DiffBlender
```sh
$ python inference.py --ckpt_path=./diffblender_checkpoints/{CKPT_NAME}.pth \
--official_ckpt_path=/path/to/sd-v1-4.ckpt \
--save_name={SAVE_NAME}
```

Results will be saved under `./inference/{SAVE_NAME}/`, formatted as {conditions + generated image}.
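As a concrete example, a run might look like the following; the checkpoint filename and save name here are placeholders of ours, not official names from the repository:

```sh
# Hypothetical invocation; substitute the checkpoint filenames you downloaded.
$ python inference.py --ckpt_path=./diffblender_checkpoints/diffblender.pth \
    --official_ckpt_path=./sd-v1-4.ckpt \
    --save_name=demo
$ ls ./inference/demo/   # conditions + generated images are saved here
```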
## BibTeX
```bibtex
@article{kim2023diffblender,
title={DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models},
author={Kim, Sungnyun and Lee, Junsoo and Hong, Kibeom and Kim, Daesik and Ahn, Namhyuk},
journal={arXiv preprint arXiv:2305.15194},
year={2023}
}
```