Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Nithin-GK/MaxFusion
[ECCV'24] MaxFusion: Plug & Play multimodal generation in text to image diffusion models
diffusion-models multimodal plug-and-play txt2img
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/Nithin-GK/MaxFusion
- Owner: Nithin-GK
- License: mit
- Created: 2024-03-18T14:17:13.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-07-23T17:10:21.000Z (3 months ago)
- Last Synced: 2024-08-01T18:35:45.529Z (3 months ago)
- Topics: diffusion-models, multimodal, plug-and-play, txt2img
- Language: Jupyter Notebook
- Homepage: https://nithin-gk.github.io/maxfusion.github.io/
- Size: 8.7 MB
- Stars: 15
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
- awesome-diffusion-categorized
README
MaxFusion: Plug & Play multimodal generation in text to image diffusion models
If you like our project, please give us a star ⭐ on GitHub for the latest updates.
[![hf_space](https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg)]()
[![project page](https://img.shields.io/badge/Project%20Page-8A2BE2)](https://nithin-gk.github.io/maxfusion.github.io/)
[![arXiv](https://img.shields.io/badge/Arxiv-2404.09977-b31b1b.svg?logo=arXiv)](https://arxiv.org/pdf/2404.09977)
## Applications
Keywords: Multimodal Generation, Text to image generation, Plug and Play
We propose **MaxFusion**, a plug-and-play framework for multimodal generation using text-to-image diffusion models.
*(a) Multimodal generation*. We address the problem of conflicting spatial conditioning for text-to-image models.
*(b) Saliency in variance maps*. We discover that the variance maps of different feature layers express the strength of conditioning.
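As a rough illustration of this observation, the sketch below (ours, not the repository's code; the function name and tensor shapes are assumptions) computes a per-location variance map from an intermediate feature tensor and treats it as a saliency measure:

```python
# Minimal sketch of the variance-map idea: for a feature tensor from one
# conditioning branch of a diffusion U-Net, the channel-wise variance at each
# spatial location serves as a proxy for how strongly that location is conditioned.
import torch

def spatial_variance_map(features: torch.Tensor) -> torch.Tensor:
    """features: (B, C, H, W) intermediate activations of one conditioning branch.
    Returns a (B, 1, H, W) map of channel-wise variance at each spatial location."""
    return features.var(dim=1, keepdim=True, unbiased=False)

# Locations with near-zero variance carry little conditioning signal, while
# high-variance locations indicate strong influence of that modality.
```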
### Contributions:
- We tackle the need for training with paired data for multi-task conditioning using diffusion models.
- We propose a novel variance-based feature merging strategy for diffusion models (a toy sketch follows this list).
- Our method allows us to use combined information to influence the output, unlike individual models that are limited to a single condition.
- Unlike previous solutions, our approach is easily scalable and can be added on top of off-the-shelf models.
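The variance-based merging mentioned above can be sketched as follows. This is a toy illustration under the assumption that two conditioning branches produce spatially aligned feature maps; the repository contains the actual merging rule:

```python
# Hedged sketch of a variance-guided merge of two conditioning branches
# (illustrative only, not the repository's implementation).
import torch

def merge_by_variance(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """feat_a, feat_b: (B, C, H, W) aligned features from two conditioning branches.
    At each spatial location, keep the features of the branch whose channel-wise
    variance (saliency) is larger."""
    var_a = feat_a.var(dim=1, keepdim=True, unbiased=False)  # (B, 1, H, W)
    var_b = feat_b.var(dim=1, keepdim=True, unbiased=False)
    mask = (var_a >= var_b).float()  # 1 where branch A dominates
    return mask * feat_a + (1.0 - mask) * feat_b
```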
## Environment setup

```
conda env create -f environment.yml
```

## Code demo
A notebook demonstrating different demo conditions is provided in `demo.ipynb`.
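The notebook is the authoritative reference. As a loose illustration of the kind of multi-condition text-to-image setup it targets, the snippet below combines two spatial conditions using the diffusers MultiControlNet interface; the model ids and conditions are assumptions, and this is not the MaxFusion merging method itself:

```python
# Illustrative multi-condition setup with diffusers (assumed model ids),
# NOT the MaxFusion pipeline from this repository.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

# depth_map and edge_map are user-supplied PIL images for the two modalities.
# image = pipe("a living room", image=[depth_map, edge_map]).images[0]
```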
## Testing on custom datasets
Will be released shortly
## Instructions for Interactive Demo
An interactive demo can be run locally using:
```
python gradio_maxfusion.py
```
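For reference, a bare-bones Gradio entry point typically looks like the sketch below; the actual UI, inputs, and callback are defined in `gradio_maxfusion.py`, so the names here are placeholders:

```python
# Minimal Gradio skeleton (placeholder callback, not the repository's demo code).
import gradio as gr

def generate(prompt, condition_a, condition_b):
    # Placeholder: the real callback would run the MaxFusion pipeline
    # on the prompt and the two conditioning images.
    return condition_a

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Image(label="Condition 1"), gr.Image(label="Condition 2")],
    outputs=gr.Image(label="Generated image"),
)

if __name__ == "__main__":
    demo.launch()
```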
This code relies on:

```
https://github.com/google/prompt-to-prompt/
```

## Citation
If you use our work, please use the following citation:
```
@article{nair2024maxfusion,
title={MaxFusion: Plug\&Play Multi-Modal Generation in Text-to-Image Diffusion Models},
author={Nair, Nithin Gopalakrishnan and Valanarasu, Jeya Maria Jose and Patel, Vishal M},
journal={arXiv preprint arXiv:2404.09977},
year={2024}
}
```