https://github.com/Nithin-GK/MaxFusion
[ECCV'24] MaxFusion: Plug & Play multimodal generation in text to image diffusion models
- Host: GitHub
- URL: https://github.com/Nithin-GK/MaxFusion
- Owner: Nithin-GK
- License: mit
- Created: 2024-03-18T14:17:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-23T17:10:21.000Z (11 months ago)
- Last Synced: 2024-10-31T00:39:54.467Z (8 months ago)
- Topics: diffusion-models, model-merging, multimodal, plug-and-play, txt2img
- Language: Jupyter Notebook
- Homepage: https://nithin-gk.github.io/maxfusion.github.io/
- Size: 8.7 MB
- Stars: 18
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-diffusion-categorized
README
# MaxFusion: Plug & Play multimodal generation in text to image diffusion models
If you like our project, please give us a star ⭐ on GitHub for the latest updates.
[Project Page](https://nithin-gk.github.io/maxfusion.github.io/)
[Paper](https://arxiv.org/pdf/2404.09977)
## Applications
Keywords: Multimodal Generation, Text to image generation, Plug and Play
We propose **MaxFusion**, a plug-and-play framework for multimodal generation using text-to-image diffusion models.
*(a) Multimodal generation.* We address the problem of conflicting spatial conditioning for text-to-image models.
*(b) Saliency in variance maps.* We discover that the variance maps of different feature layers express the strength of the conditioning.
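To make the idea concrete, here is a minimal PyTorch sketch of variance-based merging for two spatially aligned conditioning branches: at each spatial location, keep the features from the branch with the larger channel-wise variance. This is only an illustration of the principle, not the implementation in this repository; the tensor shapes and the `variance_max_fusion` helper are assumptions made for the example.

```python
import torch

def variance_max_fusion(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Illustrative variance-based merge of two aligned feature maps.

    feat_a, feat_b: (B, C, H, W) features from two conditioning branches.
    At each spatial location we keep the features of the branch whose
    channel-wise variance is larger, treating variance as a proxy for
    conditioning strength. Hypothetical sketch, not the repository's code.
    """
    # Per-pixel variance across the channel dimension -> (B, 1, H, W)
    var_a = feat_a.var(dim=1, keepdim=True)
    var_b = feat_b.var(dim=1, keepdim=True)

    # Boolean mask selecting the higher-variance branch at each location
    mask = var_a >= var_b

    # Broadcast the mask over channels and pick features accordingly
    return torch.where(mask, feat_a, feat_b)

# Example with random tensors standing in for two conditioning branches
a = torch.randn(1, 320, 64, 64)
b = torch.randn(1, 320, 64, 64)
fused = variance_max_fusion(a, b)
print(fused.shape)  # torch.Size([1, 320, 64, 64])
```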
### Contributions:
- We tackle the need for training with paired data for multi-task conditioning using diffusion models.
- We propose a novel variance-based feature merging strategy for diffusion models.
- Our method allows us to use combined information to influence the output, unlike individual models that are limited to a single condition.
- Unlike previous solutions, our approach is easily scalable and can be added on top of off-the-shelf models.

## Environment setup
```
conda env create -f environment.yml
```

## Code demo:
A notebook covering the different demo conditions is provided in `demo.ipynb`.
## Testing on custom datasets
Will be released shortly.
## Instructions for Interactive Demo
An interactive demo can be run locally using:
```
python gradio_maxfusion.py
```
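Once the script starts, Gradio typically prints a local URL (by default something like http://127.0.0.1:7860) that can be opened in a browser to interact with the demo.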
This code relies on:
```
https://github.com/google/prompt-to-prompt/
```

## Citation
If you use our work, please cite it as follows:

```
@article{nair2024maxfusion,
  title={MaxFusion: Plug\&Play Multi-Modal Generation in Text-to-Image Diffusion Models},
  author={Nair, Nithin Gopalakrishnan and Valanarasu, Jeya Maria Jose and Patel, Vishal M},
  journal={arXiv preprint arXiv:2404.09977},
  year={2024}
}
```