[ECCV'24] MaxFusion: Plug & Play multimodal generation in text to image diffusion models
https://github.com/Nithin-GK/MaxFusion

# MaxFusion: Plug & Play multimodal generation in text to image diffusion models

If you like our project, please give us a star ⭐ on GitHub for the latest updates.



[![hf_space](https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg)]()
[![project page](https://img.shields.io/badge/Project%20Page-8A2BE2)](https://nithin-gk.github.io/maxfusion.github.io/)
[![arXiv](https://img.shields.io/badge/Arxiv-2404.09977-b31b1b.svg?logo=arXiv)](https://arxiv.org/pdf/2404.09977)

## Applications

Keywords: Multimodal Generation, Text to image generation, Plug and Play

We propose **MaxFusion**, a plug-and-play framework for multimodal generation using text-to-image diffusion models.
*(a) Multimodal generation*. We address the problem of conflicting spatial conditioning for text-to-image models.
*(b) Saliency in variance maps*. We discover that the variance maps of different feature layers express the strength of the conditioning.
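The variance-based selection idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `variance_merge`, the use of per-location channel variance as the saliency proxy, and the hard winner-take-all selection are assumptions for clarity — the actual method operates on intermediate feature layers inside the diffusion model.

```python
import numpy as np

def variance_merge(feat_a, feat_b):
    """Merge two spatially aligned feature maps of shape (C, H, W) by
    keeping, at each location, the features whose channel-wise variance
    (a proxy for conditioning strength) is larger."""
    var_a = feat_a.var(axis=0)           # (H, W) saliency of condition A
    var_b = feat_b.var(axis=0)           # (H, W) saliency of condition B
    keep_a = var_a >= var_b              # winner-take-all per location
    return np.where(keep_a[None, :, :], feat_a, feat_b)
```

Intuitively, at locations where one condition carries little information (flat, low-variance features), the merged output is dominated by the other condition, so conflicting spatial cues resolve locally instead of averaging away.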


### Contributions:

- We tackle the need for training with paired data for multi-task conditioning using diffusion models.
- We propose a novel variance-based feature merging strategy for diffusion models.
- Our method allows us to use combined information to influence the output, unlike individual models that are limited to a single condition.
- Unlike previous solutions, our approach is easily scalable and can be added on top of off-the-shelf models.

## Environment setup

```
conda env create -f environment.yml
```

## Code demo:

A notebook demonstrating different conditioning setups is provided in `demo.ipynb`.

## Testing on custom datasets

Code for testing on custom datasets will be released shortly.

## Instructions for Interactive Demo

An interactive demo can be run locally using:

```
python gradio_maxfusion.py
```

This code builds on [prompt-to-prompt](https://github.com/google/prompt-to-prompt/).

## Citation
If you use our work, please use the following citation:

```
@article{nair2024maxfusion,
title={MaxFusion: Plug\&Play Multi-Modal Generation in Text-to-Image Diffusion Models},
author={Nair, Nithin Gopalakrishnan and Valanarasu, Jeya Maria Jose and Patel, Vishal M},
journal={arXiv preprint arXiv:2404.09977},
year={2024}
}
```