https://github.com/measure-infinity/mulan-code


Host: GitHub
URL: https://github.com/measure-infinity/mulan-code
Owner: measure-infinity
Created: 2024-02-16T12:22:41.000Z (12 months ago)
Default Branch: main
Last Pushed: 2024-07-16T12:56:35.000Z (7 months ago)
Last Synced: 2024-08-01T18:35:29.359Z (6 months ago)
Language: Python
Size: 15.3 MB
Stars: 36
Watchers: 1
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.md


# MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion

![Stable-Diffusion](https://img.shields.io/badge/Model-SD-blue)

![SDXL](https://img.shields.io/badge/Model-SDXL-blue)

![GPT4](https://img.shields.io/badge/Model-GPT4-blue)

![GPT4V](https://img.shields.io/badge/Model-GPT4V-blue)

![Gemini-Pro](https://img.shields.io/badge/Model-Gemini--Pro-blue)

![LLaVA](https://img.shields.io/badge/Model-LLaVA-blue)

![Generation](https://img.shields.io/badge/Task-Controllable%20Generation-red)

Sen Li, [Ruochen Wang](https://ruocwang.github.io/), [Cho-Jui Hsieh](https://web.cs.ucla.edu/~chohsieh/), [Minhao Cheng](https://cmhcbb.github.io/), [Tianyi Zhou](https://tianyizhou.github.io/)

### **ARC**-**A**IGC **R**esearch **C**ollaboration

HKUST, UCLA, PSU, UMD

[Paper](https://arxiv.org/pdf/2402.12741.pdf), [Project website](https://measure-infinity.github.io/mulan), [Code](https://github.com/measure-infinity/mulan-code)

![Main Framework](./assets/framework.png)

![Main Visualization](./assets/illustration.png)

## TODO

- [x] MuLan with SD v1.4

- [x] MuLan with SDXL

## More visualization results

![More results](./assets/qualitative1.png)

## Progressive multi-object diffusion

### Installation

```
git clone https://github.com/measure-infinity/mulan-code
cd mulan-code
conda create -n mulan python=3.10 -y
conda activate mulan
pip install -r ./requirements.txt
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
```

Configuring LLaVA (default VLM in the code)

```
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
```

### Demo

Please set your own GPT-4 API key in `query.py`; it is used for planning during the generation process. We recommend GPT-4 for planning, and it is the default model in the code.
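If you prefer not to hard-code the key, one option is to read it from an environment variable and paste the result where `query.py` expects it. This is a minimal sketch, not the repository's code: the helper `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions, and how `query.py` actually consumes the key is not shown here.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`.

    Hypothetical helper; it is not part of the MuLan code base.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running MuLan.")
    return key
```

Failing early with a clear message avoids discovering a missing key only after the diffusion model has already been loaded.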

**SD-v1.4**

Please download the weights of Stable Diffusion v1.4 [here](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt) and put them into the folder `sd-models`.
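Before running the demo, it can be worth confirming that the checkpoint landed where the demo expects it. The sketch below assumes `sd-models` sits at the repository root (consistent with the `../sd-models/...` path used from `scripts`); `find_checkpoint` is a hypothetical helper, not part of MuLan.

```python
from pathlib import Path


def find_checkpoint(repo_root: str, name: str = "sd-v1-4-full-ema.ckpt") -> Path:
    """Return the checkpoint path under <repo_root>/sd-models, or raise if it is missing."""
    path = Path(repo_root) / "sd-models" / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected Stable Diffusion weights at {path}")
    return path
```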

To generate an image with a complex prompt, first `cd scripts`, and then run

```python
from pipeline_sd import mulan_sd

mulan_sd(prompt="a black headphone is on the left of a green phone", seed=42, sd_model="../sd-models/sd-v1-4-full-ema.ckpt")
```

`seed`: random seed; `prompt`: user prompt; `sd_model`: path to the Stable Diffusion checkpoint

The results will be saved in `outputs` by default. To see how the results change, you can adjust the hyper-parameters of the [backward guidance](https://github.com/silent-chen/layout-guidance): `weight` (110. by default) and `thresh` (0.15 by default).
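To compare settings systematically, a small grid sweep can help. The sketch below is only an illustration under a loud assumption: it presumes `weight` and `thresh` can be passed to the generation function as keyword arguments, which this README does not confirm (they may need to be changed in the script instead). `sweep` and `generate` are hypothetical names, and the grid values are arbitrary points around the stated defaults.

```python
from itertools import product
from typing import Callable, Iterable


def sweep(generate: Callable[..., object],
          prompt: str,
          weights: Iterable[float] = (80.0, 110.0, 140.0),
          threshes: Iterable[float] = (0.10, 0.15, 0.20),
          seed: int = 42) -> list[tuple[float, float]]:
    """Call `generate` once per (weight, thresh) pair and return the pairs tried."""
    tried = []
    for weight, thresh in product(weights, threshes):
        # ASSUMPTION: `generate` accepts `weight` and `thresh` keyword arguments.
        generate(prompt=prompt, seed=seed, weight=weight, thresh=thresh)
        tried.append((weight, thresh))
    return tried
```

If the assumption holds, `sweep(mulan_sd, "a black headphone is on the left of a green phone")` would run nine generations with a fixed seed. Check how output files are named first so that runs do not overwrite each other.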

**SDXL**

Please download the weights of SDXL [here](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and put them into the folder `sd-models`. Currently we use the DDIM sampler for generation instead of the original one. Please replace the corresponding config files in the downloaded model with the files in `sdxl_configs`.

Please uninstall the library `diffusers` if it is installed in the current environment. The code contains a modified version of `diffusers`.
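A quick way to check whether a pip-installed `diffusers` is present is to query the package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper. It only detects installed distributions and says nothing about the modified copy bundled with this repository.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "diffusers") -> Optional[str]:
    """Return the installed distribution's version, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is not None:
        print(f"diffusers {version} is installed; run `pip uninstall diffusers` first.")
    else:
        print("No pip-installed diffusers found.")
```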

To generate an image with a complex prompt, first `cd scripts`, and then run

```python
from pipeline_sdxl import mulan_sdxl

mulan_sdxl(prompt="a black headphone is on the left of a green phone", seed=42)
```

`seed`: random seed; `prompt`: user prompt

The results will be saved in `sdxl_outputs` by default.

## Bibtex

```
@misc{li2024mulan,
    title={MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion},
    author={Li, Sen and Wang, Ruochen and Hsieh, Cho-jui and Cheng, Minhao and Zhou, Tianyi},
    publisher={arXiv:2402.12741},
    year={2024},
}
```

## Acknowledgements

1. [Stable Diffusion](https://github.com/CompVis/stable-diffusion)

2. [Backward Guidance](https://github.com/silent-chen/layout-guidance)

3. [LLaVA](https://github.com/haotian-liu/LLaVA)