https://github.com/xiefan-guo/initno
[CVPR 2024] InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
- Host: GitHub
- URL: https://github.com/xiefan-guo/initno
- Owner: xiefan-guo
- License: apache-2.0
- Created: 2024-04-07T05:43:54.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-07T12:48:15.000Z (over 1 year ago)
- Last Synced: 2024-08-01T18:37:25.822Z (about 1 year ago)
- Topics: diffusion-models, image-generation, text-to-image
- Language: Python
- Homepage: https://xiefan-guo.github.io/initno
- Size: 25.7 MB
- Stars: 23
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
## Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Official PyTorch code release for the CVPR 2024 paper: https://arxiv.org/abs/2404.04650

**InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization**
Xiefan Guo, Jinlin Liu, Miaomiao Cui, Jiankai Li, Hongyu Yang, Di Huang
https://xiefan-guo.github.io/initno
Abstract: *Our investigation dives into the exploration of various random noise configurations and their subsequent influence on the generated results. Notably, when different noises are input into Stable Diffusion (SD) under identical text prompts, there are marked discrepancies in the alignment between the generated image and the given text. Unsuccessful cases are delineated by gray contours, while successful instances are indicated by yellow contours. This observation underscores the pivotal role of initial noise in determining the success of the generation process. Based on this insight, we divide the initial noise space into valid and invalid regions. Introducing Initial Noise Optimization (InitNO), indicated by orange arrows, our method is capable of guiding any initial noise into the valid region, thereby synthesizing high-fidelity results (orange contours) that precisely correspond to the given prompt. The same location employs the same random seed.*
## Requirements
* Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
* All experiments were conducted on a single NVIDIA V100 GPU (32 GB).
## Getting started
**Python libraries:** See `environment.yaml` for exact library dependencies. Use the following commands to create and activate the InitNO conda environment:
```bash
# Create conda environment
conda env create -f environment.yaml
# Activate conda environment
conda activate initno_env
```
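After activating the environment, a quick sanity check can confirm that the key libraries are importable. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the package list (`torch`, `diffusers`) is an assumption based on the PyTorch and diffusers dependencies mentioned in this README.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed core dependencies; adjust to match the environment file.
required = ["torch", "diffusers"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported missing, recreating the conda environment is usually the simplest fix.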
**Generating images:** Our code relies on Hugging Face's [diffusers](https://github.com/huggingface/diffusers) library for downloading the Stable Diffusion model. Run the following command to generate images.
```bash
python run_sd_initno.py
```
You can specify the following arguments in `run_sd_initno.py`:
* `SEEDS`: a list of random seeds
* `PROMPT`: text prompt for image generation
* `token_indices`: a list of target token indices
* `result_root`: path to save generated results
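The arguments above might be set as in the following sketch. It assumes that `token_indices` follows the Attend-and-Excite convention, where each index is the position of a target word in the tokenized prompt and index 0 is reserved for the start-of-text token. The helper `word_token_indices` is hypothetical (it is not part of this repository) and approximates tokenization by splitting on whitespace, which only holds when every word maps to a single token.

```python
def word_token_indices(prompt, targets):
    """Return assumed token positions of target words in a prompt.

    Approximates the tokenizer with a whitespace split; position 0 is
    assumed to be the start-of-text token, so words are counted from 1.
    """
    words = prompt.lower().split()
    return [words.index(target.lower()) + 1 for target in targets]


# Example values; all of them are illustrative, not repository defaults.
SEEDS = [0, 1, 2]
PROMPT = "a cat and a rabbit"
token_indices = word_token_indices(PROMPT, ["cat", "rabbit"])
result_root = "results"

print(token_indices)  # [2, 5]
```

For prompts with rare or compound words, the tokenizer may split a word into several tokens, so it is safer to inspect the actual tokenization before relying on these indices.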
**Visualization of attention maps:** We provide the `fn_show_attention` function in `attn_utils.py` for attention map visualization. Running the command above produces attention map visualizations along with the generated images.
**Float16 precision:** You can pass `torch.float16` when loading the Stable Diffusion model to speed up inference and reduce memory usage. However, this may slightly degrade the quality of the generated results.
```python
pipe = StableDiffusionInitNOPipeline.from_pretrained(SD14_VERSION, torch_dtype=torch.float16).to("cuda")
```
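To get a rough sense of the memory saving, the sketch below estimates weight storage at each precision. It is a back-of-the-envelope illustration only: the helper `weight_memory_gib` is hypothetical, the parameter count is an approximate, commonly cited figure for the Stable Diffusion v1 UNet (an assumption here, not taken from this repository), and activations and optimization buffers are ignored.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Estimate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Approximate UNet size for Stable Diffusion v1 (assumed figure).
unet_params = 860_000_000

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(unet_params, dtype):.2f} GiB")
```

Weight storage halves with `float16`; the actual peak memory during generation also depends on activations and on the gradients computed while optimizing the initial noise.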
## Citation
```bibtex
@inproceedings{guo2024initno,
title = {InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization},
author = {Guo, Xiefan and Liu, Jinlin and Cui, Miaomiao and Li, Jiankai and Yang, Hongyu and Huang, Di},
booktitle = {CVPR},
year = {2024}
}
```
## Acknowledgments
This code builds on [diffusers](https://github.com/huggingface/diffusers) and [Attend-and-Excite](https://github.com/yuval-alaluf/Attend-and-Excite). We thank all the contributors for open-sourcing their work.