# Cross-Image Attention for Zero-Shot Appearance Transfer (SIGGRAPH 2024)

> Yuval Alaluf*, Daniel Garibi*, Or Patashnik, Hadar Averbuch-Elor, Daniel Cohen-Or
> Tel Aviv University
> \* Denotes equal contribution
>
> Recent advancements in text-to-image generative models have demonstrated a remarkable ability to capture a deep semantic understanding of images. In this work, we leverage this semantic knowledge to transfer the visual appearance between objects that share similar semantics but may differ significantly in shape. To achieve this, we build upon the self-attention layers of these generative models and introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images. Specifically, given a pair of images, one depicting the target structure and the other specifying the desired appearance, our cross-image attention combines the queries corresponding to the structure image with the keys and values of the appearance image. This operation, when applied during the denoising process, leverages the established semantic correspondences to generate an image combining the desired structure and appearance. In addition, to improve the output image quality, we harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process. Importantly, our approach is zero-shot, requiring no optimization or training. Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint between the two input images.
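
At its core, the method swaps the inputs of a standard self-attention call: the queries come from the denoising pass of the structure image, while the keys and values come from the pass of the appearance image. Below is a minimal PyTorch sketch of this idea; the function name and tensor arguments are illustrative assumptions, not the repository's actual API.

```
import torch

def cross_image_attention(q_struct, k_app, v_app):
    # q_struct: queries from the structure image's denoising pass
    # k_app, v_app: keys and values from the appearance image's pass
    scale = q_struct.shape[-1] ** -0.5
    # The attention map acts as an implicit semantic correspondence
    # between structure-image queries and appearance-image keys.
    attn = torch.softmax((q_struct @ k_app.transpose(-2, -1)) * scale, dim=-1)
    # Aggregating the appearance values renders the appearance image's
    # visual features onto the structure image's layout.
    return attn @ v_app
```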



[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/yuvalalaluf/cross-image-attention)





Given two images depicting a source structure and a target appearance, our method generates an image merging the structure of one image with the appearance of the other in a zero-shot manner.

## Description
Official implementation of our paper "Cross-Image Attention for Zero-Shot Appearance Transfer".

## Environment
Our code builds on the `diffusers` library. To set up the environment, please run:
```
conda env create -f environment/environment.yaml
conda activate cross_image
```
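
Once the environment is activated, a quick, optional sanity check of the installation (an illustrative command, not part of the repository) is:
```
python -c "import diffusers; print(diffusers.__version__)"
```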

## Usage





Sample appearance transfer results obtained by our cross-image attention technique.

To generate an image, simply run the `run.py` script. For example:
```
python run.py \
--app_image_path /path/to/appearance/image.png \
--struct_image_path /path/to/structure/image.png \
--output_path /path/to/output/images.png \
--domain_name [domain the objects are taken from (e.g., animal, building)] \
--use_masked_adain True \
--contrast_strength 1.67 \
--swap_guidance_scale 3.5
```
Notes:
- If no prompt is explicitly specified, the inversion uses the prompt `"A photo of a [domain_name]"`.
- If `--use_masked_adain` is set to `True` (its default value), then `--domain_name` must be given in order
to compute the masks using the self-segmentation technique.
- In cases where the domains are not well-defined, you can set `--use_masked_adain` to `False`; no `domain_name` is then required.
- You can set `--load_latents` to `True` to load the latents from a file instead of inverting the input images every time. This is useful if you want to generate multiple images with the same structure but different appearances, as shown below.
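
For example, after an initial run has saved the latents, a follow-up run that swaps in a different appearance image might look like the following (an illustrative invocation; the paths are placeholders):
```
python run.py \
--app_image_path /path/to/another/appearance/image.png \
--struct_image_path /path/to/structure/image.png \
--output_path /path/to/output/images.png \
--domain_name animal \
--use_masked_adain True \
--load_latents True
```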

### Demo Notebook





Additional appearance transfer results obtained by our cross-image attention technique.

We also provide a notebook that can be run in Google Colab; please see `notebooks/demo.ipynb`.

## HuggingFace Demo :hugs:
We also provide a simple HuggingFace demo to run our method on your own images.
Check it out [here](https://huggingface.co/spaces/yuvalalaluf/cross-image-attention)!

## Acknowledgements
This code builds on the [diffusers](https://github.com/huggingface/diffusers) library. In addition, we
borrow code from the following repositories:
- [Edit-Friendly DDPM Inversion](https://github.com/inbarhub/DDPM_inversion) for inverting the input images.
- [Prompt Mixing](https://github.com/orpatashnik/local-prompt-mixing) for computing the masks used in our AdaIN operation.
- [FreeU](https://github.com/ChenyangSi/FreeU) for improving the general generation quality of Stable Diffusion.

## Citation
If you use this code for your research, please cite the following work:
```
@misc{alaluf2023crossimage,
      title={Cross-Image Attention for Zero-Shot Appearance Transfer},
      author={Yuval Alaluf and Daniel Garibi and Or Patashnik and Hadar Averbuch-Elor and Daniel Cohen-Or},
      year={2023},
      eprint={2311.03335},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```