Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://yandex-research.github.io/invertible-cd/
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
https://yandex-research.github.io/invertible-cd/
consistency-distillation-stable-diffusion image-editing image-generation
Last synced: 3 months ago
JSON representation
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
- Host: GitHub
- URL: https://yandex-research.github.io/invertible-cd/
- Owner: yandex-research
- License: apache-2.0
- Created: 2024-06-16T11:10:23.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-07-04T06:28:02.000Z (7 months ago)
- Last Synced: 2024-08-01T18:33:10.784Z (6 months ago)
- Topics: consistency-distillation-stable-diffusion, image-editing, image-generation
- Language: Python
- Homepage: https://yandex-research.github.io/invertible-cd/
- Size: 23.9 MB
- Stars: 77
- Watchers: 5
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-diffusion-categorized - [Project
README
# Invertible Consistency Distillation for
Text-Guided Image Editing in Around 7 StepsThis paper proposes **invertible Consistency Distillation**, enabling
1. highly efficient and accurate **text-guided image editing**
2. diverse and high-quality **image generation**
## Table of contents
* [Installation](#installation)
* [Easy-to-run examples](#easy-to-run-examples) (iCD-SD1.5)
* [Generation](#generation-with-sd15)
* [Editing](#editing-with-sd15)
* [Easy-to-run examples](running/sdxl) (iCD-SDXL)
* [In-depth generation and editing](running) (iCD-SDXL and iCD-SD1.5)
* [iCD training example](training) (iCD-SDXL and iCD-SD1.5)
* [Citation](#citation)## Installation
```shell
# Clone a repo
git clone https://github.com/yandex-research/invertible-cd# Create an environment and install packages
conda create -n icd python=3.10 -y
conda activate icdpip3 install -r requirements/req.txt
```
We provide the following checkpoints:1. Guidance distilled diffusion models
- [Stable Diffusion 1.5, 3GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/sd15_cfg_distill.pt.tar.gz
)
- [SDXL, 8.9GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/sdxl_cfg_distill.pt.tar.gz)These models saved as `.pt` files.
2. Invertible Consistency Distillation (_forward_ and _reverse_ CD) on top of the guidance distilled models
| Model | Steps | Time steps |
|-----------------------------------------------------------------------------------------------------------|-------|------------------------------------------------------------------|
| [iCD-SD1.5, 0.5GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/iCD-SD15_4steps_1.tar.gz) | 4 | Reverse: [259, 519, 779, 999];
Forward: [19, 259, 519, 779] |
| [iCD-SD1.5, 0.5GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/iCD-SD15_4steps_2.tar.gz) | 4 | Reverse: [249, 499, 699, 999];
Forward: [19, 249, 499, 699] |
| [iCD-SD1.5, 0.5GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/iCD-SD15_3steps.tar.gz) | 3 | Reverse: [339, 699, 999];
Forward: [19, 339, 699] |
| [iCD-SDXL, 1.4GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/iCD-SDXL_4steps_1.tar.gz) | 4 | Reverse: [259, 519, 779, 999];
Forward: [19, 259, 519, 779] |
| [iCD-SDXL, 1.4GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/iCD-SDXL_4steps_2.tar.gz) | 4 | Reverse: [249, 499, 699, 999];
Forward: [19, 249, 499, 699] |
| [iCD-SDXL, 1.4GB](https://storage.yandexcloud.net/yandex-research/invertible-cd/iCD-SDXL_3steps.tar.gz) | 3 | Reverse: [339, 699, 999];
Forward: [19, 339, 699] |These models saved as `.safetensors` files.
## Easy-to-run examples
Step 0. Download the models and put them to the *checkpoints* folder
For this example, we consider [iCD-SD1.5](https://storage.yandexcloud.net/yandex-research/invertible-cd/iCD-SD1.5_1.tar) using
reverse: [259, 519, 779, 999], forward: [19, 259, 519, 779] time steps.Step 1. Load the models
```Python
from utils.loading import load_models
from diffusers import DDPMSchedulerroot = 'checkpoints'
ldm_stable, reverse_cons_model, forward_cons_model = load_models(
model_id="runwayml/stable-diffusion-v1-5",
device='cuda',
forward_checkpoint=f'{root}/iCD-SD15-forward_19_259_519_779.safetensors',
reverse_checkpoint=f'{root}/iCD-SD15-reverse_259_519_779_999.safetensors',
r=64,
w_embed_dim=512,
teacher_checkpoint=f'{root}/sd15_cfg_distill.pt',
)tokenizer = ldm_stable.tokenizer
noise_scheduler = DDPMScheduler.from_pretrained(
"runwayml/stable-diffusion-v1-5", subfolder="scheduler", )
```Step 2. Specify the configuration according to the downloaded model
```Python
from utils import p2p, generationNUM_REVERSE_CONS_STEPS = 4
REVERSE_TIMESTEPS = [259, 519, 779, 999]
NUM_FORWARD_CONS_STEPS = 4
FORWARD_TIMESTEPS = [19, 259, 519, 779]
NUM_DDIM_STEPS = 50solver = generation.Generator(
model=ldm_stable,
noise_scheduler=noise_scheduler,
n_steps=NUM_DDIM_STEPS,
forward_cons_model=forward_cons_model,
forward_timesteps=FORWARD_TIMESTEPS,
reverse_cons_model=reverse_cons_model,
reverse_timesteps=REVERSE_TIMESTEPS,
num_endpoints=NUM_REVERSE_CONS_STEPS,
num_forward_endpoints=NUM_FORWARD_CONS_STEPS,
max_forward_timestep_index=49,
start_timestep=19)p2p.NUM_DDIM_STEPS = NUM_DDIM_STEPS
p2p.tokenizer = tokenizer
p2p.device = 'cuda'
```### Generation with iCD-SD1.5
Step 3. Generate
```Python
import torchprompt = ['a cute owl with a graduation cap']
controller = p2p.AttentionStore()generator = torch.Generator().manual_seed(150)
tau = 1.0
image, _ = generation.runner(
# Playing params
guidance_scale=19.0,
tau1=tau, # Dynamic guidance if tau < 1.0
tau2=tau,# Fixed params
is_cons_forward=True,
model=reverse_cons_model,
w_embed_dim=512,
solver=solver,
prompt=prompt,
controller=controller,
generator=generator,
latent=None,
return_type='image')# The left image is inversion, the right - editing.
generation.to_pil_images(image).save('test_generation_iCD-SD1.5.jpg')
generation.view_images(image)
```
### Editing with iCD-SD1.5
Step 3. Load and invert real image
```Python
from utils import inversionimage_path = f"assets/bird.jpg"
prompt = ["a photo of a bird standing on a branch"](image_gt, image_rec), ddim_latent, uncond_embeddings = inversion.invert(
# Playing params
image_path=image_path,
prompt=prompt,# Fixed params
is_cons_inversion=True,
w_embed_dim=512,
inv_guidance_scale=0.0,
stop_step=50,
solver=solver,
seed=10500)
```
Step 4. Edit the image
```Python
p2p.NUM_DDIM_STEPS = 4
p2p.tokenizer = tokenizer
p2p.device = 'cuda'prompts = ["a photo of a bird standing on a branch",
"a photo of a lego bird standing on a branch"
]# Playing params
cross_replace_steps = {'default_': 0.2, }
self_replace_steps = 0.2
blend_word = ((('bird',), ('lego',)))
eq_params = {"words": ("lego",), "values": (3.,)}controller = p2p.make_controller(prompts,
False, # (is_replacement) True if only one word is changed
cross_replace_steps,
self_replace_steps,
blend_word,
eq_params)tau = 0.8
image, _ = generation.runner(
# Playing params
guidance_scale=19.0,
tau1=tau, # Dynamic guidance if tau < 1.0
tau2=tau,# Fixed params
model=reverse_cons_model,
is_cons_forward=True,
w_embed_dim=512,
solver=solver,
prompt=prompts,
controller=controller,
num_inference_steps=50,
generator=None,
latent=ddim_latent,
uncond_embeddings=uncond_embeddings,
return_type='image')generation.to_pil_images(image).save('test_editing_iCD-SD1.5.jpg')
generation.view_images(image)
```
>**Note**:
Please note that zero-shot editing is highly sensitive to hyperparameters. Thus, we recommend tuning: cross_replace_steps
> (from 0.0 to 1.0), self_replace_steps (from 0.0 to 1), tau (0.7 or 0.8 seems to work best),
> guidance scale (up to 19), and amplify factor (eq_params).You can also consider the similar easy-to-run examples for the [SDXL model](running/sdxl)
or move on to [in-depth examples](running)## Citation
```bibtex
@article{starodubcev2024invertible,
title={Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps},
author={Starodubcev, Nikita and Khoroshikh, Mikhail and Babenko, Artem and Baranchuk, Dmitry},
journal={arXiv preprint arXiv:2406.14539},
year={2024}
}
```