## Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models

>[Ruibin Li](https://github.com/leeruibin)<sup>1</sup> | [Ruihuang Li](https://scholar.google.com/citations?user=8CfyOtQAAAAJ&hl=zh-CN)<sup>1</sup> | [Song Guo](https://scholar.google.com/citations?user=Ib-sizwAAAAJ&hl=en)<sup>2</sup> | [Lei Zhang](https://www4.comp.polyu.edu.hk/~cslzhang/)<sup>1*</sup>

><sup>1</sup>The Hong Kong Polytechnic University, <sup>2</sup>The Hong Kong University of Science and Technology.

>In ECCV2024

## 🔎 Framework overview

Pipelines of different inversion methods in text-driven editing. (a) DDIM inversion inverts a real image into a latent noise code, but the inverted code often leads to a large reconstruction gap $D_{Rec}$ under higher CFG parameters. (b) NTI optimizes the null-text embedding to narrow the reconstruction gap $D_{Rec}$, while NPI further improves the speed of NTI. (c) DirectInv records the differences between the inversion features and the reconstruction features, and merges them back to achieve high-quality reconstruction. (d) Our SPDInv instead minimizes the noise gap $D_{Noi}$ rather than $D_{Rec}$, which reduces the influence of the source prompt on the editing process and thus alleviates the artifacts and inconsistent details encountered by previous methods.

![SPDInv](figures/methods.png)
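
The idea in (d) can be viewed as a damped fixed-point iteration on each inversion step. The snippet below is a minimal sketch of that idea, not the authors' implementation: `eps_model`, the iteration count, and the step size are placeholders, and a real run would use the Stable Diffusion UNet conditioned on the source prompt.

```
# Minimal sketch (not the official code) of the fixed-point refinement
# behind SPDInv: at each DDIM inversion step, refine z_t so that it
# satisfies the inversion equation with the noise predicted at z_t itself,
# shrinking the noise gap D_Noi.
import torch

def ddim_inv_target(z_prev, eps, alpha_prev, alpha_t):
    # DDIM inversion update: express z_t from z_{t-1} and the predicted noise.
    a = (alpha_t / alpha_prev) ** 0.5
    return a * z_prev + ((1 - alpha_t) ** 0.5 - a * (1 - alpha_prev) ** 0.5) * eps

def spdinv_step(eps_model, z_prev, t, alpha_prev, alpha_t, n_iters=15, step=0.5):
    with torch.no_grad():
        # Initialise with the standard DDIM inversion step (noise taken at z_prev).
        z_t = ddim_inv_target(z_prev, eps_model(z_prev, t), alpha_prev, alpha_t)
        # Damped fixed-point refinement: pull z_t toward the target computed
        # with noise predicted at z_t itself, i.e. shrink ||z_t - f(z_t)||.
        for _ in range(n_iters):
            target = ddim_inv_target(z_prev, eps_model(z_t, t), alpha_prev, alpha_t)
            z_t = z_t + step * (target - z_t)
    return z_t

# Toy usage with a stand-in noise predictor; a real run would call the UNet
# conditioned on the source prompt instead.
if __name__ == "__main__":
    eps_model = lambda z, t: 0.1 * z     # placeholder for UNet(z, t, prompt)
    z0 = torch.randn(1, 4, 64, 64)       # Stable Diffusion latent shape
    z1 = spdinv_step(eps_model, z0, t=1, alpha_prev=0.999, alpha_t=0.997)
    print(z1.shape)                      # torch.Size([1, 4, 64, 64])
```

Since the target is held fixed inside each refinement step, no gradients flow through the noise predictor, keeping the per-step cost close to plain DDIM inversion.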

## ⚙️ Dependencies and Installation
```
# clone this repository
git clone https://github.com/leeruibin/SPDInv.git
cd SPDInv

# create an environment with python >= 3.8
conda env create -f environment.yaml
conda activate SPDInv
```
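
To sanity-check the install, a quick version probe (assuming the environment pins both PyTorch and diffusers) is:

```
python -c "import torch, diffusers; print(torch.__version__, diffusers.__version__)"
```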

## 🚀 Quick Inference

#### Run P2P with SPDInv

```
python run_SPDInv_P2P.py --input xxx --source [source prompt] --target [target prompt] --blended_word "word1 word2"
```
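
For instance, a hypothetical invocation (the image path, prompts, and blended words below are illustrative, not shipped with the repo) could look like:

```
python run_SPDInv_P2P.py \
    --input images/cat.png \
    --source "a cat sitting on the grass" \
    --target "a tiger sitting on the grass" \
    --blended_word "cat tiger"
```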

#### Run MasaCtrl with SPDInv
```
python run_SPDInv_MasaCtrl.py --input xxx --source [source prompt] --target [target prompt]
```

#### Run PNP with SPDInv
To run PNP, you should first upgrade diffusers to 0.17.1:

```
pip install diffusers==0.17.1
```
Then you can run:
```
python run_SPDInv_PNP.py --input xxx --source [source prompt] --target [target prompt]
```

#### Run ELITE with SPDInv
For ELITE, you should first download the pre-trained [global_mapper.pt](https://drive.google.com/drive/folders/1VkiVZzA_i9gbfuzvHaLH2VYh7kOTzE0x?usp=sharing) checkpoint provided by the ELITE authors and put it into the `checkpoints` folder.
```
python run_SPDInv_ELITE.py --input xxx --source [source prompt] --target [target prompt]
```
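
Based on the instructions above, the expected layout would be the following (the exact path is our assumption; check the script's default if it differs):

```
SPDInv/
└── checkpoints/
    └── global_mapper.pt
```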

## 📷 Editing cases with P2P, MasaCtrl, PNP, ELITE

#### Editing cases with P2P

*(figure: editing results with SPDInv + P2P)*

#### Editing cases with MasaCtrl

*(figure: editing results with SPDInv + MasaCtrl)*

#### Editing cases with PNP

*(figure: editing results with SPDInv + PNP)*

#### Editing cases with ELITE

*(figure: editing results with SPDInv + ELITE)*

## Citation

```
@inproceedings{li2024source,
  title={Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models},
  author={Li, Ruibin and Li, Ruihuang and Guo, Song and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
```

## Acknowledgements

This code is built on the [diffusers](https://github.com/huggingface/diffusers/) implementation of [Stable Diffusion](https://github.com/CompVis/stable-diffusion).

Meanwhile, the code draws heavily on [Prompt-to-Prompt](https://github.com/google/prompt-to-prompt), [Null-Text Inversion](https://github.com/google/prompt-to-prompt), [MasaCtrl](https://github.com/TencentARC/MasaCtrl), [ProxEdit](https://github.com/phymhan/prompt-to-prompt), [ELITE](https://github.com/csyxwei/ELITE), [Plug-and-Play](https://github.com/MichalGeyer/plug-and-play), and [DirectInversion](https://github.com/cure-lab/PnPInversion). Thanks to all the contributors!