# PILOT: Coherent and Multi-modality Image Inpainting via Latent Space Optimization

---

Official implementation of PILOT.

[Lingzhi Pan](https://github.com/Lingzhi-Pan), [Tong Zhang](https://people.epfl.ch/tong.zhang?lang=en), [Bingyuan Chen](https://github.com/Alex-Lord), [Qi Zhou](https://github.com/zaqai), [Wei Ke](https://gr.xjtu.edu.cn/en/web/wei.ke), [Sabine Susstrunk](https://people.epfl.ch/sabine.susstrunk), [Mathieu Salzmann](https://people.epfl.ch/mathieu.salzmann)

![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/teaser.jpg)

## Method Overview

![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/framework_a.png)
![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/framework_b.png)

## Getting Started
We recommend creating a dedicated virtual environment (e.g., with conda). Then install a PyTorch build compatible with your CUDA devices, followed by the required packages listed in `requirements.txt`:
```
git clone https://github.com/Lingzhi-Pan/PILOT.git
cd PILOT
conda create -n pilot python=3.9
conda activate pilot
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
Download the `stable-diffusion-v1-5` model from https://huggingface.co/runwayml/stable-diffusion-v1-5 and save it to a local path.
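If you prefer scripting the download, here is a minimal sketch using `huggingface_hub`; the helper name and default target directory are illustrative, not part of PILOT:

```python
def download_sd15(local_dir="./stable-diffusion-v1-5"):
    """Download the stable-diffusion-v1-5 weights into local_dir and return the path."""
    # Deferred import: huggingface_hub needs network access when called.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id="runwayml/stable-diffusion-v1-5",
                             local_dir=local_dir)
```

Point the `model_path` entry in the configs below at the resulting directory.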

## Run Examples
We provide three types of conditions to guide the inpainting process: text, spatial controls, and reference images. Each type of control corresponds to a configuration file in the `configs/` directory.

### Text-guided
Set the `model_path` parameter in the config file to the directory where you saved the SD model, then run:
```
python run_example.py --config_file configs/t2i_step50.yaml
```
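The config is a plain YAML file. Only `model_path` is documented in this README, so the other keys sketched below are hypothetical placeholders:

```yaml
# configs/t2i_step50.yaml (sketch; only model_path is documented here)
model_path: /path/to/stable-diffusion-v1-5  # local SD v1.5 directory
# ...task-specific settings (prompt, input image, mask, step count, etc.)...
```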
![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/text_add.png)

### Text + Spatial Controls
Spatial controls can be introduced via either ControlNet or T2I-Adapter; we support both, but recommend ControlNet. First, download a ControlNet checkpoint published by Lvmin Zhang, for example the one conditioned on scribble images: https://huggingface.co/lllyasviel/sd-controlnet-scribble. Then run:
```
python run_example.py --config_file configs/controlnet_step30.yaml
```
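If you keep checkpoints locally, loading the scribble ControlNet follows the standard diffusers pattern; how PILOT wires it into its own pipeline may differ, so treat this as a sketch:

```python
def load_scribble_controlnet(model_dir="lllyasviel/sd-controlnet-scribble"):
    """Load the scribble-conditioned ControlNet; pass a local path if pre-downloaded."""
    from diffusers import ControlNetModel  # deferred: heavy dependency
    return ControlNetModel.from_pretrained(model_dir)
```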
You can also download other ControlNet models published by Lvmin Zhang to enable inpainting with other conditions, such as Canny edge maps, segmentation maps, and normal maps.
![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/controlNet_results.png)

### Text + Reference Image
Download the IP-Adapter checkpoint from https://huggingface.co/h94/IP-Adapter, then run:
```
python run_example.py --config_file configs/ipa_step50.yaml
```
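For reference, this is how IP-Adapter weights are typically attached to a diffusers-style pipeline; the subfolder and weight name follow the layout of the `h94/IP-Adapter` repo, and PILOT's own wiring may differ:

```python
def attach_ip_adapter(pipe, repo="h94/IP-Adapter",
                      weight_name="ip-adapter_sd15.bin"):
    """Attach IP-Adapter weights to a diffusers pipeline in place."""
    # load_ip_adapter is provided by diffusers' IPAdapterMixin.
    pipe.load_ip_adapter(repo, subfolder="models", weight_name=weight_name)
    return pipe
```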
![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/ip_adapter_a.png)

### Text + Spatial Controls + Reference Image
You can also combine ControlNet and IP-Adapter to achieve multi-condition control:
```
python run_example.py --config_file configs/ipa_controlnet_step30.yaml
```
![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/ip_adapter_b.png)

### Personalized Image Inpainting
You can also integrate LoRA weights into the base model, or replace the base model with another personalized text-to-image (T2I) model, to achieve personalized image inpainting. For example, replacing the base model with a T2I model fine-tuned via DreamBooth on several photos of a cute dog generates that dog inside the masked region while effectively preserving its identity.
![image](https://github.com/Lingzhi-Pan/PILOT/blob/main/assets/subject.png)
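Swapping the base model and adding LoRA weights can be sketched with the generic diffusers loading API; the function and paths are illustrative, and PILOT's own pipeline construction may differ:

```python
def build_personalized_pipe(base_model_path, lora_path=None):
    """Load a (possibly DreamBooth-finetuned) base model, optionally adding LoRA weights."""
    from diffusers import StableDiffusionPipeline  # deferred: heavy dependency
    pipe = StableDiffusionPipeline.from_pretrained(base_model_path)
    if lora_path is not None:
        pipe.load_lora_weights(lora_path)  # provided by diffusers' LoRA loader mixin
    return pipe
```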

**See our [Paper](https://arxiv.org/abs/2407.08019) for more information!**
## BibTeX
If you find this work helpful, please consider citing:
```bibtex
@article{pan2024coherent,
title={Coherent and Multi-modality Image Inpainting via Latent Space Optimization},
author={Pan, Lingzhi and Zhang, Tong and Chen, Bingyuan and Zhou, Qi and Ke, Wei and S{\"u}sstrunk, Sabine and Salzmann, Mathieu},
journal={arXiv preprint arXiv:2407.08019},
year={2024}
}
```