# Customized Generation Reimagined: Fidelity and Editability Harmonized

Jian Jin, [Yang Shen](https://aassxun.github.io/), Zhenyong Fu†, [Jian Yang](https://scholar.google.com.hk/citations?hl=zh-CN&user=6CIDtZQAAAAJ)†

(†corresponding author)

PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology

ECCV 2024

## 📖 Abstract
![method_overview](figures/method_overall.png)


Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling new generations of the concept in novel contexts guided by textual prompts.
However, customized generation suffers from an inherent trade-off between concept fidelity and editability, i.e., between precisely modeling the concept and faithfully adhering to the prompts.
Previous methods reluctantly seek a compromise and struggle to achieve both high concept fidelity and ideal prompt alignment simultaneously.
In this paper, we propose a "Divide, Conquer, then Integrate" (DCI) framework, which performs a surgical adjustment in the early stage of denoising to liberate the fine-tuned model from the fidelity-editability trade-off at inference.
The two conflicting components in the trade-off are decoupled and individually conquered by two collaborative branches, which are then selectively integrated to preserve high concept fidelity while achieving faithful prompt adherence.
To obtain a better fine-tuned model, we introduce an Image-specific Context Optimization (ICO) strategy for model customization.
ICO replaces manual prompt templates with learnable image-specific contexts, providing an adaptive and precise fine-tuning direction to promote the overall performance.
Extensive experiments demonstrate the effectiveness of our method in reconciling the fidelity-editability trade-off.

## Getting Started

### Configure the environment
```bash
cd stable-diffusion
conda env create -f environment.yaml
conda activate ldm
pip install clip-retrieval tqdm
```

or

```bash
conda env create -f environment.yaml
conda activate ldm
```
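
As a quick sanity check (not part of the original setup steps), you can confirm that the `ldm` environment provides a CUDA-enabled PyTorch build before moving on:

```bash
# Verify that PyTorch imports and that a GPU is visible inside the ldm environment.
conda activate ldm
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```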

### Prepare the pre-trained model
Download the Stable Diffusion v1.4 checkpoint:

```bash
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
```
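
The training and sampling scripts need to know where this checkpoint lives. Below is a minimal sketch assuming the upstream CompVis Stable Diffusion layout (`models/ldm/stable-diffusion-v1/model.ckpt`); the path this repository actually reads may instead be set directly in `train.sh`, so adjust accordingly:

```bash
# Assumption: run from the repository root, with sd-v1-4.ckpt downloaded there.
# The upstream CompVis repo conventionally looks for the checkpoint at this path;
# verify against the path configured in train.sh / inference.sh.
mkdir -p stable-diffusion/models/ldm/stable-diffusion-v1
ln -sf "$(pwd)/sd-v1-4.ckpt" stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt
```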

## 🚀 Run

### Fine-tuning Stage

Configure the items in `train.sh` and run:

```bash
bash train.sh
```

You can download the real regularization images [here](https://drive.google.com/file/d/1-oYRCLq87dKnMm6h8-y6X4WZ5_CU2CoM/view?usp=sharing) and place them in `./real_reg`, or curate them yourself and organize them in the same format.
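
The concrete options live in `train.sh` itself. The sketch below is purely illustrative (the listed items are placeholders, not the script's real variable names) and only indicates the kind of paths you typically point at your data before launching:

```bash
# Purely illustrative -- the actual option names are defined inside train.sh.
# Typical items to set before running:
#   * path to the pre-trained checkpoint (e.g. ./sd-v1-4.ckpt)
#   * directory containing a few images of the concept to customize
#   * directory of regularization images (e.g. a folder under ./real_reg)
#   * output/log directory for the fine-tuned weights
bash train.sh
```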

### Inference

Configure the items in `inference.sh` and run:

```bash
bash inference.sh
```
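
Similarly, the inference options are defined inside `inference.sh`; a hypothetical outline of what is usually configured there (again, placeholders rather than the script's actual names):

```bash
# Hypothetical outline -- check inference.sh for the actual option names.
#   * path to the fine-tuned weights produced by the fine-tuning stage
#   * a text prompt containing the learned concept token
#   * an output directory for the generated images
bash inference.sh
```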

## 🖊️ BibTeX
If you find this project useful in your research, please consider citing:

```bibtex
@inproceedings{jin2024customized,
  title={Customized Generation Reimagined: Fidelity and Editability Harmonized},
  author={Jian Jin and Yang Shen and Zhenyong Fu and Jian Yang},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
```

## 🙏 Acknowledgements
This code is based on [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [Custom Diffusion](https://github.com/adobe-research/custom-diffusion). We thank the authors for their outstanding work.

## 📧 Contact
Should you have any questions or suggestions, please contact the authors.