Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Shilin-LU/TF-ICON?tab=readme-ov-file
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)
diffusion-model generative-ai image-composition image-inversion stable-diffusion text-to-image
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/Shilin-LU/TF-ICON?tab=readme-ov-file
- Owner: Shilin-LU
- License: mit
- Created: 2023-07-23T03:54:45.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-17T02:05:51.000Z (10 months ago)
- Last Synced: 2024-08-01T18:39:39.509Z (3 months ago)
- Topics: diffusion-model, generative-ai, image-composition, image-inversion, stable-diffusion, text-to-image
- Language: Python
- Homepage: https://shilin-lu.github.io/tf-icon.github.io/
- Size: 75.2 MB
- Stars: 778
- Watchers: 35
- Forks: 100
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Generative-Image-Composition - TF-ICON test benchmark - domain, single-ref): 332 samples. Each sample consists of a background image, a foreground image, a (Test Set)
README
# TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition (ICCV 2023)
## [Project Page] [Poster]
[![arXiv](https://img.shields.io/badge/arXiv-TF--ICON-green.svg?style=plastic)](https://arxiv.org/abs/2307.12493) [![TI2I](https://img.shields.io/badge/benchmarks-TF--ICON-blue.svg?style=plastic)](https://entuedu-my.sharepoint.com/:f:/g/personal/shilin002_e_ntu_edu_sg/EmmCgLm_3OZCssqjaGdvjMwBCIvqfjsyphjqNs7g2DFzQQ?e=JSwOHY)
Official implementation of [TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition](https://shilin-lu.github.io/tf-icon.github.io/).
> **TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition**
> Shilin Lu, Yanzhu Liu, and Adams Wai-Kin Kong
> ICCV 2023
>
>**Abstract**:
Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains.

![teaser](assets/tf-icon.png)
---
![framework](assets/framework_vector.png)
---
## Contents
- [Setup](#setup)
  - [Creating a Conda Environment](#creating-a-conda-environment)
  - [Downloading Stable-Diffusion Weights](#downloading-stable\-diffusion-weights)
- [Running TF-ICON](#running-tf\-icon)
  - [Data Preparation](#data-preparation)
  - [Image Composition](#image-composition)
- [TF-ICON Test Benchmark](#tf\-icon-test-benchmark)
- [Additional Results](#additional-results)
  - [Sketchy Painting](#sketchy-painting)
  - [Oil Painting](#oil-painting)
  - [Photorealism](#photorealism)
  - [Cartoon](#cartoon)
- [Acknowledgments](#acknowledgments)
- [Citation](#citation)
## Setup
Our codebase is built on [Stable-Diffusion](https://github.com/Stability-AI/stablediffusion) and shares its dependencies and model architecture. We recommend 23 GB of VRAM (minimum 20 GB), though the requirement varies with the input samples.

### Creating a Conda Environment
```
git clone https://github.com/Shilin-LU/TF-ICON.git
cd TF-ICON
conda env create -f tf_icon_env.yaml
conda activate tf-icon
```

### Downloading Stable-Diffusion Weights
Download the Stable Diffusion weights from [Stability AI on Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-2-1-base/blob/main/v2-1_512-ema-pruned.ckpt) (the `v2-1_512-ema-pruned.ckpt` file) and put the file under the `./ckpt` folder.

## Running TF-ICON
### Data Preparation
Several input samples are available under the `./inputs` directory. Each sample consists of one background (bg), one foreground (fg), one segmentation mask for the foreground (fg_mask), and one user mask that marks the desired composition location (mask_bg_fg). The input data is structured as follows:
```
inputs
├── cross_domain
│ ├── prompt1
│ │ ├── bgxx.png
│ │ ├── fgxx.png
│ │ ├── fgxx_mask.png
│ │ ├── mask_bg_fg.png
│ ├── prompt2
│ ├── ...
├── same_domain
│ ├── prompt1
│ │ ├── bgxx.png
│ │ ├── fgxx.png
│ │ ├── fgxx_mask.png
│ │ ├── mask_bg_fg.png
│ ├── prompt2
│ ├── ...
```

More samples are available in [TF-ICON Test Benchmark](#tf\-icon-test-benchmark) or you can customize them. Note that the resolution of the input foreground should not be too small.
- Cross domain: the background and foreground images originate from different visual domains.
- Same domain: both the background and foreground images belong to the same photorealism domain.

### Image Composition
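Before running the composition script, it can help to confirm that each sample folder contains the four files described under Data Preparation. The following is a minimal sketch, not part of the official code: `check_sample` and `check_root` are hypothetical helper names, and matching files by the `bg`/`fg` prefixes, the `_mask` suffix, and the `mask_bg_fg` name is an assumption drawn only from the directory tree shown above.

```python
from pathlib import Path


def check_sample(sample_dir):
    """Return a list of problems for one sample folder (empty list means OK)."""
    # Compare file names without extensions; the extension is not checked here.
    stems = [p.stem for p in Path(sample_dir).iterdir() if p.is_file()]

    # Assumed naming, taken from the tree in Data Preparation:
    # bgxx, fgxx, fgxx_mask, mask_bg_fg
    found = {
        "background (bg*)": [s for s in stems if s.startswith("bg")],
        "foreground (fg*)": [
            s for s in stems if s.startswith("fg") and not s.endswith("_mask")
        ],
        "foreground mask (fg*_mask)": [
            s for s in stems if s.startswith("fg") and s.endswith("_mask")
        ],
        "user mask (mask_bg_fg)": [s for s in stems if s == "mask_bg_fg"],
    }

    problems = []
    for label, matches in found.items():
        if len(matches) != 1:
            problems.append(f"expected 1 {label}, found {len(matches)}")
    return problems


def check_root(root):
    """Map each sample folder under root to its list of problems."""
    return {
        d.name: check_sample(d)
        for d in sorted(Path(root).iterdir())
        if d.is_dir()
    }
```

For example, `check_root("./inputs/cross_domain")` returns a dictionary keyed by sample folder name; any folder with a non-empty list is missing a file or has more than one candidate for a role.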
To run TF-ICON in 'cross_domain' mode, use the following command:

```
python scripts/main_tf_icon.py --ckpt <path/to/model.ckpt> \
--root ./inputs/cross_domain \
--domain 'cross' \
--dpm_steps 20 \
--dpm_order 2 \
--scale 5 \
--tau_a 0.4 \
--tau_b 0.8 \
--outdir ./outputs \
--gpu cuda:0 \
--seed 3407
```

For the 'same_domain' mode, use the following command:
```
python scripts/main_tf_icon.py --ckpt <path/to/model.ckpt> \
--root ./inputs/same_domain \
--domain 'same' \
--dpm_steps 20 \
--dpm_order 2 \
--scale 2.5 \
--tau_a 0.4 \
--tau_b 0.8 \
--outdir ./outputs \
--gpu cuda:0 \
--seed 3407
```

- `ckpt`: The path to the checkpoint of Stable Diffusion.
- `root`: The path to your input data.
- `domain`: Setting 'cross' if the foreground and background are from different visual domains, otherwise 'same'.
- `dpm_steps`: The diffusion sampling steps.
- `dpm_order`: The order of the probability flow ODE solver.
- `scale`: The classifier-free guidance (CFG) scale.
- `tau_a`: The threshold for injecting composite self-attention maps.
- `tau_b`: The threshold for preserving background.

## TF-ICON Test Benchmark
The complete TF-ICON test benchmark is available in [this OneDrive folder](https://entuedu-my.sharepoint.com/:f:/g/personal/shilin002_e_ntu_edu_sg/EmmCgLm_3OZCssqjaGdvjMwBCIvqfjsyphjqNs7g2DFzQQ?e=JSwOHY). If you find the benchmark useful for your research, please consider citing.
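
To run the composition script over a downloaded copy of the benchmark, the command from Image Composition can be assembled programmatically. The sketch below rests on assumptions: the downloaded folder is taken to follow the same layout as `./inputs` (not confirmed here), the paths in the usage lines are placeholders, and `build_command` and `DEFAULT_SCALE` are hypothetical names. The flags and values are copied from the example commands in this README.

```python
import shlex

# CFG scales used by the example commands in this README, per domain.
DEFAULT_SCALE = {"cross": 5, "same": 2.5}


def build_command(ckpt, root, domain, outdir="./outputs", gpu="cuda:0", seed=3407):
    """Build the argument list for scripts/main_tf_icon.py."""
    if domain not in DEFAULT_SCALE:
        raise ValueError("domain must be 'cross' or 'same'")
    return [
        "python", "scripts/main_tf_icon.py",
        "--ckpt", str(ckpt),
        "--root", str(root),
        "--domain", domain,
        "--dpm_steps", "20",
        "--dpm_order", "2",
        "--scale", str(DEFAULT_SCALE[domain]),
        "--tau_a", "0.4",
        "--tau_b", "0.8",
        "--outdir", outdir,
        "--gpu", gpu,
        "--seed", str(seed),
    ]


# Placeholder paths; replace them with your checkpoint and benchmark folders.
cmd = build_command("./ckpt/model.ckpt", "./benchmark/cross_domain", "cross")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.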
## Additional Results
### Sketchy Painting
![sketchy-comp](assets/Additional_composition_ske.png)

---
### Oil Painting
![painting-comp](assets/Additional_composition_oil.png)

---
### Photorealism
![real-comp](assets/Additional_composition_real1.png)

---
### Cartoon
![carton-comp](assets/Additional_composition_carton.png)

---
## Acknowledgments
Our work stands on the shoulders of giants. We thank the contributors of the projects our code is based on: [Stable-Diffusion](https://github.com/Stability-AI/stablediffusion) and [Prompt-to-Prompt](https://github.com/google/prompt-to-prompt).

## Citation
If you find the repo useful, please consider citing:
```
@inproceedings{lu2023tf,
title={TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition},
author={Lu, Shilin and Liu, Yanzhu and Kong, Adams Wai-Kin},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={2294--2305},
year={2023}
}
```