
## UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models



#### UDiffText synthesizes accurate and harmonious text in both synthetic and real-world images, so it can be applied to tasks such as scene text editing (a), arbitrary text generation (b), and accurate T2I generation (c)

![UDiffText Teaser](demo/teaser.png)

### 📬 News

- **2024.7.16** Our paper has been accepted by ECCV 2024! 🥳
- **2023.12.11** Version 2.0 update (removed unused code 🚮)
- **2023.12.3** Build Hugging Face demo
- **2023.12.1** Build GitHub project page
- **2023.11.30** Version 1.0 upload

### 🔨 Installation

1. Clone this repo:
```
git clone https://github.com/ZYM-PKU/UDiffText.git
cd UDiffText
```

2. Install required Python packages

```
conda create -n udiff python=3.11
conda activate udiff
pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
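
After installation, a quick check that the pinned versions are the ones actually active in the `udiff` environment can save a failed run later. The sketch below uses only the standard library; the function name `check_pins` is hypothetical and not part of this repo.

```python
from importlib import metadata

# Pins taken from the install commands above.
EXPECTED = {"torch": "2.1.1", "torchvision": "0.16.1"}


def check_pins(expected: dict[str, str]) -> dict[str, str]:
    """Return a {package: problem} dict; an empty dict means every pin matches."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # CUDA wheels report versions such as "2.1.1+cu121", so compare
        # only the part before the local-version suffix.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_pins(EXPECTED).items():
        print(f"{name}: {problem}")
```

If the script prints nothing, both packages match the pinned versions.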

3. Make the checkpoint directory and build the tree structure

```
mkdir ./checkpoints

checkpoints
├── AEs                  // AutoEncoder
├── encoders
│   ├── LabelEncoder     // Character-level encoder
│   └── ViTSTR           // STR encoder
├── predictors           // STR model
├── pretrained           // Pretrained SD
└── ***.ckpt             // UDiffText checkpoint
```
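
The directory skeleton can also be created in one step. This is a minimal sketch, assuming that `LabelEncoder` and `ViTSTR` are sub-directories of `encoders`; the helper name `make_checkpoint_tree` is hypothetical.

```python
from pathlib import Path

# Sub-directories from the tree above; the UDiffText ***.ckpt file sits
# directly under the root, so it needs no directory of its own.
SUBDIRS = [
    "AEs",
    "encoders/LabelEncoder",
    "encoders/ViTSTR",
    "predictors",
    "pretrained",
]


def make_checkpoint_tree(root: str = "./checkpoints") -> list[Path]:
    """Create the checkpoint skeleton and return the directories."""
    created = []
    for sub in SUBDIRS:
        path = Path(root) / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created


if __name__ == "__main__":
    for path in make_checkpoint_tree():
        print(path)
```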

### 💻 Training

1. Prepare your data

#### LAION-OCR
- Create a data directory **{your data root}/LAION-OCR** on your disk and put your data in it. Then set the **data_root** field in **./configs/dataset/locr.yaml**.
- To download and preprocess the LAION-OCR dataset, refer to [TextDiffuser](https://github.com/microsoft/unilm/tree/master/textdiffuser) and our **./scripts/preprocess/laion_ocr_pre.ipynb**.
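
The **data_root** field can be edited by hand, or set from a script when several dataset configs need updating. The sketch below is one hedged way to do it with the standard library only. It assumes `data_root` appears as a `data_root: <value>` line in the YAML file, and the helper name `set_data_root` is hypothetical.

```python
import re
from pathlib import Path


def set_data_root(cfg_path: str, data_root: str) -> str:
    """Rewrite the first `data_root:` line in a YAML file and return the new text."""
    path = Path(cfg_path)
    text = path.read_text(encoding="utf-8")
    # Keep the original indentation and key; replace only the value.
    new_text, count = re.subn(
        r"^([ \t]*data_root:).*$",
        lambda m: f"{m.group(1)} {data_root}",
        text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise KeyError(f"no data_root field in {cfg_path}")
    path.write_text(new_text, encoding="utf-8")
    return new_text
```

For example, `set_data_root("./configs/dataset/locr.yaml", "/data/LAION-OCR")` would point the LAION-OCR config at `/data/LAION-OCR` (an example path). Any trailing comment on the `data_root` line itself is dropped; all other lines are left untouched.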

#### ICDAR13
- Create a data directory **{your data root}/ICDAR13** on your disk and put your data in it. Then set the **data_root** field in **./configs/dataset/icd13.yaml**.
- Build the tree structure as below:
```
ICDAR13
├── train                // training set
│   ├── annos            // annotations
│   │   ├── gt_x.txt
│   │   ├── ...
│   └── images           // images
│       ├── img_x.jpg
│       ├── ...
└── val                  // validation set
    ├── annos            // annotations
    │   ├── gt_img_x.txt
    │   ├── ...
    └── images           // images
        ├── img_x.jpg
        ├── ...
```
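
Once the files are in place, a small check can catch images that have no annotation file. The sketch below assumes each split has `annos` and `images` sub-directories, and that `img_x.jpg` pairs with either `gt_x.txt` or `gt_img_x.txt`, as the file names above suggest. That pairing rule is an assumption, not something the data loader is confirmed to enforce, and `unpaired_images` is a hypothetical helper.

```python
import re
from pathlib import Path


def unpaired_images(split_dir: str) -> list[str]:
    """List images under <split_dir>/images with no annotation in <split_dir>/annos."""
    split = Path(split_dir)
    annos = {p.name for p in (split / "annos").glob("*.txt")}
    missing = []
    for image in sorted((split / "images").glob("*.jpg")):
        match = re.fullmatch(r"img_(.+)\.jpg", image.name)
        if match is None:
            missing.append(image.name)  # file name does not follow img_x.jpg
            continue
        idx = match.group(1)
        # Accept both annotation naming patterns (assumed pairing).
        if f"gt_{idx}.txt" not in annos and f"gt_img_{idx}.txt" not in annos:
            missing.append(image.name)
    return missing
```

Calling `unpaired_images("{your data root}/ICDAR13/train")` returns an empty list when every image has a matching annotation.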

#### TextSeg
- Create a data directory **{your data root}/TextSeg** on your disk and put your data in it. Then set the **data_root** field in **./configs/dataset/tsg.yaml**.
- Build the tree structure as below:
```
TextSeg
├── train                  // training set
│   ├── annotation         // annotations
│   │   ├── x_anno.json    // annotation json file
│   │   ├── x_mask.png     // character-level mask
│   │   ├── ...
│   └── image              // images
│       ├── x.jpg.jpg
│       ├── ...
└── val                    // validation set
    ├── annotation         // annotations
    │   ├── x_anno.json    // annotation json file
    │   ├── x_mask.png     // character-level mask
    │   ├── ...
    └── image              // images
        ├── x.jpg
        ├── ...
```

#### SynthText
- Create a data directory **{your data root}/SynthText** on your disk and put your data in it. Then set the **data_root** field in **./configs/dataset/st.yaml**.
- Build the tree structure as below:
```
SynthText
├── 1                        // part 1
│   ├── ant+hill_1_0.jpg     // image
│   ├── ant+hill_1_1.jpg
│   ├── ...
├── 2                        // part 2
├── ...
└── gt.mat                   // annotation file
```

2. Train the character-level encoder

Set the parameters in **./configs/pretrain.yaml** and run:

```
python pretrain.py
```

3. Train the UDiffText model

Download the [pretrained model](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/512-inpainting-ema.ckpt) and put it in **./checkpoints/pretrained/**. You can ignore the "Missing Key" or "Unexpected Key" warnings when loading the checkpoint.
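
These warnings typically mean that the parameter names in the checkpoint and in the model do not fully overlap. In PyTorch, `load_state_dict(strict=False)` reports exactly these two lists (`missing_keys` and `unexpected_keys`) instead of raising an error. The pure-Python sketch below illustrates the idea only; the parameter names are made up, and it is not the loading code of this repo.

```python
def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, in the spirit of strict=False loading."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)     # in the model, absent from the checkpoint
    unexpected = sorted(ckpt_keys - model_keys)  # in the checkpoint, unused by the model
    return missing, unexpected


# Hypothetical parameter names, for illustration only.
model = ["unet.conv.weight", "label_encoder.embed.weight"]
checkpoint = ["unet.conv.weight", "cond_stage_model.proj.weight"]

missing, unexpected = diff_keys(model, checkpoint)
print("missing:", missing)
print("unexpected:", unexpected)
```

Keys in the first list keep their freshly initialized values, and keys in the second list are simply not loaded.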

Set the parameters in **./configs/train.yaml**, especially the paths:

```
load_ckpt_path: ./checkpoints/pretrained/512-inpainting-ema.ckpt # Checkpoint of the pretrained SD
model_cfg_path: ./configs/train/textdesign_sd_2.yaml # UDiffText model config
dataset_cfg_path: ./configs/dataset/locr.yaml # Use the LAION-OCR dataset
```

and run:

```
python train.py
```
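
Before launching a long training run, it can help to confirm that the three paths in **./configs/train.yaml** resolve to existing files. The sketch below assumes the config has already been loaded into a plain `dict` by whatever YAML loader you use; `missing_paths` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path

# Path fields named in the train.yaml snippet.
PATH_FIELDS = ("load_ckpt_path", "model_cfg_path", "dataset_cfg_path")


def missing_paths(cfg: dict, base_dir: str = ".") -> list[str]:
    """Return the path fields that are unset or whose target does not exist."""
    missing = []
    for field in PATH_FIELDS:
        value = cfg.get(field)
        # Relative paths are resolved against base_dir (the repo root by default).
        if value is None or not (Path(base_dir) / value).exists():
            missing.append(field)
    return missing
```

An empty list from `missing_paths(cfg)` means all three files were found relative to the repo root.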

### 📏 Evaluation

1. Download our available [checkpoints](https://drive.google.com/drive/folders/1s8IWqqydaJBjukxViGKFj2N33lfoVkGf?usp=sharing) and put them in the corresponding directories in **./checkpoints**.

2. Set the parameters in **./configs/test.yaml**, especially the paths:

```
load_ckpt_path: "./checkpoints/***.ckpt" # UDiffText checkpoint
model_cfg_path: "./configs/test/textdesign_sd_2.yaml" # UDiffText model config
dataset_cfg_path: "./configs/dataset/locr.yaml" # LAION-OCR dataset config
```

and run:

```
python test.py
```

### ๐Ÿ–ผ๏ธ Demo

In order to run an interactive demo on your own machine, execute the code:

```
python demo.py
```

or try our online demo on [Hugging Face](https://huggingface.co/spaces/ZYMPKU/UDiffText):

![Demo](demo/demo.png)

### 🎉 Acknowledgement

- **Dataset**: We sincerely thank [TextDiffuser](https://github.com/microsoft/unilm/tree/master/textdiffuser) for providing LAION-OCR, an open-source, large-scale image-text dataset with character-level segmentations.

- **Code & Model**: We build our project based on the code repo of [Stable Diffusion XL](https://github.com/Stability-AI/generative-models) and leverage the pretrained checkpoint of [Stable Diffusion 2.0](https://github.com/Stability-AI/stablediffusion).

### 🪬 Citation

```
@misc{zhao2023udifftext,
title={UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models},
author={Yiming Zhao and Zhouhui Lian},
year={2023},
eprint={2312.04884},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```