# FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

![FontDiffuser_LOGO](figures/logo.png)

[![arXiv preprint](http://img.shields.io/badge/arXiv-2312.12142-b31b1b)](https://arxiv.org/abs/2312.12142)
[![Gradio demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FontDiffuser-ff7c00)](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio)
[![Homepage](https://img.shields.io/badge/Homepage-FontDiffuser-green)](https://yeungchenwa.github.io/fontdiffuser-homepage/)
[![Code](https://img.shields.io/badge/Code-FontDiffuser-yellow)](https://github.com/yeungchenwa/FontDiffuser)


🔥 Model Zoo •
🛠️ Installation •
🏋️ Training •
📺 Sampling •
📱 Run WebUI

## 🌟 Highlights
![Vis_1](figures/vis_1.png)
![Vis_2](figures/with_instructpix2pix.png)
+ We propose **FontDiffuser**, which can generate unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.
+ **FontDiffuser** excels at generating complex characters and handling large style variations, and it achieves state-of-the-art performance.
+ The results generated by **FontDiffuser** can be fed directly into **InstructPix2Pix** for decoration, as shown in the figure above.
+ We release the 💻[Hugging Face Demo](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio) online! Welcome to try it out!

## 📅 News
- **2024.01.27**: The phase 2 training script is released.
- **2023.12.20**: Our repository is public! 👍🤗
- **2023.12.19**: 🔥🎉 The 💻[Hugging Face Demo](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio) is public! Welcome to try it out!
- **2023.12.16**: The Gradio app demo is released.
- **2023.12.10**: Source code with phase 1 training and sampling is released.
- **2023.12.09**: 🎉🎉 Our [paper](https://arxiv.org/abs/2312.12142) is accepted by AAAI 2024.
- **Previously**: Our [Recommendations-of-Diffusion-for-Text-Image](https://github.com/yeungchenwa/Recommendations-Diffusion-Text-Image) repo is public, which contains a paper collection of recent diffusion models for text-image generation tasks. Welcome to check it out!

## 🔥 Model Zoo
| **Model** | **Checkpoint** | **Status** |
|----------------------------------------------|----------------|------------|
| **FontDiffuser** | [GoogleDrive](https://drive.google.com/drive/folders/12hfuZ9MQvXqcteNuz7JQ2B_mUcTr-5jZ?usp=drive_link) / [BaiduYun:gexg](https://pan.baidu.com/s/19t1B7le8x8L2yFGaOvyyBQ) | Released |
| **SCR** | [GoogleDrive](https://drive.google.com/drive/folders/12hfuZ9MQvXqcteNuz7JQ2B_mUcTr-5jZ?usp=drive_link) / [BaiduYun:gexg](https://pan.baidu.com/s/19t1B7le8x8L2yFGaOvyyBQ) | Released |

## 🚧 TODO List
- [x] Add phase 1 training and sampling script.
- [x] Add WebUI demo.
- [x] Push demo to Hugging Face.
- [x] Add phase 2 training script and checkpoint.
- [ ] Add the pre-training of the SCR module.
- [ ] Combine with InstructPix2Pix.

## πŸ› οΈ Installation
### Prerequisites (Recommended)
- Linux
- Python 3.9
- PyTorch 1.13.1
- CUDA 11.7

### Environment Setup
Clone this repo:
```bash
git clone https://github.com/yeungchenwa/FontDiffuser.git
```

**Step 0**: Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).

**Step 1**: Create a conda environment and activate it.
```bash
conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser
```

**Step 2**: Install the appropriate PyTorch version following the [official instructions](https://pytorch.org/get-started/previous-versions/).
```bash
# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```

**Step 3**: Install the required packages.
```bash
pip install -r requirements.txt
```
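
Optionally, verify the installation; on a correctly configured GPU machine, this prints the PyTorch version and `True`:
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```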

## πŸ‹οΈ Training
### Data Construction
The training data directory tree should look as follows (data examples are provided in `data_examples/train/`):
```
├── data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │   ├── style0+char0.png
│           │   ├── style0+char1.png
│           │   └── ...
│           ├── style1
│           │   ├── style1+char0.png
│           │   ├── style1+char1.png
│           │   └── ...
│           ├── style2
│           │   ├── style2+char0.png
│           │   ├── style2+char1.png
│           │   └── ...
│           └── ...
```
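
To quickly check that your own data follows this layout (note the `style+char.png` naming convention for target images), a short listing helps:
```bash
find data_examples/train -maxdepth 3 | sort | head -20
```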
### Training Configuration
Before running the training scripts (any of the three modes below), set the training configuration, such as distributed training, through:
```bash
accelerate config
```
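
If you prefer to skip the interactive prompts, e.g. on a single-GPU machine, `accelerate` can also write a default configuration:
```bash
accelerate config default
```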

### Training - Pretraining of SCR
```bash
Coming Soon ...
```

### Training - Phase 1
```bash
sh train_phase_1.sh
```
- `data_root`: The data root, e.g. `./data_examples`.
- `output_dir`: The directory where training logs and checkpoints are saved.
- `resolution`: The resolution of the UNet in our diffusion model.
- `style_image_size`: The resolution of the style image; it may differ from `resolution`.
- `content_image_size`: The resolution of the content image; it should be the same as `resolution`.
- `channel_attn`: Whether to use channel attention in the MCA block.
- `train_batch_size`: The batch size used during training.
- `max_train_steps`: The maximum number of training steps.
- `learning_rate`: The learning rate for training.
- `ckpt_interval`: The interval at which checkpoints are saved during training.
- `drop_prob`: The probability of dropping the condition during training, for classifier-free guidance.
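
For reference, here is a sketch of the kind of `accelerate launch` command that `train_phase_1.sh` wraps. The entry-point name `train.py` and all values below are illustrative assumptions; check the script itself for the exact invocation:
```bash
# Illustrative only -- see train_phase_1.sh for the real flags and values.
accelerate launch train.py \
  --data_root ./data_examples \
  --output_dir ./outputs/phase_1 \
  --resolution 96 \
  --style_image_size 96 \
  --content_image_size 96 \
  --channel_attn True \
  --train_batch_size 8 \
  --max_train_steps 440000 \
  --learning_rate 1e-4 \
  --ckpt_interval 40000 \
  --drop_prob 0.1
```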

### Training - Phase 2
Before phase 2 training, put the checkpoint files produced by phase 1 (`unet.pth`, `content_encoder.pth`, and `style_encoder.pth`) into the directory `phase_1_ckpt`; these parameters are resumed during phase 2.
```bash
sh train_phase_2.sh
```
- `phase_2`: Flag enabling phase 2 training.
- `phase_1_ckpt_dir`: The directory containing the checkpoints saved by phase 1 training.
- `scr_ckpt_path`: The checkpoint path of the pre-trained SCR module; you can download it from the 🔥 Model Zoo above.
- `sc_coefficient`: The coefficient of the style contrastive loss used for supervision.
- `num_neg`: The number of negative samples; the default is `16`.
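
Analogously, `train_phase_2.sh` presumably appends the phase 2 flags to the phase 1 arguments; the paths and the `sc_coefficient` value here are illustrative:
```bash
# Illustrative only -- combine with the phase 1 flags shown above.
accelerate launch train.py \
  --phase_2 \
  --phase_1_ckpt_dir ./phase_1_ckpt \
  --scr_ckpt_path ./ckpt/scr.pth \
  --sc_coefficient 0.01 \
  --num_neg 16
```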

## 📺 Sampling
### Step 1 => Prepare the checkpoint
Option (1): Download the checkpoint from [GoogleDrive](https://drive.google.com/drive/folders/12hfuZ9MQvXqcteNuz7JQ2B_mUcTr-5jZ?usp=drive_link) / [BaiduYun:gexg](https://pan.baidu.com/s/19t1B7le8x8L2yFGaOvyyBQ), then put the `ckpt` folder in the root directory; it should contain the files `unet.pth`, `content_encoder.pth`, and `style_encoder.pth`.
Option (2): Put your own retrained checkpoint folder `ckpt` in the root directory, containing the same three files.

### Step 2 => Run the script
**(1) Sampling an image from a content image and a reference image.**
```bash
sh script/sample_content_image.sh
```
- `ckpt_dir`: The directory containing the model checkpoints.
- `content_image_path`: The content/source image path.
- `style_image_path`: The style/reference image path.
- `save_image`: Set `True` to save the outputs as images.
- `save_image_dir`: The directory where images are saved; the outputs include `out_single.png` and `out_with_cs.png`.
- `device`: The sampling device; GPU acceleration is recommended.
- `guidance_scale`: The guidance scale for classifier-free sampling.
- `num_inference_steps`: The number of inference steps for DPM-Solver++.
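
For reference, `script/sample_content_image.sh` likely wraps a command of roughly this shape; the entry-point name `sample.py`, the example paths, and the values are assumptions, so consult the script for the exact invocation:
```bash
# Illustrative only -- see script/sample_content_image.sh for the real command.
python sample.py \
  --ckpt_dir ./ckpt \
  --content_image_path ./figures/content.png \
  --style_image_path ./figures/style.png \
  --save_image True \
  --save_image_dir ./outputs \
  --device cuda:0 \
  --guidance_scale 7.5 \
  --num_inference_steps 20
```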

**(2) Sampling an image from a content character.**
**Note**: You may need a `.ttf` file that covers a large set of Chinese characters; one can be downloaded from [BaiduYun:wrth](https://pan.baidu.com/s/1LhcXG4tPcso9BLaUzU6KtQ).
```bash
sh script/sample_content_character.sh
```
- `character_input`: If set to `True`, a character string is used as the content/source input.
- `content_character`: The content/source character string.
- The remaining parameters are the same as in option (1) above.
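
Under the same assumptions, the character-input variant would look roughly like this; the `--ttf_path` flag is hypothetical, standing in for however the script locates the font file:
```bash
# Illustrative only -- see script/sample_content_character.sh for the real command.
python sample.py \
  --ckpt_dir ./ckpt \
  --character_input True \
  --content_character "龍" \
  --ttf_path ./fonts/chinese.ttf \
  --style_image_path ./figures/style.png \
  --save_image True \
  --save_image_dir ./outputs
```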

## 📱 Run WebUI
### (1) Sampling by FontDiffuser
```bash
gradio gradio_app.py
```
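By default, Gradio serves the app locally at `http://127.0.0.1:7860`; open that URL in a browser to try the demo.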

### (2) Sampling by FontDiffuser and Rendering by InstructPix2Pix
```bash
Coming Soon ...
```

## 🌄 Gallery
### Characters of the hard complexity level
![vis_hard](figures/vis_hard.png)

### Characters of the medium complexity level
![vis_medium](figures/vis_medium.png)

### Characters of the easy complexity level
![vis_easy](figures/vis_easy.png)

### Cross-Lingual Generation (Chinese to Korean)
![vis_korean](figures/vis_korean.png)

## 💙 Acknowledgement
- [diffusers](https://github.com/huggingface/diffusers)

## Copyright
- This repository can only be used for non-commercial research purposes.
- For commercial use, please contact Prof. Lianwen Jin ([email protected]).
- Copyright 2023, [Deep Learning and Vision Computing Lab (DLVC-Lab)](http://www.dlvc-lab.net), South China University of Technology.

## Citation
```
@inproceedings{yang2024fontdiffuser,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}
```

## ⭐ Star Rising
[![Star Rising](https://api.star-history.com/svg?repos=yeungchenwa/FontDiffuser&type=Timeline)](https://star-history.com/#yeungchenwa/FontDiffuser&Timeline)