https://github.com/williamyang1991/DualStyleGAN

[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
https://github.com/williamyang1991/DualStyleGAN

caricatures cvpr2022 face style-transfer stylegan-image-manipulation stylegan2 toonify

Last synced: about 1 year ago
JSON representation

[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

Host: GitHub
URL: https://github.com/williamyang1991/DualStyleGAN
Owner: williamyang1991
License: other
Created: 2022-03-13T05:47:51.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2023-03-06T09:30:54.000Z (over 3 years ago)
Last Synced: 2025-04-08T09:06:01.061Z (about 1 year ago)
Topics: caricatures, cvpr2022, face, style-transfer, stylegan-image-manipulation, stylegan2, toonify
Language: Jupyter Notebook
Homepage:
Size: 25.6 MB
Stars: 1,663
Watchers: 28
Forks: 257
Open Issues: 29
Metadata Files:
- Readme: README.md
- License: LICENSE.md

Awesome Lists containing this project

awesome-image-synthesis - [Pytorch

README

# DualStyleGAN - Official PyTorch Implementation

This repository provides the official PyTorch implementation for the following paper:

**Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer**

[Shuai Yang](https://williamyang1991.github.io/), [Liming Jiang](https://liming-jiang.com/), [Ziwei Liu](https://liuziwei7.github.io/) and [Chen Change Loy](https://www.mmlab-ntu.com/person/ccloy/)

In CVPR 2022.

[**Project Page**](https://www.mmlab-ntu.com/project/dualstylegan/) | [**Paper**](https://arxiv.org/abs/2203.13248) | [**Supplementary Video**](https://www.youtube.com/watch?v=scZTu77jixI)
> **Abstract:** *Recent studies on StyleGAN show high performance on artistic portrait generation by transfer learning with limited data. In this paper, we explore more challenging exemplar-based high-resolution portrait style transfer by introducing a novel DualStyleGAN with flexible control of dual styles of the original face domain and the extended artistic portrait domain. Different from StyleGAN, DualStyleGAN provides a natural way of style transfer by characterizing the content and style of a portrait with an intrinsic style path and a new extrinsic style path, respectively. The delicately designed extrinsic style path enables our model to modulate both the color and complex structural styles hierarchically to precisely pastiche the style example. Furthermore, a novel progressive fine-tuning scheme is introduced to smoothly transform the generative space of the model to the target domain, even with the above modifications on the network architecture. Experiments demonstrate the superiority of DualStyleGAN over state-of-the-art methods in high-quality portrait style transfer and flexible style control.*

**Features**:

**High-Resolution** (1024) | **Training Data-Efficient** (~200 Images) | **Exemplar-Based Color and Structure Transfer**

## Updates

- [02/2023] Add `--wplus` in style_transfer.py to use original w+ pSp encoder rather than z+.
- [09/2022] Pre-trained models in three new [styles](#combine-dualstylegan-with-state-of-the-art-diffusion-model) (feat. StableDiffusion) are released.
- [07/2022] Source code license is updated.
- [03/2022] Paper and supplementary video are released.
- [03/2022] Web demo is created.
- [03/2022] Code is released.
- [03/2022] This website is created.

## Web Demo

Integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/hysts/DualStyleGAN) or [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Gradio-Blocks/DualStyleGAN)

## Installation
**Clone this repo:**
```bash
git clone https://github.com/williamyang1991/DualStyleGAN.git
cd DualStyleGAN
```
**Dependencies:**

All dependencies for defining the environment are provided in `environment/dualstylegan_env.yaml`.
We recommend running this repository using [Anaconda](https://docs.anaconda.com/anaconda/install/):
```bash
conda env create -f ./environment/dualstylegan_env.yaml
```
We use CUDA 10.1 so it will install PyTorch 1.7.1 (corresponding to [Line 22](https://github.com/williamyang1991/DualStyleGAN/blob/main/environment/dualstylegan_env.yaml#L22), [Line 25](https://github.com/williamyang1991/DualStyleGAN/blob/main/environment/dualstylegan_env.yaml#L25), [Line 26](https://github.com/williamyang1991/DualStyleGAN/blob/main/environment/dualstylegan_env.yaml#L26) of `dualstylegan_env.yaml`). Please install PyTorch that matches your own CUDA version following [https://pytorch.org/](https://pytorch.org/).

☞ Install on Windows: [here](https://github.com/williamyang1991/VToonify/issues/50#issuecomment-1443061101) and [here](https://github.com/williamyang1991/VToonify/issues/38#issuecomment-1442146800)

## (1) Dataset Preparation

Cartoon, Caricature and Anime datasets can be downloaded from their official pages.
We also provide the script to build new datasets.

| Dataset | Description |
| :--- | :--- |
| [Cartoon](https://mega.nz/file/HslSXS4a#7UBanJTjJqUl_2Z-JmAsreQYiJUKC-8UlZDR0rUsarw) | 317 cartoon face images from [Toonify](https://github.com/justinpinkney/toonify). |
| Caricature | 199 images from [WebCaricature](https://cs.nju.edu.cn/rl/WebCaricature.htm). Please refer to [dataset preparation](./data_preparation/readme.md#caricature-dataset) for more details. |
| Anime | 174 images from [Danbooru Portraits](https://www.gwern.net/Crops#danbooru2019-portraits). Please refer to [dataset preparation](./data_preparation/readme.md#anime-dataset) for more details. |
| [Fantasy](https://drive.google.com/drive/folders/1YjTuuy43jH4lRZU3tA6ZuJjNSs9a5h8n?usp=sharing) | 137 fantasy face images generated by [StableDiffusion](https://github.com/lowfuel/progrock-stable). |
| [Illustration](https://drive.google.com/drive/folders/1eQTzlX13kUzWKLlAQjaTXQmR7ARpXdIM?usp=sharing) | 156 illustration face images generated by [StableDiffusion](https://github.com/lowfuel/progrock-stable). |
| [Impasto](https://drive.google.com/drive/folders/1VcaMdaqHmJ4BDVK4TxNpumx3C51_2XP_?usp=sharing) | 120 impasto face images generated by [StableDiffusion](https://github.com/lowfuel/progrock-stable). |
| Other styles | Please refer to [dataset preparation](./data_preparation/readme.md#build-your-own-dataset) for the way of building new datasets. |

## (2) Inference for Style Transfer and Artistic Portrait Generation

### Inference Notebook

To help users get started, we provide a Jupyter notebook found in `./notebooks/inference_playground.ipynb` that allows one to visualize the performance of DualStyleGAN.
The notebook will download the necessary pretrained models and run inference on the images found in `./data/`.

If no GPU is available, you may refer to [Inference on CPU](./model/stylegan/op_cpu#readme), and set `device = 'cpu'` in the notebook.

### Pretrained Models

Pretrained models can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1GZQ6Gs5AzJq9lUL-ldIQexi0JYPKNy8b?usp=sharing) or [Baidu Cloud](https://pan.baidu.com/s/1sOpPszHfHSgFsgw47S6aAA ) (access code: cvpr):

| Model | Description |
| :--- | :--- |
| [encoder](https://drive.google.com/file/d/1NgI4mPkboYvYw3MWcdUaQhkr0OWgs9ej/view?usp=sharing) | Pixel2style2pixel encoder that embeds FFHQ images into StyleGAN2 Z+ latent code |
| [encoder_wplus](https://drive.google.com/file/d/1IVMWKfF8Yp41tsoLLZ8edwjVg_tkjmqQ/view?usp=share_link) | Original Pixel2style2pixel encoder that embeds FFHQ images into StyleGAN2 W+ latent code |
| [cartoon](https://drive.google.com/drive/folders/1xPo8PcbMXzcUyvwe5liJrfbA5yx4OF1j?usp=sharing) | DualStyleGAN and sampling models trained on Cartoon dataset, 317 (refined) extrinsic style codes |
| [caricature](https://drive.google.com/drive/folders/1BwLXWkSyWDApblBPvaHKsRCTqnhiHxUZ?usp=sharing) | DualStyleGAN and sampling models trained on Caricature dataset, 199 (refined) extrinsic style codes |
| [anime](https://drive.google.com/drive/folders/1YvFj33Bfum4YuBeqNNCYLfiBrD4tpzg7?usp=sharing) | DualStyleGAN and sampling models trained on Anime dataset, 174 (refined) extrinsic style codes |
| [arcane](https://drive.google.com/drive/folders/1-MYwaEQthhAJ_ScWVb0LOQiVkKeSzpBm?usp=sharing) | DualStyleGAN and sampling models trained on Arcane dataset, 100 extrinsic style codes |
| [comic](https://drive.google.com/drive/folders/1qC2onFGs2R-XCXRQTP_yyNbY1fT0BdZG?usp=sharing) | DualStyleGAN and sampling models trained on Comic dataset, 101 extrinsic style codes |
| [pixar](https://drive.google.com/drive/folders/1ve4P8Yb4EZ9g_sRy_RCw3N74p46tNpeW?usp=sharing) | DualStyleGAN and sampling models trained on Pixar dataset, 122 extrinsic style codes |
| [slamdunk](https://drive.google.com/drive/folders/1X345yn_YbMEHBcj7K91O-oQZ2YjVpAcI?usp=sharing) | DualStyleGAN and sampling models trained on Slamdunk dataset, 120 extrinsic style codes |
| [fantasy](https://drive.google.com/drive/folders/1JmimgKR_Xo-lR8n35e_V9wEicSUaJqMz?usp=sharing) | DualStyleGAN models trained on Fantasy dataset, 137 extrinsic style codes |
| [illustration](https://drive.google.com/drive/folders/1ESQBW5rmXqWss3yTjIUOr8Di7R_er-o8?usp=sharing) | DualStyleGAN models trained on Illustration dataset, 156 extrinsic style codes |
| [impasto](https://drive.google.com/drive/folders/1d0Lb-B7ozphXLywRjVLXQI0PTC5PESfn?usp=sharing) | DualStyleGAN models trained on Impasto dataset, 120 extrinsic style codes |

The saved checkpoints are under the following folder structure:
```
checkpoint
|--encoder.pt % Pixel2style2pixel model
|--encoder_wplus.pt % Pixel2style2pixel model (optional)
|--cartoon
|--generator.pt % DualStyleGAN model
|--sampler.pt % The extrinsic style code sampling model
|--exstyle_code.npy % extrinsic style codes of Cartoon dataset
|--refined_exstyle_code.npy % refined extrinsic style codes of Cartoon dataset
|--caricature
% the same files as in Cartoon
...
```

### Exemplar-Based Style Transfer
Transfer the style of a default Cartoon image onto a default face:
```python
python style_transfer.py
```
The result `cartoon_transfer_53_081680.jpg` is saved in the folder `.\output\`,
where `53` is the id of the style image in the Cartoon dataset, `081680` is the name of the content face image.
An corresponding overview image `cartoon_transfer_53_081680_overview.jpg` is additionally saved to illustrate the input content image, the encoded content image, the style image (* the style image will be shown only if it is in your folder), and the result:

Specify the style image with `--style` and `--style_id` (find the mapping between id and filename [here](./data_preparation/id_filename_list.txt), find the visual mapping between id and the style image [here](./doc_images)). Specify the filename of the saved images with `--name`. Specify the weight to adjust the degree of style with `--weight`.
The following script generates the style transfer results in the teaser of the paper.
```python
python style_transfer.py
python style_transfer.py --style cartoon --style_id 10
python style_transfer.py --style caricature --name caricature_transfer --style_id 0 --weight 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
python style_transfer.py --style caricature --name caricature_transfer --style_id 187 --weight 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
python style_transfer.py --style anime --name anime_transfer --style_id 17 --weight 0 0 0 0 0.75 0.75 0.75 1 1 1 1 1 1 1 1 1 1 1
python style_transfer.py --style anime --name anime_transfer --style_id 48 --weight 0 0 0 0 0.75 0.75 0.75 1 1 1 1 1 1 1 1 1 1 1
```

Specify the content image with `--content`. If the content image is not well aligned with FFHQ, use `--align_face`. For preserving the color style of the content image, use `--preserve_color` or set the last 11 elements of `--weight` to all zeros.
```python
python style_transfer.py --content ./data/content/unsplash-rDEOVtE7vOs.jpg --align_face --preserve_color \
--style arcane --name arcane_transfer --style_id 13 \
--weight 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 1 1 1 1 1 1 1
```

→

Specify `--wplus` to use the original pSp encoder to extract the W+ intrinsic style code, which may better preserve the face features of the content image.

**Remarks**: Our trained pSp encoder on Z+/W+ space cannot perfectly encode the content image. If the style transfer result more consistent with the content image is desired, one may use latent optimization to better fit the content image or using other StyleGAN encoders (as discussed in https://github.com/williamyang1991/DualStyleGAN/issues/11 and https://github.com/williamyang1991/DualStyleGAN/issues/29).

More options can be found via `python style_transfer.py -h`.

### Artistic Portrait Generation
Generate random Cartoon face images (Results are saved in the `./output/` folder):
```python
python generate.py
```

Specify the style type with `--style` and the filename of the saved images with `--name`:
```python
python generate.py --style arcane --name arcane_generate
```

Specify the weight to adjust the degree of style with `--weight`.

Keep the intrinsic style code, extrinsic color code or extrinsic structure code fixed using `--fix_content`, `--fix_color` and `--fix_structure`, respectively.
```python
python generate.py --style caricature --name caricature_generate --weight 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 --fix_content
```

More options can be found via `python generate.py -h`.

## (3) Training DualStyleGAN

Download the supporting models to the `./checkpoint/` folder:

| Model | Description |
| :--- | :--- |
| [stylegan2-ffhq-config-f.pt](https://drive.google.com/file/d/1EM87UquaoQmk17Q8d5kYIAHqu0dkYqdT/view) | StyleGAN model trained on FFHQ taken from [rosinality](https://github.com/rosinality/stylegan2-pytorch). |
| [model_ir_se50.pth](https://drive.google.com/file/d/1KW7bjndL3QG3sxBbZxreGHigcCCpsDgn/view?usp=sharing) | Pretrained IR-SE50 model taken from [TreB1eN](https://github.com/TreB1eN/InsightFace_Pytorch) for ID loss. |

### Facial Destylization

**Step 1: Prepare data.** Prepare the dataset in `./data/DATASET_NAME/images/train/`. First create lmdb datasets:
```python
python ./model/stylegan/prepare_data.py --out LMDB_PATH --n_worker N_WORKER --size SIZE1,SIZE2,SIZE3,... DATASET_PATH
```
For example, download 317 Cartoon images into `./data/cartoon/images/train/` and run
> python ./model/stylegan/prepare_data.py --out ./data/cartoon/lmdb/ --n_worker 4 --size 1024 ./data/cartoon/images/

**Step 2: Fine-tune StyleGAN.** Fine-tune StyleGAN in distributed settings:
```python
python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT finetune_stylegan.py --batch BATCH_SIZE \
--ckpt FFHQ_MODEL_PATH --iter ITERATIONS --style DATASET_NAME --augment LMDB_PATH
```
Take the cartoon dataset for example, run (batch size of 8\*4=32 is recommended)
> python -m torch.distributed.launch --nproc_per_node=8 --master_port=8765 finetune_stylegan.py --iter 600
--batch 4 --ckpt ./checkpoint/stylegan2-ffhq-config-f.pt --style cartoon
--augment ./data/cartoon/lmdb/

The fine-tuned model can be found in `./checkpoint/cartoon/finetune-000600.pt`. Intermediate results are saved in `./log/cartoon/`.

**Step 3: Destylize artistic portraits.**
```python
python destylize.py --model_name FINETUNED_MODEL_NAME --batch BATCH_SIZE --iter ITERATIONS DATASET_NAME
```
Take the cartoon dataset for example, run:
> python destylize.py --model_name finetune-000600.pt --batch 1 --iter 300 cartoon

The intrinsic and extrinsic style codes are saved in `./checkpoint/cartoon/instyle_code.npy` and `./checkpoint/cartoon/exstyle_code.npy`, respectively. Intermediate results are saved in `./log/cartoon/destylization/`.
To speed up destylization, set `--batch` to large value like 16.
For styles severely different from real faces, set `--truncation` to small value like 0.5 to make the results more photo-realistic (it enables DualStyleGAN to learn larger structrue deformations).

### Progressive Fine-Tuning

**Stage 1 & 2: Pretrain DualStyleGAN on FFHQ.**
We provide our pretrained model [generator-pretrain.pt](https://drive.google.com/file/d/1j8sIvQZYW5rZ0v1SDMn2VEJFqfRjMW3f/view?usp=sharing) at [Google Drive](https://drive.google.com/drive/folders/1GZQ6Gs5AzJq9lUL-ldIQexi0JYPKNy8b?usp=sharing) or [Baidu Cloud](https://pan.baidu.com/s/1sOpPszHfHSgFsgw47S6aAA ) (access code: cvpr). This model is obtained by:
> python -m torch.distributed.launch --nproc_per_node=1 --master_port=8765 pretrain_dualstylegan.py --iter 3000
--batch 4 ./data/ffhq/lmdb/

where `./data/ffhq/lmdb/` contains the lmdb data created from the FFHQ dataset via `./model/stylegan/prepare_data.py`.

**Stage 3: Fine-Tune DualStyleGAN on Target Domain.** Fine-tune DualStyleGAN in distributed settings:
```python
python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT finetune_dualstylegan.py --iter ITERATIONS \
--batch BATCH_SIZE --ckpt PRETRAINED_MODEL_PATH --augment DATASET_NAME
```
The loss term weights can be specified by `--style_loss` (λ_FM), `--CX_loss` (λ_CX), `--perc_loss` (λ_perc), `--id_loss` (λ_ID) and `--L2_reg_loss` (λ_reg). λ_ID and λ_reg are suggested to be tuned for each style dataset to achieve ideal performance. More options can be found via `python finetune_dualstylegan.py -h`.

Take the Cartoon dataset as an example, run (multi-GPU enables a large batch size of 8\*4=32 for better performance):
> python -m torch.distributed.launch --nproc_per_node=8 --master_port=8765 finetune_dualstylegan.py --iter 1500 --batch 4 --ckpt ./checkpoint/generator-pretrain.pt
--style_loss 0.25 --CX_loss 0.25 --perc_loss 1 --id_loss 1 --L2_reg_loss 0.015 --augment cartoon

The fine-tuned models can be found in `./checkpoint/cartoon/generator-ITER.pt` where ITER = 001000, 001100, ..., 001500. Intermediate results are saved in `./log/cartoon/`. Large ITER has strong cartoon styles but at the cost of artifacts, and users may select the most balanced one from 1000-1500. We use 1400 for our paper experiments.

### (optional) Latent Optimization and Sampling

**Refine extrinsic style code.** Refine the color and structure styles to better fit the example style images.
```python
python refine_exstyle.py --lr_color COLOR_LEARNING_RATE --lr_structure STRUCTURE_LEARNING_RATE DATASET_NAME
```
By default, the code will load `instyle_code.npy`, `exstyle_code.npy`, and `generator.pt` in `./checkpoint/DATASET_NAME/`. Use `--instyle_path`, `--exstyle_path`, `--ckpt` to specify other saved style codes or models. Take the Cartoon dataset as an example, run:
> python refine_exstyle.py --lr_color 0.1 --lr_structure 0.005 --ckpt ./checkpoint/cartoon/generator-001400.pt cartoon

The refined extrinsic style codes are saved in `./checkpoint/DATASET_NAME/refined_exstyle_code.npy`. `lr_color` and `lr_structure` are suggested to be tuned to better fit the example styles.

**Training sampling network.** Train a sampling network to map unit Gaussian noises to the distribution of extrinsic style codes:
```python
python train_sampler.py DATASET_NAME
```
By default, the code will load `refined_exstyle_code.npy` or `exstyle_code.npy` in `./checkpoint/DATASET_NAME/`. Use `--exstyle_path` to specify other saved extrinsic style codes. The saved model can be found in `./checkpoint/DATASET_NAME/sampler.pt`.

## (4) Results

#### Exemplar-based cartoon style trasnfer

https://user-images.githubusercontent.com/18130694/158047991-77c31137-c077-415e-bae2-865ed3ec021f.mp4

#### Exemplar-based caricature style trasnfer

https://user-images.githubusercontent.com/18130694/158048107-7b0aa439-5e3a-45a9-be0e-91ded50e9136.mp4

#### Exemplar-based anime style trasnfer

https://user-images.githubusercontent.com/18130694/158048114-237b8b81-eff3-4033-89f4-6e8a7bbf67f7.mp4

#### Other styles

#### Combine DualStyleGAN with State-of-the-Art Diffusion model

We use [StableDiffusion](https://github.com/lowfuel/progrock-stable) to generate face images of the specified style of famous artists.
Trained with these images, DualStyleGAN is able to pastiche these famous artists and generates appealing results.

## Citation

If you find this work useful for your research, please consider citing our paper:

```bibtex
@inproceedings{yang2022Pastiche,
title={Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer},
author={Yang, Shuai and Jiang, Liming and Liu, Ziwei and Loy, Chen Change},
booktitle={CVPR},
year={2022}
}
```

## Acknowledgments

The code is mainly developed based on [stylegan2-pytorch](https://github.com/rosinality/stylegan2-pytorch) and [pixel2style2pixel](https://github.com/eladrich/pixel2style2pixel).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/williamyang1991/DualStyleGAN

Awesome Lists containing this project

README