[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
https://rlawjdghek.github.io/StableVITON/


Host: GitHub
URL: https://rlawjdghek.github.io/StableVITON/
Owner: rlawjdghek
Created: 2023-12-02T05:56:58.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2024-10-16T06:55:30.000Z (4 months ago)
Last Synced: 2024-10-18T01:33:53.758Z (4 months ago)
Language: Python
Homepage: https://rlawjdghek.github.io/StableVITON/
Size: 2.35 MB
Stars: 992
Watchers: 51
Forks: 151
Open Issues: 1
Metadata Files:
- Readme: README.md


# [CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

This repository is the official implementation of [StableVITON](https://arxiv.org/abs/2312.01725).

> **StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On**


> [Jeongho Kim](https://scholar.google.co.kr/citations?user=4SCCBFwAAAAJ&hl=ko), [Gyojung Gu](https://www.linkedin.com/in/gyojung-gu-29033118b/), [Minho Park](https://pmh9960.github.io/), [Sunghyun Park](https://psh01087.github.io/), [Jaegul Choo](https://sites.google.com/site/jaegulchoo/)

[[arXiv Paper](https://arxiv.org/abs/2312.01725)] 

[[Project Page](https://rlawjdghek.github.io/StableVITON/)] 

![teaser](assets/teaser.png) 

## TODO List

- [x] ~~Inference code~~

- [x] ~~Release model weights~~

- [x] ~~Training code~~

## Environments

```bash
git clone https://github.com/rlawjdghek/StableVITON
cd StableVITON

conda create --name StableVITON python=3.10 -y
conda activate StableVITON

# install packages
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install pytorch-lightning==1.5.0
pip install einops
pip install opencv-python==4.7.0.72
pip install matplotlib
pip install omegaconf
pip install albumentations
pip install transformers==4.33.2
pip install xformers==0.0.19
pip install triton==2.0.0
pip install open-clip-torch==2.19.0
pip install diffusers==0.20.2
pip install scipy==1.10.1
conda install -c anaconda ipython -y
```
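After installation, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch using only the standard library; the `PINNED` table is copied from the install commands above, and the helper names (`installed_version`, `find_mismatches`) are hypothetical, not part of this repository.

```python
from importlib import metadata

# Pins copied from the install commands above (local "+cu117" tags omitted).
PINNED = {
    "torch": "2.0.0",
    "torchvision": "0.15.1",
    "torchaudio": "2.0.1",
    "pytorch-lightning": "1.5.0",
    "opencv-python": "4.7.0.72",
    "transformers": "4.33.2",
    "xformers": "0.0.19",
    "triton": "2.0.0",
    "open-clip-torch": "2.19.0",
    "diffusers": "0.20.2",
    "scipy": "1.10.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Ignore local version tags such as "+cu117" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found}")
```

Running the script inside the activated `StableVITON` environment prints one line per package that is missing or differs from its pin, and nothing if everything matches.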

## Weights and Data

Our [checkpoint](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) trained on VITON-HD has been released!


You can download the VITON-HD dataset from [here](https://github.com/shadow2496/VITON-HD).


For both training and inference, the following dataset structure is required:

```
train
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
|-- gt_cloth_warped_mask (for ATV loss)

test
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
```
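Before launching training or inference, the layout above can be checked with a few lines of Python. This is a minimal sketch, not a script shipped with the repository; `missing_dirs` and the `data_root` argument are hypothetical names, and the check only looks for the directories, not for their contents.

```python
from pathlib import Path

# Sub-directories listed in the structure above.
REQUIRED = {
    "train": ["image", "image-densepose", "agnostic", "agnostic-mask",
              "cloth", "cloth_mask", "gt_cloth_warped_mask"],
    "test": ["image", "image-densepose", "agnostic", "agnostic-mask",
             "cloth", "cloth_mask"],
}


def missing_dirs(data_root):
    """Return the required sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [
        f"{split}/{name}"
        for split, names in REQUIRED.items()
        for name in names
        if not (root / split / name).is_dir()
    ]
```

An empty list means every expected folder is present; otherwise the returned entries (for example `train/gt_cloth_warped_mask`) name what still needs to be created or downloaded.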

## Preprocessing

The VITON-HD dataset serves as a benchmark and provides an agnostic mask. However, you can attempt virtual try-on on **arbitrary images** by producing the mask with segmentation tools like [SAM](https://github.com/facebookresearch/segment-anything). Please note that for DensePose, you should use the same DensePose model that was used for VITON-HD.

## Inference

```bash
# Replace the <...> placeholders with your own paths.

#### paired
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --save_dir <save directory>

#### unpaired
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --unpair \
 --save_dir <save directory>

#### paired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --repaint \
 --save_dir <save directory>

#### unpaired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --unpair \
 --repaint \
 --save_dir <save directory>
```

You can also preserve the unmasked region with the `--repaint` option.
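Preserving the unmasked region is commonly done by compositing the generated image with the original person image through the inpainting mask. The sketch below illustrates that idea with NumPy; it is an assumption about the general technique, not a copy of what `inference.py` does, and the function name `repaint` and the mask convention (1 inside the regenerated region) are hypothetical.

```python
import numpy as np


def repaint(generated, original, mask):
    """Keep `original` where mask == 0 and `generated` where mask == 1.

    generated, original: float arrays of shape (H, W, 3).
    mask: float array of shape (H, W), 1 inside the inpainted region (assumed convention).
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return m * generated + (1.0 - m) * original
```

With this convention, pixels outside the mask are copied unchanged from the input image, so only the masked region reflects the model output.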

## Training

For VITON training, we started from the Paint-by-Example (PBE) model and increased the number of input channels of the first U-Net block from 9 to 13 (adding a zero conv). Therefore, you should first download the modified checkpoint (named 'VITONHD_PBE_pose.ckpt') from this [link](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) and place it in the './ckpts/' folder.

Additionally, for more refined person texture, we used a VAE fine-tuned on the VITON-HD dataset. You should also download its checkpoint (named 'VITONHD_VAE_finetuning.ckpt') from this [link](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) and place it in the './ckpts/' folder.
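To illustrate the 9-to-13 channel expansion mentioned above: padding a convolution weight with zeros along its input-channel axis leaves the layer's output unchanged at initialization, whatever the new channels contain. The NumPy sketch below shows this under stated assumptions; the helper names, the 3x3 kernel, and the output width of 8 are arbitrary choices for illustration, and the README does not confirm that the released checkpoint was built exactly this way.

```python
import numpy as np


def expand_in_channels(weight, new_in):
    """Pad a conv weight of shape (out, in, kH, kW) with zeros along `in`."""
    out_ch, in_ch, kh, kw = weight.shape
    if new_in < in_ch:
        raise ValueError("new_in must be >= the current number of input channels")
    extra = np.zeros((out_ch, new_in - in_ch, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, extra], axis=1)


def conv_at_origin(weight, patch):
    """Response of the conv at a single location; patch has shape (in, kH, kW)."""
    return np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2]))


rng = np.random.default_rng(0)
w9 = rng.standard_normal((8, 9, 3, 3))   # illustrative 9-channel conv weight
w13 = expand_in_channels(w9, 13)         # 4 extra input channels, all zeros

patch13 = rng.standard_normal((13, 3, 3))  # 9 original channels + 4 new ones
# The extra channels contribute nothing until their weights are trained.
assert np.allclose(conv_at_origin(w13, patch13), conv_at_origin(w9, patch13[:9]))
```

The design benefit is that the expanded network starts from the behaviour of the pretrained model, and the new input channels only gain influence as training updates their weights.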

```bash
### Base model training
CUDA_VISIBLE_DEVICES=3,4 python train.py \
 --config_name VITONHD \
 --transform_size shiftscale3 hflip \
 --transform_color hsv bright_contrast \
 --save_name Base_test

### ATV loss finetuning
# Replace <first stage model path> with the checkpoint from base model training.
CUDA_VISIBLE_DEVICES=5,6 python train.py \
 --config_name VITONHD \
 --transform_size shiftscale3 hflip \
 --transform_color hsv bright_contrast \
 --use_atv_loss \
 --resume_path <first stage model path> \
 --save_name ATVloss_test
```

## Citation

If you find our work useful for your research, please cite us:

```
@inproceedings{kim2024stableviton,
  title={Stableviton: Learning semantic correspondence with latent diffusion model for virtual try-on},
  author={Kim, Jeongho and Gu, Gyojung and Park, Minho and Park, Sunghyun and Choo, Jaegul},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8176--8185},
  year={2024}
}
```

**Acknowledgements** Sunghyun Park is the corresponding author.

## License

Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).