https://rlawjdghek.github.io/StableVITON/
[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
- Host: GitHub
- URL: https://rlawjdghek.github.io/StableVITON/
- Owner: rlawjdghek
- Created: 2023-12-02T05:56:58.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-10-16T06:55:30.000Z (23 days ago)
- Last Synced: 2024-10-18T01:33:53.758Z (22 days ago)
- Language: Python
- Homepage: https://rlawjdghek.github.io/StableVITON/
- Size: 2.35 MB
- Stars: 992
- Watchers: 51
- Forks: 151
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-diffusion-categorized - Project
- awesome-virtual-try-on - Project
README
# [CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
This repository is the official implementation of [StableVITON](https://arxiv.org/abs/2312.01725).

> **StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On**
> [Jeongho Kim](https://scholar.google.co.kr/citations?user=4SCCBFwAAAAJ&hl=ko), [Gyojung Gu](https://www.linkedin.com/in/gyojung-gu-29033118b/), [Minho Park](https://pmh9960.github.io/), [Sunghyun Park](https://psh01087.github.io/), [Jaegul Choo](https://sites.google.com/site/jaegulchoo/)

[[arXiv Paper](https://arxiv.org/abs/2312.01725)]
[[Project Page](https://rlawjdghek.github.io/StableVITON/)]

![teaser](assets/teaser.png)
## TODO List
- [x] ~~Inference code~~
- [x] ~~Release model weights~~
- [x] ~~Training code~~

## Environments
```bash
git clone https://github.com/rlawjdghek/StableVITON
cd StableVITON

conda create --name StableVITON python=3.10 -y
conda activate StableVITON

# install packages
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install pytorch-lightning==1.5.0
pip install einops
pip install opencv-python==4.7.0.72
pip install matplotlib
pip install omegaconf
pip install albumentations
pip install transformers==4.33.2
pip install xformers==0.0.19
pip install triton==2.0.0
pip install open-clip-torch==2.19.0
pip install diffusers==0.20.2
pip install scipy==1.10.1
conda install -c anaconda ipython -y
```

## Weights and Data
Our [checkpoint](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) trained on VITONHD has been released!
You can download the VITON-HD dataset from [here](https://github.com/shadow2496/VITON-HD).
For both training and inference, the following dataset structure is required:

```
train
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
|-- gt_cloth_warped_mask (for ATV loss)

test
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
```

## Preprocessing
The VITON-HD dataset serves as a benchmark and provides an agnostic mask. However, you can attempt virtual try-on on **arbitrary images** using segmentation tools like [SAM](https://github.com/facebookresearch/segment-anything). Note that the DensePose maps should be produced with the same DensePose model that was used for VITON-HD.

## Inference
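Because both training and inference expect the folder layout listed under Weights and Data, a quick pre-flight check can catch a missing folder before a run is launched. The sketch below is not part of the repository; the function name `missing_dirs` and its `data_root` argument are hypothetical, and only the folder names are taken from the listing above.

```python
from pathlib import Path

# Sub-folders taken from the dataset structure listed in this README.
TEST_DIRS = ["image", "image-densepose", "agnostic", "agnostic-mask", "cloth", "cloth_mask"]
TRAIN_DIRS = TEST_DIRS + ["gt_cloth_warped_mask"]  # extra folder, needed for the ATV loss


def missing_dirs(data_root):
    """Return the expected 'split/folder' entries that do not exist under data_root.

    Hypothetical helper: the repository may perform its own checks.
    """
    root = Path(data_root)
    expected = [("train", name) for name in TRAIN_DIRS] + [("test", name) for name in TEST_DIRS]
    return [f"{split}/{name}" for split, name in expected if not (root / split / name).is_dir()]
```

Calling `missing_dirs("<your dataset root>")` returns an empty list when every folder is present, and otherwise names the missing ones.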
```bash
#### paired
CUDA_VISIBLE_DEVICES=4 python inference.py \
--config_path ./configs/VITONHD.yaml \
--batch_size 4 \
--model_load_path <model weight path> \
--save_dir <save directory>

#### unpaired
CUDA_VISIBLE_DEVICES=4 python inference.py \
--config_path ./configs/VITONHD.yaml \
--batch_size 4 \
--model_load_path <model weight path> \
--unpair \
--save_dir <save directory>

#### paired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
--config_path ./configs/VITONHD.yaml \
--batch_size 4 \
--model_load_path <model weight path> \
--repaint \
--save_dir <save directory>

#### unpaired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
--config_path ./configs/VITONHD.yaml \
--batch_size 4 \
--model_load_path <model weight path> \
--unpair \
--repaint \
--save_dir <save directory>
```

You can also preserve the unmasked region with the `--repaint` option.
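This README does not describe how `--repaint` is implemented. A common way to preserve an unmasked region is to composite the generated image with the original one using the mask, and the sketch below illustrates that idea only. It is not the repository's code; the function name `repaint` and the mask convention (1 inside the try-on region, 0 elsewhere) are assumptions.

```python
import numpy as np


def repaint(original, generated, mask):
    """Keep `original` outside the mask and `generated` inside it.

    original, generated: float arrays of shape (H, W, 3)
    mask: array of shape (H, W); assumed 1 in the try-on region, 0 elsewhere
    """
    m = mask.astype(original.dtype)[..., None]  # add a channel axis for broadcasting
    return m * generated + (1.0 - m) * original
```

With a binary mask, every pixel outside the masked region is copied unchanged from the original image, so only the clothing area is affected by the model output.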
## Training
For VITON training, we increased the input of the first U-Net block from 9 to 13 channels (adding a zero conv), starting from the Paint-by-Example (PBE) model. You should therefore first download the modified checkpoint (named 'VITONHD_PBE_pose.ckpt') from the [Link](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) and place it in the './ckpts/' folder.

Additionally, for more refined person texture, we used a VAE fine-tuned on the VITONHD dataset. You should also download this checkpoint (named 'VITONHD_VAE_finetuning.ckpt') from the [Link](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) and place it in the './ckpts/' folder.
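The 9-to-13 channel change is only summarized as "add zero conv" here. One plausible reading, assumed in this sketch, is that the weights for the four new input channels are zero-initialized, so the expanded layer initially produces the same output as the pretrained one. The helper `expand_in_channels` and the tensor sizes are illustrative and not taken from the repository; the downloadable checkpoint already contains the modified layer.

```python
import numpy as np


def expand_in_channels(weight, new_in):
    """Zero-pad a conv weight of shape (out, in, kH, kW) up to `new_in` input channels."""
    out_ch, old_in, kh, kw = weight.shape
    if new_in < old_in:
        raise ValueError("new_in must be >= the current number of input channels")
    expanded = np.zeros((out_ch, new_in, kh, kw), dtype=weight.dtype)
    expanded[:, :old_in] = weight  # keep the pretrained weights untouched
    return expanded


rng = np.random.default_rng(0)
w9 = rng.standard_normal((8, 9, 3, 3))  # stand-in for pretrained weights (sizes are illustrative)
w13 = expand_in_channels(w9, 13)

# Compare the layer response at one spatial location (a single 3x3 input patch).
patch9 = rng.standard_normal((9, 3, 3))
extra = rng.standard_normal((4, 3, 3))  # values fed to the four new channels
patch13 = np.concatenate([patch9, extra], axis=0)

y9 = np.einsum("ocij,cij->o", w9, patch9)
y13 = np.einsum("ocij,cij->o", w13, patch13)
assert np.allclose(y9, y13)  # new channels contribute nothing at initialization
```

Under this assumption, training starts from the behaviour of the pretrained model, and the new channels gain influence only as their weights are updated.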
```bash
### Base model training
CUDA_VISIBLE_DEVICES=3,4 python train.py \
--config_name VITONHD \
--transform_size shiftscale3 hflip \
--transform_color hsv bright_contrast \
--save_name Base_test

### ATV loss finetuning
CUDA_VISIBLE_DEVICES=5,6 python train.py \
--config_name VITONHD \
--transform_size shiftscale3 hflip \
--transform_color hsv bright_contrast \
--use_atv_loss \
--resume_path <first stage model path> \
--save_name ATVloss_test
```

## Citation
If you find our work useful for your research, please cite us:
```
@inproceedings{kim2024stableviton,
title={Stableviton: Learning semantic correspondence with latent diffusion model for virtual try-on},
author={Kim, Jeongho and Gu, Guojung and Park, Minho and Park, Sunghyun and Choo, Jaegul},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={8176--8185},
year={2024}
}
```

**Acknowledgements** Sunghyun Park is the corresponding author.
## License
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).