{"id":13487800,"url":"https://github.com/zengjianhao/cat-dm","last_synced_at":"2025-03-27T23:31:41.578Z","repository":{"id":196795387,"uuid":"697154596","full_name":"zengjianhao/CAT-DM","owner":"zengjianhao","description":"CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model","archived":false,"fork":false,"pushed_at":"2024-09-23T02:42:38.000Z","size":1039,"stargazers_count":113,"open_issues_count":12,"forks_count":10,"subscribers_count":20,"default_branch":"main","last_synced_at":"2024-10-30T23:35:45.172Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://zengjianhao.github.io/CAT-DM","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zengjianhao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-27T07:00:37.000Z","updated_at":"2024-10-23T19:40:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"7310ac35-09f0-470f-91f4-38910ef9fd0a","html_url":"https://github.com/zengjianhao/CAT-DM","commit_stats":null,"previous_names":["zengjianhao/fc-vton","zengjianhao/cat-dm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zengjianhao%2FCAT-DM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zengjianhao%2FCAT-DM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zengjianhao%2FCAT-DM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zengjianhao%2FCAT-DM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/zengjianhao","download_url":"https://codeload.github.com/zengjianhao/CAT-DM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944020,"owners_count":20697945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:04.110Z","updated_at":"2025-03-27T23:31:41.052Z","avatar_url":"https://github.com/zengjianhao.png","language":"Python","funding_links":[],"categories":["Personalized Restoration"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003eCAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model\u003c/h1\u003e\n\n\u003cdiv\u003e\n     \u003ca href=\"https://zengjianhao.github.io/\" target=\"_blank\"\u003eJianhao Zeng\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\n     \u003ca href=\"http://seea.tju.edu.cn/info/1014/1460.htm\" target=\"_blank\"\u003eDan Song\u003c/a\u003e\u003csup\u003e1,*\u003c/sup\u003e,\n     \u003ca href=\"https://seea.tju.edu.cn/info/1014/1451.htm\" target=\"_blank\"\u003eWeizhi Nie\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\n     \u003ca href=\"https://seea.tju.edu.cn/info/1014/3931.htm\" target=\"_blank\"\u003eHongshuo Tian\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\n\u003c/div\u003e\n\u003cdiv\u003e\n     \u003ca href=\"https://tongttwang.github.io/\" target=\"_blank\"\u003eTongtong Wang\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e,\n     \u003ca href=\"https://liuanantju.github.io/\" target=\"_blank\"\u003eAnan 
Liu\u003c/a\u003e\u003csup\u003e1,*\u003c/sup\u003e\n\u003c/div\u003e\n\n\u003cdiv\u003e\n    \u003csup\u003e1\u003c/sup\u003eTianjin University \u0026emsp; \u003csup\u003e2\u003c/sup\u003eTencent LightSpeed Studio\n\u003c/div\u003e\n\n[[Paper]](https://arxiv.org/abs/2311.18405) [[Project]](https://zengjianhao.github.io/CAT-DM)\n\n\u003cimg src=\"./assets/CAT-DM.png\" style=\"width:20%;\"\u003e\n\n\u003c/div\u003e\n\n\n\n## Abstract\n\n\u003e Image-based virtual try-on enables users to virtually try on different garments by altering original clothes in their photographs. Generative Adversarial Networks (GANs) dominate the research field in image-based virtual try-on, but have not resolved problems such as unnatural deformation of garments and the blurry generation quality. Recently, diffusion models have emerged with surprising performance across various image generation tasks. While the generative quality of diffusion models is impressive, achieving controllability poses a significant challenge when applying it to virtual try-on tasks and multiple denoising iterations limit its potential for real-time applications. In this paper, we propose Controllable Accelerated virtual Try-on with Diffusion Model called CAT-DM. To enhance the controllability, a basic diffusion-based virtual try-on network is designed, which utilizes ControlNet to introduce additional control conditions and improves the feature extraction of garment images. In terms of acceleration, CAT-DM initiates a reverse denoising process with an implicit distribution generated by a pre-trained GAN-based model. Compared with previous try-on methods based on diffusion models, CAT-DM not only retains the pattern and texture details of the in-shop garment but also reduces the sampling steps without compromising generation quality. 
Extensive experiments demonstrate the superiority of CAT-DM against both GAN-based and diffusion-based methods in producing more realistic images and accurately reproducing garment patterns.\n\n## Hardware Requirement\n\nOur experiments were conducted on two NVIDIA GeForce RTX 4090 graphics cards, each with 24 GB of video memory. Please note that our model cannot be trained on graphics cards with less than 24 GB of video memory.\n\n## Environment Requirement\n\n1.   Clone the repository:\n\n```bash\ngit clone https://github.com/zengjianhao/CAT-DM\n```\n\n2.   Create and activate a suitable `conda` environment named `CAT-DM`:\n\n```bash\ncd CAT-DM\nconda env create -f environment.yaml\nconda activate CAT-DM\n```\n\n-   To change the name of the environment, modify the `name` in both `environment.yaml` and `setup.py`.\n-   Make sure that `conda` is installed on your computer.\n-   If there is a network error, try updating the environment using `conda env update -f environment.yaml`.\n\n3.   Install xFormers:\n\n```bash\ngit clone https://github.com/facebookresearch/xformers.git\ncd xformers\ngit submodule update --init --recursive\npip install -r requirements.txt\npip install -U xformers\ncd ..\nrm -rf xformers\n```\n\n4.   Open `src/taming-transformers/taming/data/utils.py`, delete `from torch._six import string_classes`, and change `elif isinstance(elem, string_classes):` to `elif isinstance(elem, str):`\n\n## Dataset Preparation\n\n### VITON-HD\n\n1.  Download the [VITON-HD](https://github.com/shadow2496/VITON-HD) dataset\n2.  Create a folder `datasets`\n3.  Put the VITON-HD dataset into this folder and rename it to `vitonhd`\n4.  
Generate the mask images\n\n```bash\n# Generate the train dataset mask images\npython tools/mask_vitonhd.py datasets/vitonhd/train datasets/vitonhd/train/mask\n# Generate the test dataset mask images\npython tools/mask_vitonhd.py datasets/vitonhd/test datasets/vitonhd/test/mask\n```\n\n### DressCode\n\n1. Download the [DressCode](https://github.com/aimagelab/dress-code) dataset\n2. Create a folder `datasets`\n3. Put the DressCode dataset into this folder and rename it to `dresscode`\n4. Generate the mask images and the agnostic images\n\n```bash\n# Generate the dresses dataset mask images and the agnostic images\npython tools/mask_dresscode.py datasets/dresscode/dresses datasets/dresscode/dresses/mask\n# Generate the lower_body dataset mask images and the agnostic images\npython tools/mask_dresscode.py datasets/dresscode/lower_body datasets/dresscode/lower_body/mask\n# Generate the upper_body dataset mask images and the agnostic images\npython tools/mask_dresscode.py datasets/dresscode/upper_body datasets/dresscode/upper_body/mask\n```\n\n### Details\n`datasets` folder should be as follows:\n\n```\ndatasets\n├── vitonhd\n│   ├── test\n│   │   ├── agnostic-mask\n│   │   ├── mask\n│   │   ├── cloth\n│   │   ├── image\n│   │   ├── image-densepose\n│   │   ├── ...\n│   ├── test_pairs.txt\n│   ├── train\n│   │   ├── agnostic-mask\n│   │   ├── mask\n│   │   ├── cloth\n│   │   ├── image\n│   │   ├── image-densepose\n│   │   ├── ...\n│   └── train_pairs.txt\n├── dresscode\n│   ├── dresses\n│   │   ├── dense\n│   │   ├── images\n│   │   ├── mask\n│   │   ├── ...\n│   ├── lower_body\n│   │   ├── dense\n│   │   ├── images\n│   │   ├── mask\n│   │   ├── ...\n│   ├── upper_body\n│   │   ├── dense\n│   │   ├── images\n│   │   ├── mask\n│   │   ├── ...\n│   ├── test_pairs_paired.txt\n│   ├── test_pairs_unpaired.txt\n│   ├── train_pairs.txt\n│   └── ...\n```\nPS: When we conducted the experiment, VITON-HD did not release the `agnostic-mask`. 
We used our own `mask` implementation, so if you use VITON-HD's `agnostic-mask`, the generated results may differ.\n\n\n## Required Model\n\n1. Download the [Paint-by-Example](https://drive.google.com/file/d/15QzaTWsvZonJcXsNv-ilMRCYaQLhzR_i/view) model\n2. Create a folder `checkpoints`\n3. Put the Paint-by-Example model into this folder and rename it to `pbe.ckpt`\n4. Make the ControlNet model:\n\n- VITON-HD:\n```bash\npython tools/add_control.py checkpoints/pbe.ckpt checkpoints/pbe_dim6.ckpt configs/train_vitonhd.yaml\n```\n\n- DressCode:\n```bash\npython tools/add_control.py checkpoints/pbe.ckpt checkpoints/pbe_dim5.ckpt configs/train_dresscode.yaml\n```\n\n5.   The `checkpoints` folder should look as follows:\n\n```\ncheckpoints\n├── pbe.ckpt\n├── pbe_dim5.ckpt\n└── pbe_dim6.ckpt\n```\n\n\n## Training\n\n### VITON-HD\n\n```bash\nbash scripts/train_vitonhd.sh\n```\n\n### DressCode\n\n```bash\nbash scripts/train_dresscode.sh\n```\n\n\n## Testing\n\n### VITON-HD\n\n1. Download the [checkpoint](https://huggingface.co/JianhaoZeng/CAT-DM/tree/main) for the VITON-HD dataset and put it into the `checkpoints` folder.\n\n2. Directly generate the try-on results:\n\n```bash\nbash scripts/test_vitonhd.sh\n```\n\n3. Poisson Blending\n\n```bash\npython tools/poisson_vitonhd.py\n```\n\n### DressCode\n\n1. Download the [checkpoint](https://huggingface.co/JianhaoZeng/CAT-DM/tree/main) for the DressCode dataset and put it into the `checkpoints` folder.\n\n2. Directly generate the try-on results:\n\n```bash\nbash scripts/test_dresscode.sh\n```\n\n3. 
Poisson Blending\n\n```bash\npython tools/poisson_dresscode.py\n```\n\n## Evaluation\n\n- FID: https://github.com/mseitzer/pytorch-fid\n\n- KID: https://github.com/toshas/torch-fidelity\n\n- SSIM: https://lightning.ai/docs/torchmetrics/stable/image/structural_similarity.html\n\n- LPIPS: https://github.com/richzhang/PerceptualSimilarity\n\n## Citing\n\n```\n@inproceedings{zeng2024cat,\n  title={CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model},\n  author={Zeng, Jianhao and Song, Dan and Nie, Weizhi and Tian, Hongshuo and Wang, Tongtong and Liu, An-An},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={8372--8382},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzengjianhao%2Fcat-dm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzengjianhao%2Fcat-dm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzengjianhao%2Fcat-dm/lists"}