{"id":13488661,"url":"https://github.com/AlonzoLeeeooo/LCDG","last_synced_at":"2025-03-28T01:37:05.574Z","repository":{"id":169261416,"uuid":"645147440","full_name":"AlonzoLeeeooo/LCDG","owner":"AlonzoLeeeooo","description":"The official code implementation of \"LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis\".","archived":false,"fork":false,"pushed_at":"2024-07-10T03:05:22.000Z","size":119290,"stargazers_count":29,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-08-01T18:39:02.461Z","etag":null,"topics":["diffusion-models","image-generation","text-to-image-generation"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2305.11520","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlonzoLeeeooo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-25T03:02:19.000Z","updated_at":"2024-07-11T09:16:53.000Z","dependencies_parsed_at":"2024-07-09T13:40:21.592Z","dependency_job_id":null,"html_url":"https://github.com/AlonzoLeeeooo/LCDG","commit_stats":null,"previous_names":["alonzoleeeooo/lcdg"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlonzoLeeeooo%2FLCDG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlonzoLeeeooo%2FLCDG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlonzoLeeeooo%2FLCDG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlonzoLeeeooo
%2FLCDG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlonzoLeeeooo","download_url":"https://codeload.github.com/AlonzoLeeeooo/LCDG/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222333976,"owners_count":16968058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","image-generation","text-to-image-generation"],"created_at":"2024-07-31T18:01:19.683Z","updated_at":"2025-03-28T01:37:05.567Z","avatar_url":"https://github.com/AlonzoLeeeooo.png","language":"Python","funding_links":[],"categories":["Additional conditions"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis\n\nChang Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu\n\n[[`Paper`]](https://arxiv.org/pdf/2305.11520) / [[`Project`]](https://alonzoleeeooo.github.io/LCDG/) / [[`Huggingface`]](https://huggingface.co/AlonzoLeeeooo/LaCon) / [[`ModelScope`]](https://modelscope.cn/models/AlonzoLeeeoooo/LaCon) / [`Demo`]\n\u003c/div\u003e\n\n\u003c!-- omit in toc --\u003e\n# Table of Contents\n- [\u003cu\u003e1. News\u003c/u\u003e](#news)\n- [\u003cu\u003e2. To-Do Lists\u003c/u\u003e](#to-do-lists)\n- [\u003cu\u003e3. Overview of LaCon\u003c/u\u003e](#overview-of-lacon)\n- [\u003cu\u003e4. Code Structure\u003c/u\u003e](#code-structure)\n- [\u003cu\u003e5. Prerequisites\u003c/u\u003e](#prerequisites)\n- [\u003cu\u003e6. Training of Condition Aligner\u003c/u\u003e](#training-of-condition-aligner)\n- [\u003cu\u003e7. 
Sampling with Condition Aligner\u003c/u\u003e](#sampling-with-condition-aligner)\n- [\u003cu\u003e8. Evaluation\u003c/u\u003e](#evaluation)\n- [\u003cu\u003e9. Results\u003c/u\u003e](#results)\n- [\u003cu\u003e10. Citation\u003c/u\u003e](#citation)\n- [\u003cu\u003e11. Stars, Forked, and Star History\u003c/u\u003e](#stars-forked-and-star-history)\n\nIf you have any questions about this work, please feel free to [start a new issue](https://github.com/AlonzoLeeeooo/LCDG/issues/new) or [propose a PR](https://github.com/AlonzoLeeeooo/LCDG/pulls).\n\n\u003c!-- omit in toc --\u003e\n# News\n- [Jun. 12th] We have updated the training and sampling code of LaCon. Pre-trained model weights are currently available at our [Huggingface repo](https://huggingface.co/AlonzoLeeeooo/LaCon/tree/main) and [ModelScope repo](https://modelscope.cn/models/AlonzoLeeeoooo/LaCon).\n\n\u003c!-- omit in toc --\u003e\n# To-Do Lists\n  - [x] Upload a newer version of paper to arXiv\n  - [x] Update the codebase\n  - [x] Update the repo document\n  - [x] Upload the pre-trained model weights of LaCon based on Celeb and Stable Diffusion v1.4\n  - [ ] Update the pre-trained model weights of LaCon based on Stable Diffusion v2.1\n  - [ ] Update implementation for local Gradio demo\n  - [ ] Update online HuggingFace demo\n\n\u003c!-- omit in toc --\u003e\n# Overview of LaCon\n![teasor](github-materials/teasor.png)\n\u003e Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability for the generation process, existing studies, termed as early-constraint methods in this paper, leverage extra conditions and incorporate them into pre-trained diffusion models. Particularly, some of them adopt condition-specific modules to handle conditions separately, where they struggle to generalize across other conditions. 
Although follow-up studies present unified solutions to solve the generalization problem, they also require extra resources to implement, e.g., additional inputs or parameter optimization, where more flexible and efficient solutions are expected to perform steerable guided image synthesis. In this paper, we present an alternative paradigm, namely Late-Constraint Diffusion (LaCon), to simultaneously integrate various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of diffusion models, and utilizes the alignment to incorporate the target condition, guiding the sampling process to produce tailored results. Experimental results on COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the functionalities of different components in LaCon, and illustrate its great potential to serve as an efficient solution to offer flexible controllability for diffusion models.\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\n\u003c!-- omit in toc --\u003e\n# Code Structure\nThis GitHub repo is constructed following the code structure below:\n```\nLaCon/\n└── condition_aligner_src                  \u003c----- Source code of LaCon\n    ├── __init__.py\n    ├── condition_aligner_dataset.py       \u003c----- Dataset\n    ├── condition_aligner_model.py         \u003c----- Model\n    └── condition_aligner_runner.py        \u003c----- Runner (train and inference)\n├── configs                                \u003c----- Configuration files\n├── data-preprocessing                     \u003c----- Code of data pre-processing\n├── evaluation-metrics                     \u003c----- Code of evaluation metrics\n├── github-materials\n├── ldm                                    \u003c----- Source code of LDM (Stable 
Diffusion)\n├── taming                                 \u003c----- Source code of `taming` package\n├── tools                                  \u003c----- Code of toolkits to assist data pre-processing\n├── README.md\n├── condition-aligner-inference.py         \u003c----- Script to reconstruct conditions with the condition aligner\n├── condition-aligner-train.py             \u003c----- Script to train condition aligner\n├── generate-batch-image.py                \u003c----- Script to generate results in batch\n├── generate-single-image.py               \u003c----- Script to generate a single result\n└── install.sh                             \u003c----- Bash script to install the virtual environment\n```\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\u003c!-- omit in toc --\u003e\n# Prerequisites\n1. To install the virtual environment of LaCon, you can execute the following command lines:\n```bash\nconda create -n lacon\nconda activate lacon\npip install torch==2.0.0 torchvision==0.15.1\nbash install.sh\n```\n\n2. To prepare the pre-trained model weights of different components in `Stable Diffusion` as well as our condition aligner, please download the model weights from our [Huggingface repo](https://huggingface.co/AlonzoLeeeooo/LaCon) and put them in `./checkpoints`. Once the weights are downloaded, modify the configuration files in `./configs`. Check [this document](configs/README.md) for more details of modifying configuration files.\n**We strongly recommend you to download [the whole Huggingface repo of CLIP](https://huggingface.co/openai/clip-vit-large-patch14) locally, in order to avoid the network issue of Huggingface.**\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\n\u003c!-- omit in toc --\u003e\n# Training of Condition Aligner\n1. 
We use a subset of the [COCO](https://cocodataset.org/) training set with approximately 10,000 data samples. To train the condition aligner, follow the instructions in [this document](data-preprocessing/README.md) and organize the data in the following structure:\n```bash\ndata/\n└── bdcn-edges\n    ├── 1.png\n    ├── 2.png\n    ├── ...\n└── saliency-masks\n    ├── 1.png\n    ├── 2.png\n    ├── ...\n└── color-strokes\n    ├── 1.png\n    ├── 2.png\n    ├── ...\n└── coco-captions\n    ├── 1.txt\n    ├── 2.txt\n    ├── ...\n└── images\n```\n\n\n2. Once the training data is ready, modify the configuration files following [this document](configs/README.md).\n3. Start training by executing the following command line:\n```bash\npython condition-aligner-train.py -b CONFIG_PATH -l OUTPUT_PATH\n```\nYou can refer to this example command line:\n```bash\npython condition-aligner-train.py -b configs/sd-edge.yaml -l outputs/training/sd-edge\n```\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\n\u003c!-- omit in toc --\u003e\n# Sampling with Condition Aligner\nExecute the following command line to generate an image with the trained condition aligner:\n```bash\npython generate-single-image.py --cond_type COND_TYPE --indir CONDITION_PATH --resume CONDITION_ALIGNER_PATH --caption TEXT_PROMPT --cond_scale CONTROLLING_SCALE --unconditional_guidance_scale CLASSIFIER_FREE_GUIDANCE_SCALE  --outdir OUTPUT_PATH -b CONFIG_PATH --seed SEED --truncation_steps TRUNCATION_STEPS --use_neg_prompt\n```\nYou can refer to this example command line:\n```bash\npython generate-single-image.py --cond_type mask --indir examples/horse.png --resume checkpoints/sdv14_mask.pth --caption \"a horse standing in the moon surface\" --cond_scale 2.0 --unconditional_guidance_scale 6.0  --outdir outputs/ -b configs/sd-mask.yaml --seed 23 --truncation_steps 600 --use_neg_prompt\n```\nWe suggest the 
following settings to achieve the optimal performance for various conditions:\n\n|Condition|Setting|Model Weight|Controlling Scale|Truncation Steps|\n|---|---|---|---|---|\n|Canny Edge|Unconditional Generation|`sd_celeb_edge.pth`|2.0|500|\n|HED Edge|Unconditional Generation|`sd_celeb_edge.pth`|2.0|500|\n|User Sketch|Unconditional Generation|`sd_celeb_edge.pth`|2.0|600|\n|Color Stroke|Unconditional Generation|`sd_celeb_color.pth`|2.0|600|\n|Image Palette|Unconditional Generation|`sd_celeb_color.pth`|2.0|800|\n|Canny Edge|T2I Generation|`sdv14_edge.pth`|2.0|500|\n|HED Edge|T2I Generation|`sdv14_edge.pth`|2.5|500|\n|User Sketch|T2I Generation|`sdv14_edge.pth`|2.0|600|\n|Color Stroke|T2I Generation|`sdv14_color.pth`|2.0|600|\n|Image Palette|T2I Generation|`sdv14_color.pth`|2.0|800|\n|Saliency Mask|T2I Generation|`sdv14_mask.pth`|2.0|600|\n|User Scribble|T2I Generation|`sdv14_mask.pth`|2.0|700|\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\n\u003c!-- omit in toc --\u003e\n# Evaluation\nPrepare the test set following the data structure below:\n```bash\ndata/\n└── bdcn-edges\n    ├── 1.png\n    ├── 2.png\n    ├── ...\n└── saliency-masks\n    ├── 1.png\n    ├── 2.png\n    ├── ...\n└── color-strokes\n    ├── 1.png\n    ├── 2.png\n    ├── ...\n└── image-palette\n    ├── 1.png\n    ├── 2.png\n    ├── ...\n└── coco-captions\n    ├── 1.txt\n    ├── 2.txt\n    ├── ...\n└── images\n```\n\nExecute the following command line to test all data samples in the test set:\n```bash\npython generate-batch-image.py -b CONFIG_PATH --indir DATA_FILELIST_PATH --text CAPTION_PATH --target_cond CONDITION_PATH --resume CONDITION_ALIGNER_PATH --cond_scale CONTROLLING_SCALE --truncation_steps TRUNCATION_STEPS\n```\nYou can refer to this example command line:\n```bash\npython generate-batch-image.py -b configs/sd-mask.yaml --indir data/coco2017val/data_flist.txt --text data/coco2017val/coco-captions --target_cond 
data/coco2017val/saliency-masks --resume checkpoints/sdv14_mask.pth --cond_scale 2.0 --truncation_steps 600\n```\nTo compute evaluation metrics (e.g., FID and CLIP scores), please refer to [this document](evaluation-metrics/README.md) for more details. We report the performance of LaCon on [COCO 2017 validation set](https://cocodataset.org/#download) in the following table:\n|Condition|Model Weight|FID|CLIP Score|\n|---|---|---|---|\n|HED Edge|`sdv14_edge.pth`|21.02|0.2590|\n|Color Stroke|`sdv14_color.pth`|20.27|0.2589|\n|Image Palette|`sdv14_color.pth`|20.61|0.2580|\n|Saliency Mask|`sdv14_mask.pth`|20.94|0.2617|\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\n\u003c!-- omit in toc --\u003e\n# Results\n\u003cdetails\u003e \u003csummary\u003e We demonstrate results generated by LaCon under various conditions in the following figures. \u003c/summary\u003e\n\n\u003cdiv align=\"center\"\u003e\nCanny Edge\n\u003c/div\u003e\n\n![canny-edge](github-materials/canny-edge.png)\n\n\n\u003cdiv align=\"center\"\u003e\nHED Edge\n\u003c/div\u003e\n\n![hed-edge](github-materials/hed-edge.png)\n\n\n\u003cdiv align=\"center\"\u003e\nUser Sketch\n\u003c/div\u003e\n\n![user-sketch](github-materials/user-sketch.png)\n\n\n\u003cdiv align=\"center\"\u003e\nColor Stroke\n\u003c/div\u003e\n\n![Color Stroke](github-materials/color-stroke.png)\n\n\u003cdiv align=\"center\"\u003e\nImage Palette\n\u003c/div\u003e\n\n![image-palette](github-materials/image-palette.png)\n\n\u003cdiv align=\"center\"\u003e\nMask\n\u003c/div\u003e\n\n![mask](github-materials/mask.png)\n\n\u003c/details\u003e\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\n\u003c!-- omit in toc --\u003e\n# Citation\nIf you find our paper helpful to your work, please cite our paper with the following BibTeX reference:\n```bibtex\n@misc{liu-etal-2024-lacon,\n      
title={{LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis}},\n      author={Chang Liu and Rui Li and Kaidong Zhang and Xin Luo and Dong Liu},\n      year={2024},\n      eprint={2305.11520},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n\n\n\u003c!-- omit in toc --\u003e\n# Stars, Forked, and Star History\n[![Stargazers repo roster for @AlonzoLeeeooo/LCDG](https://reporoster.com/stars/dark/AlonzoLeeeooo/LCDG)](https://github.com/AlonzoLeeeooo/LCDG/stargazers)\n\n[![Forkers repo roster for @AlonzoLeeeooo/LCDG](https://reporoster.com/forks/dark/AlonzoLeeeooo/LCDG)](https://github.com/AlonzoLeeeooo/LCDG/network/members)\n\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://api.star-history.com/svg?repos=AlonzoLeeeooo/LCDG\u0026type=Date\" target=\"_blank\"\u003e\n        \u003cimg width=\"500\" src=\"https://api.star-history.com/svg?repos=AlonzoLeeeooo/LCDG\u0026type=Date\" alt=\"Star History Chart\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n[\u003cu\u003e\u003csmall\u003e\u003c🎯Back to Table of Contents\u003e\u003c/small\u003e\u003c/u\u003e](#table-of-contents)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlonzoLeeeooo%2FLCDG","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAlonzoLeeeooo%2FLCDG","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlonzoLeeeooo%2FLCDG/lists"}