{"id":13488062,"url":"https://github.com/adobe-research/custom-diffusion","last_synced_at":"2025-05-15T14:05:49.662Z","repository":{"id":64895221,"uuid":"575997792","full_name":"adobe-research/custom-diffusion","owner":"adobe-research","description":"Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)","archived":false,"fork":false,"pushed_at":"2023-12-20T12:00:48.000Z","size":63487,"stargazers_count":1930,"open_issues_count":52,"forks_count":141,"subscribers_count":31,"default_branch":"main","last_synced_at":"2025-04-07T18:09:41.419Z","etag":null,"topics":["computer-vision","customization","diffusion-models","few-shot","fine-tuning","pytorch","text-to-image-generation"],"latest_commit_sha":null,"homepage":"https://www.cs.cmu.edu/~custom-diffusion","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adobe-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-08T19:18:41.000Z","updated_at":"2025-04-06T04:50:40.000Z","dependencies_parsed_at":"2024-10-14T08:52:53.630Z","dependency_job_id":null,"html_url":"https://github.com/adobe-research/custom-diffusion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe-research%2Fcustom-diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe-research%2Fcustom-diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe-research%2Fcustom-diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adobe-research%2Fcustom-diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adobe-research","download_url":"https://codeload.github.com/adobe-research/custom-diffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254355334,"owners_count":22057354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","customization","diffusion-models","few-shot","fine-tuning","pytorch","text-to-image-generation"],"created_at":"2024-07-31T18:01:08.987Z","updated_at":"2025-05-15T14:05:44.654Z","avatar_url":"https://github.com/adobe-research.png","language":"Python","readme":"# Custom Diffusion\n\n### [website](https://www.cs.cmu.edu/~custom-diffusion/)  | [paper](http://arxiv.org/abs/2212.04488) \n\n\n**[NEW!]** Custom Diffusion is now supported in diffusers. Please [refer](https://github.com/huggingface/diffusers/tree/main/examples/custom_diffusion) here for training and inference details. \n\n**[NEW!]** CustomConcept101 dataset. 
We release a new dataset of 101 concepts along with their evaluation prompts. For more details, please refer [here](customconcept101/README.md).

**[NEW!]** Custom Diffusion with SDXL. The diffusers code is now updated to diffusers==0.21.4.

<br>
<div class="gif">
<p align="center">
<img src='assets/results.gif' align="center" width=800>
</p>
</div>

[Custom Diffusion](https://www.cs.cmu.edu/~custom-diffusion) allows you to fine-tune text-to-image diffusion models, such as [Stable Diffusion](https://github.com/CompVis/stable-diffusion), given a few images of a new concept (~4-20). Our method is fast (~6 minutes on 2 A100 GPUs), as it fine-tunes only a subset of model parameters, namely the key and value projection matrices in the cross-attention layers. This also reduces the extra storage for each additional concept to 75MB.

Our method further allows you to use a combination of multiple concepts, such as a new object + a new artistic style, multiple new objects, or a new object + a new category. See [multi-concept results](#multi-concept-results) for more visual results.

***Multi-Concept Customization of Text-to-Image Diffusion*** <br>
[Nupur Kumari](https://nupurkmr9.github.io/), [Bingliang Zhang](https://zhangbingliang2019.github.io), [Richard Zhang](https://richzhang.github.io/), [Eli Shechtman](https://research.adobe.com/person/eli-shechtman/), [Jun-Yan Zhu](https://www.cs.cmu.edu/~junyanz/)<br>
In CVPR 2023 <br>

## Results

All our results are based on fine-tuning the [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original) model.
We show results on various categories of images, including scenes, pets, personal toys, and styles, with a varying number of training samples.
For more generations and comparisons with concurrent methods, please refer to our [webpage](https://www.cs.cmu.edu/~custom-diffusion/) and [gallery](https://www.cs.cmu.edu/~custom-diffusion/results.html).

### Single-Concept Results

<div>
<p align="center">
<img src='assets/tortoise_plushy.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/teddybear.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/art.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/art2.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/moongate.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/barn.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/cat.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/dog.jpg' align="center" width=800>
</p>
</div>

### Multi-Concept Results

<div>
<p align="center">
<img src='assets/woodenpot_cat.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/table_chair.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/woodenpot_flower.jpg' align="center" width=800>
</p>
<p align="center">
<img src='assets/chair_cat.jpg' align="center" width=800>
</p>
</div>

## Method Details

<div>
<p align="center">
<img src='assets/methodology.jpg' align="center" width=900>
</p>
</div>

Given a few user-provided images of a concept, our method augments a pre-trained text-to-image diffusion model, enabling new generations of the concept in unseen contexts.
We fine-tune a small subset of model weights, namely the key and value mappings from text to latent features in the cross-attention layers of the diffusion model.
Our method also uses a small set of regularization images (200) to prevent overfitting. For personal categories, we add a new modifier token V* in front of the category name, e.g., V* dog. For multiple concepts, we jointly train on the datasets of the two concepts. Our method also enables merging two fine-tuned models via optimization. For more details, please refer to our [paper](https://arxiv.org/abs/2212.04488).
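To make the trained-parameter selection concrete, below is a minimal sketch (not the repository's actual training loop) that freezes everything in a diffusers UNet except the cross-attention key/value projections; the `attn2.to_k`/`attn2.to_v` names follow diffusers' module naming, and the optimizer choice is illustrative. The new modifier-token embedding, which our method also optimizes, is omitted here for brevity.

```
# Sketch: train only the cross-attention K/V projections of the UNet.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

trainable = []
for name, param in unet.named_parameters():
    # "attn2" blocks are cross-attention; to_k/to_v project the text features
    if "attn2.to_k" in name or "attn2.to_v" in name:
        param.requires_grad_(True)
        trainable.append(param)
    else:
        param.requires_grad_(False)

optimizer = torch.optim.AdamW(trainable, lr=1e-5)
print(f"training {sum(p.numel() for p in trainable):,} of "
      f"{sum(p.numel() for p in unet.parameters()):,} parameters")
```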
## Getting Started

```
git clone https://github.com/adobe-research/custom-diffusion.git
cd custom-diffusion
git clone https://github.com/CompVis/stable-diffusion.git
cd stable-diffusion
conda env create -f environment.yaml
conda activate ldm
pip install clip-retrieval tqdm
```

Our code was developed on commit `21f890f9da3cfbeaba8e2ac3c425ee9e998d5229` of [stable-diffusion](https://github.com/CompVis/stable-diffusion).

Download the stable-diffusion model checkpoint:
`wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt`
For more details, please refer [here](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original).

**Dataset:** We release some of the datasets used in the paper [here](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip).
Images taken from Unsplash are under the [Unsplash license](https://unsplash.com/license). The Moongate dataset can be downloaded from [here](https://github.com/odegeasslbc/FastGAN-pytorch).

**Models:** All our models can be downloaded from [here](https://www.cs.cmu.edu/~custom-diffusion/assets/models/).

### Single-Concept Fine-tuning

**Real images as regularization**
```
## download dataset
wget https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip
unzip data.zip

## run training (30 GB on 2 GPUs)
bash scripts/finetune_real.sh "cat" data/cat real_reg/samples_cat cat finetune_addtoken.yaml <pretrained-model-path>

## save updated model weights
python src/get_deltas.py --path logs/<folder-name> --newtoken 1

## sample
python sample.py --prompt "<new1> cat playing with a ball" --delta_ckpt logs/<folder-name>/checkpoints/delta_epoch\=000004.ckpt --ckpt <pretrained-model-path>
```

Here, `<pretrained-model-path>` is the path to the pretrained `sd-v1-4.ckpt` model. Our results in the paper were not generated using [clip-retrieval](https://github.com/rom1504/clip-retrieval) to retrieve real images as regularization samples, but doing so leads to similar results.

**Generated images as regularization**
```
bash scripts/finetune_gen.sh "cat" data/cat gen_reg/samples_cat cat finetune_addtoken.yaml <pretrained-model-path>
```

### Multi-Concept Fine-tuning

**Joint training**

```
## run training (30 GB on 2 GPUs)
bash scripts/finetune_joint.sh "wooden pot" data/wooden_pot real_reg/samples_wooden_pot \
                                    "cat" data/cat real_reg/samples_cat \
                                    wooden_pot+cat finetune_joint.yaml <pretrained-model-path>

## save updated model weights
python src/get_deltas.py --path logs/<folder-name> --newtoken 2

## sample
python sample.py --prompt "the <new2> cat sculpture in the style of a <new1> wooden pot" --delta_ckpt logs/<folder-name>/checkpoints/delta_epoch\=000004.ckpt --ckpt <pretrained-model-path>
```

**Optimization-based weights merging**

Given two fine-tuned model weights `delta_ckpt1` and `delta_ckpt2` for any two categories, the weights can be merged to create a single model as shown below.
```
python src/composenW.py --paths <delta_ckpt1>+<delta_ckpt2> --categories "wooden pot+cat" --ckpt <pretrained-model-path>

## sample
python sample.py --prompt "the <new2> cat sculpture in the style of a <new1> wooden pot" --delta_ckpt optimized_logs/<folder-name>/checkpoints/delta_epoch\=000000.ckpt --ckpt <pretrained-model-path>
```

### Training using the Diffusers library

**[NEW!]** Custom Diffusion is also supported in diffusers now. Please refer [here](https://github.com/huggingface/diffusers/tree/main/examples/custom_diffusion) for training and inference details.

```
## install requirements
pip install "accelerate>=0.24.1"
pip install modelcards
pip install "transformers>=4.31.0"
pip install deepspeed
pip install diffusers==0.21.4
accelerate config
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
```

**Single-Concept fine-tuning**

```
## launch training script (2 GPUs recommended; increase --max_train_steps to 500 if using 1 GPU)

accelerate launch src/diffusers_training.py \
          --pretrained_model_name_or_path=$MODEL_NAME  \
          --instance_data_dir=./data/cat  \
          --class_data_dir=./real_reg/samples_cat/ \
          --output_dir=./logs/cat  \
          --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
          --instance_prompt="photo of a <new1> cat"  \
          --class_prompt="cat" \
          --resolution=512  \
          --train_batch_size=2  \
          --learning_rate=1e-5  \
          --lr_warmup_steps=0 \
          --max_train_steps=250 \
          --num_class_images=200 \
          --scale_lr --hflip  \
          --modifier_token "<new1>"

## sample
python src/diffusers_sample.py --delta_ckpt logs/cat/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt "<new1> cat playing with a ball"
```

You can also use `--enable_xformers_memory_efficient_attention` and enable `fp16` during `accelerate config` for faster training with lower VRAM requirements.
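This repository's script saves `delta.bin`, which is sampled with `src/diffusers_sample.py` as above. If you instead train with the upstream diffusers example linked above, the weights are saved under different names (`pytorch_custom_diffusion_weights.bin` and a `<new1>.bin` token embedding), and inference looks roughly like the following sketch, assuming those upstream default file names and a hypothetical output directory:

```
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# load the fine-tuned cross-attention weights and the new token embedding
pipe.unet.load_attn_procs("path-to-output-dir", weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion("path-to-output-dir", weight_name="<new1>.bin")

image = pipe("<new1> cat playing with a ball", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat.png")
```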
To train with SDXL, use `diffusers_training_sdxl.py` with `MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"`.

**Multi-Concept fine-tuning**

Provide a [json](assets/concept_list.json) file with the info about each concept, similar to [this](https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_dreambooth.py).
```
## launch training script (2 GPUs recommended; increase --max_train_steps to 1000 if using 1 GPU)

accelerate launch src/diffusers_training.py \
          --pretrained_model_name_or_path=$MODEL_NAME  \
          --output_dir=./logs/cat_wooden_pot  \
          --concepts_list=./assets/concept_list.json \
          --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
          --resolution=512  \
          --train_batch_size=2  \
          --learning_rate=1e-5  \
          --lr_warmup_steps=0 \
          --max_train_steps=500 \
          --num_class_images=200 \
          --scale_lr --hflip  \
          --modifier_token "<new1>+<new2>"

## sample
python src/diffusers_sample.py --delta_ckpt logs/cat_wooden_pot/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt "<new1> cat sitting inside a <new2> wooden pot and looking up"
```

**Optimization-based weights merging for Multi-Concept**

Given two fine-tuned model weights `delta1.bin` and `delta2.bin` for any two categories, the weights can be merged to create a single model as shown below.
```
python src/diffusers_composenW.py --paths <delta1.bin>+<delta2.bin> --categories "wooden pot+cat" --ckpt "CompVis/stable-diffusion-v1-4"

## sample
python src/diffusers_sample.py --delta_ckpt optimized_logs/<folder-name>/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt "<new1> cat sitting inside a <new2> wooden pot and looking up"
```

The diffusers training code is modified from the [DreamBooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) and [Textual Inversion](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py) training scripts. For more details on how to set up accelerate, please refer [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth).

### Fine-tuning on human faces

For fine-tuning on human faces, we recommend `learning_rate=5e-6` and `max_train_steps=750` in the above diffusers training script, or using the `finetune_face.yaml` config in the stable-diffusion training script.

We observe better results with a lower learning rate, longer training, and more images for human faces compared to the other categories shown in our paper. With fewer images, fine-tuning all parameters in the cross-attention layers is slightly better, which can be enabled with `--freeze_model "crossattn"`. A sketch of such a run is shown below, followed by example results from fine-tuning on 14 close-up photos of [Richard Zhang](https://richzhang.github.io/) with the diffusers training script.
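As a concrete illustration (a sketch, not a command from the repository), a face fine-tuning run might look like the following; the dataset path, prompts, and output directory are hypothetical placeholders, while the learning rate, step count, and `--freeze_model "crossattn"` flag follow the recommendations above.

```
## hypothetical paths and prompts; lr, steps, and freeze_model follow the face recommendations
accelerate launch src/diffusers_training.py \
          --pretrained_model_name_or_path=$MODEL_NAME  \
          --instance_data_dir=./data/person  \
          --class_data_dir=./real_reg/samples_person/ \
          --output_dir=./logs/person  \
          --with_prior_preservation --real_prior --prior_loss_weight=1.0 \
          --instance_prompt="photo of a <new1> person"  \
          --class_prompt="person" \
          --resolution=512  \
          --train_batch_size=2  \
          --learning_rate=5e-6  \
          --lr_warmup_steps=0 \
          --max_train_steps=750 \
          --num_class_images=200 \
          --scale_lr --hflip  \
          --freeze_model "crossattn" \
          --modifier_token "<new1>"
```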
<div>
<p align="center">
<img src='assets/face1.jpg' align="center" width=800>
</p>
</div>

### Model compression

```
python src/compress.py --delta_ckpt <finetuned-delta-path> --ckpt <pretrained-model-path>

## sample
python sample.py --prompt "<new1> cat playing with a ball" --delta_ckpt logs/<folder-name>/checkpoints/compressed_delta_epoch\=000004.ckpt --ckpt <pretrained-model-path> --compress
```

Sample generations with different levels of compression are shown below. By default, our code saves a low-rank approximation keeping the top 60% of singular values, which results in ~15 MB models.
<div>
<p align="center">
<img src='assets/compression.jpg' align="center" width=900>
</p>
</div>

### Checkpoint conversions for stable-diffusion-v1-4

* From diffusers `delta.bin` to CompVis `delta_model.ckpt`:
```
python src/convert.py --delta_ckpt <path-to-folder>/delta.bin --ckpt <path-to-model-v1-4.ckpt> --mode diffuser-to-compvis
# sample
python sample.py --delta_ckpt <path-to-folder>/delta_model.ckpt --ckpt <path-to-model-v1-4.ckpt> --prompt <text-prompt> --config configs/custom-diffusion/finetune_addtoken.yaml
```

* From diffusers `delta.bin` to a [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) checkpoint:
```
python src/convert.py --delta_ckpt <path-to-folder>/delta.bin --ckpt <path-to-model-v1-4.ckpt> --mode diffuser-to-webui
# launch UI in stable-diffusion-webui directory
bash webui.sh --embeddings-dir <path-to-folder>/webui/embeddings --ckpt <path-to-folder>/webui/model.ckpt
```

* From CompVis `delta_model.ckpt` to diffusers `delta.bin`:
```
python src/convert.py --delta_ckpt <path-to-folder>/delta_model.ckpt --ckpt <path-to-model-v1-4.ckpt> --mode compvis-to-diffuser
# sample
python src/diffusers_sample.py --delta_ckpt <path-to-folder>/delta.bin --ckpt "CompVis/stable-diffusion-v1-4" --prompt <text-prompt>
```

* From CompVis `delta_model.ckpt` to a [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) checkpoint:
```
python src/convert.py --delta_ckpt <path-to-folder>/delta_model.ckpt --ckpt <path-to-model-v1-4.ckpt> --mode compvis-to-webui
# launch UI in stable-diffusion-webui directory
bash webui.sh --embeddings-dir <path-to-folder>/webui/embeddings --ckpt <path-to-folder>/webui/model.ckpt
```

Converted checkpoints are saved in the `<path-to-folder>` of the original checkpoints.

## References

```
@inproceedings{kumari2022customdiffusion,
  title     = {Multi-Concept Customization of Text-to-Image Diffusion},
  author    = {Kumari, Nupur and Zhang, Bingliang and Zhang, Richard and Shechtman, Eli and Zhu, Jun-Yan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023}
}
```

## Acknowledgments
We are grateful to Nick Kolkin, David Bau, Sheng-Yu Wang, Gaurav Parmar, John Nack, and Sylvain Paris for their helpful comments and discussion, and to Allie Chang, Chen Wu, Sumith Kulal, Minguk Kang, Yotam Nitzan, and Taesung Park for proofreading the draft. We also thank Mia Tang and Aaron Hertzmann for sharing their artwork.
Some of the datasets are downloaded from Unsplash. This work was partly done by Nupur Kumari during her internship at Adobe. The work is partly supported by Adobe Inc.