{"id":13488101,"url":"https://github.com/mkshing/e4t-diffusion","last_synced_at":"2026-01-17T00:57:22.647Z","repository":{"id":143165419,"uuid":"608130371","full_name":"mkshing/e4t-diffusion","owner":"mkshing","description":"Implementation of Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models","archived":false,"fork":false,"pushed_at":"2023-04-23T15:41:14.000Z","size":4239,"stargazers_count":321,"open_issues_count":12,"forks_count":24,"subscribers_count":12,"default_branch":"main","last_synced_at":"2024-10-30T23:36:24.427Z","etag":null,"topics":["deep-learning","diffusion-models","stable-diffusion","text-to-image"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2302.12228","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkshing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-01T11:38:45.000Z","updated_at":"2024-10-23T02:13:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"a77becde-2678-4979-a67d-7234f1c5cc12","html_url":"https://github.com/mkshing/e4t-diffusion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkshing%2Fe4t-diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkshing%2Fe4t-diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkshing%2Fe4t-diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkshing%2Fe4t-diff
usion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkshing","download_url":"https://codeload.github.com/mkshing/e4t-diffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944020,"owners_count":20697945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","diffusion-models","stable-diffusion","text-to-image"],"created_at":"2024-07-31T18:01:09.649Z","updated_at":"2026-01-17T00:57:22.639Z","avatar_url":"https://github.com/mkshing.png","language":"Python","funding_links":[],"categories":["New Concept Learning","Python"],"sub_categories":[],"readme":"# E4T-diffusion\n\u003ca href=\"https://colab.research.google.com/gist/mkshing/d16cb15e82ac7fd2f5dd2e83b00896a3/e4t-diffusion.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e\n\nAn implementation of [Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models](https://arxiv.org/abs/2302.12228) by using d🧨ffusers. \n\nMy summary tweet is found [here](https://twitter.com/mk1stats/status/1630891691623448576).\n\n![paper](assets/e4t-paper.png)\n\n## News \n### 2023.3.30\n- Release the current-best pre-trained model, trained on CelebA-HQ+FFHQ. 
Please see [Model Zoo](#model-zoo) for more information.\n\n## Installation\n```\n$ git clone https://github.com/mkshing/e4t-diffusion.git\n$ cd e4t-diffusion\n$ pip install -r requirements.txt\n```\n\n\n## Model Zoo\n- **[e4t-diffusion-ffhq-celebahq-v1](https://huggingface.co/mshing/e4t-diffusion-ffhq-celebahq-v1):** a pre-trained model for faces, trained on FFHQ+CelebA-HQ. To get better results, I used [Stable unCLIP](https://github.com/Stability-AI/stablediffusion/blob/main/doc/UNCLIP.MD) for data augmentation.  \n  ![e4t-diffusion-ffhq-celebahq-v1](assets/e4t-diffusion-ffhq-celebahq-v1-log.png)\n  Logs from the pre-training phase\n  ![yann-in-the-beach](assets/yann-in-the-beach.png)\n  \"a photo of *s in the beach\" after domain-tuning on [a photo of Yann LeCun](https://engineering.nyu.edu/sites/default/files/styles/square_large_default_1x/public/2018-06/yann-lecun.jpg?h=65172a10\u0026itok=NItwgG8z)\n  \n## Pre-training\nYou need an E4T model pre-trained on the domain of your target image. \nIf your target image is a face, you need to pre-train on a large face image dataset. \nIf it is an artwork, you might pre-train on WikiArt like so:  
\n```\naccelerate launch pretrain_e4t.py \\\n  --pretrained_model_name_or_path=\"CompVis/stable-diffusion-v1-4\" \\\n  --clip_model_name_or_path=\"ViT-H-14::laion2b_s32b_b79k\" \\\n  --domain_class_token=\"art\" \\\n  --placeholder_token=\"*s\" \\\n  --prompt_template=\"art\" \\\n  --save_sample_prompt=\"a photo of the *s,a photo of the *s in monet style\" \\\n  --reg_lambda=0.01 \\\n  --domain_embed_scale=0.1 \\\n  --output_dir=\"pretrained-wikiart\" \\\n  --train_image_dataset=\"Artificio/WikiArt\" \\\n  --iterable_dataset \\\n  --resolution=512 \\\n  --train_batch_size=16 \\\n  --learning_rate=1e-6 --scale_lr \\\n  --checkpointing_steps=10000 \\\n  --log_steps=1000 \\\n  --max_train_steps=100000 \\\n  --unfreeze_clip_vision \\\n  --mixed_precision=\"fp16\" \\\n  --enable_xformers_memory_efficient_attention \n```\n\n## Domain-tuning\nOnce you have a pre-trained model, you are ready for domain tuning! \nIn this step, the UNet itself (and optionally the text encoder) is trained along with all the other parameters. Unlike DreamBooth, E4T needs fewer than 15 training steps, according to the paper.\n\n```\naccelerate launch tuning_e4t.py \\\n  --pretrained_model_name_or_path=\"e4t pre-trained model path\" \\\n  --prompt_template=\"a photo of {placeholder_token}\" \\\n  --reg_lambda=0.1 \\\n  --output_dir=\"path-to-save-model\" \\\n  --train_image_path=\"image path or url\" \\\n  --resolution=512 \\\n  --train_batch_size=16 \\\n  --learning_rate=1e-6 --scale_lr \\\n  --max_train_steps=30 \\\n  --mixed_precision=\"fp16\" \\\n  --enable_xformers_memory_efficient_attention\n```\n\n## Inference\nOnce domain tuning is done, you can run inference by including your placeholder token in the prompt. 
\n\n```\npython inference.py \\\n  --pretrained_model_name_or_path \"e4t pre-trained model path\" \\\n  --prompt \"Times square in the style of *s\" \\\n  --num_images_per_prompt 3 \\\n  --scheduler_type \"ddim\" \\\n  --image_path_or_url \"same image path or url as domain tuning\" \\\n  --num_inference_steps 50 \\\n  --guidance_scale 7.5\n```\n\n\n## Acknowledgments\nI would like to thank [Stability AI](https://stability.ai/) for providing the computer resources to test this code and train pre-trained models.\n\n## Citation\n\n```bibtex\n@misc{https://doi.org/10.48550/arXiv.2302.12228,\n    url       = {https://arxiv.org/abs/2302.12228},\n    author    = {Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or},  \n    title     = {Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models},\n    publisher = {arXiv},\n    year      = {2023},\n    copyright = {arXiv.org perpetual, non-exclusive license}\n}\n```\n\n## TODO\n- [x] Pre-training\n- [x] Domain-tuning\n- [x] Inference\n- [x] Data augmentation by [stable unclip](https://github.com/Stability-AI/stablediffusion)\n- [ ] Use an off-the-shelf face segmentation network for human face domain.\n   \u003e Finally, we find that for the human face domain, it is helpful to\nuse an off-the-shelf face segmentation network [Deng et al. 2019]\nto mask the diffusion loss at this stage.\n- [ ] Support [ToMe](https://github.com/dbolya/tomesd) for more efficient training ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkshing%2Fe4t-diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkshing%2Fe4t-diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkshing%2Fe4t-diffusion/lists"}