{"id":23446090,"url":"https://github.com/wangkai930418/dpl","last_synced_at":"2025-04-13T15:11:47.273Z","repository":{"id":196867726,"uuid":"697294471","full_name":"wangkai930418/DPL","owner":"wangkai930418","description":"Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing (NeurIPS 2023)","archived":false,"fork":false,"pushed_at":"2024-05-15T16:03:51.000Z","size":6253,"stargazers_count":102,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-27T06:05:49.403Z","etag":null,"topics":["diffusion","diffusion-models","image-editing","inversion","stable-diffusion","text-guided-image-editing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wangkai930418.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-27T12:45:44.000Z","updated_at":"2025-03-26T04:50:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"7d56f153-c56f-45a7-a351-3d28b28d53a8","html_url":"https://github.com/wangkai930418/DPL","commit_stats":null,"previous_names":["wangkai930418/dpl"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangkai930418%2FDPL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangkai930418%2FDPL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangkai930418%2FDPL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wangkai
930418%2FDPL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wangkai930418","download_url":"https://codeload.github.com/wangkai930418/DPL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248732489,"owners_count":21152852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion","diffusion-models","image-editing","inversion","stable-diffusion","text-guided-image-editing"],"created_at":"2024-12-23T20:29:38.600Z","updated_at":"2025-04-13T15:11:47.243Z","avatar_url":"https://github.com/wangkai930418.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing [(NeurIPS 2023)](https://neurips.cc/virtual/2023/poster/72801) \n\n#### [Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing](https://arxiv.org/abs/2309.15664) \n\n#### [Kai Wang](https://scholar.google.com/citations?user=j14vd0wAAAAJ), [Fei Yang](https://scholar.google.com/citations?user=S1gksNwAAAAJ\u0026hl=en), [Shiqi Yang](https://www.shiqiyang.xyz/), [Muhammad Atif Butt](https://scholar.google.com/citations?user=vf7PeaoAAAAJ\u0026hl=en), [Joost van de Weijer](https://scholar.google.com/citations?user=Gsw2iUEAAAAJ\u0026hl=en)\n\n![dpl](docs/comp_method_editing.png)\n\n## Requirements\n\nThe required packages are listed in *\"torch2.yml\"*.\n\n### 1. 
Get captions\n\nFor images without captions, we use the [BLIP](https://huggingface.co/docs/transformers/main/model_doc/blip) model to generate them. You can swap in [BLIP-2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) for better performance.\n\n```\npython _1_BLIP_caption.py --input_image IMAGE_FILE --results_folder ./output\n```\n\n### 2. DDIM inversion and feature visualizations\n\nWe apply DDIM inversion to obtain the initial noise, and visualize all attention maps and features with clustering or PCA, as shown in the paper.\n\n```\npython _2_DDIM_inv.py --input_image IMAGE_FILE --results_folder ./output\n```\n\n### 3. DPL inversion\n\nWe provide a bash script to run DPL inversion:\n\n```\nbash ./DPL.sh\n```\n\n#### Seg/Det DPL inversion (optional)\nIf you already have segmentation maps or detection boxes, alternative DPL inversion variants are available in *\"_3_dpl_det_inv.py\"* and *\"_3_dpl_seg_inv.py\"*.\n\n### 4. P2P editing\n\nWe release our customized P2P editing code in *\"_4_image_edit.py\"*.\n\n\n### 5. Other comparison methods\n\nFor comparison, some methods are already available in *diffusers*; we include them here as *\"comp_XXX.py\"*.\n\n\n## Method Details\n\n![dpl](docs/method_v5.png)\n\n## NOTE\n\n- The best hyperparameters may vary per image; we recommend exploring them for your use case. Running *\"_2_DDIM_inv.py\"* already saves the cross-attention maps. 
If they are already of good quality, *DPL* is not necessary.\n\n\n- Editing quality is not guaranteed even with perfect cross-attention maps; we leave this to future work.\n\n\n## Supplementary Material\n\nThe [Supplementary Material](supplementary.pdf) is available here.\n\n## TODO\n\nAdd more bash files and example images to this repo in the future.\n\nMore experimental images are shared via [Google Drive](https://drive.google.com/file/d/1o2tMKMM8L04VzTfnCfjW-AiuE1WQJ5GH/view?usp=sharing).\n\n\n## *LocInv* (CVPR 2024 AI4CC workshop)\n\nLocInv is an enhanced version of DPL that uses localization priors, such as bounding boxes or segmentation masks obtained from pretrained segmentation/detection models. The corresponding code is in *\"_3_dpl_seg_inv.py\"* and *\"_3_dpl_det_inv.py\"*.\n\n## References\nIf you find this repo helpful, please cite our papers. Thanks!\n\n```\n@article{wang2023DPL,\n  title={Dynamic prompt learning: Addressing cross-attention leakage for text-based image editing},\n  author={Wang, Kai and Yang, Fei and Yang, Shiqi and Butt, Muhammad Atif and van de Weijer, Joost},\n  journal={Advances in Neural Information Processing Systems},\n  volume={36},\n  year={2023}\n}\n\n@article{tang2024locinv,\n  title={LocInv: Localization-aware Inversion for Text-Guided Image Editing},\n  author={Tang, Chuanming and Wang, Kai and Yang, Fei and van de Weijer, Joost},\n  journal={CVPR 2024 AI4CC workshop},\n  year={2024}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangkai930418%2Fdpl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwangkai930418%2Fdpl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangkai930418%2Fdpl/lists"}