{"id":13487871,"url":"https://github.com/leeruibin/SPDInv","last_synced_at":"2025-03-27T23:31:58.185Z","repository":{"id":227007423,"uuid":"770159113","full_name":"leeruibin/SPDInv","owner":"leeruibin","description":"[ECCV2024] Source Prompt Disentangled Inversion for Boosting Image Editability with  Diffusion Models","archived":false,"fork":false,"pushed_at":"2024-07-04T10:46:28.000Z","size":8880,"stargazers_count":34,"open_issues_count":3,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-30T23:35:58.143Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leeruibin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-11T03:19:07.000Z","updated_at":"2024-10-29T03:55:07.000Z","dependencies_parsed_at":"2024-10-30T23:31:29.459Z","dependency_job_id":"0fd5dc5c-3d13-4045-acec-3983fb28ecd9","html_url":"https://github.com/leeruibin/SPDInv","commit_stats":null,"previous_names":["leeruibin/spdinv"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeruibin%2FSPDInv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeruibin%2FSPDInv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeruibin%2FSPDInv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeruibin%2FSPDInv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leeruibin","
download_url":"https://codeload.github.com/leeruibin/SPDInv/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944020,"owners_count":20697945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:05.454Z","updated_at":"2025-03-27T23:31:53.141Z","avatar_url":"https://github.com/leeruibin.png","language":"Python","readme":"## Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models\n\n\u003ca href='http://arxiv.org/abs/2403.11105'\u003e\u003cimg src='https://img.shields.io/badge/arXiv-2403.11105-b31b1b.svg'\u003e\u003c/a\u003e \u0026nbsp;\u0026nbsp;\n\n\u003e[Ruibin Li](https://github.com/leeruibin)\u003csup\u003e1\u003c/sup\u003e | [Ruihuang Li](https://scholar.google.com/citations?user=8CfyOtQAAAAJ\u0026hl=zh-CN)\u003csup\u003e1\u003c/sup\u003e | [Song Guo](https://scholar.google.com/citations?user=Ib-sizwAAAAJ\u0026hl=en)\u003csup\u003e2\u003c/sup\u003e | [Lei Zhang](https://www4.comp.polyu.edu.hk/~cslzhang/)\u003csup\u003e1*\u003c/sup\u003e \u003cbr\u003e\n\u003e\u003csup\u003e1\u003c/sup\u003eThe Hong Kong Polytechnic University, \u003csup\u003e2\u003c/sup\u003eThe Hong Kong University of Science and Technology. \u003cbr\u003e\n\u003eIn ECCV 2024\n\n## 🔎 Framework overview\n\nPipelines of different inversion methods in text-driven editing. (a) DDIM inversion inverts a real image into a latent noise code, but the inverted noise code often results in a large reconstruction gap $D_{Rec}$ under higher CFG parameters. 
(b) NTI optimizes the null-text embedding to narrow the reconstruction gap $D_{Rec}$, while NPI further approximates NTI to speed it up. (c) DirectInv records the differences between the inversion features and the reconstruction features, and merges them back to achieve high-quality reconstruction. (d) Our SPDInv aims to minimize the noise gap $D_{Noi}$ instead of $D_{Rec}$, which reduces the impact of the source prompt on the editing process and thus alleviates the artifacts and inconsistent details produced by previous methods.\n\n![SPDInv](figures/methods.png)\n\n## ⚙️ Dependencies and Installation\n```\n# git clone this repository\ngit clone https://github.com/leeruibin/SPDInv.git\ncd SPDInv\n\n# create an environment with python \u003e= 3.8\nconda env create -f environment.yaml\nconda activate SPDInv\n```\n\n## 🚀 Quick Inference\n\n#### Run P2P with SPDInv\n\n```\npython run_SPDInv_P2P.py --input xxx --source [source prompt] --target [target prompt] --blended_word \"word1 word2\"\n```\n\n#### Run MasaCtrl with SPDInv\n```\npython run_SPDInv_MasaCtrl.py --input xxx --source [source prompt] --target [target prompt]\n```\n\n#### Run PNP with SPDInv\nTo run PNP, you should first upgrade diffusers to 0.17.1:\n\n```\npip install diffusers==0.17.1\n```\nThen run:\n```\npython run_SPDInv_PNP.py --input xxx --source [source prompt] --target [target prompt]\n```\n\n#### Run ELITE with SPDInv\nFor ELITE, you should first download the pre-trained [global_mapper.pt](https://drive.google.com/drive/folders/1VkiVZzA_i9gbfuzvHaLH2VYh7kOTzE0x?usp=sharing) checkpoint provided by ELITE and put it into the checkpoints folder.\n```\npython run_SPDInv_ELITE.py --input xxx --source [source prompt] --target [target prompt]\n```\n\n## 📷 Editing cases with P2P, MasaCtrl, PNP, ELITE\n## Editing cases with P2P\n\u003cdiv  align=\"center\"\u003e \u003cimg src=\"./figures/cases_P2P.jpg\" width = \"600\" alt=\"P2P\" align=center /\u003e \u003c/div\u003e\n\n## Editing cases with 
MasaCtrl\n\u003cdiv  align=\"center\"\u003e \u003cimg src=\"./figures/cases_MasaCtrl.jpg\" width = \"600\" alt=\"MasaCtrl\" align=center /\u003e \u003c/div\u003e\n\n## Editing cases with PNP\n\u003cdiv  align=\"center\"\u003e \u003cimg src=\"./figures/cases_PNP.jpg\" width = \"600\" alt=\"PNP\" align=center /\u003e \u003c/div\u003e\n\n## Editing cases with ELITE\n\u003cdiv  align=\"center\"\u003e \u003cimg src=\"./figures/cases_ELITE.jpg\" width = \"600\" alt=\"ELITE\" align=center /\u003e \u003c/div\u003e\n\n## Citation\n\n```\n@inproceedings{li2024source,\n  title={Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models},\n  author={Li, Ruibin and Li, Ruihuang and Guo, Song and Zhang, Lei},\n  booktitle={European Conference on Computer Vision},\n  year={2024}\n}\n```\n\n## Acknowledgements\n\nThis code is built on the [diffusers](https://github.com/huggingface/diffusers/) version of [Stable Diffusion](https://github.com/CompVis/stable-diffusion).\n\nThe code is also heavily based on [Prompt-to-Prompt](https://github.com/google/prompt-to-prompt), [Null-Text Inversion](https://github.com/google/prompt-to-prompt), [MasaCtrl](https://github.com/TencentARC/MasaCtrl), [ProxEdit](https://github.com/phymhan/prompt-to-prompt), [ELITE](https://github.com/csyxwei/ELITE), [Plug-and-Play](https://github.com/MichalGeyer/plug-and-play), and [DirectInversion](https://github.com/cure-lab/PnPInversion). Thanks to all the contributors!\n","funding_links":[],"categories":["Diffusion Models Inversion"],"sub_categories":["Train-Free"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleeruibin%2FSPDInv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleeruibin%2FSPDInv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleeruibin%2FSPDInv/lists"}