{"id":14543589,"url":"https://github.com/FusionBrainLab/Guide-and-Rescale","last_synced_at":"2025-09-03T11:32:27.076Z","repository":{"id":252788240,"uuid":"826730369","full_name":"FusionBrainLab/Guide-and-Rescale","owner":"FusionBrainLab","description":"Official Implementation for \"Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing\"","archived":false,"fork":false,"pushed_at":"2024-09-12T09:01:43.000Z","size":21615,"stargazers_count":50,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-09-12T19:41:33.626Z","etag":null,"topics":["diffusion-model","guide-and-rescale","image-editing","text-to-image"],"latest_commit_sha":null,"homepage":"https://fusionbrainlab.github.io/Guide-and-Rescale/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FusionBrainLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-10T09:09:13.000Z","updated_at":"2024-09-11T10:21:38.000Z","dependencies_parsed_at":"2024-09-11T13:16:53.643Z","dependency_job_id":null,"html_url":"https://github.com/FusionBrainLab/Guide-and-Rescale","commit_stats":null,"previous_names":["airi-institute/guide-and-rescale","fusionbrainlab/guide-and-rescale"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FusionBrainLab%2FGuide-and-Rescale","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FusionBrainLab%2FGuide-and-Rescale/tags","releases_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FusionBrainLab%2FGuide-and-Rescale/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FusionBrainLab%2FGuide-and-Rescale/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FusionBrainLab","download_url":"https://codeload.github.com/FusionBrainLab/Guide-and-Rescale/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231878426,"owners_count":18439887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-model","guide-and-rescale","image-editing","text-to-image"],"created_at":"2024-09-06T01:01:30.469Z","updated_at":"2025-09-03T11:32:27.037Z","avatar_url":"https://github.com/FusionBrainLab.png","language":"Jupyter Notebook","readme":"# Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing (ECCV 2024)\n\n\u003ca href=\"https://arxiv.org/abs/2409.01322\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2409.01322-b31b1b.svg\" height=22.5\u003e\u003c/a\u003e\n\u003ca href=\"https://huggingface.co/spaces/AIRI-Institute/Guide-and-Rescale\"\u003e\u003cimg src=\"https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md.svg\" height=22.5\u003e\u003c/a\u003e\n\u003ca href=\"https://colab.research.google.com/drive/1noKOOcDBBL_m5_UqU15jBBqiM8piLZ1O?usp=sharing\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" 
height=22.5\u003e\u003c/a\u003e\n[![License](https://img.shields.io/github/license/AIRI-Institute/al_toolbox)](./LICENSE)\n\n\u003eDespite recent advances in large-scale text-to-image generative models, manipulating real images with these models remains a challenging problem. The main limitations of existing editing methods are that they either fail to perform with consistent quality on a wide range of image edits, or require time-consuming hyperparameter tuning or fine-tuning of the diffusion model to preserve the image-specific appearance of the input image. Most of these approaches utilize source image information by caching intermediate features and inserting them into the generation process as-is. However, this technique produces feature misalignment in the model, which leads to inconsistent results. \nWe propose a novel approach built upon a modified diffusion sampling process with a guidance mechanism. In this work, we explore a self-guidance technique to preserve the overall structure of the input image and the appearance of those local regions that should not be edited. In particular, we explicitly introduce layout-preserving energy functions aimed at preserving the local and global structure of the source image. Additionally, we propose a noise rescaling mechanism that preserves the noise distribution by balancing the norms of the classifier-free guidance term and our proposed guiders during generation, which leads to more consistent and higher-quality editing results. This guidance approach requires neither fine-tuning of the diffusion model nor an exact inversion process. 
As a result, the proposed method provides a fast, high-quality editing mechanism.\nIn our experiments, we show through human evaluation and quantitative analysis that the proposed method produces the desired edits, which human evaluators prefer, and achieves a better trade-off between editing quality and preservation of the original image.\n\u003e\n\n![image](docs/teaser_image.png)\n\n## Setup\n\nThis code uses a pre-trained [Stable Diffusion](https://huggingface.co/docs/diffusers/v0.25.1/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline) model from the [Diffusers](https://github.com/huggingface/diffusers#readme) library. We ran our code with Python 3.8.5, PyTorch 2.3.0, and Diffusers 0.17.1 on an NVIDIA A100 GPU with 40GB of memory.\n\nTo set up the environment, run:\n```\nconda env create -f sd_env.yaml\n```\nThis creates a conda environment named `ldm`, which you can then activate.\n\n\n## Quickstart\n\nWe provide examples of applying our pipeline to real image editing in Colab \u003ca href=\"https://colab.research.google.com/drive/1noKOOcDBBL_m5_UqU15jBBqiM8piLZ1O?usp=sharing\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=22.5\u003e\u003c/a\u003e.\n\nYou can try the Gradio demo in HF Spaces \u003ca href=\"https://huggingface.co/spaces/AIRI-Institute/Guide-and-Rescale\"\u003e\u003cimg src=\"https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md.svg\" height=22.5\u003e\u003c/a\u003e.\n\nWe also provide [a Jupyter notebook](example_notebooks/guide_and_rescale.ipynb) to try the Guide-and-Rescale pipeline on your own server.\n\n## Method Diagram\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/diagram.png\" alt=\"Diagram\"/\u003e\n  \u003cbr\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n    Overall scheme of the proposed method, Guide-and-Rescale. First, our method applies classic DDIM inversion to the source real image. 
Then the method performs real image editing via the classical denoising process. At every denoising step, the noise term is modified by a guider that utilizes the latents $z_t$ from the current generation process and the time-aligned DDIM latents $z^*_t$.\u003c/p\u003e\n\n\n## Guiders\n\nIn our work, we propose specific **guiders**, i.e. guidance signals suitable for editing. The code for these guiders can be found in [diffusion_core/guiders/opt_guiders.py](diffusion_core/guiders/opt_guiders.py).\n\nEvery guider is defined as a separate class that inherits from the parent class `BaseGuider`. A template for defining a new guider class looks as follows:\n\n```\nclass SomeGuider(BaseGuider):\n    patched: bool\n    forward_hooks: list\n    \n    def [grad_fn or calc_energy](self, data_dict):\n        ...\n\n    def model_patch(self, model):\n        ...\n    \n    def single_output_clear(self):\n        ...\n```\n\n### grad_fn or calc_energy\n\nThe `BaseGuider` class contains a property `grad_guider`. This property is `True` when the guider does not require any backpropagation over its outputs to retrieve the gradient w.r.t. the current latent (for example, as in classifier-free guidance). In this case, the child class contains a function `grad_fn`, where the gradient w.r.t. the current latent is computed directly.\n\nWhen the gradient has to be estimated with backpropagation and `grad_guider` is `False` (for example, when using the norm of the difference of attention maps for guidance), the child class contains a function `calc_energy`, where the desired energy function output is calculated. This output is then used for backpropagation.\n\nThe `grad_fn` and `calc_energy` functions receive a dictionary (`data_dict`) as input. In this dictionary we store all objects (the diffusion model instance, prompts, the current latent, etc.) 
that might be useful for the guiders in the current pipeline.\n\n### model_patch and patched\n\nWhen a guider requires outputs of intermediate layers of the diffusion model to estimate the energy function/gradient, we define a function `model_patch` in the guider's class and set the `patched` property to `True`. We will further refer to such guiders as *patched guiders*.\n\nThis function patches the desired layers of the diffusion model and retrieves the necessary output from these layers. This output is then stored in the `output` property of the guider class object. This way it can be accessed by the editing pipeline and stored in `data_dict` for further use in the `calc_energy`/`grad_fn` functions.\n\n### forward_hooks\n\nIn the editing pipeline we conduct four diffusion model forward passes:\n\n- unconditional, from the current latent $z_t$\n- `cur_inv`: conditional on the initial prompt, from the current latent $z_t$\n- `inv_inv`: conditional on the initial prompt, from the corresponding inversion latent $z^*_t$\n- `cur_trg`: conditional on the prompt describing the editing result, from the current latent $z_t$\n\nWe store the unconditional prediction in `data_dict`, as well as the outputs of the `cur_inv` and `cur_trg` forward passes, for further use in classifier-free guidance.\n\nHowever, when the guider is patched, we also have its `output` to store in `data_dict`. In the `forward_hooks` property of the guider class, we define the list of forward passes (from among `cur_inv`, `inv_inv`, and `cur_trg`) for which we need to store the `output`.\n\nAfter a specific forward pass is conducted, we can access the `output` of the guider and store it in `data_dict` if the forward pass is listed in `forward_hooks`. 
We store it with a key specifying the current forward pass.\n\nThis way we avoid storing unnecessary `output`s in `data_dict`, and can distinguish `output`s from different forward passes by their keys.\n\n\n### single_output_clear\n\nThis is only relevant for patched guiders.\n\nOnce the data from the `output` property of the guider class object has been stored in `data_dict`, we need to empty the `output` to avoid exceeding the memory limit. For this purpose, we define a `single_output_clear` function. It returns an empty `output`, for example `None` or an empty list `[]`.\n\n## References \u0026 Acknowledgments\n\nThe repository was started from [Prompt-to-Prompt](https://github.com/google/prompt-to-prompt/).\n\n## Citation\n\nIf you use this code for your research, please cite our paper:\n```\n@article{titov2024guideandrescale,\n  title={Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing},\n  author={Vadim Titov and Madina Khalmatova and Alexandra Ivanova and Dmitry Vetrov and Aibek Alanov},\n  journal={arXiv preprint arXiv:2409.01322},\n  year={2024}\n}\n```\n","funding_links":[],"categories":["Text Guided Image Editing"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFusionBrainLab%2FGuide-and-Rescale","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFusionBrainLab%2FGuide-and-Rescale","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFusionBrainLab%2FGuide-and-Rescale/lists"}