{"id":26048566,"url":"https://github.com/joaolages/diffusers-interpret","last_synced_at":"2025-04-05T19:10:20.551Z","repository":{"id":58144957,"uuid":"529217421","full_name":"JoaoLages/diffusers-interpret","owner":"JoaoLages","description":"Diffusers-Interpret 🤗🧨🕵️‍♀️: Model explainability for 🤗 Diffusers. Get explanations for your generated images.","archived":false,"fork":false,"pushed_at":"2022-10-05T14:24:19.000Z","size":81301,"stargazers_count":274,"open_issues_count":2,"forks_count":14,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-29T18:06:56.538Z","etag":null,"topics":["computer-vision","deep-learning","diffusers","diffusion","explainable-ai","image-generation","interpretability","model-explainability","primary-attributions","pytorch","text2image","transformers"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JoaoLages.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-08-26T10:42:44.000Z","updated_at":"2025-03-10T06:18:41.000Z","dependencies_parsed_at":"2022-09-06T12:40:17.331Z","dependency_job_id":null,"html_url":"https://github.com/JoaoLages/diffusers-interpret","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoaoLages%2Fdiffusers-interpret","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoaoLages%2Fdiffusers-interpret/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoaoLages%2Fdiffusers-interpret/releases","manifests_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JoaoLages%2Fdiffusers-interpret/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JoaoLages","download_url":"https://codeload.github.com/JoaoLages/diffusers-interpret/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247386262,"owners_count":20930619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","diffusers","diffusion","explainable-ai","image-generation","interpretability","model-explainability","primary-attributions","pytorch","text2image","transformers"],"created_at":"2025-03-08T00:26:27.224Z","updated_at":"2025-04-05T19:10:20.523Z","avatar_url":"https://github.com/JoaoLages.png","language":"Jupyter Notebook","readme":"\u003cdiv align=\"center\"\u003e\r\n\r\n# Diffusers-Interpret 🤗🧨🕵️‍♀️\r\n\r\n![PyPI Latest Package Version](https://img.shields.io/pypi/v/diffusers-interpret?logo=pypi\u0026style=flat\u0026color=orange) ![GitHub License](https://img.shields.io/github/license/JoaoLages/diffusers-interpret?logo=github\u0026style=flat\u0026color=green) \r\n\r\n`diffusers-interpret` is a model explainability tool built on top of [🤗 Diffusers](https://github.com/huggingface/diffusers)\r\n\u003c/div\u003e\r\n\r\n## Installation\r\n\r\nInstall directly from PyPI:\r\n\r\n    pip install --upgrade diffusers-interpret\r\n\r\n## Usage\r\n\r\nLet's see how we can interpret the **[new 🎨🎨🎨 Stable 
Diffusion](https://github.com/huggingface/diffusers#new--stable-diffusion-is-now-fully-compatible-with-diffusers)!**\r\n\r\n1. [Explanations for StableDiffusionPipeline](#explanations-for-stablediffusionpipeline)\r\n2. [Explanations for StableDiffusionImg2ImgPipeline](#explanations-for-stablediffusionimg2imgpipeline)\r\n3. [Explanations for StableDiffusionInpaintPipeline](#explanations-for-stablediffusioninpaintpipeline)\r\n\r\n### Explanations for StableDiffusionPipeline\r\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JoaoLages/diffusers-interpret/blob/main/notebooks/stable_diffusion_example_colab.ipynb)\r\n\r\n```python\r\nimport torch\r\nfrom diffusers import StableDiffusionPipeline\r\nfrom diffusers_interpret import StableDiffusionPipelineExplainer\r\n\r\npipe = StableDiffusionPipeline.from_pretrained(\r\n    \"CompVis/stable-diffusion-v1-4\", \r\n    use_auth_token=True,\r\n    revision='fp16',\r\n    torch_dtype=torch.float16\r\n).to('cuda')\r\n\r\n# optional: reduce memory requirements with a speed trade-off\r\npipe.enable_attention_slicing()\r\n\r\n# pass the pipeline to the explainer class\r\nexplainer = StableDiffusionPipelineExplainer(pipe)\r\n\r\n# generate an image with `explainer`\r\nprompt = \"A cute corgi with the Eiffel Tower in the background\"\r\nwith torch.autocast('cuda'):\r\n    output = explainer(\r\n        prompt, \r\n        num_inference_steps=15\r\n    )\r\n```\r\n\r\nIf you are having GPU memory problems, try reducing `n_last_diffusion_steps_to_consider_for_attributions`, `height`, `width` and/or `num_inference_steps`:\r\n```python\r\noutput = explainer(\r\n    prompt, \r\n    num_inference_steps=15,\r\n    height=448,\r\n    width=448,\r\n    n_last_diffusion_steps_to_consider_for_attributions=5\r\n)\r\n```\r\n\r\nYou can deactivate token/pixel attribution computation entirely by passing `n_last_diffusion_steps_to_consider_for_attributions=0`.  
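For example, reusing the `pipe`, `explainer`, and `prompt` objects from the first snippet above, an attribution-free generation might look like this (a sketch; it still requires the CUDA setup shown earlier):

```python
# reuses `explainer` and `prompt` from the example above (assumption);
# n_last_diffusion_steps_to_consider_for_attributions=0 skips the
# token/pixel attribution computation entirely, saving GPU memory
with torch.autocast('cuda'):
    output = explainer(
        prompt,
        num_inference_steps=15,
        n_last_diffusion_steps_to_consider_for_attributions=0
    )
```

The generated image is still available via `output.image`; only the attribution outputs are skipped.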
\r\n\r\nGradient checkpointing also reduces GPU usage, but makes computations a bit slower:\r\n```python\r\nexplainer = StableDiffusionPipelineExplainer(pipe, gradient_checkpointing=True)\r\n```\r\n\r\nTo see the final generated image:\r\n```python\r\noutput.image\r\n```\r\n\r\n![](assets/corgi_eiffel_tower.png)\r\n\r\nYou can also check all the images that the diffusion process generated at the end of each step:\r\n```python\r\noutput.all_images_during_generation.show()\r\n```\r\n![](assets/image_slider_cropped.gif)\r\n\r\nTo analyse how a token in the input `prompt` influenced the generation, you can study the token attribution scores:\r\n```python\r\n\u003e\u003e\u003e output.token_attributions # (token, attribution)\r\n[('a', 1063.0526),\r\n ('cute', 415.62888),\r\n ('corgi', 6430.694),\r\n ('with', 1874.0208),\r\n ('the', 1223.2847),\r\n ('eiffel', 4756.4556),\r\n ('tower', 4490.699),\r\n ('in', 2463.1294),\r\n ('the', 655.4624),\r\n ('background', 3997.9395)]\r\n```\r\n\r\nOr their normalized version, in percentages:\r\n```python\r\n\u003e\u003e\u003e output.token_attributions.normalized # (token, attribution_percentage)\r\n[('a', 3.884),\r\n ('cute', 1.519),\r\n ('corgi', 23.495),\r\n ('with', 6.847),\r\n ('the', 4.469),\r\n ('eiffel', 17.378),\r\n ('tower', 16.407),\r\n ('in', 8.999),\r\n ('the', 2.395),\r\n ('background', 14.607)]\r\n```\r\n\r\nOr plot them!\r\n```python\r\noutput.token_attributions.plot(normalize=True)\r\n```\r\n![](assets/token_attributions_1.png)\r\n\r\n\r\n`diffusers-interpret` also computes these token/pixel attributions for generating a particular part of the image. 
\r\n\r\nTo do that, call `explainer` with a particular 2D bounding box defined in `explanation_2d_bounding_box`:\r\n\r\n```python\r\nwith torch.autocast('cuda'):\r\n    output = explainer(\r\n        prompt, \r\n        num_inference_steps=15, \r\n        explanation_2d_bounding_box=((70, 180), (400, 435)), # (upper left corner, bottom right corner)\r\n    )\r\noutput.image\r\n```\r\n![](assets/corgi_eiffel_tower_box_1.png)\r\n\r\nThe generated image now has a \u003cspan style=\"color:red\"\u003e **red bounding box** \u003c/span\u003e to indicate the region of the image that is being explained.\r\n\r\nThe attributions are now computed only for the specified area of the image.\r\n\r\n```python\r\n\u003e\u003e\u003e output.token_attributions.normalized # (token, attribution_percentage)\r\n[('a', 1.891),\r\n ('cute', 1.344),\r\n ('corgi', 23.115),\r\n ('with', 11.995),\r\n ('the', 7.981),\r\n ('eiffel', 5.162),\r\n ('tower', 11.603),\r\n ('in', 11.99),\r\n ('the', 1.87),\r\n ('background', 23.05)]\r\n```\r\n\r\n### Explanations for StableDiffusionImg2ImgPipeline\r\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JoaoLages/diffusers-interpret/blob/main/notebooks/stable_diffusion_img2img_example.ipynb)\r\n\r\n```python\r\nimport torch\r\nimport requests\r\nfrom PIL import Image\r\nfrom io import BytesIO\r\nfrom diffusers import StableDiffusionImg2ImgPipeline\r\nfrom diffusers_interpret import StableDiffusionImg2ImgPipelineExplainer\r\n\r\n\r\npipe = StableDiffusionImg2ImgPipeline.from_pretrained(\r\n    \"CompVis/stable-diffusion-v1-4\", \r\n    use_auth_token=True,\r\n).to('cuda')\r\n\r\nexplainer = StableDiffusionImg2ImgPipelineExplainer(pipe)\r\n\r\nprompt = \"A fantasy landscape, trending on artstation\"\r\n\r\n# let's download an initial image\r\nurl = \"https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg\"\r\n\r\nresponse = 
requests.get(url)\r\ninit_image = Image.open(BytesIO(response.content)).convert(\"RGB\")\r\ninit_image = init_image.resize((448, 448))\r\n\r\nwith torch.autocast('cuda'):\r\n    output = explainer(\r\n        prompt=prompt, init_image=init_image, strength=0.75\r\n    )\r\n```\r\n\r\n`output` will have all the properties that were presented for [StableDiffusionPipeline](#explanations-for-stablediffusionpipeline).\r\nFor example, to see the GIF version of all the images during generation:\r\n```python\r\noutput.all_images_during_generation.gif()\r\n```\r\n![](assets/img2img_1.gif)\r\n\r\nIt is also possible to visualize pixel attributions of the input image as a saliency map:\r\n```python\r\noutput.input_saliency_map.show()\r\n```\r\n![](assets/pixel_attributions_1.png)\r\n\r\nor access their values directly:\r\n```python\r\n\u003e\u003e\u003e output.pixel_attributions\r\narray([[ 1.2714844 ,  4.15625   ,  7.8203125 , ...,  2.7753906 ,\r\n         2.1308594 ,  0.66552734],\r\n       [ 5.5078125 , 11.1953125 ,  4.8125    , ...,  5.6367188 ,\r\n         6.8828125 ,  3.0136719 ],\r\n       ...,\r\n       [ 0.21386719,  1.8867188 ,  2.2109375 , ...,  3.0859375 ,\r\n         2.7421875 ,  0.7871094 ],\r\n       [ 0.85791016,  0.6694336 ,  1.71875   , ...,  3.8496094 ,\r\n         1.4589844 ,  0.5727539 ]], dtype=float32)\r\n```\r\nor the normalized version:\r\n```python\r\n\u003e\u003e\u003e output.pixel_attributions.normalized \r\narray([[7.16054201e-05, 2.34065039e-04, 4.40411852e-04, ...,\r\n        1.56300011e-04, 1.20002325e-04, 3.74801020e-05],\r\n       [3.10180156e-04, 6.30479713e-04, 2.71022669e-04, ...,\r\n        3.17439699e-04, 3.87615233e-04, 1.69719147e-04],\r\n       ...,\r\n       [1.20442292e-05, 1.06253210e-04, 1.24512037e-04, ...,\r\n        1.73788882e-04, 1.54430119e-04, 4.43271674e-05],\r\n       [4.83144104e-05, 3.77000870e-05, 9.67938031e-05, ...,\r\n        2.16796136e-04, 8.21647482e-05, 3.22554370e-05]], 
dtype=float32)\r\n```\r\n\r\n**Note:** Passing `explanation_2d_bounding_box` to the `explainer` will also change these values to explain a specific part of the **output** image. \r\n\u003cins\u003eThe attributions are always calculated for the model's input (image and text) with respect to the output image.\u003c/ins\u003e\r\n\r\n### Explanations for StableDiffusionInpaintPipeline\r\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JoaoLages/diffusers-interpret/blob/main/notebooks/stable_diffusion_inpaint_example.ipynb)\r\n\r\nSame as [StableDiffusionImg2ImgPipeline](#explanations-for-stablediffusionimg2imgpipeline), but now we also pass a `mask_image` argument to `explainer`.\r\n\r\n```python\r\nimport torch\r\nimport requests\r\nfrom PIL import Image\r\nfrom io import BytesIO\r\nfrom diffusers import StableDiffusionInpaintPipeline\r\nfrom diffusers_interpret import StableDiffusionInpaintPipelineExplainer\r\n\r\n\r\ndef download_image(url):\r\n    response = requests.get(url)\r\n    return Image.open(BytesIO(response.content)).convert(\"RGB\")\r\n\r\n\r\npipe = StableDiffusionInpaintPipeline.from_pretrained(\r\n    \"CompVis/stable-diffusion-v1-4\", \r\n    use_auth_token=True,\r\n).to('cuda')\r\n\r\nexplainer = StableDiffusionInpaintPipelineExplainer(pipe)\r\n\r\nprompt = \"a cat sitting on a bench\"\r\n\r\nimg_url = \"https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png\"\r\nmask_url = \"https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png\"\r\n\r\ninit_image = download_image(img_url).resize((448, 448))\r\nmask_image = download_image(mask_url).resize((448, 448))\r\n\r\nwith torch.autocast('cuda'):\r\n    output = explainer(\r\n        prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75\r\n    )\r\n```\r\n\r\n`output` will have 
all the properties that were presented for [StableDiffusionImg2ImgPipeline](#explanations-for-stablediffusionimg2imgpipeline) and [StableDiffusionPipeline](#explanations-for-stablediffusionpipeline).  \r\nFor example, to see the GIF version of all the images during generation:\r\n```python\r\noutput.all_images_during_generation.gif()\r\n```\r\n![](assets/inpaint_1.gif)\r\n\r\nThe only difference in `output` is that we can now see the masked part of the image:\r\n```python\r\noutput.input_saliency_map.show()\r\n```\r\n![](assets/pixel_attributions_inpaint_1.png)\r\n\r\nCheck out other functionalities and more implementation examples [here](https://github.com/JoaoLages/diffusers-interpret/blob/main/notebooks/).\r\n\r\n## Future Development\r\n- [x] ~~Add interactive display of all the images that were generated in the diffusion process~~\r\n- [x] ~~Add explainer for StableDiffusionImg2ImgPipeline~~\r\n- [x] ~~Add explainer for StableDiffusionInpaintPipeline~~\r\n- [ ] Add attention visualization\r\n- [ ] Add unit tests\r\n- [ ] Website for documentation\r\n- [ ] Do not require another generation every time the `explanation_2d_bounding_box` argument is changed\r\n- [ ] Add interactive bounding-box and token attributions visualization\r\n- [ ] Add more explainability methods\r\n\r\n## Contributing\r\nFeel free to open an [Issue](https://github.com/JoaoLages/diffusers-interpret/issues) or create a [Pull Request](https://github.com/JoaoLages/diffusers-interpret/pulls) and let's get started 🚀\r\n\r\n## Credits\r\n\r\nA special thanks to:\r\n- [@andrewizbatista](https://github.com/andrewizbatista) for creating a great [image slider](https://github.com/JoaoLages/diffusers-interpret/pull/1) to show all the generated images during diffusion! 
💪 \r\n- [@TomPham97](https://github.com/TomPham97) for README improvements, the [GIF visualization](https://github.com/JoaoLages/diffusers-interpret/pull/9) and the [token attributions plot](https://github.com/JoaoLages/diffusers-interpret/pull/13) 😁\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoaolages%2Fdiffusers-interpret","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoaolages%2Fdiffusers-interpret","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoaolages%2Fdiffusers-interpret/lists"}