{"id":18113069,"url":"https://github.com/mingukkang/elatentlpips","last_synced_at":"2025-04-04T11:17:00.096Z","repository":{"id":257810106,"uuid":"847865943","full_name":"mingukkang/elatentlpips","owner":"mingukkang","description":"Author's Implementation for E-LatentLPIPS","archived":false,"fork":false,"pushed_at":"2024-11-05T15:35:47.000Z","size":14514,"stargazers_count":139,"open_issues_count":3,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-28T10:08:46.312Z","etag":null,"topics":["deep-learning","diffusion-distillation","latent-diffusion-models","one-step-diffusion-model","perceptual","perceptual-metrics","pytorch"],"latest_commit_sha":null,"homepage":"https://mingukkang.github.io/Diffusion2GAN/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mingukkang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-26T17:38:38.000Z","updated_at":"2025-03-17T10:56:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"d1293f2f-946a-4722-a49f-7d1202321e99","html_url":"https://github.com/mingukkang/elatentlpips","commit_stats":null,"previous_names":["mingukkang/elatentlpips"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mingukkang%2Felatentlpips","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mingukkang%2Felatentlpips/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mingukkang%2Felatentlpips/releases","manifests_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mingukkang%2Felatentlpips/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mingukkang","download_url":"https://codeload.github.com/mingukkang/elatentlpips/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247166171,"owners_count":20894654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","diffusion-distillation","latent-diffusion-models","one-step-diffusion-model","perceptual","perceptual-metrics","pytorch"],"created_at":"2024-11-01T02:01:05.747Z","updated_at":"2025-04-04T11:17:00.080Z","avatar_url":"https://github.com/mingukkang.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# E-LatentLPIPS [[Project Page]](https://mingukkang.github.io/Diffusion2GAN/)\n\n**Diffusion2GAN: Distilling Diffusion Models into Conditional GANs [(paper)](https://arxiv.org/abs/2405.05967)**\n\n[Minguk Kang](https://mingukkang.github.io/), [Richard Zhang](https://richzhang.github.io/), [Connelly Barnes](https://www.connellybarnes.com/work/), [Sylvain Paris](https://research.adobe.com/person/sylvain-paris/), [Suha Kwak](https://suhakwak.github.io/), [Jaesik Park](https://jaesik.info/), [Eli Shechtman](https://research.adobe.com/person/eli-shechtman/), [Jun-Yan Zhu](https://www.cs.cmu.edu/~junyanz/), [Taesung Park](https://taesung.me/). 
In [ECCV](https://arxiv.org/abs/2405.05967), 2024.\n\nThis repository contains the author’s re-implementation of E-LatentLPIPS, written from memory.\n\n# Notice\n\nWe found that earlier versions of elatentlpips have an issue with Distributed Data Parallel (DDP) training. Specifically, torch.load() creates multiple processes on the main GPU instead of distributing GPU memory across devices. If you plan to use elatentlpips with DDP training, upgrade to the latest version:\n\n`pip install elatentlpips --upgrade`\n\n\n# What is E-LatentLPIPS?\n\nE-LatentLPIPS is a perceptual distance metric that operates directly in the latent space of a Latent Diffusion Model, allowing users to calculate perceptual distances for regression tasks without decoding the latents into pixel space. This bypasses the costly decoding process required by LPIPS, offering a 9.7× speedup and reducing memory usage.\n\n![grid](assets/elatentlpips.png)\n\n\n![grid](assets/perception.png)\n\n# Models\n\nWe provide E-LatentLPIPS for five different Latent Diffusion Models: [SD1.5](https://github.com/runwayml/stable-diffusion), [SD2.1](https://github.com/Stability-AI/stablediffusion), [SDXL](https://github.com/Stability-AI/generative-models), [SD3](https://stability.ai/news/stable-diffusion-3), and [FLUX](https://github.com/black-forest-labs/flux).\n\n- `SD1.5 E-LatentLPIPS`: operates in the 4-channel latent space\n- `SD2.1 E-LatentLPIPS`: operates in the 4-channel latent space (identical to SD1.5)\n- `SDXL E-LatentLPIPS`: operates in the 4-channel latent space\n- `SD3 E-LatentLPIPS`: operates in the 16-channel latent space\n- `FLUX E-LatentLPIPS`: operates in the 16-channel latent space\n\nEach model supports the use of E-LatentLPIPS within its specific latent space configuration.\n\n## Setup\n\nOption 1: Install using pip:\n\n`pip install elatentlpips`\n\nOption 2: Clone our repo and install dependencies.\n\n```\ngit clone https://github.com/mingukkang/elatentlpips.git\ncd elatentlpips\npip install -r 
requirements.txt\n```\n\n## Quick Start\n\n```python\nimport torch\nfrom diffusers import AutoencoderKL\nfrom elatentlpips import ELatentLPIPS\n\n# Load the VAE (encoder and decoder) from the FLUX model\n# To use the latest FLUX encoder, update your diffusers package to the latest version:\n# 'pip install --upgrade diffusers[torch]'\nvae = AutoencoderKL.from_pretrained(\"black-forest-labs/FLUX.1-dev\", subfolder=\"vae\").to(\"cuda\")\n\n# Initialize E-LatentLPIPS with the specified encoder model (options: sd15, sd21, sdxl, sd3, flux)\n# The 'augment' parameter can be set to one of the following: b, bg, bgc, bgco\nelatentlpips = ELatentLPIPS(encoder=\"flux\", augment='bg').to(\"cuda\").eval()\n\n# Create two placeholder images (real inputs must be RGB and normalized to the range [-1, 1])\nimage0 = torch.zeros(1, 3, 512, 512).to(\"cuda\")  # First image (RGB, normalized)\nimage1 = torch.zeros(1, 3, 512, 512).to(\"cuda\")  # Second image (RGB, normalized)\n\n# Encode the images into the latent space using the VAE\nlatent0 = vae.encode(image0).latent_dist.sample()  # Encoded latent for image0\nlatent1 = vae.encode(image1).latent_dist.sample()  # Encoded latent for image1\n\n# Compute the perceptual distance between the two latent representations\n# Note: Set `normalize=True` if the latents (latent0 and latent1) are not already normalized\n# by `vae.config.scaling_factor` and `vae.config.shift_factor`.\ndistance = elatentlpips(latent0, latent1, normalize=True).mean()\n```\n\n## Performance\n\n\u003cdiv align=\"center\"\u003e\n\n| **Perceptual Metric**     | **ImageNet Top-1 Acc.** | **BAPPS 2AFC Traditional** | **BAPPS 2AFC CNN** |\n|---------------------------|-------------------------|----------------------------|--------------------|\n| LPIPS                     | 73.36                   | 73.36                      | 82.20              |\n| SD1.5-LatentLPIPS (paper) | 68.26                   | 74.29                      | 81.99              
|\n| SD1.5-LatentLPIPS         | 69.91                   | 73.68                      | 81.77              |\n| SD2.1-LatentLPIPS         | 69.91                   | 73.68                      | 81.77              |\n| SDXL-LatentLPIPS          | 68.90                   | 71.33                      | 81.10              |\n| SD3-LatentLPIPS           | 71.10                   | 76.15                      | 82.63              |\n| FLUX-LatentLPIPS          | 66.18                   | 75.00                      | 82.47              |\n\n\u003c/div\u003e\n\n## Overfitting experiment\nWe perform overfitting experiments as outlined in Appendix B of the [Diffusion2GAN paper](https://mingukkang.github.io/Diffusion2GAN/static/paper/diffusion2gan_arxiv_v2.pdf). We observed that LatentLPIPS tends to exhibit a more favorable optimization landscape when using a latent space with more channels, such as those of SD3 and FLUX.\n\nCommand: `CUDA_VISIBLE_DEVICES=0 python3 overfitting_exp.py --encoder sd15` (replace `sd15` with `flux` to use the FLUX encoder)\n\n![grid](assets/overfitting_exp.png)\n\nWe use the following notations: `b` for pixel blitting, `g` for geometric transformations, `c` for color transformations, and `o` for cutout. 
Thus, `E-LatentLPIPS (bg)` refers to ensembled LatentLPIPS with pixel blitting and geometric transformations applied as augmentations.\n\n\n## E-LatentLPIPS thanks the following repos for sharing their code\n\nLPIPS (BSD-2-Clause license): https://github.com/richzhang/PerceptualSimilarity\n\nDifferentiable Augmentation (MIT license): https://github.com/mit-han-lab/data-efficient-gans\n\nADA (NVIDIA source code license): https://github.com/NVlabs/stylegan2-ada-pytorch\n\n## License\nE-LatentLPIPS is an open-source library under the [CC-BY-NC license](https://github.com/mingukkang/elatentlpips/blob/main/LICENSE).\n\n## Citation\nIf you find E-LatentLPIPS useful in your research, please cite our work:\n```bib\n@inproceedings{kang2024diffusion2gan,\n  author    = {Kang, Minguk and Zhang, Richard and Barnes, Connelly and Paris, Sylvain and Kwak, Suha and Park, Jaesik and Shechtman, Eli and Zhu, Jun-Yan and Park, Taesung},\n  title     = {{Distilling Diffusion Models into Conditional GANs}},\n  booktitle = {European Conference on Computer Vision (ECCV)},\n  year      = {2024},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmingukkang%2Felatentlpips","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmingukkang%2Felatentlpips","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmingukkang%2Felatentlpips/lists"}