{"id":13487909,"url":"https://github.com/chenwu98/unified-generative-zoo","last_synced_at":"2025-03-27T23:32:05.165Z","repository":{"id":61110148,"uuid":"548417343","full_name":"ChenWu98/unified-generative-zoo","owner":"ChenWu98","description":"[ICCV 2023] https://arxiv.org/abs/2210.05559","archived":false,"fork":false,"pushed_at":"2022-12-13T17:36:40.000Z","size":18201,"stargazers_count":119,"open_issues_count":0,"forks_count":5,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-10-30T23:36:02.520Z","etag":null,"topics":["3d-models","diffusion-models","generative-adversarial-networks","generative-models","image-synthesis","score-based-generative-models","text-to-image"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ChenWu98.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-09T14:26:23.000Z","updated_at":"2024-10-21T09:19:17.000Z","dependencies_parsed_at":"2023-01-28T13:47:07.632Z","dependency_job_id":null,"html_url":"https://github.com/ChenWu98/unified-generative-zoo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenWu98%2Funified-generative-zoo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenWu98%2Funified-generative-zoo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenWu98%2Funified-generative-zoo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenWu98%2Funified-generative-zoo/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ChenWu98","download_url":"https://codeload.github.com/ChenWu98/unified-generative-zoo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944020,"owners_count":20697945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-models","diffusion-models","generative-adversarial-networks","generative-models","image-synthesis","score-based-generative-models","text-to-image"],"created_at":"2024-07-31T18:01:06.299Z","updated_at":"2025-03-27T23:32:00.152Z","avatar_url":"https://github.com/ChenWu98.png","language":"Python","funding_links":[],"categories":["Text Guided Image Editing"],"sub_categories":[],"readme":"# A Unified Interface for Guiding Generative Models (2D/3D GANs, Diffusion Models, and Their Variants)\n\nOfficial PyTorch implementation of (**Section 4.3** of) our paper \u003cbr\u003e\n**Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance** \u003cbr\u003e\nChen Henry Wu, Fernando De la Torre \u003cbr\u003e\nCarnegie Mellon University \u003cbr\u003e\n_Preprint, Oct 2022_\n\n[**[Paper link]**](https://arxiv.org/abs/2210.05559)\n\n## Updates\n**[Oct 13, 2022]** Code released.\n\n## Notes\n1. **Sections 4.1** and **4.2** of this paper are open-sourced at [CycleDiffusion](https://github.com/ChenWu98/cycle-diffusion).\n2. The code is based on [Generative Visual Prompt](https://github.com/ChenWu98/Generative-Visual-Prompt).\n3. Feel free to email me if you think I should cite your work! 
\n\n## Overview\n\nGANs, VAEs, and normalizing flows are usually characterized as deterministic mappings from **isotropic Gaussian** latent codes to images. \nWe show that it is possible to unify various diffusion models into this formulation. \nThis allows us to guide (or condition, control) various generative models in a **unified, plug-and-play manner** by leveraging latent-space energy-based models (EBMs). \nThis repository provides a unified interface for guiding various generative models with CLIP, classifiers, and face IDs. \n\nModels studied in this paper (some of them are not included here; please check [CycleDiffusion](https://github.com/ChenWu98/cycle-diffusion)):\n\n\u003cdiv align=center\u003e\n    \u003cimg src=\"docs/models.png\" align=\"middle\" width=750\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nAn illustration of generative models as deterministic mappings from isotropic Gaussian latent codes to images. \n\n\u003cdiv align=center\u003e\n    \u003cimg src=\"docs/convert.png\" align=\"middle\" width=750\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nInterestingly, we find that different models represent subpopulations and individuals in different ways, although most of them are trained on the same data. 
\n\n\u003cdiv align=center\u003e\n    \u003cimg src=\"docs/samples.png\" align=\"middle\" width=750\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=center\u003e\n    \u003cimg src=\"docs/ids.png\" align=\"middle\" width=725\u003e\n\u003c/div\u003e\n\n## Contents\n\n- [A Unified Interface for Guiding Generative Models (2D/3D GANs, Diffusion Models, and Their Variants)](#a-unified-interface-for-guiding-generative-models-2d3d-gans-diffusion-models-and-their-variants)\n  - [Updates](#updates)\n  - [TODOs](#todos)\n  - [Notes](#notes)\n  - [Overview](#overview)\n  - [Contents](#contents)\n  - [Dependencies](#dependencies)\n  - [Pre-trained checkpoints](#pre-trained-checkpoints)\n    - [Pre-trained generative models](#pre-trained-generative-models)\n    - [Off-the-shelf models for guidance](#off-the-shelf-models-for-guidance)\n  - [Usage](#usage)\n    - [Overview](#overview-1)\n    - [CLIP guidance for sampling sub-populations](#clip-guidance-for-sampling-sub-populations)\n    - [Classifier guidance for sampling sub-populations](#classifier-guidance-for-sampling-sub-populations)\n    - [ID guidance for sampling individuals](#id-guidance-for-sampling-individuals)\n  - [Citation](#citation)\n  - [License](#license)\n  - [Contact](#contact)\n\n## Dependencies\n\n1. Create environment by running\n```shell\nconda env create -f environment.yml\nconda activate generative_prompt\npip install git+https://github.com/openai/CLIP.git\n```\n2. Install `torch` and `torchvision` based on your CUDA version. \n3. Install [PyTorch 3D](https://github.com/facebookresearch/pytorch3d). Installing this library can be painful, but you can skip it if you are not using 3D GANs.\n4. Install [taming-transformers](https://github.com/CompVis/taming-transformers) by running\n```shell\ncd ../\ngit clone git@github.com:CompVis/taming-transformers.git\ncd taming-transformers/\npip install -e .\ncd ../\n```\n5. Set up [wandb](https://wandb.ai/) for logging (registration is required). 
You should modify the ```setup_wandb``` function in ```main.py``` to accommodate your wandb credentials. You may want to run something like\n```shell\nwandb login\n```\n\n## Pre-trained checkpoints\n### Pre-trained generative models\nWe provide a unified interface for various pre-trained generative models. Checkpoints for generative models used in this paper are provided below. \n1. StyleGAN2\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/iy0dkqnkx7uh2aq/ffhq.pt\nwget https://www.dropbox.com/s/lmjdijm8cfmu8h1/metfaces.pt\nwget https://www.dropbox.com/s/z1vts069w683py5/afhqcat.pt\nwget https://www.dropbox.com/s/a0hvdun57nvafab/stylegan2-church-config-f.pt\nwget https://www.dropbox.com/s/x1d19u8zd6yegx9/stylegan2-car-config-f.pt\nwget https://www.dropbox.com/s/hli2x42ekdaz2br/landscape.pt\n```\n2. StyleNeRF\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/dtqsroh95uquwoc/StyleNeRF_ffhq_256.pkl\nwget https://www.dropbox.com/s/klbuhqfv74q7e35/StyleNeRF_ffhq_512.pkl\nwget https://www.dropbox.com/s/n80cr7isveh5yfu/StyleNeRF_ffhq_1024.pkl\n```\n3. Extended Analytic DPM\n```shell\ncd ckpts/\nmkdir extended_adpm\ncd extended_adpm/\nwget https://www.dropbox.com/s/r8210seh6ekhogf/celeba64_ema_eps_epsc_pretrained_190000.ckpt.pth\nwget https://www.dropbox.com/s/6o5etzhgbihr0yh/celeba64_ema_eps_eps2_pretrained_340000.ckpt.pth\nwget https://www.dropbox.com/s/o0jw5ezai1e1z3v/celeba64_ema_eps.ckpt.pth\nwget https://www.dropbox.com/s/0axtykkvyz49hrw/celeba64_ema_eps.ms_eps.pth\n```\n4. StyleGAN-XL\n```shell\n# StyleGAN-XL will be downloaded automatically. \n```\n5. StyleSwin\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/f0nlvu6fh3bbpmd/StyleSwin_FFHQ_1024.pt\nwget https://www.dropbox.com/s/c2812gumbyxj751/StyleSwin_FFHQ_256.pt\n```\n6. 
StyleSDF\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/epet782zdu0hazx/stylesdf_ffhq_vol_renderer.pt\nwget https://www.dropbox.com/s/p0ptofh7sku2o8j/stylesdf_ffhq1024x1024.pt\nwget https://www.dropbox.com/s/rq756clx14a9kgd/stylesdf_afhq_vol_renderer.pt\nwget https://www.dropbox.com/s/hu5wgr40vyptzx6/stylesdf_afhq512x512.pt\nwget https://www.dropbox.com/s/8rsaxzmey64jugo/stylesdf_sphere_init.pt\n```\n7. Diffusion Autoencoder\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/ej0jj8g7crvtb5e/diffae_ffhq256.ckpt\nwget https://www.dropbox.com/s/w5y89y57r9nd1jt/diffae_ffhq256_latent.pkl\nwget https://www.dropbox.com/s/rsbpxaswnfzsyl1/diffae_ffhq128.ckpt\nwget https://www.dropbox.com/s/v1dvsj6oklpz652/diffae_ffhq128_latent.pkl\n```\n8. Latent Diffusion Model\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/9lpdgs83l7tjk6c/ldm_models.zip\nunzip ldm_models.zip\n```\n9. NVAE\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/bwwtszb5g5alw30/nvae_ffhq_256.pt\nwget https://www.dropbox.com/s/8dfryaandkmoxzz/nvae_celebahq_256.pt\n```\n10. EG3D\n```shell\ncd ckpts/\nwget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/research/eg3d/versions/1/zip -O eg3d.zip\nunzip eg3d.zip\n```\n11. Denoising Diffusion GAN\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/lsfkbln9u78rbhs/ddgan_celebahq256_netG_550.pth\n```\n12. GIRAFFE-HD\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/jj03hto6o9rnbha/giraffehd_ffhq_1024.pt\n```\n13. Diffusion-GAN\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/25ryma8et4ohmjq/diffusion-stylegan2-ffhq.pkl\n```\n\n### Off-the-shelf models for guidance\n1. CLIP\n```text\n# CLIP will be downloaded automatically\n```\n2. ArcFace IR-SE 50 model, provided by the Colab demo in [this repo](https://github.com/orpatashnik/StyleCLIP)\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/qg7co4azsv5sacm/model_ir_se50.pth\n```\n3. 
CelebA classifier, trained by [this repo](https://github.com/ChenWu98/Generative-Visual-Prompt)\n```shell\ncd ckpts/\nwget https://www.dropbox.com/s/yzc8ydaa4ggj1zs/celeba.zip\nunzip celeba.zip \n```\n\n## Usage\n\n### Overview\n\nEach set notation `{A,B,C}` stands for several independent experiments. \nYou should always replace `{A,B,C}` with one of `A`, `B`, and `C`. \nModel checkpoints and image samples will be saved under `--output_dir`. \n\n### CLIP guidance for sampling sub-populations\n1. ```Generative model``` $\\in$ ```{LDM-DDIM, DiffAE, Diffusion-GAN, StyleGAN-XL, StyleGAN2, StyleNeRF, StyleSDF, EG3D, GIRAFFE-HD, StyleSwin, NVAE}```.\n2. ```Dataset and resolution``` $\\in$ ```{FFHQ1024, FFHQ512, FFHQ256, FFHQ128}```.\n3. ```Text description``` $\\in$ ```{\"a photo of a baby\", \"a photo of an old person\", \"a photo of a person with eyeglasses\", \"a photo of a person with eyeglasses and a yellow hat\"}```.\n4. ```Guidance strength``` $\\lambda_{\\text{CLIP}}$ $\\in$ ```{100, 300, 500, 700, 1000}```. In the following command, ```_500``` is omitted. \n5. Note that not all combinations of ```Generative model``` $\\times$ ```Dataset and resolution``` are available. Please check the paper and [available configs](config/experiments) for details. 
\n```shell\nexport CUDA_VISIBLE_DEVICES=0\nexport RUN_NAME=clip_{a_baby,an_old_person,a_person_with_eyeglasses,a_person_with_eyeglasses_and_a_yellow_hat}_{ffhq1024,ffhq512,ffhq256,ffhq128}_{styleganxl,stylegan2,styleswin,stylenerf,latentdiff_5step,latentdiff_10step,diffae_3step_3step_latent_only,stylesdf,stylegan2_no_trunc,stylesdf_no_trunc,styleswin_no_trunc,styleganxl_no_trunc,stylenerf_no_trunc,nvae,eg3d,eg3d_no_trunc,giraffehd,diffae_3step_3step_both,diffusion_stylegan2,diffusion_stylegan2_no_trunc,diffae_10step_10step_both,}_langevin{,_100,_300,_700,_1000}\nexport SEED=42\nnohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1410 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true \u003e $RUN_NAME$SEED.log 2\u003e\u00261 \u0026\n```\n\n### Classifier guidance for sampling sub-populations\n```shell\n# DDGAN CelebAHQ256 old\nexport CUDA_VISIBLE_DEVICES=0\nexport RUN_NAME=class_old_celebahq256_ddgan_langevin\nexport SEED=42\nnohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1430 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model ClassEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 
--num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true \u003e $RUN_NAME$SEED.log 2\u003e\u00261 \u0026\n\n# DDGAN CelebAHQ256 eyeglasses\nexport CUDA_VISIBLE_DEVICES=0\nexport RUN_NAME=class_eyeglasses_celebahq256_ddgan_langevin\nexport SEED=42\nnohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1430 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model ClassEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true \u003e $RUN_NAME$SEED.log 2\u003e\u00261 \u0026\n\n# SN-DPM DDPM CelebA64 old\nexport CUDA_VISIBLE_DEVICES=0\nexport RUN_NAME=class_old_celeba64_sn_dpm_ddpm_langevin\nexport SEED=42\nnohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1430 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model ClassEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 
--eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true \u003e $RUN_NAME$SEED.log 2\u003e\u00261 \u0026\n\n# SN-DPM DDPM CelebA64 eyeglasses\nexport CUDA_VISIBLE_DEVICES=0\nexport RUN_NAME=class_eyeglasses_celeba64_sn_dpm_ddpm_langevin\nexport SEED=42\nnohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1430 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model ClassEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true \u003e $RUN_NAME$SEED.log 2\u003e\u00261 \u0026\n```\n\n\n### ID guidance for sampling individuals\n1. ```ID reference``` $\\in$ ```{00001, 00002, 00015, 00018}```.\n2. ```Generative model``` $\\in$ ```{LDM-DDIM, DiffAE, StyleGAN-XL, StyleGAN2, DDGAN, EG3D, GIRAFFE-HD}```.\n3. ```Dataset and resolution``` $\\in$ ```{FFHQ1024, FFHQ512, FFHQ256, CelebAHQ256}```.\n4. Note that not all combinations of ```Generative model``` $\\times$ ```Dataset and resolution``` are available. Please check the paper and [available configs](config/experiments) for details. 
\n\n```shell\nexport CUDA_VISIBLE_DEVICES=0\nexport RUN_NAME=recon_id_{00001,00002,00015,00018}_{ffhq256,ffhq512,ffhq1024,celebahq256}_{latentdiff,diffae_10steps_10steps_both,giraffehd,stylegan2,styleganxl,ddgan,eg3d}_langevin\nexport SEED=42\nnohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1420 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 8 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true \u003e $RUN_NAME$SEED.log 2\u003e\u00261 \u0026\n```\n\n\n\n## Citation\nIf you find this repository helpful, please cite as\n```\n@inproceedings{unifydiffusion2022,\n  title={Unifying Diffusion Models' Latent Space, with Applications to {CycleDiffusion} and Guidance},\n  author={Chen Henry Wu and Fernando De la Torre},\n  booktitle={ArXiv},\n  year={2022},\n}\n```\nDo not forget to cite the original papers that proposed these models! \n\n## License\nWe use the X11 License. This license is identical to the MIT License, but with an extra sentence that prohibits using the copyright holders' names (Carnegie Mellon University in our case) for advertising or promotional purposes without written permission.\n\n\n\n\n## Contact\n[Issues](https://github.com/ChenWu98/unified-generative-zoo/issues) are welcome if you have any questions about the code. 
\nIf you would like to discuss the method, please contact [Chen Henry Wu](https://github.com/ChenWu98).\n\n\u003ca href=\"https://github.com/ChenWu98\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/28187501?v=4\"  width=\"50\" /\u003e\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenwu98%2Funified-generative-zoo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchenwu98%2Funified-generative-zoo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenwu98%2Funified-generative-zoo/lists"}