{"id":13561777,"url":"https://github.com/phizaz/diffae","last_synced_at":"2025-05-16T10:07:10.079Z","repository":{"id":37698500,"uuid":"446713689","full_name":"phizaz/diffae","owner":"phizaz","description":"Official implementation of Diffusion Autoencoders","archived":false,"fork":false,"pushed_at":"2024-09-12T17:51:20.000Z","size":11462,"stargazers_count":910,"open_issues_count":50,"forks_count":144,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-07T23:13:35.209Z","etag":null,"topics":["autoencoder","cvpr2022","deep-learning","diffusion-models","ffhq","latent-variable-models","lsun"],"latest_commit_sha":null,"homepage":"https://diff-ae.github.io/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phizaz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-11T07:06:49.000Z","updated_at":"2025-04-03T09:04:21.000Z","dependencies_parsed_at":"2024-11-04T13:36:34.421Z","dependency_job_id":"fcb4186c-52af-4ffd-9a44-277368c3fc24","html_url":"https://github.com/phizaz/diffae","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phizaz%2Fdiffae","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phizaz%2Fdiffae/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phizaz%2Fdiffae/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phizaz%2Fdiffae/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/phizaz","download_url":"https://codeload.github.com/phizaz/diffae/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254509476,"owners_count":22082891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autoencoder","cvpr2022","deep-learning","diffusion-models","ffhq","latent-variable-models","lsun"],"created_at":"2024-08-01T13:01:01.028Z","updated_at":"2025-05-16T10:07:05.070Z","avatar_url":"https://github.com/phizaz.png","language":"Jupyter Notebook","readme":"# Official implementation of Diffusion Autoencoders\r\n\r\nA CVPR 2022 (ORAL) paper ([paper](https://openaccess.thecvf.com/content/CVPR2022/html/Preechakul_Diffusion_Autoencoders_Toward_a_Meaningful_and_Decodable_Representation_CVPR_2022_paper.html), [site](https://diff-ae.github.io/), [5-min video](https://youtu.be/i3rjEsiHoUU)):\r\n\r\n```\r\n@inproceedings{preechakul2021diffusion,\r\n      title={Diffusion Autoencoders: Toward a Meaningful and Decodable Representation}, \r\n      author={Preechakul, Konpat and Chatthee, Nattanat and Wizadwongsa, Suttisak and Suwajanakorn, Supasorn},\r\n      booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, \r\n      year={2022},\r\n}\r\n```\r\n\r\n## Usage\r\n\r\n⚙️ Try a Colab walkthrough: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/1OTfwkklN-IEd4hFk4LnweOleyDtS4XTh/view?usp=sharing)\r\n\r\n🤗 Try a web demo: 
[![Replicate](https://replicate.com/cjwbw/diffae/badge)](https://replicate.com/cjwbw/diffae)\r\n\r\nNote: Since the codebase is expected to change frequently, please fork the repo before using it.\r\n\r\n### Prerequisites\r\n\r\nSee `requirements.txt`\r\n\r\n```\r\npip install -r requirements.txt\r\n```\r\n\r\n### Quick start\r\n\r\nEach task has a dedicated Jupyter notebook.\r\n\r\nFor unconditional generation: `sample.ipynb`\r\n\r\nFor manipulation: `manipulate.ipynb`\r\n\r\nFor interpolation: `interpolate.ipynb`\r\n\r\nFor autoencoding: `autoencoding.ipynb`\r\n\r\nAligning your own images:\r\n\r\n1. Put images into the `imgs` directory\r\n2. Run `align.py` (requires `pip install dlib requests`)\r\n3. Aligned images will be available in the `imgs_align` directory\r\n\r\n\u003ctable\u003e\r\n\u003ctr\u003e\r\n\u003cth width=\"33%\"\u003e\r\nOriginal in \u003ccode\u003eimgs\u003c/code\u003e directory\u003cbr\u003e\u003cimg src=\"imgs/sandy.JPG\" style=\"width: 100%\"\u003e\r\n\u003c/th\u003e\r\n\u003cth width=\"33%\"\u003e\r\nAligned with \u003ccode\u003ealign.py\u003c/code\u003e\u003cbr\u003e\u003cimg src=\"imgs_align/sandy.png\" style=\"width: 100%\"\u003e\r\n\u003c/th\u003e\r\n\u003cth width=\"33%\"\u003e\r\nUsing \u003ccode\u003emanipulate.ipynb\u003c/code\u003e\u003cbr\u003e\u003cimg src=\"imgs_manipulated/sandy-wavyhair.png\" style=\"width: 100%\"\u003e\r\n\u003c/th\u003e\r\n\u003c/tr\u003e\r\n\u003c/table\u003e\r\n\r\n\r\n### Checkpoints\r\n\r\nWe provide checkpoints for the following models:\r\n\r\n1. DDIM: **FFHQ128** ([72M](https://drive.google.com/drive/folders/1-fa46UPSgy9ximKngBflgSj3u87-DLrw), [130M](https://drive.google.com/drive/folders/1-Sqes07fs1y9sAYXuYWSoDE_xxTtH4yx)), [**Bedroom128**](https://drive.google.com/drive/folders/1-_8LZd5inoAOBT-hO5f7RYivt95FbYT1), [**Horse128**](https://drive.google.com/drive/folders/10Hq3zIlJs9ZSiXDQVYuVJVf0cX4a_nDB)\r\n2. 
DiffAE (autoencoding only): [**FFHQ256**](https://drive.google.com/drive/folders/1-5zfxT6Gl-GjxM7z9ZO2AHlB70tfmF6V), **FFHQ128** ([72M](https://drive.google.com/drive/folders/10bmB6WhLkgxybkhso5g3JmIFPAnmZMQO), [130M](https://drive.google.com/drive/folders/10UNtFNfxbHBPkoIh003JkSPto5s-VbeN)), [**Bedroom128**](https://drive.google.com/drive/folders/12EdjbIKnvP5RngKsR0UU-4kgpPAaYtlp), [**Horse128**](https://drive.google.com/drive/folders/12EtTRXzQc5uPHscpjIcci-Rg-OGa_N30)\r\n3. DiffAE (with latent DPM, can sample): [**FFHQ256**](https://drive.google.com/drive/folders/1-H8WzKc65dEONN-DQ87TnXc23nTXDTYb), [**FFHQ128**](https://drive.google.com/drive/folders/11pdjMQ6NS8GFFiGOq3fziNJxzXU1Mw3l), [**Bedroom128**](https://drive.google.com/drive/folders/11mdxv2lVX5Em8TuhNJt-Wt2XKt25y8zU), [**Horse128**](https://drive.google.com/drive/folders/11k8XNDK3ENxiRnPSUdJ4rnagJYo4uKEo)\r\n4. DiffAE's classifiers (for manipulation): [**FFHQ256's latent on CelebAHQ**](https://drive.google.com/drive/folders/117Wv7RZs_gumgrCOIhDEWgsNy6BRJorg), [**FFHQ128's latent on CelebAHQ**](https://drive.google.com/drive/folders/11EYIyuK6IX44C8MqreUyMgPCNiEnwhmI)\r\n\r\nDownload the checkpoints and put them into a separate `checkpoints` directory. It should look like this:\r\n\r\n```\r\ncheckpoints/\r\n- bedroom128_autoenc\r\n    - last.ckpt # diffae checkpoint\r\n    - latent.ckpt # predicted z_sem on the dataset\r\n- bedroom128_autoenc_latent\r\n    - last.ckpt # diffae + latent DPM checkpoint\r\n- bedroom128_ddpm\r\n- ...\r\n```\r\n\r\n\r\n### LMDB Datasets\r\n\r\nWe do not own any of the following datasets; we provide ready-to-use LMDB versions for convenience.\r\n\r\n- [FFHQ](https://1drv.ms/f/s!Ar2O0vx8sW70uLV1Ivk2pTjam1A8VA)\r\n- [CelebAHQ](https://1drv.ms/f/s!Ar2O0vx8sW70uL4GMeWEciHkHdH6vQ)\r\n\r\n**Broken links**\r\n\r\nNote: I'm trying to recover the following links. 
\r\n\r\n- [CelebA](https://drive.google.com/drive/folders/1HJAhK2hLYcT_n0gWlCu5XxdZj-bPekZ0?usp=sharing)\r\n- [LSUN Bedroom](https://drive.google.com/drive/folders/1O_3aT3LtY1YDE2pOQCp6MFpCk7Pcpkhb?usp=sharing)\r\n- [LSUN Horse](https://drive.google.com/drive/folders/1ooHW7VivZUs4i5CarPaWxakCwfeqAK8l?usp=sharing)\r\n\r\nThe directory tree should be:\r\n\r\n```\r\ndatasets/\r\n- bedroom256.lmdb\r\n- celebahq256.lmdb\r\n- celeba.lmdb\r\n- ffhq256.lmdb\r\n- horse256.lmdb\r\n```\r\n\r\nYou can also download the datasets from their original sources and use our provided scripts to package them as LMDB files.\r\nThe original sources for each dataset are as follows:\r\n\r\n- FFHQ (https://github.com/NVlabs/ffhq-dataset)\r\n- CelebAHQ (https://github.com/switchablenorms/CelebAMask-HQ)\r\n- CelebA (https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)\r\n- LSUN (https://github.com/fyu/lsun)\r\n\r\nThe conversion scripts are:\r\n\r\n```\r\ndata_resize_bedroom.py\r\ndata_resize_celebhq.py\r\ndata_resize_celeba.py\r\ndata_resize_ffhq.py\r\ndata_resize_horse.py\r\n```\r\n\r\nGoogle drive: https://drive.google.com/drive/folders/1abNP4QKGbNnymjn8607BF0cwxX2L23jh?usp=sharing\r\n\r\n\r\n## Training\r\n\r\nWe provide scripts for training \u0026 evaluating DDIM and DiffAE (including the latent DPM) on the following datasets: FFHQ128, FFHQ256, Bedroom128, Horse128, Celeba64 (D2C's crop).\r\nThe evaluation results (FIDs) will usually be available in the `eval` directory.\r\n\r\nNote: Most experiments require at least 4x V100s to train the DPM models, while the accompanying latent DPM can be trained on a single 2080Ti. 
\r\n\r\n\r\n\r\n**FFHQ128**\r\n```\r\n# diffae\r\npython run_ffhq128.py\r\n# ddim\r\npython run_ffhq128_ddim.py\r\n```\r\n\r\nA classifier (for manipulation) can be trained using:\r\n```\r\npython run_ffhq128_cls.py\r\n```\r\n\r\n**FFHQ256**\r\n\r\nDue to the high computation cost, we only trained DiffAE. This requires 8x V100s.\r\n```\r\nsbatch run_ffhq256.py\r\n```\r\n\r\nAfter that task is done, train the latent DPM (requires only 1x 2080Ti):\r\n```\r\npython run_ffhq256_latent.py\r\n```\r\n\r\nA classifier (for manipulation) can be trained using:\r\n```\r\npython run_ffhq256_cls.py\r\n```\r\n\r\n**Bedroom128**\r\n\r\n```\r\n# diffae\r\npython run_bedroom128.py\r\n# ddim\r\npython run_bedroom128_ddim.py\r\n```\r\n\r\n**Horse128**\r\n\r\n```\r\n# diffae\r\npython run_horse128.py\r\n# ddim\r\npython run_horse128_ddim.py\r\n```\r\n\r\n**Celeba64**\r\n\r\nThis experiment can be run on 2080Tis.\r\n\r\n```\r\n# diffae\r\npython run_celeba64.py\r\n```\r\n","funding_links":[],"categories":["Jupyter Notebook","Papers"],"sub_categories":["Text-Image Generation"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphizaz%2Fdiffae","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphizaz%2Fdiffae","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphizaz%2Fdiffae/lists"}