{"id":20216183,"url":"https://github.com/thudm/relaydiffusion","last_synced_at":"2025-04-07T05:10:50.704Z","repository":{"id":192728104,"uuid":"687065217","full_name":"THUDM/RelayDiffusion","owner":"THUDM","description":"The official implementation of \"Relay Diffusion: Unifying diffusion process across resolutions for image synthesis\" [ICLR 2024 Spotlight]","archived":false,"fork":false,"pushed_at":"2024-04-29T09:29:51.000Z","size":21147,"stargazers_count":296,"open_issues_count":2,"forks_count":19,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-03-30T21:09:59.145Z","etag":null,"topics":["diffusion-models","generative-model","image-synthesis","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-04T14:28:18.000Z","updated_at":"2025-03-27T02:46:54.000Z","dependencies_parsed_at":"2024-04-29T10:52:50.980Z","dependency_job_id":null,"html_url":"https://github.com/THUDM/RelayDiffusion","commit_stats":null,"previous_names":["thudm/relaydiffusion"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FRelayDiffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FRelayDiffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FRelayDiffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/THUDM%2FRelayDiffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM","download_url":"https://codeload.github.com/THUDM/RelayDiffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247595335,"owners_count":20963943,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","generative-model","image-synthesis","machine-learning"],"created_at":"2024-11-14T06:26:42.329Z","updated_at":"2025-04-07T05:10:50.683Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Relay Diffusion: Unifying diffusion process across resolutions for image synthesis \u003cbr\u003e\u003csub\u003eOfficial Pytorch Implementation 🌐[[WiseModel]](https://www.wisemodel.cn/models/ZhipuAI/RelayDiffsuon/intro) 🌐[[Model Scope]](https://www.modelscope.cn/models/ZhipuAI/RelayDiffusion/summary)\u003c/sub\u003e\n\n🎉**News!** The paper of RelayDiffusion has been accepted by ICLR 2024 (**Spotlight**)!\n\n![](resources/samples.jpg)\n\nWe propose ***Relay Diffusion Model (RDM)*** as a better framework for diffusion generation. ***RDM*** transfers a low-resolution image or noise into an equivalent high-resolution one via blurring diffusion and block noise. 
Therefore, the diffusion process can continue seamlessly in any new resolution or model without restarting from pure noise or low-resolution conditioning.\n\nRDM achieved **state-of-the-art** FID on CelebA-HQ and sFID on ImageNet-256 (FID=1.87)!\n\nFor a formal introduction, read our paper: [Relay Diffusion: Unifying diffusion process across resolutions for image synthesis](https://arxiv.org/abs/2309.03350).\n\n## Setup\n\n### Environment\n\nDownload the repo and set up the environment with:\n\n```bash\ngit clone https://github.com/THUDM/RelayDiffusion.git\ncd RelayDiffusion\nconda env create -f environment.yml\nconda activate rdm\n```\n\nWe enable `xformers.ops.memory_efficient_attention` to reduce training cost by about 15%. If you do not need it, you can remove `xformers` from `environment.yml`.\n\nLinux servers with NVIDIA A100s are recommended. However, by setting a smaller `--batch-gpu` (batch size on a single GPU), you can still run the inference and training scripts on less powerful GPUs.\n\n### Dataset\n\nWe preprocess and implement datasets in the same format as [EDM](https://github.com/NVlabs/edm). For CelebA-HQ, follow [*Progressive Growing of GANs for Improved Quality, Stability, and Variation*](https://github.com/tkarras/progressive_growing_of_gans) to construct the high-quality subset of CelebA. 
For ImageNet, download data from the [official site](https://www.kaggle.com/c/imagenet-object-localization-challenge/overview/description).\n\nTo convert the original data to organized data ready for training at $64\\times 64$ or $256\\times 256$ resolution, run:\n\n```bash\npython dataset_tool.py \\\n\t--source=/path/to/original/data \\\n\t--dest=/path/to/output/data.zip \\\n\t--transform=center-crop \\\n\t--resolution=64x64 # or --resolution=256x256\n```\n\n## Inference \u0026 Evaluation\n\n### Sample Generation\n\nTo generate samples from RDM models, run:\n\n```bash\ntorchrun --standalone --nproc_per_node=1 generate.py --sampler_stages=both --outdir=/path/to/output/dir/ \\\n    --network_first=/path/to/1st/ckpt --network_second=/path/to/2nd/ckpt\n```\n\nTo generate $N$ images, set `--seed=[K]-[K+N-1]` with a randomly picked $K$. You can set `--nproc_per_node` to the number of GPUs to enable parallel generation across multiple GPUs.\n\nIf you want to generate final samples from first-stage results (using only the second-stage model), set `--sampler_stages=second` and specify the input directory of first-stage results with `--indir`.\n\nThe configuration arguments for the first stage are:\n\n- `num_steps_first`: number of sampling steps.\n- `sigma_min_first` \u0026 `sigma_max_first`: lowest \u0026 highest noise level.\n- `rho_first`: time step exponent.\n- `cfg_scale_first`: scale of classifier-free guidance.\n- `S_churn`: stochasticity strength.\n- `S_min` \u0026 `S_max`: min \u0026 max noise level.\n- `S_noise`: noise inflation.\n\nThe configuration arguments for the second stage are:\n\n- `num_steps_second`: number of sampling steps.\n- `sigma_min_second` \u0026 `sigma_max_second`: lowest \u0026 highest noise level.\n- `blur_sigma_max_second`: maximum sigma of blurring schedule.\n- `rho_second`: time step exponent.\n- `cfg_scale_second`: scale of classifier-free guidance.\n- `up_scale_second`: scale of upsampling.\n- `truncation_sigma_second` \u0026 
`truncation_t_second`: truncation point of noise \u0026 time schedule.\n- `s_block_second`: strength of block noise addition.\n- `s_noise_second`: strength of stochasticity.\n\n\n### Evaluation Metrics\n\nWe quantitatively measure the sample quality by metrics including **Fréchet Inception Distance (FID)**, **spatial FID (sFID)**, **Inception Score (IS)**, **Precision** and **Recall**. For sFID, IS, Precision and Recall, we reformat the calculation pipeline based on the formulation in `tensorflow` from [ADM](https://github.com/openai/guided-diffusion).\n\nFirst, run the following commands to generate activation data files from the samples and the dataset:\n\n```bash\ntorchrun --standalone --nproc_per_node=1 evaluate.py activations --data=/sample/dir/ --dest=eval-refs/activations_sample.npz --batch=64 # build sample activations\ntorchrun --standalone --nproc_per_node=1 evaluate.py activations --data=/path/to/dataset.zip --dest=eval-refs/activations_ref.npz --batch=64 # build reference activations\n```\n\nThen, to calculate metrics from the pre-built activations, run:\n\n```bash\ntorchrun --standalone --nproc_per_node=1 evaluate.py calc --batch=64 \\\n    --activations_sample=eval-refs/activations_sample.npz \\\n    --activations_ref=eval-refs/activations_ref.npz \\\n    [-m fid] [-m sfid] [-m is] [-m pr]  # assign metrics to be calculated\n```\n\n### Performance Reproduction\n\nRDM achieves competitive results in comparison with previous SoTA models:\n\n| Dataset   | Resolution | Training Samples | FID  | sFID |   IS   | Precision | Recall |\n| --------- | ---------- | ---------------- | :--: | :--: | :----: | :-------: | :----: |\n| CelebA-HQ | 256x256    | 47M              | 3.15 |  -   |   -    |   0.77    |  0.55  |\n| ImageNet  | 256x256    | 1250M            | 1.87 | 3.97 | 278.75 |   0.81    |  0.59  |\n\nWe provide the best pre-trained checkpoints of RDM and their sampler settings for reproducing performance:\n\n- CelebA-HQ $256\\times 256$:\n\n  Download 
checkpoints of [first stage](https://cloud.tsinghua.edu.cn/f/8e8e4b2743fe4447b497/?dl=1) and [second stage](https://cloud.tsinghua.edu.cn/f/b8cd559a0e9f4b9abd39/?dl=1), place them in `ckpts/`, then generate samples and their activations with:\n\n  ```bash\n  torchrun --standalone --nproc_per_node=8 generate_celebahq.py --outdir=generations/celebahq_samples/ \\\n      --network_first=ckpts/celebahq_first_stage.pt \\\n      --network_second=ckpts/celebahq_second_stage.pt\n  torchrun --standalone --nproc_per_node=1 evaluate.py activations \\\n      --data=generations/celebahq_samples/ --dest=eval-refs/celebahq_act_sample.npz \n  ```\n\n  Generate activation data from the CelebA-HQ zip or download our version from [here](https://cloud.tsinghua.edu.cn/f/a26f714e36304c3e948d/?dl=1):\n\n  ```bash\n  torchrun --standalone --nproc_per_node=1 evaluate.py activations \\\n      --data=datasets/celebahq-256x256.zip --dest=eval-refs/celebahq_act_ref.npz \n  ```\n\n  Calculate metrics with:\n\n  ```bash\n  python evaluate.py calc -m fid -m pr \\\n      --activations_sample=eval-refs/celebahq_act_sample.npz \\\n      --activations_ref=eval-refs/celebahq_act_ref.npz\n  ```\n\n- ImageNet $256\\times 256$:\n\n  Download checkpoints of [first stage](https://cloud.tsinghua.edu.cn/f/c9a0ab6341704ed0be55/?dl=1) and [second stage](https://cloud.tsinghua.edu.cn/f/b5915d0b7d994e86b4bb/?dl=1), place them in `ckpts/`, then generate samples and their activations with:\n\n  ```bash\n  torchrun --standalone --nproc_per_node=8 generate_imagenet.py --outdir=generations/imagenet_samples/ \\\n      --network_first=ckpts/imagenet_first_stage.pkl \\\n      --network_second=ckpts/imagenet_second_stage.pt\n  torchrun --standalone --nproc_per_node=1 evaluate.py activations \\\n      --data=generations/imagenet_samples/ --dest=eval-refs/imagenet_act_sample.npz \n  ```\n\n  Generate activation data from the ImageNet zip: (The activation data of 1.28M images is up to 40GB, which is too big to upload. 
We only upload mu and sigma of [reference](https://cloud.tsinghua.edu.cn/f/8b98f215ae3a4977978c/?dl=1) and [samples](https://cloud.tsinghua.edu.cn/f/2b0f6fc577744fa19f1b/?dl=1) for calculating FID here. For other metrics, you need to generate the activations yourself.)\n\n  ```bash\n  torchrun --standalone --nproc_per_node=1 evaluate.py activations \\\n      --data=datasets/imagenet-256x256.zip --dest=eval-refs/imagenet_act_ref.npz \n  ```\n\n  Calculate FID, sFID and IS with:\n\n  ```bash\n  python evaluate.py calc -m fid -m sfid -m is \\\n      --activations_sample=eval-refs/imagenet_act_sample.npz \\\n      --activations_ref=eval-refs/imagenet_act_ref.npz\n  ```\n\n  For the calculation of Precision and Recall on ImageNet, we follow [ADM](https://github.com/openai/guided-diffusion) and use 10k reference samples. You can download the activation data we produced from [here](https://cloud.tsinghua.edu.cn/f/924f9878ddc340bcb09c/?dl=1). Then run the following command:\n\n  ```bash\n  python evaluate.py calc -m pr \\\n      --activations_sample=eval-refs/imagenet_act_sample.npz \\\n      --activations_ref=eval-refs/imagenet_act_1w_ref.npz\n  ```\n\n## Training\n\nYou can follow the instructions of [EDM](https://github.com/NVlabs/edm) to train a new first-stage model (standard diffusion). Using ImageNet as an example, run:\n\n```bash\ntorchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs --data=datasets/imagenet-64x64.zip --eff-attn=True \\\n\t--cond=1 --batch=4096  --batch-gpu=32 --lr=1e-4 --ema=50 --dropout=0.1 --fp16=1 --ls=25 \\\n\t--arch=adm --precond=edm\n```\n\nIf you want to train a second-stage model (blurring diffusion), set the argument `--precond=blur` along with the other arguments that configure blurring diffusion. 
The command will be:\n\n```bash\ntorchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs --data=datasets/imagenet-256x256.zip --eff-attn=True \\\n\t--cond=1 --batch=4096  --batch-gpu=8 --lr=1e-4 --dropout=0.1 --fp16=1 --ls=1 \\\n\t--arch=adm --precond=blur --up-scale=4 --block-scale=0.15 --prob-length=0.93 --blur-sigma-max=3.0\n```\n\nAs for CelebA-HQ, train a first stage model with:\n\n```bash\ntorchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs --data=datasets/CelebA-HQ-64x64.zip --eff-attn=True \\\n\t--cond=0 --batch=1024  --batch-gpu=32 --lr=1e-4 --dropout=0.15 --augment=0.2 --ls=1 \\\n\t--arch=adm --precond=edm\n```\n\nAnd for training a second stage model:\n\n```bash\ntorchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs --data=datasets/CelebA-HQ-256x256.zip --eff-attn=True \\\n\t--cond=0 --batch=1024  --batch-gpu=8 --lr=1e-4 --dropout=0.2 --augment=0.2 --fp16=1 --ls=1 \\\n\t--arch=adm --precond=blur --up-scale=4 --block-scale=0.15 --prob-length=0.89 --blur-sigma-max=2.0\n```\n\n## Citation\n\n```\n@article{teng2023relay,\n  title={Relay Diffusion: Unifying diffusion process across resolutions for image synthesis},\n  author={Teng, Jiayan and Zheng, Wendi and Ding, Ming and Hong, Wenyi and Wangni, Jianqiao and Yang, Zhuoyi and Tang, Jie},\n  journal={arXiv preprint arXiv:2309.03350},\n  year={2023}\n}\n```\n\n## Acknowledgements\n\nThis implementation is based on https://github.com/NVlabs/edm (codebase of EDM). Thanks a lot!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Frelaydiffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthudm%2Frelaydiffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Frelaydiffusion/lists"}