{"id":17714365,"url":"https://github.com/czg1225/AsyncDiff","last_synced_at":"2025-03-13T23:30:48.606Z","repository":{"id":243919350,"uuid":"808473951","full_name":"czg1225/AsyncDiff","owner":"czg1225","description":"Official implementation of \"AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising\"","archived":false,"fork":false,"pushed_at":"2024-08-14T14:03:38.000Z","size":67810,"stargazers_count":130,"open_issues_count":5,"forks_count":6,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-08-14T15:33:39.666Z","etag":null,"topics":["diffusion-models","distributed-computing","efficient-inference","inference-acceleration","stable-diffusion","text-to-image","text-to-video","training-free"],"latest_commit_sha":null,"homepage":"https://czg1225.github.io/asyncdiff_page/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/czg1225.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-31T06:33:07.000Z","updated_at":"2024-08-14T14:03:41.000Z","dependencies_parsed_at":"2024-08-14T15:42:13.279Z","dependency_job_id":null,"html_url":"https://github.com/czg1225/AsyncDiff","commit_stats":null,"previous_names":["czg1225/asyncdiff"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czg1225%2FAsyncDiff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czg1225%2FAsyncDiff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czg1225%2FAsyncDiff/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czg1225%2FAsyncDiff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/czg1225","download_url":"https://codeload.github.com/czg1225/AsyncDiff/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221419982,"owners_count":16817490,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","distributed-computing","efficient-inference","inference-acceleration","stable-diffusion","text-to-image","text-to-video","training-free"],"created_at":"2024-10-25T11:02:22.744Z","updated_at":"2025-03-13T23:30:48.285Z","avatar_url":"https://github.com/czg1225.png","language":"Python","funding_links":[],"categories":["Train-Free"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/logo-modified.png\" width=\"23%\"\u003e \u003cbr\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eAsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising\u003c/h1\u003e\n\n  \u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://opensource.org/licenses/Apache-2.0\"\u003e\n    \u003cimg alt=\"License: Apache 2.0\" src=\"https://img.shields.io/badge/License-Apache%202.0-4E94CE.svg\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/abs/2406.06911\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Conference-NeurIPS'24-924E7D.svg\" alt=\"Paper\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://czg1225.github.io/asyncdiff_page/\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/badge/Project-Page-FFB000.svg\" alt=\"Project\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pytorch.org/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/PyTorch-%3E=v2.0.1-EE4C2C.svg\" alt=\"PyTorch\u003e=v2.0.1\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\n\u003e **AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising**   \n\u003e [Zigeng Chen](https://github.com/czg1225), [Xinyin Ma](https://horseee.github.io/), [Gongfan Fang](https://fangggf.github.io/), [Zhenxiong Tan](https://github.com/Yuanshi9815), [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)   \n\u003e [xML Lab](https://sites.google.com/view/xml-nus), National University of Singapore  \n\u003e 🥯[[Paper]](https://arxiv.org/abs/2406.06911)🎄[[Project Page]](https://czg1225.github.io/asyncdiff_page/) \\\n\u003e Code Contributors: [Zigeng Chen](https://github.com/czg1225), [Zhenxiong Tan](https://github.com/Yuanshi9815)\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/combined.png\" width=\"100%\" \u003e\u003c/img\u003e\n  \u003cbr\u003e\n  \u003cem\u003e\n      2.8x Faster on SDXL with 4 devices. Top: 50 step original (13.81s). Bottom: 50 step AsyncDiff (4.98s)\n  \u003c/em\u003e\n\u003c/div\u003e\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/combined.gif\" width=\"100%\" \u003e\u003c/img\u003e\n  \u003cbr\u003e\n  \u003cem\u003e\n      1.8x Faster on AnimateDiff with 2 devices. Top: 50 step original (43.5s). Bottom: 50 step AsyncDiff (24.5s)\n  \u003c/em\u003e\n\u003c/div\u003e\n\u003cbr\u003e\n\n### Updates\n* :tada: **September 26, 2024**: Our AsyncDiff is accepted by NeurIPS 2024!\n* 🚀 **August 14, 2024**: Now supporting Stable Diffusion XL Inpainting! 
The inference sample of accelerating SDXL Inpainting can be found at [run_sdxl_inpaint.py](https://github.com/czg1225/AsyncDiff/blob/main/examples/run_sdxl_inpaint.py).\n* 🚀 **July 18, 2024**: Now supporting Stable Diffusion 3 Medium! The inference sample of accelerating SD 3 can be found at [run_sd3.py](https://github.com/czg1225/AsyncDiff/blob/main/examples/run_sd3.py).\n* 🚀 **June 18, 2024**: Now supporting ControlNet! The inference sample of accelerating controlnet+SDXL can be found at [run_sdxl_controlnet.py](https://github.com/czg1225/AsyncDiff/blob/main/examples/run_sdxl_controlnet.py).\n* 🚀 **June 17, 2024**: Now supporting Stable Diffusion x4 Upscaler! The inference sample can be found at [run_sd_upscaler.py](https://github.com/czg1225/AsyncDiff/blob/main/examples/run_sd_upscaler.py).\n* 🚀 **June 12, 2024**: Code of AsyncDiff is released.\n\n### Supported Diffusion Models:\n- ✅ [Stable Diffusion 3 Medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers)\n- ✅ [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1)\n- ✅ [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5)\n- ✅ [Stable Diffusion x4 Upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) \n- ✅ [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) \n- ✅ [Stable Diffusion XL Inpainting](https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1) \n- ✅ [ControlNet](https://huggingface.co/docs/diffusers/using-diffusers/controlnet#text-to-image) \n- ✅ [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)\n- ✅ [AnimateDiff](https://huggingface.co/docs/diffusers/api/pipelines/animatediff)\n\n## Introduction\nWe introduce **AsyncDiff**, a universal and plug-and-play diffusion acceleration scheme that enables model parallelism across multiple devices. 
Our approach divides the cumbersome noise prediction model into multiple components and assigns each to a different device. To break the dependency chain between these components, it transforms conventional sequential denoising into an asynchronous process by exploiting the high similarity between hidden states in consecutive diffusion steps. As a result, the components can compute in parallel on separate devices. This strategy significantly reduces inference latency with minimal impact on generative quality.\n\n\n![AsyncDiff Overview](assets/fig2.png)\nThe figure above gives an overview of the asynchronous denoising process. The denoising model εθ is divided into four components for clarity. After the warm-up stage, each component’s input is prepared in advance, which breaks the dependency chain and enables parallel processing.\n\n## 🔧 Quick Start\n\n### Installation\n- Prerequisites\n\n  NVIDIA GPU + CUDA \u003e= 12.0 and corresponding cuDNN\n\n\n- Create environment:\n\n  ```shell\n  conda create -n asyncdiff python=3.10\n  conda activate asyncdiff\n  pip install -r requirements.txt\n  ```\n\n### Usage Example\nSimply add two lines of code to enable asynchronous parallel inference for the diffusion model.\n```python\nimport torch\nimport torch.distributed as dist\nfrom diffusers import StableDiffusionPipeline\nfrom asyncdiff.async_sd import AsyncDiff\n\npipeline = StableDiffusionPipeline.from_pretrained(\"stabilityai/stable-diffusion-2-1\",\n    torch_dtype=torch.float16, use_safetensors=True, low_cpu_mem_usage=True)\n\n# Wrap the pipeline for asynchronous parallel inference\nasync_diff = AsyncDiff(pipeline, model_n=2, stride=1, time_shift=False)\n\n# Reset the asynchronous state and set the number of warm-up steps\nasync_diff.reset_state(warm_up=1)\nimage = pipeline(\u003cprompts\u003e).images[0]\n# Save the result from the rank-0 process only\nif dist.get_rank() == 0:\n    image.save(\"output.jpg\")\n```\nHere, we use the Stable Diffusion pipeline as an example. You can replace `pipeline` with any variant of the Stable Diffusion pipeline, such as SD 2.1, SD 1.5, SDXL, or SVD. 
We also provide the implementation of AsyncDiff for AnimateDiff in `asyncdiff.async_animate`.\n* `model_n`: Number of components into which the denoising model is divided. Options: 2, 3, or 4.\n* `stride`: Denoising stride of each parallel computing batch. Options: 1 or 2.\n* `warm_up`: Number of steps for the warm-up stage. More warm-up steps bring the output closer to pixel-level consistency with the original output, at a slight cost in processing speed.\n* `time_shift`: Enables time shifting. Setting `time_shift` to `True` can enhance the denoising capability of the diffusion model, but it should generally remain `False`. Only enable `time_shift` when the accelerated model produces images or videos with significant noise.\n\n\n\n## Inference\nWe offer detailed scripts in `examples/` for accelerating inference of SD 2.1, SD 1.5, SDXL, SD 3, ControlNet, SD x4 Upscaler, AnimateDiff, and SVD using our AsyncDiff framework.\n\n### 🚀 Accelerate Stable Diffusion XL:\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --run-path examples/run_sdxl.py\n```\n\n### 🚀 Accelerate Stable Diffusion 2.1 or 1.5:\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --run-path examples/run_sd.py\n```\n\n### 🚀 Accelerate Stable Diffusion 3 Medium:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_sd3.py\n```\n\n### 🚀 Accelerate Stable Diffusion x4 Upscaler:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_sd_upscaler.py\n```\n\n### 🚀 Accelerate SDXL Inpainting:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_sdxl_inpaint.py\n```\n\n### 🚀 Accelerate ControlNet+SDXL:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_sdxl_controlnet.py\n```\n\n### 🚀 Accelerate Animate 
Diffusion:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_animatediff.py\n```\n\n### 🚀 Accelerate Stable Video Diffusion:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_svd.py\n```\n\n## Qualitative Results\nQualitative results on SDXL and SD 2.1. More qualitative results can be found in our [paper](https://arxiv.org/abs/2406.06911).\n![Qualitative Results](assets/qualitative.png)\n\n![Qualitative Results](assets/qualitative2.png)\n\n## Quantitative Results\nQuantitative evaluations of **AsyncDiff** on three text-to-image diffusion models under various configurations. More quantitative results can be found in our [paper](https://arxiv.org/abs/2406.06911).\n![Quantitative Results](assets/quantitative.png)\n\n## BibTeX\n```\n@article{chen2024asyncdiff,\n  title={AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising},\n  author={Chen, Zigeng and Ma, Xinyin and Fang, Gongfan and Tan, Zhenxiong and Wang, Xinchao},\n  journal={arXiv preprint arXiv:2406.06911},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczg1225%2FAsyncDiff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fczg1225%2FAsyncDiff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczg1225%2FAsyncDiff/lists"}