{"id":21504954,"url":"https://github.com/Roblox/SmoothCache","last_synced_at":"2025-07-16T00:32:13.432Z","repository":{"id":264366796,"uuid":"880566939","full_name":"Roblox/SmoothCache","owner":"Roblox","description":"Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching.","archived":false,"fork":false,"pushed_at":"2025-03-21T22:22:01.000Z","size":33,"stargazers_count":44,"open_issues_count":5,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-11T09:59:01.537Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Roblox.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-30T00:31:43.000Z","updated_at":"2025-06-05T01:29:50.000Z","dependencies_parsed_at":"2024-11-23T19:02:33.106Z","dependency_job_id":"ae4eef75-4768-49cd-b767-5f5073ff3cfd","html_url":"https://github.com/Roblox/SmoothCache","commit_stats":null,"previous_names":["roblox/smoothcache"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Roblox/SmoothCache","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FSmoothCache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FSmoothCache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FSmoothCache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox
%2FSmoothCache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Roblox","download_url":"https://codeload.github.com/Roblox/SmoothCache/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FSmoothCache/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265419254,"owners_count":23761842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T19:01:43.071Z","updated_at":"2025-07-16T00:32:13.421Z","avatar_url":"https://github.com/Roblox.png","language":"Python","funding_links":[],"categories":["Train-Free"],"sub_categories":[],"readme":"\u003c!-- \u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/Roblox/SmoothCache/blob/main/assets/TeaserFigureFlat.png\" width=\"100%\" \u003e\u003c/img\u003e\n  \u003cbr\u003e\n  \u003cem\u003e\n      (Accelerating Diffusion Transformer inference across multiple modalities with 50 DDIM Steps on DiT-XL-256x256, 100 DPM-Solver++(3M) SDE steps for a 10s audio sample (spectrogram shown) on Stable Audio Open, 30 Rectified Flow steps on Open-Sora 480p 2s videos) \n  \u003c/em\u003e\n\u003c/div\u003e\n\u003cbr\u003e --\u003e\n\n![Accelerating Diffusion Transformer inference across multiple modalities with 50 DDIM Steps on DiT-XL-256x256, 100 DPM-Solver++(3M) SDE steps for a 10s audio sample (spectrogram shown) on Stable Audio Open, 30 Rectified Flow steps on Open-Sora 480p 2s videos](assets/TeaserFigureFlat.png)\n\n**Figure 1. 
Accelerating Diffusion Transformer inference across multiple modalities with 50 DDIM Steps on DiT-XL-256x256, 100 DPM-Solver++(3M) SDE steps for a 10s audio sample (spectrogram shown) on Stable Audio Open, 30 Rectified Flow steps on Open-Sora 480p 2s videos**\n\n\n# Updates\n\n## Release v0.1\n\n[View release notes for v0.1](https://github.com/Roblox/SmoothCache/releases/tag/v0.1)\n\nSmoothCache now supports generating cache schedules using a zero-intrusion external helper. See [run_calibration.py](./examples/run_calibration.py) to find out how it generates a schedule compatible with [HuggingFace Diffusers DiTPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/dit/pipeline_dit.py), without requiring any changes to the Diffusers implementation!\n\n\n# Introduction\nWe introduce **SmoothCache**, a straightforward acceleration technique for DiT architecture models that is **training-free, flexible, and performant**. By leveraging layer-wise representation error, our method identifies redundancies in the diffusion process and generates a static caching scheme that reuses output feature maps, thereby reducing the need for computationally expensive operations. This solution works across different models and modalities, can be easily dropped into existing Diffusion Transformer pipelines, can be stacked on different solvers, and requires no additional training or datasets. **SmoothCache** consistently outperforms various solvers designed to accelerate the diffusion process, while matching or surpassing the performance of existing modality-specific caching techniques.\n\n\u003e 🥯[[arXiv]](https://arxiv.org/abs/2411.10510)\n\n![Illustration of SmoothCache. When the layer representation loss obtained from the calibration pass is below some threshold α, the corresponding layer is cached and used in place of the same computation on a future timestep. 
The figure on the left shows how the layer representation error impacts whether certain layers are eligible for caching. The error of the attention (attn) layer is higher in earlier timesteps, so our schedule caches the later timesteps accordingly. The figure on the right shows the application of the caching schedule to the DiT-XL architecture. The output of the attn layer at time t − 1 is cached and re-used in place of computing FFN t − 2, since the corresponding error is below α. This cached output is introduced in the model using the properties of the residual connection.](assets/SmoothCache2.png)\n\n## Quick Start\n\n### Install\n```bash\npip install dit-smoothcache\n```\n\n### Usage - Inference\n\nInspired by [DeepCache](https://raw.githubusercontent.com/horseee/DeepCache), we have implemented drop-in SmoothCache helper classes that apply easily to the [Hugging Face Diffusers DiTPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/dit) and the [original DiT implementation](https://github.com/facebookresearch/DiT).\n\nGenerally, only 3 additional lines need to be added to the original sampler script:\n```python\nfrom SmoothCache import \u003cDESIREDCacheHelper\u003e\ncache_helper = \u003cDESIREDCacheHelper\u003e(\u003cMODEL_HANDLER\u003e, schedule=schedule)\ncache_helper.enable()\n# Original sampler code.\ncache_helper.disable()\n```\n\n#### Usage example with Hugging Face Diffusers DiTPipeline\n```python\nimport json\nimport torch\nfrom diffusers import DiTPipeline, DPMSolverMultistepScheduler\n\n# Import the SmoothCache helper for Diffusers pipelines\nfrom SmoothCache import DiffuserCacheHelper  \n\n# Load the DiT pipeline and scheduler\npipe = DiTPipeline.from_pretrained(\"facebook/DiT-XL-2-256\", torch_dtype=torch.float16)\npipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)\npipe = pipe.to(\"cuda\")\n\n# Load the cache schedule, then initialize the DiffuserCacheHelper with the model\nwith open(\"smoothcache_schedules/50-N-3-threshold-0.35.json\", \"r\") as f:\n    
schedule = json.load(f)\ncache_helper = DiffuserCacheHelper(pipe.transformer, schedule=schedule)\n\n# Enable the caching helper\ncache_helper.enable()\n# Prepare the input\nwords = [\"Labrador retriever\"]\nclass_ids = pipe.get_label_ids(words)\n\n# Generate images with the pipeline\ngenerator = torch.manual_seed(33)\nimage = pipe(class_labels=class_ids, num_inference_steps=50, generator=generator).images[0]\n\n# Restore the original forward method and disable the helper\n# disable() should be paired up with enable() \ncache_helper.disable()\n```\n\n#### Usage example with original DiT implementation\n```python\nimport torch\n\ntorch.backends.cuda.matmul.allow_tf32 = True\ntorch.backends.cudnn.allow_tf32 = True\nfrom torchvision.utils import save_image\nfrom diffusion import create_diffusion\nfrom diffusers.models import AutoencoderKL\nfrom download import find_model\nfrom models import DiT_models\nimport argparse\nfrom SmoothCache import DiTCacheHelper  # Import DiTCacheHelper\nimport json\n\n# Setup PyTorch:\ntorch.manual_seed(args.seed)\ntorch.set_grad_enabled(False)\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\nif args.ckpt is None:\n    assert (\n        args.model == \"DiT-XL/2\"\n    ), \"Only DiT-XL/2 models are available for auto-download.\"\n    assert args.image_size in [256, 512]\n    assert args.num_classes == 1000\n\n# Load model:\nlatent_size = args.image_size // 8\nmodel = DiT_models[args.model](\n    input_size=latent_size, num_classes=args.num_classes\n).to(device)\nckpt_path = args.ckpt or f\"DiT-XL-2-{args.image_size}x{args.image_size}.pt\"\nstate_dict = find_model(ckpt_path)\nmodel.load_state_dict(state_dict)\nmodel.eval()  # important!\nwith open(\"smoothcache_schedules/50-N-3-threshold-0.35.json\", \"r\") as f:\n    schedule = json.load(f)\ncache_helper = DiTCacheHelper(model, schedule=schedule)\n\n# number of timesteps should be consistent with provided schedules\ndiffusion = 
create_diffusion(str(len(schedule[cache_helper.components_to_wrap[0]])))\n\n# Enable the caching helper\ncache_helper.enable()\n\n# Sample images:\nsamples = diffusion.p_sample_loop(\n    model.forward_with_cfg,\n    z.shape,\n    z,\n    clip_denoised=False,\n    model_kwargs=model_kwargs,\n    progress=True,\n    device=device,\n)\nsamples, _ = samples.chunk(2, dim=0)  # Remove null class samples\nsamples = vae.decode(samples / 0.18215).sample\n\n# Disable the caching helper after sampling\ncache_helper.disable()\n# Save and display images:\nsave_image(samples, \"sample.png\", nrow=4, normalize=True, value_range=(-1, 1))\n```\n\n### Usage - Cache Schedule Generation\nSee [run_calibration.py](./examples/run_calibration.py), which generates a schedule for the self-attention module ([attn1](https://github.com/huggingface/diffusers/blob/37a5f1b3b69ed284086fb31fb1b49668cba6c365/src/diffusers/models/attention.py#L380)) \nof the Diffusers [BasicTransformerBlock](https://github.com/huggingface/diffusers/blob/37a5f1b3b69ed284086fb31fb1b49668cba6c365/src/diffusers/models/attention.py#L261C7-L261C28). \n\nNote that only self-attention, and not cross-attention, is enabled in the stock config of the Diffusers [DiT module](https://github.com/huggingface/diffusers/blob/37a5f1b3b69ed284086fb31fb1b49668cba6c365/src/diffusers/models/transformers/dit_transformer_2d.py#L72-L73). We leave this behavior\nas-is to keep the integration minimally intrusive. \n\nWe welcome all contributions aimed at expanding SmoothCache's model coverage and module coverage. \n\n## Visualization\n\n### 256x256 Image Generation Task\n\n![Mosaic Image](assets/dit-mosaic.png)\n\n\n\n## Evaluation\n\n### Image Generation with DiT-XL/2-256x256\n![Table 1. Results For DiT-XL-256x256 using DDIM Sampling.\nNote that L2C is not training-free](assets/table1.png)\n\n### Video Generation with OpenSora\n![Table 2. 
Results For OpenSora on Rectified Flow](assets/table2.png)\n\n### Audio Generation with Stable Audio Open\n![Table 3. Results For Stable Audio Open on DPMSolver++(3M) SDE on 3 datasets](assets/table3.png)\n\n\n# License\nSmoothCache is licensed under the [Apache-2.0](LICENSE) license.\n\n## Bibtex\n```\n@misc{liu2024smoothcacheuniversalinferenceacceleration,\n      title={SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers}, \n      author={Joseph Liu and Joshua Geddes and Ziyu Guo and Haomiao Jiang and Mahesh Kumar Nandwana},\n      year={2024},\n      eprint={2411.10510},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2411.10510}, \n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRoblox%2FSmoothCache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRoblox%2FSmoothCache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRoblox%2FSmoothCache/lists"}