https://github.com/modeltc/harmonica
[ICML 2025] This is the official PyTorch implementation of "HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration".
- Host: GitHub
- URL: https://github.com/modeltc/harmonica
- Owner: ModelTC
- License: apache-2.0
- Created: 2025-05-01T13:03:28.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-07-10T21:22:59.000Z (3 months ago)
- Topics: acceleration, diffusion-models, diffusion-transformer, dit, feature-caching, icml, icml-2025, pixart, pixart-sigma
- Language: Python
- Homepage: https://arxiv.org/pdf/2410.01723
- Size: 7.93 MB
- Stars: 39
- Watchers: 5
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [arXiv](https://arxiv.org/pdf/2410.01723) | [GitHub](https://github.com/ModelTC/HarmoniCa)

**[ [Conference Paper](https://arxiv.org/abs/2410.01723) | [Slides](assets/slides.pdf) | [Poster](assets/poster.pdf) ]**
[Yushi Huang*](https://github.com/Harahan), [Zining Wang*](https://scholar.google.com/citations?user=hOXoacgAAAAJ&hl=en), [Ruihao Gong📧](https://xhplus.github.io/), [Jing Liu](https://jing-liu.com/), [Xinjie Zhang](https://xinjie-q.github.io/), [Jinyang Guo](https://jinyangguo.github.io/), [Xianglong Liu](https://xlliu-beihang.github.io/), [Jun Zhang📧](https://eejzhang.people.ust.hk/)

(* denotes equal contribution, 📧 denotes corresponding author.)
This is the official implementation of our paper [HarmoniCa](https://arxiv.org/pdf/2410.01723), a novel training-based framework that sets a new state of the art for block-wise caching of diffusion transformers. It achieves over 40% latency reduction (*i.e.*, $2.07\times$ theoretical speedup) together with improved performance on PixArt-$\alpha$. Remarkably, our *image-free* approach also reduces training time by 25% compared with the previous method.
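For readers new to the idea, here is a minimal sketch of what block-wise feature caching means (a simplified illustration, **not** the repository's code; the `CachedBlock` class and the hand-fixed alternating schedule are made up for exposition): during sampling, each transformer block either recomputes its residual or reuses the residual cached at an earlier denoising step.

```python
# Illustrative sketch of block-wise feature caching (NOT the repository's implementation).
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Toy transformer block that can reuse its previously computed residual."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.cache = None  # residual stored at the most recent fully computed step

    def forward(self, x: torch.Tensor, use_cache: bool) -> torch.Tensor:
        if use_cache and self.cache is not None:
            return x + self.cache          # skip computation, reuse cached residual
        residual = self.mlp(x)             # full computation
        self.cache = residual.detach()     # store for possible reuse at later steps
        return x + residual

# Toy sampling loop with a hand-fixed schedule: recompute on even steps, reuse on odd ones.
blocks = nn.ModuleList(CachedBlock(64) for _ in range(4))
x = torch.randn(2, 16, 64)                 # (batch, tokens, hidden dim)
for step in reversed(range(10)):
    for blk in blocks:
        x = blk(x, use_cache=(step % 2 == 1))
```

HarmoniCa's contribution lies in *learning* when such reuse is safe instead of fixing the schedule by hand, as outlined in the overview below.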
*(Left) Generation comparison on DiT-XL/2 $256\times256$. (Right) Generation results on PixArt-$\Sigma$ $2048\times2048$. HarmoniCa shows nearly lossless $1.44\times$ and $1.73\times$ acceleration for the above models, respectively.*

## :fire: News
* **May 03, 2025**: 🔥 We release our Python code for DiT-XL/2 presented in our paper. Have a try!
* **May 01, 2025**: 🎉 Our paper has been accepted by ICML 2025! Cheers!
## Overview
Overview of the proposed HarmoniCa pipeline. It first incorporates Step-Wise Denoising Training (SDT) to preserve the continuity of the denoising process, so that information from prior steps can be leveraged during training. In addition, an Image Error Proxy-Guided Objective (IEPO) balances final image quality against cache utilization through an efficient proxy that approximates the image error.
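To make these two ingredients concrete, here is a heavily simplified, self-contained PyTorch sketch of the training idea (our own illustration, **not** the code in `train_router.py`): a learnable per-(step, block) router decides whether to reuse a cached block output, the whole denoising trajectory is unrolled step by step in the spirit of SDT, and the objective trades an output-error proxy against a cache-usage reward in the spirit of IEPO. The toy `blocks` standing in for the frozen DiT, the straight-through threshold, and the values of `lambda_c`/`threshold` are assumptions for exposition, loosely mirroring the `--lambda-c` and `--ste-threshold` flags of the training command below.

```python
# Simplified illustration of router training (NOT the actual SDT/IEPO implementation;
# the real objective and router are defined in the paper and in train_router.py).
import torch
import torch.nn as nn

num_steps, num_blocks, dim, batch = 10, 4, 64, 8
router_logits = nn.Parameter(torch.zeros(num_steps, num_blocks))    # learnable cache router
blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_blocks))
for p in blocks.parameters():                                        # frozen model (toy stand-in)
    p.requires_grad_(False)

opt = torch.optim.Adam([router_logits], lr=1e-2)
lambda_c, threshold = 1e-3, 0.5                                       # hypothetical values

x = torch.randn(batch, dim)
cache = [torch.zeros(batch, dim) for _ in range(num_blocks)]
loss = torch.zeros(())
for t in range(num_steps):            # step-wise: unroll the full denoising trajectory
    h_ref, h = x, x
    for b, blk in enumerate(blocks):
        ref = blk(h_ref)              # cache-free reference path
        p_reuse = torch.sigmoid(router_logits[t, b])
        reuse = (p_reuse > threshold).float() + p_reuse - p_reuse.detach()  # straight-through
        out = reuse * cache[b] + (1.0 - reuse) * blk(h)
        cache[b] = out.detach()       # cached feature carried to the next step
        h_ref, h = ref, out
    # proxy for the final-image error, minus a reward for actually using the cache
    loss = loss + (h - h_ref).pow(2).mean() - lambda_c * torch.sigmoid(router_logits[t]).mean()
    x = h_ref.detach()                # advance the trajectory along the reference path

opt.zero_grad()
loss.backward()
opt.step()
print(f"toy loss: {loss.item():.4f}")
```

At inference time only the thresholded router decisions are needed, which appears to correspond to the `--thres` argument of the sampling command further below.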
## ✨ Quick Start
After cloning the repository, follow the steps below to train the model and run inference.
### Requirements
With PyTorch (>2.0) installed, execute the following commands to install the necessary packages and download the pre-trained models.
```bash
pip install accelerate diffusers timm torchvision wandb
python download.py
```

### Training
We provide the following example to train the router for DiT-XL/2. More details about training can be found in our paper.
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --nnodes=1 --nproc_per_node=4 --master_port 12345 train_router.py --results-dir results --model DiT-XL/2 --image-size 256 --num-classes 1000 --epochs 2000 --global-batch-size 64 --global-seed 42 --vae ema --num-works 8 --log-every 100 --ckpt-every 1000 --wandb --num-sampling-steps 10 --l1 7e-8 --lr 0.01 --max-steps 20000 --cfg-scale 1.5 --ste-threshold 0.1 --lambda-c 500
```

### Inference
Here is the corresponding command for inference.
```bash
python sample.py --model DiT-XL/2 --vae ema --image-size 256 --num-classes 1000 --cfg-scale 4 --num-sampling-steps 10 --seed 42 --accelerate-method dynamiclayer --ddim-sample --path Path/To/The/Trained/Router/ --thres 0.1
```

## 💪 TODO
- [ ] Training and inference code for PixArt models.
- [ ] Combination with quantization.

## 🤝 Acknowledgments
Our code was developed based on [DiT](https://github.com/facebookresearch/DiT) and [Learning-to-Cache](https://github.com/horseee/learning-to-cache).
## Citation
If you find HarmoniCa useful or relevant to your research, please cite our paper:
```bibtex
@inproceedings{huang2025harmonica,
  title={HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration},
  author={Yushi Huang and Zining Wang and Ruihao Gong and Jing Liu and Xinjie Zhang and Jinyang Guo and Xianglong Liu and Jun Zhang},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
}
```