{"id":27962952,"url":"https://github.com/modeltc/harmonica","last_synced_at":"2025-08-13T23:08:17.260Z","repository":{"id":290955379,"uuid":"976079078","full_name":"ModelTC/HarmoniCa","owner":"ModelTC","description":"[ICML 2025] This is the official PyTorch implementation of \"HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration\".","archived":false,"fork":false,"pushed_at":"2025-07-10T21:22:59.000Z","size":8315,"stargazers_count":39,"open_issues_count":1,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-07-10T23:39:35.532Z","etag":null,"topics":["acceleration","diffusion-models","diffusion-transformer","dit","feature-caching","icml","icml-2025","pixart","pixart-sigma"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2410.01723","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ModelTC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-01T13:03:28.000Z","updated_at":"2025-07-10T21:23:02.000Z","dependencies_parsed_at":"2025-05-07T19:57:50.044Z","dependency_job_id":"ae1cf161-c8ae-4ac9-9937-3b0d39d2853f","html_url":"https://github.com/ModelTC/HarmoniCa","commit_stats":null,"previous_names":["modeltc/harmonica"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ModelTC/HarmoniCa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FHarmoniCa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FHarmoniC
a/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FHarmoniCa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FHarmoniCa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ModelTC","download_url":"https://codeload.github.com/ModelTC/HarmoniCa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FHarmoniCa/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270330595,"owners_count":24565816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-13T02:00:09.904Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acceleration","diffusion-models","diffusion-transformer","dit","feature-caching","icml","icml-2025","pixart","pixart-sigma"],"created_at":"2025-05-07T19:57:46.223Z","updated_at":"2025-08-13T23:08:17.254Z","avatar_url":"https://github.com/ModelTC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\" style=\"font-family: charter;\"\u003e\n\u003ch1\u003e 🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer 
Acceleration\u003c/h1\u003e\n\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\u0026nbsp;\n[![arXiv](https://img.shields.io/badge/HarmoniCa-2410.01723-b31b1b)](https://arxiv.org/pdf/2410.01723)\u0026nbsp;\n[![GitHub Stars](https://img.shields.io/github/stars/ModelTC/HarmoniCa.svg?style=social\u0026label=Star\u0026maxAge=60)](https://github.com/ModelTC/HarmoniCa)\u0026nbsp;\n\n**[ [Conference Paper](https://arxiv.org/abs/2410.01723) | [Slides](assets/slides.pdf) | [Poster](assets/poster.pdf) ]**\n\n[Yushi Huang*](https://github.com/Harahan), [Zining Wang*](https://scholar.google.com/citations?user=hOXoacgAAAAJ\u0026hl=en), [Ruihao Gong📧](https://xhplus.github.io/), [Jing Liu](https://jing-liu.com/), [Xinjie Zhang](https://xinjie-q.github.io/), [Jinyang Guo](https://jinyangguo.github.io/), [Xianglong Liu](https://xlliu-beihang.github.io/), [Jun Zhang📧](https://eejzhang.people.ust.hk/)\n\n(* denotes equal contribution, 📧 denotes corresponding author.)\n\n\u003c/div\u003e\n\nThis is the official implementation of our paper [HarmoniCa](https://arxiv.org/pdf/2410.01723), a novel training-based framework that achieves a new state-of-the-art result in block-wise caching of diffusion transformers. It achieves over 40% latency reduction (*i.e.*, $2.07\\times$ theoretical speedup) and improved performance on PixArt- $\\alpha$. Remarkably, our *image-free* approach reduces training time by 25\\% compared with the previous method.\n\n\u003cdiv align=center\u003e\n\t\u003cfigure class=\"second\"\u003e\n\t    \u003cimg src=\"./img/DiT.png\" width=\"440\"/\u003e\u003cimg src=\"./img/pixart.png\" width=\"270\"/\u003e\n\t\u003c/figure\u003e\n\t\n\u003ch align=\"justify\"\u003e(Left) Generation comparison on DiT-XL/2 $256\\times256$. (Right) Generation results on PixArt- $\\Sigma$ $2048\\times2048$. HarmoniCa shows nearly lossless $1.44\\times$ and $1.73\\times$ acceleration for the above models, respectively. 
\n\u003c/h\u003e\n\u003c/div\u003e\n\n## :fire: News\n\n* **May 03, 2025**: 🔥 We released the Python code for the DiT-XL/2 experiments presented in our paper. Give it a try!\n\n* **May 01, 2025**: 🌟 Our paper has been accepted by ICML 2025! 🎉 Cheers!\n\n\n## 📖 Overview\n\n\u003cdiv align=\"center\" style=\"font-family: charter;\"\u003e\n\n\u003cimg src=./img/overview.png width=\"80%\"/\u003e\n\n\u003ch align=\"justify\"\u003e\u003cstrong\u003eOverview pipeline of the proposed HarmoniCa.\u003c/strong\u003e It first incorporates Step-Wise Denoising Training (SDT) to ensure the continuity of the denoising process, where prior steps can be leveraged. In addition, an Image Error Proxy-Guided Objective (IEPO) is applied to balance image quality against cache utilization through an efficient proxy to approximate the image error.\n\u003c/h\u003e\n\n\u003c/div\u003e    \n\n## ✨ Quick Start\n\nAfter cloning the repository, follow these steps to train the model and run inference.\n\n### Requirements\n\nWith PyTorch (\u003e2.0) installed, run the following commands to install the necessary packages and download the pre-trained models.\n\n```bash\npip install accelerate diffusers timm torchvision wandb\npython download.py\n```\n\n### Training\n\nThe following example command trains the model. 
More details about the training can be found in our paper.\n\n```bash\nexport CUDA_VISIBLE_DEVICES=0,1,2,3\ntorchrun --nnodes=1 --nproc_per_node=4 --master_port 12345 train_router.py --results-dir results --model DiT-XL/2 --image-size 256 --num-classes 1000 --epochs 2000 --global-batch-size 64 --global-seed 42 --vae ema --num-works 8 --log-every 100 --ckpt-every 1000 --wandb --num-sampling-steps 10 --l1 7e-8 --lr 0.01 --max-steps 20000 --cfg-scale 1.5 --ste-threshold 0.1 --lambda-c 500\n```\n\n### Inference\n\nHere is the corresponding command for inference.\n\n```bash\npython sample.py --model DiT-XL/2 --vae ema --image-size 256 --num-classes 1000 --cfg-scale 4 --num-sampling-steps 10 --seed 42 --accelerate-method dynamiclayer --ddim-sample --path Path/To/The/Trained/Router/ --thres 0.1\n```\n\n## 💪 TODO\n\n- [ ] Training and inference code for PixArt models.\n- [ ] Combination with quantization.\n\n## 🤝 Acknowledgments\n\nOur code was developed based on [DiT](https://github.com/facebookresearch/DiT) and [Learning-to-Cache](https://github.com/horseee/learning-to-cache).\n\n## ✏️ Citation\n\nIf you find our HarmoniCa useful or relevant to your research, please kindly cite our paper:\n\n```\n@inproceedings{\n    anonymous2025harmonica,\n    title={HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration},\n    author={Yushi Huang and Zining Wang and Ruihao Gong and Jing Liu and Xinjie Zhang and Jinyang Guo and Xianglong Liu and Jun Zhang},\n    booktitle={Forty-second International Conference on Machine Learning},\n    year={2025},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodeltc%2Fharmonica","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodeltc%2Fharmonica","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodeltc%2Fharmonica/lists"}