Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/horseee/learning-to-cache
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
- Host: GitHub
- URL: https://github.com/horseee/learning-to-cache
- Owner: horseee
- Created: 2024-06-03T04:04:32.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-07-15T03:49:52.000Z (7 months ago)
- Last Synced: 2025-01-08T14:30:06.324Z (about 1 month ago)
- Topics: diffusion-models, efficient-inference
- Language: Python
- Size: 5.32 MB
- Stars: 88
- Watchers: 2
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-diffusion-categorized
README
# Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
(Figure: results on DiT-XL/2 and U-ViT-H/2)
> **Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching** 🥯[[Arxiv]](https://arxiv.org/abs/2406.01733)
> [Xinyin Ma](https://horseee.github.io/), [Gongfan Fang](https://fangggf.github.io/), Michael Bi Mi, [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
> [Learning and Vision Lab](http://lv-nus.org/), National University of Singapore, Huawei Technologies Ltd.

## Introduction
We introduce **L**earning-to-**C**ache (L2C), a novel scheme that learns to cache layer computations in a dynamic manner for diffusion transformers. A router is optimized to decide which layers are cached at each denoising step.
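To make the mechanism concrete, here is a minimal sketch of a router-gated block; the class and attribute names are ours (hypothetical), not the repository's API. When the router marks a layer as cacheable at the current step, the wrapper returns the output cached at an earlier step instead of recomputing the block.

```python
import torch
import torch.nn as nn

class RouterGatedBlock(nn.Module):
    """Wraps a transformer block so its output can be reused across steps."""

    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.cached = None  # output saved at the last step that recomputed

    def forward(self, x: torch.Tensor, reuse_cache: bool) -> torch.Tensor:
        if reuse_cache and self.cached is not None:
            return self.cached   # skip the layer, reuse the cached output
        out = self.block(x)
        self.cached = out        # refresh the cache for later steps
        return out
```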
(Figure: changes in the router for U-ViT when optimized across different layers (x-axis) over all steps (y-axis); white indicates an activated layer, black a disabled one.)
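The router itself is learned rather than hand-designed. One standard way to make the per-(step, layer) decision differentiable, shown here as an illustrative sketch under our own naming and not the paper's exact objective, is to relax each binary choice into a weight in [0, 1] that blends the fresh and cached outputs, add a sparsity penalty that favors caching, and threshold the weights into a binary mask after training:

```python
import torch

num_steps, num_layers = 20, 28               # illustrative sizes only
router = torch.nn.Parameter(torch.zeros(num_steps, num_layers))

def blended_output(fresh: torch.Tensor, cached: torch.Tensor,
                   s: int, l: int) -> torch.Tensor:
    beta = torch.sigmoid(router[s, l])       # soft probability of recomputing
    return beta * fresh + (1.0 - beta) * cached

# A training loop would minimize  task_loss + lam * torch.sigmoid(router).mean()
# so that quality is preserved while the router is pushed toward caching.
# After training, threshold into the binary mask used at inference:
mask = torch.sigmoid(router.detach()) < 0.5  # True = reuse the cache
```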
**Some takeaways**:

1. A large proportion of layers in the diffusion transformer can be removed without updating the model parameters (see the sketch after this list).
   - In U-ViT-H/2, up to 93.68% of the layers in the cache steps (46.84% over all steps) can be removed with less than a 0.01 drop in FID.
2. L2C largely outperforms samplers such as DDIM and DPM-Solver.
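Because the mask is fixed after training, inference only needs a lookup per (step, layer). Below is a hypothetical sketch of one denoising step that reuses the `RouterGatedBlock` wrapper from above; `model.blocks` is an assumed attribute, not the repository's interface.

```python
import torch

@torch.no_grad()
def denoise_step(model, x: torch.Tensor, step_idx: int,
                 mask: torch.Tensor) -> torch.Tensor:
    """Run one denoising step, reusing cached layer outputs where allowed.

    mask: (num_steps, num_layers) bool tensor; True = reuse the cache.
    """
    h = x
    for l, blk in enumerate(model.blocks):   # RouterGatedBlock instances
        h = blk(h, reuse_cache=bool(mask[step_idx, l]))
    return h
```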
(Figure: comparison with baselines. Left: DiT-XL/2; right: U-ViT-H/2.)
## Checkpoint for Routers
| Model | NFE | Checkpoint |
| -- | -- | -- |
| DiT-XL/2 | 50 | [link](DiT/ckpt/DDIM50_router.pt) |
| DiT-XL/2 | 20 | [link](DiT/ckpt/DDIM20_router.pt) |
| U-ViT-H/2 | 50 | [link](U-ViT/ckpt/dpm50_router.pth) |
| U-ViT-H/2 | 20 | [link](U-ViT/ckpt/dpm20_router.pth) |
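The checkpoints are small enough to inspect directly with PyTorch. The payload format assumed below (a tensor of per-(step, layer) router values) is a guess on our part; consult the DiT and U-ViT READMEs in the next section for the supported loading code.

```python
import torch

# Assumption: the file stores router weights as a tensor or a state dict.
ckpt = torch.load("DiT/ckpt/DDIM50_router.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, torch.Tensor):
    print(ckpt.shape)  # expected: one value per (step, layer) pair
```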
## Code

We implement Learning-to-Cache on two base architectures, DiT and U-ViT. Check the instructions below:

1. DiT: [README](https://github.com/horseee/learning-to-cache/tree/main/DiT#learning-to-cache-for-dit)
2. U-ViT: [README](https://github.com/horseee/learning-to-cache/blob/main/U-ViT/readme.md)

## Citation
```bibtex
@misc{ma2024learningtocache,
      title={Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching},
      author={Xinyin Ma and Gongfan Fang and Michael Bi Mi and Xinchao Wang},
      year={2024},
      eprint={2406.01733},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```