# SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
If you like SpeeD, please give us a star ⭐ on GitHub for the latest updates.
### [Paper](https://arxiv.org/pdf/2405.17403) | [Project Page](https://bdemo.github.io/SpeeD/) | [Hugging Face]()
This repository contains the code and implementation details for the paper "A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training," which introduces SpeeD, a novel speed-up method for diffusion model training.
## Authors
- [Kai Wang](https://kaiwang960112.github.io/)<sup>2</sup>, Yukun Zhou<sup>1,2</sup>, [Mingjia Shi](https://www.samjs.online/)<sup>2</sup>, [Zekai Li](https://lizekai-richard.github.io/)<sup>2</sup>, [Zhihang Yuan](https://zhihang.cc/)<sup>3</sup>, [Yuzhang Shang](https://42shawn.github.io/)<sup>4</sup>, [Xiaojiang Peng*](https://pengxj.github.io/)<sup>1</sup>, [Hanwang Zhang](https://personal.ntu.edu.sg/hanwangzhang/)<sup>5</sup>, [Yang You](https://www.comp.nus.edu.sg/~youy/)<sup>2</sup>
- <sup>1</sup>[Shenzhen Technology University](https://english.sztu.edu.cn/), <sup>2</sup>[National University of Singapore](https://nus.edu.sg/), <sup>3</sup>[Infinigence-AI](https://cloud.infini-ai.com/), <sup>4</sup>[Illinois Institute of Technology](https://www.iit.edu/), and <sup>5</sup>[Nanyang Technological University](https://www.ntu.edu.sg/)

Kai, Yukun, and Mingjia contributed equally to this work. We will update this repo asap.

## Highlights
Our method integrates easily with existing pipelines and can accelerate diffusion model training by up to three times.

## Motivation
Inspired by the following observation on time steps, we propose the re-sampling + re-weighting strategy shown below.
Taking a closer look at the time steps, we find that they can be divided into three areas: acceleration, deceleration, and convergence. Samples from time steps in the convergence area are of limited benefit to training, yet these time steps make up most of the schedule; empirically, the training losses of these samples are quite low compared to those from the other two areas. SpeeD therefore combines two strategies, sketched in code after this list:

- **Asymmetric Sampling:** suppress how frequently time steps in the convergence area are sampled.
- **Change-Aware Weighting:** give more weight to the time steps at which the diffusion process changes faster.
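At a high level, the recipe amounts to a non-uniform timestep sampler plus a per-timestep loss weight. The snippet below is a minimal sketch of that idea under assumed names (`T`, `tau`, and `suppress` are hypothetical parameters), not the paper's exact formulation:

```python
import torch

def sample_timesteps(batch_size: int, T: int = 1000, tau: int = 700, suppress: float = 0.25):
    """Asymmetric sampling (sketch): draw timesteps non-uniformly, giving
    steps past `tau` (the convergence area) only a fraction `suppress`
    of their uniform probability mass."""
    probs = torch.ones(T)
    probs[tau:] = suppress               # down-weight convergence-area steps
    probs /= probs.sum()
    return torch.multinomial(probs, batch_size, replacement=True)

def change_aware_weights(alphas_cumprod: torch.Tensor):
    """Change-aware weighting (sketch): weight each timestep by how fast
    the forward process changes there, approximated by the magnitude of
    the discrete derivative of alpha-bar."""
    change = (alphas_cumprod[1:] - alphas_cumprod[:-1]).abs()
    change = torch.cat([change, change[-1:]])   # pad back to length T
    return change / change.mean()               # normalize to mean 1

# Usage inside a training step (sketch):
#   t = sample_timesteps(x.size(0)).to(x.device)
#   loss = (weights[t] * per_sample_mse(model, x, t)).mean()
```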
## 🛠️ Requirements and Installation
This codebase does not rely on any special hardware-acceleration technology, so the experimental environment is simple to set up.
You can create a new conda environment:
```
conda env create -f environment.yml
conda activate speed
```

or install the necessary packages with:
```
pip install -r requirements.txt
```

If necessary, we will provide more methods (e.g., Docker) to facilitate configuring the experimental environment.
## Tutorial
We provide a complete pipeline for generation tasks, including **training**, **inference**, and **testing**. The current code only supports class-conditional image generation; more diffusion-based generation tasks will be supported in the future.
We refactored the [facebookresearch/DiT](https://github.com/facebookresearch/DiT) code and load configs with [OmegaConf](https://omegaconf.readthedocs.io/en/2.3_branch/). Config loading is recursive for easier argument modification: simply put, settings in files loaded later override the earlier settings from **base.yaml**.
You can modify the experiment setting by modifying the config file and the command line. More details about the reading of config are written in [configs/README.md](https://github.com/kaiwang960112/SpeeD/blob/master/configs/README.md).
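Concretely, the recursive loading behaves like an `OmegaConf` merge in which later sources take precedence. A minimal sketch (the exact paths and the `guidance_scale` key here are illustrative assumptions):

```python
from omegaconf import OmegaConf

base = OmegaConf.load("configs/image/base.yaml")              # shared defaults
exp = OmegaConf.load("configs/image/imagenet_256/base.yaml")  # experiment file
cli = OmegaConf.from_cli()                                    # e.g. guidance_scale=1.5

# Later sources override earlier ones: cli > exp > base.
cfg = OmegaConf.merge(base, exp, cli)
print(cfg.guidance_scale)
```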
For each experiment, you must provide two command-line arguments:
```
-c: config path;
-p: phase including ['train', 'inference', 'sample'].
```

### Train & inference
**Baseline**
Class-conditional image generation on the 256×256 ImageNet dataset with the DiT-XL/2 model.
```bash
# train: train the diffusion model and save checkpoints
torchrun --nproc_per_node=8 main.py -c configs/image/imagenet_256/base.yaml -p train
# inference: generating samples for testing
torchrun --nproc_per_node=8 main.py -c configs/image/imagenet_256/base.yaml -p inference
# sample: sample some images for visualization
python main.py -c configs/image/imagenet_256/base.yaml -p sample
```

**Ablation**
As above, you can modify experiment settings via the config files and the command line; more details about the configs are in [configs/README.md](https://github.com/kaiwang960112/SpeeD/blob/master/configs/README.md).
For example, change the classifier-free guidance scale during sampling via the command line:
```
python main.py -c configs/image/imagenet_256/base.yaml -p sample guidance_scale=1.5
```

### Test
Testing the generation tasks requires the inference results. More details about testing are in [evaluations](https://github.com/kaiwang960112/SpeeD/tree/master/evaluations).
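Evaluation suites of this kind typically consume the generated samples packed into a single `.npz` file (DiT's pipeline, which this code refactors, does exactly that). A minimal sketch of the packing step, assuming inference wrote PNGs to a hypothetical `samples/` folder:

```python
import glob

import numpy as np
from PIL import Image

# Collect the generated PNGs into one (N, H, W, 3) uint8 array.
files = sorted(glob.glob("samples/*.png"))
arr = np.stack([np.asarray(Image.open(f).convert("RGB")) for f in files])
np.savez("samples.npz", arr_0=arr)  # `arr_0` is the key ADM-style evaluators read
```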
## License
The majority of this project is released under the Apache 2.0 license, as found in the [LICENSE](LICENSE.txt) file.
## Citation
If you find our code useful in your research, please consider giving us a star ⭐ and a citation.
```
@article{wang2024closer,
title={A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training},
author={Wang, Kai and Shi, Mingjia and Zhou, Yukun and Li, Zekai and Yuan, Zhihang and Shang, Yuzhang and Peng, Xiaojiang and Zhang, Hanwang and You, Yang},
year={2024},
journal={arXiv preprint arXiv:2405.17403},
}
```

## Acknowledgement
We thank Tianyi Li, Yuchen Zhang, Yuxin Li, Zhaoyang Zeng, and Yanqing Liu for their comments on this work. Kai Wang (idea, writing, story, presentation), Yukun Zhou (implementation), and Mingjia Shi (theory, writing, presentation) contributed equally to this work. Xiaojiang Peng, Hanwang Zhang, and Yang You advised equally. Xiaojiang Peng is the corresponding author.
We are grateful for the following exceptional works and their generous contributions to open source.
* [DiT](https://github.com/facebookresearch/DiT): Scalable Diffusion Models with Transformers.
* [Open-Sora](https://github.com/hpcaitech/Open-Sora/tree/main): Democratizing Efficient Video Production for All.
* [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT): an acceleration framework for DiT training. We adopt its valuable acceleration strategies in our training process.