https://github.com/deepseek-ai/DualPipe
- Host: GitHub
- URL: https://github.com/deepseek-ai/DualPipe
- Owner: deepseek-ai
- License: mit
- Created: 2025-02-26T13:29:57.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-02-27T02:42:34.000Z (about 2 months ago)
- Last Synced: 2025-02-27T03:20:52.981Z (about 2 months ago)
- Language: Python
- Size: 85.9 KB
- Stars: 690
- Watchers: 6
- Forks: 35
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DualPipe
DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the [DeepSeek-V3 Technical Report](https://arxiv.org/pdf/2412.19437). It fully overlaps the computation and communication phases of the forward and backward passes, and it also reduces pipeline bubbles. For detailed information on computation-communication overlap, please refer to the [profile data](https://github.com/deepseek-ai/profile-data).
### Schedules

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions.
The micro-batches in the reverse direction are symmetric to those in the forward direction, so
we omit their batch ID for illustration simplicity. Two cells enclosed by a shared black border
have mutually overlapped computation and communication.

### Pipeline Bubbles and Memory Usage Comparison

| Method | Bubble | Parameter | Activation |
|-------------|---------------------------------|-----------|------------|
| 1F1B | (PP-1)(𝐹+𝐵) | 1× | PP |
| ZB1P | (PP-1)(𝐹+𝐵-2𝑊) | 1× | PP |
| DualPipe | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 |

𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a
full backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵
denotes the execution time of two mutually overlapped forward and backward chunks.

## Quick Start
The usage is shown in the following example:
```bash
python example.py
```

Note: For real-world applications, you will need to implement a custom `overlapped_forward_backward` method tailored to your specific module.
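
The bubble formulas in the comparison table above can also be evaluated numerically to see how the three schedules compare. The following is a minimal sketch and is not part of this repository: the function name `bubble_times` and the timing values are hypothetical, chosen only for illustration, and the code assumes PP is even so that PP/2 is an integer.

```python
def bubble_times(pp, f, b, w, f_and_b):
    """Evaluate the bubble formulas from the comparison table.

    pp       -- number of pipeline-parallel ranks (assumed even here)
    f        -- execution time of a forward chunk (F)
    b        -- execution time of a full backward chunk (B)
    w        -- execution time of a "backward for weights" chunk (W)
    f_and_b  -- execution time of two mutually overlapped forward
                and backward chunks (F&B)
    All times share the same arbitrary unit.
    """
    if pp < 2 or pp % 2 != 0:
        # Assumption of this sketch: keep PP/2 an integer.
        raise ValueError("pp must be an even number >= 2")
    return {
        "1F1B": (pp - 1) * (f + b),
        "ZB1P": (pp - 1) * (f + b - 2 * w),
        "DualPipe": (pp // 2 - 1) * (f_and_b + b - 3 * w),
    }


# Hypothetical timings, not measurements: B = 2F, W = F, and an
# overlapped pair that costs as much as F + B (no gain from overlap).
example = bubble_times(pp=8, f=1, b=2, w=1, f_and_b=3)
for method, bubble in example.items():
    print(f"{method}: {bubble}")
```

With these made-up inputs the DualPipe bubble is the smallest of the three; real values of 𝐹, 𝐵, 𝑊, and 𝐹&𝐵 depend on the model and hardware and should be taken from profiling.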
## Requirements
- PyTorch 2.0 and above
## Developers
DualPipe was created and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang.
## Citation
```bibtex
@misc{deepseekai2024deepseekv3technicalreport,
  title={DeepSeek-V3 Technical Report},
  author={DeepSeek-AI},
  year={2024},
  eprint={2412.19437},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.19437},
}
```