https://github.com/shenggan/atp
Adaptive Tensor Parallelism for Foundation Models
attention distributed-training gpt large-model model-parallelism pytorch transformer
- Host: GitHub
- URL: https://github.com/shenggan/atp
- Owner: Shenggan
- License: MIT
- Created: 2022-10-31T06:15:27.000Z
- Default Branch: main
- Last Pushed: 2022-12-15T11:17:17.000Z
- Last Synced: 2024-01-29T20:34:33.976Z
- Topics: attention, distributed-training, gpt, large-model, model-parallelism, pytorch, transformer
- Language: Python
- Homepage:
- Size: 3.22 MB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

# ATP: Adaptive Tensor Parallelism
Adaptive Tensor Parallelism for Large Model Training and Inference
ATP provides a high-performance implementation of topology-aware tensor parallelism with the following characteristics:
1. Two-Level Search Space for Tensor Parallelism.
2. Adaptive Tensor Parallelism with Hierarchical Communication Matrix.
3. Chunk-based Communication-Computation Overlapping.
4. An estimator that helps study the performance of ATP on networks with different topologies.
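To give a rough intuition for item 3, the sketch below models chunk-based communication-computation overlapping as a two-stage pipeline. It is a toy cost model, not ATP's implementation or its estimator: the function name `pipelined_time`, its parameters, and the assumption that chunks split work evenly with no per-chunk launch overhead are all hypothetical.

```python
def pipelined_time(compute, comm, chunks):
    """Toy model: total time when a layer's work is split into equal chunks.

    Assumption (not from ATP): chunk i can be communicated while chunk i+1
    is being computed, and splitting adds no extra overhead.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    compute_per_chunk = compute / chunks
    comm_per_chunk = comm / chunks
    compute_end = 0.0  # time at which the current chunk finishes computing
    comm_end = 0.0     # time at which the link finishes sending the last chunk
    for _ in range(chunks):
        compute_end += compute_per_chunk
        # A chunk is sent only after it is computed and the link is free.
        comm_end = max(compute_end, comm_end) + comm_per_chunk
    return comm_end


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, pipelined_time(compute=8.0, comm=4.0, chunks=k))
```

With 8 units of computation and 4 units of communication, this model gives 12 units for a single chunk and 9 units for four chunks, because only the last chunk's communication is left exposed. Real gains depend on kernel launch costs and network topology, which this sketch ignores.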
## Installation
To install ATP, you will need:
+ Python 3.8 or 3.9
+ PyTorch 1.13
+ SPMD from [pytorch/tau](https://github.com/pytorch/tau)
```
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install git+https://github.com/pytorch/tau.git@89700fd
```
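After installing, it can be useful to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of ATP: the helper names `python_ok`, `torch_ok`, and `installed_torch_version` are hypothetical, and the check only compares the major and minor version numbers.

```python
import sys


def python_ok(version_info=sys.version_info):
    """Return True for Python 3.8 or 3.9, the versions listed above."""
    return tuple(version_info[:2]) in {(3, 8), (3, 9)}


def torch_ok(version):
    """Return True if a version string such as '1.13.1+cu116' is a 1.13 release."""
    release = str(version).split("+")[0]  # drop a local build tag like '+cu116'
    return release.split(".")[:2] == ["1", "13"]


def installed_torch_version():
    """Return torch.__version__ as a string, or None if PyTorch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return str(torch.__version__)


if __name__ == "__main__":
    print("Python 3.8/3.9:", python_ok())
    version = installed_torch_version()
    print("PyTorch 1.13:", version is not None and torch_ok(version))
```

This does not verify the `pytorch/tau` commit or the CUDA build; it only flags an obviously mismatched Python or PyTorch version before running ATP.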
## Usage