https://github.com/shenggan/atp

Adaptive Tensor Parallelism for Foundation Models

attention distributed-training gpt large-model model-parallelism pytorch transformer

Host: GitHub
URL: https://github.com/shenggan/atp
Owner: Shenggan
License: mit
Created: 2022-10-31T06:15:27.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-12-15T11:17:17.000Z (over 3 years ago)
Last Synced: 2024-01-29T20:34:33.976Z (over 2 years ago)
Topics: attention, distributed-training, gpt, large-model, model-parallelism, pytorch, transformer
Language: Python
Homepage:
Size: 3.22 MB
Stars: 7
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

![](./assert/banner.png)

# ATP: Adaptive Tensor Parallelism

Adaptive Tensor Parallelism for Large Model Training and Inference.

ATP provides a high-performance implementation of Topology-aware Tensor Parallelism with the following characteristics.

1. Two-Level Search Space for Tensor Parallelism.
2. Adaptive Tensor Parallelism with Hierarchical Communication Matrix.
3. Chunk-based Communication-Computation Overlapping.
4. An estimator that helps study the performance of ATP on networks with different topologies.
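
To make the idea of adapting tensor parallelism to the network topology more concrete, the sketch below enumerates 2D device-mesh shapes for a given device count and ranks them with a toy communication cost. It is not ATP's search algorithm or its estimator: the function names (`mesh_shapes`, `toy_comm_cost`, `best_mesh`), the bandwidth values, and the cost formula (the bandwidth term of a ring all-reduce on each mesh axis) are assumptions chosen for illustration only.

```python
from typing import List, Tuple


def mesh_shapes(num_devices: int) -> List[Tuple[int, int]]:
    """Return every (d1, d2) with d1 * d2 == num_devices."""
    return [(d1, num_devices // d1)
            for d1 in range(1, num_devices + 1)
            if num_devices % d1 == 0]


def ring_allreduce_time(n: int, size: float, bandwidth: float) -> float:
    """Bandwidth term of a ring all-reduce over n devices (latency ignored)."""
    if n == 1:
        return 0.0
    return 2 * (n - 1) / n * size / bandwidth


def toy_comm_cost(shape: Tuple[int, int], msg_size: float,
                  bw_inner: float, bw_outer: float) -> float:
    """Hypothetical cost model, not ATP's estimator.

    Assumption: axis 0 (size d1) crosses the slower outer links, axis 1
    (size d2) stays on the faster inner links, and each axis reduces a
    message that is already sharded along the other axis.
    """
    d1, d2 = shape
    return (ring_allreduce_time(d1, msg_size / d2, bw_outer)
            + ring_allreduce_time(d2, msg_size / d1, bw_inner))


def best_mesh(num_devices: int, msg_size: float,
              bw_inner: float, bw_outer: float) -> Tuple[int, int]:
    """Pick the mesh shape with the lowest toy cost."""
    return min(mesh_shapes(num_devices),
               key=lambda s: toy_comm_cost(s, msg_size, bw_inner, bw_outer))


if __name__ == "__main__":
    # Fast inner links and slow outer links versus a uniform network.
    print(best_mesh(8, 1.0, bw_inner=10.0, bw_outer=1.0))
    print(best_mesh(8, 1.0, bw_inner=1.0, bw_outer=1.0))
```

Under this toy model the preferred mesh changes with the bandwidth ratio: with a much faster inner link all devices are placed on the inner axis, while on a uniform network a more balanced mesh wins. That dependence on topology is the behavior the word "adaptive" refers to, though the real search space and cost model may differ substantially.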

## Installation

To install ATP, you will need:

+ Python 3.8 or 3.9
+ PyTorch 1.13
+ SPMD from [pytorch/tau](https://github.com/pytorch/tau)

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install git+https://github.com/pytorch/tau.git@89700fd
```
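
After installing, a quick check of the environment against the requirements listed above can save debugging time. The following is a minimal sketch, assuming PyTorch is registered under the distribution name `torch`; the helper names (`python_ok`, `torch_ok`, `installed_torch_version`) are hypothetical and not part of ATP.

```python
import sys
from importlib import metadata
from typing import Optional


def python_ok(version_info=sys.version_info) -> bool:
    """True for Python 3.8 or 3.9, the versions named in the requirements."""
    return tuple(version_info[:2]) in {(3, 8), (3, 9)}


def torch_ok(version: str) -> bool:
    """True when the version string is a 1.13 release, e.g. '1.13.0+cu116'."""
    return version.split(".")[:2] == ["1", "13"]


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version, or None if it is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version supported:", python_ok())
    found = installed_torch_version()
    if found is None:
        print("PyTorch not found")
    else:
        print("PyTorch", found, "matches 1.13:", torch_ok(found))
```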

## Usage
