# PipeFisher

The PyTorch implementation of pipeline-parallel training with the K-FAC optimizer (PipeFisher), used in [PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices](https://arxiv.org/abs/2211.14133) (to appear at MLSys 2023).
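For orientation, the sketch below shows the core K-FAC idea for a single linear layer: the layer's Fisher block is approximated by two Kronecker factors (an input covariance `A` and an output-gradient covariance `G`), and the gradient is preconditioned by their inverses. This is only an illustration, not PipeFisher's implementation (which lives in the `asdfghjkl` library and is interleaved with the pipeline schedule); the damping value and scaling conventions are assumptions.

```
# Illustrative K-FAC sketch for one linear layer (not the repository's code).
# The damping value and scaling conventions below are assumptions.
import torch

torch.manual_seed(0)
B, d_in, d_out = 32, 16, 8
layer = torch.nn.Linear(d_in, d_out, bias=False)

a = torch.randn(B, d_in)       # layer inputs (activations)
out = layer(a)
out.retain_grad()              # keep dL/d(out) to build the G factor
loss = out.pow(2).mean()
loss.backward()

# Kronecker factors: A ~ E[a a^T], G ~ E[g g^T], g = gradient w.r.t. outputs
A = a.T @ a / B
G = out.grad.T @ out.grad / B

damping = 1e-3                 # assumed damping for numerical stability
A_inv = torch.linalg.inv(A + damping * torch.eye(d_in))
G_inv = torch.linalg.inv(G + damping * torch.eye(d_out))

# Preconditioned (natural-gradient-style) update direction: G^-1 (dL/dW) A^-1
precond_grad = G_inv @ layer.weight.grad @ A_inv
print(precond_grad.shape)      # torch.Size([8, 16]), same shape as the weight
```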

## Setup

### Data preparation
Follow the data preparation guide from AzureML-BERT: https://github.com/microsoft/AzureML-BERT/blob/master/docs/dataprep.md

Please store the `wikipedia.segmented.nltk.txt` file under the `bert_data/` directory.
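As a quick sanity check before launching any jobs (a hypothetical helper, not part of this repository's scripts):

```
# Hypothetical sanity check: confirm the prepared corpus is where the
# training scripts expect it (not part of this repository).
from pathlib import Path

corpus = Path("bert_data/wikipedia.segmented.nltk.txt")
assert corpus.is_file(), "Run the AzureML-BERT data preparation first."
```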

### Installation
```
# Install the pinned Python dependencies
pip install -r requirements.txt
# Install the bundled asdfghjkl library (second-order optimization utilities,
# including K-FAC) from the local directory
pip install asdfghjkl/
```
For training, we use `apex.optimizers.FusedLAMB` from [NVIDIA's Apex library](https://github.com/NVIDIA/apex). Please follow the [installation instructions](https://github.com/NVIDIA/apex#installation) to install `apex`.
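Once `apex` is installed, constructing the optimizer looks roughly like this (a sketch only; the model and hyperparameters are placeholders, not the values used by the training scripts):

```
# Sketch: building apex's FusedLAMB. Requires apex built with CUDA extensions.
# The model, lr, and weight_decay are placeholders, not this repo's settings.
import torch
from apex.optimizers import FusedLAMB

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = FusedLAMB(model.parameters(), lr=1e-3, weight_decay=0.01)
```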

For profiling, we use [NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems). Please make sure you can execute the `nsys` command (e.g., `nsys --version`).

Our scripts are intended to run through the SLURM workload manager on a GPU cluster with 1 GPU per node.

## Training

Phase 1 pretraining of BERT-Base on the English Wikipedia with the NVLAMB optimizer on 32 GPUs:
```
sbatch scripts/train.sh
```

Phase 1 pretraining of BERT-Base on the English Wikipedia with the K-FAC optimizer on 32 GPUs:
```
sbatch scripts/train_kfac.sh
```


## Profiling

### Step 0. Profiling **Chimera** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_steps.sh
```
```
sh scripts/plot_cuda_timeline.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1.pdf`

### Step 1. Profiling **Chimera with K-FAC** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_kfac_steps.sh
```
```
sh scripts/plot_cuda_timeline_kfac.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac.pdf`

### Step 2. Automatic work assignments
```
sh scripts/auto_schedule.sh
```
output: `bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle`
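To inspect the generated schedule, you can unpickle it (a sketch; the object's exact structure is defined by the repository's scheduling code, so we only print whatever we find):

```
# Sketch: peek inside the generated schedule pickle. Its exact structure is
# defined by this repository's scheduling code; we only print what we find.
import pickle

path = "bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle"
with open(path, "rb") as f:
    schedule = pickle.load(f)

print(type(schedule))
print(schedule)
```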

### Step 3. Profiling **Chimera with PipeFisher** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_pipefisher_steps.sh
```
```
sh scripts/plot_cuda_timeline_pipefisher.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_pipefisher.pdf`


By changing the settings in each script, you can run training/profiling with other BERT models, pipeline methods, numbers of pipeline stages, numbers of GPUs, etc.