# PipeFisher

The PyTorch implementation of pipeline-parallel training with the K-FAC optimizer (PipeFisher), used in [PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices](https://arxiv.org/abs/2211.14133) (MLSys 2023).

## Setup

### Data preparation
Follow Microsoft's AzureML-BERT data preparation guide to generate the pretraining corpus:
https://github.com/microsoft/AzureML-BERT/blob/master/docs/dataprep.md

Please store the `wikipedia.segmented.nltk.txt` file under the `bert_data/` directory.
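
For example (a minimal sketch; the source path is a placeholder for wherever the data preparation scripts left the corpus):
```
# place the generated corpus where the training scripts expect it
mkdir -p bert_data
mv /path/to/wikipedia.segmented.nltk.txt bert_data/
```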

### Installation
```
# core Python dependencies
pip install -r requirements.txt
# install the bundled asdfghjkl library (second-order optimization / K-FAC backend)
pip install asdfghjkl/
```
For training, we use `apex.optimizers.FusedLAMB` from [NVIDIA's Apex library](https://github.com/NVIDIA/apex). Please follow the [instructions](https://github.com/NVIDIA/apex#installation) for installing `apex`.
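
For reference, a from-source Apex build with the CUDA extensions (which `FusedLAMB` requires) has typically looked like the following; the exact flags vary across Apex and pip versions, so treat this as a sketch and defer to the linked instructions:
```
git clone https://github.com/NVIDIA/apex
cd apex
# build with the C++/CUDA extensions; flags differ by Apex/pip version --
# see the official installation instructions for the current command
pip install -v --disable-pip-version-check --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```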

For profiling, we use [NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems). Please make sure you can execute the `nsys` command.
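
A quick sanity check (the profiling scripts invoke `nsys` for you; the standalone invocation below is only illustrative, and `some_script.py` is a placeholder):
```
# confirm nsys is on PATH
nsys --version
# illustrative standalone usage: profile a command and write report.nsys-rep
nsys profile -o report python some_script.py
```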

Our scripts are intended to be run through the SLURM workload manager on a GPU cluster with one GPU per node.
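
A minimal sketch of the assumed layout (the directive values and the entry-point name are placeholders; the provided `scripts/*.sh` contain the real ones):
```
#!/bin/bash
#SBATCH --nodes=8              # one node per pipeline stage (placeholder value)
#SBATCH --ntasks-per-node=1    # one process per node
#SBATCH --gres=gpu:1           # one GPU per node, as the scripts assume

srun python train.py           # placeholder entry point; see scripts/ for the real one
```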

## Training

Phase 1 pretraining of BERT-Base on the English Wikipedia with NVLAMB on 32 GPUs
```
sbatch scripts/train.sh
```

Phase 1 pretraining of BERT-Base on the English Wikipedia with K-FAC on 32 GPUs
```
sbatch scripts/train_kfac.sh
```
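
Either script can also be adjusted at submission time; `sbatch` lets you override directives without editing the file (the node count below is illustrative, and whether an override is meaningful depends on the script's internal settings):
```
# override the node count at submission time (illustrative)
sbatch --nodes=32 scripts/train_kfac.sh
```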


## Profiling

### Step 0. Profiling **Chimera** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_steps.sh
```
```
sh scripts/plot_cuda_timeline.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1.pdf`
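
If you also want tabular summaries from the same run, `nsys` can post-process the reports generated by the profiling script (the report filename below is illustrative; it depends on the script's output settings):
```
# summarize kernel/NVTX activity from a generated report (illustrative filename)
nsys stats bert_prof/report1.nsys-rep
```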

### Step 1. Profiling **Chimera with K-FAC** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_kfac_steps.sh
```
```
sh scripts/plot_cuda_timeline_kfac.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac.pdf`

### Step 2. Automatic work assignments
```
sh scripts/auto_schedule.sh
```
output: `bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle`
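
To sanity-check the schedule before Step 3, you can unpickle it; the object's layout is defined by this repo's scheduler, so this sketch only peeks at the top-level type:
```
python - <<'EOF'
import pickle

# schedule produced by scripts/auto_schedule.sh (Step 2)
with open("bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle", "rb") as f:
    schedule = pickle.load(f)
print(type(schedule))
EOF
```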

### Step 3. Profiling **Chimera with PipeFisher** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_pipefisher_steps.sh
```
```
sh scripts/plot_cuda_timeline_pipefisher.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_pipefisher.pdf`


By changing the settings in each script, you can run training/profiling with other BERT models, pipeline methods, numbers of pipeline stages, numbers of GPUs, etc.
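
The knobs are plain shell variables near the top of each script. The variable names below are illustrative, not the actual ones; check the script headers for the real names:
```
# illustrative settings block -- variable names are placeholders
MODEL=bert-large      # e.g., bert-base / bert-large
PIPELINE=chimera      # pipeline method
NUM_STAGES=8          # number of pipeline stages
NUM_GPUS=8            # total GPUs (== nodes, with one GPU per node)
MICRO_BS=32           # micro-batch size
```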