Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/MachineLearningSystem/23MLSYS-pipe-fisher
- Host: GitHub
- URL: https://github.com/MachineLearningSystem/23MLSYS-pipe-fisher
- Owner: MachineLearningSystem
- Fork: true (kazukiosawa/pipe-fisher)
- Created: 2023-06-07T05:42:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-04-29T15:20:48.000Z (over 1 year ago)
- Last Synced: 2024-08-02T19:33:16.768Z (4 months ago)
- Size: 464 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-AI-system - PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices MLSYS'23
README
# PipeFisher
The implementation of pipeline-parallel training with K-FAC optimizer (PipeFisher) in PyTorch used in [PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices](https://arxiv.org/abs/2211.14133) (to appear at MLSys 2023).
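To make the optimizer's core idea concrete, here is a minimal NumPy sketch of K-FAC preconditioning for a single linear layer. The Fisher information matrix is approximated as a Kronecker product A &#8855; B of the input covariance A and the output-gradient covariance B, so the natural-gradient step factors into two small matrix inverses. This is an illustration of the concept only (function and variable names are ours), not the repository's implementation.

```python
# Conceptual K-FAC preconditioning for one linear layer, not the repo's code.
import numpy as np

def kfac_precondition(grad_w, acts, grad_out, damping=1e-3):
    """Precondition a weight gradient with Kronecker factors.

    grad_w:   (d_out, d_in) gradient of the layer's weight matrix
    acts:     (batch, d_in) inputs to the layer
    grad_out: (batch, d_out) gradients w.r.t. the layer's outputs
    """
    n = acts.shape[0]
    A = acts.T @ acts / n          # input covariance, shape (d_in, d_in)
    B = grad_out.T @ grad_out / n  # grad-output covariance, shape (d_out, d_out)
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    B_inv = np.linalg.inv(B + damping * np.eye(B.shape[0]))
    # (A kron B)^{-1} vec(G) == vec(B^{-1} G A^{-1}):
    # two small inverses instead of one huge Fisher inverse.
    return B_inv @ grad_w @ A_inv
```

The damping term keeps both factors invertible, mirroring the Tikhonov damping commonly used in K-FAC implementations.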
## Setup
### Data preparation
Please follow the [AzureML-BERT data preparation guide](https://github.com/microsoft/AzureML-BERT/blob/master/docs/dataprep.md) and store the `wikipedia.segmented.nltk.txt` file under the `bert_data/` directory.
### Installation
```
pip install -r requirements.txt
pip install asdfghjkl/
```
For training, we use `apex.optimizers.FusedLAMB` from [NVIDIA's Apex library](https://github.com/NVIDIA/apex). Please follow the [installation instructions](https://github.com/NVIDIA/apex#installation) for `apex`.

For profiling, we use [NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems). Please make sure you can execute the `nsys` command.
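For reference, the general shape of an `nsys` invocation as used by profiling workflows like this one is shown below; the entry-point script and output path here are illustrative placeholders, not files from this repository.

```
# Trace CUDA and NVTX activity and write a report file.
# "your_train_script.py" is a placeholder, not a file in this repo.
nsys profile -t cuda,nvtx -o bert_prof/trace python your_train_script.py
```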
Our scripts are intended to be run through the SLURM workload manager on a GPU cluster with one GPU per node.
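A typical SLURM batch header for this setup (32 GPUs at one GPU per node) might look like the following sketch; the job name, resource directives, and entry point are illustrative assumptions, not the repository's actual scripts.

```
#!/bin/bash
#SBATCH --job-name=bert_pretrain   # illustrative job name
#SBATCH --nodes=32                 # 32 GPUs at one GPU per node
#SBATCH --ntasks-per-node=1        # one process per node
#SBATCH --gres=gpu:1               # request a single GPU on each node

srun python main_bert.py           # entry-point name is an assumption
```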
## Training
Phase 1 pretraining of BERT-Base on the English Wikipedia with NVLAMB on 32 GPUs:
```
sbatch scripts/train.sh
```

Phase 1 pretraining of BERT-Base on the English Wikipedia with K-FAC on 32 GPUs:
```
sbatch scripts/train_kfac.sh
```

## Profiling
### Step 0. Profiling **Chimera** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_steps.sh
```
```
sh scripts/plot_cuda_timeline.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1.pdf`

### Step 1. Profiling **Chimera with K-FAC** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_kfac_steps.sh
```
```
sh scripts/plot_cuda_timeline_kfac.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac.pdf`

### Step 2. Automatic work assignments
```
sh scripts/auto_schedule.sh
```
output: `bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle`

### Step 3. Profiling **Chimera with PipeFisher** with 8 stages for BERT-Large on 8 GPUs
```
sbatch scripts/prof_pipefisher_steps.sh
```
```
sh scripts/plot_cuda_timeline_pipefisher.sh
```
output: `bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_pipefisher.pdf`

By changing the settings of each script, you can run training/profiling with other BERT models, pipeline methods, numbers of pipeline stages, numbers of GPUs, etc.
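The schedule file produced in Step 2 is a Python pickle. Its internal structure is not documented here, so the hedged sketch below only loads the file and reports the type of the stored object; the helper name is ours.

```python
# Minimal inspection of the pickled schedule from Step 2 (structure unknown,
# so we only load the object rather than interpret its contents).
import pickle

def load_schedule(path):
    """Load a pickled work-assignment schedule and return the stored object."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Example usage (after running Step 2):
# sched = load_schedule(
#     "bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle")
# print(type(sched))
```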