https://github.com/huggingface/picotron_tutorial
- Host: GitHub
- URL: https://github.com/huggingface/picotron_tutorial
- Owner: huggingface
- Created: 2024-11-15T12:36:09.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2024-12-20T12:57:11.000Z (6 months ago)
- Last Synced: 2024-12-25T02:06:33.520Z (6 months ago)
- Language: Python
- Size: 340 KB
- Stars: 25
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Picotron tutorial
A step-by-step tutorial on how to build the [Picotron](https://github.com/huggingface/picotron) distributed training framework from scratch 🔥
## Videos
> More to come. Full playlist [here](https://www.youtube.com/playlist?list=PL-_armZiJvAnhcRr6yTJ0__f3Oi-LLi9S) 🎬
- 🎬 [[Picotron tutorial] Part 1: Model, Process Group Manager, Dataloader](https://youtu.be/u2VSwDDpaBM)
## Setup
```
conda create -n env-picotron-tutorial python=3.10 -y
conda activate env-picotron-tutorial
pip install -e .
```

## Sanity check
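The multi-GPU commands below assume enough visible devices for the requested `--nproc_per_node`. A minimal pre-flight check using only plain PyTorch (not part of the tutorial code, just a suggested sanity step):

```python
import torch

# The parallel runs below use --nproc_per_node 4 or 8, so at least that many
# GPUs must be visible; adjust `required` to the command you plan to launch.
required = 8  # largest --nproc_per_node used in the sanity-check commands
available = torch.cuda.device_count()
print(f"visible GPUs: {available}")
assert available >= required, f"need {required} GPUs, found only {available}"
```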
- Convergence testing of a 1B-parameter Llama model on 4,096,000 tokens to check that the loss curves match across configurations.

```bash
# Baseline
cd step3_dataloader/
torchrun --nproc_per_node 1 train.py --micro_batch_size 4 --gradient_accumulation_steps 8 --seq_len 1024 --max_tokens 4096000 --num_proc 16 --model_name TinyLlama/TinyLlama_v1.1 --num_hidden_layers 22 --num_attention_heads 32 --num_key_value_heads 4 --run_name baseline_1B --use_wandb

# Tensor Parallel
cd step4_tensor_parallel/
torchrun --nproc_per_node 4 train.py --tp_size 4 --micro_batch_size 4 --gradient_accumulation_steps 8 --seq_len 1024 --max_tokens 4096000 --num_proc 16 --model_name TinyLlama/TinyLlama_v1.1 --num_hidden_layers 22 --num_attention_heads 32 --num_key_value_heads 4 --run_name tp_1B --use_wandb

# Data Parallel
cd step6_data_parallel_bucket/
torchrun --nproc_per_node 4 train.py --dp_size 4 --micro_batch_size 1 --gradient_accumulation_steps 8 --seq_len 1024 --max_tokens 4096000 --num_proc 16 --model_name TinyLlama/TinyLlama_v1.1 --num_hidden_layers 22 --num_attention_heads 32 --num_key_value_heads 4 --run_name dp_bucket_1B --use_wandb

# Pipeline Parallel
cd step8_pipeline_parallel_1f1b/
torchrun --nproc_per_node 4 train.py --pp_size 4 --pp_engine 1f1b --micro_batch_size 4 --gradient_accumulation_steps 8 --seq_len 1024 --max_tokens 4096000 --num_proc 16 --model_name TinyLlama/TinyLlama_v1.1 --num_hidden_layers 22 --num_attention_heads 32 --num_key_value_heads 4 --run_name pp_1f1b_1B --use_wandb

# 3D parallelism (Tensor + Data + Pipeline parallel)
torchrun --nproc_per_node 8 train.py --tp_size 2 --pp_size 2 --pp_engine 1f1b --dp_size 2 --micro_batch_size 2 --gradient_accumulation_steps 8 --seq_len 1024 --max_tokens 4096000 --num_proc 16 --model_name TinyLlama/TinyLlama_v1.1 --num_hidden_layers 22 --num_attention_heads 32 --num_key_value_heads 4 --run_name 3D_parallelism_1B --use_wandb
```
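A rough cross-check on why these runs are comparable: assuming the usual convention that one optimizer step processes `dp_size × micro_batch_size × gradient_accumulation_steps × seq_len` tokens (an assumption about the accounting, not taken from the picotron source), every configuration above covers 32,768 tokens per step and therefore hits the 4,096,000-token budget after 125 steps:

```python
# Back-of-the-envelope step count for the sanity-check runs above.
# Assumes tokens per optimizer step = dp * micro_batch_size * grad_accum * seq_len.
MAX_TOKENS = 4_096_000
SEQ_LEN = 1024

configs = {
    "baseline_1B":       dict(dp=1, micro_bs=4, grad_accum=8),
    "tp_1B":             dict(dp=1, micro_bs=4, grad_accum=8),
    "dp_bucket_1B":      dict(dp=4, micro_bs=1, grad_accum=8),
    "pp_1f1b_1B":        dict(dp=1, micro_bs=4, grad_accum=8),
    "3D_parallelism_1B": dict(dp=2, micro_bs=2, grad_accum=8),
}

for name, c in configs.items():
    tokens_per_step = c["dp"] * c["micro_bs"] * c["grad_accum"] * SEQ_LEN
    steps = MAX_TOKENS // tokens_per_step
    print(f"{name}: {tokens_per_step} tokens/step -> {steps} optimizer steps")
```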