GPT-like Large Language Model Pretrained on Inspur's Yuan Dataset
- Host: GitHub
- URL: https://github.com/iamncj/yuangpt
- Owner: iamNCJ
- License: MIT
- Created: 2022-01-29T11:00:27.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-02-15T17:21:57.000Z (about 3 years ago)
- Last Synced: 2025-05-16T23:11:29.940Z (12 months ago)
- Topics: gpt, gpt-2, large-language-models, llm, mlsys, pytorch
- Language: Python
- Size: 563 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# YuanGenerativeLM
Generative language model pretrained on Inspur's Yuan dataset, and the codebase for the ASC22 supercomputing competition.
## Project Structure
To simplify experiments with different distributed training frameworks, we decoupled the training code into `config`, `data`, `model`, and `trainer` modules.
The idea of this decoupling is inspired by pytorch-lightning; however, we decoupled it even further to make it more flexible when integrating with other frameworks.
### `config` Module
We put all hyperparameters and configurations into the `config` module for better tracing and logging.
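For illustration, a config in this style might be a plain dataclass (a minimal sketch; the field names and values here are assumptions, not the repo's actual config):

```python
from dataclasses import dataclass, asdict

@dataclass
class TrainConfig:
    # Hypothetical fields for illustration only; not the repo's actual config.
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    d_model: int = 768
    lr: float = 3e-4
    batch_size: int = 8

    def log_dict(self) -> dict:
        """Flatten into a plain dict so any logger can record the run's hyperparameters."""
        return asdict(self)
```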
### `data` Module
We directly use `pytorch_lightning.LightningDataModule`, since its interface is well-designed and easy to use.
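A minimal `LightningDataModule` for this setup might look like the following (a sketch under assumed data paths and tokenization; not the repo's actual code):

```python
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, TensorDataset

class YuanDataModule(pl.LightningDataModule):
    """Hypothetical data module: serves pre-tokenized Yuan text as token-id batches."""

    def __init__(self, data_path: str, batch_size: int = 8):
        super().__init__()
        self.data_path = data_path
        self.batch_size = batch_size

    def setup(self, stage=None):
        # Assumes the corpus was pre-tokenized and saved as a
        # (num_samples, seq_len) LongTensor of token ids.
        token_ids = torch.load(self.data_path)
        self.train_set = TensorDataset(token_ids)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)
```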
### `model` Module
Since most distributed training frameworks need to wrap the model before or after model initialization, and `pytorch_lightning.LightningModule` has already exposed some problems when integrating multiple frameworks simultaneously, we decided to decouple this module further into a `BaseModel` class.
`BaseModel` directly inherits from `nn.Module`, which is compatible with most distributed training frameworks. All language model implementations derive from `BaseModel` and maintain only the model config, the model structure, the forward method, the loss function, and the optimizer (a minimal sketch follows the list below).
Currently, implemented models include:
- Native model: written in plain PyTorch
- HuggingFace model: built on HuggingFace's `transformers`
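As a concrete illustration of that contract, a `BaseModel` in this style could look like the following (a minimal sketch; the method names beyond `forward`, and the toy model itself, are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BaseModel(nn.Module):
    """Framework-agnostic base: holds only config, structure, forward, loss, optimizer."""

    def __init__(self, config):
        super().__init__()
        self.config = config

    def loss(self, logits, targets):
        # Next-token cross-entropy for a GPT-style language model.
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

    def configure_optimizer(self):
        # Each model exposes its own optimizer; the trainer decides how to wrap it.
        return torch.optim.AdamW(self.parameters(), lr=self.config.lr)

class NativeGPT(BaseModel):
    """Hypothetical native-PyTorch model deriving from BaseModel
    (positional embeddings omitted for brevity)."""

    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.d_model)
        layer = nn.TransformerEncoderLayer(config.d_model, config.n_head, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=config.n_layer)
        self.head = nn.Linear(config.d_model, config.vocab_size)

    def forward(self, input_ids):
        seq_len = input_ids.size(1)
        # Causal mask: each position may attend only to itself and earlier tokens.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=input_ids.device),
            diagonal=1,
        )
        return self.head(self.blocks(self.embed(input_ids), mask=mask))
```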
### `trainer` Module
Everything else, such as model initialization, training, validation, and testing, goes into the `trainer` module. All training preparation and iterations are done here; a minimal sketch follows the list below.
Currently, implemented trainers include:
- PyTorch-Lightning trainer: distributed training with pytorch-lightning, with the DeepSpeed integration provided by the Lightning team
- PatrickStar trainer
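A trainer in this decoupled style receives a `BaseModel` and owns everything else. A hypothetical single-process sketch (the real trainers additionally wrap the model for their respective frameworks):

```python
import torch

def train(model, datamodule, max_steps=1000, device="cuda"):
    """Hypothetical minimal trainer: preparation, wrapping, and the loop all live here."""
    model = model.to(device)
    optimizer = model.configure_optimizer()  # assumed BaseModel hook from the sketch above
    datamodule.setup("fit")
    step = 0
    for (batch,) in datamodule.train_dataloader():
        batch = batch.to(device)
        inputs, targets = batch[:, :-1], batch[:, 1:]  # shift for next-token prediction
        loss = model.loss(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= max_steps:
            break
```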
## Distributed Launch
Below are examples of how to launch the training job on different distributed frameworks.
### DDP in PyTorch-Lightning
`num_nodes` must be set to the total number of GPUs across all nodes; otherwise it will fall back to the number of GPUs on the master node.
```sh
torchrun --nnodes=2 --nproc_per_node=2 --master_addr GPU04 --master_port 9001 --node_rank 1 train.ddp_pl.py
```
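Inside `train.ddp_pl.py`, the matching `Trainer` arguments would look roughly like this (a sketch; whether the repo sets these arguments exactly this way is an assumption):

```python
import pytorch_lightning as pl

# Matches the 2-node x 2-GPU torchrun command above. Per the caveat above,
# num_nodes is set to the total GPU count rather than the node count.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,      # GPUs per node, matching --nproc_per_node
    num_nodes=4,    # total GPUs across both nodes (see the note above)
    strategy="ddp",
    max_epochs=1,
)
```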
### DeepSpeed in PyTorch-Lightning
```sh
OMP_NUM_THREADS=32 torchrun --nnodes=2 --nproc_per_node=2 --master_addr GPU04 --master_port 9001 --node_rank 1 train.ds_pl.py
```
Note that `OMP_NUM_THREADS` must be set when offloading is enabled, since the optimizer then runs on the CPU.
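For reference, CPU offload of the optimizer is enabled through the DeepSpeed strategy roughly like this (a sketch using pytorch-lightning's `DeepSpeedStrategy`; the ZeRO stage chosen here is an assumption, not necessarily the repo's setting):

```python
import pytorch_lightning as pl
from pytorch_lightning.strategies import DeepSpeedStrategy

# ZeRO stage 2 with optimizer state offloaded to CPU; the CPU Adam step
# is multi-threaded, which is why OMP_NUM_THREADS matters above.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    num_nodes=2,
    strategy=DeepSpeedStrategy(stage=2, offload_optimizer=True),
)
```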
### Horovod in PyTorch-Lightning
```sh
horovodrun -np 2 python train.hvd_pl.py
```
We still prefer to use `torchrun`.
### PatrickStar
```sh
torchrun --nnodes=1 --nproc_per_node=2 train.pstar.py
```
### Colossal AI
```sh
GLOO_SOCKET_IFNAME=ibs5 OMP_NUM_THREADS=32 torchrun --master_addr="172.25.2.105" --master_port=29500 --nnodes=2 --node_rank=1 --nproc_per_node=2 train.col_ai.py --config=trainer/colossal_ai/strategy.py
```
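The `--config` flag points to a plain Python file that Colossal-AI reads at launch. A sketch of what such a strategy file can contain, following Colossal-AI's convention of module-level config variables (these values are assumptions, not the repo's actual `trainer/colossal_ai/strategy.py`):

```python
# Hypothetical Colossal-AI config; settings are plain module-level variables.
from colossalai.amp import AMP_TYPE

BATCH_SIZE = 8
NUM_EPOCHS = 1

fp16 = dict(mode=AMP_TYPE.NAIVE)  # mixed-precision mode
gradient_accumulation = 4         # effective batch = BATCH_SIZE * 4
clip_grad_norm = 1.0              # gradient clipping threshold
```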
## Profiling
```sh
OMP_NUM_THREADS=32 nsys profile -o cpu_adam torchrun --nnodes=2 --nproc_per_node=2 --master_addr GPU04 --master_port 9001 --node_rank 0 train.ds_pl.py
OMP_NUM_THREADS=32 nsys profile --gpu-metrics-device=all --gpuctxsw=true --nic-metrics=true --cuda-memory-usage=true --cudabacktrace=all torchrun --nnodes=2 --nproc_per_node=2 train.col_ai.py --config=trainer/colossal_ai/strategy.py
```
## Docker Environment
```sh
docker run -it --name pytorch --gpus all --privileged --cap-add=SYS_ADMIN --ipc=host --network=host --ulimit memlock=-1 --ulimit stack=67108864 --device=/dev/infiniband -v $(pwd):/workspace registry.cn-hangzhou.aliyuncs.com/ncj/pytorch bash
```
See the [Dockerfile](./Dockerfile) for details.