Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/volcengine/veGiantModel
Last synced: about 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/volcengine/veGiantModel
- Owner: volcengine
- License: apache-2.0
- Created: 2021-12-10T06:58:53.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-08-17T04:07:25.000Z (over 1 year ago)
- Last Synced: 2024-08-01T17:21:17.329Z (5 months ago)
- Language: Python
- Size: 14.8 MB
- Stars: 200
- Watchers: 9
- Forks: 22
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - volcengine/veGiantModel
README
# veGiantModel
veGiantModel is a Torch-based, high-efficiency training library developed by the Applied Machine Learning team at ByteDance. This repository hosts ongoing research to make giant model (such as [GPT](https://arxiv.org/abs/2005.14165), [BERT](https://arxiv.org/pdf/1810.04805.pdf) and [T5](https://arxiv.org/abs/1910.10683)) training easy, efficient, and effective. veGiantModel builds on top of [Megatron](https://github.com/NVIDIA/Megatron-LM) and [DeepSpeed](https://github.com/microsoft/DeepSpeed), and improves communication efficiency by integrating the high-performance communication library [BytePS](https://github.com/bytedance/byteps) and providing customized pipeline partitioning.
## Initialization

```python
import veGiantModel
pipeline_parallel_size = 1
model_parallel_size = 2
veGiantModel.initialize.init_distribute(pipeline_parallel_size, model_parallel_size, init_method="env://")
mp_size = veGiantModel.distributed.get_model_parallel_world_size()
dp_size = veGiantModel.distributed.get_data_parallel_world_size()
```
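Because `init_distribute` is called with `init_method="env://"`, `torch.distributed` reads `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` from the environment, as set by a launcher such as `torchrun`. The sketch below is illustrative rather than part of the library: it assumes the standard Megatron-style decomposition, in which the world size factors into data-, model-, and pipeline-parallel degrees.

```python
import os

import veGiantModel

# Assumed launch command (not from the README):
#   torchrun --nproc_per_node=8 train.py
# torchrun exports MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE,
# which init_method="env://" uses for rendezvous.
pipeline_parallel_size = 1
model_parallel_size = 2
veGiantModel.initialize.init_distribute(
    pipeline_parallel_size, model_parallel_size, init_method="env://"
)

# The data-parallel degree is whatever remains of the world size,
# e.g. 8 GPUs == 4 (data) x 2 (model) x 1 (pipeline).
world_size = int(os.environ["WORLD_SIZE"])
mp_size = veGiantModel.distributed.get_model_parallel_world_size()
dp_size = veGiantModel.distributed.get_data_parallel_world_size()
assert world_size == dp_size * mp_size * pipeline_parallel_size
```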
## Modules

```python
import torch
import torch.nn as nn

from veGiantModel.module import ColumnParallelLinear, RowParallelLinear

# `Config` and `Activation` come from the surrounding model code.
class PositionWiseFeedForward(nn.Module):
    """ FeedForward Neural Networks for each position """

    def __init__(self, config: Config):
        super().__init__()
        self.config = config
        if self.config.use_mp_linear_in_ffn:
            assert ColumnParallelLinear is not None
            assert RowParallelLinear is not None
            self.fc1 = ColumnParallelLinear(config.dim, config.dim_ff, use_ft=False)
            self.fc2 = RowParallelLinear(config.dim_ff, config.dim, use_ft=False)
        else:
            self.fc1 = nn.Linear(config.dim, config.dim_ff)
            self.fc2 = nn.Linear(config.dim_ff, config.dim)
        self.act = Activation(config.act)
        self.dropout = nn.Dropout(config.p_drop_hidden)

    def forward(self, x) -> torch.Tensor:
        # (bsz, seq_len, dim) -> (bsz, seq_len, dim_ff / model_parallel_size) -> (bsz, seq_len, dim)
        fc1_out = self.act(self.fc1(x))
        if self.config.dropout_in_ffn:
            fc1_out = self.dropout(fc1_out)
        fc2_out = self.fc2(fc1_out)
        if self.config.use_ffn_output_dropout:
            fc2_out = self.dropout(fc2_out)
        return fc2_out
```
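For context, `ColumnParallelLinear` splits the first linear's output dimension across model-parallel ranks, and `RowParallelLinear` splits the second linear's input dimension and all-reduces the partial results, so no activation needs to be gathered in between (the Megatron-LM pattern the module names suggest). Below is a hedged usage sketch: the `Config` stand-in contains only the fields the class above reads, its default values are assumptions, and it presumes `init_distribute` has already been called and that `Activation` is available from the surrounding model code.

```python
from dataclasses import dataclass

import torch

@dataclass
class Config:
    dim: int = 1024                      # hidden size
    dim_ff: int = 4096                   # feed-forward size (split across mp ranks)
    use_mp_linear_in_ffn: bool = True    # use the parallel linears above
    act: str = "gelu"                    # passed to Activation(...)
    p_drop_hidden: float = 0.1
    dropout_in_ffn: bool = False
    use_ffn_output_dropout: bool = True

# Each model-parallel rank holds dim_ff / mp_size columns of fc1 and the
# matching rows of fc2; RowParallelLinear all-reduces the partial outputs.
ffn = PositionWiseFeedForward(Config())
out = ffn(torch.randn(2, 128, 1024))     # (bsz, seq_len, dim) in and out
```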
## Examples

### GPT Pretraining
The `examples/gpt/pretrain_gpt2_distributed.sh` script runs 345M-parameter GPT pretraining on a single node with 8 GPUs. It largely follows the Megatron GPT pretraining script, with a few notable differences, and demonstrates good compatibility with existing Megatron/DeepSpeed training jobs: adopting veGiantModel requires only small changes.