# veGiantModel
VeGiantModel is a PyTorch-based, high-efficiency training library developed by the Applied Machine Learning team at ByteDance. This repository supports ongoing research to make training giant models (such as [GPT](https://arxiv.org/abs/2005.14165), [BERT](https://arxiv.org/pdf/1810.04805.pdf), and [T5](https://arxiv.org/abs/1910.10683)) easy, efficient, and effective. VeGiantModel builds on top of [Megatron](https://github.com/NVIDIA/Megatron-LM) and [DeepSpeed](https://github.com/microsoft/DeepSpeed), and improves communication efficiency by integrating the high-performance communication library [BytePS](https://github.com/bytedance/byteps) and providing customized pipeline partitioning.
## Initialization

```python
import veGiantModel

# Number of pipeline stages and model (tensor) parallel degree; the
# data-parallel size is derived from the remaining ranks in the world.
pipeline_parallel_size = 1
model_parallel_size = 2

# "env://" reads MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE from the environment.
veGiantModel.initialize.init_distribute(pipeline_parallel_size, model_parallel_size, init_method="env://")

mp_size = veGiantModel.distributed.get_model_parallel_world_size()
dp_size = veGiantModel.distributed.get_data_parallel_world_size()
```
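
With `init_method="env://"`, the launcher (for example `torch.distributed.launch` or `torchrun`) is expected to provide the standard `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` environment variables. Assuming the usual Megatron-style layout, where the world is partitioned into data-, model-, and pipeline-parallel groups, the world size must be divisible by the product of the model- and pipeline-parallel degrees. The helper below is a minimal sketch of that relationship, not part of the veGiantModel API:

```python
import torch.distributed as dist

def implied_data_parallel_size(pipeline_parallel_size: int, model_parallel_size: int) -> int:
    """Return the data-parallel size implied by the world size, assuming
    world_size == dp_size * mp_size * pp_size. Call after init_distribute()."""
    world_size = dist.get_world_size()
    assert world_size % (pipeline_parallel_size * model_parallel_size) == 0, (
        "WORLD_SIZE must be divisible by pipeline_parallel_size * model_parallel_size"
    )
    return world_size // (pipeline_parallel_size * model_parallel_size)

# Example: an 8-GPU node with pp=1 and mp=2 implies dp=4.
```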

## Modules

```python
import torch
import torch.nn as nn

from veGiantModel.module import ColumnParallelLinear, RowParallelLinear

# `Config` and `Activation` are assumed to come from the surrounding codebase.

class PositionWiseFeedForward(nn.Module):
    """FeedForward neural network applied at each position."""

    def __init__(self, config: Config):
        super().__init__()
        self.config = config

        if self.config.use_mp_linear_in_ffn:
            assert ColumnParallelLinear is not None
            assert RowParallelLinear is not None
            # Column-parallel shards the output dimension (dim_ff); the
            # following row-parallel layer consumes the shard directly,
            # so no all-gather is needed between the two layers.
            self.fc1 = ColumnParallelLinear(config.dim, config.dim_ff, use_ft=False)
            self.fc2 = RowParallelLinear(config.dim_ff, config.dim, use_ft=False)
        else:
            self.fc1 = nn.Linear(config.dim, config.dim_ff)
            self.fc2 = nn.Linear(config.dim_ff, config.dim)
        self.act = Activation(config.act)
        self.dropout = nn.Dropout(config.p_drop_hidden)

    def forward(self, x) -> torch.Tensor:
        # (bsz, seq_len, dim) -> (bsz, seq_len, dim_ff / model_parallel_size) -> (bsz, seq_len, dim)
        fc1_out = self.act(self.fc1(x))
        if self.config.dropout_in_ffn:
            fc1_out = self.dropout(fc1_out)
        fc2_out = self.fc2(fc1_out)
        if self.config.use_ffn_output_dropout:
            fc2_out = self.dropout(fc2_out)
        return fc2_out
```
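
As a rough usage illustration, the sketch below runs a forward pass through the module via its non-parallel branch, so it works without distributed initialization. `SimpleNamespace` is only a hypothetical stand-in for the codebase's real `Config`, and `Activation` is stubbed with a hypothetical factory; neither stand-in is part of the veGiantModel API.

```python
from types import SimpleNamespace

import torch
import torch.nn as nn

def Activation(name: str) -> nn.Module:
    # Hypothetical stub for the codebase's Activation factory.
    return nn.GELU() if name == "gelu" else nn.ReLU()

# Hypothetical stand-in config; field names mirror those read by
# PositionWiseFeedForward above.
config = SimpleNamespace(
    use_mp_linear_in_ffn=False,  # use the plain nn.Linear branch
    dim=1024,
    dim_ff=4096,
    act="gelu",
    p_drop_hidden=0.1,
    dropout_in_ffn=False,
    use_ffn_output_dropout=True,
)

ffn = PositionWiseFeedForward(config)
x = torch.randn(2, 128, config.dim)  # (bsz, seq_len, dim)
out = ffn(x)
assert out.shape == x.shape
```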

## Examples
### GPT Pretraining
The `examples/gpt/pretrain_gpt2_distributed.sh` script runs 345M-parameter GPT pretraining on a single node with 8 GPUs. It largely follows the Megatron GPT script, with a few notable differences, and shows that existing Megatron/DeepSpeed training jobs can adopt VeGiantModel with only small changes.