{"id":13480195,"url":"https://github.com/volcengine/veGiantModel","last_synced_at":"2025-03-27T10:31:07.106Z","repository":{"id":40403211,"uuid":"436881866","full_name":"volcengine/veGiantModel","owner":"volcengine","description":null,"archived":false,"fork":false,"pushed_at":"2023-08-17T04:07:25.000Z","size":15490,"stargazers_count":200,"open_issues_count":4,"forks_count":22,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-08-01T17:21:17.329Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/volcengine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-10T06:58:53.000Z","updated_at":"2024-07-25T05:01:00.000Z","dependencies_parsed_at":"2023-01-17T18:15:57.933Z","dependency_job_id":null,"html_url":"https://github.com/volcengine/veGiantModel","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveGiantModel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveGiantModel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveGiantModel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcengine%2FveGiantModel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/volcengine","download_url":"https://codeload.github.com/volcengine/veGiantModel/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_cou
nt":222230715,"owners_count":16952694,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T17:00:35.578Z","updated_at":"2024-10-30T13:31:14.885Z","avatar_url":"https://github.com/volcengine.png","language":"Python","funding_links":[],"categories":["BERT优化","Frameworks"],"sub_categories":[],"readme":"# veGiantModel\nVeGiantModel is a PyTorch-based, highly efficient training library developed by the Applied Machine Learning team at ByteDance. This repository is for ongoing research to make training giant models (such as [GPT](https://arxiv.org/abs/2005.14165), [BERT](https://arxiv.org/pdf/1810.04805.pdf) and [T5](https://arxiv.org/abs/1910.10683)) easy, efficient, and effective. 
VeGiantModel builds on top of [Megatron](https://github.com/NVIDIA/Megatron-LM) and [DeepSpeed](https://github.com/microsoft/DeepSpeed), and improves communication efficiency by integrating the highly efficient communication library [BytePS](https://github.com/bytedance/byteps) and providing customized pipeline partitioning.\n## initialization\n\n```python\nimport veGiantModel\npipeline_parallel_size = 1\nmodel_parallel_size = 2\nveGiantModel.initialize.init_distribute(pipeline_parallel_size, model_parallel_size, init_method=\"env://\")\nmp_size = veGiantModel.distributed.get_model_parallel_world_size()\ndp_size = veGiantModel.distributed.get_data_parallel_world_size()\n```\n\n## modules\n\n\n```python\nimport torch\nimport torch.nn as nn\n\nfrom veGiantModel.module import ColumnParallelLinear, RowParallelLinear\n\n# Config and Activation are assumed to be defined by the user's model code.\n\nclass PositionWiseFeedForward(nn.Module):\n    \"\"\" FeedForward Neural Networks for each position \"\"\"\n\n    def __init__(self, config: Config):\n        super().__init__()\n        # Keep the config on the module; it is read here and in forward().\n        self.config = config\n\n        if self.config.use_mp_linear_in_ffn:\n            assert ColumnParallelLinear is not None\n            assert RowParallelLinear is not None\n            # Model-parallel FFN: fc1 shards dim_ff across ranks, fc2 maps the shards back to dim.\n            self.fc1 = ColumnParallelLinear(config.dim, config.dim_ff, use_ft=False)\n            self.fc2 = RowParallelLinear(config.dim_ff, config.dim, use_ft=False)\n        else:\n            self.fc1 = nn.Linear(config.dim, config.dim_ff)\n            self.fc2 = nn.Linear(config.dim_ff, config.dim)\n        self.act = Activation(config.act)\n        self.dropout = nn.Dropout(config.p_drop_hidden)\n\n    def forward(self, x) -\u003e torch.Tensor:\n        # (bsz, seq_len, dim) -\u003e (bsz, seq_len, dim_ff / model_parallel_size) -\u003e (bsz, seq_len, dim)\n        fc1_out = self.act(self.fc1(x))\n        if self.config.dropout_in_ffn:\n            fc1_out = self.dropout(fc1_out)\n        fc2_out = self.fc2(fc1_out)\n        if self.config.use_ffn_output_dropout:\n            fc2_out = self.dropout(fc2_out)\n        return fc2_out\n```\n\n\n## Examples\n### GPT 
Pretraining\nThe `examples/gpt/pretrain_gpt2_distributed.sh` script runs 345M-parameter GPT pretraining on a single 8-GPU node. It largely follows the Megatron GPT script, with a few notable differences. It shows that existing Megatron/DeepSpeed training jobs can adopt VeGiantModel with few changes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvolcengine%2FveGiantModel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvolcengine%2FveGiantModel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvolcengine%2FveGiantModel/lists"}