Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
- Host: GitHub
- URL: https://github.com/EleutherAI/gpt-neox
- Owner: EleutherAI
- License: apache-2.0
- Created: 2020-12-22T14:37:54.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-24T15:03:07.000Z (10 days ago)
- Last Synced: 2024-10-29T15:35:14.982Z (5 days ago)
- Topics: deepspeed-library, gpt-3, language-model, transformers
- Language: Python
- Homepage: https://www.eleuther.ai/
- Size: 111 MB
- Stars: 6,913
- Watchers: 124
- Forks: 1,008
- Open Issues: 89
- Metadata Files:
  - Readme: README-MUP.md
  - Contributing: CONTRIBUTING.md
  - License: LICENSE
  - Citation: CITATION.cff
  - Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- Awesome_Multimodel_LLM - GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. (LLM Training Frameworks)
- awesome-mlvid - GPT-NeoX - [Paper](https://arxiv.org/pdf/2204.06745.pdf) | [Apache 2.0](https://github.com/EleutherAI/gpt-neox/blob/main/LICENSE) (Open LLM)
- awesome-llmops - GPT-NeoX (Model / Large Language Model)
- awesome-llm - GPT-NeoX 2.0 (20B) - Announced by EleutherAI / 2023 (Models / Open models)
- awesome-llm-eval - GPT-NeoX - [Paper](https://arxiv.org/pdf/2204.06745.pdf) (LLM-List / Pre-trained-LLM)
- Awesome-LLM - GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. (LLM Training Frameworks)
- StarryDivineSky - EleutherAI/gpt-neox
README
# How to use Mup (https://github.com/microsoft/mup)
## Add mup neox args to your config
```
# mup"use-mup": true,
"save-base-shapes": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank
"base-shapes-file": "base-shapes", # load base shapes from this file
"coord-check": false, # generate coord check plots to verify mup's implementation in neox
# mup hp search
"mup-init-scale": 1.0,
"mup-attn-temp": 1.0,
"mup-output-temp": 1.0,
"mup-embedding-mult": 1.0,
"mup-rp-embedding-mult": 1.0,
```

## Generate base shapes
1. Set `use-mup` to `true`
2. Set `save-base-shapes` to `true`
3. Run once. gpt-neox will instantiate a base model and a delta model, save one file per rank (named from the `base-shapes-file` setting), then exit immediately.
4. Set `save-base-shapes` back to `false` (a minimal override sketch for the one-time run follows this list).
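
The one-time run for steps 2-3 only needs these flags flipped relative to the sample config above; everything else in your training config stays unchanged (the `"base-shapes"` filename simply mirrors the sample config and can be anything):

```
# one-off run: write the base-shapes file on each rank, then exit
"use-mup": true,
"save-base-shapes": true,
"base-shapes-file": "base-shapes",
```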

## Generate coord check plots (optional)

1. Keep `use-mup` set to `true`
2. Set `coord-check` to `true`
3. Run once. gpt-neox will output jpg images similar to https://github.com/microsoft/mutransformers/blob/main/README.md#coord-check, then exit immediately.
4. Set `coord-check` back to `false` (see the sketch after this list).
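
A minimal override sketch for this step, assuming the base-shapes files from the previous section already exist next to your config:

```
# one-off run: write coord-check plots, then exit
"use-mup": true,
"save-base-shapes": false,
"base-shapes-file": "base-shapes",
"coord-check": true,
```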

## Tune mup hyperparameters and LR

The values under `mup hp search` correspond to appendix F.4 of https://arxiv.org/pdf/2203.03466.pdf. Tune them, together with the learning rate, via random search using the scaled-up config (tested with 6-7B.yml) but with `hidden-size` set to the value from the scaled-down config (125M.yml).
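One trial of such a random search might override the scaled-up config as sketched below; every numeric value is an illustrative sample, not a recommendation, and the learning-rate key lives in your optimizer settings, so it is shown only as a comment:

```
# one random-search trial on 6-7B.yml (illustrative values only)
"hidden-size": 768,            # temporarily set to the 125M.yml value
"mup-init-scale": 0.5,         # sampled
"mup-attn-temp": 2.0,          # sampled
"mup-output-temp": 1.0,        # sampled
"mup-embedding-mult": 4.0,     # sampled
"mup-rp-embedding-mult": 1.0,  # sampled
# also sample the learning rate in the optimizer settings
```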
## Transfer
With the best learning rate and the best mup hyperparameters set, revert `hidden-size` in the scaled-up config to its original value and run again, as sketched below.
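Concretely, the final run keeps the tuned values and only restores the width; the numbers below are placeholders for whatever your search found:

```
# final scaled-up run on 6-7B.yml (placeholder values)
"hidden-size": 4096,    # reverted to the scaled-up config's original value
"use-mup": true,
"base-shapes-file": "base-shapes",
"mup-init-scale": 0.5,  # best value from the search
"mup-attn-temp": 2.0,   # best value from the search
"mup-output-temp": 1.0,
"mup-embedding-mult": 4.0,
"mup-rp-embedding-mult": 1.0,
```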