{"id":13478559,"url":"https://github.com/EleutherAI/gpt-neox","last_synced_at":"2025-03-27T07:31:16.854Z","repository":{"id":37257849,"uuid":"323651234","full_name":"EleutherAI/gpt-neox","owner":"EleutherAI","description":"An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries","archived":false,"fork":false,"pushed_at":"2024-10-24T15:03:07.000Z","size":116273,"stargazers_count":6913,"open_issues_count":89,"forks_count":1008,"subscribers_count":124,"default_branch":"main","last_synced_at":"2024-10-29T15:35:14.982Z","etag":null,"topics":["deepspeed-library","gpt-3","language-model","transformers"],"latest_commit_sha":null,"homepage":"https://www.eleuther.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EleutherAI.png","metadata":{"files":{"readme":"README-MUP.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-12-22T14:37:54.000Z","updated_at":"2024-10-29T09:59:09.000Z","dependencies_parsed_at":"2023-12-26T22:24:42.055Z","dependency_job_id":"5615dfe0-6df4-401f-ad6e-caea7e82ace0","html_url":"https://github.com/EleutherAI/gpt-neox","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fgpt-neox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fgpt-neox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fgpt-neox/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fgpt-neox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EleutherAI","download_url":"https://codeload.github.com/EleutherAI/gpt-neox/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245802511,"owners_count":20674686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deepspeed-library","gpt-3","language-model","transformers"],"created_at":"2024-07-31T16:01:58.719Z","updated_at":"2025-03-27T07:31:16.827Z","avatar_url":"https://github.com/EleutherAI.png","language":"Python","funding_links":[],"categories":["Python","LLM Training Frameworks","LLM训练框架","Open LLM","Model","Models","LLM-List","Large Language Model","HarmonyOS","A01_文本生成_文本对话","2. 
Open Foundation Models","EleutherAI","LLM Training / Finetuning"],"sub_categories":["LLM 评估工具","Large Language Model","Open models","Pre-trained-LLM","Windows Manager","大语言对话模型及数据"],"readme":"# How to use Mup (https://github.com/microsoft/mup)\n\n## Add mup neox args to your config\n\n```\n# mup\n\n\"use-mup\": true,\n\n\"save-base-shapes\": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank\n\n\"base-shapes-file\": \"base-shapes\", # load base shapes from this file\n\n\"coord-check\": false, # generate coord check plots to verify mup's implementation in neox\n\n# mup hp search\n\n\"mup-init-scale\": 1.0,\n\n\"mup-attn-temp\": 1.0,\n\n\"mup-output-temp\": 1.0,\n\n\"mup-embedding-mult\": 1.0,\n\n\"mup-rp-embedding-mult\": 1.0,\n```\n\n## Generate base shapes\n\n1. Set use-mup to true\n2. Set save-base-shapes to true\n3. Run once. gpt-neox will instantiate a base model and a delta model, then save one file per rank named \u003cbase-shapes-file\u003e.\u003crank\u003e. gpt-neox will exit immediately.\n4. Set save-base-shapes to false\n\n## Generate coord check plots (optional)\n\n1. Keep use-mup true\n2. Set coord-check to true\n3. Run once. gpt-neox will output jpg images similar to https://github.com/microsoft/mutransformers/blob/main/README.md#coord-check. gpt-neox will exit immediately.\n4. Set coord-check to false\n\n## Tune mup hyperparameters and LR\n\nThe values under `mup hp search` correspond to Appendix F.4 of https://arxiv.org/pdf/2203.03466.pdf. 
These hyperparameters and the learning rate (LR) are tuned with a random search using the scaled-up config (tested with 6-7B.yml), but with hidden-size set to the value from the scaled-down config (125M.yml).\n\n## Transfer\n\nWith the best LR and the best mup HPs set, restore the original value of hidden-size in the scaled-up config and run again.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEleutherAI%2Fgpt-neox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEleutherAI%2Fgpt-neox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEleutherAI%2Fgpt-neox/lists"}