{"id":17985183,"url":"https://github.com/seannaren/min-llm","last_synced_at":"2026-03-16T03:32:29.700Z","repository":{"id":38466438,"uuid":"479465763","full_name":"SeanNaren/min-LLM","owner":"SeanNaren","description":"Minimal code to train a Large Language Model (LLM).","archived":false,"fork":false,"pushed_at":"2022-07-22T09:28:15.000Z","size":142,"stargazers_count":143,"open_issues_count":6,"forks_count":6,"subscribers_count":9,"default_branch":"main","last_synced_at":"2023-11-07T22:10:13.186Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SeanNaren.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-08T16:36:51.000Z","updated_at":"2023-11-06T04:19:45.000Z","dependencies_parsed_at":"2022-07-12T17:37:49.447Z","dependency_job_id":null,"html_url":"https://github.com/SeanNaren/min-LLM","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeanNaren%2Fmin-LLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeanNaren%2Fmin-LLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeanNaren%2Fmin-LLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeanNaren%2Fmin-LLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SeanNaren","download_url":"https://codeload.github.com/SeanNaren/min-LLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222090803,"
owners_count":16929472,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-29T18:24:16.331Z","updated_at":"2026-03-16T03:32:29.647Z","avatar_url":"https://github.com/SeanNaren.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# min-LLM\n\nMinimal code to train a relatively large language model (1-10B parameters).\n\n* Minimal codebase to learn and adapt for your own use cases\n* Concise demonstration of tricks to optimally train a larger language model\n* Allows exploration of compute optimal models at smaller sizes based on realistic scaling laws\n\nThe project was inspired by [megatron](https://github.com/NVIDIA/Megatron-LM) and all sub-variants. This repo can be seen as a condensed variant, where some of the very large scaling tricks are stripped out for the sake of readability/simplicity.\n\nFor example, the library does not include Tensor Parallelism/Pipeline Parallelism. 
If you need to reach those 100B+ parameter models, I suggest looking at [megatron](https://github.com/NVIDIA/Megatron-LM).\n\n## Setup\n\nMake sure you're installing/running on a CUDA-supported machine.\n\nTo improve performance, we use a few fused kernel layers from Apex (if you're unsure what fused kernels are for, I highly suggest [this blog post](https://horace.io/brrr_intro.html)).\n\n```\ngit clone https://github.com/NVIDIA/apex\ncd apex\npip install -v --disable-pip-version-check --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\n```\n\nInstall the rest of the requirements:\n\n```\npip install -r requirements.txt\n```\n\n## Train\n\nTo train a 1.5B parameter model based on the Megatron architecture sizes using 8 GPUs (the model will not fit on 1 GPU with optimal throughput, so we scale to multiple GPUs):\n\n```\ndeepspeed --num_gpus 8 train.py --batch_size_per_gpu 16\n```\n\n## References\n\nCode:\n\n* [minGPT](https://github.com/karpathy/minGPT) - A lot of the base code was borrowed and extended from this awesome library\n* [microGPT](https://github.com/facebookresearch/xformers/blob/main/examples/microGPT.py) - A helpful example with xFormers\n* [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed) - A reference for learning how to use DeepSpeed with the Megatron architecture/3D parallelism\n\nPapers:\n\n* [Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM](https://cs.stanford.edu/~matei/papers/2021/sc_megatron_lm.pdf)\n* [Training Compute-Optimal Large Language Models](https://arxiv.org/pdf/2203.15556.pdf)\n* [What Language Model to Train if You Have One Million GPU Hours?](https://openreview.net/pdf?id=rI7BL3fHIZq)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseannaren%2Fmin-llm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseannaren%2Fmin-llm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseannaren%2Fmin-llm/lists"}