{"id":17694498,"url":"https://github.com/felix-andreas/gpt-burn","last_synced_at":"2025-09-14T06:44:55.573Z","repository":{"id":240119495,"uuid":"801040760","full_name":"felix-andreas/gpt-burn","owner":"felix-andreas","description":"Implementation of the GPT architecture in Rust 🦀 + Burn 🔥","archived":false,"fork":false,"pushed_at":"2024-05-22T09:56:16.000Z","size":98,"stargazers_count":49,"open_issues_count":2,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T16:52:06.970Z","etag":null,"topics":["burn","gpt","rust","transformer"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"upl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/felix-andreas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-15T13:31:29.000Z","updated_at":"2025-03-09T00:19:08.000Z","dependencies_parsed_at":"2024-10-24T19:34:48.078Z","dependency_job_id":null,"html_url":"https://github.com/felix-andreas/gpt-burn","commit_stats":null,"previous_names":["felix-andreas/gpt-burn"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/felix-andreas/gpt-burn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-andreas%2Fgpt-burn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-andreas%2Fgpt-burn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-andreas%2Fgpt-burn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-andreas%2Fgpt-burn/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/felix-andreas","download_url":"https://codeload.github.com/felix-andreas/gpt-burn/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-andreas%2Fgpt-burn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275071553,"owners_count":25400398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["burn","gpt","rust","transformer"],"created_at":"2024-10-24T13:48:57.708Z","updated_at":"2025-09-14T06:44:55.534Z","avatar_url":"https://github.com/felix-andreas.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# GPT-Burn 🔥\n\n### Implementation of the GPT architecture in Rust 🦀 + [Burn 🔥](https://burn.dev/).\n\n\u003c/div\u003e\n\nThis project aims to be a clean and concise re-implementation of [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf). The model implementation, contained in [`src/model.rs`](src/model.rs), is under 300 lines of code. 
While this was a fun exercise mostly for (my own) educational purposes, it demonstrates the utility of Rust and Burn in the machine learning domain: The entire project compiles into a single binary, making deployment relatively straightforward.\n\nAt the moment, only a character-level tokenizer is supported, so official weights requiring a BPE tokenizer cannot be used yet. However, for fun, you can try out the small toy model I trained ([see inference](#inference)). \n\nThe project also includes a simple CLI for training and inference. \n\n```\nUsage: gpt-burn \u003cCOMMAND\u003e\n\nCommands:\n  run    Generate text using a pre-trained model\n  train  Train a new model\n```\n\n## Installation\n\nYou can install `gpt-burn` with [Nix](https://nixos.org/):\n\n```sh\nnix run github:felix-andreas/gpt-burn\n```\n\nOr, install with `cargo`:\n\n```\ncargo install --git https://github.com/felix-andreas/gpt-burn\n```\n\nAlternatively, clone the repo and build from source:\n\n```sh\nnix develop # optional\ncargo run --release\n```\n\nIf you don't use [Nix](https://nixos.org/) and are on an Ubuntu-based distro, you need to install these additional dependencies:\n\n```sh\napt install pkg-config libssl-dev libvulkan1 mesa-vulkan-drivers vulkan-tools\n```\n\n## Inference\n\nI trained a toy model with a character-level tokenizer on the [German Wikipedia corpus](https://github.com/GermanT5/wikipedia2corpus) for 20,000 batches (batch size of 128) with the following parameters:\n\n| Parameter      | Value  |\n| -------------- | ------ |\n| parameters     | 83M    |\n| context length | 128    |\n| `n_layers`     | 12     |\n| `n_heads`      | 12     |\n| `d_model`      | 768    |\n\nYou can download it [here](https://drive.usercontent.google.com/download?id=1GGLaPnmPQ8Z2B9vJQoI6-K128X9LJKG0\u0026export=download) and extract it afterward. 
Or, do both in a single command:\n\n```sh\ncurl -s 'https://drive.usercontent.google.com/download?id=1GGLaPnmPQ8Z2B9vJQoI6-K128X9LJKG0\u0026export=download\u0026confirm=t' | tar xzf -\n```\n\nThen, run the model:\n\n```sh\ngpt-burn run ./model_83M\n```\n\nYou should see something along these lines:\n\n```\nSo wurden bis 1977 679 nachhaltige Wörgler Torbauten vorgeworfen, die Einwohnerzahl Sirkes bestand 2015 bis 1998.\nSie war trotz weniger als 10.000 ausgedehnter Größen wahrscheinlich auf folgende Breitenauflagen mit 932 km.\n2016 wurden rund 145 Händen nach Deutschland geladen.\n```\n\nFurther command line options are:\n\n```\nUsage: gpt-burn run [OPTIONS] \u003cMODEL_PATH\u003e\n\nArguments:\n  \u003cMODEL_PATH\u003e\n\nOptions:\n  -p, --prompt \u003cPROMPT\u003e\n  -n, --n-new-tokens \u003cN_NEW_TOKENS\u003e  [default: 1000]\n  -s, --seed \u003cSEED\u003e                  [default: 0]\n```\n\n## Training\n\nTo train your own model, run:\n\n```\ngpt-burn train --context-length 128 --n-layers 12 --n-heads 12 --d-model 768 --batch-size 128 --learning-rate 0.0003 --seed 0 --text-corpus ./corpus.txt\n```\n\n\u003e [!IMPORTANT]\n\u003e Make sure `corpus.txt` is a utf-8 encoded text file!\n\nYou can pass most hyperparameters as a command-line option:\n\n```\nUsage: gpt-burn train [OPTIONS]\n\nOptions:\n  -o, --output-path \u003cPATH\u003e\n  -c, --context-length \u003cCONTEXT_LENGTH\u003e  [default: 64]\n  -d, --d-model \u003cD_MODEL\u003e                [default: 64]\n  -l, --n-layers \u003cN_LAYERS\u003e              [default: 2]\n  -h, --n-heads \u003cN_HEADS\u003e                [default: 2]\n  -n, --n-steps \u003cN_STEPS\u003e                [default: 50]\n  -b, --batch-size \u003cBATCH_SIZE\u003e          [default: 32]\n  -r, --learning-rate \u003cLEARNING_RATE\u003e    [default: 0.003]\n  -s, --seed \u003cSEED\u003e                      [default: 0]\n  -t, --text-corpus \u003cTEXT_CORPUS\u003e        [default: .data/corpus.txt]\n  -m, --n-mega-bytes 
\u003cN_MEGA_BYTES\u003e      Only use first \u003cn\u003e megabytes of dataset for training\n  -x, --no-save                          Don't save trained model (useful for debugging)\n```\n\n## Tokenizer\n\nThe model can be used with different tokenizers via the `Tokenizer` trait. Below you see how the following sentence\n\n```\nAlbert Einstein war ein schweizerisch-US-amerikanischer theoretischer Physiker deutscher Herkunft.\n```\n\nis encoded by the different tokenizers.\n\n### Character-level tokenizer\n\nThe `CharTokenizer` splits every character into a separate token:\n\n```\nTokens: [\"A\", \"l\", \"b\", \"e\", \"r\", \"t\", \" \", \"E\", \"i\", \"n\", \"s\", \"t\", \"e\", \"i\", \"n\", \" \", \"w\", \"a\", \"r\", \" \", \"e\", \"i\", \"n\", \" \", \"s\", \"c\", \"h\", \"w\", \"e\", \"i\", \"z\", \"e\", \"r\", \"i\", \"s\", \"c\", \"h\", \"-\", \"U\", \"S\", \"-\", \"a\", \"m\", \"e\", \"r\", \"i\", \"k\", \"a\", \"n\", \"i\", \"s\", \"c\", \"h\", \"e\", \"r\", \" \", \"t\", \"h\", \"e\", \"o\", \"r\", \"e\", \"t\", \"i\", \"s\", \"c\", \"h\", \"e\", \"r\", \" \", \"P\", \"h\", \"y\", \"s\", \"i\", \"k\", \"e\", \"r\", \" \", \"d\", \"e\", \"u\", \"t\", \"s\", \"c\", \"h\", \"e\", \"r\", \" \", \"H\", \"e\", \"r\", \"k\", \"u\", \"n\", \"f\", \"t\", \".\"]\nValues: [28, 13, 3, 6, 19, 21, 1, 32, 10, 15, 20, 21, 6, 10, 15, 1, 24, 2, 19, 1, 6, 10, 15, 1, 20, 4, 9, 24, 6, 10, 27, 6, 19, 10, 20, 4, 9, 66, 48, 46, 66, 2, 14, 6, 19, 10, 12, 2, 15, 10, 20, 4, 9, 6, 19, 1, 21, 9, 6, 16, 19, 6, 21, 10, 20, 4, 9, 6, 19, 1, 43, 9, 26, 20, 10, 12, 6, 19, 1, 5, 6, 22, 21, 20, 4, 9, 6, 19, 1, 35, 6, 19, 12, 22, 15, 7, 21, 67]\n```\n\n### Simple-vowel tokenizer\n\nThe `SimpleVowelTokenizer` splits words before the next vowel if the chunk is longer than three characters, creating a result that resembles syllables:\n\n```\nTokens: [\"Albert\", \" \", \"Einst\", \"ein\", \" \", \"war\", \" \", \"ein\", \" \", \"schw\", \"eizer\", \"isch\", \"-\", \"US\", \"-\", \"amer\", 
\"ikan\", \"isch\", \"er\", \" \", \"theor\", \"etisch\", \"er\", \" \", \"Phys\", \"iker\", \" \", \"deutsch\", \"er\", \" \", \"Herk\", \"unft\"]\nValues: [2, 0, 3, 9, 0, 19, 0, 9, 0, 16, 10, 15, 1, 6, 1, 7, 13, 15, 11, 0, 17, 12, 11, 0, 5, 14, 0, 8, 11, 0, 4, 18]\n```\n\n## References\n\n* [GPT-2 Paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)\n* [OpenAI's GPT-2 Implementation](https://github.com/openai/gpt-2/blob/master/src/model.py)\n* [Huggingface's GPT-2 Implementation](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py)\n* [Visualization of the GPT Architecture](https://en.wikipedia.org/wiki/Generative_pre-trained_transformer#/media/File:Full_GPT_architecture.svg)\n* [Lesson by Andrej Karpathy](https://www.youtube.com/watch?v=kCc8FmEb1nY)\n* [The GPT-3 Architecture, on a Napkin](https://dugas.ch/artificial_curiosity/GPT_architecture.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelix-andreas%2Fgpt-burn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffelix-andreas%2Fgpt-burn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelix-andreas%2Fgpt-burn/lists"}