{"id":26344031,"url":"https://github.com/ssahas/implementing-gpt-from-scratch","last_synced_at":"2025-10-14T18:04:53.422Z","repository":{"id":255797906,"uuid":"853627695","full_name":"SSahas/Implementing-GPT-From-Scratch","owner":"SSahas","description":"Building a decoder-only (GPT-style) LLM from scratch using PyTorch and  training it for text generation.","archived":false,"fork":false,"pushed_at":"2025-06-08T10:47:28.000Z","size":360,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-14T18:04:32.634Z","etag":null,"topics":["datacleaning","dataprocessing","large-language-models","llm","llm-inference","llm-training","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SSahas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-09-07T04:30:48.000Z","updated_at":"2025-06-08T10:47:31.000Z","dependencies_parsed_at":"2024-10-24T17:58:21.945Z","dependency_job_id":"3bb4d483-102a-49f8-9480-e06a5ad7c878","html_url":"https://github.com/SSahas/Implementing-GPT-From-Scratch","commit_stats":null,"previous_names":["ssahas/implementing-llm-from-scratch","ssahas/implementing-gpt-from-scratch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SSahas/Implementing-GPT-From-Scratch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SSahas%2FImplementing-GPT-From-S
cratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SSahas%2FImplementing-GPT-From-Scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SSahas%2FImplementing-GPT-From-Scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SSahas%2FImplementing-GPT-From-Scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SSahas","download_url":"https://codeload.github.com/SSahas/Implementing-GPT-From-Scratch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SSahas%2FImplementing-GPT-From-Scratch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279020318,"owners_count":26086864,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datacleaning","dataprocessing","large-language-models","llm","llm-inference","llm-training","python"],"created_at":"2025-03-16T05:26:37.274Z","updated_at":"2025-10-14T18:04:53.416Z","avatar_url":"https://github.com/SSahas.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Building and Pretraining a Decoder-Only LLM (GPT Style) from Scratch with PyTorch\n- Pretraining an LLM for text generation, using Salesforce/wikitext for 
training. The model was trained for 30,000 iterations with a batch size of 8 for ~3 hours on a 16GB Tesla P100 (free Kaggle GPU). The training loss is around 3.7. After training, the model generates English with understandable grammar.\n\n\n- To train the model, clone the repository\n\n```\ngit clone https://github.com/SSahas/Implementing-GPT-From-Scratch.git\n```\n## Training\n- Create tokenized data\n  \n```\npython data/load_data.py\n```\n- Train the model\n\n```\npython train.py --config config/config.json\n```\n\n## Inference\n- To generate text using a trained model\n```\npython sample.py --model_path path/to/saved/model --prompt \"Your prompt here\"\n```\n\n## Model Details and Loss Curves\n```\nn_embd = 512\nvocab_size = 50257\nn_layers = 6\nn_heads = 8\nblock_size = 512 # number of previous tokens to attend to when performing attention\nbatch_size = 8\nlearning rate = 5e-4\n```\n- The x-axis represents iterations in hundreds. The model was trained for a total of 30,000 training steps.\n  \nTrain Loss             |  Test Loss\n:-------------------------:|:-------------------------:\n![](https://github.com/SSahas/Implementing-GPT-From-Scratch/blob/main/assets/train.png)  |  ![](https://github.com/SSahas/Implementing-GPT-From-Scratch/blob/main/assets/test.png)\n\n\n# Sample Generations\n\u003e *This is used for its purpose . The castle has its most extensive military value , with its new weapons and the ability to draw guns against and destroy obstacles ,\nbut it has always been used for long - duration.*\n\n\u003e *Once there was no threat to the United States who are expecting asylum to the United States government . The National Hurricane Center issued the same day the agency requested them to the Washington National Weather Service agencies at any request . 
By 1997 , the agency also considered the agency had a $ 20 , 000 fine ( equivalent to $ 15 , 060 , 061 in 2016 ) for an upcoming hurricane.*\n\n\u003e *This is to be called the \" great leader of all the major things and the most beautiful leader of all the time \" he is \" not so happy \" if he and his co - workers will be able to accomplish the truth they are in vain when him to death .*\n\n\n# References\n- [Andrej Karpathy - nanoGPT](https://github.com/karpathy/nanoGPT)\n- [t5-pytorch](https://github.com/conceptofmind/t5-pytorch)\n- [nanoT5](https://github.com/PiotrNawrot/nanoT5)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssahas%2Fimplementing-gpt-from-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssahas%2Fimplementing-gpt-from-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssahas%2Fimplementing-gpt-from-scratch/lists"}