{"id":15540045,"url":"https://github.com/blaizzy/coding-llms-from-scratch","last_synced_at":"2025-04-16T05:51:52.033Z","repository":{"id":228110692,"uuid":"773152536","full_name":"Blaizzy/Coding-LLMs-from-scratch","owner":"Blaizzy","description":null,"archived":false,"fork":false,"pushed_at":"2024-06-10T10:41:33.000Z","size":7072,"stargazers_count":31,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-16T03:00:10.073Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Blaizzy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-16T22:19:39.000Z","updated_at":"2025-03-28T08:37:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"a256487c-1e75-4bc4-901d-ecaa7d26e9d6","html_url":"https://github.com/Blaizzy/Coding-LLMs-from-scratch","commit_stats":{"total_commits":9,"total_committers":1,"mean_commits":9.0,"dds":0.0,"last_synced_commit":"9c3a182fcfbe898f532c9de947a5fc692ca4cc6b"},"previous_names":["blaizzy/coding-llms-from-scratch"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2FCoding-LLMs-from-scratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2FCoding-LLMs-from-scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Blaizzy%2FCoding-LLMs-from-scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/Blaizzy%2FCoding-LLMs-from-scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Blaizzy","download_url":"https://codeload.github.com/Blaizzy/Coding-LLMs-from-scratch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249205059,"owners_count":21229852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T12:12:17.897Z","updated_at":"2025-04-16T05:51:52.014Z","avatar_url":"https://github.com/Blaizzy.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Coding LLMs from scratch\r\n# Coding Llama-2\r\nYou will learn how to train and fine-tune a Llama 2 model from scratch.\r\n\r\nThroughout the series you will learn about the transformer architecture, different attention mechanisms (MHA, MQA and GQA), KV cache, RoPE, and the Hugging Face Trainer in detail.\r\n\r\nBy the end, you will have created and trained a LLaMA 2 model with 100M parameters from scratch using PyTorch to do code completion.\r\n\r\n🎥 **YT Video Playlist:**\r\n - https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy\u0026si=5Y4cm-6wrMOD1Abr\r\n\r\n\r\n\r\n# Coding Llama-3\r\n\r\nYou will learn how to train and fine-tune a Llama 3 model from scratch.\r\n\r\nThe goal is to code LLaMA 3 from scratch in PyTorch to create models with sizes 3B, 6B, 35B and 45B params.\r\n\r\n🎥 **YT Video Playlist:**\r\n - https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy\u0026si=5Y4cm-6wrMOD1Abr\r\n\r\n📚 **Papers**:\r\n - Sparse 
Upcycling: Training Mixture-of-Experts from Dense Checkpoints: https://arxiv.org/abs/2212.05055\r\n - Pre-training Small Base LMs with Fewer Tokens: https://arxiv.org/abs/2404.08634\r\n - Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: https://arxiv.org/abs/2404.07143\r\n\r\n\r\n\r\n## Llama-3-6B-v0.1\r\n\u003cimg src=\"./Llama-3/Part 2/assets/llama-3-6B icon.jpeg\" width=\"500\" alt=\"Llama-3-6B\"/\u003e\r\n\r\nIntroducing the world's first Llama-3 base model with 6B parameters. This model is a pretrained version of [prince-canuma/Llama-3-6B-v0](https://huggingface.co/prince-canuma/Llama-3-6B-v0), which was created from Meta-Llama-3-8B using a technique called [downcycling](https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy\u0026si=9hcOol4KHIgWThgt).\r\nThe model was continually pretrained on 1 billion tokens of English-only text from FineWeb, with the following result on the evaluation set:\r\n- Loss: 2.4942\r\n\r\n\r\n## Model Description\r\n\r\n- **Developed by:** [Prince Canuma](https://huggingface.co/prince-canuma)\r\n- **Sponsored by:** General\r\n- **Model type:** Llama\r\n- **License:** [Llama-3](https://llama.meta.com/llama3/license)\r\n- **Pretrained from model:** prince-canuma/Llama-3-6B-v0\r\n\r\n### Model Sources\r\n\r\n- **Repository:** https://github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3\r\n- **Video:** https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy\u0026si=5Y4cm-6wrMOD1Abr\r\n\r\n## Uses\r\n\r\n\u003c!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. 
--\u003e\r\nYou can use this model to create instruct and chat versions for various use cases such as coding assistants, RAG, function calling and more.\r\n\r\n### Limitations\r\n\r\nThis model inherits some of the base model's limitations and some additional ones from its creation process, such as:\r\n - Limited scope for coding and math: According to benchmarks, this model needs more pretraining/finetuning on code and math data to excel at reasoning tasks.\r\n - Language limitations: This model was continually pretrained on English-only data. If you are planning to use it for multilingual use cases, I recommend fine-tuning or continued pretraining.\r\n\r\n\r\n## Read more\r\nhttps://huggingface.co/prince-canuma/Llama-3-6B-v0.1","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblaizzy%2Fcoding-llms-from-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblaizzy%2Fcoding-llms-from-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblaizzy%2Fcoding-llms-from-scratch/lists"}