{"id":28166220,"url":"https://github.com/shreydan/scratchformers","last_synced_at":"2025-07-12T02:34:34.436Z","repository":{"id":212074306,"uuid":"674274880","full_name":"shreydan/scratchformers","owner":"shreydan","description":"building various transformer model architectures and their modules from scratch.","archived":false,"fork":false,"pushed_at":"2025-03-14T18:40:18.000Z","size":14195,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-15T13:11:32.108Z","etag":null,"topics":["computer-vision","multimodal","nlp","pytorch","transformers"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shreydan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-03T14:41:23.000Z","updated_at":"2025-03-14T18:40:21.000Z","dependencies_parsed_at":"2025-03-14T19:27:49.910Z","dependency_job_id":"e1251a41-0bdc-43fa-8de2-1167243c2736","html_url":"https://github.com/shreydan/scratchformers","commit_stats":null,"previous_names":["shreydan/scratchformers"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shreydan/scratchformers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreydan%2Fscratchformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreydan%2Fscratchformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreydan%2Fscratchformers/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/shreydan%2Fscratchformers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shreydan","download_url":"https://codeload.github.com/shreydan/scratchformers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreydan%2Fscratchformers/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264925821,"owners_count":23684211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","multimodal","nlp","pytorch","transformers"],"created_at":"2025-05-15T13:11:28.800Z","updated_at":"2025-07-12T02:34:34.377Z","avatar_url":"https://github.com/shreydan.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ScratchFormers\n### implementing transformers from scratch.\n\n\u003e Attention is all you need.\n\n## Modules\n\n- **[einops starter](./_modules/einops.ipynb)**\n  \n- **[attentions](./_modules/attentions.ipynb)**\n  - multi-head causal attention\n  - multi-head cross attention\n  - multi-head grouped query attention (torch + einops)\n  \n- **positional embeddings**\n  - [rotary positional embeddings (RoPE)](./_modules/rope.ipynb)\n  \n- **[Low-Rank Adaptation (LoRA)](./_modules/LoRA/)**\n  - implemented LoRA following this wonderful [tutorial by Sebastian Raschka](https://lightning.ai/lightning-ai/studios/code-lora-from-scratch?view=public\u0026section=all)\n  - fine-tuned a LoRA-adapted `deberta-v3-base` on the IMDb dataset\n\n- **[KV Cache](./_modules/KV-Cache/)**\n  - implemented KV Cache that 
supports RoPE\n  - verified to work with LLaMA (RoPE + GQA)\n\n## Models\n\n- **LLaMA**\n  - for the process, check [building_llama_complete.ipynb](./LLaMA/building_llama_complete.ipynb)\n  - model [implementation](./LLaMA/llama.py)\n  - inference (used [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct), which is based on the LLaMA architecture but super small): [code](./LLaMA/llama-inference.ipynb) [Kaggle](https://www.kaggle.com/code/shreydan/llama/)\n  - super cool resource: [LLMs From Scratch by Sebastian Raschka](https://github.com/rasbt/LLMs-from-scratch)\n  - added KV caching support: [llama_with_kv_caching.ipynb](./_modules/KV-Cache/llama_with_kv_caching.ipynb)\n\n- **simple Vision Transformer**\n  - for the process, check [building_ViT.ipynb](./ViT/building_ViT.ipynb)\n  - model [implementation](./ViT/vit.py)\n  - used `mean` pooling instead of a `[class]` token\n\n\n- **GPT2**\n  - for the process, check [buildingGPT2.ipynb](./GPT2/buildingGPT2.ipynb)\n  - model [implementation](./GPT2/gpt2.py)\n  - built so that it can load the pretrained OpenAI/Hugging Face weights: [gpt2-load-via-hf.ipynb](./GPT2/gpt2-load-via-hf.ipynb)\n  - for my own custom-trained causal LM, check out [shakespeareGPT](https://github.com/shreydan/shakespeareGPT), although it is a bit closer to GPT-1.\n\n\n- **OpenAI CLIP**\n  - implemented the `ViT-B/32` variant\n  - for the process, check [building_clip.ipynb](./OpenAI-CLIP/building_clip.ipynb)\n  - inference requirement: install CLIP for tokenization and preprocessing: `pip install git+https://github.com/openai/CLIP.git`\n  - model [implementation](./OpenAI-CLIP/model.py)\n  - zero-shot inference [code](./OpenAI-CLIP/zeroshot.py)\n  - built so that it can load the pretrained OpenAI weights, and IT WORKS!!!\n  - My lighter implementation, built from existing image and language models and trained on the Flickr8k dataset, is available here: [liteCLIP](https://github.com/shreydan/liteclip)\n\n\n- **Encoder Decoder 
Transformer**\n  - for the process, check [building_encoder-decoder.ipynb](./encoder-decoder/building_encoder-decoder.ipynb)\n  - model [implementation](./encoder-decoder/model.py)\n  - `src_mask` for the encoder is optional but nice to have: it masks out the pad tokens so that no attention is paid to them.\n  - used learned positional embeddings instead of the sin/cos encodings from the OG paper.\n  - I trained a model for multilingual machine translation.\n    - Translates English to Hindi and Telugu.\n    - change: a single embedding layer shared by the encoder \u0026 decoder, since I used a single tokenizer.\n    - for the code and results, check: [shreydan/multilingual-translation](https://github.com/shreydan/multilingual-translation)\n\n\n- **BERT - MLM**\n  - for the process of masked language modeling, check [masked-language-modeling.ipynb](./BERT-MLM/masked-language-modeling.ipynb)\n  - model [implementation](./BERT-MLM/model.py)\n  - simplification: no [CLS] \u0026 [SEP] tokens during pre-training, since I only built the model for masked language modeling and not for next sentence prediction.
\n  - I trained an entire model on the Wikipedia dataset; more info in the [shreydan/masked-language-modeling](https://github.com/shreydan/masked-language-modeling) repo.\n  - once pretrained, the MLM head can be replaced with any other downstream task head.\n\n- **ViT MAE**\n  - Paper: [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377)  \n  - model [implementation](./vitmae/model.py)\n  - for the process, check: [building-vitmae.ipynb](./vitmae/building-vitmae.ipynb)\n  - Relies heavily on the original code released by the authors.\n  - Only simplification: no [CLS] token, so I used mean pooling instead.\n  - The model can be used in 2 ways:\n    - For pretraining: once trained, the decoder can be thrown away and the encoder used for downstream tasks.\n    - For visualization: the full encoder-decoder can be used to reconstruct masked images.\n  - I trained a smaller model for reconstruction visualization: [ViTMAE on Animals Dataset](./vitmae/animals-vitmae.ipynb)\n\n- **UNETR**\n  - 3D segmentation model for the medical domain\n  - Transformer-based architecture, more [info](https://paperswithcode.com/method/unetr)\n  - process: [building_unetr](./UNETR/building_unetr.ipynb)\n\n\n### Requirements\n```\neinops\ntorch\ntorchvision\nnumpy\nmatplotlib\npandas\n```\n\n---\n\n\nHere's my puppy's picture:\n![sumo](sumo.jpg)\n\n---\n\n```\nGod is our refuge and strength, a very present help in trouble.\nPsalm 46:1\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreydan%2Fscratchformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshreydan%2Fscratchformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreydan%2Fscratchformers/lists"}