# ScratchFormers
### implementing transformers from scratch.
> Attention is all you need.
## Modules
- **[einops starter](./_modules/einops.ipynb)**
- **[attentions](./_modules/attentions.ipynb)** (a minimal causal-attention sketch follows this list)
- multi-head causal attention
- multi-head cross attention
- multi-head grouped query attention (torch + einops)
- **positional embeddings**
- [rotary positional embeddings (RoPE)](./_modules/rope.ipynb)
- **[Low-Rank Adaptation (LoRA)](./_modules/LoRA/)**
- implementing LoRA based on this wonderful [tutorial by Sebastian Raschka](https://lightning.ai/lightning-ai/studios/code-lora-from-scratch?view=public&section=all)
- fine-tuning a LoRA-adapted `deberta-v3-base` on the IMDb dataset (a minimal LoRA layer sketch follows this list)
- **[KV Cache](./_modules/KV-Cache/)**
- implemented a KV cache that supports RoPE (a minimal sketch follows this list)
- works, verified with LLaMA (RoPE + GQA)
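The notebooks above walk through these modules in detail; as a quick taste, here is a minimal sketch of multi-head causal self-attention in plain PyTorch. The class name and dimensions are illustrative, not the repo's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Minimal multi-head causal self-attention (illustrative sketch)."""
    def __init__(self, d_model: int = 256, n_heads: int = 8, max_len: int = 1024):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # lower-triangular mask: position i may only attend to positions <= i
        self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)).bool())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, T, C) -> (B, n_heads, T, head_dim)
        q, k, v = [t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v)]
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.masked_fill(~self.mask[:T, :T], float("-inf"))
        attn = F.softmax(attn, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)

x = torch.randn(2, 16, 256)
print(CausalSelfAttention()(x).shape)  # torch.Size([2, 16, 256])
```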
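A minimal sketch of the LoRA idea from the module above: the pretrained weight stays frozen and a low-rank update `B @ A` is trained on top. The rank `r` and `alpha` values are just examples, not the settings used in the notebook.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: training starts at the base model
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```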
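And a minimal sketch of the KV-cache idea: keys and values from earlier decoding steps are stored and reused, so each new token only computes its own projections. This is illustrative, not the repo's exact implementation; with RoPE, the rotation must be applied at each token's absolute position before the keys are cached.

```python
import torch

class LayerKVCache:
    """Per-layer cache: append this step's keys/values and return the full history."""
    def __init__(self):
        self.k = None  # (B, n_heads, T_cached, head_dim)
        self.v = None

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        # k_new/v_new hold only the current step's token(s); with RoPE they must
        # already be rotated using their absolute positions (offset = T_cached).
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

cache = LayerKVCache()
for step in range(3):
    k, v = torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64)  # one new token per step
    k_all, v_all = cache.update(k, v)
print(k_all.shape)  # torch.Size([1, 8, 3, 64])
```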
## Models
- **LLaMA**
- for process, check [building_llama_complete.ipynb](./LLaMA/building_llama_complete.ipynb)
- model [implementation](./LLaMA/llama.py)
- inference (used [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct), which is based on the LLaMA architecture but very small) [code](./LLaMA/llama-inference.ipynb) [kaggle](https://www.kaggle.com/code/shreydan/llama/)
- super cool resource: [LLMs From Scratch by Sebastian Raschka](https://github.com/rasbt/LLMs-from-scratch)
- added KV Caching support: [llama_with_kv_caching.ipynb](./_modules/KV-Cache/llama_with_kv_caching.ipynb)
- **simple Vision Transformer**
- for process, check [building_ViT.ipynb](./ViT/building_ViT.ipynb)
- model [implementation](./ViT/vit.py)
- used `mean` pooling instead of the `[class]` token
- **GPT2**
- for process, check [buildingGPT2.ipynb](./GPT2/buildingGPT2.ipynb)
- model [implementation](./GPT2/gpt2.py)
- built so that it supports loading pretrained OpenAI/Hugging Face weights: [gpt2-load-via-hf.ipynb](./GPT2/gpt2-load-via-hf.ipynb)
- for my own custom-trained causal LM, check out [shakespeareGPT](https://github.com/shreydan/shakespeareGPT), though it is closer to GPT-1.
- **OpenAI CLIP**
- implemented `ViT-B/32` variant
- for process, check [building_clip.ipynb](./OpenAI-CLIP/building_clip.ipynb)
- inference requirement: install CLIP for tokenization and preprocessing: `pip install git+https://github.com/openai/CLIP.git`
- model [implementation](./OpenAI-CLIP/model.py)
- zero-shot inference [code](./OpenAI-CLIP/zeroshot.py) (a zero-shot scoring sketch follows the models list)
- built so that it supports loading the pretrained OpenAI weights, and it works!
- my lighter implementation of this, using existing image and language models and trained on the Flickr8k dataset, is available here: [liteCLIP](https://github.com/shreydan/liteclip)
- **Encoder Decoder Transformer**
- for process, check [building_encoder-decoder.ipynb](./encoder-decoder/building_encoder-decoder.ipynb)
- model [implementation](./encoder-decoder/model.py)
- `src_mask` for the encoder is optional but nice to have: it masks out the pad tokens so no attention is paid to those positions (see the pad-mask sketch after this list).
- used learned positional embeddings instead of the sinusoidal ones from the original paper.
- I trained a model for multilingual machine translation.
- translates English to Hindi and Telugu.
- change: a single shared embedding layer for the encoder and decoder, since I used a single tokenizer.
- for the code and results check: [shreydan/multilingual-translation](https://github.com/shreydan/multilingual-translation)
- **BERT - MLM**
- for process of masked language modeling, check [masked-language-modeling.ipynb](./BERT-MLM/masked-language-modeling.ipynb)
- model [implementation](./BERT-MLM/model.py)
- simplification: no `[CLS]` & `[SEP]` tokens for pre-training, since I only built the model for masked language modeling and not next sentence prediction (a masking sketch follows this list).
- I trained an entire model on the Wikipedia dataset; more info in the [shreydan/masked-language-modeling](https://github.com/shreydan/masked-language-modeling) repo.
- once pretrained, the MLM head can be replaced with any other downstream task head.
- **ViT MAE**
- Paper: [Masked autoencoders are scalable vision learners](https://arxiv.org/abs/2111.06377)
- model [implementation](./vitmae/model.py)
- for process, check: [building-vitmae.ipynb](./vitmae/building-vitmae.ipynb)
- Closely follows the original code released by the authors.
- Only simplification: no `[CLS]` token, so mean pooling is used instead.
- The model can be used in two ways:
- For pretraining: the decoder can be thrown away and the encoder used for downstream tasks.
- For visualization: the full autoencoder can reconstruct masked images.
- I trained a smaller model for reconstruction visualization: [ViTMAE on Animals Dataset](./vitmae/animals-vitmae.ipynb)
- **UNETR**
- 3D segmentation model for the medical domain
- Transformer-based architecture, more [info](https://paperswithcode.com/method/unetr)
- process: [building_unetr](./UNETR/building_unetr.ipynb)
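As mentioned for the encoder-decoder model, the encoder `src_mask` exists to keep pad tokens out of attention. Here is a minimal sketch of how such a mask can be built and applied to raw attention scores; `pad_id` and the shapes are illustrative, not the repo's exact code.

```python
import torch
import torch.nn.functional as F

pad_id = 0
src = torch.tensor([[5, 7, 2, pad_id, pad_id],
                    [3, 9, 4, 6, pad_id]])      # (B, T) token ids, right-padded

# True where a real token is present; broadcastable over (B, heads, T_q, T_k)
src_mask = (src != pad_id)[:, None, None, :]

scores = torch.randn(2, 8, 5, 5)                 # (B, heads, T_q, T_k) raw attention scores
scores = scores.masked_fill(~src_mask, float("-inf"))
attn = F.softmax(scores, dim=-1)                 # pad positions receive zero attention weight
```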
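For the BERT-MLM model, the core of the data pipeline is BERT-style token masking. Below is a minimal sketch of that procedure using the usual 80/10/10 recipe; the function name and token ids are illustrative, not the repo's exact code.

```python
import torch

def mask_for_mlm(input_ids: torch.Tensor, mask_token_id: int, vocab_size: int, mlm_prob: float = 0.15):
    """BERT-style masking: labels are -100 (ignored by cross-entropy) except at masked positions."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100
    # of the selected positions: 80% become [MASK], 10% a random token, 10% stay unchanged
    to_mask = masked & torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool()
    input_ids[to_mask] = mask_token_id
    to_random = masked & ~to_mask & torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
    input_ids[to_random] = torch.randint(vocab_size, input_ids.shape)[to_random]
    return input_ids, labels

ids = torch.randint(5, 30000, (2, 16))           # hypothetical token ids
x, y = mask_for_mlm(ids, mask_token_id=4, vocab_size=30000)
```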
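And for CLIP-style zero-shot inference, classification reduces to cosine similarity between image and text embeddings. A minimal sketch, assuming normalized features from the two towers; all names and sizes here are illustrative.

```python
import torch
import torch.nn.functional as F

# Assume image_features (B, D) and text_features (num_classes, D) come from the
# vision and text towers of a CLIP-style model, one text embedding per class prompt.
image_features = F.normalize(torch.randn(4, 512), dim=-1)
text_features = F.normalize(torch.randn(10, 512), dim=-1)

logit_scale = torch.tensor(100.0)                        # exp of the learned temperature
logits = logit_scale * image_features @ text_features.T  # (B, num_classes)
pred = logits.softmax(dim=-1).argmax(dim=-1)             # zero-shot class prediction per image
print(pred)
```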
### Requirements
```
einops
torch
torchvision
numpy
matplotlib
pandas
```
---
Here's my puppy's picture:

---
```
God is our refuge and strength, a very present help in trouble.
Psalm 46:1
```