https://github.com/ssahas/implementing-gpt-from-scratch

Building a decoder-only (GPT-style) LLM from scratch using PyTorch and training it for text generation.

datacleaning dataprocessing large-language-models llm llm-inference llm-training python



Host: GitHub
URL: https://github.com/ssahas/implementing-gpt-from-scratch
Owner: SSahas
Created: 2024-09-07T04:30:48.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-06-08T10:47:28.000Z (about 1 year ago)
Last Synced: 2025-10-14T18:04:32.634Z (8 months ago)
Topics: datacleaning, dataprocessing, large-language-models, llm, llm-inference, llm-training, python
Language: Python
Homepage:
Size: 352 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Building and Pretraining a Decoder-Only LLM (GPT Style) from Scratch with PyTorch
- This repository pretrains a decoder-only LLM for text generation, using the Salesforce/wikitext dataset for training. The model was trained for 30,000 iterations with a batch size of 8, which took about 3 hours on a 16 GB Tesla P100 (Kaggle's free GPU tier). The training loss is around 3.7. After training, the model generates English with understandable grammar.
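
To put the reported numbers in context, the sketch below converts the loss into perplexity and estimates how many tokens the run processed. It assumes the loss is the mean cross-entropy in nats (the PyTorch default) and that every sequence in a batch is a full 512 tokens long. Neither assumption is stated in this README, and the variable names are illustrative only.

```python
import math

# Values reported in this README.
train_loss = 3.7
vocab_size = 50257
iterations = 30_000
batch_size = 8
block_size = 512

# Assumption: the loss is mean cross-entropy in nats, so perplexity = exp(loss).
perplexity = math.exp(train_loss)

# Loss of a model that guesses uniformly over the vocabulary, for comparison.
uniform_loss = math.log(vocab_size)

# Assumption: every training sequence is a full block_size tokens long.
tokens_seen = iterations * batch_size * block_size

print(f"perplexity ~ {perplexity:.1f}")
print(f"uniform-guess loss ~ {uniform_loss:.2f}")
print(f"tokens processed ~ {tokens_seen:,}")
```

Under these assumptions, a loss of 3.7 means the model is roughly as uncertain as a uniform choice among about 40 tokens, compared with 50,257 for an untrained uniform guess.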

- To train the model, first clone the repository:

```
git clone https://github.com/SSahas/Implementing-GPT-From-Scratch.git
```
## Training
- Create the tokenized data:

```
python data/load_data.py
```
- Train the model:

```
python train.py --config config/config.json
```
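
GPT-style pretraining is next-token prediction: each training example is a window of `block_size` tokens, and the target is the same window shifted by one position. The following is a minimal sketch of that idea, assuming random windows are drawn from one long token stream (as nanoGPT does). The `get_batch` helper is hypothetical and uses plain Python lists instead of tensors; the actual `train.py` may build batches differently.

```python
import random

def get_batch(tokens, batch_size, block_size, rng=random):
    """Sample random windows; targets are the inputs shifted by one token."""
    inputs, targets = [], []
    for _ in range(batch_size):
        # Leave room for the one extra target token at the end of the window.
        start = rng.randrange(len(tokens) - block_size)
        inputs.append(tokens[start : start + block_size])
        targets.append(tokens[start + 1 : start + block_size + 1])
    return inputs, targets

# Toy token stream standing in for the tokenized dataset.
tokens = list(range(100))
x, y = get_batch(tokens, batch_size=4, block_size=8)
```

Each row of `y` holds the token that follows the corresponding position in `x`, so a single window yields `block_size` prediction targets.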

## Inference
- To generate text using a trained model:
```
python sample.py --model_path path/to/saved/model --prompt "Your prompt here"
```
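
Text generation with a decoder-only model repeatedly samples one token from the model's output distribution and appends it to the prompt. The sketch below shows one common way to do the sampling step, with temperature scaling and optional top-k filtering. It is an illustration only: `sample_next_token`, `temperature`, and `top_k` are hypothetical names, not options of `sample.py`, and the toy logits stand in for real model output.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token id from raw logits (top_k, if given, must be >= 1)."""
    scaled = [value / temperature for value in logits]
    if top_k is not None:
        # Keep the top_k highest logits (ties at the cutoff are also kept).
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Numerically stable softmax weights; masked entries get weight 0.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Toy logits over a 4-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
token = sample_next_token(logits, temperature=0.8, top_k=2)
```

Lower temperatures concentrate probability on the highest-scoring tokens, while a smaller `top_k` removes unlikely tokens from consideration entirely.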

## Model Details and Loss Curves
```
n_embd = 512
vocab_size = 50257
n_layers = 6
n_heads = 8
block_size = 512 # context length: number of previous tokens each position can attend to
batch_size = 8
learning_rate = 5e-4
```
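
These hyperparameters allow a rough estimate of the model size. The sketch below assumes a GPT-2-style layout: learned position embeddings, a 4x-wide MLP, biases on every linear layer, two LayerNorms per block, and an output head tied to the token embedding. The README does not confirm any of these choices, so treat `estimate_params` as a hypothetical helper and the result as a ballpark figure.

```python
def estimate_params(n_embd, vocab_size, n_layers, block_size, tied=True):
    """Rough parameter count for a GPT-2-style decoder (assumed layout)."""
    tok_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd  # assumption: learned position embeddings
    # Attention: fused QKV projection plus output projection, with biases.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd, then project back, with biases.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    norms = 2 * 2 * n_embd  # two LayerNorms per block, weight and bias each
    per_layer = attn + mlp + norms
    final_norm = 2 * n_embd
    # Assumption: a tied output head reuses the token embedding matrix.
    lm_head = 0 if tied else vocab_size * n_embd
    return tok_emb + pos_emb + n_layers * per_layer + final_norm + lm_head

total = estimate_params(n_embd=512, vocab_size=50257, n_layers=6, block_size=512)
print(f"~{total / 1e6:.1f}M parameters")
```

Note that `n_heads` does not appear in the estimate: splitting the 512-dimensional embedding into 8 heads changes how attention is computed, not how many weights it uses.
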
- In both plots, the x-axis is in hundreds of iterations. The model was trained for a total of 30,000 steps.

Train loss | Test loss
:-------------------------:|:-------------------------:
![](https://github.com/SSahas/Implementing-GPT-From-Scratch/blob/main/assets/train.png) | ![](https://github.com/SSahas/Implementing-GPT-From-Scratch/blob/main/assets/test.png)

# Sample Generations
> *This is used for its purpose . The castle has its most extensive military value , with its new weapons and the ability to draw guns against and destroy obstacles ,
but it has always been used for long - duration.*

> *Once there was no threat to the United States who are expecting asylum to the United States government . The National Hurricane Center issued the same day the agency requested them to the Washington National Weather Service agencies at any request . By 1997 , the agency also considered the agency had a $ 20 , 000 fine ( equivalent to $ 15 , 060 , 061 in 2016 ) for an upcoming hurricane.*

> *This is to be called the " great leader of all the major things and the most beautiful leader of all the time " he is " not so happy " if he and his co - workers will be able to accomplish the truth they are in vain when him to death .*

# References
- [nanoGPT (Andrej Karpathy)](https://github.com/karpathy/nanoGPT)
- [t5-pytorch](https://github.com/conceptofmind/t5-pytorch)
- [nanoT5](https://github.com/PiotrNawrot/nanoT5)
