https://github.com/sohamratnaparkhi/nano-gpt
- Host: GitHub
- URL: https://github.com/sohamratnaparkhi/nano-gpt
- Owner: SohamRatnaparkhi
- Created: 2024-08-26T21:24:25.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-08-26T21:26:29.000Z (about 1 year ago)
- Last Synced: 2025-02-03T08:36:18.269Z (8 months ago)
- Language: Jupyter Notebook
- Size: 37.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This model was trained by following Andrej Karpathy's video `Let's build GPT: from scratch, in code, spelled out`: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=3736s
# Transformer Language Model
This repository contains the implementation of a Transformer-based language model built with PyTorch. The model is designed to generate text based on a dataset provided by the user. The architecture consists of multi-head self-attention layers, feed-forward layers, and embedding layers.
## Table of Contents
- [Overview](#overview)
- [Requirements](#requirements)
- [Training](#training)
- [Model Architecture](#model-architecture)
- [Usage](#usage)
- [Parameters](#parameters)
- [Results](#results)
- [License](#license)

## Overview
This project implements a Transformer-based model for text generation. The model is trained on a character-level dataset and can generate new sequences of text by predicting the next character given a sequence of previous characters.
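For intuition, the sketch below shows what a character-level encoding can look like. It is a toy illustration; the actual vocabulary and helpers are built from `dataset.txt` inside `model.py` and may differ in detail.

```python
# Toy character-level encoding (illustrative only)
text = "hello world"
chars = sorted(set(text))                     # the character vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

print(encode("hello"))          # [3, 2, 4, 4, 5] with this toy vocabulary
print(decode(encode("hello")))  # "hello"
```

The model only ever sees these integer ids; generation repeatedly predicts the next id from the preceding context and decodes the sampled ids back into text.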
## Requirements
- Python 3.7+
- PyTorch 1.7+
- CUDA (optional, for GPU support)

Install the required dependencies using `pip`:
```bash
pip install torch
```
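To confirm that PyTorch is installed and to check whether a CUDA-capable GPU is visible, you can run:

```python
import torch

print(torch.__version__)          # should report 1.7 or newer
print(torch.cuda.is_available())  # True if GPU support is usable
```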
## Training

To train the model, run the `train_model` function defined in `model.py`. It trains the model for the specified number of epochs and saves the model weights to `model_weights.pth`.
```bash
python model.py
```

The training process includes:
- Loading the dataset from `dataset.txt`.
- Splitting the data into training and validation sets.
- Training the model using the Adam optimizer.
- Saving the trained model weights.
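Roughly, these steps can be sketched as follows. This is an illustrative sketch rather than the actual `train_model` code: it assumes the `Model` class is importable from `model.py`, that its forward pass returns `(logits, loss)`, and it uses the hyperparameters listed under [Parameters](#parameters).

```python
import torch
from model import Model  # assumption: Model is importable from model.py

# Hyperparameters (see the Parameters section)
batch_size, block_size = 64, 256
epochs, learning_rate, train_ratio = 5000, 3e-4, 0.9
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load and encode the character-level dataset
text = open('dataset.txt', encoding='utf-8').read()
stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# Split into training and validation sets
n = int(train_ratio * len(data))
train_data, val_data = data[:n], data[n:]  # val_data can be used to monitor overfitting

def get_batch(source):
    """Sample a random batch of (context, next-character) training pairs."""
    ix = torch.randint(len(source) - block_size, (batch_size,))
    x = torch.stack([source[i:i + block_size] for i in ix])
    y = torch.stack([source[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)

model = Model().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    xb, yb = get_batch(train_data)
    logits, loss = model(xb, yb)  # assumption: forward pass returns (logits, loss)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 500 == 0:
        print(f'Epoch: {epoch + 1}, Loss: {loss.item():.4f}')

torch.save(model.state_dict(), 'model_weights.pth')
```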
## Model Architecture

The model is based on the Transformer architecture and consists of the following components:
- **Embedding Layers**: Maps input characters to dense vectors.
- **Multi-Head Self-Attention**: Captures relationships between different positions in the input sequence.
- **Feed-Forward Layers**: Applies non-linear transformations to the data.
- **Layer Normalization**: Normalizes inputs to the layers for faster convergence.
- **Final Linear Layer**: Maps the output of the last block to the vocabulary size.
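The sketch below shows how the attention, feed-forward, and layer-normalization pieces typically combine into a single block, in the spirit of the Karpathy lecture this model follows. The class and attribute names are illustrative and the actual code in `model.py` may differ; the full model stacks `n_layers` such blocks between the embedding layers and the final linear layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_embed, n_heads, dropout, block_size = 384, 6, 0.2, 256  # see the Parameters section

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (each position attends only to the past)."""
    def __init__(self):
        super().__init__()
        self.qkv = nn.Linear(n_embed, 3 * n_embed, bias=False)  # queries, keys, values in one projection
        self.proj = nn.Linear(n_embed, n_embed)
        self.drop = nn.Dropout(dropout)
        mask = torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size)
        self.register_buffer('mask', mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(n_embed, dim=2)
        q = q.view(B, T, n_heads, C // n_heads).transpose(1, 2)  # (B, heads, T, head_size)
        k = k.view(B, T, n_heads, C // n_heads).transpose(1, 2)
        v = v.view(B, T, n_heads, C // n_heads).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) * (C // n_heads) ** -0.5   # scaled dot-product scores
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float('-inf'))
        att = self.drop(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)  # concatenate the heads
        return self.drop(self.proj(y))

class Block(nn.Module):
    """One Transformer block: attention and feed-forward, each with layer norm and a residual connection."""
    def __init__(self):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(n_embed), nn.LayerNorm(n_embed)
        self.attn = CausalSelfAttention()
        self.ffwd = nn.Sequential(
            nn.Linear(n_embed, 4 * n_embed), nn.ReLU(),
            nn.Linear(4 * n_embed, n_embed), nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))   # residual connection around attention
        x = x + self.ffwd(self.ln2(x))   # residual connection around the feed-forward network
        return x
```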
### Total Parameters

The model has a total of **17,196,424** parameters.
## Usage
To generate text using the trained model, use the `generate` method. This will generate a sequence of text based on the learned distribution from the training data.
```python
import torch
from model import Model  # assumes the Model class is importable from model.py

model = Model()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()  # disable dropout for generation
print(model.generate(max_tokens=100))
```

## Parameters
Key hyperparameters used in the model:
- `batch_size`: 64
- `block_size`: 256
- `epochs`: 5000
- `learning_rate`: 3e-4
- `n_embed`: 384
- `n_heads`: 6
- `n_layers`: 6
- `dropout`: 0.2
- `train_ratio`: 0.9

These parameters can be adjusted in `model.py` to optimize performance.
## Results
During training, the model's loss is printed every 500 epochs. The final model is capable of generating coherent sequences of text based on the training data.
Sample output from the trained model:
```
Epoch: 5000, Loss: 1.8405
Generated Text: results.txt
```

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.