https://github.com/sohamratnaparkhi/nano-gpt
- Host: GitHub
- URL: https://github.com/sohamratnaparkhi/nano-gpt
- Owner: SohamRatnaparkhi
- Created: 2024-08-26T21:24:25.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-08-26T21:26:29.000Z (about 1 year ago)
- Last Synced: 2025-02-03T08:36:18.269Z (8 months ago)
- Language: Jupyter Notebook
- Size: 37.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This model was trained by following Andrej Karpathy's video `Let's build GPT: from scratch, in code, spelled out`: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=3736s
# Transformer Language Model
This repository contains the implementation of a Transformer-based language model built with PyTorch. The model is designed to generate text based on a dataset provided by the user. The architecture consists of multi-head self-attention layers, feed-forward layers, and embedding layers.
## Table of Contents
- [Overview](#overview)
- [Requirements](#requirements)
- [Training](#training)
- [Model Architecture](#model-architecture)
- [Usage](#usage)
- [Parameters](#parameters)
- [Results](#results)
- [License](#license)

## Overview
This project implements a Transformer-based model for text generation. The model is trained on a character-level dataset and can generate new sequences of text by predicting the next character given a sequence of previous characters.
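For intuition, the sketch below shows what a character-level encoding can look like. It is a toy illustration; the actual vocabulary and helpers are built from `dataset.txt` inside `model.py` and may differ in detail.

```python
# Toy character-level encoding (illustrative only)
text = "hello world"
chars = sorted(set(text))                     # the character vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

print(encode("hello"))          # [3, 2, 4, 4, 5] with this toy vocabulary
print(decode(encode("hello")))  # "hello"
```

The model only ever sees these integer ids; generation repeatedly predicts the next id from the preceding context and decodes the sampled ids back into text.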
## Requirements
- Python 3.7+
- PyTorch 1.7+
- CUDA (optional, for GPU support)

Install the required dependencies using `pip`:
```bash
pip install torch
```
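To confirm that PyTorch is installed and to check whether a CUDA-capable GPU is visible, you can run:

```python
import torch

print(torch.__version__)          # should report 1.7 or newer
print(torch.cuda.is_available())  # True if GPU support is usable
```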
## Training

To train the model, run the `train_model` function defined in `model.py`. It trains the model for the specified number of epochs and saves the model weights to `model_weights.pth`.
```bash
python model.py
```

The training process includes:
- Loading the dataset from `dataset.txt`.
- Splitting the data into training and validation sets.
- Training the model using the Adam optimizer.
- Saving the trained model weights.
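Roughly, these steps can be sketched as follows. This is an illustrative sketch rather than the actual `train_model` code: it assumes the `Model` class is importable from `model.py`, that its forward pass returns `(logits, loss)`, and it uses the hyperparameters listed under [Parameters](#parameters).

```python
import torch
from model import Model  # assumption: Model is importable from model.py

# Hyperparameters (see the Parameters section)
batch_size, block_size = 64, 256
epochs, learning_rate, train_ratio = 5000, 3e-4, 0.9
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load and encode the character-level dataset
text = open('dataset.txt', encoding='utf-8').read()
stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# Split into training and validation sets
n = int(train_ratio * len(data))
train_data, val_data = data[:n], data[n:]  # val_data can be used to monitor overfitting

def get_batch(source):
    """Sample a random batch of (context, next-character) training pairs."""
    ix = torch.randint(len(source) - block_size, (batch_size,))
    x = torch.stack([source[i:i + block_size] for i in ix])
    y = torch.stack([source[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)

model = Model().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    xb, yb = get_batch(train_data)
    logits, loss = model(xb, yb)  # assumption: forward pass returns (logits, loss)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 500 == 0:
        print(f'Epoch: {epoch + 1}, Loss: {loss.item():.4f}')

torch.save(model.state_dict(), 'model_weights.pth')
```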
## Model Architecture

The model is based on the Transformer architecture and consists of the following components:
- **Embedding Layers**: Maps input characters to dense vectors.
- **Multi-Head Self-Attention**: Captures relationships between different positions in the input sequence.
- **Feed-Forward Layers**: Applies non-linear transformations to the data.
- **Layer Normalization**: Normalizes inputs to the layers for faster convergence.
- **Final Linear Layer**: Maps the output of the last block to the vocabulary size.
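The sketch below shows how the attention, feed-forward, and layer-normalization pieces typically combine into a single block, in the spirit of the Karpathy lecture this model follows. The class and attribute names are illustrative and the actual code in `model.py` may differ; the full model stacks `n_layers` such blocks between the embedding layers and the final linear layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_embed, n_heads, dropout, block_size = 384, 6, 0.2, 256  # see the Parameters section

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (each position attends only to the past)."""
    def __init__(self):
        super().__init__()
        self.qkv = nn.Linear(n_embed, 3 * n_embed, bias=False)  # queries, keys, values in one projection
        self.proj = nn.Linear(n_embed, n_embed)
        self.drop = nn.Dropout(dropout)
        mask = torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size)
        self.register_buffer('mask', mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(n_embed, dim=2)
        q = q.view(B, T, n_heads, C // n_heads).transpose(1, 2)  # (B, heads, T, head_size)
        k = k.view(B, T, n_heads, C // n_heads).transpose(1, 2)
        v = v.view(B, T, n_heads, C // n_heads).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) * (C // n_heads) ** -0.5   # scaled dot-product scores
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float('-inf'))
        att = self.drop(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)  # concatenate the heads
        return self.drop(self.proj(y))

class Block(nn.Module):
    """One Transformer block: attention and feed-forward, each with layer norm and a residual connection."""
    def __init__(self):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(n_embed), nn.LayerNorm(n_embed)
        self.attn = CausalSelfAttention()
        self.ffwd = nn.Sequential(
            nn.Linear(n_embed, 4 * n_embed), nn.ReLU(),
            nn.Linear(4 * n_embed, n_embed), nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))   # residual connection around attention
        x = x + self.ffwd(self.ln2(x))   # residual connection around the feed-forward network
        return x
```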
### Total Parameters

The model has a total of **17,196,424** parameters.
## Usage
To generate text using the trained model, use the `generate` method. This will generate a sequence of text based on the learned distribution from the training data.
```python
import torch
from model import Model  # assumes the Model class is importable from model.py

model = Model()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()  # disable dropout for generation
print(model.generate(max_tokens=100))
```

## Parameters
Key hyperparameters used in the model:
- `batch_size`: 64
- `block_size`: 256
- `epochs`: 5000
- `learning_rate`: 3e-4
- `n_embed`: 384
- `n_heads`: 6
- `n_layers`: 6
- `dropout`: 0.2
- `train_ratio`: 0.9

These parameters can be adjusted in `model.py` to optimize performance.
## Results
During training, the model's loss is printed every 500 epochs. The final model is capable of generating coherent sequences of text based on the training data.
Sample output from the trained model:
```
Epoch: 5000, Loss: 1.8405
Generated Text: results.txt
```

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.