https://github.com/forquosh/gpt
Generative Pretrained Transformer built from scratch using PyTorch.
- Host: GitHub
- URL: https://github.com/forquosh/gpt
- Owner: Forquosh
- License: mit
- Created: 2025-02-11T10:25:44.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-11T15:59:16.000Z (8 months ago)
- Last Synced: 2025-02-11T17:23:00.383Z (8 months ago)
- Topics: feedforward-neural-network, generative-ai, gpt, jupyter-notebook, large-language-models, learning-by-doing, neural-network, python, pytorch, self-attention, transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 16.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GPT
A PyTorch implementation of a GPT-like language model with text preprocessing utilities.
## Overview
This project implements a transformer-based language model similar to GPT, designed for character-level text generation. It includes utilities for vocabulary generation and dataset splitting.
In this example, I tested it on the fabulous book **The Brothers Karamazov**, downloaded from **Project Gutenberg**. Feel free to change the text file, or even to try training on an established dataset (OpenWebText, for example), though on larger datasets `vocab.py` and `split.py` may not work properly.
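At the character level, the vocabulary is just the set of distinct characters in the corpus, and encoding maps each character to an integer index. A minimal sketch of that idea (the file name and helper names are illustrative, not necessarily those used in `vocab.py` or the notebook):
```python
# Illustrative character-level vocabulary with encode/decode helpers.
# File name and variable names are placeholders, not taken from the repo.
with open("data.txt", "r", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))                       # every distinct character
stoi = {ch: i for i, ch in enumerate(chars)}    # character -> integer id
itos = {i: ch for ch, i in stoi.items()}        # integer id -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

print(f"vocab size: {len(chars)}")
print(decode(encode("Karamazov")))              # round-trips to the original string
```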
## Features
- Character-level language modeling
- Multi-head self-attention mechanism
- Memory-efficient data loading using memory mapping (see the sketch after this list)
- Text preprocessing utilities
- Configurable model architecture
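The memory-mapping idea is to avoid reading the whole corpus into RAM: the data file is opened as a NumPy memmap, and only the randomly sampled blocks for each batch are actually pulled from disk. A rough sketch of that pattern, treating raw bytes as tokens for simplicity (the notebook's own loader, tokenization, and file names may differ):
```python
import numpy as np
import torch

block_size = 128
batch_size = 32

def get_batch(path="train_split.txt"):           # hypothetical file name
    # Memory-map the file: only the slices indexed below are read from disk.
    data = np.memmap(path, dtype=np.uint8, mode="r")
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y                                   # inputs and next-token targets
```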
## Requirements
- Python 3.9+
- PyTorch
- Jupyter Notebooks
- CUDA (optional, for GPU acceleration, **on Windows**)
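Before training, it can help to confirm which accelerator PyTorch actually sees. A minimal check (not taken from the notebook; on Apple Silicon the MPS backend may be available instead of CUDA):
```python
import torch

if torch.cuda.is_available():
    device = "cuda"   # NVIDIA GPU via CUDA (the Windows setup described below)
elif torch.backends.mps.is_available():
    device = "mps"    # Apple Silicon GPU on macOS
else:
    device = "cpu"

print(f"Using device: {device}")
```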
## Project Structure
- `vocab.py` - Generates vocabulary from input text
- `split.py` - Splits text data into training and validation sets
- `GPT.ipynb` - Main model implementation and training

## Usage
### 1. Initialization
### INITIALIZATION STEPS FOR **macOS**
Open a terminal in a directory of your choice.
Create a _Python Virtual Environment_ and _activate_ it:
```bash
python3 -m venv venv
source ./venv/bin/activate
```

Install the _macOS requirements_:
```bash
pip3 install -r requirements_macos.txt
```

### INITIALIZATION STEPS FOR **WINDOWS**
Install [Python](https://www.python.org/downloads/) on your system. If you already have it, skip this step.
Install [Anaconda](https://www.anaconda.com/download), following the steps at that link.
Once installed, open **Anaconda Prompt** in a directory of your choice.
Create a _Python Virtual Environment_ and _activate_ it:
```bash
python -m venv venv
venv\Scripts\activate
```

Install the _Windows requirements_:
```bash
pip3 install -r requirements_windows.txt
```

### Note: the two requirements files differ. On Windows, PyTorch is installed with _CUDA_ support, if available.
### 2. Prepare Your Data
First, _add your desired data file and generate the vocabulary from your text_:
```bash
python3 vocab.py
```

Then, _split your data into training and validation sets_:
```bash
python3 split.py
```
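Conceptually, the split step carves the corpus into a large training portion and a smaller validation portion. A rough sketch of that idea (the 90/10 ratio and file names are assumptions, not necessarily what `split.py` does):
```python
# Illustrative 90/10 split; split.py may use different file names or ratios.
with open("data.txt", "r", encoding="utf-8") as f:
    text = f.read()

n = int(0.9 * len(text))
with open("train_split.txt", "w", encoding="utf-8") as f:
    f.write(text[:n])
with open("val_split.txt", "w", encoding="utf-8") as f:
    f.write(text[n:])
```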
### 3. Train the Model
Install a _new kernel_ to use in your Jupyter Notebook:
```bash
python3 -m ipykernel install --user --name=venv --display-name "GPTKernel"
```

Run _Jupyter Notebook_:
```bash
jupyter notebook
```

Open `GPT.ipynb`.
Select `GPTKernel` and run the cells _sequentially_. The notebook contains:
- Model architecture implementation
- Training loop
- Text generation functionality
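Text generation is autoregressive: the model predicts a distribution over the next character, a character is sampled from it and appended to the context, and the loop repeats. A minimal sketch of that loop (the `model` interface and the `decode` helper are assumptions about the notebook, not code taken from it):
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=128):
    # idx: (batch, time) tensor of token ids used as the starting context.
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop context to the block size
        logits = model(idx_cond)                 # (batch, time, vocab_size)
        logits = logits[:, -1, :]                # keep only the last position
        probs = F.softmax(logits, dim=-1)        # next-character distribution
        next_id = torch.multinomial(probs, 1)    # sample one id
        idx = torch.cat([idx, next_id], dim=1)   # append and continue
    return idx

# Example (assumes a trained model and the decode helper from vocabulary generation):
# context = torch.zeros((1, 1), dtype=torch.long)
# print(decode(generate(model, context, 200)[0].tolist()))
```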
### Model Parameters
The default hyperparameters are:
- Batch size: 32
- Block size: 128
- Maximum training iterations: 300
- Learning Rate: 2e-5
- Evaluation: every 50 iterations
- Embedding dimension: 300
- Number of heads: 4
- Number of layers: 4
- Dropout: 0.2

These can be adjusted based on your hardware capabilities and requirements.
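For reference, the same defaults expressed as notebook-style variables (the names are illustrative; the notebook may use different identifiers):
```python
batch_size = 32        # sequences processed in parallel
block_size = 128       # context length in characters
max_iters = 300        # total training iterations
learning_rate = 2e-5
eval_interval = 50     # evaluate every 50 iterations
n_embd = 300           # embedding dimension
n_head = 4             # attention heads per block
n_layer = 4            # transformer blocks
dropout = 0.2
```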
## Model Architecture
The model implements a transformer architecture with:
- Multi-head self-attention (see the sketch after this list)
- Position embeddings
- Layer normalization
- Feed-forward networks
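As a rough illustration of how these pieces fit together, here is a compact pre-norm transformer block built on PyTorch's `nn.MultiheadAttention` (the notebook implements its own attention heads, so treat this as a sketch of the idea rather than the repository's code):
```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm block: masked multi-head self-attention, then a feed-forward net."""

    def __init__(self, n_embd=300, n_head=4, dropout=0.2):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: True entries mark positions a token is NOT allowed to attend to.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                     # residual connection around attention
        x = x + self.ffwd(self.ln2(x))       # residual connection around the MLP
        return x
```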
## License
MIT