https://github.com/giuseppebellamacina/little_language_model

Implementation of a Transformer and training on Dante's Divina Commedia
https://github.com/giuseppebellamacina/little_language_model

llm pythorch transformer

Last synced: about 1 year ago
JSON representation

Implementation of a Transformer and training on Dante's Divina Commedia

Host: GitHub
URL: https://github.com/giuseppebellamacina/little_language_model
Owner: GiuseppeBellamacina
Created: 2024-11-13T21:12:48.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-13T21:18:34.000Z (over 1 year ago)
Last Synced: 2025-02-01T10:42:30.254Z (over 1 year ago)
Topics: llm, pythorch, transformer
Language: Jupyter Notebook
Homepage:
Size: 263 KB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # LittleLanguageModel

LittleLanguageModel is a Transformer-based language model, designed for text generation and sequence modeling tasks. This repository contains the implementation of a transformer architecture, with a specific instance called DanteGPT, trained on a dataset of Italian text.

## Overview

The model is based on the Transformer architecture, utilizing multiple layers of self-attention heads and feedforward neural networks to generate human-like text. DanteGPT is a variant of LittleLanguageModel, specifically trained to generate text inspired by Dante Alighieri’s *Divine Comedy*. 

### Features:

- **Transformer Architecture**: Multiple attention heads and layers to handle complex relationships within the input sequence.

- **Text Generation**: Capability to generate text based on an initial prompt using a temperature-controlled sampling technique.

- **Training**: The model is trained on a large corpus of Italian text to capture language structure and style.

## Installation

To set up the environment, clone the repository and install the dependencies:

```bash

git clone https://github.com/GiuseppeBellamacina/LittleLanguageModel.git

cd LittleLanguageModel

pip install -r requirements.txt

```

## Usage

### Training the Model

To train the model, use the `train_model()` function with the desired parameters. This function will train the model on a dataset, periodically evaluating and logging the loss.

```python

from model import LittleLanguageModel

from train import train_model

import torch

file = 'file.txt'

text = open(file, 'r', encoding='utf-8').read()

vocab = sorted(list(set(text)))

encode = lambda s: [vocab.index(c) for c in s]

decode = lambda l: "".join([vocab[c] for c in l]) 

# Split the dataset into training and validation sets

x = int(0.9*len(text))

text = torch.tensor(encode(text), dtype=torch.long)

train, val = text[:x], text[x:]

# Define the model parameters

vocab_size = len(vocab)  # Based on the dataset

embed_size = 512

num_heads = 8

head_size = embed_size // num_heads

num_layers = 6

block_size = 128  # Example sequence length

batch_size = 64

# Find the device (GPU or CPU)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Instantiate the model

model = LittleLanguageModel(vocab_size, head_size, embed_size, block_size, num_heads, num_layers, device).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Train the model

EPOCHS = 5000

train_losses, val_losses = train_model(model, train, val, block_size, batch_size, device, optimizer, EPOCHS)

```

### Generating Text

To generate text using the trained model, you can use the `generate()` method. You provide an initial prompt and the number of tokens to generate.

```python

# Example for generating text

ids = torch.tensor(

    encode("Nel mezzo del cammin di nostra vita"),

    dtype=torch.long

).unsqueeze(0).to(device)

generated_ids = model.generate(

    ids,

    max_new_tokens=2000,

    temperature=0.8

)

print(decode(generated_ids[0].tolist()))

```

### Available Functions:

- **`train_model()`**: Trains the model on the training dataset and evaluates on the validation set.

- **`generate()`**: Generates text based on an initial prompt.

## Model Architecture

- **Embedding Layer**: Encodes input tokens into dense vectors.

- **Positional Encoding**: Adds positional information to input tokens to maintain the sequential nature of the data.

- **Multi-Head Attention**: Multiple self-attention heads to capture different types of relationships in the data.

- **Feedforward Layers**: A fully connected neural network for further learning.

- **Layer Normalization**: Normalizes the input to each layer to speed up training and improve stability.

## Training Data

DanteGPT is trained on a text corpus that includes a selection of classical Italian literature, primarily focusing on Dante Alighieri’s *Divine Comedy*.

## Acknowledgments

- The Transformer model architecture is based on the paper: ["Attention is All You Need"](https://arxiv.org/abs/1706.03762).

- Special thanks to the open-source community for the contributions that made this project possible.

---

You can customize the sections (especially the usage and requirements) based on the specific details of your project.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/giuseppebellamacina/little_language_model

Awesome Lists containing this project

README