# Transformers From Scratch



**Contents**
[Features](#features) ·
[Example](#example) ·
[Details](#details) ·
[Datasets](#datasets) ·
[Models and notebooks](#models-and-notebooks) ·
[Repository structure](#repository-structure) ·
[Installation](#installation) ·
[Running](#running) ·
[References](#references)

The repository contains a modular Python implementation of transformer architectures for natural language understanding and generation tasks, according to:

- The seminal paper _Attention Is All You Need_ by Vaswani et al.[1], which introduces the attention-based transformer architecture and applies it to sequence-to-sequence tasks, achieving state-of-the-art machine translation performance and surpassing earlier LSTM- and CNN-based neural machine translation architectures.
- The chapter on _Transformers and Large Language Models_ from _Speech and Language Processing_ by Jurafsky & Martin[2], which provides a more comprehensive and illustrative look at some of the high-level details discussed in _Attention Is All You Need_.

## Features

- Generic encoder-only, decoder-only and encoder-decoder transformer architectures.
- Wrappers for causal language modelling, sequence-to-sequence generation and classification/regression.
- Various decoding methods for causal/sequence-to-sequence generation (a minimal sampling sketch follows after this list):
  - Search-based (greedy and beam search)
  - Sampling-based (nucleus, temperature and top-k sampling)
- Example applications to real-world datasets.
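To make the sampling-based decoding concrete, here is a minimal, self-contained sketch of temperature and top-k sampling over next-token logits. It is a conceptual illustration only, not the repository's actual `TemperatureSamplingDecoder` implementation.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, k: int = 50) -> int:
    """Sample a token ID from next-token logits using temperature and top-k filtering."""
    # lower temperature sharpens the distribution, higher temperature flattens it
    scaled = logits / temperature
    # keep only the k highest-scoring tokens and renormalize over them
    top_values, top_indices = torch.topk(scaled, k)
    probs = F.softmax(top_values, dim=-1)
    # draw one token from the filtered distribution
    choice = torch.multinomial(probs, num_samples=1)
    return int(top_indices[choice])
```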

### PyTorch restrictions

This project is implemented using [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/).

As PyTorch provides a number of transformer and attention related layers in its [`torch.nn`](https://pytorch.org/docs/stable/nn.html) submodule, this project explicitly avoids the use of:

- [`torch.nn.Transformer`](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html#torch.nn.Transformer)
- [`torch.nn.TransformerEncoder`](https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html#torch.nn.TransformerEncoder)/[`torch.nn.TransformerEncoderLayer`](https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoderLayer.html#torch.nn.TransformerEncoderLayer)
- [`torch.nn.TransformerDecoder`](https://pytorch.org/docs/stable/generated/torch.nn.TransformerDecoder.html#torch.nn.TransformerDecoder)/[`torch.nn.TransformerDecoderLayer`](https://pytorch.org/docs/stable/generated/torch.nn.TransformerDecoderLayer.html#torch.nn.TransformerDecoderLayer)
- [`torch.nn.MultiheadAttention`](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html#torch.nn.MultiheadAttention)
- [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html#torch.nn.functional.scaled_dot_product_attention)

All other layers provided by `torch.nn` are allowed, including:

- [`nn.Embedding`](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html#torch.nn.Embedding): For token embedding look-up by vocabulary ID.
- [`nn.LayerNorm`](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html#torch.nn.LayerNorm): For layer normalization as implemented in _Attention Is All You Need_.
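
As a rough illustration of what re-implementing the avoided layers entails, the following is a minimal sketch of scaled dot-product attention written with only basic tensor operations; it is not necessarily identical to the implementation in `transformer/modules/attention.py`.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(QK^T / sqrt(d_k)) V, built from basic tensor operations only."""
    d_k = q.size(-1)
    # similarity between each query and each key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # e.g. a causal mask for decoder self-attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v
```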

### Other restrictions

- Transformer models implemented and made available in other libraries such as HuggingFace's [`transformers`](https://huggingface.co/docs/transformers/en/index) are not used in this project.
  - However, the tokenizers provided by `transformers` were used, as developing tokenization algorithms was not the primary objective of this project.
- No existing _"x from scratch"_ resources were used, such as the famous _Let's build GPT: from scratch, in code, spelled out._ by Andrej Karpathy[3].
- No other online resources were used, apart from official documentation for packages such as [PyTorch](https://pytorch.org/docs/stable/index.html), [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) and [HuggingFace Tokenizers](https://huggingface.co/docs/transformers/en/main_classes/tokenizer).

## Example

Training a causal language model to generate "Florida man"-style news headlines.

```python
from transformers import LlamaTokenizer

from transformer.params import TransformerParams, TemperatureSamplingParams
from transformer.models import CausalLM
from transformer.decoding import TemperatureSamplingDecoder

# initialize HuggingFace tokenizer
tokenizer = LlamaTokenizer.from_pretrained(
    "huggyllama/llama-7b", add_eos_token=True, legacy=False
)
tokenizer.add_special_tokens({"pad_token": "<pad>"})

# initialize the causal language model
model = CausalLM(
    params=TransformerParams(context_length=64),
    tokenizer=tokenizer,
)

# train the language model
model.train(...)

# initialize decoder for sequence generation
decoder = TemperatureSamplingDecoder(
    params=TemperatureSamplingParams(max_length=100, temperature=0.5, k=5),
    model=model,
)

# generation without context
decoder.generate()
# 'Florida man arrested after baby alligator, guns, drugs found inside truck'

# generation with context
decoder.generate("Florida man shot")
# 'Florida man shot and killed while attempting to steal pizza and Pokemon cards from Target'
```

## Details

While the original architecture described in _Attention Is All You Need_ is an encoder-decoder transformer for neural machine translation, a sequence-to-sequence task, this project is designed to be more general, supporting a variety of natural language tasks through encoder-only, decoder-only and encoder-decoder architectures.




| | Encoder-only | Decoder-only | Encoder-decoder |
|---|---|---|---|
| **Tasks** | Contextualized embedding and supervised inference | Autoregressive generation | Sequence-to-sequence generation |
| **Example use-cases** | Producing contextualized token embeddings<br>Sentiment classification<br>Intent classification | Text generation | Machine translation<br>Text summarization |


## Datasets

The following datasets were used to test the above transformer implementations on various tasks.

- [arXiv Paper Abstracts](https://www.kaggle.com/datasets/spsayakpaul/arxiv-paper-abstracts): arXiv manuscripts and their metadata including titles, abstracts and categories.
- [CommonLit Readability Prize](https://www.kaggle.com/competitions/commonlitreadabilityprize): Literary passages and their associated "readability" score for use in grade 3-12 classrooms.
- [Reddit r/FloridaMan](https://www.kaggle.com/datasets/bcruise/reddit-rfloridaman): News headlines about various (often funny and irrational) actions performed by Florida men and women.
- [Europarl](https://www.kaggle.com/datasets/nltkdata/europarl): Transcriptions of European Parliament proceedings collected between 1996 and 2006 in 11 languages.

## Models and notebooks

### Encoder-only models

- [`ClassifierLM`](transformer/models/classifier.py): A generic transformer-based language model for assigning classes to text.
  - [`notebooks/arxiv_categorization.ipynb`](notebooks/arxiv_categorization.ipynb) applies this model to the _arXiv Paper Abstracts_ dataset to categorize arXiv manuscripts based on their titles.
- [`RegressorLM`](transformer/models/regressor.py): A generic transformer-based language model for assigning scores to text.
  - [`notebooks/commonlit_readability.ipynb`](notebooks/commonlit_readability.ipynb) applies this model to the _CommonLit Readability Prize_ dataset to rate the complexity of literary passages for grade 3-12 students.

### Decoder-only models

- [`CausalLM`](transformer/models/causal.py): A generic transformer-based language model for generating text in an autoregressive manner.
  - [`notebooks/florida_man_generation.ipynb`](notebooks/florida_man.ipynb) applies this model to the _Reddit r/FloridaMan_ dataset to generate humorous news headlines involving the (mis)adventures of Florida men and women.

### Encoder-decoder models

- [`Seq2SeqLM`](transformer/models/seq2seq.py): A generic transformer-based language model for generating output text given an input text.
  - [`notebooks/arxiv_summarization.ipynb`](notebooks/arxiv_summarization.ipynb) applies this model to the _arXiv Paper Abstracts_ dataset to generate arXiv paper titles by summarizing their corresponding abstracts.
  - [`notebooks/europarl_translation.ipynb`](notebooks/europarl_translation.ipynb) applies this model to the _Europarl_ dataset to translate transcribed parliamentary proceedings from French to English.
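
Below is a hypothetical sketch of how a sequence-to-sequence model might be set up for the Europarl translation task, assuming `Seq2SeqLM` follows the same constructor pattern as the `CausalLM` example above; the notebooks are the authoritative reference.

```python
# hypothetical sketch: assumes Seq2SeqLM mirrors the CausalLM constructor shown earlier
from transformers import LlamaTokenizer

from transformer.params import TransformerParams
from transformer.models import Seq2SeqLM

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b", legacy=False)
tokenizer.add_special_tokens({"pad_token": "<pad>"})

model = Seq2SeqLM(
    params=TransformerParams(context_length=128),  # hypothetical hyper-parameters
    tokenizer=tokenizer,
)

# train on French-English sentence pairs (e.g. from the Europarl dataset)
model.train(...)
```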

## Repository structure

- [**`notebooks/`**](notebooks/): Notebooks applying the models in [`transformer.models`](transformer/models/) to various datasets.
- [**`transformer/`**](transformer/): Core package containing the transformer implementations.
  - [**`dataloaders/`**](transformer/dataloaders/): [`LightningDataModule`](https://lightning.ai/docs/pytorch/stable/data/datamodule.html)s for each model in [`transformer.models`](transformer/models/).
  - [**`decoding/`**](transformer/decoding/): Decoding method implementations for causal and sequence-to-sequence LMs.
  - [**`models/`**](transformer/models/): Task-specific transformers implemented using [`transformer.modules.transformers`](transformer/modules/transformers/).
  - [**`modules/`**](transformer/modules/): [`LightningModule`](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html)s used within the transformers in [`transformer.models`](transformer/models/).
    - [**`transformers/`**](transformer/modules/transformers/): Encoder-only, decoder-only and encoder-decoder transformer definitions.
    - [`attention.py`](transformer/modules/attention.py): Masked/unmasked multi-head self-attention definition.
    - [`block.py`](transformer/modules/block.py): Transformer block definition.
    - [`embedding.py`](transformer/modules/embedding.py): Positional encoding and input embedding definition (a sketch of sinusoidal positional encoding follows after this list).
  - [**`params/`**](transformer/params/): Pydantic hyper-parameter classes.
  - [**`utils/`**](transformer/utils/): Supporting custom layers, functions and constants.
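
For reference, a minimal sketch of the sinusoidal positional encoding defined in _Attention Is All You Need_ (assuming an even `d_model`; the actual implementation in `embedding.py` may differ in detail):

```python
import math
import torch

def sinusoidal_positional_encoding(context_length: int, d_model: int) -> torch.Tensor:
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    positions = torch.arange(context_length, dtype=torch.float32).unsqueeze(1)
    # 1 / 10000^(2i/d_model) for each even dimension index 2i
    div_terms = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    encoding = torch.zeros(context_length, d_model)
    encoding[:, 0::2] = torch.sin(positions * div_terms)  # even dimensions
    encoding[:, 1::2] = torch.cos(positions * div_terms)  # odd dimensions
    return encoding
```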

## Installation

The transformer implementation is installable as a local Python package, named `transformer`.

```console
pip install -e .
```

To run the notebooks, you will need additional dependencies which can be installed with the `notebooks` extra.

```console
pip install -e ".[notebooks]"
```

**This package was developed on Python 3.11.8, so it is recommended to use a virtual environment with the same version.**
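
For example, to create and use a matching virtual environment (assuming a `python3.11` interpreter is available on your system):

```console
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[notebooks]"
```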

## Running

You should be able to simply run the Jupyter notebooks in the [`notebooks/`](notebooks/) folder.

_Beware, they take time – even with a good GPU (especially the sequence-to-sequence ones)!_

## References



[1] Vaswani et al., "Attention Is All You Need", Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), 6000-6010.

[2] Dan Jurafsky & James H. Martin, "Transformers and Large Language Models", Speech and Language Processing, 3rd ed. draft (2024), ch. 10.

[3] Andrej Karpathy, "Let's build GPT: from scratch, in code, spelled out.", YouTube (2023).


---


© 2024-2025, Edwin Onuonga - Published under the terms of the MIT license.

Authored and maintained by Edwin Onuonga.