https://github.com/taxborn/betsi
A light implementation of the 2017 Google paper 'Attention is all you need'.
- Host: GitHub
- URL: https://github.com/taxborn/betsi
- Owner: taxborn
- License: mit
- Created: 2023-10-31T22:03:28.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-27T21:20:21.000Z (almost 2 years ago)
- Last Synced: 2025-06-03T04:11:24.584Z (5 months ago)
- Topics: attention-is-all-you-need, attention-mechanism, transformer-model
- Language: Jupyter Notebook
- Homepage:
- Size: 2.35 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# BETSI
[arXiv paper](https://arxiv.org/abs/1706.03762)
A light implementation of the 2017 Google paper 'Attention is all you need'. BETSI is the name of the model,
a [recursive acronym](https://en.wikipedia.org/wiki/Recursive_acronym) standing for
**BETSI: English to shitty Italian**, since the training time I allowed on my graphics card was not enough
for impressive results.
This implementation translates from English to Italian, as Transformer models are exceptional at language
translation and this is a common use case for light implementations of the paper.
The dataset I will be using is the [opus books](https://opus.nlpl.eu/Books.php) dataset, which is a collection of copyright-free books.
The book content of these translations is free for personal, educational, and research use.
[OPUS language resource paper](http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf).
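
The English-Italian portion of opus books is also mirrored on the Hugging Face Hub. A minimal sketch of loading it, assuming the Hugging Face `datasets` library (an illustration of the data format, not necessarily the exact code this repository uses):

```python
# Minimal sketch: load the English-Italian opus_books split via Hugging Face datasets.
from datasets import load_dataset

raw = load_dataset("opus_books", "en-it", split="train")

# Each example holds a translation pair keyed by language code.
sample = raw[0]["translation"]
print(sample["en"], "->", sample["it"])
```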
## Notes
I'm creating notes as I go, which can be found in [NOTES.md](./NOTES.md).
## Transformer model architecture

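The core building block of the architecture is scaled dot-product attention, which multi-head attention applies in parallel in both the encoder and the decoder. A minimal PyTorch sketch of that computation (an illustration of the formula, not the exact code from this repository):

```python
# Minimal sketch of scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    # (batch, heads, seq_len, seq_len) attention scores, scaled by sqrt(d_k)
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Block masked positions (e.g. padding or future tokens in the decoder)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ v, weights
```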
## Requirements
There is a [requirements.txt](./requirements.txt) with the packages needed to run this project.
I used PyTorch with [ROCm](https://en.wikipedia.org/wiki/ROCm), as this sped up training A LOT. Training this model on the CPU of
my laptop takes around 5.5 hours per epoch, while training on the GPU of my desktop takes around 13.5 minutes per epoch (about 24.4 times faster!).
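
ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API used for NVIDIA cards, so the usual device check picks up the GPU unchanged. A short sketch (an assumed setup, not necessarily how this repository selects its device):

```python
# Minimal sketch: on a ROCm build of PyTorch, torch.cuda reports the AMD GPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on {device}")
```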
## TODO and tentative timeline:
- [X] Input Embeddings
- [X] Positional Encoding (see the sketch after this list)
- [X] Layer Normalization **- Due by 11/1**
- [X] Feed forward
- [X] Multi-Head attention
- [X] Residual Connection
- [X] Encoder
- [X] Decoder **- Due by 11/8**
- [X] Linear Layer
- [X] Transformer
- [X] Tokenizer **- Due by 11/15**
- [X] Dataset
- [X] Training loop
- [X] Visualization of the model **- Due by 11/22**
- [X] Install AMD ROCm to train with the GPU **- Attempt to do by end**
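
As an example of one of the components above, here is a minimal sketch of the sinusoidal positional encoding from the paper (an illustration with assumed names like `seq_len` and `d_model`, not the exact code in this repository):

```python
# Minimal sketch of the sinusoidal positional encoding from the paper.
import math
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    # Frequencies decay geometrically from 1 down to 1/10000 across dimensions
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe  # shape (seq_len, d_model), added to the input embeddings
```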
## References used
- [Dropout information](https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/)
- [Input embedding and positional encoding video](https://www.youtube.com/watch?v=3mTsYm9qQFA)
- [arXiv paper](https://arxiv.org/abs/1706.03762)
- [Transformer model overview](https://www.youtube.com/watch?v=4Bdc55j80l8)