https://github.com/mrdbourke/learn-transformers
Work in progress. Simple repository to learn Transformers (and transformers).
- Host: GitHub
- URL: https://github.com/mrdbourke/learn-transformers
- Owner: mrdbourke
- License: MIT
- Created: 2023-06-09T03:27:26.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-27T03:49:47.000Z (over 1 year ago)
- Last Synced: 2025-03-05T18:58:56.129Z (about 2 months ago)
- Language: Jupyter Notebook
- Size: 688 KB
- Stars: 41
- Watchers: 3
- Forks: 11
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Learn Transformers (work-in-progress)
When I was growing up, Transformers were cars that turned into robots.
Now they're the backbone of many modern machine learning and AI apps.
The goal of this repo will be to learn (for myself) and provide simple resources for others on*:
1. The attention mechanism and the original Transformer architecture.
2. Various Transformer-based models (e.g. GPT).
3. The [`transformers`](https://huggingface.co/docs/transformers/index) library by Hugging Face (many different types of models here, but why not?).

1 & 2 will be more research-focused, whereas 3 will be very practically applicable (minimal code sketches of each side are included below).
\*Outline subject to change.
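To make goals 1 and 2 concrete, here's a minimal sketch of scaled dot-product attention, the core operation of the original Transformer. This is plain PyTorch written for illustration (the function name and tensor shapes are my own choices, not code from this repo):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: tensors of shape (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    # Query-key similarity scores, scaled by sqrt(d_k) to keep softmax inputs tame.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so their softmax weight becomes ~0.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention weights over the keys
    return weights @ v                   # weighted sum of the values

# Toy example: batch of 2 sequences, 5 tokens each, 8-dimensional embeddings.
q = k = v = torch.randn(2, 5, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 8])
```

This is the Attention(Q, K, V) = softmax(QKᵀ/√d_k)V equation from the Transformer paper; multi-head attention runs several of these in parallel on linearly projected inputs.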
## Prerequisites
Assumes basic knowledge of PyTorch (or any other ML framework) and deep learning in general.
See [learnpytorch.io](https://learnpytorch.io) for a beginner-friendly intro.
Or my [Learn PyTorch in a day video on YouTube](https://youtu.be/Z_ikDlimN6A) to get up to speed and then come back here.
## Resources
Some of the resources I've found useful (this will grow over time).
* Transformer paper: https://arxiv.org/abs/1706.03762
* The Annotated Transformer: http://nlp.seas.harvard.edu/2018/04/01/attention.html
* Transformers from scratch: https://peterbloem.nl/blog/transformers
* Attention functions in PyTorch code: https://github.com/sooftware/attentions/blob/master/attentions.py
* xFormers by Facebook Research, an in-depth implementation of many Transformer Architecture components: https://github.com/facebookresearch/xformers
* Lilian Weng's overview of attention mechanisms (including self-attention): https://lilianweng.github.io/posts/2018-06-24-attention/#self-attention
* Jay Mody on building intuition for attention: https://jaykmody.com/blog/attention-intuition/
## Extras
* Modifications to the original Transformer architecture (warning: there are lots): https://arxiv.org/abs/2102.11972
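Finally, to complement goal 3, a minimal sketch of the Hugging Face `transformers` library via its `pipeline` API (the checkpoint below is just one common sentiment model, not a requirement; any suitable checkpoint works):

```python
from transformers import pipeline

# Build a ready-to-use sentiment classifier from a pretrained checkpoint.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Transformers used to be cars that turned into robots."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```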