https://github.com/rish-16/ma4270-project
Source code for MA4270: Data Modelling and Computation on Transformers and Nadaraya-Watson Kernel Regression
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/rish-16/ma4270-project
- Owner: rish-16
- License: mit
- Created: 2024-03-28T08:51:39.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-29T00:21:36.000Z (about 2 years ago)
- Last Synced: 2024-05-29T14:36:16.798Z (about 2 years ago)
- Language: Python
- Size: 8.79 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Self-Attention and Nadaraya-Watson Kernel Regression
Here, we show connections between the **Transformer** and **kernel regression**. We show how the dot product between queries $\mathbf{q}_i$ and keys $\mathbf{k}_j$ can be swapped out for other kernel functions $\alpha(\cdot, \cdot)$, chief among them the _Nadaraya-Watson kernel_ $K$. We also show empirically that self-attention variants can learn sequential data such as periodic and aperiodic functions.
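As a rough illustration of the swap described above, the sketch below compares scaled dot-product attention with a Nadaraya-Watson style attention in NumPy. It is not taken from this repository: the function names, the `bandwidth` parameter, and the choice of a Gaussian kernel are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_product_attention(Q, K, V):
    # Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ V


def nadaraya_watson_attention(Q, K, V, bandwidth=1.0):
    # Hypothetical helper: Nadaraya-Watson regression with an assumed
    # Gaussian kernel exp(-||q - k||^2 / (2 h^2)), where h is `bandwidth`.
    # Normalising the kernel weights over the keys is exactly a softmax
    # over the negative scaled squared distances.
    sq_dist = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(axis=-1)
    scores = -sq_dist / (2.0 * bandwidth ** 2)
    return softmax(scores, axis=-1) @ V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))   # 4 queries of dimension 8
    K = rng.normal(size=(6, 8))   # 6 keys of dimension 8
    V = rng.normal(size=(6, 3))   # 6 values of dimension 3
    print(dot_product_attention(Q, K, V).shape)
    print(nadaraya_watson_attention(Q, K, V, bandwidth=1.0).shape)
```

Under these assumptions the two forms coincide when all keys share the same norm and $h^2 = \sqrt{d}$: expanding $\lVert \mathbf{q} - \mathbf{k} \rVert^2$ leaves $\mathbf{q}^\top \mathbf{k} / h^2$ plus terms that are constant across keys, and the softmax cancels those.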
> This is a class project for _MA4270: Data Modelling and Computation_ by Rishabh Anand (A0220603Y) and Ryan Chung Yi Sheng (A0219702J). [[`pdf`](https://github.com/rish-16/ma4270-project/blob/main/MA4270_Final_Report.pdf)]