https://github.com/ischlag/fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
- Host: GitHub
- URL: https://github.com/ischlag/fast-weight-transformers
- Owner: ischlag
- License: mit
- Created: 2021-02-11T11:50:33.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-06-10T13:22:22.000Z (over 3 years ago)
- Last Synced: 2024-08-01T13:24:49.576Z (7 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 934 KB
- Stars: 95
- Watchers: 5
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Linear Transformers Are Secretly Fast Weight Programmers
This repository contains the code accompanying the paper [*Linear Transformers Are Secretly Fast Weight Programmers*](https://arxiv.org/abs/2102.11174), which was published at ICML'21.
It also contains the logs of all synthetic experiments.
## Synthetic Experiments

### Requirements
```bash
$ cat req.txt
jupyter==1.0.0
pandas==1.0.1
seaborn==0.10.0
torch==1.6.0
matplotlib==3.1.3
numpy==1.17.2
```

```bash
pip3 install -r req.txt
```

### Rerun Experiments
Logs are provided in the `synthetic/logs` folder.
The files in that folder are the result of running the following commands:

Setting 1 (capacity):
```bash
python3 main.py --begin=20 --end=600 --step=20 --attn_name=softmax --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=linear --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=2 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=3 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=64 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=128 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=512 --update_rule=sum
```

Setting 2 (update rule):
```bash
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=sum --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=tanh --update_rule=fwm --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=fwm --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=2 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=linear --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=favor --attn_arg=64 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=favor --attn_arg=128 --update_rule=ours --replace
```

Generate figures from the logs using the following notebooks:
```
synthetic/setting1_generate_figure.ipynb
synthetic/setting2_generate_figure.ipynb
```

## Language Modelling & Machine Translation
The toolkit and scripts for language modeling experiments can be found at [IDSIA/lmtool-fwms](https://github.com/IDSIA/lmtool-fwms).

For machine translation experiments, we ported the different attention functions implemented in the language modeling toolkit to the multi-head attention implementation in [FAIRSEQ](https://github.com/pytorch/fairseq).
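
For readers new to the idea behind these attention functions, the following is a minimal, dependency-free sketch of causal linear attention written as a fast weight memory. It is not code from this repository or from the toolkits linked above. It assumes the commonly used feature map `elu(x) + 1` and a purely additive write, which is presumably what `--update_rule=sum` refers to; the function names (`fast_weight_attention`, `attention_form`) are hypothetical and chosen only for illustration.

```python
import math


def elu_plus_one(x):
    """Feature map phi(x) = elu(x) + 1, elementwise (always positive)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fast_weight_attention(keys, values, queries):
    """Causal linear attention as a fast weight memory (illustrative only).

    W accumulates outer products v_t phi(k_t)^T (additive write) and
    z accumulates phi(k_t), which is used to normalise the read-out.
    """
    d_key, d_val = len(keys[0]), len(values[0])
    W = [[0.0] * d_key for _ in range(d_val)]
    z = [0.0] * d_key
    outputs = []
    for k, v, q in zip(keys, values, queries):
        phi_k, phi_q = elu_plus_one(k), elu_plus_one(q)
        # Write: W <- W + v phi(k)^T
        for i in range(d_val):
            for j in range(d_key):
                W[i][j] += v[i] * phi_k[j]
        z = [a + b for a, b in zip(z, phi_k)]
        # Read: y = W phi(q) / (z . phi(q))
        norm = dot(z, phi_q)
        outputs.append([dot(row, phi_q) / norm for row in W])
    return outputs


def attention_form(keys, values, queries):
    """The same computation as explicit causal attention over past steps."""
    d_val = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = elu_plus_one(q)
        scores = [dot(elu_plus_one(keys[s]), phi_q) for s in range(t + 1)]
        total = sum(scores)
        outputs.append(
            [sum(scores[s] * values[s][i] for s in range(t + 1)) / total
             for i in range(d_val)]
        )
    return outputs


if __name__ == "__main__":
    keys = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.7]]
    values = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0], [2.0, 2.0, 2.0]]
    queries = [[0.1, 0.4], [-0.6, 1.0], [0.9, -0.2]]
    for row in fast_weight_attention(keys, values, queries):
        print(row)
```

Both functions return the same outputs up to floating-point error: the recurrent form keeps a fixed-size matrix `W` instead of revisiting all past keys and values, which is the equivalence the paper title alludes to. Other update rules and feature maps used in the experiments are not covered by this sketch.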
## Citation
```
@inproceedings{schlag2021linear,
title={Linear Transformers Are Secretly Fast Weight Programmers},
author={Imanol Schlag and Kazuki Irie and J\"urgen Schmidhuber},
booktitle={Proc. Int. Conf. on Machine Learning (ICML)},
address = {Virtual only},
month = jul,
year={2021}
}
```