https://github.com/kyegomez/shallowff
Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers"
Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers"
- Host: GitHub
- URL: https://github.com/kyegomez/shallowff
- Owner: kyegomez
- License: mit
- Created: 2023-11-20T03:49:13.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-19T12:53:19.000Z (6 months ago)
- Last Synced: 2025-04-19T20:16:56.561Z (6 months ago)
- Topics: artificial-intelligence, attention, attention-is-all-you-need, attention-mechanism, attention-mechanisms, feedforward, transformer, transformer-encoder, transformer-models, transformers-models
- Language: Python
- Homepage: https://discord.gg/Yx5y5VBahs
- Size: 36.2 MB
- Stars: 10
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Discord](https://discord.gg/qUtxnK2NMf)
# ALR Transformer
An ALR Transformer that replaces the original transformer's joint encoder + decoder block with a feedforward (ALR) block paired with a decoder block. A rough sketch of the underlying idea is given below.
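The paper's core idea is that a shallow feed-forward network, trained to mimic an attention layer's behaviour, can stand in for it. As a rough illustration only (the class name `ShallowAttentionReplacement`, its constructor arguments, and the residual/norm placement are assumptions for the sketch, not the actual `alr_transformer` internals), such a replacement block might look like this in plain PyTorch:

```python
import torch
from torch import nn


class ShallowAttentionReplacement(nn.Module):
    """Illustrative sketch: a shallow feed-forward network standing in for a
    self-attention sublayer. The flattened token window is mixed by a single
    hidden layer, so the context length must be fixed up front."""

    def __init__(self, dim: int, seq_len: int, hidden_mult: int = 4):
        super().__init__()
        self.seq_len = seq_len
        flat = dim * seq_len
        self.net = nn.Sequential(
            nn.Linear(flat, dim * hidden_mult),
            nn.GELU(),
            nn.Linear(dim * hidden_mult, flat),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); flatten so all tokens can interact,
        # which is the role attention normally plays.
        b, n, d = x.shape
        assert n == self.seq_len, "shallow FF replacement needs a fixed context length"
        mixed = self.net(x.reshape(b, n * d)).reshape(b, n, d)
        # Residual connection + layer norm, as in a standard transformer block.
        return self.norm(x + mixed)
```

Because the whole window is flattened and mixed by one hidden layer, the sequence length cannot vary at inference time, which is the main trade-off of this approach versus attention.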
## Install

`pip install alr-transformer`

## Usage
```python
import torch
from alr_transformer import ALRTransformer

# Random token IDs: batch size 1, sequence length 2048, vocabulary of 100000
x = torch.randint(0, 100000, (1, 2048))

model = ALRTransformer(
    dim = 512,
    depth = 6,
    num_tokens = 100000,
    dim_head = 64,
    heads = 8,
    ff_mult = 4
)

out = model(x)
print(out)
print(out.shape)
```
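Given `num_tokens = 100000` and the `(1, 2048)` input above, `out` should be the per-position token logits, i.e. a tensor of shape `(1, 2048, 100000)` (assuming the model head returns vocabulary logits rather than hidden states).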
## Train
- First, `git clone` the repo, then run the following:
```
python3 train.py
```
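The contents of `train.py` are not reproduced here; the snippet below is only a rough sketch of what a minimal language-modelling loop over `ALRTransformer` might look like. The dummy random-token batches, the optimizer settings, and the assumption that the model returns per-token logits are all illustrative, not the repository's actual training code.

```python
import torch
import torch.nn.functional as F
from alr_transformer import ALRTransformer

# Illustrative hyperparameters; the real train.py may differ.
model = ALRTransformer(
    dim = 512,
    depth = 6,
    num_tokens = 100000,
    dim_head = 64,
    heads = 8,
    ff_mult = 4
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(10):
    # Dummy batch of token IDs; replace with a real dataloader.
    tokens = torch.randint(0, 100000, (1, 2049))
    inputs, targets = tokens[:, :-1], tokens[:, 1:]

    logits = model(inputs)  # assumed shape: (batch, seq_len, num_tokens)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```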
## Citation

```bibtex
@misc{bozic2023rethinking,
title={Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers},
author={Vukasin Bozic and Danilo Dordevic and Daniele Coppola and Joseph Thommes},
year={2023},
eprint={2311.10642},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```