https://github.com/kyegomez/shallowff
Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers"
Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers"
- Host: GitHub
- URL: https://github.com/kyegomez/shallowff
- Owner: kyegomez
- License: mit
- Created: 2023-11-20T03:49:13.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-19T12:53:19.000Z (6 months ago)
- Last Synced: 2025-04-19T20:16:56.561Z (6 months ago)
- Topics: artificial-intelligence, attention, attention-is-all-you-need, attention-mechanism, attention-mechanisms, feedforward, transformer, transformer-encoder, transformer-models, transformers-models
- Language: Python
- Homepage: https://discord.gg/Yx5y5VBahs
- Size: 36.2 MB
- Stars: 10
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Discord](https://discord.gg/qUtxnK2NMf)
# ALR Transformer
An ALR Transformer that replaces the original transformer's joint encoder + decoder block with a feedforward (ALR) block paired with a decoder block. A rough sketch of the underlying idea is given below.
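The paper's core idea is that a shallow feed-forward network, trained to mimic an attention layer's behaviour, can stand in for it. As a rough illustration only (the class name `ShallowAttentionReplacement`, its constructor arguments, and the residual/norm placement are assumptions for the sketch, not the actual `alr_transformer` internals), such a replacement block might look like this in plain PyTorch:

```python
import torch
from torch import nn


class ShallowAttentionReplacement(nn.Module):
    """Illustrative sketch: a shallow feed-forward network standing in for a
    self-attention sublayer. The flattened token window is mixed by a single
    hidden layer, so the context length must be fixed up front."""

    def __init__(self, dim: int, seq_len: int, hidden_mult: int = 4):
        super().__init__()
        self.seq_len = seq_len
        flat = dim * seq_len
        self.net = nn.Sequential(
            nn.Linear(flat, dim * hidden_mult),
            nn.GELU(),
            nn.Linear(dim * hidden_mult, flat),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); flatten so all tokens can interact,
        # which is the role attention normally plays.
        b, n, d = x.shape
        assert n == self.seq_len, "shallow FF replacement needs a fixed context length"
        mixed = self.net(x.reshape(b, n * d)).reshape(b, n, d)
        # Residual connection + layer norm, as in a standard transformer block.
        return self.norm(x + mixed)
```

Because the whole window is flattened and mixed by one hidden layer, the sequence length cannot vary at inference time, which is the main trade-off of this approach versus attention.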
## Install

`pip install alr-transformer`

## Usage
```python
import torch
from alr_transformer import ALRTransformer

# Random token IDs: batch size 1, sequence length 2048, vocabulary of 100000
x = torch.randint(0, 100000, (1, 2048))

model = ALRTransformer(
    dim = 512,
    depth = 6,
    num_tokens = 100000,
    dim_head = 64,
    heads = 8,
    ff_mult = 4
)

out = model(x)
print(out)
print(out.shape)
```
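Given `num_tokens = 100000` and the `(1, 2048)` input above, `out` should be the per-position token logits, i.e. a tensor of shape `(1, 2048, 100000)` (assuming the model head returns vocabulary logits rather than hidden states).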
## Train
- First, `git clone` the repo, then run the following:
```
python3 train.py
```
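The contents of `train.py` are not reproduced here; the snippet below is only a rough sketch of what a minimal language-modelling loop over `ALRTransformer` might look like. The dummy random-token batches, the optimizer settings, and the assumption that the model returns per-token logits are all illustrative, not the repository's actual training code.

```python
import torch
import torch.nn.functional as F
from alr_transformer import ALRTransformer

# Illustrative hyperparameters; the real train.py may differ.
model = ALRTransformer(
    dim = 512,
    depth = 6,
    num_tokens = 100000,
    dim_head = 64,
    heads = 8,
    ff_mult = 4
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(10):
    # Dummy batch of token IDs; replace with a real dataloader.
    tokens = torch.randint(0, 100000, (1, 2049))
    inputs, targets = tokens[:, :-1], tokens[:, 1:]

    logits = model(inputs)  # assumed shape: (batch, seq_len, num_tokens)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```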
## Citation

```bibtex
@misc{bozic2023rethinking,
title={Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers},
author={Vukasin Bozic and Danilo Dordevic and Daniele Coppola and Joseph Thommes},
year={2023},
eprint={2311.10642},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```