https://github.com/jaketae/realformer

PyTorch implementation of RealFormer: Transformer Likes Residual Attention

natural-language-processing pytorch transformer transformer-encoder

Host: GitHub
URL: https://github.com/jaketae/realformer
Owner: jaketae
License: mit
Created: 2021-05-11T13:44:02.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2021-05-17T10:59:41.000Z (about 5 years ago)
Last Synced: 2025-06-30T11:05:37.631Z (about 1 year ago)
Topics: natural-language-processing, pytorch, transformer, transformer-encoder
Language: Python
Homepage:
Size: 7.81 KB
Stars: 11
Watchers: 2
Forks: 7
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# RealFormer

PyTorch implementation of [RealFormer: Transformer Likes Residual Attention](https://arxiv.org/abs/2012.11747).

## Quickstart

Clone this repository.

```
git clone https://github.com/jaketae/realformer.git
```

Navigate to the cloned directory. From there, you can import and instantiate the model:

```python
>>> from realformer import RealFormerEncoder
>>> model = RealFormerEncoder()
```

By default, the model comes with the following parameters:

```python
RealFormerEncoder(
    d_model=512,
    num_heads=8,
    expansion_factor=2,
    dropout=0.5,
    max_len=512,
    num_layers=6,
)
```

## Summary

Residual Attention Layer Transformer, shortened as RealFormer, is a transformer variant that incorporates residual skip connections between attention layers, allowing the attention scores of earlier layers to pass through the entire network. The paper reports that it outperforms canonical transformers on a variety of tasks and datasets, including masked language modeling (MLM), [GLUE](https://gluebenchmark.com), and [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/).
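
For reference, the residual attention idea can be sketched as below. This follows the formulation in the paper rather than this repository's code; the symbols are illustrative: $S_l$ denotes the pre-softmax attention scores of layer $l$ (per head), $d_k$ the per-head key dimension, and $S_0 = 0$ is assumed for the first layer.

```latex
\begin{aligned}
S_l &= \frac{Q_l K_l^{\top}}{\sqrt{d_k}} + S_{l-1}, \qquad S_0 = 0 \\
\mathrm{ResidualAttention}(Q_l, K_l, V_l, S_{l-1}) &= \mathrm{softmax}(S_l)\, V_l
\end{aligned}
```

Each layer adds its own scaled dot-product scores to the running sum before the softmax and hands the updated sum to the next layer, which is the only change relative to standard attention.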

## Implementation Notes

- Just like `torch.nn.TransformerEncoder`, the `RealFormerEncoder` does not include any embedding layers. It is recommended that you implement positional encoding schemes (e.g. sinusoidal tables, learnable embeddings) as needed.
- The authors mention that RealFormer layers can be used as drop-in replacements in any transformer model, whether autoencoding (encoders) or auto-regressive (decoders). We closely follow the flow of the paper and include only an encoder version of the implementation for now.
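
Since positional encoding is left to the user, here is a minimal, dependency-free sketch of a sinusoidal table. It is not part of this repository: the function name `sinusoidal_table` is hypothetical, and the formula is the standard one from the original Transformer paper, not something prescribed by RealFormer.

```python
import math


def sinusoidal_table(max_len, d_model):
    """Return a max_len x d_model nested list of sinusoidal position encodings.

    Hypothetical helper for illustration; not provided by this repository.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(max_len):
        row = []
        for i in range(0, d_model, 2):
            # Each sin/cos pair shares one frequency: 1 / 10000^(i / d_model),
            # where i is the even dimension index of the pair.
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table


# Sizes match the defaults listed in the Quickstart (max_len=512, d_model=512).
table = sinusoidal_table(max_len=512, d_model=512)
```

The nested list can be converted with `torch.tensor(table)` and its first `seq_len` rows added to your token embeddings before they are passed to the encoder. How the rows broadcast depends on the input layout the encoder expects, which is not documented here.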

## Resources

- [Original Paper](https://arxiv.org/abs/2012.11747)
