Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Implementation of Feedback Transformer in Pytorch
https://github.com/lucidrains/feedback-transformer-pytorch
artificial-intelligence attention-mechanism deep-learning memory transformer
Last synced: about 1 month ago
Implementation of Feedback Transformer in Pytorch
- Host: GitHub
- URL: https://github.com/lucidrains/feedback-transformer-pytorch
- Owner: lucidrains
- License: mit
- Created: 2021-02-02T18:51:31.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-03-02T15:18:04.000Z (almost 4 years ago)
- Last Synced: 2024-11-27T13:21:55.356Z (about 2 months ago)
- Topics: artificial-intelligence, attention-mechanism, deep-learning, memory, transformer
- Language: Python
- Homepage:
- Size: 50.8 KB
- Stars: 104
- Watchers: 4
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## Feedback Transformer - Pytorch
Simple implementation of the Feedback Transformer in Pytorch. It improves on Transformer-XL by giving each token access to the representations of all previous layers through time. This is achieved by aggregating the outputs of all layers into a shared memory, which every token at every layer can attend to at each time step.
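To make the aggregation step concrete, here is a toy sketch of the idea. This is not the library's internal code; the names, shapes, and fixed-size buffer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

depth, mem_len = 6, 256
layer_weights = torch.nn.Parameter(torch.zeros(depth))  # learned per-layer mixing weights
memory = []                                              # shared memory, read by every layer

def update_memory(layer_outputs):
    # layer_outputs: list of `depth` tensors of shape (batch, dim),
    # the hidden states produced by each layer at the current time step
    stacked = torch.stack(layer_outputs, dim = 0)        # (depth, batch, dim)
    weights = F.softmax(layer_weights, dim = 0)          # (depth,)
    mem_vector = torch.einsum('l,lbd->bd', weights, stacked)
    memory.append(mem_vector)                            # at the next step, all layers attend to `memory`
    if len(memory) > mem_len:                            # keep only the most recent entries
        memory.pop(0)
```

The sketch only shows how the shared memory could be built; attending to it from each layer is handled inside the library.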
The main drawback is longer training time, due to its non-parallel nature. But I thought I'd build it to enable further exploration and research into this line of work.
I also took the liberty of adding some enhancements, including pre-normalization, GLU-gated feedforwards, and simplified T5 relative positional embeddings.
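As an illustration of what a pre-norm, GLU-gated feedforward block generally looks like, here is a minimal sketch; the class name and details are assumptions, not the exact module used in this repository.

```python
import torch
from torch import nn

class GLUFeedForward(nn.Module):
    def __init__(self, dim, mult = 4, dropout = 0.):
        super().__init__()
        self.norm = nn.LayerNorm(dim)                  # pre-normalization
        self.proj_in = nn.Linear(dim, dim * mult * 2)  # produces both value and gate
        self.proj_out = nn.Linear(dim * mult, dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        residual = x
        x = self.norm(x)
        value, gate = self.proj_in(x).chunk(2, dim = -1)
        x = value * torch.sigmoid(gate)                # GLU gating
        x = self.dropout(x)
        return self.proj_out(x) + residual             # residual connection
```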
## Install
```bash
$ pip install feedback-transformer-pytorch
```

## Usage
```python
import torch
from feedback_transformer_pytorch import FeedbackTransformer

model = FeedbackTransformer(
num_tokens = 20000, # number of tokens
dim = 512, # dimension
depth = 6, # depth
seq_len = 2, # the sequence length of each segment or window
mem_len = 256, # length of the memory buffer
dim_head = 64, # dimension of each head
heads = 8, # number of heads
attn_dropout = 0.1, # attention dropout
ff_dropout = 0.1 # feedforward dropout
).cuda()

x = torch.randint(0, 20000, (2, 64)).cuda()
model(x) # (2, 64, 20000)
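
# A possible next step, not shown in the README: treat the output as per-token
# logits and train with cross entropy against shifted targets.
logits = model(x)                                      # (2, 64, 20000)
loss = torch.nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 20000),                 # predictions for positions 0..62
    x[:, 1:].reshape(-1)                               # next-token targets
)
loss.backward()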
```

If you would like to have fine control over the memory (when to detach, etc.), you can do so with some extra keyword arguments on `.forward`:
```python
import torch
from feedback_transformer_pytorch import FeedbackTransformer

model = FeedbackTransformer(
num_tokens = 20000,
dim = 512,
depth = 6,
seq_len = 32,
mem_len = 256
).cuda()

x1 = torch.randint(0, 20000, (2, 32)).cuda()
x2 = torch.randint(0, 20000, (2, 32)).cuda()
x3 = torch.randint(0, 20000, (2, 32)).cuda()

out1, mem1 = model(x1, return_memory = True)
out2, mem2 = model(x2, memory = mem1, return_memory = True)
out3, mem3 = model(x3, memory = mem2, return_memory = True) # (2, 32, 20000)
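
# An assumption about typical usage, not from the README: when carrying memory
# across many segments during training, you may want to cut the gradient
# history by detaching the returned memory tensors before feeding them back in.
# This assumes the returned memory behaves as a tuple of tensors.
mem3_detached = tuple(t.detach() for t in mem3)
x4 = torch.randint(0, 20000, (2, 32)).cuda()
out4, mem4 = model(x4, memory = mem3_detached, return_memory = True)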
```

## Citations
```bibtex
@misc{fan2021addressing,
title = {Addressing Some Limitations of Transformers with Feedback Memory},
author = {Angela Fan and Thibaut Lavril and Edouard Grave and Armand Joulin and Sainbayar Sukhbaatar},
year = {2021},
eprint = {2002.09402},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
```