https://github.com/lukashedegaard/continual-transformers-tf
TensorFlow implementation of Continual Transformer building blocks
- Host: GitHub
- URL: https://github.com/lukashedegaard/continual-transformers-tf
- Owner: LukasHedegaard
- License: apache-2.0
- Created: 2022-02-24T08:00:20.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-02-25T14:45:56.000Z (about 3 years ago)
- Last Synced: 2025-01-06T16:14:51.041Z (4 months ago)
- Language: Python
- Size: 521 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# Continual Transformers TensorFlow
TensorFlow implementation of Continual Transformer building blocks, which augment regular transformer layers with the ability to compute the attention output _per token step_.
The layers are modelled on the `tf.keras.layers.MultiHeadAttention` and should work as drop-in replacements in most cases.
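
For reference, the stock Keras layer that these blocks mirror is constructed and called as shown below. The snippet exercises only `tf.keras.layers.MultiHeadAttention`; that the continual layers accept exactly the same call arguments is an assumption based on the drop-in claim rather than something documented here.

```python
import tensorflow as tf

# Standard Keras multi-head self-attention over a full sequence.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.normal((1, 10, 4))   # (batch, seq_len, embed_dim)
y = mha(query=x, value=x)          # self-attention output, shape (1, 10, 4)
```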
## Setup
Continual Transformers and its modules can be installed in your project using:
```bash
pip install git+https://github.com/LukasHedegaard/continual-transformers-tf.git
```

## Layers
### [Continual Single-output Multi Head Attention](tests/test_co_si_mha.py)
```python
from continual_transformers_tf import CoSiMultiHeadAttention

layer = CoSiMultiHeadAttention(seq_len=10, num_heads=2, key_dim=4)
```
Fig. 1: Continual Single-Output Dot-Product Attention.
The key (K) and value (V) matrices are aggregated over time by caching the step vectors k_n and v_n in a FIFO queue. During each step, only the attention output associated with q is computed.
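
The snippet below is a minimal sketch of this idea (single head, unbatched vectors), not the library's actual implementation: the current step's key and value vectors are pushed into fixed-length FIFO buffers, and a single attention output is computed for the current query.

```python
import tensorflow as tf

# Sketch only: single-output dot-product attention with FIFO-cached K and V.
def single_output_attention_step(q_n, k_n, v_n, K, V):
    # Evict the oldest row and append the newest step vectors.
    K = tf.concat([K[1:], k_n[tf.newaxis, :]], axis=0)    # (seq_len, d_k)
    V = tf.concat([V[1:], v_n[tf.newaxis, :]], axis=0)    # (seq_len, d_v)
    # Scaled dot-product scores of the current query against all cached keys.
    d_k = tf.cast(tf.shape(K)[-1], q_n.dtype)
    scores = tf.linalg.matvec(K, q_n) / tf.sqrt(d_k)      # (seq_len,)
    weights = tf.nn.softmax(scores)                       # (seq_len,)
    # Weighted sum of cached values: one output vector per step.
    out = tf.linalg.matvec(V, weights, transpose_a=True)  # (d_v,)
    return out, K, V
```

The buffers can be initialized to zeros of shape `(seq_len, key_dim)` and threaded through successive calls, one token step at a time.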
### [Circular Positional Embedding](tests/test_circular_embedding.py)
```python
from continual_transformers_tf import CircularPositionalEncoding

layer = CircularPositionalEncoding(max_len=10, embed_dim=4)
```
Fig. 2: Circular Positional Encoding.
At each step, a positional encoding is added in a round-robin fashion.
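
A minimal sketch of the round-robin idea is shown below; the class name and call signature are illustrative assumptions, not the library's `CircularPositionalEncoding` API.

```python
import tensorflow as tf

# Sketch only: a learned positional table whose rows are assigned to steps
# round-robin, so step t reuses slot t % max_len.
class RoundRobinPositionalEncoding(tf.keras.layers.Layer):
    def __init__(self, max_len, embed_dim):
        super().__init__()
        self.max_len = max_len
        self.pos_emb = tf.keras.layers.Embedding(max_len, embed_dim)

    def call(self, x_t, step):
        # x_t: (batch, embed_dim) token embedding for the current step;
        # step: scalar step counter starting at 0.
        return x_t + self.pos_emb(step % self.max_len)
```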
### [Continual Single-output Transformer Encoder](tests/test_co_si_trans_enc.py)
```python
from continual_transformers_tf import CoSiTransformerEncoder

layer = CoSiTransformerEncoder(
seq_len=10,
embed_dim=4,
num_heads=2,
ff_dim=16,
dropout_rate=0.1,
)
```

## Citation
```
@article{hedegaard2022cotrans,
title={Continual Transformers: Redundancy-Free Attention for Online Inference},
author={Lukas Hedegaard and Alexandros Iosifidis},
journal={preprint, arXiv:2201.06268},
year={2022}
}
```