Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/facebookresearch/transformer-sequential
Trains Transformer model variants. Data isn't shuffled between batches.
https://github.com/facebookresearch/transformer-sequential
Last synced: about 1 month ago
JSON representation
Trains Transformer model variants. Data isn't shuffled between batches.
- Host: GitHub
- URL: https://github.com/facebookresearch/transformer-sequential
- Owner: facebookresearch
- License: other
- Archived: true
- Created: 2021-05-06T15:27:31.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-05T18:24:36.000Z (about 2 years ago)
- Last Synced: 2024-06-23T05:59:57.425Z (5 months ago)
- Language: Python
- Homepage:
- Size: 51.8 KB
- Stars: 138
- Watchers: 11
- Forks: 18
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- StarryDivineSky - facebookresearch/transformer-sequential - Span。用于使用类似 Transformer 的架构进行长序列建模。 (时间序列 / 网络服务_其他)
README
# transformer-sequential
This repo contains the code for three papers:
- Feedback Transformer
- Expire-Span
- Staircase TransformerThe training code is structured for long sequential modeling with Transformer-like architectures.
## Requirements
You will need a CUDA-enabled GPU to run the code.
## Setup
Run the following:
```
pip install -r requirements.txt
```## Feedback Transformer
Introduced in [Addressing Some Limitations of Transformers with Feedback Memory](https://arxiv.org/abs/2002.09402v3).
### Running Experiments from the Paper
#### enwik8
|Model|Params|Valid|Test|
|-|-|-|-|
|Feedback Transformer|77M|0.984|0.962|_Numbers are Bits-Per-Character_
```
bash experiments/feedback/enwik8.sh
```#### Algorithmic
|Model|3 Variable|5 Variable|
|-|-|-|
|Transformer|33.7|37.5|
|Feedback Transformer|99.1|92.6|_Numbers are % Accuracy on Test_
```
bash experiments/feedback/algorithmic_3var.sh
bash experiments/feedback/algorithmic_5var.sh
```## Expire-Span
Introduced in [Not All Memories are Created Equal: Learning to Expire](https://ai.facebook.com/research/publications/not-all-memories-are-created-equal).
### Running Experiments from the Paper
#### enwik8
|Model|Params|Valid|Test|
|-|-|-|-|
|Expire-Span 12L|38M|1.014|0.994|_Numbers are Bits-Per-Character_
```
bash experiments/expire_span/enwik8.sh
```#### Object Collision
|Model|Maximum Span|Test Error (%)|
|-|-|-|
|Expire-Span|16k|52.2|
|Expire-Span|32k|36.7|
|Expire-Span|64k|26.7|```
bash experiments/expire_span/object_collision_16k.sh
bash experiments/expire_span/object_collision_32k.sh
bash experiments/expire_span/object_collision_64k.sh
```## Staircase
Introduced in [Staircase Attention for Recurrent Processing of Sequences](https://arxiv.org/pdf/2106.04279.pdf).
Note this algorithmic task in this repo is slightly different from what was used in the paper, while the number might not exactly match, it does show the same trend as in the paper. And the model implementation / hyperparameter remains the same.### Running Experiments from the Paper
#### Algorithmic
|Model|Test|
|-|-|
|Transformer|58.44%|
|Staircase Transformer| 3.6%|_Numbers are % error rate on Test_
```
bash experiments/staircase/algorithmic_3var.sh
```## License
The code is licensed under CC-BY-NC license. See the LICENSE file for more details.