https://github.com/facebookresearch/transformer-sequential

Trains Transformer model variants. Data isn't shuffled between batches.

Host: GitHub
URL: https://github.com/facebookresearch/transformer-sequential
Owner: facebookresearch
License: other
Archived: true
Created: 2021-05-06T15:27:31.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-10-05T18:24:36.000Z (over 2 years ago)
Last Synced: 2024-12-17T01:38:06.552Z (about 2 months ago)
Language: Python
Homepage:
Size: 51.8 KB
Stars: 139
Watchers: 10
Forks: 18
Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

Awesome Lists containing this project

StarryDivineSky - facebookresearch/transformer-sequential - Span. For long-sequence modeling with Transformer-like architectures. (Time series / Web services_Other)

README

# transformer-sequential

This repo contains the code for three papers:

- Feedback Transformer

- Expire-Span

- Staircase Transformer

The training code is structured for modeling long sequences with Transformer-like architectures.
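As an illustration of what unshuffled, sequential batching can look like, here is a minimal sketch. It is not taken from this repo: the function name `sequential_batches` and its parameters are hypothetical, and it uses plain Python lists instead of tensors. The idea is to split one long token stream into `batch_size` contiguous rows and then read every row left to right, so each row of a batch continues exactly where the same row of the previous batch stopped.

```python
from typing import Iterator, List


def sequential_batches(
    tokens: List[int], batch_size: int, block_len: int
) -> Iterator[List[List[int]]]:
    """Yield batches whose rows continue the rows of the previous batch.

    Hypothetical helper for illustration only; not part of this repo.
    """
    # Split the stream into batch_size contiguous rows of equal length,
    # dropping any leftover tokens at the end of the stream.
    row_len = len(tokens) // batch_size
    rows = [tokens[i * row_len:(i + 1) * row_len] for i in range(batch_size)]

    # Walk all rows left to right in lockstep. Nothing is shuffled, so a
    # model that carries memory across batches sees a continuous sequence.
    for start in range(0, row_len - block_len + 1, block_len):
        yield [row[start:start + block_len] for row in rows]


batches = list(sequential_batches(list(range(20)), batch_size=2, block_len=5))
for batch in batches:
    print(batch)
```

With 20 tokens, two rows, and blocks of five, the first row covers tokens 0 to 9 and the second covers tokens 10 to 19, and each batch advances both rows by five positions.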

## Requirements

You will need a CUDA-enabled GPU to run the code.

## Setup

Run the following:

```
pip install -r requirements.txt
```
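Before launching an experiment, it can help to confirm that a CUDA device is actually visible. The check below is a small sketch that assumes the installed requirements include PyTorch; the helper name `cuda_ready` is hypothetical and not part of this repo.

```python
def cuda_ready() -> bool:
    """Return True only if PyTorch is importable and sees a CUDA device.

    Hypothetical pre-flight helper; assumes PyTorch is the framework in use.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so the training scripts cannot run.
        return False
    return torch.cuda.is_available()


status = cuda_ready()
print("CUDA available:", status)
```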

## Feedback Transformer

Introduced in [Addressing Some Limitations of Transformers with Feedback Memory](https://arxiv.org/abs/2002.09402v3).

### Running Experiments from the Paper

#### enwik8

|Model|Params|Valid|Test|
|-|-|-|-|
|Feedback Transformer|77M|0.984|0.962|

_Numbers are Bits-Per-Character_

```
bash experiments/feedback/enwik8.sh
```
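Bits-per-character is the average cross-entropy per character expressed in base 2. The sketch below shows the conversion under the assumption that a training loss is reported in nats (natural-log cross-entropy, which is what PyTorch's cross-entropy loss computes). Whether this repo's logs print nats or bits is not stated here, and the helper names are hypothetical.

```python
import math


def nats_to_bpc(loss_nats: float) -> float:
    """Convert mean per-character cross-entropy from nats to bits."""
    return loss_nats / math.log(2)


def bpc_to_nats(bpc: float) -> float:
    """Inverse conversion: bits per character back to nats."""
    return bpc * math.log(2)


# A loss of ln(2) nats corresponds to exactly 1 bit per character.
one_bit = nats_to_bpc(math.log(2))

# The reported test score of 0.962 BPC, expressed in nats.
test_loss_nats = bpc_to_nats(0.962)
print(one_bit, test_loss_nats)
```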

#### Algorithmic

|Model|3 Variable|5 Variable|
|-|-|-|
|Transformer|33.7|37.5|
|Feedback Transformer|99.1|92.6|

_Numbers are % Accuracy on Test_

```
bash experiments/feedback/algorithmic_3var.sh
bash experiments/feedback/algorithmic_5var.sh
```

## Expire-Span

Introduced in [Not All Memories are Created Equal: Learning to Expire](https://ai.facebook.com/research/publications/not-all-memories-are-created-equal).

### Running Experiments from the Paper

#### enwik8

|Model|Params|Valid|Test|
|-|-|-|-|
|Expire-Span 12L|38M|1.014|0.994|

_Numbers are Bits-Per-Character_

```
bash experiments/expire_span/enwik8.sh
```

#### Object Collision

|Model|Maximum Span|Test Error (%)|
|-|-|-|
|Expire-Span|16k|52.2|
|Expire-Span|32k|36.7|
|Expire-Span|64k|26.7|

```
bash experiments/expire_span/object_collision_16k.sh
bash experiments/expire_span/object_collision_32k.sh
bash experiments/expire_span/object_collision_64k.sh
```

## Staircase

Introduced in [Staircase Attention for Recurrent Processing of Sequences](https://arxiv.org/pdf/2106.04279.pdf).

Note that the algorithmic task in this repo is slightly different from the one used in the paper. The numbers might not match exactly, but they show the same trend as in the paper. The model implementation and hyperparameters remain the same.

### Running Experiments from the Paper

#### Algorithmic

|Model|Test|
|-|-|
|Transformer|58.44%|
|Staircase Transformer|3.6%|

_Numbers are % error rate on Test_

```
bash experiments/staircase/algorithmic_3var.sh
```

## License

The code is licensed under the CC-BY-NC license. See the LICENSE file for more details.