https://github.com/andreamad8/Universal-Transformer-Pytorch

Implementation of Universal Transformer in Pytorch
pytorch universal-transformer

Host: GitHub
URL: https://github.com/andreamad8/Universal-Transformer-Pytorch
Owner: andreamad8
Created: 2018-10-23T05:20:51.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2018-11-19T14:32:16.000Z (over 6 years ago)
Last Synced: 2024-08-02T20:46:34.023Z (12 months ago)
Topics: pytorch, universal-transformer
Language: Python
Size: 1.46 MB
Stars: 254
Watchers: 8
Forks: 31
Open Issues: 10
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

ATPapers - andreamad8 / Universal-Transformer-Pytorch - Universal Transformer PyTorch implementation (Transformer / Repositories)

README

# Universal-Transformer-Pytorch

Simple and self-contained implementation of the [Universal Transformer](https://arxiv.org/abs/1807.03819) (Dehghani et al., 2018) in PyTorch. Please open an issue if you find a bug, and send a pull request if you want to contribute.

![](file.gif)

GIF taken from: [https://twitter.com/OriolVinyalsML/status/1017523208059260929](https://twitter.com/OriolVinyalsML/status/1017523208059260929)

## Universal Transformer 

The basic Transformer model has been taken from [https://github.com/kolloldas/torchnlp](https://github.com/kolloldas/torchnlp). So far, the following have been implemented:

- Universal Transformer Encoder Decoder, with position and time embeddings.

- [Adaptive Computation Time](https://arxiv.org/abs/1603.08983) (Graves, 2016) as described in the Universal Transformer paper.

- Universal Transformer for bAbI data.
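The "position and time embeddings" item above can be illustrated with a small sketch. It is written in plain Python (not PyTorch) so it runs without dependencies, and it is not this repository's code. It builds the sinusoidal coordinate embedding described in the Universal Transformer paper, where a position signal and a time-step signal are added together, and then applies one shared layer repeatedly. The names `coordinate_embedding`, `universal_encode`, and `shared_layer`, as well as the 0-based indexing, are assumptions made for illustration.

```python
import math


def coordinate_embedding(seq_len, step, d_model):
    """Return a seq_len x d_model table mixing position and time-step signals."""
    table = []
    for pos in range(seq_len):
        row = []
        for k in range(d_model):
            # Dimensions 2j and 2j+1 share one frequency: 1 / 10000^(2j / d_model).
            freq = 1.0 / (10000 ** ((k - k % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            fn = math.sin if k % 2 == 0 else math.cos
            # Position term plus time-step term.
            row.append(fn(pos * freq) + fn(step * freq))
        table.append(row)
    return table


def universal_encode(states, shared_layer, num_steps):
    """Apply the SAME layer num_steps times, re-adding the embedding at each step."""
    d_model = len(states[0])
    for step in range(num_steps):
        emb = coordinate_embedding(len(states), step, d_model)
        states = [
            [s + e for s, e in zip(s_row, e_row)]
            for s_row, e_row in zip(states, emb)
        ]
        # Weight sharing across depth: one layer object, reused every step.
        states = shared_layer(states)
    return states
```

Passing an identity function as `shared_layer` is enough to check shapes; in a real model it would be a self-attention plus transition block.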

 

## Dependencies

```
python3
pytorch 0.4
torchtext
argparse
```

## How to run

To run the standard Universal Transformer on bAbI:

```
python main.py --task 1
```

To run with Adaptive Computation Time (ACT):

```
python main.py --task 1 --act
```
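To give a feel for what the `--act` option enables, here is a minimal plain-Python sketch of the halting rule from Adaptive Computation Time for a single position with scalar states. It is a simplification and not this repository's implementation: the halting probabilities are passed in as a hypothetical list instead of being predicted by the model, and the names `act_weights` and `act_output` and the threshold value `0.99` are assumptions.

```python
def act_weights(halt_probs, threshold=0.99):
    """Return the weight ACT gives each step before one position halts.

    halt_probs: hypothetical per-step halting probabilities, each in (0, 1).
    """
    weights = []
    cumulative = 0.0
    for step, p in enumerate(halt_probs):
        is_last_step = step == len(halt_probs) - 1
        if cumulative + p > threshold or is_last_step:
            # Halt: the final step gets the remainder so the weights sum to 1.
            weights.append(1.0 - cumulative)
            break
        weights.append(p)
        cumulative += p
    return weights


def act_output(states, halt_probs, threshold=0.99):
    """Weighted sum of the intermediate states up to the halting step."""
    weights = act_weights(halt_probs, threshold)
    # zip stops at the halting step, so any later states are ignored.
    return sum(w * s for w, s in zip(weights, states))
```

For example, with halting probabilities `[0.3, 0.3, 0.5, 0.9]` the position halts at the third step, so the fourth state never contributes. Positions that are "easy" halt early, while harder ones use more steps.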

## Results

Results are for the 10k setting: each task is run 10 times and the best run is reported.

Tasks 16, 17, 18, and 19 are very hard to converge, even on the training set. The problem seems to be the learning-rate scheduling. Moreover, results on the 1k setting are still very poor; some hyper-parameters may need tuning.

| Task | Uni-Trs | + ACT | Original |
| ---  | ---     | ---   | ---      |
|  1   | 0.0     | 0.0   | 0.0      |
|  2   | 0.0     | 0.2   | 0.0      |
|  3   | 0.8     | 2.4   | 0.4      |
|  4   | 0.0     | 0.0   | 0.0      |
|  5   | 0.4     | 0.1   | 0.0      |
|  6   | 0.0     | 0.0   | 0.0      |
|  7   | 0.4     | 0.0   | 0.0      |
|  8   | 0.2     | 0.1   | 0.0      |
|  9   | 0.0     | 0.0   | 0.0      |
| 10   | 0.0     | 0.0   | 0.0      |
| 11   | 0.0     | 0.0   | 0.0      |
| 12   | 0.0     | 0.0   | 0.0      |
| 13   | 0.0     | 0.0   | 0.0      |
| 14   | 0.0     | 0.0   | 0.0      |
| 15   | 0.0     | 0.0   | 0.0      |
| 16   | 50.5    | 50.6  | 0.4      |
| 17   | 13.7    | 14.1  | 0.6      |
| 18   | 4       | 6.9   | 0.0      |
| 19   | 79.2    | 65.2  | 2.8      |
| 20   | 0.0     | 0.0   | 0.0      |
| ---  | ---     | ---   | ---      |
| avg  | 7.46    | 6.98  | 0.21     |
| fail | 3       | 3     | 0        |
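As a quick sanity check, the `avg` row can be recomputed from the per-task values in the table. The snippet below is only a sketch that copies the numbers from the table by hand; the dictionary name `results` is an assumption. It does not recompute the `fail` row, because the failure threshold is not stated here.

```python
# Per-task values for tasks 1..20, copied from the table above.
results = {
    "Uni-Trs": [0.0, 0.0, 0.8, 0.0, 0.4, 0.0, 0.4, 0.2, 0.0, 0.0,
                0.0, 0.0, 0.0, 0.0, 0.0, 50.5, 13.7, 4.0, 79.2, 0.0],
    "+ ACT": [0.0, 0.2, 2.4, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0,
              0.0, 0.0, 0.0, 0.0, 0.0, 50.6, 14.1, 6.9, 65.2, 0.0],
    "Original": [0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.6, 0.0, 2.8, 0.0],
}

# Mean over the 20 tasks, rounded to two decimals like the table.
averages = {
    name: round(sum(values) / len(values), 2)
    for name, values in results.items()
}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```

The recomputed means agree with the `avg` row, which shows that the large gap to the original results comes almost entirely from tasks 16 to 19.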

## TODO

- Visualize ACT on different tasks
