https://github.com/phohenecker/pytorch-transformer
A PyTorch implementation of the Transformer model from "Attention Is All You Need".
- Host: GitHub
- URL: https://github.com/phohenecker/pytorch-transformer
- Owner: phohenecker
- License: other
- Created: 2018-10-31T14:01:58.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-07-13T19:56:46.000Z (about 6 years ago)
- Last Synced: 2025-04-12T22:15:21.259Z (6 months ago)
- Topics: attention-is-all-you-need, deep-learning, python3, pytorch
- Language: Python
- Homepage:
- Size: 59.6 KB
- Stars: 59
- Watchers: 4
- Forks: 10
- Open Issues: 2
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
pytorch-transformer
===================

This repository provides a PyTorch implementation of the *Transformer* model introduced in the paper
*Attention Is All You Need* (Vaswani et al., 2017).

Installation
------------

The easiest way to install this package is via pip:
```bash
pip install git+https://github.com/phohenecker/pytorch-transformer
```

Usage
-----

```python
import transformer
model = transformer.Transformer(...)
```
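
The constructor arguments are left out above (the `...`); they depend on the signature defined in this repository's `Transformer` class. The sketch below only illustrates, under the assumption that the model consumes batches of token indices (a common convention for Transformer implementations, not confirmed by this README), how input and target sequences might be prepared with plain PyTorch. The vocabulary, padding index, and tensor shapes are purely illustrative assumptions.

```python
import torch

# Assumed toy vocabulary and padding index (illustrative only, not part of this repository).
PAD = 0
vocab = {"<pad>": PAD, "<sos>": 1, "<eos>": 2, "i": 3, "like": 4, "cats": 5, "dogs": 6}

def encode(tokens, max_len):
    """Map tokens to indices and right-pad to a fixed length (padding scheme is an assumption)."""
    ids = [vocab[t] for t in tokens]
    return ids + [PAD] * (max_len - len(ids))

# A batch of two sequences, shaped (batch_size, seq_len), as LongTensors of token indices.
input_seq = torch.tensor([
    encode(["<sos>", "i", "like", "cats", "<eos>"], max_len=6),
    encode(["<sos>", "i", "like", "dogs", "<eos>"], max_len=6),
], dtype=torch.long)
target_seq = torch.tensor([
    encode(["<sos>", "i", "like", "cats", "<eos>"], max_len=6),
    encode(["<sos>", "i", "like", "dogs", "<eos>"], max_len=6),
], dtype=torch.long)
```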

##### 1. Computing Predictions given a Target Sequence

This is the default behaviour of a
[`Transformer`](src/main/python/transformer/transformer.py),
and is implemented in its
[`forward`](src/main/python/transformer/transformer.py#L205)
method:
```python
predictions = model(input_seq, target_seq)
```
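
As a hedged illustration of this call, reusing the toy tensors sketched above: the exact shape and meaning of `predictions` are not documented in this README, so the post-processing below rests on the assumption that the model scores one vocabulary entry per target position.

```python
# `model`, `input_seq`, and `target_seq` are assumed to be the objects built in the
# snippets above (a Transformer instance and two LongTensor batches of token indices).
predictions = model(input_seq, target_seq)

# Assumption: the last dimension ranges over the vocabulary, so the most likely token
# at each target position can be read off with an argmax.
best_tokens = predictions.argmax(dim=-1)
print(best_tokens.shape)  # expected (batch_size, target_len) under the assumptions above
```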

##### 2. Evaluating the Probability of a Target Sequence

The probability of an output sequence given an input sequence under an already trained model can be evaluated by means
of the function
[`eval_probability`](src/main/python/transformer/transformer_tools.py#L46):
```python
probabilities = transformer.eval_probability(model, input_seq, target_seq, pad_index=...)
```
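
A hedged sketch of how this might be used, assuming (not confirmed by this README) that `pad_index` is the integer used to pad variable-length target sequences and that the function returns one probability per batch element:

```python
PAD = 0  # assumed padding index; must match the one used to build the tensors above

# input_seq / target_seq as in the earlier snippets: LongTensor batches of token
# indices, with shorter sequences padded to a common length using PAD.
probabilities = transformer.eval_probability(model, input_seq, target_seq, pad_index=PAD)

# Assumption: one probability per sequence in the batch.
for i, p in enumerate(probabilities.tolist()):
    print(f"P(target {i} | input {i}) = {p:.6f}")
```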

##### 3. Sampling an Output Sequence

Sampling a random output given an input sequence under the distribution computed by a model is realized by the function
[`sample_output`](src/main/python/transformer/transformer_tools.py#L115):

```python
output_seq = transformer.sample_output(model, input_seq, eos_index, pad_index, max_len)
```
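
The parameter names suggest that `eos_index` and `pad_index` are the integer indices of the end-of-sequence and padding tokens and that `max_len` caps the length of the generated sequence, but this is an assumption rather than documented behaviour. A hedged usage sketch:

```python
EOS = 2       # assumed end-of-sequence index
PAD = 0       # assumed padding index
MAX_LEN = 20  # assumed upper bound on the length of the sampled sequence

# Draw a few samples for the same input to see the variability of the model's output
# distribution (each call is assumed to sample independently).
for _ in range(3):
    output_seq = transformer.sample_output(model, input_seq, EOS, PAD, MAX_LEN)
    print(output_seq)
```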

Pretraining Encoders with BERT
------------------------------

For pretraining the encoder part of the transformer
(i.e., [`transformer.Encoder`](src/main/python/transformer/encoder.py))
with BERT (Devlin et al., 2018), the class [`MLMLoss`](src/main/python/transformer/bert/mlm_loss.py) provides an
implementation of the masked language-model loss function.
A full example of how to implement pretraining with BERT can be found in
[`examples/bert_pretraining.py`](examples/bert_pretraining.py).
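
The exact interface of `MLMLoss` is not described in this README, so the sketch below does not use it; it only illustrates, with plain PyTorch, the token-masking step that a masked language-model objective of this kind relies on (the mask fraction, special indices, and tensor shapes are all assumptions). For the actual API, see the example script referenced above.

```python
import torch

MASK = 7          # assumed index of a [MASK] token
PAD = 0           # assumed padding index
MASK_PROB = 0.15  # masking fraction used in the BERT paper

def mask_tokens(token_ids):
    """Randomly replace roughly 15% of the non-padding tokens with the [MASK] index.

    Returns the corrupted input and the labels: the original ids at masked positions
    and -100 elsewhere (the value commonly ignored by cross-entropy losses).
    """
    corrupted = token_ids.clone()
    labels = torch.full_like(token_ids, -100)
    candidates = (token_ids != PAD) & (torch.rand(token_ids.shape) < MASK_PROB)
    labels[candidates] = token_ids[candidates]
    corrupted[candidates] = MASK
    return corrupted, labels
```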

References
----------

> Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017).
> Attention Is All You Need.
> Preprint at http://arxiv.org/abs/1706.03762.

> Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018).
> BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
> Preprint at http://arxiv.org/abs/1810.04805.