https://github.com/Lsdefine/attention-is-all-you-need-keras

A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need

# The Transformer model in "Attention Is All You Need": a Keras implementation
A Keras+TensorFlow implementation of the Transformer: "[Attention is All You Need](https://arxiv.org/abs/1706.03762)" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017)

# Usage
Please refer to *en2de_main.py* and *pinyin_main.py*
### en2de_main.py
- This task is the same as in [jadore801120/attention-is-all-you-need-pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch): WMT'16 Multimodal Translation: Multi30k (de-en) [(http://www.statmt.org/wmt16/multimodal-task.html)](http://www.statmt.org/wmt16/multimodal-task.html). We borrowed data preprocessing steps 0 and 1 from that repository and then constructed the input file *en2de.s2s.txt*.
#### Results
- The code achieves results close to those of that repository: about 70% validation accuracy.
With smaller model parameters, such as *layers=2* and *d_model=256*, the validation accuracy is better, since the task is quite small.
### For your own data
- Preprocess your source and target sequences into the format used in *en2de.s2s.txt* and *pinyin.corpus.examples.txt*.
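
The sample files remain the reference for the expected layout. As an illustration only, the sketch below assumes one pair per line, with source and target separated by a tab and tokens on each side separated by spaces. This layout is an assumption, so check it against *en2de.s2s.txt* before relying on it. The names `format_pair` and `write_pairs` are hypothetical helpers, not part of the repository.

```python
def format_pair(src_tokens, tgt_tokens):
    """Join one (source, target) pair into a single line.

    ASSUMED layout: source tokens, a TAB, then target tokens,
    with tokens on each side separated by single spaces.
    """
    return " ".join(src_tokens) + "\t" + " ".join(tgt_tokens)


def write_pairs(pairs, path):
    """Write an iterable of (src_tokens, tgt_tokens) pairs, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for src_tokens, tgt_tokens in pairs:
            f.write(format_pair(src_tokens, tgt_tokens) + "\n")
```

Calling `write_pairs(pairs, "my_data.s2s.txt")` with your own tokenized pairs (the file name is a placeholder) would produce a file in the assumed layout.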
### Some notes
- For a larger number of layers, the special learning rate scheduler reported in the paper is necessary.
- In *pinyin_main.py*, I tried another method to train the deep network: I train the first layer and the embedding layer first, then train a 2-layer model, then a 3-layer model, and so on. It works for this task.
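
The learning rate schedule from the paper, mentioned in the notes above, raises the rate linearly during a warmup period and then decays it with the inverse square root of the step number. A minimal pure-Python sketch of that formula follows. The function name `noam_lr` is hypothetical, and the defaults `d_model=512` and `warmup_steps=4000` are the paper's base settings, not necessarily the values used in this repository. Wiring it into the training loop is left out.

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate schedule from Vaswani et al. (2017), section 5.3.

    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    """
    step = max(step, 1)  # treat step 0 as step 1 to avoid dividing by zero
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Print the rate at a few steps to see the warmup and the decay.
for s in (1, 1000, 4000, 16000, 64000):
    print(s, noam_lr(s))
```

The rate peaks at `step == warmup_steps`, so a deeper model that diverges early may benefit from a longer warmup.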

### Upgrades
- Refactored some classes.
- The components are now easier to use in other models: just import *transformer.py*.
- Added a fast step-by-step decoder, including an upgraded beam search. Both still need to be modified to be reusable.
- Updated for TensorFlow 2.6.0.
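
The repository's own step-by-step decoder is not reproduced here. For readers unfamiliar with the idea, the following is a generic, framework-free sketch of beam search over a step function. The names `beam_search`, `step_log_probs`, and `toy_step` are hypothetical, as is the toy probability table. In practice `step_log_probs` would wrap one decoder step of the trained model, and a real decoder would usually add length normalization, which this sketch omits.

```python
import math


def beam_search(step_log_probs, start_id, end_id, beam_width=3, max_len=20):
    """Return (sequence, log_prob) of the best hypothesis found.

    step_log_probs(prefix) must return a dict {token_id: log_prob} for the
    next token, given the prefix as a tuple of token ids.
    """
    beams = [((start_id,), 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == end_id:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    # Fall back to unfinished hypotheses if none reached end_id.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])


def toy_step(prefix):
    """Toy next-token distribution; token 9 is the end marker."""
    table = {
        (0,): {1: math.log(0.6), 2: math.log(0.4)},
        (0, 1): {3: math.log(0.5), 4: math.log(0.5)},
        (0, 2): {3: math.log(0.9), 4: math.log(0.1)},
    }
    # Any prefix not in the table ends with probability 1.
    return table.get(prefix, {9: 0.0})


best_seq, best_score = beam_search(toy_step, start_id=0, end_id=9, beam_width=2)
print(best_seq, math.exp(best_score))
```

In the toy table, a greedy decoder (`beam_width=1`) commits to token 1 first and ends with probability 0.3, while a beam of width 2 keeps the alternative alive and finds the sequence with probability 0.36.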

# Acknowledgement
- Some model structures and some scripts are borrowed from [jadore801120/attention-is-all-you-need-pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch).