Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Lsdefine/attention-is-all-you-need-keras
A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need
attention-is-all-you-need attention-seq2seq deep-learning keras keras-tensorflow
- Host: GitHub
- URL: https://github.com/Lsdefine/attention-is-all-you-need-keras
- Owner: lsdefine
- Created: 2018-03-16T06:07:11.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2021-09-24T06:48:44.000Z (over 3 years ago)
- Last Synced: 2024-08-03T02:09:23.778Z (5 months ago)
- Topics: attention-is-all-you-need, attention-seq2seq, deep-learning, keras, keras-tensorflow
- Language: Python
- Size: 1.33 MB
- Stars: 702
- Watchers: 26
- Forks: 190
- Open Issues: 24
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# The Transformer model in "Attention Is All You Need": a Keras implementation
A Keras+TensorFlow Implementation of the Transformer: "[Attention is All You Need](https://arxiv.org/abs/1706.03762)" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017).

# Usage
Please refer to *en2de_main.py* and *pinyin_main.py*.
### en2de_main.py
- This task is the same as in [jadore801120/attention-is-all-you-need-pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch): WMT'16 Multimodal Translation on Multi30k (de-en) [(http://www.statmt.org/wmt16/multimodal-task.html)](http://www.statmt.org/wmt16/multimodal-task.html). We borrow data preprocessing steps 0 and 1 from that repository and then construct the input file *en2de.s2s.txt*.
#### Results
- The code achieves results close to those of the reference repository: about 70% validation accuracy. With smaller model parameters, such as *layers=2* and *d_model=256*, validation accuracy is even higher, since the task is quite small.
### For your own data
- Just preprocess your source and target sequences into the same format as *en2de.s2s.txt* and *pinyin.corpus.examples.txt*.
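As an illustration only, here is a minimal Python sketch of building such a file from two line-aligned, pre-tokenized text files. The tab separator and the file names are assumptions made for this sketch; check the bundled example files for the authoritative format.

```python
# Minimal sketch (not part of the repository): merge two parallel,
# line-aligned token files into one sequence-to-sequence file.
# The tab separator between source and target is an assumption here.
def write_s2s_file(src_path, tgt_path, out_path):
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt, \
         open(out_path, "w", encoding="utf-8") as out:
        for src_line, tgt_line in zip(src, tgt):
            out.write(src_line.strip() + "\t" + tgt_line.strip() + "\n")

# Hypothetical file names; substitute your own preprocessed corpora.
write_s2s_file("train.en", "train.de", "en2de.s2s.txt")
```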
### Some notes
- For a larger number of layers, the special learning rate scheduler reported in the paper is necessary; a sketch of that schedule follows this list.
- In *pinyin_main.py*, I tried another method to train the deep network: train the first layer and the embedding layer first, then train a 2-layer model, then a 3-layer model, and so on. It works for this task; a toy sketch of the idea also follows below.
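The learning rate schedule mentioned in the first note is the warm-up schedule from the paper, lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5). The sketch below expresses that formula as a generic `tf.keras.optimizers.schedules.LearningRateSchedule`; it is not the scheduler class shipped in this repository.

```python
import tensorflow as tf

class TransformerSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Warm-up schedule from 'Attention Is All You Need' (Vaswani et al., 2017)."""

    def __init__(self, d_model=512, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = tf.cast(warmup_steps, tf.float32)

    def __call__(self, step):
        step = tf.cast(step, tf.float32) + 1.0  # the paper counts steps from 1
        return tf.math.rsqrt(self.d_model) * tf.minimum(
            tf.math.rsqrt(step), step * self.warmup_steps ** -1.5)

# Adam settings reported in the paper.
optimizer = tf.keras.optimizers.Adam(
    TransformerSchedule(d_model=512), beta_1=0.9, beta_2=0.98, epsilon=1e-9)
```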
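The layer-by-layer warm start described in the second note could look roughly like the toy sketch below. It is not the code in *pinyin_main.py*: the model here is a generic stacked network standing in for the Transformer, and the fixed layer names used for weight copying are an assumption of this sketch.

```python
import numpy as np
from tensorflow import keras

def build_stack(depth):
    """Toy stand-in for the Transformer: an embedding plus `depth` blocks.
    Layer names are fixed so weights can be copied between depths."""
    inp = keras.Input(shape=(16,), dtype="int32")
    x = keras.layers.Embedding(1000, 64, name="embed")(inp)
    for i in range(depth):
        x = keras.layers.Dense(64, activation="relu", name=f"block_{i}")(x)
    pooled = keras.layers.GlobalAveragePooling1D()(x)
    out = keras.layers.Dense(1000, activation="softmax", name="proj")(pooled)
    model = keras.Model(inp, out)
    model.compile("adam", "sparse_categorical_crossentropy")
    return model

# Random toy data standing in for a real corpus.
x = np.random.randint(0, 1000, size=(256, 16))
y = np.random.randint(0, 1000, size=(256,))

prev = None
for depth in (1, 2, 3):                   # grow the stack one block at a time
    model = build_stack(depth)
    if prev is not None:                  # warm-start shared layers by name
        for layer in prev.layers:
            if layer.weights:
                try:
                    model.get_layer(layer.name).set_weights(layer.get_weights())
                except ValueError:
                    pass                  # name not present in the deeper model
    model.fit(x, y, epochs=1, verbose=0)
    prev = model
```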
### Upgrades
- Refactored some classes.
- The components are now easier to use in other models; just import *transformer.py*.
- A fast step-by-step decoder has been added, including an upgraded beam search, but both should be modified to be more reusable; a generic beam-search sketch follows this list.
- Updated for TensorFlow 2.6.0.
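For readers unfamiliar with the decoding step mentioned above, the following is a generic, self-contained beam-search sketch over a toy next-token model. It is not the repository's step-by-step decoder and shares no code with it.

```python
import numpy as np

def beam_search(step_fn, bos_id, eos_id, beam_size=4, max_len=20):
    """Plain beam search; `step_fn(prefix)` returns log-probs for the next token."""
    beams = [([bos_id], 0.0)]                      # (token sequence, total log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            log_probs = step_fn(seq)
            for tok in np.argsort(log_probs)[-beam_size:]:
                candidates.append((seq + [int(tok)], score + float(log_probs[tok])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos_id else beams).append((seq, score))
        if not beams:                              # every surviving beam has ended
            break
    return max(finished or beams, key=lambda c: c[1])

# Toy next-token model: a fixed random transition table over 10 tokens.
rng = np.random.default_rng(0)
table = np.log(rng.dirichlet(np.ones(10), size=10))
print(beam_search(lambda seq: table[seq[-1]], bos_id=0, eos_id=9))
```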
# Acknowledgement
- Some model structures and some scripts are borrowed from [jadore801120/attention-is-all-you-need-pytorch](https://github.com/jadore801120/attention-is-all-you-need-pytorch).