https://github.com/andreimoraru123/neural-machine-translation
Modern Eager TensorFlow implementation of Attention Is All You Need
- Host: GitHub
- URL: https://github.com/andreimoraru123/neural-machine-translation
- Owner: AndreiMoraru123
- Created: 2023-07-12T05:44:34.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-07T10:16:45.000Z (8 months ago)
- Last Synced: 2025-01-10T17:50:50.388Z (5 months ago)
- Topics: attention, beam-search, bleu-score, byte-pair-encoding, deep-learning, dot-product-attention, einops, embedding-projector, embeddings, encoder-decoder, keras, label-smoothing, language, language-model, nlp, self-attention, tensorflow, tokenization, transformers, translation
- Language: Python
- Size: 1.02 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Attention Is All You Need
  
[black](https://github.com/psf/black) · [mypy](https://mypy-lang.org/) · [CI](https://github.com/AndreiMoraru123/Neural-Machine-Translation/actions/workflows/python-app.yml)
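For a quick sense of what the repo implements, here is a minimal sketch of multi-head scaled dot-product attention using `einops` and eager TensorFlow. The function name and head count are illustrative, not the repo's exact code:

```python
import tensorflow as tf
from einops import rearrange

def attention(q, k, v, n_heads=8):
    # split the model dimension into heads: (batch, seq, h*d) -> (batch, h, seq, d)
    q, k, v = (rearrange(t, "b n (h d) -> b h n d", h=n_heads) for t in (q, k, v))
    d = tf.cast(tf.shape(q)[-1], q.dtype)
    # scaled dot-product scores, softmaxed over the key axis
    weights = tf.nn.softmax(tf.einsum("bhqd,bhkd->bhqk", q, k) / tf.sqrt(d), axis=-1)
    out = tf.einsum("bhqk,bhkd->bhqd", weights, v)
    return rearrange(out, "b h n d -> b n (h d)")  # merge heads back

x = tf.random.normal((2, 10, 512))   # (batch, sequence, d_model)
print(attention(x, x, x).shape)      # self-attention: (2, 10, 512)
```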
> [!NOTE]\
> I adapted the code from [this awesome PyTorch version](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Machine-Translation/tree/master). Please check it out as well.

> [!IMPORTANT]\
> I am using Python `3.9` with TensorFlow `2.10`, as this is the last TensorFlow version available for native Windows with GPU support.
## Steps

1. `pip install -r requirements.txt`
2. [download.py](https://github.com/AndreiMoraru123/Neural-Machine-Translation/blob/main/download.py) downloads all the data (`en`-`de` file pairs from Europarl, Common Crawl and News Commentary) to the folder specified as an argument.
3. [encode.py](https://github.com/AndreiMoraru123/Neural-Machine-Translation/blob/main/encode.py) filters the data based on the arguments (origin, maximum length, etc.) and trains the BPE model, saving it to a file (a BPE-training sketch follows this list).
4. [train.py](https://github.com/AndreiMoraru123/Neural-Machine-Translation/blob/main/train.py) runs the whole training pipeline, following the top-down logic found in the file. Everything is managed by the `Trainer` from [trainer.py](https://github.com/AndreiMoraru123/Neural-Machine-Translation/blob/main/trainer.py) (logging embeddings, checkpointing, etc.).
5. [translate.py](https://github.com/AndreiMoraru123/Neural-Machine-Translation/blob/main/translate.py) runs the model inference and optionally evaluates it with `sacrebleu`, using the `Evaluator` from [evaluator.py](https://github.com/AndreiMoraru123/Neural-Machine-Translation/blob/main/evaluator.py).
6. [docs](https://github.com/AndreiMoraru123/Neural-Machine-Translation/tree/main/docs) contains notes with SVG drawings from [the original repo](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Machine-Translation/tree/master) and markdown files explaining the choices I had to make when adapting from one framework to the other.

The code itself is heavily commented, and you can get a feel for how language models work by looking at the [tests](https://github.com/AndreiMoraru123/Neural-Machine-Translation/tree/main/test).
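As an illustration of step 3, here is a minimal BPE-training sketch using `sentencepiece`. The library choice, file names, and vocabulary size are assumptions for illustration, not necessarily what `encode.py` does:

```python
import sentencepiece as spm

# Train a shared BPE model on the filtered corpus and save it to disk.
spm.SentencePieceTrainer.train(
    input="data/train.en,data/train.de",  # hypothetical filtered file pair
    model_prefix="bpe",                   # writes bpe.model and bpe.vocab
    vocab_size=37000,                     # shared en-de vocabulary, as in the paper
    model_type="bpe",
)

sp = spm.SentencePieceProcessor(model_file="bpe.model")
print(sp.encode("I declare resumed the session", out_type=str))
```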
## Overfitting on one sentence
Input sequence:
```
"I declare resumed the session of the European Parliament "
"adjourned on Friday 17 December 1999, and I would like "
"once again to wish you a happy new year in the hope that "
"you enjoyed a pleasant festive period."
```

This results in the following generated hypotheses (all of which should be the top sequence, i.e. the exact label for this sentence):
Top generated sequence:
```
('Ich erkläre die am Freitag, dem 17. Dezember unterbrochene Sitzungsperiode '
'des Europäischen Parlaments für wiederaufgenommen, wünsche Ihnen nochmals '
'alles Gute zum Jahreswechsel und hoffe, daß Sie schöne Ferien hatten.')
```
All generated sequences in the beam (k=5) search:
```
[{'hypothesis': 'Ich die am Freitag, dem 17. Dezember unterbrochene '
'Sitzungsperiode des Europäischen Parlaments für '
'wiederaufgenommen, wünsche Ihnen nochmals alles Gute zum '
'Jahreswechsel und hoffe, daß Sie schöne Ferien hatten.',
'score': -3.3601136207580566},
{'hypothesis': 'Ich erkläre die am Freitag, dem 17. Dezember unterbrochene '
'Sitzungsperiode des Europäischen Parlaments für '
'wiederaufgenommen, wünsche Ihnen nochmals alles Gute zum '
'Jahreswechsel und hoffe, daß Sie schöne Ferien hatten.',
'score': -1.4448045492172241},
{'hypothesis': 'Ich Ich erkläre die am Freitag, dem 17. Dezember '
'unterbrochene Sitzungsperiode des Europäischen Parlaments für '
'wiederaufgenommen, wünsche Ihnen nochmals alles Gute zum '
'Jahreswechsel und hoffe, daß Sie schöne Ferien hatten.',
'score': -3.1513545513153076},
{'hypothesis': 'Ich erkläre die die am Freitag, dem 17. Dezember '
'unterbrochene Sitzungsperiode des Europäischen Parlaments für '
'wiederaufgenommen, wünsche Ihnen nochmals alles Gute zum '
'Jahreswechsel und hoffe, daß Sie schöne Ferien hatten.',
'score': -3.3080737590789795},
{'hypothesis': 'Ich erkläre erkläre die am Freitag, dem 17. Dezember '
'unterbrochene Sitzungsperiode des Europäischen Parlaments für '
'wiederaufgenommen, wünsche Ihnen nochmals alles Gute zum '
'Jahreswechsel und hoffe, daß Sie schöne Ferien hatten.',
'score': -3.3361663818359375}]
```

These scores are negative because they are log probabilities; the hypothesis closest to zero is the top sequence.
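For context, this is roughly how a beam of size `k` keeps its best hypotheses at each decoding step by accumulating token log probabilities. A minimal sketch, not necessarily how the repo implements it:

```python
import tensorflow as tf

def beam_step(log_probs, scores, k=5):
    """One beam-search expansion. log_probs: (k, vocab) log probabilities for
    the next token of each hypothesis; scores: (k,) running sums so far."""
    vocab = tf.shape(log_probs)[-1]
    total = scores[:, None] + log_probs  # cumulative log probability per extension
    top_scores, flat = tf.math.top_k(tf.reshape(total, [-1]), k=k)
    return top_scores, flat // vocab, flat % vocab  # scores, parent beam, new token
```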
As a sanity check, the BLEU score should be a perfect `100/100` in all cases:
```
INFO:root:13a tokenization, cased
INFO:root:BLEU = 100.00 100.0/100.0/100.0/100.0 (BP = 1.000 ratio = 1.000 hyp_len = 34 ref_len = 34)
INFO:root:13a tokenization, caseless
INFO:root:BLEU = 100.00 100.0/100.0/100.0/100.0 (BP = 1.000 ratio = 1.000 hyp_len = 34 ref_len = 34)
INFO:root:International tokenization, cased
INFO:root:BLEU = 100.00 100.0/100.0/100.0/100.0 (BP = 1.000 ratio = 1.000 hyp_len = 34 ref_len = 34)
INFO:root:International tokenization, caseless
INFO:root:BLEU = 100.00 100.0/100.0/100.0/100.0 (BP = 1.000 ratio = 1.000 hyp_len = 34 ref_len = 34)
```
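The four lines above correspond to `sacrebleu`'s `13a` and international tokenizers, cased and caseless. A minimal sketch of the same check (the sentence pair here is a placeholder):

```python
import sacrebleu

hypothesis = ["wünsche Ihnen nochmals alles Gute zum Jahreswechsel"]
reference = ["wünsche Ihnen nochmals alles Gute zum Jahreswechsel"]

for tokenize in ("13a", "intl"):
    for lowercase in (False, True):
        bleu = sacrebleu.corpus_bleu(
            hypothesis, [reference], tokenize=tokenize, lowercase=lowercase
        )
        print(f"{tokenize}, {'caseless' if lowercase else 'cased'}: {bleu.score:.2f}")
```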
## Embeddings

After training for a while, some interesting patterns arise. This project loads the learned embeddings into the [Embedding Projector](https://www.tensorflow.org/tensorboard/tensorboard_projector_plugin). In the vocabulary shared between the encoder (English) and the decoder (German), we can see some cosine similarities:
#### *British* with *britischen* and *nationaler*
#### *will* and *wollte* (& *konnte*, *mochte*, *wurde*)
#### *Bedenken* (pondering) is closest to *glaube* (believe)
#### *Entschließung* (resolution) gets associated with *completed*

#### *gesammelt* (collected) maps to *decision* and *Bestimmung* (determination), as well as *verstärkt* (strengthened)

#### *Change* also gets associated with *neuer* (new)
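The similarities shown by the projector are plain cosine similarities between rows of the shared embedding matrix, roughly as below. The matrix and token ids here are random stand-ins, not trained values:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity: dot product of the vectors over the product of their norms
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = np.random.randn(32000, 512)            # stand-in for the trained embeddings
ids = {"British": 1041, "britischen": 2517}  # hypothetical BPE token ids
print(cosine(emb[ids["British"]], emb[ids["britischen"]]))
```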
