https://github.com/ashly1991/transformer-nmt-tf2
Transformer neural machine translation in TensorFlow 2 with tensorflow-text; tutorial-based build with masks, positional encodings, and experiments.
- Host: GitHub
- URL: https://github.com/ashly1991/transformer-nmt-tf2
- Owner: Ashly1991
- License: mit
- Created: 2025-09-23T14:07:12.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-09-23T14:14:01.000Z (8 months ago)
- Last Synced: 2025-09-23T14:42:42.583Z (8 months ago)
- Topics: attention, encoder-decoder, jupyter-notebook, neural-machine-translation, nmt, tensorflow, tensorflow-text, transformer
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Transformer Neural Machine Translation (TensorFlow 2)
Self-contained implementation of a **Transformer** for **neural machine translation** in TensorFlow 2. The project covers tokenization, positional encodings, masking (padding & look-ahead), multi-head self-attention, encoder–decoder stacks, training, and inference/decoding.
## Highlights
- **Tokenization & vocab** (with `tensorflow-text`) and **positional encodings**.
- **Masks**: padding mask for loss/attention; look-ahead mask for the decoder.
- **Transformer blocks**: scaled dot-product attention, **multi-head attention**, FFN, residuals + layer norm.
- **Training loop** with cross-entropy loss and accuracy metrics; the loss is masked so padding tokens are ignored.
- **Inference** (greedy by default; extendable to beam search).
- **Reproducibility**: seeds set in the notebook; notes on deterministic decoding.
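The positional encodings listed above can be illustrated with a short NumPy sketch. It is not the notebook's TensorFlow code, and the function name `positional_encoding` is illustrative. It follows the interleaved sine/cosine formula from the original Transformer paper. Some implementations, for example the TensorFlow Transformer tutorial, concatenate the sine and cosine halves instead, which is equivalent up to a fixed reordering of the columns.

```python
import numpy as np


def positional_encoding(length: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings of shape (length, d_model).

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(length)[:, np.newaxis]   # (length, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
    # Columns 2i and 2i+1 share the same frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                # (length, d_model)
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])           # even columns: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])           # odd columns: cosine
    return pe.astype(np.float32)


pe = positional_encoding(length=50, d_model=128)
print(pe.shape)  # (50, 128)
# In the original paper these are added to the token embeddings,
# which are first scaled by sqrt(d_model):
#   x = embeddings * np.sqrt(d_model) + pe[:seq_len]
```

Each position gets a unique pattern, and because the encodings are fixed functions of position they can be computed for any sequence length without extra parameters.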
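A minimal NumPy sketch of the padding and look-ahead masks. It assumes token id 0 is padding (`PAD_ID` is an assumption, not taken from the notebook) and uses the convention True = may attend. Conventions differ between codebases: Keras `MultiHeadAttention` treats 1/True in `attention_mask` as "attend", while some implementations use 1 to mark masked positions instead.

```python
import numpy as np

PAD_ID = 0  # assumption: token id 0 is padding


def padding_mask(token_ids: np.ndarray) -> np.ndarray:
    """(batch, 1, seq_len) boolean mask; True where a key is a real token."""
    return (token_ids != PAD_ID)[:, np.newaxis, :]


def look_ahead_mask(seq_len: int) -> np.ndarray:
    """(seq_len, seq_len) boolean mask; query position i may see keys 0..i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


def decoder_self_attention_mask(target_ids: np.ndarray) -> np.ndarray:
    """Combine padding and look-ahead masks into (batch, seq_len, seq_len)."""
    seq_len = target_ids.shape[1]
    return padding_mask(target_ids) & look_ahead_mask(seq_len)[np.newaxis, :, :]


tgt = np.array([[5, 7, 9, 0, 0]])  # one sequence ending in two padding tokens
print(decoder_self_attention_mask(tgt)[0].astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 0 0]
#  [1 1 1 0 0]]
```

The encoder and the decoder's cross-attention need only the padding mask of the source; the decoder's self-attention needs both. The same `token_ids != PAD_ID` test is what masks the loss.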
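The scaled dot-product attention at the core of each Transformer block computes `softmax(Q K^T / sqrt(d_k)) V`. A NumPy sketch, assuming a boolean mask where True means a key may be attended to. Masked logits are replaced by a large negative value so their softmax weight becomes zero.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v)
    mask: boolean, broadcastable to (..., len_q, len_k); True = may attend.
    Returns (output, attention_weights).
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)  # (..., len_q, len_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # ~0 weight after softmax
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4, 8))
k = rng.normal(size=(1, 4, 8))
v = rng.normal(size=(1, 4, 8))
causal = np.tril(np.ones((4, 4), dtype=bool))
out, w = scaled_dot_product_attention(q, k, v, mask=causal)
print(out.shape, w.shape)  # (1, 4, 8) (1, 4, 4)
```

Dividing by `sqrt(d_k)` keeps the dot products from growing with the key dimension, which would otherwise push the softmax into a near one-hot regime with tiny gradients.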
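A sketch of the masked loss from the training-loop bullet. Per-token cross-entropy is computed from logits, padding positions are zeroed out, and the sum is divided by the number of real tokens rather than the full sequence length. `PAD_ID = 0` is assumed. The accuracy helper applies the same mask; the README does not say whether the notebook masks its accuracy metric, so treat that part as an assumption.

```python
import numpy as np

PAD_ID = 0  # assumption: padding id used in the label sequences


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def masked_loss(labels: np.ndarray, logits: np.ndarray) -> float:
    """Mean cross-entropy over non-padding positions.

    labels: (batch, seq_len) int ids; logits: (batch, seq_len, vocab_size).
    """
    logp = log_softmax(logits)
    # Negative log-probability of the correct token at every position.
    nll = -np.take_along_axis(logp, labels[..., np.newaxis], axis=-1)[..., 0]
    mask = labels != PAD_ID
    return float((nll * mask).sum() / mask.sum())


def masked_accuracy(labels: np.ndarray, logits: np.ndarray) -> float:
    """Fraction of non-padding positions where the argmax equals the label."""
    preds = logits.argmax(axis=-1)
    mask = labels != PAD_ID
    return float(((preds == labels) & mask).sum() / mask.sum())


labels = np.array([[3, 1, 0]])  # last position is padding
logits = np.zeros((1, 3, 4))
logits[0, 0, 3] = 5.0           # correct prediction
logits[0, 1, 2] = 5.0           # wrong prediction (label is 1)
logits[0, 2, 1] = 5.0           # padding position: ignored
print(masked_accuracy(labels, logits))  # 0.5
```

Without the mask, short sentences padded to a long batch length would be dominated by trivially predictable padding tokens, inflating accuracy and diluting the loss.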
## What I learned (from this build)
- How a Transformer uses self-attention, multi-head attention, and positional encodings to model sequences.
- Why positional encodings are needed: self-attention is permutation-equivariant, so without position information it cannot distinguish token order.
- How masks (padding & look-ahead) affect attention and the loss.
- Encoder–decoder structure; **teacher forcing** at training vs **auto-regressive** decoding at inference.
- Practical setup (tokenization, vocabularies, training loop, decoding).
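Multi-head attention runs several attention operations in parallel on lower-dimensional projections. The NumPy sketch below shows the reshape bookkeeping: project to `d_model`, split into `num_heads` heads of size `d_model / num_heads`, attend per head, then concatenate and apply an output projection. `multi_head_attention` is a hypothetical helper, not the notebook's layer, and the random matrices stand in for learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def split_heads(x, num_heads):
    """(batch, seq, d_model) -> (batch, heads, seq, depth)."""
    batch, seq, d_model = x.shape
    depth = d_model // num_heads
    return x.reshape(batch, seq, num_heads, depth).transpose(0, 2, 1, 3)


def merge_heads(x):
    """(batch, heads, seq, depth) -> (batch, seq, heads * depth)."""
    batch, heads, seq, depth = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, seq, heads * depth)


def multi_head_attention(x_q, x_kv, params, num_heads, mask=None):
    """Project, split into heads, attend per head, merge, project back."""
    wq, wk, wv, wo = params
    d_model = wq.shape[1]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    q = split_heads(x_q @ wq, num_heads)   # (batch, heads, len_q, depth)
    k = split_heads(x_kv @ wk, num_heads)  # (batch, heads, len_k, depth)
    v = split_heads(x_kv @ wv, num_heads)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])
    if mask is not None:
        # mask must broadcast to (batch, heads, len_q, len_k); True = may attend
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    out = merge_heads(weights @ v) @ wo    # (batch, len_q, d_model)
    return out, weights


rng = np.random.default_rng(0)
d_model, num_heads = 16, 4
params = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)]
x = rng.normal(size=(2, 5, d_model))
out, w = multi_head_attention(x, x, params, num_heads)  # self-attention
print(out.shape, w.shape)  # (2, 5, 16) (2, 4, 5, 5)
```

Passing the decoder state as `x_q` and the encoder output as `x_kv` turns the same function into encoder-decoder (cross) attention.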
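To see why positional encodings are needed, the sketch below applies single-head self-attention (no positional encodings, random stand-in weights) to a toy sequence and to the same sequence reversed. The output for the reversed input is just the original output reversed, so the layer on its own carries no information about token order.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head self-attention with no positional information."""
    q, k, v = x @ wq, x @ wk, x @ wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v


rng = np.random.default_rng(0)
seq_len, d = 6, 8
x = rng.normal(size=(seq_len, d))       # toy "token embeddings"
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = np.arange(seq_len)[::-1]         # reverse the token order

out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)

# Reordering the input only reorders the output in the same way:
print(np.allclose(out_perm, out[perm]))  # True
```

Adding positional encodings to `x` before attention breaks this symmetry, because the same token then has a different input vector at each position.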
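Teacher forcing at training time usually comes down to shifting the target sequence by one token: the decoder reads the ground-truth prefix and is trained to predict the next token at every position in parallel, while the look-ahead mask keeps it from seeing the answer. A minimal sketch; the special-token ids are hypothetical and the notebook's vocabulary may differ.

```python
import numpy as np

# Hypothetical special-token ids (not taken from the notebook's vocabulary).
START_ID, END_ID, PAD_ID = 1, 2, 0


def teacher_forcing_pair(target_ids: np.ndarray):
    """Split [START, y1, ..., yn, END, PAD...] into decoder inputs and labels.

    The decoder input at step t is ground-truth token t; the label is token t+1.
    """
    dec_input = target_ids[:, :-1]  # drop the last token
    labels = target_ids[:, 1:]      # drop START, i.e. shift left by one
    return dec_input, labels


target = np.array([[START_ID, 11, 12, 13, END_ID, PAD_ID]])
dec_input, labels = teacher_forcing_pair(target)
print(dec_input.tolist())  # [[1, 11, 12, 13, 2]]
print(labels.tolist())     # [[11, 12, 13, 2, 0]]
```

At inference there is no ground truth, so the decoder instead consumes its own previous predictions one step at a time (auto-regressive decoding).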
## How to run
```bash
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
jupyter lab transformer-nmt.ipynb
```
## Requirements
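Pinned for stability and performance with this implementation. `tensorflow-text` must match the installed TensorFlow version, so the two are pinned together; the remaining packages are unpinned: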
```
tensorflow==2.14.0
tensorflow-text==2.14.0
tensorflow-datasets
numpy
matplotlib
jupyterlab
```
## Notes
- Greedy decoding is deterministic once the weights are fixed and dropout is disabled (inference mode). Seeds matter for training and for any sampling-based decoding, so for repeatable translations set seeds and decode greedily rather than sampling.
- To use a different language pair, swap the tokenizers and vocabularies and resize the embedding layers and final projection to the new vocabulary sizes; the core architecture stays the same.
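The determinism note can be made concrete with a sketch of greedy auto-regressive decoding. `toy_model` is a hypothetical stand-in with fixed random weights whose logits depend only on the last token; a real decoder would also condition on the encoded source sentence and run with dropout disabled. Because each step takes the argmax instead of sampling, two runs with the same weights produce the same token sequence.

```python
import numpy as np

START_ID, END_ID = 1, 2  # hypothetical special-token ids


def greedy_decode(next_token_logits, max_len=20):
    """Greedy auto-regressive decoding.

    next_token_logits(prefix) -> (vocab_size,) logits for the next token.
    """
    output = [START_ID]
    for _ in range(max_len):
        logits = next_token_logits(output)
        next_id = int(np.argmax(logits))  # no sampling: always the top token
        output.append(next_id)
        if next_id == END_ID:             # stop once the end token is produced
            break
    return output


# Toy stand-in "model": fixed weights, logits depend only on the last token.
rng = np.random.default_rng(42)
vocab_size = 10
table = rng.normal(size=(vocab_size, vocab_size))


def toy_model(prefix):
    return table[prefix[-1]]


run1 = greedy_decode(toy_model)
run2 = greedy_decode(toy_model)
print(run1 == run2)  # True: same weights and no sampling give the same output
```

Beam search keeps the same loop structure but tracks the top-k prefixes by cumulative log-probability instead of a single argmax path.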
## License
MIT — see `LICENSE`.