https://github.com/ashly1991/transformer-nmt-tf2

Transformer neural machine translation in TensorFlow 2 with tensorflow-text; tutorial-based build with masks, positional encodings, and experiments.

attention encoder-decoder jupyter-notebook neural-machine-translation nmt tensorflow tensorflow-text transformer

Host: GitHub
URL: https://github.com/ashly1991/transformer-nmt-tf2
Owner: Ashly1991
License: mit
Created: 2025-09-23T14:07:12.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-09-23T14:14:01.000Z (10 months ago)
Last Synced: 2025-09-23T14:42:42.583Z (10 months ago)
Topics: attention, encoder-decoder, jupyter-notebook, neural-machine-translation, nmt, tensorflow, tensorflow-text, transformer
Language: Jupyter Notebook
Homepage:
Size: 0 Bytes
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Transformer Neural Machine Translation (TensorFlow 2)

Self-contained implementation of a **Transformer** for **neural machine translation** in TensorFlow 2. The project covers tokenization, positional encodings, masking (padding & look-ahead), multi-head self-attention, encoder–decoder stacks, training, and inference/decoding.

## Highlights
- **Tokenization & vocab** (with `tensorflow-text`) and **positional encodings**.
- **Masks**: padding mask for loss/attention; look-ahead mask for the decoder.
- **Transformer blocks**: scaled dot-product attention, **multi-head attention**, FFN, residuals + layer norm.
- **Training loop** with cross-entropy + accuracy; masked loss to ignore padding.
- **Inference** (greedy by default; extendable to beam search).
- **Reproducibility**: seeds set in the notebook; notes on deterministic decoding.
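
The positional encoding, mask, and attention items above can be sketched in a few lines. This is a framework-independent NumPy illustration, not the notebook's code: the function names, the `pad_id=0` convention, and the interleaved sin/cos layout are assumptions made for the example.

```python
import numpy as np

def positional_encoding(length, depth):
    """Sinusoidal encodings: sin on even channels, cos on odd channels."""
    positions = np.arange(length)[:, np.newaxis]             # (length, 1)
    channels = np.arange(depth)[np.newaxis, :]               # (1, depth)
    rates = 1.0 / np.power(10000.0, (2 * (channels // 2)) / depth)
    angles = positions * rates                               # (length, depth)
    angles[:, 0::2] = np.sin(angles[:, 0::2])
    angles[:, 1::2] = np.cos(angles[:, 1::2])
    return angles

def padding_mask(token_ids, pad_id=0):
    """1.0 where a key position is padding; broadcasts over (batch, query, key)."""
    return (token_ids == pad_id).astype(np.float64)[:, np.newaxis, :]

def look_ahead_mask(size):
    """1.0 above the diagonal: position i may not attend to positions > i."""
    return np.triu(np.ones((size, size)), k=1)

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(q k^T / sqrt(d_k)) v; masked positions get (numerically) zero weight."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(k.shape[-1])
    if mask is not None:
        scores = scores + mask * -1e9
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy decoder-style self-attention over one sequence whose last token is padding.
rng = np.random.default_rng(0)
ids = np.array([[5, 7, 9, 0]])
x = rng.normal(size=(1, 4, 8)) + positional_encoding(4, 8)
combined = np.maximum(padding_mask(ids), look_ahead_mask(4))  # (1, 4, 4)
output, weights = scaled_dot_product_attention(x, x, x, combined)
```

In `weights`, every entry above the diagonal and the whole padding column are zero, so no position attends to a future token or to padding.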

## What I learned (from this build)
- How a Transformer uses self-attention, multi-head attention, and positional encodings to model sequences.
- Why positional encodings are needed (self-attention on its own is permutation-equivariant, so it has no built-in notion of token order).
- How masks (padding & look-ahead) affect attention and the loss.
- Encoder–decoder structure; **teacher forcing** at training vs **auto-regressive** decoding at inference.
- Practical setup (tokenization, vocabularies, training loop, decoding).
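
To make the teacher-forcing and masked-loss points concrete, here is a minimal NumPy sketch. The helper names and the token ids (`1` = start, `2` = end, `0` = padding) are hypothetical and chosen only for the example; the notebook's own training loop is written in TensorFlow.

```python
import numpy as np

def teacher_forcing_pair(target_ids):
    """Decoder input drops the last token; labels drop the first (shift by one)."""
    return target_ids[:, :-1], target_ids[:, 1:]

def masked_loss_and_accuracy(logits, labels, pad_id=0):
    """Mean cross-entropy and accuracy over non-padding positions only."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    token_loss = -np.take_along_axis(log_probs, labels[..., np.newaxis], axis=-1)[..., 0]
    keep = (labels != pad_id).astype(np.float64)
    count = max(keep.sum(), 1.0)                  # guard against all-padding batches
    loss = (token_loss * keep).sum() / count
    accuracy = ((logits.argmax(axis=-1) == labels) * keep).sum() / count
    return loss, accuracy

# Hypothetical target sentence: [START] 5 6 [END] followed by two padding tokens.
target = np.array([[1, 5, 6, 2, 0, 0]])
decoder_input, labels = teacher_forcing_pair(target)

logits = np.zeros((1, 5, 8))
logits[0, [0, 1, 2], [5, 6, 2]] = 10.0   # confident and correct on the real tokens
logits[0, [3, 4], [3, 3]] = 10.0         # arbitrary predictions on padding positions
loss, accuracy = masked_loss_and_accuracy(logits, labels)
```

Because padding positions are excluded, the arbitrary predictions at the last two positions change neither the loss nor the accuracy.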

## How to run
```bash
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
jupyter lab transformer-nmt.ipynb
```

## Requirements
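`tensorflow` and `tensorflow-text` are pinned to matching releases (a `tensorflow-text` release is built against the corresponding TensorFlow release); the remaining packages are unpinned: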
```
tensorflow==2.14.0
tensorflow-text==2.14.0
tensorflow-datasets
numpy
matplotlib
jupyterlab
```
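
If an install misbehaves, a quick way to confirm that the two pinned packages line up is to compare their installed versions. The helper names below are hypothetical, and the check only compares the `major.minor` part of each version string.

```python
from importlib.metadata import PackageNotFoundError, version

def minor_release(version_string):
    """Return the major and minor components, e.g. '2.14.0' -> ('2', '14')."""
    return tuple(version_string.split(".")[:2])

def same_minor_release(a, b):
    """True when two version strings share the same major.minor release."""
    return minor_release(a) == minor_release(b)

def installed_versions_match(first="tensorflow", second="tensorflow-text"):
    """True/False if both packages are installed, None if either is missing."""
    try:
        return same_minor_release(version(first), version(second))
    except PackageNotFoundError:
        return None

print(installed_versions_match())
```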

## Notes
- Greedy decoding is deterministic given fixed weights and dropout disabled.
  For translations that repeat across runs, also set the random seeds before training and avoid sampling at inference.
- To use a different language pair, swap the tokenizers/vocabularies and resize the embedding and final projection layers to the new vocabulary sizes; the core architecture stays the same.
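
The greedy loop mentioned above is short enough to sketch in full. In this illustration `next_token_logits` is a hypothetical callable standing in for a forward pass of the trained model, and `toy_model` is a made-up stand-in that copies the source tokens; neither comes from the notebook.

```python
import numpy as np

def greedy_decode(next_token_logits, source_ids, start_id, end_id, max_len=20):
    """Append the arg-max token at each step until end_id or max_len is reached."""
    output = [start_id]
    for _ in range(max_len):
        logits = next_token_logits(source_ids, output)   # shape: (vocab_size,)
        next_id = int(np.argmax(logits))
        output.append(next_id)
        if next_id == end_id:
            break
    return output

def toy_model(source_ids, generated):
    """Hypothetical stand-in for the Transformer: copies the source, then emits END (id 2)."""
    vocab_size, end_id = 10, 2
    step = len(generated) - 1
    logits = np.zeros(vocab_size)
    logits[source_ids[step] if step < len(source_ids) else end_id] = 1.0
    return logits

print(greedy_decode(toy_model, [7, 4, 9], start_id=1, end_id=2))
# prints [1, 7, 4, 9, 2]
```

Nothing in the loop is random, so the same weights and inputs always yield the same output. Beam search would replace the single `argmax` with the top-k candidates kept per step.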

## License
MIT — see `LICENSE`.
