{"id":13689837,"url":"https://github.com/joeynmt/joeynmt","last_synced_at":"2025-10-21T05:15:45.155Z","repository":{"id":34248887,"uuid":"153133227","full_name":"joeynmt/joeynmt","owner":"joeynmt","description":"Minimalist NMT for educational purposes","archived":false,"fork":false,"pushed_at":"2024-01-29T01:27:42.000Z","size":39590,"stargazers_count":683,"open_issues_count":12,"forks_count":213,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-01-14T03:44:36.734Z","etag":null,"topics":["education","joey-nmt","machine-translation","neural-machine-translation","nmt","nmt-frameworks","nmt-tutorial","pytorch-transformers","rnn-pytorch","seq2seq","seq2seq-pytorch","transformer","transformer-architecture"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/joeynmt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-15T15:00:57.000Z","updated_at":"2025-01-12T21:26:23.000Z","dependencies_parsed_at":"2024-06-19T02:48:47.337Z","dependency_job_id":"918f1eec-f701-4f21-a23d-105013b54f27","html_url":"https://github.com/joeynmt/joeynmt","commit_stats":{"total_commits":680,"total_committers":38,"mean_commits":"17.894736842105264","dds":0.5838235294117646,"last_synced_commit":"0f57e93604b58cefadf48583f47a322819c59850"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joeynmt%2Fjoeynmt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/joeynmt%2Fjoeynmt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joeynmt%2Fjoeynmt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joeynmt%2Fjoeynmt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/joeynmt","download_url":"https://codeload.github.com/joeynmt/joeynmt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251998471,"owners_count":21677987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["education","joey-nmt","machine-translation","neural-machine-translation","nmt","nmt-frameworks","nmt-tutorial","pytorch-transformers","rnn-pytorch","seq2seq","seq2seq-pytorch","transformer","transformer-architecture"],"created_at":"2024-08-02T16:00:28.280Z","updated_at":"2025-10-21T05:15:40.101Z","avatar_url":"https://github.com/joeynmt.png","language":"Python","funding_links":[],"categories":["Frameworks 🖼","Python","Neuronale Übersetzungstools"],"sub_categories":["Open-Source-Übersetzungsmodelle"],"readme":"# \u0026nbsp; ![Joey-NMT](joey2-small.png) Joey NMT\n[![build](https://github.com/joeynmt/joeynmt/actions/workflows/main.yml/badge.svg)](https://github.com/joeynmt/joeynmt/actions/workflows/main.yml)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![arXiv](https://img.shields.io/badge/arXiv-1907.12484-b31b1b.svg)](https://arxiv.org/abs/1907.12484)\n\n\n## Goal and Purpose\n:koala: Joey NMT framework is developed for educational 
purposes.\nIt aims to be a **clean** and **minimalistic** code base to help novices \nfind fast answers to the following questions.\n- :grey_question: How to implement classic NMT architectures (RNN and Transformer) in PyTorch?\n- :grey_question: What are the building blocks of these architectures and how do they interact?\n- :grey_question: How to modify these blocks (e.g. deeper, wider, ...)?\n- :grey_question: How to modify the training procedure (e.g. add a regularizer)?\n\nIn contrast to other NMT frameworks, we will **not** aim for the most recent features\nor speed through engineering or training tricks since this often goes in hand with an\nincrease in code complexity and a decrease in readability. :eyes:\n\nHowever, Joey NMT re-implements baselines from major publications.\n\nCheck out the detailed [documentation](https://joeynmt.readthedocs.io) :books: and our\n[paper](https://arxiv.org/abs/1907.12484). :newspaper:\n\n\n## Contributors\nJoey NMT was initially developed and is maintained by [Jasmijn Bastings](https://bastings.github.io/) (University of Amsterdam)\nand [Julia Kreutzer](https://juliakreutzer.github.io/) (Heidelberg University), now both at Google Research.\n[Mayumi Ohta](https://www.isi.fraunhofer.de/en/competence-center/innovations-wissensoekonomie/mitarbeiter/ohta.html)\nat Fraunhofer Institute is continuing the legacy.\n\nWelcome to our new contributors :hearts:, please don't hesitate to open a PR or an issue\nif there's something that needs improvement!\n\n\n## Features\nJoey NMT implements the following features (aka the minimalist toolkit of NMT :wrench:):\n- Recurrent Encoder-Decoder with GRUs or LSTMs\n- Transformer Encoder-Decoder\n- Attention Types: MLP, Dot, Multi-Head, Bilinear\n- Word-, BPE- and character-based tokenization\n- BLEU, ChrF evaluation\n- Beam search with length penalty and greedy decoding\n- Customizable initialization\n- Attention visualization\n- Learning curve plotting\n- Scoring hypotheses and references\n- 
Multilingual translation with language tags\n\n\n## Installation\nJoey NMT is built on [PyTorch](https://pytorch.org/). Please make sure you have a compatible environment.\nWe tested Joey NMT v2.3 with\n- python 3.11\n- torch 2.1.2\n- cuda 12.1\n\n\u003e :warning: **Warning**\n\u003e When running on **GPU** you need to manually install the suitable PyTorch version \n\u003e for your [CUDA](https://developer.nvidia.com/cuda-zone) version.\n\u003e For example, you can install PyTorch 2.1.2 with CUDA v12.1 as follows:\n\u003e ```\n\u003e python -m pip install --upgrade torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121\n\u003e ```\n\u003e See [PyTorch installation instructions](https://pytorch.org/get-started/locally/).\n\nYou can install Joey NMT either A. via [pip](https://pypi.org/project/joeynmt/) or B. from source.\n\n### A. Via pip (the latest stable version)\n```bash\npython -m pip install joeynmt\n```\n\n### B. From source (for local development)\n```bash\ngit clone https://github.com/joeynmt/joeynmt.git  # Clone this repository\ncd joeynmt\npython -m pip install -e .  # Install Joey NMT and its requirements\npython -m unittest  # Run the unit tests\n```\n\n\u003e :memo: **Info**\n\u003e For Windows users, we recommend checking whether txt files (i.e. 
`test/data/toy/*`) have utf-8 encoding.\n\n\n## Changelog\n\n### v2.3\n- introduced [DistributedDataParallel](https://pytorch.org/tutorials/beginner/dist_overview.html).\n- implemented language tags, see [notebooks/torchhub.ipynb](notebooks/torchhub.ipynb)\n- released an [iwslt14 de-en-fr multilingual model](https://huggingface.co/may-ohta/iwslt14_prompt) (trained using DDP)\n- special symbols definition refactoring\n- configuration refactoring\n- autocast refactoring\n- bugfixes\n- upgrade to python 3.11, torch 2.1.2\n- documentation refactoring\n\n\u003cdetails\u003e\u003csummary\u003eprevious releases\u003c/summary\u003e\n\n### v2.2.1\n- compatibility with torch 2.0 tested\n- configurable activation function [#211](https://github.com/joeynmt/joeynmt/pull/211)\n- bug fix [#207](https://github.com/joeynmt/joeynmt/pull/207)\n\n### v2.2\n- compatibility with torch 1.13 tested\n- torchhub introduced\n- bugfixes, minor refactoring\n\n### v2.1\n- upgrade to python 3.10, torch 1.12\n- switch Automatic Mixed Precision from NVIDIA's amp to PyTorch's amp package\n- replace [discord.py](https://github.com/Rapptz/discord.py) with [pycord](https://github.com/Pycord-Development/pycord) in the Discord Bot demo\n- data iterator refactoring\n- add wmt14 ende / deen benchmark trained on v2 from scratch\n- add tokenizer tutorial\n- minor bugfixes\n\n### v2.0 *Breaking change!*\n- upgrade to python 3.9, torch 1.11\n- `torchtext.legacy` dependencies are completely replaced by `torch.utils.data`\n- `joeynmt/tokenizers.py`: handles tokenization internally (also supports bpe-dropout!)\n- `joeynmt/datasets.py`: loads data from plaintext, tsv, and huggingface's [datasets](https://github.com/huggingface/datasets)\n- `scripts/build_vocab.py`: trains subwords, creates joint vocab\n- enhancement in decoding\n  - scoring with hypotheses or references\n  - repetition penalty, ngram blocker\n  - attention plots for transformers\n- yapf, isort, flake8 introduced\n- bugfixes, minor 
refactoring\n\n\u003e :warning: **Warning**\n\u003e The models trained with Joey NMT v1.x can be decoded with Joey NMT v2.0.\n\u003e But there is no guarantee that you can reproduce the same score as before.\n\n### v1.4\n- upgrade to sacrebleu 2.0, python 3.7, torch 1.8\n- bugfixes\n\n### v1.3\n- upgrade to torchtext 0.9 (torchtext -\u003e torchtext.legacy)\n- n-best decoding\n- demo colab notebook\n\n### v1.0\n- Multi-GPU support\n- fp16 (half precision) support\n\n\u003c/details\u003e\n\n\n## Documentation \u0026 Tutorials\nWe also updated the [documentation](https://joeynmt.readthedocs.io) thoroughly for Joey NMT 2.0!\n\nFor details, follow the tutorials in the [notebooks](notebooks) directory.\n\n#### v2.x\n- [quick start with joeynmt2](notebooks/joey_v2_demo.ipynb) This quick start guide walks you step-by-step through the installation, data preparation, training, and evaluation.\n- [torch hub interface](notebooks/torchhub.ipynb) How to generate translations from a pretrained model\n\n#### v1.x\n- [demo notebook](notebooks/joey_v1_demo.ipynb)\n- [starter notebook](https://github.com/masakhane-io/masakhane-mt/blob/master/starter_notebook-custom-data.ipynb) Masakhane - Machine Translation for African Languages in [masakhane-io](https://github.com/masakhane-io/masakhane-mt)\n- [joeynmt toy models](https://github.com/bricksdont/joeynmt-toy-models) Collection of Joey NMT scripts by [@bricksdont](https://github.com/bricksdont)\n\n## Usage\n\u003e :warning: **Warning**\n\u003e For Joey NMT v1.x, please refer to the archive [here](docs/JoeyNMT_v1.md).\n\nJoey NMT has 3 modes: `train`, `test`, and `translate`, all of which take a\n[YAML](https://yaml.org/)-style config file as an argument.\nYou can find examples in the `configs` directory.\n`transformer_small.yaml` contains a detailed explanation of configuration options.\n\nMost importantly, the configuration contains the description of the model architecture\n(e.g. 
number of hidden units in the encoder RNN), paths to the training, development and\ntest data, and the training hyperparameters (learning rate, validation frequency etc.).\n\n\u003e :memo: **Info**\n\u003e Note that subword model training and joint vocabulary creation are not included\n\u003e in the 3 modes above and have to be done separately.\n\u003e We provide a script that takes care of it: `scripts/build_vocab.py`.\n\u003e ```bash\n\u003e python scripts/build_vocab.py configs/transformer_small.yaml --joint\n\u003e ```\n\n### `train` mode\nFor training, run \n```bash\npython -m joeynmt train configs/transformer_small.yaml\n```\nThis will train a model on the training data, validate on validation data, and store\nmodel parameters, vocabularies, and validation outputs. All needed information should be\nspecified in the `data`, `training` and `model` sections of the config file (here\n`configs/transformer_small.yaml`).\n\n```\nmodel_dir/\n├── *.ckpt          # checkpoints\n├── *.hyps          # translated texts at validation\n├── config.yaml     # config file\n├── spm.model       # sentencepiece model / subword-nmt codes file\n├── src_vocab.txt   # src vocab\n├── trg_vocab.txt   # trg vocab\n├── train.log       # train log\n└── validation.txt  # validation scores\n```\n\n\u003e :bulb: **Tip**\n\u003e Be careful not to overwrite `model_dir`; set `overwrite: False` in the config file.\n\n### `test` mode\nThis mode will generate translations for the validation and test sets (as specified in the\nconfiguration) in `model_dir/out.[dev|test]`.\n```bash\npython -m joeynmt test configs/transformer_small.yaml\n```\nYou can specify the ckpt path explicitly in the config file. If `load_model` is not given\nin the config, the best model in `model_dir` will be used to generate translations.\n\nYou can specify e.g. 
[sacrebleu](https://github.com/mjpost/sacrebleu) options in the\n`test` section of the config file.\n\n\u003e :bulb: **Tip**\n\u003e `scripts/average_checkpoints.py` will generate averaged checkpoints for you.\n\u003e ```bash\n\u003e python scripts/average_checkpoints.py --inputs model_dir/*00.ckpt --output model_dir/avg.ckpt\n\u003e ```\n\nIf you want to output the log-probabilities of the hypotheses or references, you can\nspecify `return_score: 'hyp'` or `return_score: 'ref'` in the testing section of the\nconfig. Then run `test` with the `--output-path` and `--save-scores` options.\n```bash\npython -m joeynmt test configs/transformer_small.yaml --output-path model_dir/pred --save-scores\n```\nThis will generate `model_dir/pred.{dev|test}.{scores|tokens}`, which contain the scores and corresponding tokens.\n\n\u003e :memo: **Info**\n\u003e - If you set `return_score: 'hyp'` with greedy decoding, then token-wise scores will be returned. Beam search will return sequence-level scores, because the scores are summed up per sequence during beam exploration.\n\u003e - If you set `return_score: 'ref'`, the model looks up the probabilities of the given ground truth tokens, and both decoding and evaluation will be skipped.\n\u003e - If you specify `n_best` \u003e1 in the config, the first translation in the n-best list will be used in the evaluation.\n\n### `translate` mode\nThis mode accepts inputs from stdin and generates translations.\n\n- File translation\n  ```bash\n  python -m joeynmt translate configs/transformer_small.yaml \u003c my_input.txt \u003e output.txt\n  ```\n\n- Interactive translation\n  ```bash\n  python -m joeynmt translate configs/transformer_small.yaml\n  ```\n  You'll be prompted to type an input sentence. 
Joey NMT will then translate with the \n  model specified in the config file.\n\n  \u003e :bulb: **Tip**\n  \u003e Interactive `translate` mode doesn't work with Multi-GPU.\n  \u003e Please run it on a single GPU or CPU.\n\n\n## Benchmarks \u0026 pretrained models\n\n### iwslt14 de/en/fr multilingual\nWe trained this multilingual model with JoeyNMT v2.3.0 using DDP.\n\nDirection | Architecture | tok | dev | test | #params | download\n--------- | :----------: | :-- | --: | ---: | ------: | :-------\nen-\u003ede    | Transformer  | sentencepiece | - | 28.88 | 200M | [iwslt14_prompt](https://huggingface.co/may-ohta/iwslt14_prompt)\nde-\u003een    |  |  | - | 35.28 |  |\nen-\u003efr    |  |  | - | 38.86 |  |\nfr-\u003een    |  |  | - | 40.35 |  |\n\nsacrebleu signature: `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0`\n\n### wmt14 ende / deen\nWe trained the models with JoeyNMT v2.1.0 from scratch.  \ncf) [wmt14 deen leaderboard](https://paperswithcode.com/sota/machine-translation-on-wmt2014-german-english) in paperswithcode\n\nDirection | Architecture | tok | dev | test | #params | download\n--------- | :----------: | :-- | --: | ---: | ------: | :-------\nen-\u003ede | Transformer | sentencepiece | 24.36 | 24.38 | 60.5M | [wmt14_ende.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/wmt14_ende.tar.gz) (766M)\nde-\u003een | Transformer | sentencepiece | 30.60 | 30.51 | 60.5M | [wmt14_deen.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/wmt14_deen.tar.gz) (766M)\n\nsacrebleu signature: `nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.2.0`\n\n---\n\n\u003e :warning: **Warning**\n\u003e The following models are trained with Joey NMT v1.x, and decoded with Joey NMT v2.0. 
\n\u003e See `config_v1.yaml` and `config_v2.yaml` in the linked zip, respectively.\n\u003e Joey NMT v1.x benchmarks are archived [here](docs/benchmarks_v1.md).\n\n### iwslt14 deen\nPre-processing with Moses decoder tools as in [this script](scripts/get_iwslt14_bpe.sh).\n\nDirection | Architecture | tok | dev | test | #params | download\n--------- | :----------: | :-- | --: | ---: | ------: | :-------\nde-\u003een | RNN | subword-nmt | 31.77 | 30.74 | 61M | [rnn_iwslt14_deen_bpe.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/rnn_iwslt14_deen_bpe.tar.gz) (672MB)\nde-\u003een | Transformer | subword-nmt | 34.53 | 33.73 | 19M | [transformer_iwslt14_deen_bpe.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/transformer_iwslt14_deen_bpe.tar.gz) (221MB)\n\nsacrebleu signature: `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.0.0`\n\n\u003e :memo: **Info**\n\u003e For interactive translate mode, you should specify `pretokenizer: \"moses\"` in the both src's and trg's `tokenizer_cfg`,\n\u003e so that you can input raw sentence. Then `MosesTokenizer` and `MosesDetokenizer` will be applied internally.\n\u003e For test mode, we used the preprocessed texts as input and set `pretokenizer: \"none\"` in the config.\n\n### Masakhane JW300 afen / enaf\nWe picked the pretrained models and configs (bpe codes file etc.) 
from [masakhane.io](https://github.com/masakhane-io/masakhane-mt).\n\nDirection | Architecture | tok | dev | test | #params | download\n--------- | :----------: | :-- | --: | ---: | ------: | :-------\naf-\u003een | Transformer | subword-nmt | - | 57.70 | 46M | [transformer_jw300_afen.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/transformer_jw300_afen.tar.gz) (525MB)\nen-\u003eaf | Transformer | subword-nmt | 47.24 | 47.31 | 24M | [transformer_jw300_enaf.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/transformer_jw300_enaf.tar.gz) (285MB)\n\nsacrebleu signature: `nrefs:1|case:mixed|eff:no|tok:intl|smooth:exp|version:2.0.0`\n\n### JParaCrawl enja / jaen\nFor training, we split JparaCrawl v2 into train and dev set and trained a model on them.\nPlease check the preprocessing script [here](https://github.com/joeynmt/joeynmt/blob/v2.2/scripts/get_jparacrawl.sh).\nWe tested then on [kftt](http://www.phontron.com/kftt/) test set and [wmt20](https://data.statmt.org/wmt20/translation-task/) test set, respectively. 
\n\nDirection | Architecture | tok | wmt20 | kftt | #params | download\n--------- | ------------ | :-- | ---: | ------: | ------: | :-------\nen-\u003eja | Transformer | sentencepiece | 17.66 | 14.31 | 225M | [jparacrawl_enja.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/jparacrawl_enja.tar.gz) (2.3GB)\nja-\u003een | Transformer | sentencepiece | 14.97 | 11.49 | 221M | [jparacrawl_jaen.tar.gz](https://cl.uni-heidelberg.de/statnlpgroup/joeynmt2/jparacrawl_jaen.tar.gz) (2.2GB)\n\nsacrebleu signature: \n- en-\u003eja `nrefs:1|case:mixed|eff:no|tok:ja-mecab-0.996-IPA|smooth:exp|version:2.0.0`\n- ja-\u003een `nrefs:1|case:mixed|eff:no|tok:intl|smooth:exp|version:2.0.0`\n\n*Note: In wmt20 test set, `newstest2020-enja` has 1000 examples, `newstest2020-jaen` has 993 examples.*\n\n\n## Coding\nIn order to keep the code clean and readable, we make use of:\n- Style checks:\n  - [pylint](https://pylint.pycqa.org/) with (mostly) PEP8 conventions, see `.pylintrc`.\n  - [yapf](https://github.com/google/yapf), [isort](https://github.com/PyCQA/isort),\n    and [flake8](https://flake8.pycqa.org/); see `.style.yapf`, `setup.cfg` and `Makefile`.\n- Typing: Every function has documented input types.\n- Docstrings: Every function, class and module has docstrings describing their purpose and usage.\n- Unittests: Every module has unit tests, defined in `test/unit/`.\n- Documentation: Update documentation in `docs/source/` accordingly.\n\nTo ensure the repository stays clean, unittests and linters are triggered by github's\nworkflow on every push or pull request to `main` branch. 
Before you create a pull request,\nyou can check the validity of your modifications with the following commands:\n```bash\nmake test\nmake check\nmake -C docs clean html\n```\n\n## Contributing\nSince this codebase is supposed to stay clean and minimalistic, contributions addressing\nthe following are welcome:\n- code correctness\n- code cleanliness\n- documentation quality\n- speed or memory improvements\n- resolving issues\n- providing pre-trained models\n\nCode extending the functionalities beyond the basics will most likely not end up in the\nmain branch, but we're curious to learn what you used Joey NMT for.\n\n\n## Projects and Extensions\nHere we'll collect projects and repositories that are based on Joey NMT, so you can find\ninspiration and examples on how to modify and extend the code.\n\n### Joey NMT v2.x\n- :ear: **JoeyS2T**. Joey NMT is extended for Speech-to-Text tasks! Checkout the [code](https://github.com/may-/joeys2t) and the [EMNLP 2022 Paper](https://arxiv.org/abs/2210.02545).\n- :right_anger_bubble: **Discord Joey**. This script demonstrates how to deploy Joey NMT models as a Chatbot on Discord. [Code](scripts/discord_joey.py)\n\n### Joey NMT v1.x\n- :spider_web: **Masakhane Web**. [@CateGitau](https://github.com/categitau), [@Kabongosalomon](https://github.com/Kabongosalomon), [@vukosim](https://github.com/vukosim) and team built a whole web translation platform for the African NMT models that Masakhane built with Joey NMT. The best is: it's completely open-source, so anyone can contribute new models or features. Try it out [here](http://translate.masakhane.io/), and check out the [code](https://github.com/dsfsi/masakhane-web).\n- :gear: **MutNMT**. [@sjarmero](https://github.com/sjarmero) created a web application to train NMT: it lets the user train, inspect, evaluate and translate with Joey NMT --- perfect for NMT newbies! Code [here](https://github.com/Prompsit/mutnmt). 
The tool was developed by [Prompsit](https://www.prompsit.com/) in the framework of the European project [MultiTraiNMT](http://www.multitrainmt.eu/).\n- :star2: **Cantonese-Mandarin Translator**. [@evelynkyl](https://github.com/evelynkyl/) trained different NMT models for translating between the low-resourced Cantonese and Mandarin, with the help of some cool parallel sentence mining tricks! Check out her work [here](https://github.com/evelynkyl/yue_nmt).\n- :book: **Russian-Belarusian Translator**. [@tsimafeip](https://github.com/tsimafeip) built a translator from Russian to Belarusian and adapted it to legal and medical domains. The code can be found [here](https://github.com/tsimafeip/Translator/).\n- :muscle: **Reinforcement Learning**. [@samuki](https://github.com/samuki/) implemented various policy gradient variants in Joey NMT: here's the [code](https://github.com/samuki/reinforce-joey), could the logo be any more perfect? :muscle: :koala:\n- :hand: **Sign Language Translation**. [@neccam](https://github.com/neccam/) built a sign language translator that continuously recognizes sign language and translates it. Check out the [code](https://github.com/neccam/slt) and the [CVPR 2020 paper](https://openaccess.thecvf.com/content_CVPR_2020/html/Camgoz_Sign_Language_Transformers_Joint_End-to-End_Sign_Language_Recognition_and_Translation_CVPR_2020_paper.html)!\n- :abc: [@bpopeters](https://github.com/bpopeters/) built [Possum-NMT](https://github.com/deep-spin/sigmorphon-seq2seq) for multilingual grapheme-to-phoneme transduction and morphological inflection. Read their [paper](https://www.aclweb.org/anthology/2020.sigmorphon-1.4.pdf) for SIGMORPHON 2020!\n- :camera: **Image Captioning**. [@pperle](https://github.com/pperle) and [@stdhd](https://github.com/stdhd) built an image captioning tool on top of Joey NMT, check out the [code](https://github.com/stdhd/image_captioning) and the [demo](https://image2caption.pascalperle.de/)!\n- :bulb: **Joey Toy Models**. 
[@bricksdont](https://github.com/bricksdont) built a [collection of scripts](https://github.com/bricksdont/joeynmt-toy-models) showing how to install Joey NMT, preprocess data, train and evaluate models. This is a great starting point for anyone who wants to run systematic experiments, tends to forget python calls, or doesn't like to run notebook cells! \n- :earth_africa: **African NMT**. [@jaderabbit](https://github.com/jaderabbit) started an initiative at the Indaba Deep Learning School 2019 to [\"put African NMT on the map\"](https://twitter.com/alienelf/status/1168159616167010305). The goal is to build and collect NMT models for low-resource African languages. The [Masakhane repository](https://github.com/masakhane-io/masakhane-mt) contains and explains all the code you need to train Joey NMT and points to data sources. It also contains benchmark models and configurations that members of Masakhane have built for various African languages. Furthermore, you might be interested in joining the [Masakhane community](https://github.com/masakhane-io/masakhane-community) if you're generally interested in low-resource NLP/NMT. Also see the [EMNLP Findings paper](https://arxiv.org/abs/2010.02353).\n- :speech_balloon: **Slack Joey**. [Code](https://github.com/juliakreutzer/slack-joey) to locally deploy a Joey NMT model as chat bot in a Slack workspace. It's a convenient way to probe your model without having to implement an API. And bad translations for chat messages can be very entertaining, too ;)\n- :globe_with_meridians: **Flask Joey**. [@kevindegila](https://github.com/kevindegila) built a [flask interface to Joey](https://github.com/kevindegila/flask-joey), so you can deploy your trained model in a web app and query it in the browser. \n- :busts_in_silhouette: **User Study**. We evaluated the code quality of this repository by testing the understanding of novices through quiz questions. 
Find the details in Section 3 of the [Joey NMT paper](https://arxiv.org/abs/1907.12484).\n- :pencil: **Self-Regulated Interactive Seq2Seq Learning**. Julia Kreutzer and Stefan Riezler. Published at ACL 2019. [Paper](https://arxiv.org/abs/1907.05190) and [Code](https://github.com/juliakreutzer/joeynmt/tree/acl19). This project augments the standard fully-supervised learning regime by weak and self-supervision for a better trade-off of quality and supervision costs in interactive NMT.\n- :camel: **Hieroglyph Translation**. Joey NMT was used to translate hieroglyphs in [this IWSLT 2019 paper](https://www.cl.uni-heidelberg.de/statnlpgroup/publications/IWSLT2019.pdf) by Philipp Wiesenbach and Stefan Riezler. They gave Joey NMT multi-tasking abilities. \n\nIf you used Joey NMT for a project, publication or built some code on top of it, let us know and we'll link it here.\n\n\n## Contact\nPlease leave an issue if you have questions or issues with the code.\n\nFor general questions, email us at `joeynmt \u003cat\u003e gmail.com`. :love_letter:\n\n\n## Reference\nIf you use Joey NMT in a publication or thesis, please cite the following [paper](https://arxiv.org/abs/1907.12484):\n\n```\n@inproceedings{kreutzer-etal-2019-joey,\n    title = \"Joey {NMT}: A Minimalist {NMT} Toolkit for Novices\",\n    author = \"Kreutzer, Julia  and\n      Bastings, Jasmijn  and\n      Riezler, Stefan\",\n    booktitle = \"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations\",\n    month = nov,\n    year = \"2019\",\n    address = \"Hong Kong, China\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://www.aclweb.org/anthology/D19-3019\",\n    doi = \"10.18653/v1/D19-3019\",\n    pages = \"109--114\",\n}\n```\n\n## Naming\nJoeys are [infant marsupials](https://en.wikipedia.org/wiki/Marsupial#Early_development). 
:koala:\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoeynmt%2Fjoeynmt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoeynmt%2Fjoeynmt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoeynmt%2Fjoeynmt/lists"}