{"id":19876938,"url":"https://github.com/eurus-holmes/mnmt","last_synced_at":"2025-10-30T08:02:48.522Z","repository":{"id":95490637,"uuid":"214981688","full_name":"Eurus-Holmes/MNMT","owner":"Eurus-Holmes","description":"Pytorch implementation of Multimodal Neural Machine Translation(MNMT).","archived":false,"fork":false,"pushed_at":"2021-01-21T03:57:13.000Z","size":37846,"stargazers_count":12,"open_issues_count":2,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-27T00:54:29.795Z","etag":null,"topics":["multimodal","nmt","pytorch-implementation"],"latest_commit_sha":null,"homepage":"https://chenfeiyang.top/MNMT/","language":"Smalltalk","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Eurus-Holmes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-14T07:53:48.000Z","updated_at":"2022-05-26T00:40:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"f81f451d-fb99-4aa6-8ab2-e183f880aadb","html_url":"https://github.com/Eurus-Holmes/MNMT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eurus-Holmes%2FMNMT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eurus-Holmes%2FMNMT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eurus-Holmes%2FMNMT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eurus-Holmes%2FMNMT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/Eurus-Holmes","download_url":"https://codeload.github.com/Eurus-Holmes/MNMT/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241304294,"owners_count":19941101,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["multimodal","nmt","pytorch-implementation"],"created_at":"2024-11-12T16:34:49.382Z","updated_at":"2025-10-30T08:02:48.368Z","avatar_url":"https://github.com/Eurus-Holmes.png","language":"Smalltalk","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multimodal Neural Machine Translation\n\n\u003e Reference: [Iacer Calixto's MultimodalNMT](https://github.com/iacercalixto/MultimodalNMT)\n\n\u003e This is the implementation of **four different multi-modal neural machine translation models** described in the research papers [(1)](http://aclweb.org/anthology/D17-1105) and [(2)](https://aclweb.org/anthology/P/P17/P17-1175.pdf).\nThey are based on the [Pytorch](https://github.com/pytorch/pytorch) port of [OpenNMT](https://github.com/OpenNMT/OpenNMT), an open-source (MIT) neural machine translation system.\n\n## Related Work\n\n - [papers](https://github.com/Eurus-Holmes/MNMT/tree/master/papers)\n \n## Dataset\n\n - [Multi30k Dataset](https://github.com/multi30k/dataset)\n\n \n## Requirements\n\nThe code is successfully tested on `PyTorch=1.3.1` and `torchtext=0.2.3`. 
If you have any questions, see the [closed issues](https://github.com/Eurus-Holmes/MNMT/issues?q=is%3Aissue+is%3Aclosed) or feel free to open a new [issue](https://github.com/Eurus-Holmes/MNMT/issues).\n\nIf either of the two is missing or out of date, and assuming you installed PyTorch with the conda package manager and torchtext with pip, you might want to run the following:\n\n```bash\nconda install -c soumith pytorch\npip install torchtext==0.2.3\npip install -r requirements.txt\n```\n\n\n## Run the Code\n\n### Step 0: Extract the image features for the Multi30k data set.\n\nIf you are using image features extracted by someone else, you can skip this step.\n\nWe assume you have downloaded the [Multi30k data set](http://www.statmt.org/wmt16/multimodal-task.html) and have the training, validation and test images locally (make sure you download the `test2016` test set). Together with the image files, you need text files listing the image file names in the training, validation, and test sets, respectively. These are named `train_images.txt`, `val_images.txt`, and `test_images.txt`, and are part of the original Flickr30k data set. If you download them from the [WMT Multi-modal MT shared task website](http://www.statmt.org/wmt16/multimodal-task.html), you might need to adjust the file names accordingly.\n\nTo extract the image features, run the following script:\n\n```bash\npython extract_image_features.py --gpuid 0 --pretrained_cnn vgg19_bn --splits=train,valid,test --images_path ./path/to/flickr30k/images/ --train_fnames ./path/to/flickr30k/train_images.txt --valid_fnames ./path/to/flickr30k/val_images.txt --test_fnames ./path/to/flickr30k/test2016_images.txt\n```\n\nThis will use GPU 0 to extract features with the pre-trained VGG19 with batch normalisation, for the training, validation and test sets of Flickr30k. 
Change the name of the pre-trained CNN to any of the CNNs available under [this repository](https://github.com/Cadene/pretrained-models.pytorch), and the script will automatically use that CNN to extract features. **This script will extract both global and local visual features**.\n\n\n### Step 1: Preprocess the data\n\nPreprocess the data the same way you would for a text-only NMT model. **Important**: *the preprocessing script only uses the textual portion of the multi-modal machine translation data set*!\n\nHere, we assume you have downloaded the [Multi30k data set](http://www.statmt.org/wmt16/multimodal-task.html) and extracted the sentences in its training, validation and test sets. After pre-processing them (e.g. tokenising, lowercasing, and applying a [BPE model](https://github.com/rsennrich/subword-nmt)), feed the training and validation sets to the `preprocess.py` script, as below.\n\n```bash\npython preprocess.py -train_src ./path/to/flickr30k/train.norm.tok.lc.10000bpe.en -train_tgt ./path/to/flickr30k/train.norm.tok.lc.10000bpe.de -valid_src ./path/to/flickr30k/val.norm.tok.lc.10000bpe.en -valid_tgt ./path/to/flickr30k/val.norm.tok.lc.10000bpe.de -save_data ./data/m30k\n```\n\n\n### Step 2: Train the model\n\nTo train a multi-modal NMT model, use the `train_mm.py` script. 
In addition to the parameters accepted by the standard `train.py` (that trains a text-only NMT model), this script expects the path to the training and validation image features, as well as the multi-modal model type (one of `imgd`, `imge`, `imgw`, or `src+img`).\n\nFor a complete description of the different multi-modal NMT model types, please refer to the papers where they are described [(1)](http://aclweb.org/anthology/D17-1105) and [(2)](https://aclweb.org/anthology/P/P17/P17-1175.pdf).\n\n```bash\npython train_mm.py -data data/m30k -save_model model_snapshots/IMGD_ADAM -gpuid 0 -epochs 25 -batch_size 40 -path_to_train_img_feats ./flickr30k_train_vgg19_bn_cnn_features.hdf5 -path_to_valid_img_feats ./flickr30k_valid_vgg19_bn_cnn_features.hdf5 -optim adam -learning_rate 0.002 -use_nonlinear_projection --multimodal_model_type imgd\n```\n\nIn case you want to continue training from a previous checkpoint, simply run (for example):\n\n```bash\nMODEL_SNAPSHOT=IMGD_ADAM_acc_60.79_ppl_8.38_e4.pt\npython train_mm.py -data data/m30k -save_model model_snapshots/IMGD_ADAM -gpuid 0 -epochs 25 -batch_size 40 -path_to_train_img_feats /path/to/flickr30k/features/flickr30k_train_vgg19_bn_cnn_features.hdf5 -path_to_valid_img_feats /path/to/flickr30k/features/flickr30k_valid_vgg19_bn_cnn_features.hdf5 -optim adam -learning_rate 0.002 -use_nonlinear_projection --multimodal_model_type imgd -train_from model_snapshots/${MODEL_SNAPSHOT}\n```\n\nAs an example, if you wish to train a doubly-attentive NMT model (referred to as `src+img`), try the following command:\n\n```bash\npython train_mm.py -data data/m30k -save_model model_snapshots/NMT-src-img_ADAM -gpuid 0 -epochs 25 -batch_size 40 -path_to_train_img_feats /path/to/flickr30k/features/flickr30k_train_vgg19_bn_cnn_features.hdf5 -path_to_valid_img_feats /path/to/flickr30k/features/flickr30k_valid_vgg19_bn_cnn_features.hdf5 -optim adam -learning_rate 0.002 -use_nonlinear_projection --decoder_type doubly-attentive-rnn 
--multimodal_model_type src+img\n```\n\n\n### Step 3: Translate new sentences\n\nTo translate a new test set, use `translate_mm.py` as you would use the original `translate.py` script, adding the path to the file containing the test image features. In the example below, we translate the Multi30k test set used in the 2016 run of the WMT Multi-modal MT Shared Task.\n\n```bash\nMODEL_SNAPSHOT=IMGD_ADAM_acc_60.79_ppl_8.38_e4.pt\npython translate_mm.py -src ~/exp/opennmt_imgd/data_multi30k/test2016.norm.tok.lc.bpe10000.en -model model_snapshots/${MODEL_SNAPSHOT} -path_to_test_img_feats ~/resources/multi30k/features/flickr30k_test_vgg19_bn_cnn_features.hdf5 -output model_snapshots/${MODEL_SNAPSHOT}.translations-test2016\n```\n\n### Todo\n\n - [Strange Results](https://github.com/Eurus-Holmes/MNMT/issues/8)\n\n## Citation\n\nIf you use the multi-modal NMT models in this repository, please consider citing the research papers where they are described [(1)](http://aclweb.org/anthology/D17-1105) and [(2)](https://aclweb.org/anthology/P/P17/P17-1175.pdf):\n\n```\n@InProceedings{CalixtoLiu2017EMNLP,\n  Title                    = {{Incorporating Global Visual Features into Attention-Based Neural Machine Translation}},\n  Author                   = {Iacer Calixto and Qun Liu},\n  Booktitle                = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},\n  Year                     = {2017},\n  Address                  = {Copenhagen, Denmark},\n  Url                      = {http://aclweb.org/anthology/D17-1105}\n}\n```\n\n```\n@InProceedings{CalixtoLiuCampbell2017ACL,\n  author    = {Calixto, Iacer  and  Liu, Qun  and  Campbell, Nick},\n  title     = {{Doubly-Attentive Decoder for Multi-modal Neural Machine Translation}},\n  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},\n  month     = {July},\n  year      = {2017},\n  address   = 
{Vancouver, Canada},\n  publisher = {Association for Computational Linguistics},\n  pages     = {1913--1924},\n  url       = {http://aclweb.org/anthology/P17-1175}\n}\n```\n\nIf you use OpenNMT, please cite as below.\n\n[OpenNMT technical report](https://doi.org/10.18653/v1/P17-4012)\n\n```\n@inproceedings{opennmt,\n  author    = {Guillaume Klein and\n               Yoon Kim and\n               Yuntian Deng and\n               Jean Senellart and\n               Alexander M. Rush},\n  title     = {OpenNMT: Open-Source Toolkit for Neural Machine Translation},\n  booktitle = {Proc. ACL},\n  year      = {2017},\n  url       = {https://doi.org/10.18653/v1/P17-4012},\n  doi       = {10.18653/v1/P17-4012}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feurus-holmes%2Fmnmt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feurus-holmes%2Fmnmt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feurus-holmes%2Fmnmt/lists"}