https://github.com/THUNLP-MT/MT-Reading-List

A machine translation reading list maintained by Tsinghua Natural Language Processing Group
Host: GitHub
URL: https://github.com/THUNLP-MT/MT-Reading-List
Owner: THUNLP-MT
License: bsd-3-clause
Created: 2018-12-03T10:45:15.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2024-08-09T14:48:06.000Z (over 1 year ago)
Last Synced: 2025-03-26T09:24:46.213Z (11 months ago)
Topics: machine-translation, reading-list
Language: TeX
Homepage:
Size: 997 KB
Stars: 2,441
Watchers: 164
Forks: 447
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

# Machine Translation Reading List

This is a machine translation reading list maintained by the Tsinghua Natural Language Processing Group. 

The past three decades have witnessed the rapid development of machine translation, especially for data-driven approaches such as statistical machine translation (SMT) and neural machine translation (NMT). Due to the dominance of NMT at the present time, priority is given to collecting important, up-to-date NMT papers; the [Edinburgh/JHU MT research survey wiki](http://www.statmt.org/survey/) has good coverage of older papers and a brief description for each sub-topic of MT. Our list is still incomplete and the categorization might be inappropriate. We will keep adding papers and improving the list. Any suggestions are welcome!

* [10 Must Reads](#10_must_reads)

* [Tutorials and Surveys](#surveys)

* [Statistical Machine Translation](#statistical_machine_translation)

    * [Word-based Models](#word_based_models)

    * [Phrase-based Models](#phrase_based_models)

    * [Syntax-based Models](#syntax_based_models)

    * [Discriminative Training](#discriminative_training)

    * [System Combination](#system_combination)

    * [Human-centered SMT](#human_centered_smt)

        * [Interactive SMT](#interactive)

        * [Adaptation](#adaptation_smt)

* [Evaluation](#evaluation)

* [Neural Machine Translation](#neural_machine_translation)

    * [Model Architecture](#model_architecture)

    * [Attention Mechanism](#attention_mechanism)

    * [Open Vocabulary](#open_vocabulary)

    * [Training Objectives and Frameworks](#training)

    * [Decoding](#decoding)

    * [Low-resource Language Translation](#low_resource_language_translation)

        * [Semi-supervised Learning](#semi_supervised)

        * [Unsupervised Learning](#unsupervised)

        * [Pivot-based Methods](#pivot_based)

        * [Data Augmentation](#data_augmentation)

        * [Data Selection](#data_selection)

        * [Transfer Learning](#transfer_learning)

        * [Meta Learning](#meta_learning)

    * [Multilingual Machine Translation](#multilingual)

    * [Prior Knowledge Integration](#prior_knowledge_integration)

        * [Word/Phrase Constraints](#word_phrase_constraints)

        * [Syntactic/Semantic Constraints](#syntactic_semantic_constraints)

        * [Coverage Constraints](#coverage_constraints)

    * [Document-level Translation](#document_level_translation)

    * [Robustness](#robustness)

    * [Interpretability](#interpretability)

    * [Linguistic Interpretation](#linguistic_interpretation)

    * [Fairness and Diversity](#fairness_and_diversity)

    * [Efficiency](#efficiency)

    * [Non-Autoregressive Translation](#NAT)

    * [Speech Translation and Simultaneous Translation](#speech_translation_and_simultaneous_translation)

    * [Multi-modality](#multi_modality)

    * [Ensemble and Reranking](#ensemble_reranking)

    * [Pre-training](#pre_training)

    * [Domain Adaptation](#domain_adaptation)

    * [Quality Estimation](#quality_estimation)

    * [Human-centered NMT](#human_centered)

        * [Interactive NMT](#interactive_nmt)

        * [Automatic Post-Editing](#ape)

    * [Poetry Translation](#poetry_translation)  

    * [Eco-friendly](#eco_friendly)    

    * [Compositional Generalization](#compositional_generalization)

    * [Endangered Language Revitalization](#endangered)

* [Word Translation (Bilingual Lexicon Induction)](#word_translation)

* [WMT Winners](#wmt_winners)

    * [WMT 2019](#wmt19)

    * [WMT 2018](#wmt18)

    * [WMT 2017](#wmt17)

    * [WMT 2016](#wmt16)

## 10 Must Reads
 

* Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. [The Mathematics of Statistical Machine Translation: Parameter Estimation](http://aclweb.org/anthology/J93-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=2259057253133260714&as_sdt=2005&sciodt=0,5&hl=en): 5,218)

* Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. [BLEU: a Method for Automatic Evaluation of Machine Translation](http://aclweb.org/anthology/P02-1040). In *Proceedings of ACL 2002*. ([Citation](https://scholar.google.com/scholar?cites=9019091454858686906&as_sdt=2005&sciodt=0,5&hl=en): 10,700)

* Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. [Statistical Phrase-Based Translation](http://aclweb.org/anthology/N03-1017). In *Proceedings of NAACL 2003*. ([Citation](https://scholar.google.com/scholar?cites=11796378766060939113&as_sdt=2005&sciodt=0,5&hl=en): 3,713)

* Franz Josef Och. 2003. [Minimum Error Rate Training in Statistical Machine Translation](http://aclweb.org/anthology/P03-1021). In *Proceedings of ACL 2003*. ([Citation](https://scholar.google.com/scholar?cites=15358949031331886708&as_sdt=2005&sciodt=0,5&hl=en): 3,115)

* David Chiang. 2007. [Hierarchical Phrase-Based Translation](http://aclweb.org/anthology/J07-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=17074501474509484516&as_sdt=2005&sciodt=0,5&hl=en): 1,235)

* Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. [Sequence to Sequence Learning with Neural Networks](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf). In *Proceedings of NIPS 2014*. ([Citation](https://scholar.google.com/scholar?cites=13133880703797056141&as_sdt=2005&sciodt=0,5&hl=en): 9,432)

* Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473.pdf). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com/scholar?cites=9430221802571417838&as_sdt=2005&sciodt=0,5&hl=en): 10,479)

* Diederik P. Kingma and Jimmy Ba. 2015. [Adam: A Method for Stochastic Optimization](https://arxiv.org/pdf/1412.6980). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com/scholar?cites=16194105527543080940&as_sdt=2005&sciodt=0,5&hl=en): 37,480)

* Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. [Neural Machine Translation of Rare Words with Subword Units](https://arxiv.org/pdf/1508.07909.pdf). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=1307964014330144942&as_sdt=2005&sciodt=0,5&hl=en): 1,679)

* Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. [Attention is All You Need](https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com/scholar?cites=2960712678066186980&as_sdt=2005&sciodt=0,5&hl=en): 6,112)

## Tutorials and Surveys
 

* Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, and Yang Liu. 2020. [Neural Machine Translation: A Review of Methods, Resources, and Tools](https://arxiv.org/abs/2012.15515). *AI Open*.

* Felix Stahlberg. 2020. [Neural Machine Translation: A Review and Survey](https://arxiv.org/abs/1912.02047). *Journal of Artificial Intelligence Research*.

* Philipp Koehn and Rebecca Knowles. 2017. [Six Challenges for Neural Machine Translation](http://www.aclweb.org/anthology/W17-3204). In *Proceedings of the First Workshop on Neural Machine Translation*.

* Philipp Koehn. 2017. [Neural Machine Translation](https://arxiv.org/abs/1709.07809). *arXiv:1709.07809*.

* Oriol Vinyals and Navdeep Jaitly. 2017. [Seq2Seq ICML Tutorial](https://docs.google.com/presentation/d/1quIMxEEPEf5EkRHc2USQaoJRC4QNX6_KomdZTBMBWjk/present?slide=id.p). *ICML 2017 Tutorial*.

* Raj Dabre, Chenhui Chu, and Anoop Kunchukuttan. 2020. [A Survey of Multilingual Neural Machine Translation](https://doi.org/10.1145/3406095). *ACM Computing Surveys*, 53(5), Article 99 (October 2020).

   * Tutorial on [Multilingual Neural Machine Translation](https://github.com/anoopkunchukuttan/multinmt_tutorial_coling2020) at COLING 2020.

* Graham Neubig. 2017. [Neural Machine Translation and Sequence-to-sequence Models: A Tutorial](https://arxiv.org/pdf/1703.01619.pdf). *arXiv:1703.01619*. ([Citation](https://scholar.google.com/scholar?cites=17621873290135947085&as_sdt=2005&sciodt=0,5&hl=en): 45)

* Thang Luong, Kyunghyun Cho, and Christopher Manning. 2016. [Neural Machine Translation](https://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf). *ACL 2016 Tutorial*.  

* Adam Lopez. 2008. [Statistical Machine Translation](http://delivery.acm.org/10.1145/1390000/1380586/a8-lopez.pdf?ip=101.5.129.50&id=1380586&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2E587F3204F5B62A59%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1546058891_981e84a24804f2dbc0549b9892a2ea1d). *ACM Computing Surveys*.

* Philipp Koehn. 2006. [Statistical Machine Translation: the Basic, the Novel, and the Speculative](http://homepages.inf.ed.ac.uk/pkoehn/publications/tutorial2006.pdf). *EACL 2006 Tutorial*.

## Statistical Machine Translation

### Word-based Models


* Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. [The Mathematics of Statistical Machine Translation: Parameter Estimation](http://aclweb.org/anthology/J93-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=2259057253133260714&as_sdt=2005&sciodt=0,5&hl=en): 4,965)

* Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. [HMM-Based Word Alignment in Statistical Translation](http://aclweb.org/anthology/C96-2141). In *Proceedings of COLING 1996*. ([Citation](https://scholar.google.com.hk/scholar?cites=6742027174667056165&as_sdt=2005&sciodt=0,5&hl=en): 940)

* Franz Josef Och and Hermann Ney. 2003. [A Systematic Comparison of Various Statistical Alignment Models](http://aclweb.org/anthology/J03-1002). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=7906670690027479083&as_sdt=2005&sciodt=0,5&hl=en): 3,980)

* Percy Liang, Ben Taskar, and Dan Klein. 2006. [Alignment by Agreement](https://cs.stanford.edu/~pliang/papers/alignment-naacl2006.pdf). In *Proceedings of NAACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=10766838746666771394&as_sdt=2005&sciodt=0,5&hl=en): 452)

* Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. [A Simple, Fast, and Effective Reparameterization of IBM Model 2](http://www.aclweb.org/anthology/N13-1073). In *Proceedings of NAACL 2013*. ([Citation](https://scholar.google.com.hk/scholar?cites=13560076980956479370&as_sdt=2005&sciodt=0,5&hl=en): 310)

### Phrase-based Models


* Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. [Statistical Phrase-Based Translation](http://aclweb.org/anthology/N03-1017). In *Proceedings of NAACL 2003*. ([Citation](https://scholar.google.com.hk/scholar?cites=11796378766060939113&as_sdt=2005&sciodt=0,5&hl=en): 3,516)

* Michel Galley and Christopher D. Manning. 2008. [A Simple and Effective Hierarchical Phrase Reordering Model](https://nlp.stanford.edu/pubs/emnlp08-lexorder.pdf). In *Proceedings of EMNLP 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=14572547803642015856&as_sdt=2005&sciodt=0,5&hl=en): 275)

### Syntax-based Models


* Dekai Wu. 1997. [Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora](http://aclweb.org/anthology/J97-3002). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=7926725626202301933&as_sdt=2005&sciodt=0,5&hl=en): 1,009)

* Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. [Scalable Inference and Training of Context-Rich Syntactic Translation Models](http://aclweb.org/anthology/P06-1121). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=2650671041278094269&as_sdt=2005&sciodt=0,5&hl=en): 475)

* Yang Liu, Qun Liu, and Shouxun Lin. 2006. [Tree-to-String Alignment Template for Statistical Machine Translation](http://aclweb.org/anthology/P06-1077). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=8683308453323663525&as_sdt=2005&sciodt=0,5&hl=en): 391)

* Deyi Xiong, Qun Liu, and Shouxun Lin. 2006. [Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation](https://aclanthology.info/pdf/P/P06/P06-1066.pdf). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=11896300896063367737&as_sdt=2005&sciodt=0,5&hl=en): 299)

* David Chiang. 2007. [Hierarchical Phrase-Based Translation](http://aclweb.org/anthology/J07-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=17074501474509484516&as_sdt=2005&sciodt=0,5&hl=en): 1,192)

* Liang Huang and David Chiang. 2007. [Forest Rescoring: Faster Decoding with Integrated Language Models](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.5058&rep=rep1&type=pdf). In *Proceedings of ACL 2007*. ([Citation](https://scholar.google.com.hk/scholar?cites=2826188279623417237&as_sdt=2005&sciodt=0,5&hl=en): 280)

* Haitao Mi, Liang Huang, and Qun Liu. 2008. [Forest-based Translation](http://aclweb.org/anthology/P08-1023). In *Proceedings of ACL 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=11263493281241243162&as_sdt=2005&sciodt=0,5&hl=en): 239)

* Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. 2008. [A Tree Sequence Alignment-based Tree-to-Tree Translation Model](http://www.aclweb.org/anthology/P08-1064). In *Proceedings of ACL 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=4828105603038412208&as_sdt=2005&sciodt=0,5&hl=en): 124)

* Libin Shen, Jinxi Xu, and Ralph Weischedel. 2008. [A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model](http://aclweb.org/anthology/P08-1066). In *Proceedings of ACL 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=15082517325172081801&as_sdt=2005&sciodt=0,5&hl=en): 278)

* Haitao Mi and Liang Huang. 2008. [Forest-based Translation Rule Extraction](http://aclweb.org/anthology/D08-1022). In *Proceedings of EMNLP 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=11263493281241243162&as_sdt=2005&sciodt=0,5&hl=en): 239)

* Yang Liu, Yajuan Lü, and Qun Liu. 2009. [Improving Tree-to-Tree Translation with Packed Forests](http://aclweb.org/anthology/P09-1063). In *Proceedings of ACL/IJCNLP 2009*. ([Citation](https://scholar.google.com.hk/scholar?cites=3907324274083528908&as_sdt=2005&sciodt=0,5&hl=en): 93)

* David Chiang. 2010. [Learning to Translate with Source and Target Syntax](http://aclweb.org/anthology/P10-1146). In *Proceedings of ACL 2010*. ([Citation](https://scholar.google.com.hk/scholar?cites=18270412258769590027&as_sdt=2005&sciodt=0,5&hl=en): 118)

### Discriminative Training


* Franz Josef Och and Hermann Ney. 2002. [Discriminative Training and Maximum Entropy Models for Statistical Machine Translation](http://aclweb.org/anthology/P02-1038). In *Proceedings of ACL 2002*. ([Citation](https://scholar.google.com.hk/scholar?cites=2845378992177918439&as_sdt=2005&sciodt=0,5&hl=en): 1,258)

* Franz Josef Och. 2003. [Minimum Error Rate Training in Statistical Machine Translation](http://aclweb.org/anthology/P03-1021). In *Proceedings of ACL 2003*. ([Citation](https://scholar.google.com.hk/scholar?cites=15358949031331886708&as_sdt=2005&sciodt=0,5&hl=en): 2,984)

* Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. [Online Large-Margin Training for Statistical Machine Translation](http://aclweb.org/anthology/D07-1080). In *Proceedings of EMNLP-CoNLL 2007*. ([Citation](https://scholar.google.com.hk/scholar?cites=6690339336101573833&as_sdt=2005&sciodt=0,5&hl=en): 197)

* David Chiang, Kevin Knight, and Wei Wang. 2009. [11,001 New Features for Statistical Machine Translation](http://aclweb.org/anthology/N09-1025). In *Proceedings of NAACL 2009*. ([Citation](https://scholar.google.com.hk/scholar?cites=14062409519286340147&as_sdt=2005&sciodt=0,5&hl=en): 251)

### System Combination


* Antti-Veikko Rosti, Spyros Matsoukas, and Richard Schwartz. 2007. [Improved Word-Level System Combination for Machine Translation](http://aclweb.org/anthology/P07-1040). In *Proceedings of ACL 2007*. ([Citation](https://scholar.google.com.hk/scholar?cites=13310846375895519088&as_sdt=2005&sciodt=0,5&hl=en): 144)

* Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen, and Robert Moore. 2008. [Indirect-HMM-based Hypothesis Alignment for Combining Outputs from Machine Translation Systems](http://aclweb.org/anthology/D08-1011). In *Proceedings of EMNLP 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=5843300493006970528&as_sdt=2005&sciodt=0,5&hl=en): 96)

### Human-centered SMT

#### Interactive SMT


* George Foster, Pierre Isabelle and Pierre Plamondon. 1997. [Target-text mediated interactive machine translation](https://sci-hub.tw/10.2307/40009035). *Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=17084037882064721827&as_sdt=2005&sciodt=0,5): 116)

* Philippe Langlais, Guy Lapalme and Marie Lorange. 2002. [TransType: Development-Evaluation Cycles to Boost Translator’s Productivity](https://sci-hub.tw/10.2307/40007093). *Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=7892155138946158318&as_sdt=2005&sciodt=0,5): 74)

* Jesús Tomás and Francisco Casacuberta. 2006. [Statistical phrase-based models for interactive computer-assisted translation](http://aclweb.org/anthology/P06-2107). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com/scholar?cites=2242179645100420046&as_sdt=2005&sciodt=0,5): 31)

* Enrique Vidal, Francisco Casacuberta, Luis Rodríguez-Ruiz, Jorge Civera, and Carlos D. Martínez-Hinarejos. 2006. [Computer-Assisted Translation Using Speech Recognition](https://ieeexplore.ieee.org/document/1621206). *IEEE Transactions on Audio, Speech, and Language Processing*. ([Citation](https://scholar.google.com/scholar?cites=32625184311110830&as_sdt=2005&sciodt=0,5): 62)

* Shahram Khadivi and Hermann Ney. 2008. [Integration of Speech Recognition and Machine Translation in Computer-Assisted Translation](https://sci-hub.tw/10.1109/tasl.2008.2004301). *IEEE Transactions on Audio, Speech, and Language Processing*. ([Citation](https://scholar.google.com/scholar?cites=1690852455408892756&as_sdt=2005&sciodt=0,5): 30)

* Sergio Barrachina, Oliver Bender, Francisco Casacuberta, Jorge Civera, Elsa Cubel, Shahram Khadivi, Antonio L. Lagarda, Hermann Ney, Jesús Tomás and Enrique Vidal. 2009. [Statistical approaches to computer-assisted translation](https://www.mitpressjournals.org/doi/abs/10.1162/coli.2008.07-055-R2-06-29). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=17691637682117292572&as_sdt=2005&sciodt=0,5): 207)

* Francisco Casacuberta, Jorge Civera, Elsa Cubel, Antonio L. Lagarda, Guy Lapalme, Elliott Macklovitch, Enrique Vidal. 2009. [Human interaction for high quality machine translation](https://sci-hub.tw/10.1145/1562764.1562798). *Communications of the ACM*. ([Citation](https://scholar.google.com/scholar?cites=6184654159576071790&as_sdt=2005&sciodt=0,5): 49)

* Vicent Alabau, Alberto Sanchis and Francisco Casacuberta. 2014. [Improving on-line handwritten recognition in interactive machine translation](https://sci-hub.tw/10.1016/j.patcog.2013.09.035). *Pattern Recognition*. ([Citation](https://scholar.google.com/scholar?cites=11987123133913382404&as_sdt=2005&sciodt=0,5): 18)

* Shanbo Cheng, Shujian Huang, Huadong Chen, Xin-Yu Dai and Jiajun Chen. 2016. [PRIMT: A Pick-Revise Framework for Interactive Machine Translation](http://www.aclweb.org/anthology/N16-1148). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=3643727460542665178&as_sdt=2005&sciodt=0,5): 9)

* Miguel Domingo, Álvaro Peris and Francisco Casacuberta. 2018. [Segment-based interactive-predictive machine translation](https://www.researchgate.net/publication/322275484_Segment-based_interactive-predictive_machine_translation). *Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=4148585683672959462&as_sdt=2005&sciodt=0,5): 2)

#### Adaptation


* Pascual Martínez-Gómez, Germán Sanchis-Trilles and Francisco Casacuberta. 2012. [Online adaptation strategies for statistical machine translation in post-editing scenarios](https://sci-hub.tw/10.1016/j.patcog.2012.01.011). *Pattern Recognition*. ([Citation](https://scholar.google.com/scholar?cites=9143628035426486873&as_sdt=2005&sciodt=0,5): 40)

* Jesús González-Rubio and Francisco Casacuberta. 2014. [Cost-Sensitive Active Learning for Computer-Assisted Translation](https://sci-hub.tw/10.1016/j.patrec.2013.06.007). *Pattern Recognition Letters*. ([Citation](https://scholar.google.com/scholar?cites=13196627956841822823&as_sdt=2005&sciodt=0,5): 11)

* Antonio L. Lagarda, Daniel Ortiz-Martínez, Vicent Alabau and Francisco Casacuberta. 2015. [Translating without in-domain corpus: Machine translation post-editing with online learning techniques](https://sci-hub.tw/10.1016/j.csl.2014.10.004). *Computer Speech & Language*. ([Citation](https://scholar.google.com/scholar?cites=6721510771212778605&as_sdt=2005&sciodt=0,5): 10)

* Germán Sanchis-Trilles, Francisco Casacuberta. 2015. [Improving translation quality stability using Bayesian predictive adaptation](https://sci-hub.tw/10.1016/j.csl.2015.03.001). *Computer Speech & Language*. ([Citation](https://scholar.google.com/scholar?q=Improving+translation+quality+stability+using+Bayesian+predictive+adaptation): 1)

* Daniel Ortiz-Martínez. 2016. [Online Learning for Statistical Machine Translation](https://www.mitpressjournals.org/doi/full/10.1162/COLI_a_00244). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=4979468821667106694&as_sdt=2005&sciodt=0,5): 13)

## Evaluation


* Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. [BLEU: a Method for Automatic Evaluation of Machine Translation](http://aclweb.org/anthology/P02-1040). In *Proceedings of ACL 2002*. ([Citation](https://scholar.google.com.hk/scholar?cites=9019091454858686906&as_sdt=2005&sciodt=0,5&hl=en): 8,499)

* Philipp Koehn. 2004. [Statistical Significance Tests for Machine Translation Evaluation](http://www.aclweb.org/anthology/W04-3250). In *Proceedings of EMNLP 2004*. ([Citation](https://scholar.google.com.hk/scholar?cites=6141850486206753388&as_sdt=2005&sciodt=0,5&hl=en): 1,015)

* Satanjeev Banerjee and Alon Lavie. 2005. [METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments](http://aclweb.org/anthology/W05-0909). In *Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization*. ([Citation](https://scholar.google.com.hk/scholar?cites=11797833340491598355&as_sdt=2005&sciodt=0,5&hl=en): 1,355)

* Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. [A Study of Translation Edit Rate with Targeted Human Annotation](http://mt-archive.info/AMTA-2006-Snover.pdf). In *Proceedings of AMTA 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=1809540661740640949&as_sdt=2005&sciodt=0,5&hl=en): 1,713)

* Maja Popovic. 2015. [chrF: Character n-gram F-score for Automatic MT Evaluation](http://aclweb.org/anthology/W15-3049). In *Proceedings of WMT 2015*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=12169100229181212462): 58) 

* Xin Wang, Wenhu Chen, Yuan-Fang Wang, and William Yang Wang. 2018. [No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling](http://aclweb.org/anthology/P18-1083). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=1809540661740640949&as_sdt=2005&sciodt=0,5&hl=en): 10) 

* Arun Tejasvi Chaganty, Stephen Mussman, and Percy Liang. 2018. [The price of debiasing automatic metrics in natural language evaluation](https://arxiv.org/pdf/1807.02202). In *Proceedings of ACL 2018*.

* Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, and Xinyi Wang. 2019. [compare-mt: A Tool for Holistic Comparison of Language Generation Systems](https://arxiv.org/pdf/1903.07926.pdf). In *Proceedings of NAACL 2019*. 

* Robert Schwarzenberg, David Harbecke, Vivien Macketanz, Eleftherios Avramidis, and Sebastian Möller. 2019. [Train, Sort, Explain: Learning to Diagnose Translation Models](https://arxiv.org/pdf/1903.12017.pdf). In *Proceedings of NAACL 2019*. 

* Nitika Mathur, Timothy Baldwin, and Trevor Cohn. 2019. [Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation](https://www.aclweb.org/anthology/P19-1269). In *Proceedings of ACL 2019*. 

* Prathyusha Jwalapuram, Shafiq Joty, Irina Temnikova, and Preslav Nakov. 2019. [Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite](https://arxiv.org/pdf/1909.00131). In *Proceedings of ACL 2019*. 

* Sergey Edunov, Myle Ott, Marc’Aurelio Ranzato and Michael Auli. 2020. [On The Evaluation of Machine Translation Systems Trained With Back-Translation](https://arxiv.org/abs/1908.05204). In *Proceedings of ACL 2020*.

* Wei Zhao, Goran Glavaš, Maxime Peyrard, Yang Gao, Robert West and Steffen Eger. 2020. [On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation](http://arxiv.org/abs/2005.01196). In *Proceedings of ACL 2020*.

* Marina Fomicheva, Lucia Specia, and Francisco Guzmán. 2020. [Multi-Hypothesis Machine Translation Evaluation](https://www.aclweb.org/anthology/2020.acl-main.113/). In *Proceedings of ACL 2020*.

* Kosuke Takahashi, Katsuhito Sudoh and Satoshi Nakamura. 2020. [Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model](https://www.aclweb.org/anthology/2020.acl-main.327/). In *Proceedings of ACL 2020*.

* Nitika Mathur, Timothy Baldwin and Trevor Cohn. 2020. [Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics](https://www.aclweb.org/anthology/2020.acl-main.448/). In *Proceedings of ACL 2020*.

* Markus Freitag, David Grangier, and Isaac Caswell. 2020. [BLEU might be Guilty but References are not Innocent](https://www.aclweb.org/anthology/2020.emnlp-main.5/). In *Proceedings of EMNLP 2020*.

* Yvette Graham, Barry Haddow, and Philipp Koehn. 2020. [Statistical Power and Translationese in Machine Translation Evaluation](https://www.aclweb.org/anthology/2020.emnlp-main.6/). In *Proceedings of EMNLP 2020*.

* Brian Thompson and Matt Post. 2020. [Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing](https://www.aclweb.org/anthology/2020.emnlp-main.8/). In *Proceedings of EMNLP 2020*.

* Ricardo Rei, Craig Stewart, Ana C Farinha, and Alon Lavie. 2020. [COMET: A Neural Framework for MT Evaluation](https://www.aclweb.org/anthology/2020.emnlp-main.213/). In *Proceedings of EMNLP 2020*.

## Neural Machine Translation

### Model Architecture


* Nal Kalchbrenner and Phil Blunsom. 2013. [Recurrent Continuous Translation Models](http://aclweb.org/anthology/D13-1176). In *Proceedings of EMNLP 2013*. ([Citation](https://scholar.google.com/scholar?cites=14122455772200752032&as_sdt=2005&sciodt=0,5&hl=en): 623)

* Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. [Sequence to Sequence Learning with Neural Networks](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf). In *Proceedings of NIPS 2014*. ([Citation](https://scholar.google.com/scholar?cites=13133880703797056141&as_sdt=2005&sciodt=0,5&hl=en): 5,452)

* Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com/scholar?cites=9430221802571417838&as_sdt=2005&sciodt=0,5&hl=en): 5,596)

* Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. [Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation](https://arxiv.org/pdf/1609.08144). *arXiv:1609.08144*. ([Citation](https://scholar.google.com/scholar?cites=17018428530559089870&as_sdt=2005&sciodt=0,5&hl=en): 1,046)

* Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, and Wei Xu. 2016. [Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation](http://aclweb.org/anthology/Q16-1027). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=2319930273054317494&as_sdt=2005&sciodt=0,5&hl=en): 73)

* Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. [Incorporating Copying Mechanism in Sequence-to-Sequence Learning](http://aclweb.org/anthology/P16-1154). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=6836221883265474919&as_sdt=2005&sciodt=0,5&hl=en): 254)

* Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. 2016. [Variational Neural Machine Translation](http://aclweb.org/anthology/D16-1050). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?cites=16453011540088245227&as_sdt=2005&sciodt=0,5&hl=en): 38)

* Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. 2016. [Neural Machine Translation in Linear Time](https://arxiv.org/pdf/1610.10099). *arXiv:1610.10099*. ([Citation](https://scholar.google.com/scholar?cites=13142156854384740601&as_sdt=5,39&sciodt=0,39&hl=en): 189)

* Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. [Convolutional Sequence to Sequence Learning](https://arxiv.org/pdf/1705.03122.pdf). In *Proceedings of ICML 2017*. ([Citation](https://scholar.google.com/scholar?cites=9032432574575787905&as_sdt=2005&sciodt=0,5&hl=en): 453)

* Jonas Gehring, Michael Auli, David Grangier, and Yann Dauphin. 2017. [A Convolutional Encoder Model for Neural Machine Translation](http://aclweb.org/anthology/P17-1012). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=13078160224216368728&as_sdt=2005&sciodt=0,5&hl=en): 85)

* Mingxuan Wang, Zhengdong Lu, Jie Zhou, and Qun Liu. 2017. [Deep Neural Machine Translation with Linear Associative Unit](http://aclweb.org/anthology/P17-1013). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=13710779557836853910&as_sdt=2005&sciodt=0,5&hl=en): 21)

* Matthias Sperber, Graham Neubig, Jan Niehues, and Alex Waibel. 2017. [Neural Lattice-to-Sequence Models for Uncertain Inputs](http://aclweb.org/anthology/D17-1145). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=6601112324222176825&as_sdt=2005&sciodt=0,5&hl=en): 11)

* Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc Le. 2017. [Massive Exploration of Neural Machine Translation Architectures](http://aclweb.org/anthology/D17-1151). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=17797498583666145091&as_sdt=2005&sciodt=0,5&hl=en): 114)

* Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. [Attention is All You Need](https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com/scholar?cites=2960712678066186980&as_sdt=2005&sciodt=0,5&hl=en): 1,748)

* Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2017. [Deliberation Networks: Sequence Generation Beyond One-Pass Decoding](https://papers.nips.cc/paper/6775-deliberation-networks-sequence-generation-beyond-one-pass-decoding.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=5359968740795634948): 38)

* Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, and Hang Li. 2017. [Neural machine translation with reconstruction](https://arxiv.org/pdf/1611.01874). In *Proceedings of AAAI 2017*. ([Citation](https://scholar.google.com/scholar?cites=1310099558617172101&as_sdt=2005&sciodt=0,5&hl=en): 75)

* Lukasz Kaiser, Aidan N. Gomez, and Francois Chollet. 2018. [Depthwise Separable Convolutions for Neural Machine Translation](https://openreview.net/pdf?id=S1jBcueAb). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com/scholar?cites=7520360878420709403&as_sdt=2005&sciodt=0,5&hl=en): 27)

* Yanyao Shen, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu. 2018. [Dense Information Flow for Neural Machine Translation](http://aclweb.org/anthology/N18-1117). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=12417301759540220817&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, and Ming Zhou. 2018. [Generative Bridging Network for Neural Sequence Prediction](http://aclweb.org/anthology/N18-1154). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=16479416225427738693): 3)

* Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Macduff Hughes. 2018. [The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation](http://aclweb.org/anthology/P18-1008). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=1960239321427735403&as_sdt=2005&sciodt=0,5&hl=en): 22)

* Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. 2018. [Neural Hidden Markov Model for Machine Translation](http://aclweb.org/anthology/P18-2060). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=13737032050194395214&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Jingjing Gong, Xipeng Qiu, Shaojing Wang, and Xuanjing Huang. 2018. [Information Aggregation via Dynamic Routing for Sequence Encoding](http://aclweb.org/anthology/C18-1232). In *Proceedings of COLING 2018*.

* Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li, and Jingbo Zhu. 2018. [Multi-layer Representation Fusion for Neural Machine Translation](http://aclweb.org/anthology/C18-1255). In *Proceedings of COLING 2018*. 

* Yachao Li, Junhui Li, and Min Zhang. 2018. [Adaptive Weighting for Neural Machine Translation](http://aclweb.org/anthology/C18-1257). In *Proceedings of COLING 2018*. 

* Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, and Tie-Yan Liu. 2018. [Double Path Networks for Sequence to Sequence Learning](http://aclweb.org/anthology/C18-1259). In *Proceedings of COLING 2018*. 

* Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Shuming Shi, and Tong Zhang. 2018. [Exploiting Deep Representations for Neural Machine Translation](http://aclweb.org/anthology/D18-1457). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=8760242283445305561&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Biao Zhang, Deyi Xiong, Jinsong Su, Qian Lin, and Huiji Zhang. 2018. [Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks](http://aclweb.org/anthology/D18-1459). In *Proceedings of EMNLP 2018*. 

* Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. [Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures](http://aclweb.org/anthology/D18-1458). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=8994080673363827758&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Ke Tran, Arianna Bisazza, and Christof Monz. 2018. [The Importance of Being Recurrent for Modeling Hierarchical Structure](http://aclweb.org/anthology/D18-1503). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=16387948292048936516&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Parnia Bahar, Christopher Brix, and Hermann Ney. 2018. [Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation](http://aclweb.org/anthology/D18-1335). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=4611047151878523903&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, and Tie-Yan Liu. 2018. [Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation](http://papers.nips.cc/paper/8019-layer-wise-coordination-between-encoder-and-decoder-for-neural-machine-translation.pdf). In *Proceedings of NeurIPS 2018*. ([Citation](https://scholar.google.com/scholar?cites=14258883426797488339&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Harshil Shah and David Barber. 2018. [Generative Neural Machine Translation](http://papers.nips.cc/paper/7409-generative-neural-machine-translation.pdf). In *Proceedings of NeurIPS 2018*. 

* Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, and Ming Zhou. 2018. [Achieving Human Parity on Automatic Chinese to English News Translation](https://www.microsoft.com/en-us/research/uploads/prod/2018/03/final-achieving-human.pdf). Technical report. Microsoft AI & Research. ([Citation](https://scholar.google.com/scholar?cites=3670312788898741170&as_sdt=2005&sciodt=0,5&hl=en): 41)

* Yikang Shen, Shawn Tan, Alessandro Sordoni, and Aaron Courville. 2019. [Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks](https://openreview.net/pdf?id=B1l6qiR5F7). In *Proceedings of ICLR 2019*.

* Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, and Michael Auli. 2019. [Pay Less Attention with Lightweight and Dynamic Convolutions](https://openreview.net/pdf?id=SkVhlh09tX). In *Proceedings of ICLR 2019*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=3358231780148394025): 1)

* Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser. 2019. [Universal Transformers](https://openreview.net/pdf?id=HyzdRiR9Y7). In *Proceedings of ICLR 2019*. ([Citation](https://scholar.google.com/scholar?cites=8443376534582904234&as_sdt=2005&sciodt=0,5&hl=en): 12)

* Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Longyue Wang, Shuming Shi, and Tong Zhang. 2019. [Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement](https://arxiv.org/pdf/1902.05770.pdf). In *Proceedings of AAAI 2019*.

* Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/pdf/1901.02860). In *Proceedings of ACL 2019*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=7150055013029036741): 8) 

* Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, and Zheng Zhang. 2019. [Star-Transformer](https://arxiv.org/pdf/1902.09113.pdf). In *Proceedings of NAACL 2019*.

* Sho Takase and Naoaki Okazaki. 2019. [Positional Encoding to Control Output Sequence Length](https://arxiv.org/pdf/1904.07418.pdf). In *Proceedings of NAACL 2019*. 

* Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, and Zhaopeng Tu. 2019. [Information Aggregation for Multi-Head Attention with Routing-by-Agreement](https://arxiv.org/pdf/1904.03100.pdf). In *Proceedings of NAACL 2019*.

* Baosong Yang, Longyue Wang, Derek Wong, Lidia S. Chao, and Zhaopeng Tu. 2019. [Convolutional Self-Attention Networks](https://arxiv.org/pdf/1904.03107.pdf). In *Proceedings of NAACL 2019*.

* Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, and Zhaopeng Tu. 2019. [Modeling Recurrence for Transformer](https://arxiv.org/pdf/1904.03092.pdf). In *Proceedings of NAACL 2019*.

* Nikolaos Pappas and James Henderson. 2019. [Deep Residual Output Layers for Neural Language Generation](https://arxiv.org/pdf/1905.05513.pdf). In *Proceedings of ICML 2019*.

* David R. So, Chen Liang, and Quoc V. Le. 2019. [The Evolved Transformer](https://arxiv.org/pdf/1901.11117). In *Proceedings of ICML 2019*.

* Ben Peters, Vlad Niculae, and André F.T. Martins. 2019. [Sparse Sequence-to-Sequence Models](https://arxiv.org/pdf/1905.05702). In *Proceedings of ACL 2019*.

* Roberto Dessì and Marco Baroni. 2019. [CNNs found to jump around more skillfully than RNNs: Compositional generalization in seq2seq convolutional networks](https://arxiv.org/pdf/1905.08527). In *Proceedings of ACL 2019*.

* Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, and Armand Joulin. 2019. [Adaptive Attention Span in Transformers](https://arxiv.org/pdf/1905.07799). In *Proceedings of ACL 2019*.

* Yi Tay, Aston Zhang, Luu Anh Tuan, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, and Siu Cheung Hui. 2019. [Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks](https://arxiv.org/pdf/1906.04393). In *Proceedings of ACL 2019*. 

* Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, and Lidia S. Chao. 2019. [Learning Deep Transformer Models for Machine Translation](https://arxiv.org/pdf/1906.01787). In *Proceedings of ACL 2019*.

* Fengshun Xiao, Jiangtong Li, Hai Zhao, Rui Wang, and Kehai Chen. 2019. [Lattice-Based Transformer Encoder for Neural Machine Translation](https://arxiv.org/pdf/1906.01282). In *Proceedings of ACL 2019*.

* Matthias Sperber, Graham Neubig, Ngoc-Quan Pham, and Alex Waibel. 2019. [Self-Attentional Models for Lattice Inputs](https://arxiv.org/pdf/1906.01617). In *Proceedings of ACL 2019*.

* Xing Wang, Zhaopeng Tu, Longyue Wang, and Shuming Shi. 2019. [Exploiting Sentential Context for Neural Machine Translation](https://arxiv.org/pdf/1906.01268). In *Proceedings of ACL 2019*.

* Kris Korrel, Dieuwke Hupkes, Verna Dankers, and Elia Bruni. 2019. [Transcoding compositionally: using attention to find more generalizable solutions](https://arxiv.org/pdf/1906.01234). In *Proceedings of ACL 2019*.

* Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2019. [Depth Growing for Neural Machine Translation](https://arxiv.org/pdf/1907.01968). In *Proceedings of ACL 2019*.

* Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2019. [Deep Equilibrium Models](https://arxiv.org/pdf/1909.01377.pdf). In *Proceedings of NeurIPS 2019*.

* Gonçalo M. Correia, Vlad Niculae, and André F.T. Martins. 2019. [Adaptively Sparse Transformers](https://arxiv.org/pdf/1909.00015). In *Proceedings of EMNLP 2019*.

* Yau-Shian Wang, Hung-Yi Lee, and Yun-Nung Chen. 2019. [Tree Transformer: Integrating Tree Structures into Self-Attention](https://arxiv.org/pdf/1909.06639). In *Proceedings of EMNLP 2019*.

* Mingxuan Wang, Jun Xie, Zhixing Tan, Jinsong Su, Deyi Xiong and Lei Li. 2019. [Towards Linear Time Neural Machine Translation with Capsule Networks](https://www.aclweb.org/anthology/D19-1074.pdf). In *Proceedings of EMNLP 2019*.

* Biao Zhang, Ivan Titov and Rico Sennrich. 2019. [Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention](https://www.aclweb.org/anthology/D19-1083.pdf). In *Proceedings of EMNLP 2019*.

* Jiatao Gu, Changhan Wang, Junbo Zhao. 2019. [Levenshtein Transformer](https://papers.nips.cc/paper/9297-levenshtein-transformer). In *Proceedings of NeurIPS 2019*.

* Xin Sheng, Linli Xu, Junliang Guo, Jingchang Liu, Ruoyu Zhao and Yinlong Xu. 2020. [IntroVNMT: An Introspective Model for Variational Neural Machine Translation](https://www.aaai.org/ojs/index.php/AAAI/article/view/6411/6267). In *Proceedings of AAAI 2020*.

* Jian Li, Xing Wang, Baosong Yang, Shuming Shi, Michael R. Lyu and Zhaopeng Tu. 2020. [Neuron Interaction Based Representation Composition for Neural Machine Translation](https://arxiv.org/abs/1911.09877). In *Proceedings of AAAI 2020*.

* Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen. 2020. [Encoding word order in complex embeddings](https://openreview.net/forum?id=Hke-WTVtwr). In *Proceedings of ICLR 2020*.

* Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya. 2020. [Reformer: The Efficient Transformer](https://openreview.net/forum?id=rkgNKkHtvB). In *Proceedings of ICLR 2020*.

* Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli. 2020. [Depth-Adaptive Transformer](https://openreview.net/forum?id=SJg7KhVKPH). In *Proceedings of ICLR 2020*.

* Ofir Press, Noah A. Smith and Omer Levy. 2020. [Improving Transformer Models by Reordering their Sublayers](https://arxiv.org/abs/1911.03864). In *Proceedings of ACL 2020*.

* Yekun Chai, Shuo Jin and Xinwen Hou. 2020. [Highway Transformer: Self-Gating Enhanced Self-Attentive Networks](https://arxiv.org/abs/2004.08178). In *Proceedings of ACL 2020*.

* Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng and Weihua Luo. 2020. [Multiscale Collaborative Deep Models for Neural Machine Translation](https://arxiv.org/abs/2004.14021). In *Proceedings of ACL 2020*.

* Hendra Setiawan, Matthias Sperber, Udhyakumar Nallasamy and Matthias Paulik. 2020. [Variational Neural Machine Translation with Normalizing Flows](http://arxiv.org/abs/2005.13978). In *Proceedings of ACL 2020*.

* Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, and Cho-Jui Hsieh. 2020. [Learning to Encode Position for Transformer with Continuous Dynamical Model](https://arxiv.org/abs/2003.09229). In *Proceedings of ICML 2020*.

* Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. 2020. [On Layer Normalization in the Transformer Architecture](https://arxiv.org/abs/2002.04745). In *Proceedings of ICML 2020*. 

* Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, and Julian McAuley. 2020. [ReZero is All You Need: Fast Convergence at Large Depth](https://arxiv.org/abs/2003.04887). *arXiv:2003.04887*.

* Yongjing Yin, Fandong Meng, Jinsong Su, Chulun Zhou, Zhengyuan Yang, Jie Zhou and Jiebo Luo. 2020. [A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.273/). In *Proceedings of ACL 2020*.

* Arya D. McCarthy, Xian Li, Jiatao Gu and Ning Dong. 2020. [Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.753/). In *Proceedings of ACL 2020*.

* Yong Wang, Longyue Wang, Victor Li, Zhaopeng Tu. 2020. [On the Sparsity of Neural Machine Translation Models](https://www.aclweb.org/anthology/2020.emnlp-main.78/). In *Proceedings of EMNLP 2020*.

* Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu. 2020. [Shallow-to-Deep Training for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.72/). In *Proceedings of EMNLP 2020*.

* Jianhao Yan, Fandong Meng, Jie Zhou. 2020. [Multi-Unit Transformers for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.77/). In *Proceedings of EMNLP 2020*.

* Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo. 2020. [Uncertainty-Aware Semantic Augmentation for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.216/). In *Proceedings of EMNLP 2020*.

* Xian Li, Asa Cooper Stickland, Yuqing Tang, Xiang Kong. 2020. [Deep Transformers with Latent Depth](https://papers.nips.cc/paper/2020/file/1325cdae3b6f0f91a1b629307bf2d498-Paper.pdf). In *Proceedings of NeurIPS 2020*.

* Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 2020. [Big Bird: Transformers for Longer Sequences](https://papers.nips.cc/paper/2020/file/c8512d142a2d849725f31a9a7a361ab9-Paper.pdf). In *Proceedings of NeurIPS 2020*.

* Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu. 2020. [Token-level Adaptive Training for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.76/). In *Proceedings of EMNLP 2020*.

* Yufei Wang, Ian D. Wood, Stephen Wan, Mark Dras, and Mark Johnson. 2021. [Mention Flags (MF): Constraining Transformer-based Text Generators](https://aclanthology.org/2021.acl-long.9/). In *Proceedings of ACL 2021*.

Attention Mechanism


* Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com.sg/scholar?cites=9430221802571417838&as_sdt=2005&sciodt=0,5&hl=en): 5,596)

* Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. [Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/pdf/1508.04025). In *Proceedings of EMNLP 2015*. ([Citation](https://scholar.google.com.sg/scholar?cites=12347446836257434866&as_sdt=2005&sciodt=0,5&hl=en): 1,466)

* Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, and Kenny Q. Zhu. 2016. [Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation](https://www.aclweb.org/anthology/C16-1290). In *Proceedings of COLING 2016*. ([Citation](https://scholar.google.com/scholar?&cites=1624882767342343496&as_sdt=2005&sciodt=0,5&hl=en): 18)

* Haitao Mi, Zhiguo Wang, and Abe Ittycheriah. 2016. [Supervised Attentions for Neural Machine Translation](http://aclweb.org/anthology/D16-1249). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.sg/scholar?cites=16345118068023322142&as_sdt=2005&sciodt=0,5&hl=en): 43)

* Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. [A Structured Self-attentive Sentence Embedding](https://arxiv.org/abs/1703.03130). In *Proceedings of ICLR 2017*. ([Citation](https://scholar.google.com.sg/scholar?cites=3666844900655302515&as_sdt=2005&sciodt=0,5&hl=en): 216)

* Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. 2018. [DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding](https://arxiv.org/pdf/1709.04696.pdf). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=7311258646982866903&as_sdt=2005&sciodt=0,5&hl=en): 60)

* Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, and Chengqi Zhang. 2018. [Bi-directional Block Self-attention for Fast and Memory-efficient Sequence Modeling](https://arxiv.org/abs/1804.00857). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=7203374430207428965&as_sdt=2005&sciodt=0,5&hl=en): 13)

* Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Sen Wang, Chengqi Zhang. 2018. [Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling](https://arxiv.org/abs/1801.10296). In *Proceedings of IJCAI 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=3809241292668177959&as_sdt=2005&sciodt=0,5&hl=en): 18)

* Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. [Self-Attention with Relative Position Representations](http://aclweb.org/anthology/N18-2074). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=5563767891081728261&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Lesly Miculicich Werlen, Nikolaos Pappas, Dhananjay Ram, and Andrei Popescu-Belis. 2018. [Self-Attentive Residual Decoder for Neural Machine Translation](http://aclweb.org/anthology/N18-1124). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=10357155207431596394): 3)

* Xintong Li, Lemao Liu, Zhaopeng Tu, Shuming Shi, and Max Meng. 2018. [Target Foresight Based Attention for Neural Machine Translation](http://aclweb.org/anthology/N18-1125). In *Proceedings of NAACL 2018*. 

* Biao Zhang, Deyi Xiong, and Jinsong Su. 2018. [Accelerating Neural Transformer via an Average Attention Network](http://aclweb.org/anthology/P18-1166). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=16436039193082710776&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Tobias Domhan. 2018. [How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures](http://aclweb.org/anthology/P18-1167). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=16338550517026915979&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Shaohui Kuang, Junhui Li, António Branco, Weihua Luo, and Deyi Xiong. 2018. [Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings](http://aclweb.org/anthology/P18-1164). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=13357719581808108940&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Chaitanya Malaviya, Pedro Ferreira, and André F. T. Martins. 2018. [Sparse and Constrained Attention for Neural Machine Translation](http://aclweb.org/anthology/P18-2059). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=11257363334017043172&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, and Tong Zhang. 2018. [Multi-Head Attention with Disagreement Regularization](http://aclweb.org/anthology/D18-1317). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=4230613606718109837&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Wei Wu, Houfeng Wang, Tianyu Liu and Shuming Ma. 2018. [Phrase-level Self-Attention Networks for Universal Sentence Encoding](http://aclweb.org/anthology/D18-1408). In *Proceedings of EMNLP 2018*.

* Baosong Yang, Zhaopeng Tu, Derek F. Wong, Fandong Meng, Lidia S. Chao, and Tong Zhang. 2018. [Modeling Localness for Self-Attention Networks](https://arxiv.org/abs/1810.10182). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=16651306350908112709&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, and Qi Su. 2018. [Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation](http://aclweb.org/anthology/D18-1331). In *Proceedings of EMNLP 2018*. 

* Shiv Shankar, Siddhant Garg, and Sunita Sarawagi. 2018. [Surprisingly Easy Hard-Attention for Sequence to Sequence Learning](http://aclweb.org/anthology/D18-1065). In *Proceedings of EMNLP 2018*.

* Ankur Bapna, Mia Chen, Orhan Firat, Yuan Cao, and Yonghui Wu. 2018. [Training Deeper Neural Machine Translation Models with Transparent Attention](http://aclweb.org/anthology/D18-1338). In *Proceedings of EMNLP 2018*. 

* Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, and Pascal Poupart. 2018. [Variational Attention for Sequence-to-Sequence Models](http://aclweb.org/anthology/C18-1142). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=1653411252630135531): 14)

* Maha Elbayad, Laurent Besacier, and Jakob Verbeek. 2018. [Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction](http://aclweb.org/anthology/K18-1010). In *Proceedings of CoNLL 2018*. ([Citation](https://scholar.google.com/scholar?cites=14016975442337015010&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, and Alexander M. Rush. 2018. [Latent Alignment and Variational Attention](https://papers.nips.cc/paper/8179-latent-alignment-and-variational-attention.pdf). In *Proceedings of NeurIPS 2018*. ([Citation](https://scholar.google.com/scholar?client=safari&rls=en&oe=UTF-8&um=1&ie=UTF-8&lr&cites=6335407498429393003))

* Wenpeng Yin and Hinrich Schütze. 2019. [Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms](https://arxiv.org/pdf/1710.00519). *Transactions of the Association for Computational Linguistics*.

* Shiv Shankar and Sunita Sarawagi. 2019. [Posterior Attention Models for Sequence to Sequence Learning](https://openreview.net/pdf?id=BkltNhC9FX). In *Proceedings of ICLR 2019*.

* Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, and Zhaopeng Tu. 2019. [Context-Aware Self-Attention Networks](https://arxiv.org/pdf/1902.05766.pdf). In *Proceedings of AAAI 2019*.

* Reza Ghaeini, Xiaoli Z. Fern, Hamed Shahbazi, and Prasad Tadepalli. 2019. [Saliency Learning: Teaching the Model Where to Pay Attention](https://arxiv.org/pdf/1902.08649.pdf). In *Proceedings of NAACL 2019*.

* Sameen Maruf, André F. T. Martins, and Gholamreza Haffari. 2019. [Selective Attention for Context-aware Neural Machine Translation](https://arxiv.org/pdf/1903.08788.pdf). In *Proceedings of NAACL 2019*.

* Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, and Armand Joulin. 2019. [Adaptive Attention Span in Transformers](https://arxiv.org/pdf/1905.07799). In *Proceedings of ACL 2019*.

* Kris Korrel, Dieuwke Hupkes, Verna Dankers, and Elia Bruni. 2019. [Transcoding compositionally: using attention to find more generalizable solutions](https://arxiv.org/pdf/1906.01234). In *Proceedings of ACL 2019*.

* Jesse Vig. 2019. [A Multiscale Visualization of Attention in the Transformer Model](https://arxiv.org/pdf/1906.05714). In *Proceedings of ACL 2019*.

* Sathish Reddy Indurthi, Insoo Chung, and Sangha Kim. 2019. [Look Harder: A Neural Machine Translation Model with Hard Attention](https://www.aclweb.org/anthology/P19-1290). In *Proceedings of ACL 2019*.

* Mingzhou Xu, Derek F. Wong, Baosong Yang, Yue Zhang, and Lidia S. Chao. 2019. [Leveraging Local and Global Patterns for Self-Attention Networks](https://www.aclweb.org/anthology/P19-1295). In *Proceedings of ACL 2019*.

* Sarthak Jain and Byron C. Wallace. 2019. [Attention is not Explanation](https://arxiv.org/pdf/1902.10186.pdf). In *Proceedings of NAACL 2019*.

* Sarah Wiegreffe and Yuval Pinter. 2019. [Attention is not not Explanation](https://arxiv.org/pdf/1908.04626). In *Proceedings of EMNLP 2019*.

* Xing Wang, Zhaopeng Tu, Longyue Wang, and Shuming Shi. 2019. [Self-Attention with Structural Position Representations](https://arxiv.org/pdf/1909.00383). In *Proceedings of EMNLP 2019*.

* Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. [Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel](https://arxiv.org/pdf/1908.11775). In *Proceedings of EMNLP 2019*.

* Kehai Chen, Rui Wang, Masao Utiyama and Eiichiro Sumita. 2019. [Recurrent Position Embedding for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1139/). In *Proceedings of EMNLP 2019*.

* Weiqiu You, Simeng Sun and Mohit Iyyer. 2020. [Hard-Coded Gaussian Attention for Neural Machine Translation](https://arxiv.org/abs/2005.00742). In *Proceedings of ACL 2020*.

* Emanuele Bugliarello and Naoaki Okazaki. 2020. [Enhancing Machine Translation with Dependency-Aware Self-Attention](http://arxiv.org/abs/1909.03149). In *Proceedings of ACL 2020*.

* Yann Dubois, Gautier Dagan, Dieuwke Hupkes, Elia Bruni. 2020. [Location Attention for Extrapolation to Longer Sequences](https://www.aclweb.org/anthology/2020.acl-main.39/). In *Proceedings of ACL 2020*.

* Michael Hahn. 2020. [Theoretical Limitations of Self-Attention in Neural Sequence Models](https://transacl.org/ojs/index.php/tacl/article/view/1815). *Transactions of the Association for Computational Linguistics*.

* Apoorv Vyas, Angelos Katharopoulos, François Fleuret. 2020. [Fast Transformers with Clustered Attention](https://papers.nips.cc/paper/2020/file/f6a8dd1c954c8506aadc764cc32b895e-Paper.pdf). In *Proceedings of NeurIPS 2020*. 

* Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar. 2020. [O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers](https://papers.nips.cc/paper/2020/file/9ed27554c893b5bad850a422c3538c15-Paper.pdf). In *Proceedings of NeurIPS 2020*.

* Yu Lu, Jiali Zeng, Jiajun Zhang, Shuangzhi Wu, and Mu Li. 2021. [Attention Calibration for Transformer in Neural Machine Translation](https://aclanthology.org/2021.acl-long.103.pdf). In *Proceedings of ACL 2021*.

Open Vocabulary


* Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, and Yoshua Bengio. 2015. [Embedding Word Similarity with Neural Machine Translation](https://arxiv.org/pdf/1412.6448.pdf). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com.hk/scholar?cites=3941248209566557946&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Thang Luong, Ilya Sutskever, Quoc Le, Oriol Vinyals, and Wojciech Zaremba. 2015. [Addressing the Rare Word Problem in Neural Machine Translation](http://aclweb.org/anthology/P15-1002). In *Proceedings of ACL 2015*. ([Citation](https://scholar.google.com.hk/scholar?cites=1855379039969159341&as_sdt=2005&sciodt=0,5&hl=en): 367)

* Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. [On Using Very Large Target Vocabulary for Neural Machine Translation](http://www.aclweb.org/anthology/P15-1001). In *Proceedings of ACL 2015*. ([Citation](https://scholar.google.com.hk/scholar?cites=13222564911222792417&as_sdt=2005&sciodt=0,5&hl=en): 455)

* Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. [Neural Machine Translation of Rare Words with Subword Units](https://arxiv.org/pdf/1508.07909.pdf). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=1307964014330144942&as_sdt=2005&sciodt=0,5&hl=en): 795)

* Minh-Thang Luong and Christopher D. Manning. 2016. [Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models](http://aclweb.org/anthology/P16-1100). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=7652846715026310814&as_sdt=2005&sciodt=0,5&hl=en): 173)

* Junyoung Chung, Kyunghyun Cho, and Yoshua Bengio. 2016. [A Character-level Decoder without Explicit Segmentation for Neural Machine Translation](http://aclweb.org/anthology/P16-1160). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=2193535701900882329&as_sdt=2005&sciodt=0,5&hl=en): 171)

* Jason Lee, Kyunghyun Cho, and Thomas Hofmann. 2017. [Fully Character-Level Neural Machine Translation without Explicit Segmentation](http://aclweb.org/anthology/Q17-1026). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=13463489320810094413&as_sdt=2005&sciodt=0,5&hl=en): 116)

* Yang Feng, Shiyue Zhang, Andi Zhang, Dong Wang, and Andrew Abel. 2017. [Memory-augmented Neural Machine Translation](http://aclweb.org/anthology/D17-1146). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=825727884820810695&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, and Jingbo Zhu. 2017. [Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation](http://aclweb.org/anthology/D17-1150). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=18313642653606285813&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Peyman Passban, Qun Liu, and Andy Way. 2018. [Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation](http://aclweb.org/anthology/N18-1006). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=13968879243228181963&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Huadong Chen, Shujian Huang, David Chiang, Xinyu Dai, and Jiajun Chen. 2018. [Combining Character and Word Information in Neural Machine Translation Using a Multi-Level Attention](http://aclweb.org/anthology/N18-1116). In *Proceedings of NAACL 2018*. 

* Frederick Liu, Han Lu, and Graham Neubig. 2018. [Handling Homographs in Neural Machine Translation](http://aclweb.org/anthology/N18-1121). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=8530214186708420865&as_sdt=2005&sciodt=0,5&hl=en): 8)

* Taku Kudo. 2018. [Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates](http://aclweb.org/anthology/P18-1007). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=10996996628614665108&as_sdt=2005&sciodt=0,5&hl=en): 17)

* Makoto Morishita, Jun Suzuki, and Masaaki Nagata. 2018. [Improving Neural Machine Translation by Incorporating Hierarchical Subword Features](http://aclweb.org/anthology/C18-1052). In *Proceedings of COLING 2018*. 

* Yang Zhao, Jiajun Zhang, Zhongjun He, Chengqing Zong, and Hua Wu. 2018. [Addressing Troublesome Words in Neural Machine Translation](http://aclweb.org/anthology/D18-1036). In *Proceedings of EMNLP 2018*. 

* Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, and Wolfgang Macherey. 2018. [Revisiting Character-Based Neural Machine Translation with Capacity and Compression](http://aclweb.org/anthology/D18-1461). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=1263295983934592415&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Rebecca Knowles and Philipp Koehn. 2018. [Context and Copying in Neural Machine Translation](http://aclweb.org/anthology/D18-1339). In *Proceedings of EMNLP 2018*.

* Matthias Huck, Viktor Hangya, and Alexander Fraser. 2019. [Better OOV Translation with Bilingual Terminology Mining](https://www.aclweb.org/anthology/P19-1581). In *Proceedings of ACL 2019*.

* Changhan Wang, Kyunghyun Cho, Jiatao Gu. 2020. [Neural Machine Translation with Byte-Level Subwords](https://arxiv.org/abs/1909.03341). In *Proceedings of AAAI 2020*.

* Duygu Ataman, Wilker Aziz, Alexandra Birch. 2020. [A Latent Morphology Model for Open-Vocabulary Neural Machine Translation](https://openreview.net/forum?id=BJxSI1SKDH). In *Proceedings of ICLR 2020*.

* Xuanli He, Gholamreza Haffari and Mohammad Norouzi. 2020. [Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation](https://arxiv.org/abs/2005.06606). In *Proceedings of ACL 2020*.

* Yingqiang Gao, Nikola I. Nikolov, Yuhuang Hu and Richard H.R. Hahnloser. 2020. [Character-Level Translation with Self-attention](https://arxiv.org/abs/2004.14788). In *Proceedings of ACL 2020*.

* Ivan Provilkov, Dmitrii Emelianenko, Elena Voita. 2020. [BPE-Dropout: Simple and Effective Subword Regularization](https://arxiv.org/abs/1910.13267). In *Proceedings of ACL 2020*.

* Jindřich Libovický, Alexander Fraser. 2020. [Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems](https://www.aclweb.org/anthology/2020.emnlp-main.203/). In *Proceedings of EMNLP 2020*.

### Training Objectives and Frameworks


* Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. [Sequence Level Training with Recurrent Neural Networks](https://arxiv.org/pdf/1511.06732). In *Proceedings of ICLR 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=4877899442083611721&as_sdt=2005&sciodt=0,5&hl=en): 373)  

* Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. [Multi-task Sequence to Sequence Learning](https://arxiv.org/pdf/1511.06114). In *Proceedings of ICLR 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=6045967109711129604&as_sdt=2005&sciodt=0,5&hl=en): 282) 

* Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. [Minimum Risk Training for Neural Machine Translation](http://aclweb.org/anthology/P16-1159). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=13568140432319924245&as_sdt=2005&sciodt=0,5&hl=en): 184)   

* Sam Wiseman and Alexander M. Rush. 2016. [Sequence-to-Sequence Learning as Beam-Search Optimization](http://aclweb.org/anthology/D16-1137). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=8919612243620131744&as_sdt=2005&sciodt=0,5&hl=en): 141)     

* Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma. 2016. [Dual Learning for Machine Translation](https://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf). In *Proceedings of NIPS 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=15841765927830550600&as_sdt=2005&sciodt=0,5&hl=en): 138)  

* Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. [An Actor-Critic Algorithm for Sequence Prediction](https://arxiv.org/pdf/1607.07086). In *Proceedings of ICLR 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=5228204938243984917&as_sdt=2005&sciodt=0,5&hl=en): 167)   

* Julia Kreutzer, Artem Sokolov, Stefan Riezler. 2017. [Bandit Structured Prediction for Neural Sequence-to-Sequence Learning](http://aclweb.org/anthology/P17-1138). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2303245646235792457,8131913197545815057): 11) 

* Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, and Tie-Yan Liu. 2017. [Dual Supervised Learning](https://arxiv.org/pdf/1707.00415.pdf). In *Proceedings of ICML 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=17907972833117899731): 29)  

* Yingce Xia, Jiang Bian, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2017. [Dual Inference for Machine Learning](https://www.ijcai.org/proceedings/2017/0434.pdf). In *Proceedings of IJCAI 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=15405750739898389436): 9)

* Di He, Hanqing Lu, Yingce Xia, Tao Qin, Liwei Wang, and Tieyan Liu. 2017. [Decoding with Value Networks for Neural Machine Translation](http://papers.nips.cc/paper/6622-decoding-with-value-networks-for-neural-machine-translation.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=9924066051536654397&as_sdt=2005&sciodt=0,5&hl=en): 11)

* Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato. 2018. [Classical Structured Prediction Losses for Sequence to Sequence Learning](http://aclweb.org/anthology/N18-1033). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=7858632228846408271&as_sdt=2005&sciodt=0,5&hl=en): 20)

* Zihang Dai, Qizhe Xie, and Eduard Hovy. 2018. [From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction](http://aclweb.org/anthology/P18-1155). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?hl=zh-CN&as_sdt=0,5&sciodt=0,5&cites=73472736706758753&scipsc=): 1)

* Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. [Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets](http://aclweb.org/anthology/N18-1122). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=14312548252804187966&as_sdt=2005&sciodt=0,5&hl=en): 43) 

* Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc Le. 2018. [Semi-Supervised Sequence Modeling with Cross-View Training](http://aclweb.org/anthology/D18-1217). In *Proceedings of EMNLP 2018*.

* Lijun Wu, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2018. [A Study of Reinforcement Learning for Neural Machine Translation](http://aclweb.org/anthology/D18-1397). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9706797919793848294&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Semih Yavuz, Chung-Cheng Chiu, Patrick Nguyen, and Yonghui Wu. 2018. [CaLcs: Continuously Approximating Longest Common Subsequence for Sequence Level Optimization](http://aclweb.org/anthology/D18-1406). In *Proceedings of EMNLP 2018*.

* Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2018. [Learning to Teach with Dynamic Loss Functions](https://papers.nips.cc/paper/7882-learning-to-teach-with-dynamic-loss-functions.pdf). In *Proceedings of NeurIPS 2018*.

* Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019. [Multi-Agent Dual Learning](https://openreview.net/pdf?id=HyGhN2A5tm). In *Proceedings of ICLR 2019*.

* Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, and Lawrence Carin. 2019. [Improving Sequence-to-Sequence Learning via Optimal Transport](https://openreview.net/pdf?id=S1xtAjR5tX). In *Proceedings of ICLR 2019*.

* Sachin Kumar and Yulia Tsvetkov. 2019. [Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs](https://openreview.net/pdf?id=rJlDnoA5Y7). In *Proceedings of ICLR 2019*. 

* Xing Niu, Weijia Xu, and Marine Carpuat. 2019. [Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation](https://arxiv.org/pdf/1811.01116.pdf). In *Proceedings of NAACL 2019*.

* Weijia Xu, Xing Niu, and Marine Carpuat. 2019. [Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation](https://arxiv.org/pdf/1904.04079.pdf). In *Proceedings of NAACL 2019*.

* Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, and Massimo Piccardi. 2019. [ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems](https://arxiv.org/pdf/1904.02461.pdf). In *Proceedings of NAACL 2019*.

* Reuben Cohn-Gordon and Noah Goodman. 2019. [Lost in Machine Translation: A Method to Reduce Meaning Loss](https://arxiv.org/pdf/1902.09514.pdf). In *Proceedings of NAACL 2019*.

* Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, Barnabas Poczos, and Tom M. Mitchell. 2019. [Competence-based Curriculum Learning for Neural Machine Translation](https://arxiv.org/pdf/1903.09848.pdf). In *Proceedings of NAACL 2019*.

* Gaurav Kumar, George Foster, Colin Cherry, and Maxim Krikun. 2019. [Reinforcement Learning based Curriculum Optimization for Neural Machine Translation](https://arxiv.org/pdf/1903.00041.pdf). In *Proceedings of NAACL 2019*.

* Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit. 2019. [Insertion Transformer: Flexible Sequence Generation via Insertion Operations](http://proceedings.mlr.press/v97/stern19a/stern19a.pdf). In *Proceedings of ICML 2019*.

* Laura Jehl, Carolin Lawrence, and Stefan Riezler. 2019. [Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss](https://arxiv.org/pdf/1907.03748). *Transactions of the Association for Computational Linguistics*.

* Motoki Sato, Jun Suzuki, and Shun Kiyono. 2019. [Effective Adversarial Regularization for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1020). In *Proceedings of ACL 2019*.

* Kehai Chen, Rui Wang, Masao Utiyama, and Eiichiro Sumita. 2019. [Neural Machine Translation with Reordering Embeddings](https://www.aclweb.org/anthology/P19-1174). In *Proceedings of ACL 2019*.

* Bram Bulte and Arda Tezcan. 2019. [Neural Fuzzy Repair: Integrating Fuzzy Matches into Neural Machine Translation](https://www.aclweb.org/anthology/P19-1175). In *Proceedings of ACL 2019*.

* Mingming Yang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Min Zhang, and Tiejun Zhao. 2019. [Sentence-Level Agreement for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1296). In *Proceedings of ACL 2019*.

* Wen Zhang, Yang Feng, Fandong Meng, Di You, and Qun Liu. 2019. [Bridging the Gap between Training and Inference for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1426). In *Proceedings of ACL 2019*.

* John Wieting, Taylor Berg-Kirkpatrick, Kevin Gimpel, and Graham Neubig. 2019. [Beyond BLEU: Training Neural Machine Translation with Semantic Similarity](https://www.aclweb.org/anthology/P19-1427). In *Proceedings of ACL 2019*.

* Zonghan Yang, Yong Cheng, Yang Liu, Maosong Sun. 2019. [Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach](https://www.aclweb.org/anthology/P19-1623). In *Proceedings of ACL 2019*.

* Kyra Yee, Nathan Ng, Yann N. Dauphin, and Michael Auli. 2019. [Simple and Effective Noisy Channel Modeling for Neural Machine Translation](https://arxiv.org/pdf/1908.05731). In *Proceedings of EMNLP 2019*.

* Sarthak Garg, Stephan Peitz, Udhyakumar Nallasamy, and Matthias Paulik. 2019. [Jointly Learning to Align and Translate with Transformer Models](https://arxiv.org/pdf/1909.02074). In *Proceedings of EMNLP 2019*.

* Tianchi Bi, Hao Xiong, Zhongjun He, Hua Wu and Haifeng Wang. 2019. [Multi-agent Learning for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1079.pdf). In *Proceedings of EMNLP 2019*.

* Zaixiang Zheng, Shujian Huang, Zhaopeng Tu, Xin-Yu Dai, and Jiajun Chen. 2019. [Dynamic Past and Future for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1086/). In *Proceedings of EMNLP 2019*.

* Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Cheng Xiang Zhai, Tie-Yan Liu. 2019. [Neural Machine Translation with Soft Prototype](https://papers.nips.cc/paper/8861-neural-machine-translation-with-soft-prototype). In *Proceedings of NeurIPS 2019*.

* Mingjun Zhao, Haijiang Wu, Di Niu and Xiaoli Wang. 2020. [Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models](https://sites.ualberta.ca/~dniu/Homepage/Publications_files/AAAI-ZhaoM.7640.pdf). In *Proceedings of AAAI 2020*.

* Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang and Dong Yu. 2020. [Modeling Fluency and Faithfulness for Diverse Neural Machine Translation](https://arxiv.org/abs/1912.00178). In *Proceedings of AAAI 2020*.

* Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend. 2020. [On the Weaknesses of Reinforcement Learning for Neural Machine Translation](https://openreview.net/forum?id=H1eCw3EKvH). In *Proceedings of ICLR 2020*.

* Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen. 2020. [Mirror-Generative Neural Machine Translation](https://openreview.net/forum?id=HkxQRTNYPH). In *Proceedings of ICLR 2020*.

* Angela Fan, Edouard Grave, Armand Joulin. 2020. [Reducing Transformer Depth on Demand with Structured Dropout](https://openreview.net/forum?id=SylO2yStDr). In *Proceedings of ICLR 2020*. 

* Yikai Zhou, Baosong Yang, Derek F. Wong, Yu Wan and Lidia S. Chao. 2020. [Uncertainty-Aware Curriculum Learning for Neural Machine Translation](https://arxiv.org/abs/1903.09848). In *Proceedings of ACL 2020*.

* Hongfei Xu, Josef van Genabith, Deyi Xiong and Qiuhui Liu. 2020. [Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change](https://arxiv.org/abs/2005.02008). In *Proceedings of ACL 2020*.

* Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong and Jingyi Zhang. 2020. [Lipschitz Constrained Parameter Initialization for Deep Transformers](https://arxiv.org/abs/1911.03179). In *Proceedings of ACL 2020*.

* Xintong Li, Lemao Liu, Rui Wang, Guoping Huang and Max Meng. 2020. [Regularized Context Gates on Transformer for Machine Translation](https://arxiv.org/abs/1908.11020). In *Proceedings of ACL 2020*.

* Sheng Shen, Zhewei Yao, Amir Gholami, Michael Mahoney, and Kurt Keutzer. 2020. [Rethinking Batch Normalization in Transformers](https://arxiv.org/abs/2003.07845). In *Proceedings of ICML 2020*.

* Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao. 2020. [Norm-Based Curriculum Learning for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.41/). In *Proceedings of ACL 2020*.

* Rongxiang Weng, Heng Yu, Xiangpeng Wei, Weihua Luo. 2020. [Towards Enhancing Faithfulness for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.212/). In *Proceedings of EMNLP 2020*. 

* Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen. 2020. [Self-Paced Learning for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.80/). In *Proceedings of EMNLP 2020*.

* Wenxiang Jiao, Xing Wang, Shilin He, Irwin King, Michael Lyu, Zhaopeng Tu. 2020. [Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.176/). In *Proceedings of EMNLP 2020*.

* Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu. 2020. [Token-level Adaptive Training for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.76/). In *Proceedings of EMNLP 2020*.

* Xiao Pan, Mingxuan Wang, Liwei Wu, Lei Li. 2021. [Contrastive Learning for Many-to-many Multilingual Neural Machine Translation](https://aclanthology.org/2021.acl-long.21/). In *Proceedings of ACL 2021*.

* Zehui Lin, Liwei Wu, Mingxuan Wang, Lei Li. 2021. [Learning Language Specific Sub-network for Multilingual Machine Translation](https://aclanthology.org/2021.acl-long.25.pdf). In *Proceedings of ACL 2021*.

### Decoding


* Mingxuan Wang, Zhengdong Lu, Hang Li, and Qun Liu. 2016. [Memory-enhanced Decoder for Neural Machine Translation](http://aclweb.org/anthology/D16-1027). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=8953099567327192144&as_sdt=5,43&sciodt=0,43&hl=en): 30) 

* Shonosuke Ishiwatari, Jingtao Yao, Shujie Liu, Mu Li, Ming Zhou, Naoki Yoshinaga, Masaru Kitsuregawa, and Weijia Jia. 2017. [Chunk-based Decoder for Neural Machine Translation](http://aclweb.org/anthology/P17-1174). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=12622466792413888553&as_sdt=5,43&sciodt=0,43&hl=en): 4) 

* Hao Zhou, Zhaopeng Tu, Shujian Huang, Xiaohua Liu, Hang Li, and Jiajun Chen. 2017. [Chunk-Based Bi-Scale Decoder for Neural Machine Translation](http://aclweb.org/anthology/P17-2092). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=15037334213705032139&as_sdt=5,43&sciodt=0,43&hl=en): 6)

* Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, and Alex Smola. 2017. [Neural Machine Translation with Recurrent Attention Modeling](http://aclweb.org/anthology/E17-2061). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=5621977008323303060&as_sdt=5,43&sciodt=0,43&hl=en): 25)

* Markus Freitag and Yaser Al-Onaizan. 2017. [Beam Search Strategies for Neural Machine Translation](http://aclweb.org/anthology/W17-3207). In *Proceedings of the First Workshop on Neural Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=9963996198070293328&as_sdt=5,43&sciodt=0,43&hl=en): 14)

* Rajen Chatterjee, Matteo Negri, Marco Turchi, Marcello Federico, Lucia Specia, and Frédéric Blain. 2017. [Guiding Neural Machine Translation Decoding with External Knowledge](http://aclweb.org/anthology/W17-4716). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=16027327382881304751&as_sdt=5,43&sciodt=0,43&hl=en): 8)

* Cong Duy Vu Hoang, Gholamreza Haffari, and Trevor Cohn. 2017. [Towards Decoding as Continuous Optimisation in Neural Machine Translation](http://aclweb.org/anthology/D17-1014). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=3256665477810901088&as_sdt=5,43&sciodt=0,43&hl=en): 4)

* Yin-Wen Chang and Michael Collins. 2017. [Source-Side Left-to-Right or Target-Side Left-to-Right? An Empirical Comparison of Two Phrase-Based Decoding Algorithms](http://aclweb.org/anthology/D17-1157). In *Proceedings of EMNLP 2017*.

* Jiatao Gu, Kyunghyun Cho, and Victor O.K. Li. 2017. [Trainable Greedy Decoding for Neural Machine Translation](http://aclweb.org/anthology/D17-1210). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=8731447567218149379&as_sdt=2005&sciodt=0,5&hl=en): 18)

* Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, and Philipp Koehn. 2017. [Neural Lattice Search for Domain Adaptation in Machine Translation](http://www.aclweb.org/anthology/I17-2004). In *Proceedings of IJCNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cluster=1478484647323458623&hl=zh-CN&as_sdt=0,5): 4)

* Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, and Noam Shazeer. 2018. [Fast Decoding in Sequence Models Using Discrete Latent Variables](https://arxiv.org/pdf/1803.03382.pdf). In *Proceedings of ICML 2018*. ([Citation](https://scholar.google.com/scholar?cites=4042994175439965815&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, and Hongji Wang. 2018. [Asynchronous Bidirectional Decoding for Neural Machine Translation](https://arxiv.org/pdf/1801.05122). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=8717464809531813198&as_sdt=2005&sciodt=0,5&hl=en): 10)

* Jiatao Gu, Daniel Jiwoong Im, and Victor O.K. Li. 2018. [Neural Machine Translation with Gumbel-Greedy Decoding](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17299/16059). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com/scholar?cites=13306026917760415053&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Philip Schulz, Wilker Aziz, and Trevor Cohn. 2018. [A Stochastic Decoder for Neural Machine Translation](http://aclweb.org/anthology/P18-1115). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=2090499795836532737&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Raphael Shu and Hideki Nakayama. 2018. [Improving Beam Search by Removing Monotonic Constraint for Neural Machine Translation](http://aclweb.org/anthology/P18-2054). In *Proceedings of ACL 2018*.

* Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su, and Qi Su. 2018. [Deconvolution-Based Global Decoding for Neural Machine Translation](http://aclweb.org/anthology/C18-1276). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=7984371866238647123&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Chunqi Wang, Ji Zhang, and Haiqing Chen. 2018. [Semi-Autoregressive Neural Machine Translation](http://aclweb.org/anthology/D18-1044). In *Proceedings of EMNLP 2018*.

* Xinwei Geng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2018. [Adaptive Multi-pass Decoder for Neural Machine Translation](http://aclweb.org/anthology/D18-1048). In *Proceedings of EMNLP 2018*.

* Wen Zhang, Liang Huang, Yang Feng, Lei Shen, and Qun Liu. 2018. [Speeding Up Neural Machine Translation Decoding by Cube Pruning](http://aclweb.org/anthology/D18-1460). In *Proceedings of EMNLP 2018*.

* Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018. [A Tree-based Decoder for Neural Machine Translation](http://aclweb.org/anthology/D18-1509). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9083843868999368969&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Chenze Shao, Xilin Chen, and Yang Feng. 2018. [Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation](http://aclweb.org/anthology/D18-1510). In *Proceedings of EMNLP 2018*.

* Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, and Hai Zhao. 2018. [Exploring Recombination for Efficient Decoding of Neural Machine Translation](http://aclweb.org/anthology/D18-1511). In *Proceedings of EMNLP 2018*.

* Jetic Gū, Hassan S. Shavarani, and Anoop Sarkar. 2018. [Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing](http://aclweb.org/anthology/D18-1037). In *Proceedings of EMNLP 2018*.

* Yilin Yang, Liang Huang, and Mingbo Ma. 2018. [Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation](http://aclweb.org/anthology/D18-1342). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=7003078853740771503&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Yun Chen, Victor O.K. Li, Kyunghyun Cho, and Samuel R. Bowman. 2018. [A Stable and Effective Learning Strategy for Trainable Greedy Decoding](http://aclweb.org/anthology/D18-1035). In *Proceedings of EMNLP 2018*.

* Wouter Kool, Herke van Hoof, and Max Welling. 2019. [Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement](http://proceedings.mlr.press/v97/kool19a/kool19a.pdf). In *Proceedings of ICML 2019*.

* Ashwin Kalyan, Peter Anderson, Stefan Lee, and Dhruv Batra. 2019. [Trainable Decoding of Sets of Sequences for Neural Sequence Models](http://proceedings.mlr.press/v97/kalyan19a/kalyan19a.pdf). In *Proceedings of ICML 2019*.  

* Eldan Cohen and Christopher Beck. 2019. [Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models](http://proceedings.mlr.press/v97/cohen19a/cohen19a.pdf). In *Proceedings of ICML 2019*. 

* Kartik Goyal, Chris Dyer, and Taylor Berg-Kirkpatrick. 2019. [An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search](https://arxiv.org/pdf/1904.06834.pdf). In *Proceedings of NAACL 2019*.

* Mingbo Ma, Renjie Zheng, and Liang Huang. 2019. [Learning to Stop in Structured Prediction for Neural Machine Translation](https://arxiv.org/pdf/1904.01032.pdf). In *Proceedings of NAACL 2019*.

* Han Fu, Chenghao Liu, and Jianling Sun. 2019. [Reference Network for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1287). In *Proceedings of ACL 2019*.

* Long Zhou, Jiajun Zhang, and Chengqing Zong. 2019. [Synchronous Bidirectional Neural Machine Translation](https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00256). *Transactions of the Association for Computational Linguistics*.

* Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, and Zhi-Hong Deng. 2019. [Fast Structured Decoding for Sequence Models](https://arxiv.org/pdf/1910.11555). In *Proceedings of NeurIPS 2019*.

* Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman and Kevin Gimpel. 2020. [ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation](http://arxiv.org/abs/2005.00850). In *Proceedings of ACL 2020*.

* Pinzhen Chen, Nikolay Bogoychev, Kenneth Heafield, and Faheem Kirefu. 2020. [Parallel Sentence Mining by Constrained Decoding](https://www.aclweb.org/anthology/2020.acl-main.152/). In *Proceedings of ACL 2020*.

* Julia Kreutzer, George Foster, Colin Cherry. 2020. [Inference Strategies for Machine Translation with Conditional Masking](https://www.aclweb.org/anthology/2020.emnlp-main.465/). In *Proceedings of EMNLP 2020*.

* Yuntian Deng, Alexander Rush. 2020. [Cascaded Text Generation with Markov Transformers](https://papers.nips.cc/paper/2020/file/01a0683665f38d8e5e567b3b15ca98bf-Paper.pdf). In *Proceedings of NeurIPS 2020*.

* Clara Meister, Ryan Cotterell, Tim Vieira. 2020. [If beam search is the answer, what was the question?](https://www.aclweb.org/anthology/2020.emnlp-main.170/). In *Proceedings of EMNLP 2020*.

* Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis. 2021. [Nearest Neighbor Machine Translation](https://openreview.net/pdf?id=7wCBOfJ8hJM). In *Proceedings of ICLR 2021*.

* Mathias Muller, Rico Sennrich. 2021. [Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation](https://aclanthology.org/2021.acl-long.22/). In *Proceedings of ACL 2021*.

* Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Meng Zhang. 2021. [Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation](https://aclanthology.org/2021.acl-long.23/). In *Proceedings of ACL 2021*.

* Yang Feng, Shuhao Gu, Dengji Guo, Zhengxin Yang, Chenze Shao. 2021. [Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation](https://aclanthology.org/2021.acl-long.223.pdf). In *Proceedings of ACL 2021*.

### Low-resource Language Translation


* Rico Sennrich and Biao Zhang. 2019. [Revisiting Low-Resource Neural Machine Translation: A Case Study](https://arxiv.org/pdf/1905.11901). In *Proceedings of ACL 2019*. 

* Danni Liu, Jan Niehues, James Cross, Francisco Guzman, Xian Li. 2021. [Improving Zero-Shot Translation by Disentangling Positional Information](https://aclanthology.org/2021.acl-long.101.pdf). In *Proceedings of ACL 2021*.

### Semi-supervised Learning


* Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. [Improving Neural Machine Translation Models with Monolingual Data](https://arxiv.org/pdf/1511.06709). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=16647011114557315277&as_sdt=2005&sciodt=0,5&hl=en): 220)

* Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. [Semi-Supervised Learning for Neural Machine Translation](http://aclweb.org/anthology/P16-1185). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=4238720597816763796&as_sdt=2005&sciodt=0,5&hl=en): 59)

* Tobias Domhan and Felix Hieber. 2017. [Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning](http://aclweb.org/anthology/D17-1158). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=3638267208501348823&as_sdt=2005&sciodt=0,5&hl=en): 11)

* Anna Currey, Antonio Valerio Miceli Barone, and Kenneth Heafield. 2017. [Copied Monolingual Data Improves Low-Resource Neural Machine Translation](http://aclweb.org/anthology/W17-4715). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=5102771697654796737&as_sdt=2005&sciodt=0,5&hl=en): 14)

* Shuo Wang, Yang Liu, Chao Wang, Huanbo Luan, and Maosong Sun. 2019. [Improving Back-Translation with Uncertainty-based Confidence Estimation](https://arxiv.org/pdf/1909.00157). In *Proceedings of EMNLP 2019*.

* Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen. 2020. [Mirror-Generative Neural Machine Translation](https://openreview.net/forum?id=HkxQRTNYPH). In *Proceedings of ICLR 2020*.

### Unsupervised Learning


* Nima Pourdamghani and Kevin Knight. 2017. [Deciphering Related Languages](http://aclweb.org/anthology/D17-1266). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=1168382888604094286&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018. [Unsupervised Neural Machine Translation](https://openreview.net/pdf?id=Sy2ogebAW). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=6109181985493123662&as_sdt=2005&sciodt=0,5&hl=en): 78)

* Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018. [Unsupervised Machine Translation Using Monolingual Corpora Only](https://openreview.net/pdf?id=rkYTTf-AZ). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=682955820897938264&as_sdt=2005&sciodt=0,5&hl=en): 78)

* Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. [Unsupervised Neural Machine Translation with Weight Sharing](http://aclweb.org/anthology/P18-1005). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=16608767535553803928&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018. [Phrase-Based & Neural Unsupervised Machine Translation](http://aclweb.org/anthology/D18-1549). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=17725098892021008539&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Iftekhar Naim, Parker Riley, and Daniel Gildea. 2018. [Feature-Based Decipherment for Machine Translation](http://aclweb.org/anthology/J18-3006). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=17725098892021008539&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Jiawei Wu, Xin Wang, and William Yang Wang. 2019. [Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation](https://arxiv.org/pdf/1904.02331.pdf). In *Proceedings of NAACL 2019*.

* Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, and Jonathan May. 2019. [Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation](https://arxiv.org/pdf/1906.05683). In *Proceedings of ACL 2019*.

* Jiaming Luo, Yuan Cao, and Regina Barzilay. 2019. [Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B](https://arxiv.org/pdf/1906.06718). In *Proceedings of ACL 2019*.

* Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, and Tie-Yan Liu. 2019. [Unsupervised Pivot Translation for Distant Languages](https://www.aclweb.org/anthology/P19-1017). In *Proceedings of ACL 2019*.

* Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2019. [An Effective Approach to Unsupervised Machine Translation](https://www.aclweb.org/anthology/P19-1019). In *Proceedings of ACL 2019*.

* Viktor Hangya and Alexander Fraser. 2019. [Unsupervised Parallel Sentence Extraction with Parallel Segment Detection Helps Machine Translation](https://www.aclweb.org/anthology/P19-1118). In *Proceedings of ACL 2019*.

* Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, and Tiejun Zhao. 2019. [Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation](https://www.aclweb.org/anthology/P19-1119). In *Proceedings of ACL 2019*.

* Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. 2019. [Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations](https://www.aclweb.org/anthology/P19-1121). In *Proceedings of ACL 2019*.

* Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2019. [Multilingual Unsupervised NMT using Shared Encoder and Language-Specific Decoders](https://www.aclweb.org/anthology/P19-1297). In *Proceedings of ACL 2019*.

* Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou and Shuai Ma. 2019. [Explicit Cross-lingual Pre-training for Unsupervised Machine Translation](https://www.aclweb.org/anthology/D19-1071.pdf). In *Proceedings of EMNLP 2019*.

* Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita and Tiejun Zhao. 2020. [Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation](https://arxiv.org/abs/2004.10171). In *Proceedings of ACL 2020*.

* Xiangyu Duan, Baijun Ji, Hao Jia, Min Tan, Min Zhang, Boxing Chen, Weihua Luo and Yue Zhang. 2020. [Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences](https://www.aclweb.org/anthology/2020.acl-main.143/). In *Proceedings of ACL 2020*.

* Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou and Shuai Ma. 2020. [A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.320/). In *Proceedings of ACL 2020*.

* Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser. 2020. [Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT](https://www.aclweb.org/anthology/2020.emnlp-main.214/). In *Proceedings of EMNLP 2020*.

* Jerin Philip, Alexandre Berard, Matthias Gallé, Laurent Besacier. 2020. [Monolingual Adapters for Zero-Shot Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.361/). In *Proceedings of EMNLP 2020*.

* Dana Ruiter, Josef van Genabith, Cristina España-Bonet. 2020. [Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.202/). In *Proceedings of EMNLP 2020*.

* Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzman, Pascale Fung, Philipp Koehn, Mona Diab. 2021. [Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data](https://aclanthology.org/2021.acl-long.66/). In *Proceedings of ACL 2021*.

Pivot-based Methods


* Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016. [Zero-Resource Translation with Multi-Lingual Neural Machine Translation](http://aclweb.org/anthology/D16-1026). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=9699063558012530354): 50)

* Hao Zheng, Yong Cheng, and Yang Liu. 2017. [Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation](http://nlp.csai.tsinghua.edu.cn/~ly/papers/ijcai2017_zh.pdf). In *Proceedings of IJCAI 2017*. ([Citation](https://scholar.google.com/scholar?cites=8742684674953684271&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Yong Cheng, Qian Yang, Yang Liu, Maosong Sun, and Wei Xu. 2017. [Joint Training for Pivot-based Neural Machine Translation](http://nlp.csai.tsinghua.edu.cn/~ly/papers/ijcai2017_cy.pdf). In *Proceedings of IJCAI 2017*. ([Citation](https://scholar.google.com/scholar?cites=11174626133676084798&as_sdt=2005&sciodt=0,5&hl=en): 11) 

* Yun Chen, Yang Liu, Yong Cheng and Victor O.K. Li. 2017. [A Teacher-Student Framework for Zero-resource Neural Machine Translation](http://aclweb.org/anthology/P17-1176). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=13349008860652038472&as_sdt=2005&sciodt=0,5&hl=en): 15)

* Yun Chen, Yang Liu, and Victor O. K. Li. 2018. [Zero-Resource Neural Machine Translation with Multi-Agent Communication Game](https://arxiv.org/pdf/1802.03116). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com/scholar?cites=13902575159717479954&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, and Shuai Ma. 2018. [Triangular Architecture for Rare Language Translation](http://aclweb.org/anthology/P18-1006). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=10337098101101097173&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, and Hermann Ney. 2019. [Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages](https://arxiv.org/pdf/1909.09524). In *Proceedings of EMNLP 2019*.

Data Augmentation Methods


* Marzieh Fadaee, Arianna Bisazza, and Christof Monz. 2017. [Data Augmentation for Low-Resource Neural Machine Translation](http://aclweb.org/anthology/P17-2090). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=6141657859614474985&as_sdt=2005&sciodt=0,5&hl=en): 26)

* Marzieh Fadaee and Christof Monz. 2018. [Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation](http://aclweb.org/anthology/D18-1040). In *Proceedings of EMNLP 2018*. 

* Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. [Understanding Back-Translation at Scale](http://aclweb.org/anthology/D18-1045). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=5388849145974890035&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. 2018. [SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation](http://aclweb.org/anthology/D18-1100). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=3839046500027819595&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, and Graham Neubig. 2019. [Generalized Data Augmentation for Low-Resource Translation](https://arxiv.org/pdf/1906.03785). In *Proceedings of ACL 2019*.

* Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xueqi Cheng, and Tie-Yan Liu. 2019. [Soft Contextual Data Augmentation for Neural Machine Translation](https://arxiv.org/pdf/1905.10523). In *Proceedings of ACL 2019*.

* Chunting Zhou, Xuezhe Ma, Junjie Hu, and Graham Neubig. 2019. [Handling Syntactic Divergence in Low-resource Machine Translation](https://arxiv.org/pdf/1909.00040). In *Proceedings of EMNLP 2019*.  

* Yuanpeng Li, Liang Zhao, Jianyu Wang, and Joel Hestness. 2019. [Compositional Generalization for Primitive Substitutions](https://arxiv.org/pdf/1910.02612). In *Proceedings of EMNLP 2019*.  

* Guanlin Li, Lemao Liu, Guoping Huang, Conghui Zhu and Tiejun Zhao. 2019. [Understanding Data Augmentation in Neural Machine Translation: Two Perspectives towards Generalization](https://www.aclweb.org/anthology/D19-1570/). In *Proceedings of EMNLP 2019*.  

* Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato and Michael Auli. 2020. [On The Evaluation of Machine Translation Systems Trained With Back-Translation](https://arxiv.org/abs/1908.05204). In *Proceedings of ACL 2020*.

* Aditya Siddhant, Ankur Bapna, Yuan Cao, Orhan Firat, Mia Chen, Sneha Kudugunta, Naveen Arivazhagan and Yonghui Wu. 2020. [Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation](https://arxiv.org/abs/2005.04816). In *Proceedings of ACL 2020*.

* Jitao Xu, Josep Crego and Jean Senellart. 2020. [Boosting Neural Machine Translation with Similar Translations](https://www.aclweb.org/anthology/2020.acl-main.144/). In *Proceedings of ACL 2020*.

* Yong Cheng, Lu Jiang, Wolfgang Macherey and Jacob Eisenstein. 2020. [AdvAug: Robust Adversarial Augmentation for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.529/). In *Proceedings of ACL 2020*.

* Benjamin Marie, Raphael Rubino and Atsushi Fujita. 2020. [Tagged Back-translation Revisited: Why Does It Really Work?](https://www.aclweb.org/anthology/2020.acl-main.532/). In *Proceedings of ACL 2020*.

* Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn. 2020. [Simulated multiple reference training improves low-resource machine translation](https://www.aclweb.org/anthology/2020.emnlp-main.7/). In *Proceedings of EMNLP 2020*.

* Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo. 2020. [Iterative Domain-Repaired Back-Translation](https://www.aclweb.org/anthology/2020.emnlp-main.474/). In *Proceedings of EMNLP 2020*.

* Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw. 2020. [Data Diversification: A Simple Strategy For Neural Machine Translation](https://papers.nips.cc/paper/2020/file/7221e5c8ec6b08ef6d3f9ff3ce6eb1d1-Paper.pdf). In *Proceedings of NeurIPS 2020*. 

* Christos Baziotis, Barry Haddow, Alexandra Birch. 2020. [Language Model Prior for Low-Resource Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.615/). In *Proceedings of EMNLP 2020*.

* Hieu Pham, Xinyi Wang, Yiming Yang, Graham Neubig. 2021. [Meta Back-Translation](https://openreview.net/pdf?id=3jjmdp7Hha). In *Proceedings of ICLR 2021*.

* M Saiful Bari, Tasnim Mohiuddin, and Shafiq Joty. 2021. [UXLA: A Robust Unsupervised Data Augmentation Framework for Zero-Resource Cross-Lingual NLP](https://aclanthology.org/2021.acl-long.154.pdf). In *Proceedings of ACL 2021*.

Data Selection Methods


* Marlies van der Wees, Arianna Bisazza and Christof Monz. 2017. [Dynamic Data Selection for Neural Machine Translation](http://aclweb.org/anthology/D17-1147). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=2308754825624963103&as_sdt=2005&sciodt=0,5&hl=en): 16)

* Wei Wang, Taro Watanabe, Macduff Hughes, Tetsuji Nakagawa, and Ciprian Chelba. 2018. [Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection](http://aclweb.org/anthology/W18-6314). In *Proceedings of the Third Conference on Machine Translation*.

* Minh Quang Pham, Josep Crego, Jean Senellart, and François Yvon. 2018. [Fixing Translation Divergences in Parallel Corpora for Neural MT](http://aclweb.org/anthology/D18-1328). In *Proceedings of EMNLP 2018*.

* Xinyi Wang and Graham Neubig. 2019. [Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation](https://arxiv.org/pdf/1905.08212). In *Proceedings of ACL 2019*.

* Wei Wang, Isaac Caswell, and Ciprian Chelba. 2019. [Dynamically Composing Domain-Data Selection with Clean-Data Selection by “Co-Curricular Learning” for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1123). In *Proceedings of ACL 2019*.

* Dana Ruiter, Cristina España-Bonet, and Josef van Genabith. 2019. [Self-Supervised Neural Machine Translation](https://www.aclweb.org/anthology/P19-1178). In *Proceedings of ACL 2019*.

* Xabier Soto, Dimitar Shterionov, Alberto Poncelas and Andy Way. 2020. [Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation](http://arxiv.org/abs/2005.00308). In *Proceedings of ACL 2020*.

* Jiawei Zhou and Phillip Keung. 2020. [Improving Non-autoregressive Neural Machine Translation with Monolingual Data](https://arxiv.org/abs/2005.00932). In *Proceedings of ACL 2020*.

* Boliang Zhang, Ajay Nagesh and Kevin Knight. 2020. [Parallel Corpus Filtering via Pre-trained Language Models](https://www.aclweb.org/anthology/2020.acl-main.756/). In *Proceedings of ACL 2020*.

* Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig. 2020. [Dynamic Data Selection and Weighting for Iterative Back-Translation](https://www.aclweb.org/anthology/2020.emnlp-main.475/). In *Proceedings of EMNLP 2020*.

* Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Shuming Shi, Michael R. Lyu, Irwin King. 2021. [Self-Training Sampling with Monolingual Data Uncertainty for Neural Machine Translation](https://aclanthology.org/2021.acl-long.221.pdf). In *Proceedings of ACL 2021*.

Transfer Learning


* Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight. 2016. [Transfer Learning for Low-Resource Neural Machine Translation](https://www.isi.edu/natural-language/mt/emnlp16-transfer.pdf). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?cites=10126416754494258051&as_sdt=2005&sciodt=0,5&hl=en): 104)

* Jiatao Gu, Hany Hassan, Jacob Devlin, and Victor O.K. Li. 2018. [Universal Neural Machine Translation for Extremely Low Resource Languages](http://aclweb.org/anthology/N18-1032). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=17858246967554922903&as_sdt=2005&sciodt=0,5&hl=en): 17)

* Tom Kocmi and Ondřej Bojar. 2018. [Trivial Transfer Learning for Low-Resource Neural Machine Translation](http://aclweb.org/anthology/W18-6325). In *Proceedings of the Third Conference on Machine Translation: Research Papers*.

* Boyuan Pan, Yazheng Yang, Hao Li, Zhou Zhao, Yueting Zhuang, Deng Cai, and Xiaofei He. 2018. [MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models](https://papers.nips.cc/paper/7848-macnet-transferring-knowledge-from-machine-comprehension-to-sequence-to-sequence-models.pdf). In *Proceedings of NeurIPS 2018*.

* Yunsu Kim, Yingbo Gao, and Hermann Ney. 2019. [Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies](https://www.aclweb.org/anthology/P19-1120). In *Proceedings of ACL 2019*.

* Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen and Weihua Luo. 2020. [Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation](https://arxiv.org/abs/1912.01214). In *Proceedings of AAAI 2020*.

* Alham Fikri Aji, Nikolay Bogoychev, Kenneth Heafield and Rico Sennrich. 2020. [In Neural Machine Translation, What Does Transfer Learning Transfer?](https://www.aclweb.org/anthology/2020.acl-main.688/). In *Proceedings of ACL 2020*.

* Mikel Artetxe, Gorka Labaka, Eneko Agirre. 2020. [Translation Artifacts in Cross-lingual Transfer Learning](https://www.aclweb.org/anthology/2020.emnlp-main.618/). In *Proceedings of EMNLP 2020*.

Meta Learning


* Jiatao Gu, Yong Wang, Yun Chen, Kyunghyun Cho, and Victor O.K. Li. 2018. [Meta-Learning for Low-Resource Neural Machine Translation](http://aclweb.org/anthology/D18-1398). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=15276484097983678999&as_sdt=2005&sciodt=0,5&hl=en): 3)    

* Rumeng Li, Xun Wang and Hong Yu. 2020. [MetaMT, a MetaLearning Method Leveraging Multiple Domain Data for Low Resource Machine Translation](https://arxiv.org/abs/1912.05467). In *Proceedings of AAAI 2020*.

Multilingual Machine Translation


* Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. [Multi-Task Learning for Multiple Language Translation](http://aclweb.org/anthology/P15-1166). In *Proceedings of ACL 2015*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=6980356795259585193): 126) 

* Orhan Firat, Kyunghyun Cho and Yoshua Bengio. 2016. [Multi-way, Multilingual Neural Machine Translation with a Shared Attention Mechanism](https://arxiv.org/pdf/1601.01073.pdf). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=1297298716616390295&as_sdt=2005&sciodt=0,5&hl=en): 146) 

* Barret Zoph and Kevin Knight. 2016. [Multi-Source Neural Translation](https://arxiv.org/pdf/1601.00710.pdf). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=9798500345837394101&as_sdt=2005&sciodt=0,5&hl=en): 87) 

* Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016. [Zero-Resource Translation with Multi-Lingual Neural Machine Translation](https://arxiv.org/pdf/1606.04164.pdf). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?cites=9699063558012530354&as_sdt=2005&sciodt=0,5&hl=en): 50)

* Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2017. [Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation](https://arxiv.org/pdf/1611.04558). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=12207392403413415154&as_sdt=2005&sciodt=0,5&hl=en): 297)

* Poorya Zaremoodi and Gholamreza Haffari. 2018. [Neural Machine Translation for Bilingually Scarce Scenarios: a Deep Multi-Task Learning Approach](http://aclweb.org/anthology/N18-1123). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=2302112873809678173&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Poorya Zaremoodi, Wray Buntine, and Gholamreza Haffari. 2018. [Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation](http://aclweb.org/anthology/P18-2104). In *Proceedings of ACL 2018*. 

* Surafel Melaku Lakew, Mauro Cettolo, and Marcello Federico. 2018. [A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation](http://aclweb.org/anthology/C18-1054). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com/scholar?cites=3404592318370335271&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Graeme Blackwood, Miguel Ballesteros, and Todd Ward. 2018. [Multilingual Neural Machine Translation with Task-Specific Attention](http://aclweb.org/anthology/C18-1263). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com/scholar?cites=2095693945870319009&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Devendra Singh Sachan and Graham Neubig. 2018. [Parameter Sharing Methods for Multilingual Self-Attentional Translation Models](http://aclweb.org/anthology/W18-6327). In *Proceedings of the Third Conference on Machine Translation: Research Papers*. 

* Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, and Tom Mitchell. 2018. [Contextual Parameter Generation for Universal Neural Machine Translation](http://aclweb.org/anthology/D18-1039). In *Proceedings of EMNLP 2018*. 

* Yining Wang, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing Zong. 2018. [Three Strategies to Improve One-to-Many Multilingual Translation](http://aclweb.org/anthology/D18-1326). In *Proceedings of EMNLP 2018*. 

* Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, and Tie-Yan Liu. 2019. [Multilingual Neural Machine Translation with Knowledge Distillation](https://openreview.net/pdf?id=S1gUsoR9YX). In *Proceedings of ICLR 2019*. 

* Xinyi Wang, Hieu Pham, Philip Arthur, and Graham Neubig. 2019. [Multilingual Neural Machine Translation With Soft Decoupled Encoding](https://openreview.net/pdf?id=Skeke3C5Fm). In *Proceedings of ICLR 2019*.

* Maruan Al-Shedivat and Ankur P. Parikh. 2019. [Consistency by Agreement in Zero-shot Neural Machine Translation](https://arxiv.org/pdf/1904.02338.pdf). In *Proceedings of NAACL 2019*.

* Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. [Massively Multilingual Neural Machine Translation](https://arxiv.org/pdf/1903.00089.pdf). In *Proceedings of NAACL 2019*. 

* Yunsu Kim, Yingbo Gao, and Hermann Ney. 2019. [Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies](https://arxiv.org/pdf/1905.05475). In *Proceedings of ACL 2019*.

* Carlos Escolano, Marta R. Costa-Jussà, and José A. R. Fonollosa. 2019. [From Bilingual to Multilingual Neural Machine Translation by Incremental Training](https://arxiv.org/pdf/1907.00735). In *Proceedings of ACL 2019*.

* Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, and Chengqing Zong. 2019. [A Compact and Language-Sensitive Multilingual Translation Method](https://www.aclweb.org/anthology/P19-1117). In *Proceedings of ACL 2019*.

* Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2019. [Multilingual Unsupervised NMT using Shared Encoder and Language-Specific Decoders](https://www.aclweb.org/anthology/P19-1297). In *Proceedings of ACL 2019*.

* Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, and Tie-Yan Liu. 2019. [Multilingual Neural Machine Translation with Language Clustering](https://arxiv.org/pdf/1908.09324). In *Proceedings of EMNLP 2019*.

* Sneha Reddy Kudugunta, Ankur Bapna, Isaac Caswell, Naveen Arivazhagan, and Orhan Firat. 2019. [Investigating Multilingual NMT Representations at Scale](https://arxiv.org/pdf/1909.02197). In *Proceedings of EMNLP 2019*.

* Ankur Bapna, Naveen Arivazhagan, and Orhan Firat. 2019. [Simple, Scalable Adaptation for Neural Machine Translation](https://arxiv.org/pdf/1909.08478). In *Proceedings of EMNLP 2019*. 

* Raj Dabre, Atsushi Fujita and Chenhui Chu. 2019. [Low-Resource Neural Machine Translation by Exploiting Multilingualism through Multi-Step Fine-Tuning Using N-way Parallel Corpora](https://www.aclweb.org/anthology/D19-1146/). In *Proceedings of EMNLP 2019*. 

* Xinyi Wang, Yulia Tsvetkov and Graham Neubig. 2020. [Balancing Training for Multilingual Neural Machine Translation](https://arxiv.org/abs/2004.06748). In *Proceedings of ACL 2020*.

* Biao Zhang, Philip Williams, Ivan Titov and Rico Sennrich. 2020. [Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation](https://arxiv.org/abs/2004.11867). In *Proceedings of ACL 2020*.

* Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita and Tiejun Zhao. 2020. [Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation](https://arxiv.org/abs/2004.10171). In *Proceedings of ACL 2020*.

* Aditya Siddhant, Ankur Bapna, Yuan Cao, Orhan Firat, Mia Chen, Sneha Kudugunta, Naveen Arivazhagan and Yonghui Wu. 2020. [Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation](https://arxiv.org/abs/2005.04816). In *Proceedings of ACL 2020*.

* Changfeng Zhu, Heng Yu, Shanbo Cheng and Weihua Luo. 2020. [Language-aware Interlingua for Multilingual Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.150/). In *Proceedings of ACL 2020*.

* Zehui Lin, Xiao Pan, Mingxuan Wang, Xipeng Qiu, Jiangtao Feng, Hao Zhou, Lei Li. 2020. [Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information](https://www.aclweb.org/anthology/2020.emnlp-main.210/). In *Proceedings of EMNLP 2020*.

* Sungwon Lyu, Bokyung Son, Kichang Yang, Jaekyoung Bae. 2020. [Revisiting Modularized Multilingual NMT to Meet Industrial Demands](https://www.aclweb.org/anthology/2020.emnlp-main.476/). In *Proceedings of EMNLP 2020*.

* Arturo Oncevay, Barry Haddow, Alexandra Birch. 2020. [Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations](https://www.aclweb.org/anthology/2020.emnlp-main.187/). In *Proceedings of EMNLP 2020*.

* Yiren Wang, ChengXiang Zhai, Hany Hassan. 2020. [Multi-task Learning for Multilingual Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.75/). In *Proceedings of EMNLP 2020*.

* Raj Dabre, Chenhui Chu, Anoop Kunchukuttan. 2020. [A Comprehensive Survey of Multilingual Neural Machine Translation](https://arxiv.org/abs/2001.01115). *arXiv:2001.01115*.

* Biao Zhang, Ankur Bapna, Rico Sennrich, Orhan Firat. 2021. [Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation](https://openreview.net/pdf?id=Wj4ODo0uyCF). In *Proceedings of ICLR 2021*.

Prior Knowledge Integration


Word/Phrase Constraints


* Wei He, Zhongjun He, Hua Wu, and Haifeng Wang. 2016. [Improved Neural Machine Translation with SMT Features](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12189/11577). In *Proceedings of AAAI 2016*. ([Citation](https://scholar.google.com/scholar?cites=11596393526530282899&as_sdt=2005&sciodt=0,5&hl=en): 46)

* Haitao Mi, Zhiguo Wang, and Abe Ittycheriah. 2016. [Vocabulary Manipulation for Neural Machine Translation](http://anthology.aclweb.org/P16-2021). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=10504291626587983597&as_sdt=2005&sciodt=0,5&hl=en): 36)

* Philip Arthur, Graham Neubig, and Satoshi Nakamura. 2016. [Incorporating Discrete Translation Lexicons into Neural Machine Translation](http://aclweb.org/anthology/D16-1162). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?cites=3629816068189607565&as_sdt=2005&sciodt=0,5&hl=en): 55)

* Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang. 2017. [Neural Machine Translation Advised by Statistical Machine Translation](https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/viewPaper/14451). In *Proceedings of AAAI 2017*. ([Citation](https://scholar.google.com/scholar?cites=9788492799819599206&as_sdt=2005&sciodt=0,5&hl=en): 34)

* Jiacheng Zhang, Yang Liu, Huanbo Luan, Jingfang Xu and Maosong Sun. 2017. [Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization](http://aclweb.org/anthology/P17-1139). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=16820322563543305280&as_sdt=2005&sciodt=0,5&hl=en): 13)

* Chris Hokamp and Qun Liu. 2017. [Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search](http://aclweb.org/anthology/P17-1141). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=3629816068189607565&as_sdt=2005&sciodt=0,5&hl=en): 19)

* Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, and Alex Smola. 2017. [Neural Machine Translation with Recurrent Attention Modeling](http://aclweb.org/anthology/E17-2061).  In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=5621977008323303060&as_sdt=2005&sciodt=0,5&hl=en): 25)

* Ofir Press and Lior Wolf. 2017. [Using the Output Embedding to Improve Language Models](http://aclweb.org/anthology/E17-2025). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=3142797974561089298&as_sdt=2005&sciodt=0,5&hl=en): 127)

* Rajen Chatterjee, Matteo Negri, Marco Turchi, Marcello Federico, Lucia Specia, and Frédéric Blain. 2017. [Guiding Neural Machine Translation Decoding with External Knowledge](http://aclweb.org/anthology/W17-4716). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=16027327382881304751&as_sdt=2005&sciodt=0,5&hl=en): 8)

* Rongxiang Weng, Shujian Huang, Zaixiang Zheng, Xinyu Dai, and Jiajun Chen. 2017. [Neural Machine Translation with Word Predictions](http://aclweb.org/anthology/D17-1013). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=9033034245087042151&as_sdt=2005&sciodt=0,5&hl=en): 8)

* Yang Feng, Shiyue Zhang, Andi Zhang, Dong Wang, and Andrew Abel. 2017. [Memory-augmented Neural Machine Translation](http://aclweb.org/anthology/D17-1146). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=825727884820810695&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Leonard Dahlmann, Evgeny Matusov, Pavel Petrushkov, and Shahram Khadivi. 2017. [Neural Machine Translation Leveraging Phrase-based Models in A Hybrid Search](http://aclweb.org/anthology/D17-1148). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=4507716603851611885&as_sdt=2005&sciodt=0,5&hl=en): 11)

* Xing Wang, Zhaopeng Tu, Deyi Xiong, and Min Zhang. 2017. [Translating Phrases in Neural Machine Translation](http://aclweb.org/anthology/D17-1149). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=13251445351500921697&as_sdt=2005&sciodt=0,5&hl=en): 15)

* Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, and Jingbo Zhu. 2017. [Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation](http://aclweb.org/anthology/D17-1150). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=18313642653606285813&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Po-Sen Huang, Chong Wang, Sitao Huang, Dengyong Zhou, and Li Deng. 2018. [Towards Neural Phrase-based Machine Translation](https://openreview.net/pdf?id=HktJec1RZ). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com/scholar?cites=14839462711165509564&as_sdt=2005&sciodt=0,5&hl=en): 15)

* Toan Nguyen and David Chiang. 2018. [Improving Lexical Choice in Neural Machine Translation](http://aclweb.org/anthology/N18-1031). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=8911122350121698073&as_sdt=2005&sciodt=0,5&hl=en): 8)

* Huadong Chen, Shujian Huang, David Chiang, Xinyu Dai, and Jiajun Chen. 2018. [Combining Character and Word Information in Neural Machine Translation Using a Multi-Level Attention](http://aclweb.org/anthology/N18-1116). In *Proceedings of NAACL 2018*. 

* Matt Post and David Vilar. 2018. [Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation](http://aclweb.org/anthology/N18-1119). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=3504623917475500888&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, and Satoshi Nakamura. 2018. [Guiding Neural Machine Translation with Retrieved Translation Pieces](http://aclweb.org/anthology/N18-1120). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=9376584188557423045&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Eva Hasler, Adrià de Gispert, Gonzalo Iglesias, and Bill Byrne. 2018. [Neural Machine Translation Decoding with Terminology Constraints](http://aclweb.org/anthology/N18-2081). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=17574582694557390759&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Nima Pourdamghani, Marjan Ghazvininejad, and Kevin Knight. 2018. [Using Word Vectors to Improve Word Alignments for Low Resource Machine Translation](http://aclweb.org/anthology/N18-2083). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=936856152380506206&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Shuming Ma, Xu Sun, Yizhong Wang, and Junyang Lin. 2018. [Bag-of-Words as Target for Neural Machine Translation](http://aclweb.org/anthology/P18-2053). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=4656961594972480096&as_sdt=2005&sciodt=0,5&hl=en): 10)

* Mingxuan Wang, Jun Xie, Zhixing Tan, Jinsong Su, Deyi Xiong, and Chao Bian. 2018. [Neural Machine Translation with Decoding-History Enhanced Attention](http://aclweb.org/anthology/C18-1124). In *Proceedings of COLING 2018*. 

* Arata Ugawa, Akihiro Tamura, Takashi Ninomiya, Hiroya Takamura, and Manabu Okumura. 2018. [Neural Machine Translation Incorporating Named Entity](http://aclweb.org/anthology/C18-1274). In *Proceedings of COLING 2018*. 

* Longyue Wang, Zhaopeng Tu, Andy Way, and Qun Liu. 2018. [Learning to Jointly Translate and Predict Dropped Pronouns with a Shared Reconstruction Mechanism](http://aclweb.org/anthology/D18-1333). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=7240636423092684747&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Qian Cao and Deyi Xiong. 2018. [Encoding Gated Translation Memory into Neural Machine Translation](http://aclweb.org/anthology/D18-1340). In *Proceedings of EMNLP 2018*.

* Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tie-Yan Liu. 2018. [FRAGE: Frequency-Agnostic Word Representation](https://arxiv.org/pdf/1809.06858). In *Proceedings of NeurIPS 2018*. ([Citation](https://scholar.google.com/scholar?cites=899516517229807927&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, and Massimo Piccardi. 2019. [ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems](https://arxiv.org/pdf/1904.02461.pdf). In *Proceedings of NAACL 2019*.

* Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, and Min Zhang. 2019. [Code-Switching for Enhancing NMT with Pre-Specified Translation](https://www.aclweb.org/anthology/N19-1044). In *Proceedings of NAACL 2019*.

* Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, and Jingbo Zhu. 2019. [Shared-Private Bilingual Word Embeddings for Neural Machine Translation](https://arxiv.org/pdf/1906.03100). In *Proceedings of ACL 2019*.

* Georgiana Dinu, Prashant Mathur, Marcello Federico, and Yaser Al-Onaizan. 2019. [Training Neural Machine Translation to Apply Terminology Constraints](https://www.aclweb.org/anthology/P19-1294). In *Proceedings of ACL 2019*.

* Longyue Wang, Zhaopeng Tu, Xing Wang and Shuming Shi. 2019. [One Model to Learn Both: Zero Pronoun Prediction and Translation](https://www.aclweb.org/anthology/D19-1085.pdf). In *Proceedings of EMNLP 2019*.

* Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, and Tie-Yan Liu. 2020. [Incorporating BERT into Neural Machine Translation](https://openreview.net/forum?id=Hyl7ygStwB). In *Proceedings of ICLR 2020*.

* Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu and Jingyi Zhang. 2020. [Learning Source Phrase Representations for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.37/). In *Proceedings of ACL 2020*.

* Raymond Hendy Susanto, Shamil Chollampatt and Liling Tan. 2020. [Lexically Constrained Neural Machine Translation with Levenshtein Transformer](https://arxiv.org/abs/2004.12681). In *Proceedings of ACL 2020*.

* Thomas Zenkel, Joern Wuebker and John DeNero. 2020. [End-to-End Neural Word Alignment Outperforms GIZA++](https://www.aclweb.org/anthology/2020.acl-main.146/). In *Proceedings of ACL 2020*.

* Liang Ding, Longyue Wang and Dacheng Tao. 2020. [Self-Attention with Cross-Lingual Position Representation](https://www.aclweb.org/anthology/2020.acl-main.153/). In *Proceedings of ACL 2020*.

* Marion Weller-Di Marco and Alexander Fraser. 2020. [Modeling Word Formation in English–German Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.389/). In *Proceedings of ACL 2020*.

* Yun Chen, Yang Liu, Guanhua Chen, Xin Jiang, and Qun Liu. 2020. [Accurate Word Alignment Induction from Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.42/). In *Proceedings of EMNLP 2020*.

* Prathyusha Jwalapuram, Shafiq Joty, and Youlin Shen. 2020. [Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses](https://www.aclweb.org/anthology/2020.emnlp-main.177/). In *Proceedings of EMNLP 2020*.

* Jingyi Zhang and Josef van Genabith. 2021. [A Bidirectional Transformer Based Alignment Model for Unsupervised Word Alignment](https://aclanthology.org/2021.acl-long.24.pdf). In *Proceedings of ACL 2021*.

Syntactic/Semantic Constraints


* Trevor Cohn, Cong Duy Vu Hoang, Ekaterina Vymolova, Kaisheng Yao, Chris Dyer, and Gholamreza Haffari. 2016. [Incorporating Structural Alignment Biases into an Attentional Neural Translation Model](https://arxiv.org/pdf/1601.01085.pdf). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=6876101136632328854&as_sdt=2005&sciodt=0,5&hl=en): 80)

* Yong Cheng, Shiqi Shen, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. [Agreement-based Joint Training for Bidirectional Attention-based Neural Machine Translation](http://nlp.csai.tsinghua.edu.cn/~ly/papers/ijcai16_agree.pdf). In *Proceedings of IJCAI 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=7726998929707665947&as_sdt=2005&sciodt=0,5&hl=en): 26)

* Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka. 2016. [Tree-to-Sequence Attentional Neural Machine Translation](http://aclweb.org/anthology/P16-1078). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=10114639659174243367&as_sdt=2005&sciodt=0,5&hl=en): 79)

* Felix Stahlberg, Eva Hasler, Aurelien Waite, and Bill Byrne. 2016. [Syntactically Guided Neural Machine Translation](http://anthology.aclweb.org/P16-2049). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=11012034683105038430): 32)

* Xing Shi, Inkit Padhi, and Kevin Knight. 2016. [Does string-based neural MT learn source syntax?](http://aclweb.org/anthology/D16-1159). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13782051589621719871): 57)

* Junhui Li, Deyi Xiong, Zhaopeng Tu, Muhua Zhu, Min Zhang, and Guodong Zhou. 2017. [Modeling Source Syntax for Neural Machine Translation](http://aclweb.org/anthology/P17-1064). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=4418568278013664001&as_sdt=2005&sciodt=0,5&hl=en): 30)

* Shuangzhi Wu, Dongdong Zhang, Nan Yang, Mu Li, and Ming Zhou. 2017. [Sequence-to-Dependency Neural Machine Translation](http://aclweb.org/anthology/P17-1065). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=13183481097489234938&as_sdt=2005&sciodt=0,5&hl=en): 19)

* Jinchao Zhang, Mingxuan Wang, Qun Liu, and Jie Zhou. 2017. [Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation](http://aclweb.org/anthology/P17-1140). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=9939097556529491198&as_sdt=2005&sciodt=0,5&hl=en): 8)

* Huadong Chen, Shujian Huang, David Chiang, and Jiajun Chen. 2017. [Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder](http://aclweb.org/anthology/P17-1177). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=17162498190462264248&as_sdt=2005&sciodt=0,5&hl=en): 32)

* Akiko Eriguchi, Yoshimasa Tsuruoka, and Kyunghyun Cho. 2017. [Learning to Parse and Translate Improves Neural Machine Translation](http://aclweb.org/anthology/P17-2012). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=17499695818526131085&as_sdt=2005&sciodt=0,5&hl=en): 29)

* Roee Aharoni and Yoav Goldberg. 2017. [Towards String-To-Tree Neural Machine Translation](http://aclweb.org/anthology/P17-2021). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=13743835036381505969&as_sdt=2005&sciodt=0,5&hl=en): 45)

* Kazuma Hashimoto and Yoshimasa Tsuruoka. 2017. [Neural Machine Translation with Source-Side Latent Graph Parsing](http://aclweb.org/anthology/D17-1012). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=2595733316497621779&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Sima'an. 2017. [Graph Convolutional Encoders for Syntax-aware Neural Machine Translation](http://aclweb.org/anthology/D17-1209). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=4876389727678322394&as_sdt=2005&sciodt=0,5&hl=en): 31)

* Kehai Chen, Rui Wang, Masao Utiyama, Lemao Liu, Akihiro Tamura, Eiichiro Sumita, and Tiejun Zhao. 2017. [Neural Machine Translation with Source Dependency Representation](http://aclweb.org/anthology/D17-1304). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=3839215870693368887&as_sdt=2005&sciodt=0,5&hl=en): 7)

* Peyman Passban, Qun Liu, and Andy Way. 2018. [Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation](http://aclweb.org/anthology/N18-1006). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=13968879243228181963&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Diego Marcheggiani, Joost Bastings, and Ivan Titov. 2018. [Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks](http://aclweb.org/anthology/N18-2078). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9319609055086898131&as_sdt=2005&sciodt=0,5&hl=en): 7)

* Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, and Eiichiro Sumita. 2018. [Forest-Based Neural Machine Translation](http://aclweb.org/anthology/P18-1116). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=8184521634220071433&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Shaohui Kuang, Junhui Li, António Branco, Weihua Luo, and Deyi Xiong. 2018. [Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings](http://aclweb.org/anthology/P18-1164). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=13357719581808108940&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Duygu Ataman and Marcello Federico. 2018. [Compositional Representation of Morphologically-Rich Input for Neural Machine Translation](http://aclweb.org/anthology/P18-2049). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=12939556873639208603&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Daniel Beck, Gholamreza Haffari, and Trevor Cohn. 2018. [Graph-to-Sequence Learning using Gated Graph Neural Networks](http://aclweb.org/anthology/P18-1026). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.au/scholar?cites=12197496840503693067&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Danielle Saunders, Felix Stahlberg, Adrià de Gispert, and Bill Byrne. 2018. [Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT](http://aclweb.org/anthology/P18-2051). In *Proceedings of ACL 2018*.

* Wen Zhang, Jiawei Hu, Yang Feng, and Qun Liu. 2018. [Refining Source Representations with Relation Networks for Neural Machine Translation](http://aclweb.org/anthology/C18-1110). In *Proceedings of COLING 2018*. 

* Poorya Zaremoodi and Gholamreza Haffari. 2018. [Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model](http://aclweb.org/anthology/C18-1120). In *Proceedings of COLING 2018*.

* Hao Zhang, Axel Ng, and Richard Sproat. 2018. [Fast and Accurate Reordering with ITG Transition RNN](http://aclweb.org/anthology/C18-1123). In *Proceedings of COLING 2018*.  

* Jetic Gū, Hassan S. Shavarani, and Anoop Sarkar. 2018. [Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing](http://aclweb.org/anthology/D18-1037). In *Proceedings of EMNLP 2018*.

* Anna Currey and Kenneth Heafield. 2018. [Multi-Source Syntactic Neural Machine Translation](http://aclweb.org/anthology/D18-1327). In *Proceedings of EMNLP 2018*.

* Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018. [A Tree-based Decoder for Neural Machine Translation](http://aclweb.org/anthology/D18-1509). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9083843868999368969&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Eliyahu Kiperwasser and Miguel Ballesteros. 2018. [Scheduled Multi-Task Learning: From Syntax to Translation](http://aclweb.org/anthology/Q18-1017). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=7224616032403591303&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Xiao Pu, Nikolaos Pappas, James Henderson, and Andrei Popescu-Belis. 2018. [Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation](https://www.aclweb.org/anthology/Q18-1044). *Transactions of the Association for Computational Linguistics*.

* Kai Song, Yue Zhang, Min Zhang, and Weihua Luo. 2018. [Improved English to Russian Translation by Neural Suffix Prediction](https://arxiv.org/pdf/1801.03615). In *Proceedings of AAAI 2018*.

* Rudra Murthy V, Anoop Kunchukuttan, and Pushpak Bhattacharyya. 2019. [Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages](https://arxiv.org/pdf/1811.00383.pdf). In *Proceedings of NAACL 2019*.

* Meishan Zhang, Zhenghua Li, Guohong Fu, and Min Zhang. 2019. [Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations](https://arxiv.org/pdf/1905.02878). In *Proceedings of NAACL 2019*.

* Linfeng Song, Daniel Gildea, Yue Zhang, Zhiguo Wang, and Jinsong Su. 2019. [Semantic Neural Machine Translation Using AMR](https://www.aclweb.org/anthology/Q19-1002). *Transactions of the Association for Computational Linguistics*.

* Nader Akoury, Kalpesh Krishna, and Mohit Iyyer. 2019. [Syntactically Supervised Transformers for Faster Neural Machine Translation](https://arxiv.org/pdf/1906.02780). In *Proceedings of ACL 2019*.

* Zhijiang Guo, Yan Zhang, Zhiyang Teng, and Wei Lu. 2019. [Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning](https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00269). *Transactions of the Association for Computational Linguistics*.

* Xuewen Yang, Yingru Liu, Dongliang Xie, Xin Wang, and Niranjan Balasubramanian. 2019. [Latent Part-of-Speech Sequences for Neural Machine Translation](https://arxiv.org/pdf/1908.11782). In *Proceedings of EMNLP 2019*.

* Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, and Zhaopeng Tu. 2019. [Multi-Granularity Self-Attention for Neural Machine Translation](https://arxiv.org/pdf/1909.02222). In *Proceedings of EMNLP 2019*.

* Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, and Zhaopeng Tu. 2019. [Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons](https://arxiv.org/pdf/1909.01562). In *Proceedings of EMNLP 2019*.

* KayYen Wong, Sameen Maruf and Gholamreza Haffari. 2020. [Contextual Neural Machine Translation Improves Translation of Cataphoric Pronouns](https://arxiv.org/abs/2004.09894). In *Proceedings of ACL 2020*.

* Emanuele Bugliarello and Naoaki Okazaki. 2020. [Enhancing Machine Translation with Dependency-Aware Self-Attention](http://arxiv.org/abs/1909.03149). In *Proceedings of ACL 2020*.

* Kehai Chen, Rui Wang, Masao Utiyama, and Eiichiro Sumita. 2020. [Content Word Aware Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.34/). In *Proceedings of ACL 2020*.

* Jian Yang, Shuming Ma, Dongdong Zhang, Zhoujun Li and Ming Zhou. 2020. [Improving Neural Machine Translation with Soft Template Prediction](https://www.aclweb.org/anthology/2020.acl-main.531/). In *Proceedings of ACL 2020*.

* R. Thomas McCoy, Robert Frank and Tal Linzen. 2020. [Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks](https://transacl.org/ojs/index.php/tacl/article/view/1892). *Transactions of the Association for Computational Linguistics*.

Coverage Constraints


* Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. [Modeling Coverage for Neural Machine Translation](http://aclweb.org/anthology/P16-1008). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=894656013823838967&as_sdt=2005&sciodt=0,5&hl=en): 236)

* Haitao Mi, Baskaran Sankaran, Zhiguo Wang, and Abe Ittycheriah. 2016. [Coverage Embedding Models for Neural Machine Translation](http://aclweb.org/anthology/D16-1096). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=10478809182142146899&as_sdt=2005&sciodt=0,5&hl=en): 59)

* Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, and Hang Li. 2017. [Context Gates for Neural Machine Translation](http://aclweb.org/anthology/Q17-1007). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=4217513324479200768&as_sdt=2005&sciodt=0,5&hl=en): 36)

* Yanyang Li, Tong Xiao, Yinqiao Li, Qiang Wang, Changming Xu, and Jingbo Zhu. 2018. [A Simple and Effective Approach to Coverage-Aware Neural Machine Translation](http://aclweb.org/anthology/P18-2047). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9588245142858602659&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Zaixiang Zheng, Hao Zhou, Shujian Huang, Lili Mou, Xinyu Dai, Jiajun Chen, and Zhaopeng Tu. 2018. [Modeling Past and Future for Neural Machine Translation](https://aclanthology.coli.uni-saarland.de/events/tacl-2018). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=3361428233702531610&as_sdt=2005&sciodt=0,5&hl=en): 10)

* Xiang Kong, Zhaopeng Tu, Shuming Shi, Eduard Hovy, and Tong Zhang. 2019. [Neural Machine Translation with Adequacy-Oriented Learning](https://arxiv.org/pdf/1811.08541.pdf). In *Proceedings of AAAI 2019*.

* Zaixiang Zheng, Shujian Huang, Zhaopeng Tu, Xin-Yu Dai, and Jiajun Chen. 2019. [Dynamic Past and Future for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1086/). In *Proceedings of EMNLP 2019*.

Document-level Translation


* Longyue Wang, Zhaopeng Tu, Andy Way, and Qun Liu. 2017. [Exploiting Cross-Sentence Context for Neural Machine Translation](http://aclweb.org/anthology/D17-1301). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=7614033458131200423&as_sdt=2005&sciodt=0,5&hl=en): 19)

* Jörg Tiedemann and Yves Scherrer. 2017. [Neural Machine Translation with Extended Context](http://www.aclweb.org/anthology/W17-4811). In *Proceedings of the Third Workshop on Discourse in Machine Translation*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=16950693252825831302): 12)

* Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. 2018. [Evaluating Discourse Phenomena in Neural Machine Translation](http://aclweb.org/anthology/N18-1118). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=1436848483757205177&as_sdt=2005&sciodt=0,5&hl=en): 11)

* Elena Voita, Pavel Serdyukov, Rico Sennrich, and Ivan Titov. 2018. [Context-Aware Neural Machine Translation Learns Anaphora Resolution](http://aclweb.org/anthology/P18-1117). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=16594777811418303416&as_sdt=2005&sciodt=0,5&hl=en): 7)

* Sameen Maruf and Gholamreza Haffari. 2018. [Document Context Neural Machine Translation with Memory Networks](http://aclweb.org/anthology/P18-1118). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=17337605639464710308&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Shaohui Kuang, Deyi Xiong, Weihua Luo, and Guodong Zhou. 2018. [Modeling Coherence for Neural Machine Translation with Dynamic and Topic Caches](http://aclweb.org/anthology/C18-1050). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com/scholar?cites=12991114209233735355&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Shaohui Kuang and Deyi Xiong. 2018. [Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model](https://arxiv.org/pdf/1806.04466.pdf). In *Proceedings of COLING 2018*.

* Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang and Yang Liu. 2018. [Improving the Transformer Translation Model with Document-Level Context](http://aclweb.org/anthology/D18-1049). In *Proceedings of EMNLP 2018*.

* Samuel Läubli, Rico Sennrich, and Martin Volk. 2018. [Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation](http://aclweb.org/anthology/D18-1512). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=13135618112238453725&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, and James Henderson. 2018. [Document-Level Neural Machine Translation with Hierarchical Attention Networks](http://aclweb.org/anthology/D18-1325). In *Proceedings of EMNLP 2018*.

* Zhaopeng Tu, Yang Liu, Shuming Shi, and Tong Zhang. 2018. [Learning to Remember Translation History with a Continuous Cache](https://arxiv.org/pdf/1711.09367.pdf). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=15854294745619374487&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Elena Voita, Rico Sennrich, and Ivan Titov. 2019. [When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion](https://arxiv.org/pdf/1905.05979). In *Proceedings of ACL 2019*.

* Elena Voita, Rico Sennrich, and Ivan Titov. 2019. [Context-Aware Monolingual Repair for Neural Machine Translation](https://arxiv.org/pdf/1909.01383). In *Proceedings of EMNLP 2019*.

* Zhengxin Yang, Jinchao Zhang, Fandong Meng, Shuhao Gu, Yang Feng, and Jie Zhou. 2019. [Enhancing Context Modeling with a Query-Guided Capsule Network for Document-level Translation](https://arxiv.org/abs/1909.00564). In *Proceedings of EMNLP 2019*.

* Xin Tan, Longyin Zhang, Deyi Xiong, and Guodong Zhou. 2019. [Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation](https://www.aclweb.org/anthology/D19-1168/). In *Proceedings of EMNLP 2019*.

* Yunsu Kim, Duc Thanh Tran, and Hermann Ney. 2019. [When and Why is Document-level Context Useful in Neural Machine Translation?](https://arxiv.org/abs/1910.00294). In *Proceedings of DiscoMT@EMNLP 2019*.

* Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang and Hai Zhao. 2020. [Explicit Sentence Compression for Neural Machine Translation](https://arxiv.org/abs/1912.11980). In *Proceedings of AAAI 2020*.

* Bei Li, Hui Liu, Ziyang Wang, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu and Changliang Li. 2020. [Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation](http://arxiv.org/abs/2005.03393). In *Proceedings of ACL 2020*.

* Xintong Li, Lemao Liu, Rui Wang, Guoping Huang and Max Meng. 2020. [Regularized Context Gates on Transformer for Machine Translation](https://arxiv.org/abs/1908.11020). In *Proceedings of ACL 2020*.

* Danielle Saunders, Felix Stahlberg and Bill Byrne. 2020. [Using Context in Neural Machine Translation Training Objectives](https://arxiv.org/abs/2005.01483). In *Proceedings of ACL 2020*.

* Shuming Ma, Dongdong Zhang and Ming Zhou. 2020. [A Simple and Effective Unified Encoder for Document-Level Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.321/). In *Proceedings of ACL 2020*.

* Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, and Alexandra Birch. 2020. [Towards Making the Most of Context in Neural Machine Translation](https://arxiv.org/abs/2002.07982). In *Proceedings of IJCAI 2020*.

* Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom and Chris Dyer. 2020. [Better Document-Level Machine Translation with Bayes' Rule](https://arxiv.org/abs/1910.00553). *Transactions of the Association for Computational Linguistics*.

* Xiaomian Kang, Yang Zhao, Jiajun Zhang, and Chengqing Zong. 2020. [Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning](https://www.aclweb.org/anthology/2020.emnlp-main.175/). In *Proceedings of EMNLP 2020*.

* Pei Zhang, Boxing Chen, Niyu Ge, and Kai Fan. 2020. [Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.81/). In *Proceedings of EMNLP 2020*.

* Domenic Donato, Lei Yu, and Chris Dyer. 2021. [Diverse Pretrained Context Encodings Improve Document Translation](https://aclanthology.org/2021.acl-long.104.pdf). In *Proceedings of ACL 2021*.

Robustness


* Yonatan Belinkov and Yonatan Bisk. 2018. [Synthetic and Natural Noise Both Break Neural Machine Translation](https://openreview.net/pdf?id=BJ8vJebC-). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com/scholar?cites=10493132199224079445&as_sdt=2005&sciodt=0,5&hl=en): 33)

* Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018. [Generating Natural Adversarial Examples](https://openreview.net/pdf?id=H1BLjgZCb). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com/scholar?cites=6487263081764376046&as_sdt=2005&sciodt=0,5&hl=en): 45)

* Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. 2018. [Towards Robust Neural Machine Translation](http://aclweb.org/anthology/P18-1163). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=13572592499424174633&as_sdt=2005&sciodt=0,5&hl=en): 5)   

* Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. [Semantically Equivalent Adversarial Rules for Debugging NLP models](http://aclweb.org/anthology/P18-1079). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=3200079019495885814&as_sdt=2005&sciodt=0,5&hl=en): 12)         

* Javid Ebrahimi, Daniel Lowd, and Dejing Dou. 2018. [On Adversarial Examples for Character-Level Neural Machine Translation](http://aclweb.org/anthology/C18-1055). In *Proceedings of COLING 2018*.

* Paul Michel and Graham Neubig. 2018. [MTNT: A Testbed for Machine Translation of Noisy Text](http://aclweb.org/anthology/D18-1050). In *Proceedings of EMNLP 2018*.  

* Antonios Anastasopoulos, Alison Lui, Toan Nguyen, and David Chiang. 2019. [Neural Machine Translation of Text from Non-Native Speakers](https://arxiv.org/pdf/1808.06267.pdf). In *Proceedings of NAACL 2019*.  

* Paul Michel, Xian Li, Graham Neubig, and Juan Miguel Pino. 2019. [On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models](https://arxiv.org/pdf/1903.06620.pdf). In *Proceedings of NAACL 2019*.   

* Vaibhav Vaibhav, Sumeet Singh, Craig Stewart, and Graham Neubig. 2019. [Improving Robustness of Machine Translation with Synthetic Noise](https://arxiv.org/pdf/1902.09508.pdf). In *Proceedings of NAACL 2019*. 

* Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. [Robust Neural Machine Translation with Doubly Adversarial Inputs](https://arxiv.org/pdf/1906.02443). In *Proceedings of ACL 2019*.

* Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, and Zhongjun He. 2019. [Robust Neural Machine Translation with Joint Textual and Phonetic Embedding](https://www.aclweb.org/anthology/P19-1291). In *Proceedings of ACL 2019*.

* Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, and Cho-Jui Hsieh. 2020. [Robustness Verification for Transformers](https://openreview.net/forum?id=BJxwPJHFwS). In *Proceedings of ICLR 2020*.

* Wei Zou, Shujian Huang, Jun Xie, Xinyu Dai and Jiajun Chen. 2020. [A Reinforced Generation of Adversarial Examples for Neural Machine Translation](https://arxiv.org/abs/1911.03677). In *Proceedings of ACL 2020*.  

* Xing Niu, Prashant Mathur, Georgiana Dinu and Yaser Al-Onaizan. 2020. [Evaluating Robustness to Input Perturbations for Neural Machine Translation](https://arxiv.org/abs/2005.00580). In *Proceedings of ACL 2020*.

* Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan and Dawn Song. 2020. [Pretrained Transformers Improve Out-of-Distribution Robustness](https://arxiv.org/abs/2004.06100). In *Proceedings of ACL 2020*.

* Yong Cheng, Lu Jiang, Wolfgang Macherey and Jacob Eisenstein. 2020. [AdvAug: Robust Adversarial Augmentation for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.529/). In *Proceedings of ACL 2020*.

* Eric Wallace, Mitchell Stern, and Dawn Song. 2020. [Imitation Attacks and Defenses for Black-box Machine Translation Systems](https://www.aclweb.org/anthology/2020.emnlp-main.446/). In *Proceedings of EMNLP 2020*.

* Denis Emelin, Ivan Titov, and Rico Sennrich. 2020. [Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks](https://www.aclweb.org/anthology/2020.emnlp-main.616/). In *Proceedings of EMNLP 2020*.

* Xinze Zhang, Junzhe Zhang, Zhenhua Chen, and Kun He. 2021. [Crafting Adversarial Examples for Neural Machine Translation](https://aclanthology.org/2021.acl-long.153.pdf). In *Proceedings of ACL 2021*.

Interpretability
 

* Yanzhuo Ding, Yang Liu, Huanbo Luan and Maosong Sun. 2017. [Visualizing and Understanding Neural Machine Translation](http://aclweb.org/anthology/P17-1106). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=6029143337933047130&as_sdt=2005&sciodt=0,5&hl=en): 22)

* Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2018. [Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models](https://arxiv.org/pdf/1804.09299.pdf). In *Proceedings of VAST 2018* and *Proceedings of EMNLP-BlackBox 2018*. ([Citation](https://scholar.google.com/scholar?cites=8924303979242528991&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Alessandro Raganato and Jörg Tiedemann. 2018. [An Analysis of Encoder Representations in Transformer-Based Machine Translation](http://aclweb.org/anthology/W18-5431). In *Proceedings of EMNLP-BlackBox 2018*.

* Felix Stahlberg, Danielle Saunders, and Bill Byrne. 2018. [An Operation Sequence Model for Explainable Neural Machine Translation](http://aclweb.org/anthology/W18-5420). In *Proceedings of EMNLP-BlackBox 2018*.

* Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, D. Anthony Bau, and James Glass. 2019. [What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models](http://people.csail.mit.edu/belinkov/assets/pdf/aaai2019.pdf). In *Proceedings of AAAI 2019*. ([Citation](https://scholar.google.com/scholar?cites=9612190838970536755&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2019. [Identifying and Controlling Important Neurons in Neural Machine Translation](https://openreview.net/pdf?id=H1z-PsR5KX). In *Proceedings of ICLR 2019*. ([Citation](https://scholar.google.com/scholar?cites=10670221460130643181&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Yonatan Belinkov and James Glass. 2019. [Analysis Methods in Neural Language Processing: A Survey](https://www.aclweb.org/anthology/Q19-1004). *Transactions of the Association for Computational Linguistics*.

* Sofia Serrano and Noah A. Smith. 2019. [Is Attention Interpretable?](https://arxiv.org/pdf/1906.03731). In *Proceedings of ACL 2019*.

* Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov. 2019. [Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned](https://arxiv.org/pdf/1905.09418). In *Proceedings of ACL 2019*.

* Joris Baan, Jana Leible, Mitja Nikolaus, David Rau, Dennis Ulmer, Tim Baumgärtner, Dieuwke Hupkes, and Elia Bruni. 2019. [On the Realization of Compositionality in Neural Networks](https://arxiv.org/pdf/1906.01634). In *Proceedings of ACL 2019*.

* Jesse Vig and Yonatan Belinkov. 2019. [Analyzing the Structure of Attention in a Transformer Language Model](https://arxiv.org/pdf/1906.04284). In *Proceedings of ACL 2019*.

* Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, and Zhaopeng Tu. 2019. [Assessing the Ability of Self-Attention Networks to Learn Word Order](https://arxiv.org/pdf/1906.00592). In *Proceedings of ACL 2019*.

* Xintong Li, Guanlin Li, Lemao Liu, Max Meng, and Shuming Shi. 2019. [On the Word Alignment from Neural Machine Translation](https://www.aclweb.org/anthology/P19-1124). In *Proceedings of ACL 2019*.

* Elena Voita, Rico Sennrich, and Ivan Titov. 2019. [The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives](https://arxiv.org/pdf/1909.01380). In *Proceedings of EMNLP 2019*.

* Shilin He, Zhaopeng Tu, Xing Wang, Longyue Wang, Michael R. Lyu, and Shuming Shi. 2019. [Towards Understanding Neural Machine Translation with Word Importance](https://arxiv.org/pdf/1909.00326). In *Proceedings of EMNLP 2019*.

* Felix Stahlberg and Bill Byrne. 2019. [On NMT Search Errors and Model Errors: Cat Got Your Tongue?](https://arxiv.org/pdf/1908.10090). In *Proceedings of EMNLP 2019*.

* Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer. 2020. [On Identifiability in Transformers](https://openreview.net/forum?id=BJg1f6EFDB). In *Proceedings of ICLR 2020*.

* Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar. 2020. [Are Transformers universal approximators of sequence-to-sequence functions?](https://openreview.net/forum?id=ByxRM0Ntvr). In *Proceedings of ICLR 2020*.

* Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan and Balaraman Ravindran. 2020. [Towards Transparent and Explainable Attention Models](https://arxiv.org/abs/2004.14243). In *Proceedings of ACL 2020*.

* Samira Abnar and Willem Zuidema. 2020. [Quantifying Attention Flow in Transformers](https://arxiv.org/abs/2005.00928). In *Proceedings of ACL 2020*.

* Jierui Li, Lemao Liu, Huayang Li, Guanlin Li, Guoping Huang and Shuming Shi. 2020. [Evaluating Explanation Methods for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.35/). In *Proceedings of ACL 2020*.

* Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui. 2020. [Attention is Not Only a Weight: Analyzing Transformers with Vector Norms](https://www.aclweb.org/anthology/2020.emnlp-main.574/). In *Proceedings of EMNLP 2020*.

* Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han. 2020. [Understanding the Difficulty of Training Transformers](https://www.aclweb.org/anthology/2020.emnlp-main.463/). In *Proceedings of EMNLP 2020*.

* Wenxuan Wang, Zhaopeng Tu. 2020. [Rethinking the Value of Transformer Components](https://www.aclweb.org/anthology/2020.coling-main.529.pdf). In *Proceedings of COLING 2020*.

* Elena Voita, Rico Sennrich, Ivan Titov. 2021. [Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation](https://aclanthology.org/2021.acl-long.91.pdf). In *Proceedings of ACL 2021*.

* Weicheng Ma, Kai Zhang, Renze Lou, Lili Wang. 2021. [Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks](https://aclanthology.org/2021.acl-long.152.pdf). In *Proceedings of ACL 2021*.

Linguistic Interpretation


* Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, and Yoshua Bengio. 2015. [Embedding Word Similarity with Neural Machine Translation](https://arxiv.org/pdf/1412.6448.pdf). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com/scholar?cites=3941248209566557946&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Xing Shi, Inkit Padhi, and Kevin Knight. 2016. [Does String-based Neural MT Learn Source Syntax?](http://aclweb.org/anthology/D16-1159). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=13782051589621719871): 57)

* Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass. 2017. [What do Neural Machine Translation Models Learn about Morphology?](http://aclweb.org/anthology/P17-1080). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=3142186338143493642&as_sdt=2005&sciodt=0,5&hl=en): 50)

* Ella Rabinovich, Noam Ordan, and Shuly Wintner. 2017. [Found in Translation: Reconstructing Phylogenetic Language Trees from Translations](http://aclweb.org/anthology/P17-1049). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=10035323574777301594&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Rico Sennrich. 2017. [How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs](http://aclweb.org/anthology/E17-2060). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=14294900718072928557&as_sdt=2005&sciodt=0,5&hl=en): 25)

* Adam Poliak, Yonatan Belinkov, James Glass, and Benjamin Van Durme. 2018. [On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference](http://aclweb.org/anthology/N18-2082). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=9402109271974711503&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Arianna Bisazza and Clara Tump. 2018. [The Lazy Encoder: A Fine-Grained Analysis of the Role of Morphology in Neural Machine Translation](http://aclweb.org/anthology/D18-1313). In *Proceedings of EMNLP 2018*.

* Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2018. [Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter](http://aclweb.org/anthology/D18-1396). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=1081737155461853408&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2019. [Encoders Help You Disambiguate Word Senses in Neural Machine Translation](https://arxiv.org/pdf/1908.11771). In *Proceedings of EMNLP 2019*.

* Parker Riley, Isaac Caswell, Markus Freitag and David Grangier. 2020. [Translationese as a Language in “Multilingual” NMT](https://arxiv.org/abs/1911.03823). In *Proceedings of ACL 2020*.

* Emanuele Bugliarello, Sabrina J. Mielke, Antonios Anastasopoulos, Ryan Cotterell and Naoaki Okazaki. 2020. [It’s Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information](http://arxiv.org/abs/2005.02354). In *Proceedings of ACL 2020*.

Fairness and Diversity


* Hayahide Yamagishi, Shin Kanouchi, Takayuki Sato, and Mamoru Komachi. 2016. [Controlling the Voice of a Sentence in Japanese-to-English Neural Machine Translation](http://www.aclweb.org/anthology/W16-4620). In *Proceedings of the 3rd Workshop on Asian Translation*. ([Citation](https://scholar.google.com/scholar?cites=3457358295141990828&as_sdt=2005&sciodt=0,5&hl=en): 11)  

* Rico Sennrich, Barry Haddow and Alexandra Birch. 2016. [Controlling Politeness in Neural Machine Translation via Side Constraints](http://aclweb.org/anthology/N16-1005). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=13603295392629577946&as_sdt=2005&sciodt=0,5&hl=en): 49)           

* Xing Niu, Marianna Martindale, and Marine Carpuat. 2017. [A Study of Style in Machine Translation: Controlling the Formality of Machine Translation Output](http://aclweb.org/anthology/D17-1299). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=1203074987073423616&as_sdt=2005&sciodt=0,5&hl=en): 8)

* Ella Rabinovich, Raj Nath Patel, Shachar Mirkin, Lucia Specia, and Shuly Wintner. 2017. [Personalized Machine Translation: Preserving Original Author Traits](http://aclweb.org/anthology/E17-1101). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=6856955572531425903&as_sdt=2005&sciodt=0,5&hl=en): 10)    

* Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato. 2018. [Analyzing Uncertainty in Neural Machine Translation](https://arxiv.org/pdf/1803.00047). In *Proceedings of ICML 2018*. ([Citation](https://scholar.google.com/scholar?cites=1522001537063991105&as_sdt=2005&sciodt=0,5&hl=en): 11)    

* Paul Michel and Graham Neubig. 2018. [Extreme Adaptation for Personalized Neural Machine Translation](http://www.aclweb.org/anthology/P18-2050). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=16717798879574507487&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Eva Vanmassenhove, Christian Hardmeier, and Andy Way. 2018. [Getting Gender Right in Neural Machine Translation](http://www.aclweb.org/anthology/D18-1334). In *Proceedings of EMNLP 2018*.  

* Ashwin Kalyan, Peter Anderson, Stefan Lee, and Dhruv Batra. 2019. [Trainable Decoding of Sets of Sequences for Neural Sequence Models](http://proceedings.mlr.press/v97/kalyan19a/kalyan19a.pdf). In *Proceedings of ICML 2019*.  

* Tianxiao Shen, Myle Ott, Michael Auli, and Marc’Aurelio Ranzato. 2019. [Mixture Models for Diverse Machine Translation: Tricks of the Trade](http://proceedings.mlr.press/v97/shen19c/shen19c.pdf). In *Proceedings of ICML 2019*.

* Wouter Kool, Herke van Hoof, and Max Welling. 2019. [Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement](http://proceedings.mlr.press/v97/kool19a/kool19a.pdf). In *Proceedings of ICML 2019*.

* Won Ik Cho, Ji Won Kim, Seok Min Kim, and Nam Soo Kim. 2019. [On Measuring Gender Bias in Translation of Gender-neutral Pronouns](https://arxiv.org/pdf/1905.11684). In *Proceedings of ACL 2019*.

* Gabriel Stanovsky, Noah A. Smith, and Luke Zettlemoyer. 2019. [Evaluating Gender Bias in Machine Translation](https://arxiv.org/pdf/1906.00591). In *Proceedings of ACL 2019*.

* Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. 2019. [Measuring Bias in Contextualized Word Representations](https://arxiv.org/pdf/1906.07337). In *Proceedings of ACL 2019*.

* Raphael Shu, Hideki Nakayama, and Kyunghyun Cho. 2019. [Generating Diverse Translations with Sentence Codes](https://www.aclweb.org/anthology/P19-1177). In *Proceedings of ACL 2019*.

* Daphne Ippolito, Reno Kriz, João Sedoc, Maria Kustikova, and Chris Callison-Burch. 2019. [Comparison of Diverse Decoding Methods from Conditional Language Models](https://www.aclweb.org/anthology/P19-1365). In *Proceedings of ACL 2019*.

* Xing Niu and Marine Carpuat. 2020. [Controlling Neural Machine Translation Formality with Synthetic Supervision](https://arxiv.org/abs/1911.08706). In *Proceedings of AAAI 2020*.

* Zewei Sun, Shujian Huang, Hao-Ran Wei, Xin-yu Dai, and Jiajun Chen. 2020. [Generating Diverse Translation by Manipulating Multi-Head Attention](https://arxiv.org/pdf/1911.09333). In *Proceedings of AAAI 2020*.

* Shuo Wang, Zhaopeng Tu, Shuming Shi and Yang Liu. 2020. [On the Inference Calibration of Neural Machine Translation](https://arxiv.org/abs/2005.00963). In *Proceedings of ACL 2020*.

* Danielle Saunders and Bill Byrne. 2020. [Reducing Gender Bias in Neural Machine Translation as a Domain Adaptation Problem](http://arxiv.org/abs/2004.04498). In *Proceedings of ACL 2020*.

* Dirk Hovy, Federico Bianchi and Tommaso Fornaciari. 2020. [“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases](https://www.aclweb.org/anthology/2020.acl-main.154/). In *Proceedings of ACL 2020*.

* Luisa Bentivogli, Beatrice Savoldi, Matteo Negri, Mattia A. Di Gangi, Roldano Cattoni and Marco Turchi. 2020. [Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus](https://www.aclweb.org/anthology/2020.acl-main.619/). In *Proceedings of ACL 2020*.

* Sorami Hisamoto, Matt Post and Kevin Duh. 2020. [Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?](https://transacl.org/ojs/index.php/tacl/article/view/1779). *Transactions of the Association for Computational Linguistics*.

* Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn. 2020. [Simulated multiple reference training improves low-resource machine translation](https://www.aclweb.org/anthology/2020.emnlp-main.7/). In *Proceedings of EMNLP 2020*.

* Xuanfu Wu, Yang Feng, Chenze Shao. 2020. [Generating Diverse Translation from Model Distribution with Dropout](https://www.aclweb.org/anthology/2020.emnlp-main.82/). In *Proceedings of EMNLP 2020*.

Efficiency


* Abigail See, Minh-Thang Luong, and Christopher D. Manning. 2016. [Compression of Neural Machine Translation Models via Pruning](http://aclweb.org/anthology/K16-1029). In *Proceedings of CoNLL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=13072353668416361496&as_sdt=2005&sciodt=0,5&hl=en): 33)

* Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, and Satoshi Nakamura. 2017. [Neural Machine Translation via Binary Code Prediction](http://aclweb.org/anthology/P17-1079). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=9954145361647418034&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Xing Shi and Kevin Knight. 2017. [Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary](http://aclweb.org/anthology/P17-2091). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=7302197227417767855&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Ofir Press and Lior Wolf. 2017. [Using the Output Embedding to Improve Language Models](http://aclweb.org/anthology/E17-2025). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=3142797974561089298): 126)

* Xiaowei Zhang, Wei Chen, Feng Wang, Shuang Xu, and Bo Xu. 2017. [Towards Compact and Fast Neural Machine Translation Using a Combined Method](http://aclweb.org/anthology/D17-1154). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=832815405370901340&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Felix Stahlberg and Bill Byrne. 2017. [Unfolding and Shrinking Neural Machine Translation Ensembles](http://aclweb.org/anthology/D17-1208). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=14880262780099335970&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Jacob Devlin. 2017. [Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU](http://aclweb.org/anthology/D17-1300). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=17103371978045782164&as_sdt=2005&sciodt=0,5&hl=en): 8) 

* Dakun Zhang, Jungi Kim, Josep Crego, and Jean Senellart. 2017. [Boosting Neural Machine Translation](http://aclweb.org/anthology/I17-2046). In *Proceedings of IJCNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=10941157301841399344&as_sdt=2005&sciodt=0,5&hl=en): 3) 

* Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, and Noam Shazeer. 2018. [Fast Decoding in Sequence Models Using Discrete Latent Variables](https://arxiv.org/pdf/1803.03382.pdf). In *Proceedings of ICML 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=4042994175439965815&as_sdt=2005&sciodt=0,5&hl=en): 3) 

* Gonzalo Iglesias, William Tambellini, Adrià de Gispert, Eva Hasler, and Bill Byrne. 2018. [Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment](http://aclweb.org/anthology/N18-3013). In *Proceedings of NAACL 2018*. 

* Jerry Quinn and Miguel Ballesteros. 2018. [Pieces of Eight: 8-bit Neural Machine Translation](http://aclweb.org/anthology/N18-3014). In *Proceedings of NAACL 2018*.

* Matt Post and David Vilar. 2018. [Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation](http://aclweb.org/anthology/N18-1119). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=3504623917475500888&as_sdt=2005&sciodt=0,5&hl=en): 6)   

* Biao Zhang, Deyi Xiong, and Jinsong Su. 2018. [Accelerating Neural Transformer via an Average Attention Network](http://aclweb.org/anthology/P18-1166). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=16436039193082710776&as_sdt=2005&sciodt=0,5&hl=en): 5) 

* Rui Wang, Masao Utiyama, and Eiichiro Sumita. 2018. [Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation](http://aclweb.org/anthology/P18-2048). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=867223386840543463&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Myle Ott, Sergey Edunov, David Grangier, and Michael Auli. 2018. [Scaling Neural Machine Translation](http://aclweb.org/anthology/W18-6301). In *Proceedings of the Third Conference on Machine Translation: Research Papers*.

* Joern Wuebker, Patrick Simianer, and John DeNero. 2018. [Compact Personalized Models for Neural Machine Translation](http://aclweb.org/anthology/D18-1104). In *Proceedings of EMNLP 2018*.

* Wen Zhang, Liang Huang, Yang Feng, Lei Shen, and Qun Liu. 2018. [Speeding Up Neural Machine Translation Decoding by Cube Pruning](http://aclweb.org/anthology/D18-1460). In *Proceedings of EMNLP 2018*.  

* Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, and Hai Zhao. 2018. [Exploring Recombination for Efficient Decoding of Neural Machine Translation](http://aclweb.org/anthology/D18-1511). In *Proceedings of EMNLP 2018*.   

* Nikolay Bogoychev, Kenneth Heafield, Alham Fikri Aji, and Marcin Junczys-Dowmunt. 2018. [Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation](http://aclweb.org/anthology/D18-1332). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=12306021941401324130&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Mitchell Stern, Noam Shazeer, and Jakob Uszkoreit. 2018. [Blockwise Parallel Decoding for Deep Autoregressive Models](https://papers.nips.cc/paper/8212-blockwise-parallel-decoding-for-deep-autoregressive-models.pdf). In *Proceedings of NeurIPS 2018*.

* Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, Song Han. 2020. [Efficient Transformer for Mobile Applications](https://openreview.net/forum?id=ByeMPlHKPH). In *Proceedings of ICLR 2020*. 

* Christopher Brix, Parnia Bahar and Hermann Ney. 2020. [Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture](https://arxiv.org/abs/2005.03454). In *Proceedings of ACL 2020*.

* Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, and Song Han. 2020. [HAT: Hardware-Aware Transformers for Efficient Natural Language Processing](https://arxiv.org/abs/2005.14187). In *Proceedings of ACL 2020*. 

* Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez. 2020. [Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers](https://arxiv.org/abs/2002.11794). In *Proceedings of ICML 2020*. 

* Maximiliana Behnke, Kenneth Heafield. 2020. [Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.211/). In *Proceedings of EMNLP 2020*.

* Minjia Zhang, Yuxiong He. 2020. [Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping](https://papers.nips.cc/paper/2020/file/a1140a3d0df1c81e24ae954d935e8926-Paper.pdf). In *Proceedings of NeurIPS 2020*.

* Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu. 2020. [Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers](https://www.aclweb.org/anthology/2020.emnlp-main.74/). In *Proceedings of EMNLP 2020*.

Pre-Training


* Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. [Learned in Translation: Contextualized Word Vectors](http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=12356231721397988330&as_sdt=2005&sciodt=0,5&hl=en): 136)

* Ye Qi, Devendra Sachan, Matthieu Felix, Sarguna Padmanabhan, and Graham Neubig. 2018. [When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?](http://aclweb.org/anthology/N18-2084). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=6166308028416584239&as_sdt=2005&sciodt=0,5&hl=en): 19)

* Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. [Deep Contextualized Word Representations](http://aclweb.org/anthology/N18-1202). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=14181983828043963745&as_sdt=2005&sciodt=0,5&hl=en): 519)

* Jeremy Howard and Sebastian Ruder. 2018. [Universal Language Model Fine-tuning for Text Classification](http://aclweb.org/anthology/P18-1031). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=2986760879834934707&as_sdt=2005&sciodt=0,5&hl=en): 114)

* Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. [XNLI: Evaluating Cross-lingual Sentence Representations](https://www.aclweb.org/anthology/D18-1269). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=15041461338388299895): 9)

* Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf). Technical Report, OpenAI. ([Citation](https://scholar.google.com.hk/scholar?cites=8939608408376234789&as_sdt=2005&sciodt=0,5&hl=en): 94)

* Guillaume Lample and Alexis Conneau. 2019. [Cross-lingual Language Model Pretraining](https://arxiv.org/pdf/1901.07291). *arXiv:1901.07291*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=11542237222100207278): 3)

* Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/pdf/1810.04805). In *Proceedings of NAACL 2019*. ([Citation](https://scholar.google.com.hk/scholar?cites=3166990653379142174&as_sdt=2005&sciodt=0,5&hl=en): 292)

* Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf). Technical Report, OpenAI. ([Citation](https://scholar.google.com/scholar?cites=7713405291981945630&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Sergey Edunov, Alexei Baevski, and Michael Auli. 2019. [Pre-trained Language Model Representations for Language Generation](https://arxiv.org/pdf/1903.09722.pdf). In *Proceedings of NAACL 2019*. ([Citation](https://scholar.google.com/scholar?cites=46961033050134131&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. [MASS: Masked Sequence to Sequence Pre-training for Language Generation](https://arxiv.org/pdf/1905.02450). In *Proceedings of ICML 2019*.

* Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/pdf/1906.08237). *arXiv:1906.08237*.

* Jiacheng Yang, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Yong Yu, Weinan Zhang and Lei Li. 2020. [Towards Making the Most of BERT in Neural Machine Translation](https://arxiv.org/abs/1908.05672). In *Proceedings of AAAI 2020*.

* Rongxiang Weng, Heng Yu, Shujian Huang, Shanbo Cheng and Weihua Luo. 2020. [Acquiring Knowledge from Pre-trained Model to Neural Machine Translation](https://arxiv.org/abs/1912.01774). In *Proceedings of AAAI 2020*.

* Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tieyan Liu. 2020. [Incorporating BERT into Neural Machine Translation](https://openreview.net/forum?id=Hyl7ygStwB). In *Proceedings of ICLR 2020*.

* Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov and Luke Zettlemoyer. 2020. [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461). In *Proceedings of ACL 2020*.

* Thibault Sellam, Dipanjan Das and Ankur Parikh. 2020. [BLEURT: Learning Robust Metrics for Text Generation](https://www.aclweb.org/anthology/2020.acl-main.704/). In *Proceedings of ACL 2020*.

* Sascha Rothe, Shashi Narayan and Aliaksei Severyn. 2020. [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://transacl.org/ojs/index.php/tacl/article/view/1849). *Transactions of the Association for Computational Linguistics*.

* Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju. 2020. [CSP: Code-Switching Pre-training for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.208/). In *Proceedings of EMNLP 2020*.

* Zehui Lin, Xiao Pan, Mingxuan Wang, Xipeng Qiu, Jiangtao Feng, Hao Zhou, Lei Li. 2020. [Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information](https://www.aclweb.org/anthology/2020.emnlp-main.210/). In *Proceedings of EMNLP 2020*.

* Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen. 2020. [Incorporating BERT into Parallel Sequence Decoding with Adapters](https://papers.nips.cc/paper/2020/file/7a6a74cbe87bc60030a4bd041dd47b78-Paper.pdf). In *Proceedings of NeurIPS 2020*.

* Linqing Chen, Junhui Li, Zhengxian Gong, Boxing Chen, Weihua Luo, Min Zhang, Guodong Zhou. 2021. [Breaking the Corpus Bottleneck for Context-Aware Neural Machine Translation with Cross-Task Pre-training](https://aclanthology.org/2021.acl-long.222.pdf). In *Proceedings of ACL 2021*.

Non-Autoregressive Translation


* Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, and Richard Socher. 2018. [Non-Autoregressive Neural Machine Translation](https://arxiv.org/abs/1711.02281). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=3482831974828539059&as_sdt=2005&sciodt=0,5&hl=en): 93)

* Chunqi Wang, Ji Zhang, and Haiqing Chen. 2018. [Semi-Autoregressive Neural Machine Translation](http://aclweb.org/anthology/D18-1044). In *Proceedings of EMNLP 2018*.  

* Jason Lee, Elman Mansimov, and Kyunghyun Cho. 2018. [Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement](http://aclweb.org/anthology/D18-1149). In *Proceedings of EMNLP 2018*. 

* Jindřich Libovický and Jindřich Helcl. 2018. [End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification](http://aclweb.org/anthology/D18-1336). In *Proceedings of EMNLP 2018*. 

* Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard Hovy. 2019. [FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow](https://arxiv.org/pdf/1909.02480). In *Proceedings of EMNLP 2019*.

* Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, and Tie-Yan Liu. 2019. [Hint-Based Training for Non-Autoregressive Machine Translation](https://arxiv.org/pdf/1909.06708). In *Proceedings of EMNLP 2019*.

* Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer. 2019. [Mask-Predict: Parallel Decoding of Conditional Masked Language Models](https://arxiv.org/abs/1904.09324). In *Proceedings of EMNLP 2019*.

* Sean Welleck, Kianté Brantley, Hal Daumé III, and Kyunghyun Cho. 2019. [Non-Monotonic Sequential Text Generation](https://arxiv.org/pdf/1902.02192). In *Proceedings of ICML 2019*.

* Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit. 2019. [Insertion Transformer: Flexible Sequence Generation via Insertion Operations](http://proceedings.mlr.press/v97/stern19a/stern19a.pdf). In *Proceedings of ICML 2019*.

* Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Xilin Chen, and Jie Zhou. 2019. [Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation](https://arxiv.org/pdf/1906.09444). In *Proceedings of ACL 2019*.

* Bingzhen Wei, Mingxuan Wang, Hao Zhou, Junyang Lin, and Xu Sun. 2019. [Imitation Learning for Non-Autoregressive Neural Machine Translation](https://www.aclweb.org/anthology/P19-1125). In *Proceedings of ACL 2019*.

* Jiatao Gu, Changhan Wang, Junbo Zhao. 2019. [Levenshtein Transformer](https://papers.nips.cc/paper/9297-levenshtein-transformer). In *Proceedings of NeurIPS 2019*.

* Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, and Tie-Yan Liu. 2019. [Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input](https://arxiv.org/pdf/1812.09664.pdf). In *Proceedings of AAAI 2019*.

* Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019. [Non-Autoregressive Machine Translation with Auxiliary Regularization](https://arxiv.org/pdf/1902.10245.pdf). In *Proceedings of AAAI 2019*.

* Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu. 2020. [Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation](https://arxiv.org/abs/1911.08717). In *Proceedings of AAAI 2020*.

* Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng and Jie Zhou. 2020. [Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation](https://arxiv.org/pdf/1911.09320.pdf). In *Proceedings of AAAI 2020*.

* Raphael Shu, Jason Lee, Hideki Nakayama and Kyunghyun Cho. 2020. [Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior](https://arxiv.org/abs/1908.07181). In *Proceedings of AAAI 2020*.

* Jiawei Zhou and Phillip Keung. 2020. [Improving Non-autoregressive Neural Machine Translation with Monolingual Data](https://arxiv.org/abs/2005.00932). In *Proceedings of ACL 2020*.

* Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, and Omer Levy. 2020. [Aligned Cross Entropy for Non-Autoregressive Machine Translation](https://arxiv.org/abs/2004.01655). In *Proceedings of ICML 2020*.

* Jungo Kasai, James Cross, Marjan Ghazvininejad, and Jiatao Gu. 2020. [Parallel Machine Translation with Disentangled Context Transformer](https://arxiv.org/abs/2001.05136). In *Proceedings of ICML 2020*.

* Junliang Guo, Linli Xu and Enhong Chen. 2020. [Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.36/). In *Proceedings of ACL 2020*.

* Qiu Ran, Yankai Lin, Peng Li and Jie Zhou. 2020. [Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.277/). In *Proceedings of ACL 2020*.

* William Chan, Mitchell Stern, Jamie Kiros, Jakob Uszkoreit. 2020. [An Empirical Study of Generation Order for Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.464/). In *Proceedings of EMNLP 2020*.

* Jason Lee, Raphael Shu, Kyunghyun Cho. 2020. [Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.73/). In *Proceedings of EMNLP 2020*.

* Xiang Kong, Zhisong Zhang, Eduard Hovy. 2020. [Incorporating a Local Translation Mechanism into Non-autoregressive Translation](https://www.aclweb.org/anthology/2020.emnlp-main.79/). In *Proceedings of EMNLP 2020*.

* Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi. 2020. [Non-Autoregressive Machine Translation with Latent Alignments](https://www.aclweb.org/anthology/2020.emnlp-main.83/). In *Proceedings of EMNLP 2020*.

* Liang Ding, Longyue Wang, Di Wu, Dacheng Tao, Zhaopeng Tu. 2020. [Context-Aware Cross-Attention for Non-Autoregressive Translation](https://www.aclweb.org/anthology/2020.coling-main.389.pdf). In *Proceedings of COLING 2020*.

* Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah Smith. 2021. [Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation](https://openreview.net/pdf?id=KpfasTaLUpq). In *Proceedings of ICLR 2021*.

* Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu. 2021. [Understanding and Improving Lexical Choice in Non-Autoregressive Translation](https://openreview.net/pdf?id=ZTFeSBIX9C). In *Proceedings of ICLR 2021*.

* Qiu Ran, Yankai Lin, Peng Li, Jie Zhou. 2021. [Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information](https://arxiv.org/pdf/1911.02215.pdf). In *Proceedings of AAAI 2021*.

* Yongchang Hao, Shilin He, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu, Xing Wang. 2021. [Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation](https://www.aclweb.org/anthology/2021.naacl-main.313.pdf). In *Proceedings of NAACL 2021*.

* Yu Bao, Shujian Huang, Tong Xiao, Dongqi Wang, Xinyu Dai, Jiajun Chen. 2021. [Non-Autoregressive Translation by Learning Target Categorical Codes](https://www.aclweb.org/anthology/2021.naacl-main.458.pdf). In *Proceedings of NAACL 2021*.

* Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu. 2021. [Progressive Multi-Granularity Training for Non-Autoregressive Translation](https://arxiv.org/pdf/2106.05546.pdf). In *Proceedings of ACL 2021*.

* Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu. 2021. [Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation](https://arxiv.org/pdf/2106.00903.pdf). In *Proceedings of ACL 2021*.

* Cunxiao Du, Zhaopeng Tu, Jing Jiang. 2021. [Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation](https://arxiv.org/pdf/2106.05093.pdf). In *Proceedings of ICML 2021*.

* Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, Lei Li. 2021. [Glancing Transformer for Non-Autoregressive Neural Machine Translation](https://aclanthology.org/2021.acl-long.155.pdf). In *Proceedings of ACL 2021*.

Speech Translation and Simultaneous Translation


* Matt Post, Gaurav Kumar, Adam Lopez, Damianos Karakos, Chris Callison-Burch and Sanjeev Khudanpur. 2013. [Improved Speech-to-Text Translation with the Fisher and Callhome Spanish–English Speech Translation Corpus](http://www.mt-archive.info/10/IWSLT-2013-Post.pdf). In *Proceedings of IWSLT 2013*. ([Citation](https://scholar.google.com.hk/scholar?cites=11894485689812442585&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Gaurav Kumar, Matt Post, Daniel Povey and Sanjeev Khudanpur. 2014. [Some insights from translating conversational telephone speech](https://ieeexplore.ieee.org/abstract/document/6854197). In *Proceedings of ICASSP 2014*. ([Citation](https://scholar.google.com.hk/scholar?cites=8525865656244874295&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Long Duong, Antonios Anastasopoulos, David Chiang, Steven Bird, and Trevor Cohn. 2016. [An Attentional Model for Speech Translation without Transcription](http://www.aclweb.org/anthology/N16-1109). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=17801967122712636447&as_sdt=2005&sciodt=0,5&hl=en): 37)

* Antonios Anastasopoulos, David Chiang, and Long Duong. 2016. [An Unsupervised Probability Model for Speech-to-translation Alignment of Low-resource Languages](https://aclweb.org/anthology/D16-1133). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=323823800810193203&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu and Zhifeng Chen. 2017. [Sequence-to-sequence Models can Directly Translate Foreign Speech](https://arxiv.org/abs/1703.08581). In *Proceedings of Interspeech 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=10073093152246570315&as_sdt=2005&sciodt=0,5&hl=en): 41)

* Jiatao Gu, Graham Neubig, Kyunghyun Cho, and Victor O.K. Li. 2017. [Learning to Translate in Real-time with Neural Machine Translation](http://aclweb.org/anthology/E17-1099). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=14299891671990230013&as_sdt=2005&sciodt=0,5&hl=en): 17)

* Sameer Bansal, Herman Kamper, Adam Lopez, and Sharon Goldwater. 2017. [Towards Speech-to-text Translation without Speech Recognition](http://aclweb.org/anthology/E17-2076). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=639319209334631051&as_sdt=2005&sciodt=0,5&hl=en): 13)  

* Antonios Anastasopoulos and David Chiang. 2018. [Tied Multitask Learning for Neural Speech Translation](https://arxiv.org/pdf/1802.06655.pdf). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=5810351802252447673&as_sdt=2005&sciodt=0,5&hl=en): 10) 

* Fahim Dalvi, Nadir Durrani, Hassan Sajjad, and Stephan Vogel. 2018. [Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation](http://aclweb.org/anthology/N18-2079). In *Proceedings of NAACL 2018*. 

* Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, and Graham Neubig. 2018. [Automatic Estimation of Simultaneous Interpreter Performance](http://aclweb.org/anthology/P18-2105). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=5687670489913511293&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Florian Dessloch, Thanh-Le Ha, Markus Müller, Jan Niehues, Thai Son Nguyen, Ngoc-Quan Pham, Elizabeth Salesky, Matthias Sperber, Sebastian Stüker, Thomas Zenkel, and Alexander Waibel. 2018. [KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning](http://aclweb.org/anthology/C18-2020). In *Proceedings of COLING 2018*.

* Ashkan Alinejad, Maryam Siahbani, and Anoop Sarkar. 2018. [Prediction Improves Simultaneous Neural Machine Translation](http://aclweb.org/anthology/D18-1337). In *Proceedings of EMNLP 2018*.  

* Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, and Sharon Goldwater. 2019. [Pre-training on high-resource speech recognition improves low-resource speech-to-text translation](https://arxiv.org/pdf/1809.01431.pdf). In *Proceedings of NAACL 2019*.

* Nikolai Vogler, Craig Stewart, and Graham Neubig. 2019. [Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation](https://arxiv.org/pdf/1904.00930.pdf). In *Proceedings of NAACL 2019*.

* Elizabeth Salesky, Matthias Sperber, and Alex Waibel. 2019. [Fluent Translations from Disfluent Speech in End-to-End Speech Translation](https://arxiv.org/pdf/1906.00556). In *Proceedings of NAACL 2019*.

* Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, and Colin Raffel. 2019. [Monotonic Infinite Lookback Attention for Simultaneous Machine Translation](https://arxiv.org/pdf/1906.05218). In *Proceedings of ACL 2019*.

* Matthias Sperber, Graham Neubig, Ngoc-Quan Pham, and Alex Waibel. 2019. [Self-Attentional Models for Lattice Inputs](https://arxiv.org/pdf/1906.01617). In *Proceedings of ACL 2019*.

* Pei Zhang, Boxing Chen, Niyu Ge, and Kai Fan. 2019. [Lattice Transformer for Speech Translation](https://arxiv.org/pdf/1906.05551). In *Proceedings of ACL 2019*. 

* Elizabeth Salesky, Matthias Sperber, and Alan W Black. 2019. [Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation](https://www.aclweb.org/anthology/P19-1179). In *Proceedings of ACL 2019*.

* Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, and Haifeng Wang. 2019. [STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289). In *Proceedings of ACL 2019*.

* Baigong Zheng, Renjie Zheng, Mingbo Ma, and Liang Huang. 2019. [Simultaneous Translation with Flexible Policy via Restricted Imitation Learning](https://www.aclweb.org/anthology/P19-1582). In *Proceedings of ACL 2019*.

* Matthias Sperber, Graham Neubig, Jan Niehues, Alex Waibel. 2019. [Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation](https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00270). *Transactions of the Association for Computational Linguistics*.

* Baigong Zheng, Renjie Zheng, Mingbo Ma, and Liang Huang. 2019. [Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation](https://arxiv.org/pdf/1909.01559). In *Proceedings of EMNLP 2019*.

* Renjie Zheng, Mingbo Ma, Baigong Zheng, and Liang Huang. 2019. [Speculative Beam Search for Simultaneous Translation](https://arxiv.org/pdf/1909.05421). In *Proceedings of EMNLP 2019*.

* Jiatao Gu, Changhan Wang, Junbo Zhao. 2019. [Levenshtein Transformer](https://papers.nips.cc/paper/9297-levenshtein-transformer). In *Proceedings of NeurIPS 2019*.

* Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou and Zhenglu Yang. 2020. [Curriculum Pre-training for End-to-End Speech Translation](https://arxiv.org/abs/2004.10093). In *Proceedings of ACL 2020*.

* Elizabeth Salesky and Alan W Black. 2020. [Phone Features Improve Speech Translation](https://arxiv.org/abs/2005.13681). In *Proceedings of ACL 2020*.

* Matthias Sperber and Matthias Paulik. 2020. [Speech Translation and the End-to-End Promise: Taking Stock of Where We Are](https://arxiv.org/abs/2004.06358). In *Proceedings of ACL 2020*.

* Renjie Zheng, Mingbo Ma, Baigong Zheng, Kaibo Liu and Liang Huang. 2020. [Opportunistic Decoding with Timely Correction for Simultaneous Translation](https://arxiv.org/abs/2005.00675). In *Proceedings of ACL 2020*.

* Baigong Zheng, Kaibo Liu, Renjie Zheng, Mingbo Ma, Hairong Liu and Liang Huang. 2020. [Simultaneous Translation Policies: From Fixed to Adaptive](http://arxiv.org/abs/2004.13169). In *Proceedings of ACL 2020*.

* Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu and Hung-yi Lee. 2020. [Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation](https://arxiv.org/abs/2005.10678). In *Proceedings of ACL 2020*.

* Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao and Tie-Yan Liu. 2020. [SimulSpeech: End-to-End Simultaneous Speech to Text Translation](https://www.aclweb.org/anthology/2020.acl-main.350/). In *Proceedings of ACL 2020*.

* Ashkan Alinejad, Anoop Sarkar. 2020. [Effectively pretraining a speech translation decoder with Machine Translation data](https://www.aclweb.org/anthology/2020.emnlp-main.644/). In *Proceedings of EMNLP 2020*.

* Ruiqing Zhang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Haifeng Wang. 2020. [Learning Adaptive Segmentation Policy for Simultaneous Translation](https://www.aclweb.org/anthology/2020.emnlp-main.178/). In *Proceedings of EMNLP 2020*.

* Ozan Caglayan, Julia Ive, Veneta Haralampieva, Pranava Madhyastha, Loïc Barrault, Lucia Specia. 2020. [Simultaneous Machine Translation with Visual Context](https://www.aclweb.org/anthology/2020.emnlp-main.184/). In *Proceedings of EMNLP 2020*.

* Javier Iranzo-Sánchez, Adrià Giménez Pastor, Joan Albert Silvestre-Cerdà, Pau Baquero-Arnal, Jorge Civera Saiz, Alfons Juan. 2020. [Direct Segmentation Models for Streaming Speech Translation](https://www.aclweb.org/anthology/2020.emnlp-main.206/). In *Proceedings of EMNLP 2020*.

Multi-modality


* Julian Hitschler, Shigehiko Schamoni, Stefan Riezler. 2016. [Multimodal Pivots for Image Caption Translation](http://aclweb.org/anthology/P16-1227). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2998317485328832141): 34)

* Lucia Specia, Stella Frank, Khalil Sima'an, and Desmond Elliott. 2016. [A Shared Task on Multimodal Machine Translation and Crosslingual Image Description](http://aclweb.org/anthology/W16-2346). In *Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers*. ([Citation](https://scholar.google.com.hk/scholar?hl=en&as_sdt=2005&sciodt=0,5&cites=10227072007263391757&scipsc=): 47)

* Sergio Rodríguez Guasch, Marta R. Costa-jussà. 2016. [WMT 2016 Multimodal Translation System Description based on Bidirectional Recurrent Neural Networks with Double-Embeddings](http://aclweb.org/anthology/W16-2362). In *Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers*. ([Citation](https://scholar.google.com.hk/scholar?cites=4203794059992068345&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Po-Yao Huang, Frederick Liu, Sz-Rung Shiang, Jean Oh, and Chris Dyer. 2016. [Attention-based Multimodal Neural Machine Translation](https://www.aclweb.org/anthology/W16-2360). In *Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers*. ([Citation](https://scholar.google.com.hk/scholar?cites=3098391471855879500&as_sdt=2005&sciodt=0,5&hl=en): 34)

* Iacer Calixto, Desmond Elliott, and Stella Frank. 2016. [DCU-UvA Multimodal MT System Report](http://aclweb.org/anthology/W16-2359). In *Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers*. ([Citation](https://scholar.google.com.hk/scholar?cites=13635685318707561524&as_sdt=2005&sciodt=0,5&hl=en): 12)

* Kashif Shah, Josiah Wang, and Lucia Specia. 2016. [SHEF-Multimodal: Grounding Machine Translation on Images](https://aclweb.org/anthology/W16-2363). In *Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers*. ([Citation](https://scholar.google.com/scholar?cites=11223367231679829742&as_sdt=5,39&sciodt=0,39&hl=en): 17)

* Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, and Lucia Specia. 2017. [Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description](http://aclweb.org/anthology/W17-4718). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=268734032292286129&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Iacer Calixto, Qun Liu, and Nick Campbell. 2017. [Doubly-Attentive Decoder for Multi-modal Neural Machine Translation](http://aclweb.org/anthology/P17-1175). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=9882133753270023054&as_sdt=2005&sciodt=0,5&hl=en): 31)

* Jean-Benoit Delbrouck and Stéphane Dupont. 2017. [An empirical study on the effectiveness of images in Multimodal Neural Machine Translation](http://aclweb.org/anthology/D17-1095). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=4462543203996753904&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Iacer Calixto and Qun Liu. 2017. [Incorporating Global Visual Features into Attention-based Neural Machine Translation](http://aclweb.org/anthology/D17-1105). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=6076628072948213440&as_sdt=2005&sciodt=0,5&hl=en): 14) 

* Jason Lee, Kyunghyun Cho, Jason Weston, and Douwe Kiela. 2018. [Emergent Translation in Multi-Agent Communication](https://openreview.net/pdf?id=H1vEXaxA-). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=16875774594076963034&as_sdt=2005&sciodt=0,5&hl=en): 8) 

* Yun Chen, Yang Liu, and Victor O. K. Li. 2018. [Zero-Resource Neural Machine Translation with Multi-Agent Communication Game](https://arxiv.org/pdf/1802.03116). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=13902575159717479954&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Loïc Barrault, Fethi Bougares, Lucia Specia, Chiraag Lala, Desmond Elliott, and Stella Frank. 2018. [Findings of the Third Shared Task on Multimodal Machine Translation](http://aclweb.org/anthology/W18-6402). In *Proceedings of the Third Conference on Machine Translation: Shared Task Papers*. ([Citation](https://scholar.google.com.hk/scholar?cites=1407951263246368352&as_sdt=2005&sciodt=0,5&hl=en): 1)

* John Hewitt, Daphne Ippolito, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya, and Chris Callison-Burch. 2018. [Learning Translations via Images with a Massively Multilingual Image Dataset](http://aclweb.org/anthology/P18-1239). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=8128328221941110465&as_sdt=2005&sciodt=0,5&hl=en): 1) 

* Mingyang Zhou, Runxiang Cheng, Yong Jae Lee, and Zhou Yu. 2018. [A Visual Attention Grounding Neural Model for Multimodal Machine Translation](http://aclweb.org/anthology/D18-1400). In *Proceedings of EMNLP 2018*.

* Desmond Elliott. 2018. [Adversarial Evaluation of Multimodal Machine Translation](http://aclweb.org/anthology/D18-1329). In *Proceedings of EMNLP 2018*.

* Ozan Caglayan, Pranava Madhyastha, Lucia Specia, and Loïc Barrault. 2019. [Probing the Need for Visual Context in Multimodal Machine Translation](https://arxiv.org/pdf/1903.08678.pdf). In *Proceedings of NAACL 2019*. 

* Iacer Calixto, Miguel Rios, and Wilker Aziz. 2019. [Latent Variable Model for Multi-modal Translation](https://arxiv.org/pdf/1811.00357). In *Proceedings of ACL 2019*.

* Julia Ive, Pranava Madhyastha, and Lucia Specia. 2019. [Distilling Translations with Visual Awareness](https://arxiv.org/pdf/1906.07701). In *Proceedings of ACL 2019*.

* Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao. 2020. [Neural Machine Translation with Universal Visual Representation](https://openreview.net/forum?id=Byl8hhNYPS). In *Proceedings of ICLR 2020*.

* Po-Yao Huang, Junjie Hu, Xiaojun Chang and Alexander Hauptmann. 2020. [Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting](http://arxiv.org/abs/2005.03119). In *Proceedings of ACL 2020*.

* Shu Okabe, Frédéric Blain, and Lucia Specia. 2020. [Multimodal Quality Estimation for Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.114/). In *Proceedings of ACL 2020*.

* Shaowei Yao and Xiaojun Wan. 2020. [Multimodal Transformer for Multimodal Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.400/). In *Proceedings of ACL 2020*.

* Shuo Sun, Francisco Guzmán and Lucia Specia. 2020. [Are we Estimating or Guesstimating Translation Quality?](https://www.aclweb.org/anthology/2020.acl-main.558/). In *Proceedings of ACL 2020*.

* Ozan Caglayan, Julia Ive, Veneta Haralampieva, Pranava Madhyastha, Loïc Barrault, Lucia Specia. 2020. [Simultaneous Machine Translation with Visual Context](https://www.aclweb.org/anthology/2020.emnlp-main.184/). In *Proceedings of EMNLP 2020*.

Ensemble and Reranking


* Ekaterina Garmash and Christof Monz. 2016. [Ensemble Learning for Multi-Source Neural Machine Translation](http://aclweb.org/anthology/C16-1133). In *Proceedings of COLING 2016*. ([Citation](https://scholar.google.com/scholar?cites=10720572689338720536&as_sdt=2005&sciodt=0,5&hl=en): 18)

* Long Zhou, Wenpeng Hu, Jiajun Zhang, and Chengqing Zong. 2017. [Neural System Combination for Machine Translation](http://aclweb.org/anthology/P17-2060). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=2547807449547851378&as_sdt=2005&sciodt=0,5&hl=en): 21)

* Jiaji Huang, Yi Li, Wei Ping, and Liang Huang. 2018. [Large Margin Neural Language Model](http://aclweb.org/anthology/D18-1150). In *Proceedings of EMNLP 2018*.

* Tianxiao Shen, Myle Ott, Michael Auli, and Marc’Aurelio Ranzato. 2019. [Mixture Models for Diverse Machine Translation: Tricks of the Trade](http://proceedings.mlr.press/v97/shen19c/shen19c.pdf). In *Proceedings of ICML 2019*.

* Yiren Wang, Lijun Wu, Yingce Xia, Tao Qin, ChengXiang Zhai and Tie-Yan Liu. 2020. [Transductive Ensemble Learning for Neural Machine Translation](https://publish.illinois.edu/yirenwang/). In *Proceedings of AAAI 2020*.

Domain Adaptation


* Chenhui Chu, Raj Dabre, and Sadao Kurohashi. 2017. [An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation](http://aclweb.org/anthology/P17-2061). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=11154619650853156425&as_sdt=2005&sciodt=0,5&hl=en): 40)

* Rui Wang, Andrew Finch, Masao Utiyama, and Eiichiro Sumita. 2017. [Sentence Embedding for Neural Machine Translation Domain Adaptation](http://aclweb.org/anthology/P17-2089). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=12026801731726213856&as_sdt=2005&sciodt=0,5&hl=en): 8)

* Boxing Chen, Colin Cherry, George Foster, and Samuel Larkin. 2017. [Cost Weighting for Neural Machine Translation Domain Adaptation](http://aclweb.org/anthology/W17-3205). In *Proceedings of the First Workshop on Neural Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=11511062396100603245&as_sdt=2005&sciodt=0,5&hl=en): 10)

* Reid Pryzant and Denny Britz. 2017. [Effective Domain Mixing for Neural Machine Translation](http://aclweb.org/anthology/W17-4712). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=5830143292179945460&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Mara Chinea-Rios, Álvaro Peris and Francisco Casacuberta. 2017. [Adapting Neural Machine Translation with Parallel Synthetic Data](http://aclweb.org/anthology/W17-4714). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=14166012599677352590&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Rui Wang, Masao Utiyama, Lemao Liu, Kehai Chen, and Eiichiro Sumita. 2017. [Instance Weighting for Neural Machine Translation Domain Adaptation](http://aclweb.org/anthology/D17-1155). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=11790197905041828318&as_sdt=2005&sciodt=0,5&hl=en): 13)

* Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, and Rico Sennrich. 2017. [Regularization techniques for fine-tuning in neural machine translation](http://aclweb.org/anthology/D17-1156). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=10429379661740278678&as_sdt=2005&sciodt=0,5&hl=en): 6)

* David Vilar. 2018. [Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models](http://aclweb.org/anthology/N18-2080). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=5262017870970882749&as_sdt=2005&sciodt=0,5&hl=en): 2)   

* Paul Michel and Graham Neubig. 2018. [Extreme Adaptation for Personalized Neural Machine Translation](http://aclweb.org/anthology/P18-2050). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=16717798879574507487&as_sdt=2005&sciodt=0,5&hl=en): 6)

* Shiqi Zhang and Deyi Xiong. 2018. [Sentence Weighting for Neural Machine Translation Domain Adaptation](http://aclweb.org/anthology/C18-1269). In *Proceedings of COLING 2018*.   

* Chenhui Chu and Rui Wang. 2018. [A Survey of Domain Adaptation for Neural Machine Translation](http://aclweb.org/anthology/C18-1111). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=12774117070156464640&as_sdt=2005&sciodt=0,5&hl=en): 7)

* Jiali Zeng, Jinsong Su, Huating Wen, Yang Liu, Jun Xie, Yongjing Yin, and Jianqiang Zhao. 2018. [Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination](http://aclweb.org/anthology/D18-1041). In *Proceedings of EMNLP 2018*.

* Graham Neubig and Junjie Hu. 2018. [Rapid Adaptation of Neural Machine Translation to New Languages](http://aclweb.org/anthology/D18-1103). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=18133973017615911986&as_sdt=2005&sciodt=0,5&hl=en): 4)  

* Shuhao Gu, Yang Feng, and Qun Liu. 2019. [Improving Domain Adaptation Translation with Domain Invariant and Specific Information](https://arxiv.org/pdf/1904.03879.pdf). In *Proceedings of NAACL 2019*.

* Ankur Bapna and Orhan Firat. 2019. [Non-Parametric Adaptation for Neural Machine Translation](https://arxiv.org/pdf/1903.00058.pdf). In *Proceedings of NAACL 2019*. 

* Junjie Hu, Mengzhou Xia, Graham Neubig, and Jaime Carbonell. 2019. [Domain Adaptation of Neural Machine Translation by Lexicon Induction](https://arxiv.org/pdf/1906.00376). In *Proceedings of ACL 2019*.

* Danielle Saunders, Felix Stahlberg, Adria de Gispert, and Bill Byrne. 2019. [Domain Adaptive Inference for Neural Machine Translation](https://arxiv.org/pdf/1906.00408). In *Proceedings of ACL 2019*. 

* Zi-Yi Dou, Junjie Hu, Antonios Anastasopoulos, and Graham Neubig. 2019. [Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings](https://arxiv.org/pdf/1908.10430.pdf). In *Proceedings of EMNLP 2019*. 

* Ankur Bapna, Naveen Arivazhagan, and Orhan Firat. 2019. [Simple, Scalable Adaptation for Neural Machine Translation](https://arxiv.org/pdf/1909.08478). In *Proceedings of EMNLP 2019*. 

* Jiali Zeng, Yang Liu, Jinsong Su, Yubing Ge, Yaojie Lu, Yongjing Yin and Jiebo Luo. 2019. [Iterative Dual Domain Adaptation for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1078.pdf). In *Proceedings of EMNLP 2019*.

* Wei Wang, Ye Tian, Jiquan Ngiam, Yinfei Yang, Isaac Caswell and Zarana Parekh. 2020. [Learning a Multi-Domain Curriculum for Neural Machine Translation](https://arxiv.org/abs/1908.10940). In *Proceedings of ACL 2020*.

* Chaojun Wang and Rico Sennrich. 2020. [On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation](https://arxiv.org/abs/2005.03642). In *Proceedings of ACL 2020*.

* Haoming Jiang, Chen Liang, Chong Wang and Tuo Zhao. 2020. [Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing](https://www.aclweb.org/anthology/2020.acl-main.165/). In *Proceedings of ACL 2020*.

* Anna Currey, Prashant Mathur, Georgiana Dinu. 2020. [Distilling Multiple Domains for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.364/). In *Proceedings of EMNLP 2020*.

* Haoyue Shi, Luke Zettlemoyer, Sida I. Wang. 2021. [Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment](https://aclanthology.org/2021.acl-long.67/). In *Proceedings of ACL 2021*.

Quality Estimation


* Julia Kreutzer, Shigehiko Schamoni, Stefan Riezler. 2015. [Quality Estimation from Scratch (QUETCH): Deep Learning for Word-Level Translation Quality Estimation](http://www.aclweb.org/anthology/W15-3037). In *Proceedings of the Tenth Workshop on Statistical Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=2308754825624963103&as_sdt=2005&sciodt=0,5&hl=en): 24)

* Hyun Kim and Jong-Hyeok Lee. 2016. [A Recurrent Neural Networks Approach for Estimating the Quality of Machine Translation Output](http://aclweb.org/anthology/N16-1059). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=830241254846777269&as_sdt=2005&sciodt=0,5&hl=en): 11)  

* Hyun Kim, Jong-Hyeok Lee, and Seung-Hoon Na. 2017. [Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation](http://aclweb.org/anthology/W17-4763). In *Proceedings of WMT 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=14077676925816230812&as_sdt=2005&sciodt=0,5&hl=en): 10)

* Osman Baskaya, Eray Yildiz, Doruk Tunaoglu, Mustafa Tolga Eren, and A. Seza Doğruöz. 2017. [Integrating Meaning into Quality Evaluation of Machine Translation](http://aclweb.org/anthology/E17-1020). In *Proceedings of EACL 2017*. 

* Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, and Carolina Scarton. 2017. [Improving Evaluation of Document-level Machine Translation Quality Estimation](http://aclweb.org/anthology/E17-2057). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=13409644842476040211&as_sdt=2005&sciodt=0,5&hl=en): 1)    

* Rico Sennrich. 2017. [How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs](http://aclweb.org/anthology/E17-2060). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=14294900718072928557&as_sdt=2005&sciodt=0,5&hl=en): 25)  

* Pierre Isabelle, Colin Cherry, and George Foster. 2017. [A Challenge Set Approach to Evaluating Machine Translation](http://aclweb.org/anthology/D17-1263). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=10744403566307443052&as_sdt=2005&sciodt=0,5&hl=en): 26) 

* André F.T. Martins, Marcin Junczys-Dowmunt, Fabio N. Kepler, Ramón Astudillo, Chris Hokamp, and Roman Grundkiewicz. 2017. [Pushing the Limits of Translation Quality Estimation](http://aclweb.org/anthology/Q17-1015). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=17497507120611954135&as_sdt=2005&sciodt=0,5&hl=en): 13) 

* Maoxi Li, Qingyu Xiang, Zhiming Chen, and Mingwen Wang. 2018. [A Unified Neural Network for Quality Estimation of Machine Translation](https://www.jstage.jst.go.jp/article/transinf/E101.D/9/E101.D_2018EDL8019/_article/-char/en). *IEICE Transactions on Information and Systems*. ([Citation](https://scholar.google.com.hk/scholar?cites=17497507120611954135&as_sdt=2005&sciodt=0,5&hl=en): 13)   

* Lucia Specia, Frédéric Blain, Varvara Logacheva, Ramón F. Astudillo, and André Martins. 2018. [Findings of the WMT 2018 Shared Task on Quality Estimation](http://aclweb.org/anthology/W18-6451). In *Proceedings of the Third Conference on Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=11225823265419143916&as_sdt=2005&sciodt=0,5&hl=en): 2)   

* Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, and Graham Neubig. 2018. [Automatic Estimation of Simultaneous Interpreter Performance](http://aclweb.org/anthology/P18-2105). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?hl=en&as_sdt=2005&sciodt=0,5&cites=5687670489913511293&scipsc=): 1) 

* Holger Schwenk. 2018. [Filtering and Mining Parallel Data in a Joint Multilingual Space](http://aclweb.org/anthology/P18-2037). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=7363119514762721542&as_sdt=2005&sciodt=0,5&hl=en): 4) 

* Julia Ive, Frédéric Blain, and Lucia Specia. 2018. [deepQuest: A Framework for Neural-based Quality Estimation](http://aclweb.org/anthology/C18-1266). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=4501237247493636014&as_sdt=2005&sciodt=0,5&hl=en): 1)  

* Kai Fan, Jiayi Wang, Bo Li, Fengming Zhou, Boxing Chen, and Luo Si. 2019. ["Bilingual Expert" Can Find Translation Errors](https://arxiv.org/pdf/1807.09433). In *Proceedings of AAAI 2019*.

* Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat and Karthik Raman. 2020. [Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation](https://arxiv.org/abs/1909.00437). In *Proceedings of AAAI 2020*.

* Shu Okabe, Frédéric Blain, and Lucia Specia. 2020. [Multimodal Quality Estimation for Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.114/). In *Proceedings of ACL 2020*.

* Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, and Lucia Specia. 2020. [Unsupervised Quality Estimation for Neural Machine Translation](https://arxiv.org/abs/2005.10608). *Transactions of the Association for Computational Linguistics*.

* Jingyi Zhang, Josef van Genabith. 2020. [Translation Quality Estimation by Jointly Learning to Score and Rank](https://www.aclweb.org/anthology/2020.emnlp-main.205/). In *Proceedings of EMNLP 2020*.

* Vania Mendonca, Ricardo Rei, Luísa Coheur, Alberto Sardinha, Ana Lucia Santos. 2021. [Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort](https://aclanthology.org/2021.acl-long.242.pdf). In *Proceedings of ACL 2021*.

Human-centered NMT


Interactive NMT


* Joern Wuebker, Spence Green, John DeNero, Saša Hasan and Minh-Thang Luong. 2016. [Models and Inference for Prefix-Constrained Machine Translation](http://aclweb.org/anthology/P16-1007). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=6217828709297735294&as_sdt=2005&sciodt=0,5&hl=es): 14)

* Rebecca Knowles and Philipp Koehn. 2016. [Neural Interactive Translation Prediction](https://www.cs.jhu.edu/~phi/publications/neural-interactive-translation.pdf). In *Proceedings of AMTA 2016*. ([Citation](https://scholar.google.es/scholar?cites=16855799109441363843&as_sdt=2005&sciodt=0,5&hl=es): 24)

* Álvaro Peris, Miguel Domingo and Francisco Casacuberta. 2017. [Interactive neural machine translation](https://www.researchgate.net/publication/312275926). In *Computer Speech and Language*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=es&cites=2848232799976037224&as_sdt=5): 21)

* Khanh Nguyen, Hal Daumé III, and Jordan Boyd-Graber. 2017. [Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback](http://aclweb.org/anthology/D17-1153). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=15247143946986909844&as_sdt=2005&sciodt=0,5&hl=en): 11) 

* Álvaro Peris and Francisco Casacuberta. 2018. [Active Learning for Interactive Neural Machine Translation of Data Streams](http://aclweb.org/anthology/K18-1015). In *Proceedings of CoNLL 2018*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=es&cites=14996862010471139834&as_sdt=5): 1)

* Tsz Kin Lam, Julia Kreutzer, and Stefan Riezler. 2018. [A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation](https://arxiv.org/pdf/1805.01553). In *Proceedings of EAMT 2018*.

* Julia Kreutzer, Shahram Khadivi, Evgeny Matusov, Stefan Riezler. 2018. [Can Neural Machine Translation be Improved with User Feedback?](http://aclweb.org/anthology/N18-3012). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5878376279798739633): 3)

* Pavel Petrushkov, Shahram Khadivi and Evgeny Matusov. 2018. [Learning from Chunk-based Feedback in Neural Machine Translation](http://aclweb.org/anthology/P18-2052). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=11022197412542590938&as_sdt=2005&sciodt=0,5&hl=en): 1) 

* Julia Kreutzer, Joshua Uyheng, and Stefan Riezler. 2018. [Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning](http://aclweb.org/anthology/P18-1165). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=13544384067638756323&as_sdt=2005&sciodt=0,5&hl=en): 2) 

* Álvaro Peris and Francisco Casacuberta. 2019. [A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks](https://arxiv.org/pdf/1905.08181). In *Proceedings of ACL 2019*.

* Miguel Domingo, Mercedes García-Martínez, Amando Estela, Laurent Bié, Alexandre Helle, Álvaro Peris, Francisco Casacuberta, and Manuel Herranz. 2019. [Demonstration of a Neural Machine Translation System with Online Learning for Translators](https://arxiv.org/pdf/1906.09000). In *Proceedings of ACL 2019*.

* Julia Kreutzer and Stefan Riezler. 2019. [Self-Regulated Interactive Sequence-to-Sequence Learning](https://arxiv.org/pdf/1907.05190.pdf). In *Proceedings of ACL 2019*.

Automatic Post-Editing


* Santanu Pal, Sudip Kumar Naskar, Mihaela Vela, and Josef van Genabith. 2016. [A neural network based approach to automatic post-editing](http://aclweb.org/anthology/P16-2046). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=12283909725778804406&as_sdt=2005&sciodt=0,5&hl=en): 14)  

* Marcin Junczys-Dowmunt and Roman Grundkiewicz. 2016. [Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing](http://aclweb.org/anthology/W16-2378). In *Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers*. ([Citation](https://scholar.google.com.hk/scholar?cites=8379495332607620604&as_sdt=2005&sciodt=0,5&hl=en): 27)  

* Santanu Pal, Sudip Kumar Naskar, Mihaela Vela, Qun Liu, and Josef van Genabith. 2017. [Neural Automatic Post-Editing Using Prior Alignment and Reranking](http://aclweb.org/anthology/E17-2056). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=17137949386428082191&as_sdt=2005&sciodt=0,5&hl=en): 11) 

* Rajen Chatterjee, Gebremedhen Gebremelak, Matteo Negri, and Marco Turchi. 2017. [Online Automatic Post-editing for MT in a Multi-Domain Translation Environment](http://aclweb.org/anthology/E17-1050). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=16624698279716802422&as_sdt=2005&sciodt=0,5&hl=en): 1) 

* Marcin Junczys-Dowmunt, Roman Grundkiewicz. 2017. [An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing](http://aclweb.org/anthology/I17-1013). In *Proceedings of IJCNLP 2017*.

* David Grangier and Michael Auli. 2018. [QuickEdit: Editing Text & Translations by Crossing Words Out](http://aclweb.org/anthology/N18-1025). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9500777791162222168&as_sdt=2005&sciodt=0,5&hl=en): 1) 

* Thuy-Trang Vu and Gholamreza Haffari. 2018. [Automatic Post-Editing of Machine Translation: A Neural Programmer-Interpreter Approach](http://aclweb.org/anthology/D18-1341). In *Proceedings of EMNLP 2018*.

* Marcin Junczys-Dowmunt, Roman Grundkiewicz. 2018. [MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing](https://arxiv.org/pdf/1809.00188.pdf). In *Proceedings of WMT 2018*.

* Gonçalo M. Correia and André F. T. Martins. 2019. [A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning](https://arxiv.org/pdf/1906.06253). In *Proceedings of ACL 2019*.

* Xuancheng Huang, Yang Liu, Huanbo Luan, Jingfang Xu, Maosong Sun. 2019. [Learning to Copy for Automatic Post-Editing](https://www.aclweb.org/anthology/D19-1634.pdf). In *Proceedings of EMNLP 2019*.

* Nico Herbig, Tim Düwel, Santanu Pal, Kalliopi Meladaki, Mahsa Monshizadeh, Antonio Krüger and Josef van Genabith. 2020. [MMPE: A Multi-Modal Interface for Post-Editing Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.155/). In *Proceedings of ACL 2020*.

* Shamil Chollampatt, Raymond Hendy Susanto, Liling Tan, Ewa Szymanska. 2020. [Can Automatic Post-Editing Improve NMT?](https://www.aclweb.org/anthology/2020.emnlp-main.217/). In *Proceedings of EMNLP 2020*.

Poetry Translation


* Marjan Ghazvininejad, Yejin Choi, and Kevin Knight. 2018. [Neural Poetry Translation](http://aclweb.org/anthology/N18-2011). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=4597758342230970450&as_sdt=2005&sciodt=0,5&hl=en): 1)

Eco-friendly


* Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. [Energy and Policy Considerations for Deep Learning in NLP](https://www.aclweb.org/anthology/P19-1355). In *Proceedings of ACL 2019*.

Compositional Generalization


* Yafu Li, Yongjing Yin, Yulong Chen, Yue Zhang. 2021. [On Compositional Generalization of Neural Machine Translation](https://aclanthology.org/2021.acl-long.368/). In *Proceedings of ACL 2021*.  

* Verna Dankers, Elia Bruni, Dieuwke Hupkes. 2022. [The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study](https://aclanthology.org/2022.acl-long.286/). In *Proceedings of ACL 2022*.  

* Hao Zheng, Mirella Lapata. 2022. [Disentangled Sequence to Sequence Learning for Compositional Generalization](https://aclanthology.org/2022.acl-long.293/). In *Proceedings of ACL 2022*. 

* Verna Dankers, Christopher Lucas, Ivan Titov. 2022. [Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation](https://aclanthology.org/2022.acl-long.252/). In *Proceedings of ACL 2022*. 

Endangered Language Revitalization


* Shiyue Zhang, Benjamin Frey, Mohit Bansal. 2020. [ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization](https://www.aclweb.org/anthology/2020.emnlp-main.43/). In *Proceedings of EMNLP 2020*.

* Tahmid Hasan, Abhik Bhattacharjee, Kazi Samin, Masum Hasan, Madhusudan Basak, M. Sohel Rahman, Rifat Shahriyar. 2020. [Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.207/). In *Proceedings of EMNLP 2020*.

Word Translation


* Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. [Exploiting Similarities among Languages for Machine Translation](https://arxiv.org/pdf/1309.4168.pdf). *arXiv:1309.4168*. ([Citation](https://scholar.google.com.hk/scholar?cites=18389495985810631724&as_sdt=2005&sciodt=0,5&hl=en): 581)

* Chao Xing, Dong Wang, Chao Liu, and Yiye Lin. 2015. [Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation](http://aclweb.org/anthology/N15-1104). In *Proceedings of NAACL 2015*. ([Citation](https://scholar.google.com.hk/scholar?hl=zh-CN&as_sdt=2005&sciodt=0,5&cites=4009320309746318198&scipsc=): 89)

* Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni. 2015. [Improving Zero-shot Learning by Mitigating the Hubness Problem](https://arxiv.org/pdf/1412.6568.pdf). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com.hk/scholar?cites=4810137765860435505&as_sdt=2005&sciodt=0,5&hl=en): 110) 

* Meng Zhang, Yang Liu, Huanbo Luan, Maosong Sun, Tatsuya Izuha, and Jie Hao. 2016. [Building Earth Mover's Distance on Bilingual Word Embeddings for Machine Translation](http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/12227/12035). In *Proceedings of AAAI 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=10787724557107708547&as_sdt=2005&sciodt=0,5&hl=en): 11) 

* Meng Zhang, Yang Liu, Huanbo Luan, Yiqun Liu, and Maosong Sun. 2016. [Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover's Distance Regularization](http://aclweb.org/anthology/C16-1300). In *Proceedings of COLING 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=7442971885961632428&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Ivan Vulić and Anna Korhonen. 2016. [On the Role of Seed Lexicons in Learning Bilingual Word Embeddings](http://www.aclweb.org/anthology/P16-1024). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=9848186834020452809&as_sdt=2005&sciodt=0,5&hl=en): 39)

* Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. [Learning principled bilingual mappings of word embeddings while preserving monolingual invariance](http://www.aclweb.org/anthology/D16-1250). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=5308709105842309671&as_sdt=2005&sciodt=0,5&hl=en): 73)

* Meng Zhang, Haoruo Peng, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. [Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision](http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14682/14264). In *Proceedings of AAAI 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=6351287463037630922&as_sdt=2005&sciodt=0,5&hl=en): 11)

* Ann Irvine and Chris Callison-Burch. 2017. [A Comprehensive Analysis of Bilingual Lexicon Induction](http://aclweb.org/anthology/J17-2001). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=9284068492500255032&as_sdt=2005&sciodt=0,5&hl=en): 12)

* Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. [Learning Bilingual Word Embeddings with (Almost) No Bilingual Data](http://aclweb.org/anthology/P17-1042). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=17614535864871662614&as_sdt=2005&sciodt=0,5&hl=en): 62)

* Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. [Adversarial Training for Unsupervised Bilingual Lexicon Induction](http://aclweb.org/anthology/P17-1179). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=1858752500147406961): 41)

* Geert Heyman, Ivan Vulić, and Marie-Francine Moens. 2017. [Bilingual Lexicon Induction by Learning to Combine Word-Level and Character-Level Representations](http://aclweb.org/anthology/E17-1102). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=585284476576929954&as_sdt=2005&sciodt=0,5&hl=en): 9)

* Bradley Hauer, Garrett Nicolai, and Grzegorz Kondrak. 2017. [Bootstrapping Unsupervised Bilingual Lexicon Induction](http://aclweb.org/anthology/E17-2098). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=12378647251883332742&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Yunsu Kim, Julian Schamper, and Hermann Ney. 2017. [Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes](http://aclweb.org/anthology/E17-2103). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=10713109281510942659): 1) 

* Derry Tanti Wijaya, Brendan Callahan, John Hewitt, Jie Gao, Xiao Ling, Marianna Apidianaki, and Chris Callison-Burch. 2017. [Learning Translations via Matrix Completion](http://aclweb.org/anthology/D17-1152). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=9020955741604455257&as_sdt=2005&sciodt=0,5&hl=en): 3) 

* Meng Zhang, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. [Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction](http://aclweb.org/anthology/D17-1207). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=8228362677106813515&as_sdt=2005&sciodt=0,5&hl=en): 26)

* Ndapandula Nakashole and Raphael Flauger. 2017. [Knowledge Distillation for Bilingual Dictionary Induction](http://aclweb.org/anthology/D17-1264). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=1036105547945298329&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Hanan Aldarmaki, Mahesh Mohan, and Mona Diab. 2018. [Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings](http://aclweb.org/anthology/Q18-1014). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=4781812228167043431&as_sdt=2005&sciodt=0,5&hl=en): 5)

* Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. [Word Translation without Parallel Data](https://openreview.net/pdf?id=H196sainb). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com/scholar?cites=8622718096243524923&as_sdt=2005&sciodt=0,5&hl=en): 11)

* Fabienne Braune, Viktor Hangya, Tobias Eder, and Alexander Fraser. 2018. [Evaluating Bilingual Word Embeddings on the Long Tail](http://aclweb.org/anthology/N18-2030). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=1773448771543494989&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Ndapa Nakashole and Raphael Flauger. 2018. [Characterizing Departures from Linearity in Word Translation](http://aclweb.org/anthology/P18-2036). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=669635789435605162&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Anders Søgaard, Sebastian Ruder, and Ivan Vulić. 2018. [On the Limitations of Unsupervised Bilingual Dictionary Induction](http://aclweb.org/anthology/P18-1072). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=1427533216601294786&as_sdt=2005&sciodt=0,5&hl=en): 17)

* Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. [A Robust Self-learning Method for Fully Unsupervised Cross-lingual Mappings of Word Embeddings](http://aclweb.org/anthology/P18-1073). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=7012967033921106213&as_sdt=2005&sciodt=0,5&hl=en): 17)

* Parker Riley and Daniel Gildea. 2018. [Orthographic Features for Bilingual Lexicon Induction](http://aclweb.org/anthology/P18-2062). In *Proceedings of ACL 2018*.

* Amir Hazem and Emmanuel Morin. 2018. [Leveraging Meta-Embeddings for Bilingual Lexicon Extraction from Specialized Comparable Corpora](http://aclweb.org/anthology/C18-1080). In *Proceedings of COLING 2018*.

* Lifu Huang, Kyunghyun Cho, Boliang Zhang, Heng Ji, and Kevin Knight. 2018. [Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding](http://aclweb.org/anthology/D18-1023). In *Proceedings of EMNLP 2018*.

* Xilun Chen and Claire Cardie. 2018. [Unsupervised Multilingual Word Embeddings](http://aclweb.org/anthology/D18-1024). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=15847135808149408064&as_sdt=2005&sciodt=0,5&hl=en): 4)

* Ta Chung Chi and Yun-Nung Chen. 2018. [CLUSE: Cross-Lingual Unsupervised Sense Embeddings](http://aclweb.org/anthology/D18-1025). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=1931895311858153391&as_sdt=2005&sciodt=0,5&hl=en): 1)

* Yerai Doval, Jose Camacho-Collados, Luis Espinosa Anke, and Steven Schockaert. 2018. [Improving Cross-Lingual Word Embeddings by Meeting in the Middle](http://aclweb.org/anthology/D18-1027). In *Proceedings of EMNLP 2018*. 

* Sebastian Ruder, Ryan Cotterell, Yova Kementchedjhieva, and Anders Søgaard. 2018. [A Discriminative Latent-Variable Model for Bilingual Lexicon Induction](http://aclweb.org/anthology/D18-1042). In *Proceedings of EMNLP 2018*.

* Yedid Hoshen and Lior Wolf. 2018. [Non-Adversarial Unsupervised Word Translation](http://aclweb.org/anthology/D18-1043). In *Proceedings of EMNLP 2018*.

* Ndapa Nakashole. 2018. [NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings](http://aclweb.org/anthology/D18-1047). In *Proceedings of EMNLP 2018*.

* Mareike Hartmann, Yova Kementchedjhieva, and Anders Søgaard. 2018. [Why is unsupervised alignment of English embeddings from different algorithms so hard?](http://aclweb.org/anthology/D18-1056). In *Proceedings of EMNLP 2018*.

* Zi-Yi Dou, Zhi-Hao Zhou, and Shujian Huang. 2018. [Unsupervised Bilingual Lexicon Induction via Latent Variable Models](http://aclweb.org/anthology/D18-1062). In *Proceedings of EMNLP 2018*.

* Tanmoy Mukherjee, Makoto Yamada, and Timothy Hospedales. 2018. [Learning Unsupervised Word Translations Without Adversaries](http://aclweb.org/anthology/D18-1063). In *Proceedings of EMNLP 2018*.

* David Alvarez-Melis and Tommi Jaakkola. 2018. [Gromov-Wasserstein Alignment of Word Embedding Spaces](http://aclweb.org/anthology/D18-1214). In *Proceedings of EMNLP 2018*.

* Ruochen Xu, Yiming Yang, Naoki Otani, and Yuexin Wu. 2018. [Unsupervised Cross-lingual Transfer of Word Embedding Spaces](http://aclweb.org/anthology/D18-1268). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=15320274773511615227): 2)

* Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and Edouard Grave. 2018. [Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion](http://aclweb.org/anthology/D18-1330). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=437763240249389525&as_sdt=2005&sciodt=0,5&hl=en): 2)

* Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2019. [A Survey Of Cross-lingual Word Embedding Models](https://arxiv.org/pdf/1706.04902.pdf). *Journal of Artificial Intelligence Research*. ([Citation](https://scholar.google.com/scholar?cites=2174368482827457639&as_sdt=2005&sciodt=0,5&hl=en): 22)

* Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. 2019. [Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach](https://arxiv.org/pdf/1808.08773). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=3887586742254907953&as_sdt=2005&sciodt=0,5&hl=en): 3)

* Tasnim Mohiuddin and Shafiq Joty. 2019. [Revisiting Adversarial Autoencoder for Unsupervised Word Translation with Cycle Consistency and Improved Training](https://arxiv.org/pdf/1904.04116.pdf). In *Proceedings of NAACL 2019*.

* Chunting Zhou, Xuezhe Ma, Di Wang, and Graham Neubig. 2019. [Density Matching for Bilingual Word Embedding](https://arxiv.org/pdf/1904.02343.pdf). In *Proceedings of NAACL 2019*.

* Noa Yehezkel Lubin, Jacob Goldberger, and Yoav Goldberg. 2019. [Aligning Vector-spaces with Noisy Supervised Lexicons](https://arxiv.org/pdf/1903.10238.pdf). In *Proceedings of NAACL 2019*. 

* Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson. 2019. [Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing](https://arxiv.org/pdf/1902.09492.pdf). In *Proceedings of NAACL 2019*.  

* Hanan Aldarmaki and Mona Diab. 2019. [Context-Aware Cross-Lingual Mapping](https://arxiv.org/pdf/1903.03243.pdf). In *Proceedings of NAACL 2019*.  

* Yoshinari Fujinuma, Jordan Boyd-Graber, and Michael J. Paul. 2019. [A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity](https://arxiv.org/pdf/1906.01926). In *Proceedings of ACL 2019*.

* Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, and Jordan Boyd-Graber. 2019. [Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization](https://arxiv.org/pdf/1906.01622). In *Proceedings of ACL 2019*.

* Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, and Eneko Agirre. 2019. [Analyzing the Limitations of Cross-lingual Word Embedding Mappings](https://arxiv.org/pdf/1906.05407). In *Proceedings of ACL 2019*.

* Takashi Wada, Tomoharu Iwata, and Yuji Matsumoto. 2019. [Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models](https://www.aclweb.org/anthology/P19-1300). In *Proceedings of ACL 2019*.

* Pengcheng Yang, Fuli Luo, Peng Chen, Tianyu Liu, and Xu Sun. 2019. [MAAM: A Morphology-Aware Alignment Model for Unsupervised Bilingual Lexicon Induction](https://www.aclweb.org/anthology/P19-1308). In *Proceedings of ACL 2019*.

* Benjamin Marie and Atsushi Fujita. 2019. [Unsupervised Joint Training of Bilingual Word Embeddings](https://www.aclweb.org/anthology/P19-1312). In *Proceedings of ACL 2019*.

* Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2019. [Bilingual Lexicon Induction through Unsupervised Machine Translation](https://www.aclweb.org/anthology/P19-1494). In *Proceedings of ACL 2019*.

* Elias Stengel-Eskin, Tzu-Ray Su, Matt Post, and Benjamin Van Durme. 2019. [A Discriminative Neural Model for Cross-Lingual Word Alignment](https://arxiv.org/pdf/1909.00444). In *Proceedings of EMNLP 2019*.

* Ivan Vulić, Goran Glavaš, Roi Reichart, and Anna Korhonen. 2019. [Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?](https://arxiv.org/pdf/1909.01638). In *Proceedings of EMNLP 2019*.

* Paula Czarnowska, Sebastian Ruder, Edouard Grave, Ryan Cotterell, and Ann Copestake. 2019. [Don't Forget the Long Tail! A Comprehensive Analysis of Morphological Generalization in Bilingual Lexicon Induction](https://arxiv.org/pdf/1909.02855). In *Proceedings of EMNLP 2019*.

* Yova Kementchedjhieva, Mareike Hartmann, and Anders Søgaard. 2019. [Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction](https://arxiv.org/pdf/1909.05708). In *Proceedings of EMNLP 2019*.

* Mareike Hartmann, Yova Kementchedjhieva, Anders Søgaard. 2019. [Comparing Unsupervised Word Translation Methods Step by Step](https://papers.nips.cc/paper/8836-comparing-unsupervised-word-translation-methods-step-by-step). In *Proceedings of NeurIPS 2019*. 

* Steven Cao, Nikita Kitaev, Dan Klein. 2020. [Multilingual Alignment of Contextual Word Representations](https://openreview.net/forum?id=r1xCMyBtPS). In *Proceedings of ICLR 2020*. 

* Gábor Berend. 2020. [Massively Multilingual Sparse Word Representations](https://openreview.net/forum?id=HyeYTgrFPB). In *Proceedings of ICLR 2020*.

* Xu Zhao, Zihao Wang, Yong Zhang and Hao Wu. 2020. [A Relaxed Matching Procedure for Unsupervised BLI](https://www.aclweb.org/anthology/2020.acl-main.274/). In *Proceedings of ACL 2020*.

* Mladen Karan, Ivan Vulić, Anna Korhonen and Goran Glavaš. 2020. [Classification-Based Self-Learning for Weakly Supervised Bilingual Lexicon Induction](https://www.aclweb.org/anthology/2020.acl-main.618/). In *Proceedings of ACL 2020*.

* Shuo Ren, Shujie Liu, Ming Zhou and Shuai Ma. 2020. [A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction](https://www.aclweb.org/anthology/2020.acl-main.318/). In *Proceedings of ACL 2020*.

* Pratik Jawanpuria, Mayank Meghwanshi and Bamdev Mishra. 2020. [Geometry-aware domain adaptation for unsupervised alignment of word embeddings](https://www.aclweb.org/anthology/2020.acl-main.276/). In *Proceedings of ACL 2020*.

* Tasnim Mohiuddin, M Saiful Bari, Shafiq Joty. 2020. [LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space](https://www.aclweb.org/anthology/2020.emnlp-main.215/). In *Proceedings of EMNLP 2020*.

WMT Winners


[WMT](http://www.statmt.org/wmt19/) is the most important annual international competition on machine translation. We collect the [competition results](http://matrix.statmt.org) on the news translation task since WMT 2016 (the First Conference on Machine Translation) and summarize the techniques used by the top-performing systems. Currently, we focus on four directions: ZH-EN, EN-ZH, DE-EN, and EN-DE. The summarized techniques might be incomplete; your suggestions are welcome!

WMT 2019


* The winner of [ZH-EN](http://matrix.statmt.org/matrix/systems_list/1901), [DE-EN](http://matrix.statmt.org/matrix/systems_list/1902) and [EN-DE](http://matrix.statmt.org/matrix/systems_list/1909): **Microsoft**

    * **System report**: Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Di He, Weicong Chen, Yang Fan, Linyuan Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin and Tie-Yan Liu. 2019. [Microsoft Research Asia’s Systems for WMT19](http://www.statmt.org/wmt19/pdf/WMT0048.pdf). In *Proceedings of the Fourth Conference on Machine Translation: Shared Task Papers*.

    * **News**: [Microsoft Research Asia (MSRA) leads in 2019 WMT international machine translation competition](https://news.microsoft.com/apac/2019/05/22/microsoft-research-asia-msra-leads-in-2019-wmt-international-machine-translation-competition/)

    * **Techniques**:

        * Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019. [Multi-Agent Dual Learning](https://openreview.net/pdf?id=HyGhN2A5tm). In *Proceedings of ICLR 2019*.

        * Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. [MASS: Masked Sequence to Sequence Pre-training for Language Generation](https://arxiv.org/pdf/1905.02450). In *Proceedings of ICML 2019*.

        * Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. 2018. [Neural Architecture Optimization](https://papers.nips.cc/paper/8007-neural-architecture-optimization.pdf). In *Proceedings of NeurIPS 2018*.

        * Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xueqi Cheng, Tie-Yan Liu. 2019. [Soft Contextual Data Augmentation for Neural Machine Translation](https://arxiv.org/pdf/1905.10523.pdf). In *Proceedings of ACL 2019*.

* The winner of [EN-ZH](http://matrix.statmt.org/matrix/systems_list/1908): **PATECH**

    * **System report**: Coming soon...

    * **Techniques**: Transformer + Back-Translation + Reranking + Ensemble

WMT 2018


* The winner of [ZH-EN](http://matrix.statmt.org/matrix/systems_list/1892): **Tencent**

    * **System report**: Mingxuan Wang, Li Gong, Wenhuan Zhu, Jun Xie, and Chao Bian. 2018. [Tencent Neural Machine Translation Systems for WMT18](https://www.aclweb.org/anthology/W18-6429). In *Proceedings of the Third Conference on Machine Translation: Shared Task Papers*.

    * **Techniques**: RNMT + Transformer + BPE + Reranking of ensemble outputs with 48 features (including t2t R2L, t2t L2R, rnn L2R, rnn R2L, etc.) + Back-Translation + Joint Training with English-to-Chinese systems + Fine-tuning with selected data + Knowledge distillation

    

* The winner of [EN-ZH](http://matrix.statmt.org/matrix/systems_list/1893): **GTCOM**

    * **System report**: Chao Bei, Hao Zong, Yiming Wang, Baoyong Fan, Shiqi Li, and Conghu Yuan. 2018. [An Empirical Study of Machine Translation for the Shared Task of WMT18](https://www.aclweb.org/anthology/W18-6404). In *Proceedings of the Third Conference on Machine Translation: Shared Task Papers*.

    * **Techniques**: Transformer + Back-Translation + Data Filtering by rules, language models and translation models + BPE + Greedy Ensemble Decoding + Fine-Tuning with newstest2017 back translation

    

* The winner of [DE-EN](http://matrix.statmt.org/matrix/systems_list/1880): **RWTH Aachen University**

    * **System report**: Julian Schamper, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. 2018. [The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018](https://www.aclweb.org/anthology/W18-6426). In *Proceedings of the Third Conference on Machine Translation: Shared Task Papers*.

    * **Techniques**: Ensemble of the 3 strongest Transformer models + Data Selection + BPE + Fine-Tuning + Important Hyperparameters (batch size and model dimension)

    

* The winner of [EN-DE](http://matrix.statmt.org/matrix/systems_list/1881): **Microsoft**

    * **System report**: Marcin Junczys-Dowmunt. 2018. [Microsoft’s Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data](https://www.aclweb.org/anthology/W18-6415). In *Proceedings of the Third Conference on Machine Translation: Shared Task Papers*.

    * **Techniques**: Marian + Transformer-big + BPE + Ensemble + Data Filtering + Domain-Weighted {ParaCrawl, original data} + Decoder-time ensemble with in-domain Transformer-style language model + Reranking with Right-to-left Transformer-big models

WMT 2017


* The winner of [ZH-EN](http://matrix.statmt.org/matrix/systems_list/1878): **Sogou**

    * **System report**: Yuguang Wang, Shanbo Cheng, Liyang Jiang, Jiajun Yang, Wei Chen, Muze Li, Lin Shi, Yanfeng Wang, and Hongtao Yang. 2017. [Sogou Neural Machine Translation Systems for WMT17](https://www.aclweb.org/anthology/W17-4742). In *Proceedings of the Second Conference on Machine Translation: Shared Task Papers*.

    * **Techniques**: Encoder-Decoder with Attention + BPE + Reranking (R2L, T2S, N-gram language models) + Tagging Model + Named Entity Translation + Ensemble

* The winner of [EN-ZH](http://matrix.statmt.org/matrix/systems_list/1879), [DE-EN](http://matrix.statmt.org/matrix/systems_list/1868) and [EN-DE](http://matrix.statmt.org/matrix/systems_list/1869): **University of Edinburgh**  

    * **System report**: Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, and Philip Williams. 2017. [The University of Edinburgh’s Neural MT Systems for WMT17](https://www.aclweb.org/anthology/W17-4739). In *Proceedings of the Second Conference on Machine Translation: Shared Task Papers*. 

    * **Techniques**: Encoder-Decoder with Attention + Deep Model + Layer Normalization + Weight Tying + Back-Translation + BPE + Reranking (L2R, R2L) + Ensemble

WMT 2016


* The winner of [DE-EN](http://matrix.statmt.org/matrix/systems_list/1846): **University of Regensburg**

    * **System report**: Not found

    * **Techniques**: Not found

    

* The winner of [EN-DE](http://matrix.statmt.org/matrix/systems_list/1846): **University of Edinburgh**

    * **System report**: Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. [Edinburgh Neural Machine Translation Systems for WMT 16](http://www.aclweb.org/anthology/W16-2323). In *Proceedings of the First Conference on Machine Translation: Shared Task Papers*.

    * **Techniques**: Encoder-Decoder with Attention + Back-Translation + BPE + Reranking (R2L) + Ensemble