https://github.com/THUNLP-MT/MT-Reading-List
A machine translation reading list maintained by Tsinghua Natural Language Processing Group
machine-translation reading-list
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/THUNLP-MT/MT-Reading-List
- Owner: THUNLP-MT
- License: bsd-3-clause
- Created: 2018-12-03T10:45:15.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-08-09T14:48:06.000Z (almost 2 years ago)
- Last Synced: 2025-03-26T09:24:46.213Z (about 1 year ago)
- Topics: machine-translation, reading-list
- Language: TeX
- Homepage:
- Size: 997 KB
- Stars: 2,441
- Watchers: 164
- Forks: 447
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ai-list-guide - MT-Reading-List
- awesome-machine-learning-resources - MT-Reading-List (Table of Contents)
- awesome-machine-translation - MT-Reading-List - A machine translation reading list maintained by the Tsinghua Natural Language Processing Group. (Other MT Lists 📝)
- fucking-machine-learning-tutorials - Machine Translation Reading List
- Machine-Learning-Tutorials - Machine Translation Reading List
README
# Machine Translation Reading List
This is a machine translation reading list maintained by the Tsinghua Natural Language Processing Group.
The past three decades have witnessed the rapid development of machine translation, especially data-driven approaches such as statistical machine translation (SMT) and neural machine translation (NMT). Because NMT currently dominates the field, priority is given to collecting important, up-to-date NMT papers; the [Edinburgh/JHU MT research survey wiki](http://www.statmt.org/survey/) has good coverage of older papers and a brief description of each sub-topic of MT. The list is still incomplete and the categorization may be imperfect. We will keep adding papers and improving the list. Any suggestions are welcome!
* [10 Must Reads](#10_must_reads)
* [Tutorials and Surveys](#surveys)
* [Statistical Machine Translation](#statistical_machine_translation)
* [Word-based Models](#word_based_models)
* [Phrase-based Models](#phrase_based_models)
* [Syntax-based Models](#syntax_based_models)
* [Discriminative Training](#discriminative_training)
* [System Combination](#system_combination)
* [Human-centered SMT](#human_centered_smt)
* [Interactive SMT](#interactive)
* [Adaptation](#adaptation_smt)
* [Evaluation](#evaluation)
* [Neural Machine Translation](#neural_machine_translation)
* [Model Architecture](#model_architecture)
* [Attention Mechanism](#attention_mechanism)
* [Open Vocabulary](#open_vocabulary)
* [Training Objectives and Frameworks](#training)
* [Decoding](#decoding)
* [Low-resource Language Translation](#low_resource_language_translation)
* [Semi-supervised Learning](#semi_supervised)
* [Unsupervised Learning](#unsupervised)
* [Pivot-based Methods](#pivot_based)
* [Data Augmentation](#data_augmentation)
* [Data Selection](#data_selection)
* [Transfer Learning](#transfer_learning)
* [Meta Learning](#meta_learning)
* [Multilingual Machine Translation](#multilingual)
* [Prior Knowledge Integration](#prior_knowledge_integration)
* [Word/Phrase Constraints](#word_phrase_constraints)
* [Syntactic/Semantic Constraints](#syntactic_semantic_constraints)
* [Coverage Constraints](#coverage_constraints)
* [Document-level Translation](#document_level_translation)
* [Robustness](#robustness)
* [Interpretability](#interpretability)
* [Linguistic Interpretation](#linguistic_interpretation)
* [Fairness and Diversity](#fairness_and_diversity)
* [Efficiency](#efficiency)
* [Non-Autoregressive Translation](#NAT)
* [Speech Translation and Simultaneous Translation](#speech_translation_and_simultaneous_translation)
* [Multi-modality](#multi_modality)
* [Ensemble and Reranking](#ensemble_reranking)
* [Pre-training](#pre_training)
* [Domain Adaptation](#domain_adaptation)
* [Quality Estimation](#quality_estimation)
* [Human-centered NMT](#human_centered)
* [Interactive NMT](#interactive_nmt)
* [Automatic Post-Editing](#ape)
* [Poetry Translation](#poetry_translation)
* [Eco-friendly](#eco_friendly)
* [Compositional Generalization](#compositional_generalization)
* [Endangered Language Revitalization](#endangered)
* [Word Translation (Bilingual Lexicon Induction)](#word_translation)
* [WMT Winners](#wmt_winners)
* [WMT 2019](#wmt19)
* [WMT 2018](#wmt18)
* [WMT 2017](#wmt17)
* [WMT 2016](#wmt16)
## 10 Must Reads
* Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. [The Mathematics of Statistical Machine Translation: Parameter Estimation](http://aclweb.org/anthology/J93-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=2259057253133260714&as_sdt=2005&sciodt=0,5&hl=en): 5,218)
* Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. [BLEU: a Method for Automatic Evaluation of Machine Translation](http://aclweb.org/anthology/P02-1040). In *Proceedings of ACL 2002*. ([Citation](https://scholar.google.com/scholar?cites=9019091454858686906&as_sdt=2005&sciodt=0,5&hl=en): 10,700)
* Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. [Statistical Phrase-Based Translation](http://aclweb.org/anthology/N03-1017). In *Proceedings of NAACL 2003*. ([Citation](https://scholar.google.com/scholar?cites=11796378766060939113&as_sdt=2005&sciodt=0,5&hl=en): 3,713)
* Franz Josef Och. 2003. [Minimum Error Rate Training in Statistical Machine Translation](http://aclweb.org/anthology/P03-1021). In *Proceedings of ACL 2003*. ([Citation](https://scholar.google.com/scholar?cites=15358949031331886708&as_sdt=2005&sciodt=0,5&hl=en): 3,115)
* David Chiang. 2007. [Hierarchical Phrase-Based Translation](http://aclweb.org/anthology/J07-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=17074501474509484516&as_sdt=2005&sciodt=0,5&hl=en): 1,235)
* Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. [Sequence to Sequence Learning with Neural Networks](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf). In *Proceedings of NIPS 2014*. ([Citation](https://scholar.google.com/scholar?cites=13133880703797056141&as_sdt=2005&sciodt=0,5&hl=en): 9,432)
* Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473.pdf). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com/scholar?cites=9430221802571417838&as_sdt=2005&sciodt=0,5&hl=en): 10,479)
* Diederik P. Kingma and Jimmy Ba. 2015. [Adam: A Method for Stochastic Optimization](https://arxiv.org/pdf/1412.6980). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com/scholar?cites=16194105527543080940&as_sdt=2005&sciodt=0,5&hl=en): 37,480)
* Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. [Neural Machine Translation of Rare Words with Subword Units](https://arxiv.org/pdf/1508.07909.pdf). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=1307964014330144942&as_sdt=2005&sciodt=0,5&hl=en): 1,679)
* Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. [Attention is All You Need](https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com/scholar?cites=2960712678066186980&as_sdt=2005&sciodt=0,5&hl=en): 6,112)
## Tutorials and Surveys
* Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, and Yang Liu. 2020. [Neural Machine Translation: A Review of Methods, Resources, and Tools](https://arxiv.org/abs/2012.15515). *AI Open*.
* Felix Stahlberg. 2020. [Neural Machine Translation: A Review and Survey](https://arxiv.org/abs/1912.02047). *Journal of Artificial Intelligence Research*.
* Philipp Koehn and Rebecca Knowles. 2017. [Six Challenges for Neural Machine Translation](http://www.aclweb.org/anthology/W17-3204). In *Proceedings of the First Workshop on Neural Machine Translation*.
* Philipp Koehn. 2017. [Neural Machine Translation](https://arxiv.org/abs/1709.07809). *arXiv:1709.07809*.
* Oriol Vinyals and Navdeep Jaitly. 2017. [Seq2Seq ICML Tutorial](https://docs.google.com/presentation/d/1quIMxEEPEf5EkRHc2USQaoJRC4QNX6_KomdZTBMBWjk/present?slide=id.p). *ICML 2017 Tutorial*.
* Raj Dabre, Chenhui Chu, and Anoop Kunchukuttan. 2020. [A Survey of Multilingual Neural Machine Translation](https://doi.org/10.1145/3406095). *ACM Computing Surveys*, 53(5), Article 99 (October 2020).
* Tutorial on [Multilingual Neural Machine Translation](https://github.com/anoopkunchukuttan/multinmt_tutorial_coling2020) at COLING 2020.
* Graham Neubig. 2017. [Neural Machine Translation and Sequence-to-sequence Models: A Tutorial](https://arxiv.org/pdf/1703.01619.pdf). *arXiv:1703.01619*. ([Citation](https://scholar.google.com/scholar?cites=17621873290135947085&as_sdt=2005&sciodt=0,5&hl=en): 45)
* Thang Luong, Kyunghyun Cho, and Christopher Manning. 2016. [Neural Machine Translation](https://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf). *ACL 2016 Tutorial*.
* Adam Lopez. 2008. [Statistical Machine Translation](http://delivery.acm.org/10.1145/1390000/1380586/a8-lopez.pdf?ip=101.5.129.50&id=1380586&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2E587F3204F5B62A59%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1546058891_981e84a24804f2dbc0549b9892a2ea1d). *ACM Computing Surveys*.
* Philipp Koehn. 2006. [Statistical Machine Translation: the Basic, the Novel, and the Speculative](http://homepages.inf.ed.ac.uk/pkoehn/publications/tutorial2006.pdf). *EACL 2006 Tutorial*.
## Statistical Machine Translation
### Word-based Models
* Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. [The Mathematics of Statistical Machine Translation: Parameter Estimation](http://aclweb.org/anthology/J93-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=2259057253133260714&as_sdt=2005&sciodt=0,5&hl=en): 4,965)
* Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. [HMM-Based Word Alignment in Statistical Translation](http://aclweb.org/anthology/C96-2141). In *Proceedings of COLING 1996*. ([Citation](https://scholar.google.com.hk/scholar?cites=6742027174667056165&as_sdt=2005&sciodt=0,5&hl=en): 940)
* Franz Josef Och and Hermann Ney. 2003. [A Systematic Comparison of Various Statistical Alignment Models](http://aclweb.org/anthology/J03-1002). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=7906670690027479083&as_sdt=2005&sciodt=0,5&hl=en): 3,980)
* Percy Liang, Ben Taskar, and Dan Klein. 2006. [Alignment by Agreement](https://cs.stanford.edu/~pliang/papers/alignment-naacl2006.pdf). In *Proceedings of NAACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=10766838746666771394&as_sdt=2005&sciodt=0,5&hl=en): 452)
* Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. [A Simple, Fast, and Effective Reparameterization of IBM Model 2](http://www.aclweb.org/anthology/N13-1073). In *Proceedings of NAACL 2013*. ([Citation](https://scholar.google.com.hk/scholar?cites=13560076980956479370&as_sdt=2005&sciodt=0,5&hl=en): 310)
### Phrase-based Models
* Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. [Statistical Phrase-Based Translation](http://aclweb.org/anthology/N03-1017). In *Proceedings of NAACL 2003*. ([Citation](https://scholar.google.com.hk/scholar?cites=11796378766060939113&as_sdt=2005&sciodt=0,5&hl=en): 3,516)
* Michel Galley and Christopher D. Manning. 2008. [A Simple and Effective Hierarchical Phrase Reordering Model](https://nlp.stanford.edu/pubs/emnlp08-lexorder.pdf). In *Proceedings of EMNLP 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=14572547803642015856&as_sdt=2005&sciodt=0,5&hl=en): 275)
### Syntax-based Models
* Dekai Wu. 1997. [Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora](http://aclweb.org/anthology/J97-3002). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=7926725626202301933&as_sdt=2005&sciodt=0,5&hl=en): 1,009)
* Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. [Scalable Inference and Training of Context-Rich Syntactic Translation Models](http://aclweb.org/anthology/P06-1121). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=2650671041278094269&as_sdt=2005&sciodt=0,5&hl=en): 475)
* Yang Liu, Qun Liu, and Shouxun Lin. 2006. [Tree-to-String Alignment Template for Statistical Machine Translation](http://aclweb.org/anthology/P06-1077). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=8683308453323663525&as_sdt=2005&sciodt=0,5&hl=en): 391)
* Deyi Xiong, Qun Liu, and Shouxun Lin. 2006. [Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation](https://aclanthology.info/pdf/P/P06/P06-1066.pdf). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=11896300896063367737&as_sdt=2005&sciodt=0,5&hl=en): 299)
* David Chiang. 2007. [Hierarchical Phrase-Based Translation](http://aclweb.org/anthology/J07-2003). *Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=17074501474509484516&as_sdt=2005&sciodt=0,5&hl=en): 1,192)
* Liang Huang and David Chiang. 2007. [Forest Rescoring: Faster Decoding with Integrated Language Models](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.5058&rep=rep1&type=pdf). In *Proceedings of ACL 2007*. ([Citation](https://scholar.google.com.hk/scholar?cites=2826188279623417237&as_sdt=2005&sciodt=0,5&hl=en): 280)
* Haitao Mi, Liang Huang, and Qun Liu. 2008. [Forest-based Translation](http://aclweb.org/anthology/P08-1023). In *Proceedings of ACL 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=11263493281241243162&as_sdt=2005&sciodt=0,5&hl=en): 239)
* Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. 2008. [A Tree Sequence Alignment-based Tree-to-Tree Translation Model](http://www.aclweb.org/anthology/P08-1064). In *Proceedings of ACL 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=4828105603038412208&as_sdt=2005&sciodt=0,5&hl=en): 124)
* Libin Shen, Jinxi Xu, and Ralph Weischedel. 2008. [A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model](http://aclweb.org/anthology/P08-1066). In *Proceedings of ACL 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=15082517325172081801&as_sdt=2005&sciodt=0,5&hl=en): 278)
* Haitao Mi and Liang Huang. 2008. [Forest-based Translation Rule Extraction](http://aclweb.org/anthology/D08-1022). In *Proceedings of EMNLP 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=11263493281241243162&as_sdt=2005&sciodt=0,5&hl=en): 239)
* Yang Liu, Yajuan Lü, and Qun Liu. 2009. [Improving Tree-to-Tree Translation with Packed Forests](http://aclweb.org/anthology/P09-1063). In *Proceedings of ACL/IJCNLP 2009*. ([Citation](https://scholar.google.com.hk/scholar?cites=3907324274083528908&as_sdt=2005&sciodt=0,5&hl=en): 93)
* David Chiang. 2010. [Learning to Translate with Source and Target Syntax](http://aclweb.org/anthology/P10-1146). In *Proceedings of ACL 2010*. ([Citation](https://scholar.google.com.hk/scholar?cites=18270412258769590027&as_sdt=2005&sciodt=0,5&hl=en): 118)
### Discriminative Training
* Franz Josef Och and Hermann Ney. 2002. [Discriminative Training and Maximum Entropy Models for Statistical Machine Translation](http://aclweb.org/anthology/P02-1038). In *Proceedings of ACL 2002*. ([Citation](https://scholar.google.com.hk/scholar?cites=2845378992177918439&as_sdt=2005&sciodt=0,5&hl=en): 1,258)
* Franz Josef Och. 2003. [Minimum Error Rate Training in Statistical Machine Translation](http://aclweb.org/anthology/P03-1021). In *Proceedings of ACL 2003*. ([Citation](https://scholar.google.com.hk/scholar?cites=15358949031331886708&as_sdt=2005&sciodt=0,5&hl=en): 2,984)
* Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. [Online Large-Margin Training for Statistical Machine Translation](http://aclweb.org/anthology/D07-1080). In *Proceedings of EMNLP-CoNLL 2007*. ([Citation](https://scholar.google.com.hk/scholar?cites=6690339336101573833&as_sdt=2005&sciodt=0,5&hl=en): 197)
* David Chiang, Kevin Knight, and Wei Wang. 2009. [11,001 New Features for Statistical Machine Translation](http://aclweb.org/anthology/N09-1025). In *Proceedings of NAACL 2009*. ([Citation](https://scholar.google.com.hk/scholar?cites=14062409519286340147&as_sdt=2005&sciodt=0,5&hl=en): 251)
### System Combination
* Antti-Veikko Rosti, Spyros Matsoukas, and Richard Schwartz. 2007. [Improved Word-Level System Combination for Machine Translation](http://aclweb.org/anthology/P07-1040). In *Proceedings of ACL 2007*. ([Citation](https://scholar.google.com.hk/scholar?cites=13310846375895519088&as_sdt=2005&sciodt=0,5&hl=en): 144)
* Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen, and Robert Moore. 2008. [Indirect-HMM-based Hypothesis Alignment for Combining Outputs from Machine Translation Systems](http://aclweb.org/anthology/D08-1011). In *Proceedings of EMNLP 2008*. ([Citation](https://scholar.google.com.hk/scholar?cites=5843300493006970528&as_sdt=2005&sciodt=0,5&hl=en): 96)
### Human-centered SMT
#### Interactive SMT
* George Foster, Pierre Isabelle and Pierre Plamondon. 1997. [Target-text mediated interactive machine translation](https://sci-hub.tw/10.2307/40009035). *Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=17084037882064721827&as_sdt=2005&sciodt=0,5): 116)
* Philippe Langlais, Guy Lapalme and Marie Lorange. 2002. [TransType: Development-Evaluation Cycles to Boost Translator’s Productivity](https://sci-hub.tw/10.2307/40007093). *Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=7892155138946158318&as_sdt=2005&sciodt=0,5): 74)
* Jesús Tomás and Francisco Casacuberta. 2006. [Statistical phrase-based models for interactive computer-assisted translation](http://aclweb.org/anthology/P06-2107). In *Proceedings of COLING/ACL 2006*. ([Citation](https://scholar.google.com/scholar?cites=2242179645100420046&as_sdt=2005&sciodt=0,5): 31)
* Enrique Vidal, Francisco Casacuberta, Luis Rodríguez-Ruiz, Jorge Civera, and Carlos D. Martínez-Hinarejos. 2006. [Computer-Assisted Translation Using Speech Recognition](https://ieeexplore.ieee.org/document/1621206). *IEEE Transactions on Audio, Speech, and Language Processing*. ([Citation](https://scholar.google.com/scholar?cites=32625184311110830&as_sdt=2005&sciodt=0,5): 62)
* Shahram Khadivi and Hermann Ney. 2008. [Integration of Speech Recognition and Machine Translation in Computer-Assisted Translation](https://sci-hub.tw/10.1109/tasl.2008.2004301). *IEEE Transactions on Audio, Speech, and Language Processing*. ([Citation](https://scholar.google.com/scholar?cites=1690852455408892756&as_sdt=2005&sciodt=0,5): 30)
* Sergio Barrachina, Oliver Bender, Francisco Casacuberta, Jorge Civera, Elsa Cubel, Shahram Khadivi, Antonio L. Lagarda, Hermann Ney, Jesús Tomás and Enrique Vidal. 2009. [Statistical approaches to computer-assisted translation](https://www.mitpressjournals.org/doi/abs/10.1162/coli.2008.07-055-R2-06-29). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=17691637682117292572&as_sdt=2005&sciodt=0,5): 207)
* Francisco Casacuberta, Jorge Civera, Elsa Cubel, Antonio L. Lagarda, Guy Lapalme, Elliott Macklovitch, Enrique Vidal. 2009. [Human interaction for high quality machine translation](https://sci-hub.tw/10.1145/1562764.1562798). *Communications of the ACM*. ([Citation](https://scholar.google.com/scholar?cites=6184654159576071790&as_sdt=2005&sciodt=0,5): 49)
* Vicent Alabau, Alberto Sanchis and Francisco Casacuberta. 2014. [Improving on-line handwritten recognition in interactive machine translation](https://sci-hub.tw/10.1016/j.patcog.2013.09.035). *Pattern Recognition*. ([Citation](https://scholar.google.com/scholar?cites=11987123133913382404&as_sdt=2005&sciodt=0,5): 18)
* Shanbo Cheng, Shujian Huang, Huadong Chen, Xin-Yu Dai and Jiajun Chen. 2016. [PRIMT: A Pick-Revise Framework for Interactive Machine Translation](http://www.aclweb.org/anthology/N16-1148). In *Proceedings of NAACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=3643727460542665178&as_sdt=2005&sciodt=0,5): 9)
* Miguel Domingo, Álvaro Peris and Francisco Casacuberta. 2018. [Segment-based interactive-predictive machine translation](https://www.researchgate.net/publication/322275484_Segment-based_interactive-predictive_machine_translation). *Machine Translation*. ([Citation](https://scholar.google.com/scholar?cites=4148585683672959462&as_sdt=2005&sciodt=0,5): 2)
#### Adaptation
* Pascual Martínez-Gómez, Germán Sanchis-Trilles and Francisco Casacuberta. 2012. [Online adaptation strategies for statistical machine translation in post-editing scenarios](https://sci-hub.tw/10.1016/j.patcog.2012.01.011). *Pattern Recognition*. ([Citation](https://scholar.google.com/scholar?cites=9143628035426486873&as_sdt=2005&sciodt=0,5): 40)
* Jesús González-Rubio and Francisco Casacuberta. 2014. [Cost-Sensitive Active Learning for Computer-Assisted Translation](https://sci-hub.tw/10.1016/j.patrec.2013.06.007). *Pattern Recognition Letters*. ([Citation](https://scholar.google.com/scholar?cites=13196627956841822823&as_sdt=2005&sciodt=0,5): 11)
* Antonio L. Lagarda, Daniel Ortiz-Martínez, Vicent Alabau and Francisco Casacuberta. 2015. [Translating without in-domain corpus: Machine translation post-editing with online learning techniques](https://sci-hub.tw/10.1016/j.csl.2014.10.004). *Computer Speech & Language*. ([Citation](https://scholar.google.com/scholar?cites=6721510771212778605&as_sdt=2005&sciodt=0,5): 10)
* Germán Sanchis-Trilles, Francisco Casacuberta. 2015. [Improving translation quality stability using Bayesian predictive adaptation](https://sci-hub.tw/10.1016/j.csl.2015.03.001). *Computer Speech & Language*. ([Citation](https://scholar.google.com/scholar?q=Improving+translation+quality+stability+using+Bayesian+predictive+adaptation): 1)
* Daniel Ortiz-Martínez. 2016. [Online Learning for Statistical Machine Translation](https://www.mitpressjournals.org/doi/full/10.1162/COLI_a_00244). *Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=4979468821667106694&as_sdt=2005&sciodt=0,5): 13)
### Evaluation
* Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. [BLEU: a Method for Automatic Evaluation of Machine Translation](http://aclweb.org/anthology/P02-1040). In *Proceedings of ACL 2002*. ([Citation](https://scholar.google.com.hk/scholar?cites=9019091454858686906&as_sdt=2005&sciodt=0,5&hl=en): 8,499)
* Philipp Koehn. 2004. [Statistical Significance Tests for Machine Translation Evaluation](http://www.aclweb.org/anthology/W04-3250). In *Proceedings of EMNLP 2004*. ([Citation](https://scholar.google.com.hk/scholar?cites=6141850486206753388&as_sdt=2005&sciodt=0,5&hl=en): 1,015)
* Satanjeev Banerjee and Alon Lavie. 2005. [METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments](http://aclweb.org/anthology/W05-0909). In *Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization*. ([Citation](https://scholar.google.com.hk/scholar?cites=11797833340491598355&as_sdt=2005&sciodt=0,5&hl=en): 1,355)
* Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. [A Study of Translation Edit Rate with Targeted Human Annotation](http://mt-archive.info/AMTA-2006-Snover.pdf). In *Proceedings of AMTA 2006*. ([Citation](https://scholar.google.com.hk/scholar?cites=1809540661740640949&as_sdt=2005&sciodt=0,5&hl=en): 1,713)
* Maja Popovic. 2015. [chrF: Character n-gram F-score for Automatic MT Evaluation](http://aclweb.org/anthology/W15-3049). In *Proceedings of WMT 2015*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=12169100229181212462): 58)
* Xin Wang, Wenhu Chen, Yuan-Fang Wang, and William Yang Wang. 2018. [No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling](http://aclweb.org/anthology/P18-1083). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=1809540661740640949&as_sdt=2005&sciodt=0,5&hl=en): 10)
* Arun Tejasvi Chaganty, Stephen Mussman, and Percy Liang. 2018. [The price of debiasing automatic metrics in natural language evaluation](https://arxiv.org/pdf/1807.02202). In *Proceedings of ACL 2018*.
* Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, and Xinyi Wang. 2019. [compare-mt: A Tool for Holistic Comparison of Language Generation Systems](https://arxiv.org/pdf/1903.07926.pdf). In *Proceedings of NAACL 2019*.
* Robert Schwarzenberg, David Harbecke, Vivien Macketanz, Eleftherios Avramidis, and Sebastian Möller. 2019. [Train, Sort, Explain: Learning to Diagnose Translation Models](https://arxiv.org/pdf/1903.12017.pdf). In *Proceedings of NAACL 2019*.
* Nitika Mathur, Timothy Baldwin, and Trevor Cohn. 2019. [Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation](https://www.aclweb.org/anthology/P19-1269). In *Proceedings of ACL 2019*.
* Prathyusha Jwalapuram, Shafiq Joty, Irina Temnikova, and Preslav Nakov. 2019. [Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite](https://arxiv.org/pdf/1909.00131). In *Proceedings of ACL 2019*.
* Sergey Edunov, Myle Ott, Marc’Aurelio Ranzato and Michael Auli. 2020. [On The Evaluation of Machine Translation Systems Trained With Back-Translation](https://arxiv.org/abs/1908.05204). In *Proceedings of ACL 2020*.
* Wei Zhao, Goran Glavaš, Maxime Peyrard, Yang Gao, Robert West and Steffen Eger. 2020. [On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation](http://arxiv.org/abs/2005.01196). In *Proceedings of ACL 2020*.
* Marina Fomicheva, Lucia Specia, and Francisco Guzmán. 2020. [Multi-Hypothesis Machine Translation Evaluation](https://www.aclweb.org/anthology/2020.acl-main.113/). In *Proceedings of ACL 2020*.
* Kosuke Takahashi, Katsuhito Sudoh and Satoshi Nakamura. 2020. [Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model](https://www.aclweb.org/anthology/2020.acl-main.327/). In *Proceedings of ACL 2020*.
* Nitika Mathur, Timothy Baldwin and Trevor Cohn. 2020. [Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics](https://www.aclweb.org/anthology/2020.acl-main.448/). In *Proceedings of ACL 2020*.
* Markus Freitag, David Grangier, and Isaac Caswell. 2020. [BLEU might be Guilty but References are not Innocent](https://www.aclweb.org/anthology/2020.emnlp-main.5/). In *Proceedings of EMNLP 2020*.
* Yvette Graham, Barry Haddow, and Philipp Koehn. 2020. [Statistical Power and Translationese in Machine Translation Evaluation](https://www.aclweb.org/anthology/2020.emnlp-main.6/). In *Proceedings of EMNLP 2020*.
* Brian Thompson and Matt Post. 2020. [Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing](https://www.aclweb.org/anthology/2020.emnlp-main.8/). In *Proceedings of EMNLP 2020*.
* Ricardo Rei, Craig Stewart, Ana C Farinha, and Alon Lavie. 2020. [COMET: A Neural Framework for MT Evaluation](https://www.aclweb.org/anthology/2020.emnlp-main.213/). In *Proceedings of EMNLP 2020*.
## Neural Machine Translation
### Model Architecture
* Nal Kalchbrenner and Phil Blunsom. 2013. [Recurrent Continuous Translation Models](http://aclweb.org/anthology/D13-1176). In *Proceedings of EMNLP 2013*. ([Citation](https://scholar.google.com/scholar?cites=14122455772200752032&as_sdt=2005&sciodt=0,5&hl=en): 623)
* Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. [Sequence to Sequence Learning with Neural Networks](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf). In *Proceedings of NIPS 2014*. ([Citation](https://scholar.google.com/scholar?cites=13133880703797056141&as_sdt=2005&sciodt=0,5&hl=en): 5,452)
* Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com/scholar?cites=9430221802571417838&as_sdt=2005&sciodt=0,5&hl=en): 5,596)
* Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. [Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation](https://arxiv.org/pdf/1609.08144). *arXiv:1609.08144*. ([Citation](https://scholar.google.com/scholar?cites=17018428530559089870&as_sdt=2005&sciodt=0,5&hl=en): 1,046)
* Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, and Wei Xu. 2016. [Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation](http://aclweb.org/anthology/Q16-1027). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com/scholar?cites=2319930273054317494&as_sdt=2005&sciodt=0,5&hl=en): 73)
* Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. [Incorporating Copying Mechanism in Sequence-to-Sequence Learning](http://aclweb.org/anthology/P16-1154). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com/scholar?cites=6836221883265474919&as_sdt=2005&sciodt=0,5&hl=en): 254)
* Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, and Min Zhang. 2016. [Variational Neural Machine Translation](http://aclweb.org/anthology/D16-1050). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com/scholar?cites=16453011540088245227&as_sdt=2005&sciodt=0,5&hl=en): 38)
* Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. 2016. [Neural Machine Translation in Linear Time](https://arxiv.org/pdf/1610.10099). *arXiv:1610.10099*. ([Citation](https://scholar.google.com/scholar?cites=13142156854384740601&as_sdt=5,39&sciodt=0,39&hl=en): 189)
* Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. [Convolutional Sequence to Sequence Learning](https://arxiv.org/pdf/1705.03122.pdf). In *Proceedings of ICML 2017*. ([Citation](https://scholar.google.com/scholar?cites=9032432574575787905&as_sdt=2005&sciodt=0,5&hl=en): 453)
* Jonas Gehring, Michael Auli, David Grangier, and Yann Dauphin. 2017. [A Convolutional Encoder Model for Neural Machine Translation](http://aclweb.org/anthology/P17-1012). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=13078160224216368728&as_sdt=2005&sciodt=0,5&hl=en): 85)
* Mingxuan Wang, Zhengdong Lu, Jie Zhou, and Qun Liu. 2017. [Deep Neural Machine Translation with Linear Associative Unit](http://aclweb.org/anthology/P17-1013). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?cites=13710779557836853910&as_sdt=2005&sciodt=0,5&hl=en): 21)
* Matthias Sperber, Graham Neubig, Jan Niehues, and Alex Waibel. 2017. [Neural Lattice-to-Sequence Models for Uncertain Inputs](http://aclweb.org/anthology/D17-1145). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=6601112324222176825&as_sdt=2005&sciodt=0,5&hl=en): 11)
* Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc Le. 2017. [Massive Exploration of Neural Machine Translation Architectures](http://aclweb.org/anthology/D17-1151). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com/scholar?cites=17797498583666145091&as_sdt=2005&sciodt=0,5&hl=en): 114)
* Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. [Attention is All You Need](https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com/scholar?cites=2960712678066186980&as_sdt=2005&sciodt=0,5&hl=en): 1,748)
* Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2017. [Deliberation Networks: Sequence Generation Beyond One-Pass Decoding](https://papers.nips.cc/paper/6775-deliberation-networks-sequence-generation-beyond-one-pass-decoding.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=5359968740795634948): 38)
* Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, and Hang Li. 2017. [Neural Machine Translation with Reconstruction](https://arxiv.org/pdf/1611.01874). In *Proceedings of AAAI 2017*. ([Citation](https://scholar.google.com/scholar?cites=1310099558617172101&as_sdt=2005&sciodt=0,5&hl=en): 75)
* Lukasz Kaiser, Aidan N. Gomez, and Francois Chollet. 2018. [Depthwise Separable Convolutions for Neural Machine Translation](https://openreview.net/pdf?id=S1jBcueAb). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com/scholar?cites=7520360878420709403&as_sdt=2005&sciodt=0,5&hl=en): 27)
* Yanyao Shen, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu. 2018. [Dense Information Flow for Neural Machine Translation](http://aclweb.org/anthology/N18-1117). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=12417301759540220817&as_sdt=2005&sciodt=0,5&hl=en): 3)
* Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, and Ming Zhou. 2018. [Generative Bridging Network for Neural Sequence Prediction](http://aclweb.org/anthology/N18-1154). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=16479416225427738693): 3)
* Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Macduff Hughes. 2018. [The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation](http://aclweb.org/anthology/P18-1008). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=1960239321427735403&as_sdt=2005&sciodt=0,5&hl=en): 22)
* Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. 2018. [Neural Hidden Markov Model for Machine Translation](http://aclweb.org/anthology/P18-2060). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=13737032050194395214&as_sdt=2005&sciodt=0,5&hl=en): 3)
* Jingjing Gong, Xipeng Qiu, Shaojing Wang, and Xuanjing Huang. 2018. [Information Aggregation via Dynamic Routing for Sequence Encoding](http://aclweb.org/anthology/C18-1232). In *Proceedings of COLING 2018*.
* Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li, and Jingbo Zhu. 2018. [Multi-layer Representation Fusion for Neural Machine Translation](http://aclweb.org/anthology/C18-1255). In *Proceedings of COLING 2018*.
* Yachao Li, Junhui Li, and Min Zhang. 2018. [Adaptive Weighting for Neural Machine Translation](http://aclweb.org/anthology/C18-1257). In *Proceedings of COLING 2018*.
* Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, and Tie-Yan Liu. 2018. [Double Path Networks for Sequence to Sequence Learning](http://aclweb.org/anthology/C18-1259). In *Proceedings of COLING 2018*.
* Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Shuming Shi, and Tong Zhang. 2018. [Exploiting Deep Representations for Neural Machine Translation](http://aclweb.org/anthology/D18-1457). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=8760242283445305561&as_sdt=2005&sciodt=0,5&hl=en): 1)
* Biao Zhang, Deyi Xiong, Jinsong Su, Qian Lin, and Huiji Zhang. 2018. [Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks](http://aclweb.org/anthology/D18-1459). In *Proceedings of EMNLP 2018*.
* Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. [Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures](http://aclweb.org/anthology/D18-1458). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=8994080673363827758&as_sdt=2005&sciodt=0,5&hl=en): 6)
* Ke Tran, Arianna Bisazza, and Christof Monz. 2018. [The Importance of Being Recurrent for Modeling Hierarchical Structure](http://aclweb.org/anthology/D18-1503). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=16387948292048936516&as_sdt=2005&sciodt=0,5&hl=en): 6)
* Parnia Bahar, Christopher Brix, and Hermann Ney. 2018. [Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation](http://aclweb.org/anthology/D18-1335). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=4611047151878523903&as_sdt=2005&sciodt=0,5&hl=en): 1)
* Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, and Tie-Yan Liu. 2018. [Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation](http://papers.nips.cc/paper/8019-layer-wise-coordination-between-encoder-and-decoder-for-neural-machine-translation.pdf). In *Proceedings of NeurIPS 2018*. ([Citation](https://scholar.google.com/scholar?cites=14258883426797488339&as_sdt=2005&sciodt=0,5&hl=en): 2)
* Harshil Shah and David Barber. 2018. [Generative Neural Machine Translation](http://papers.nips.cc/paper/7409-generative-neural-machine-translation.pdf). In *Proceedings of NeurIPS 2018*.
* Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, and Ming Zhou. 2018. [Achieving Human Parity on Automatic Chinese to English News Translation](https://www.microsoft.com/en-us/research/uploads/prod/2018/03/final-achieving-human.pdf). Technical report. Microsoft AI & Research. ([Citation](https://scholar.google.com/scholar?cites=3670312788898741170&as_sdt=2005&sciodt=0,5&hl=en): 41)
* Yikang Shen, Shawn Tan, Alessandro Sordoni, and Aaron Courville. 2019. [Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks](https://openreview.net/pdf?id=B1l6qiR5F7). In *Proceedings of ICLR 2019*.
* Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, and Michael Auli. 2019. [Pay Less Attention with Lightweight and Dynamic Convolutions](https://openreview.net/pdf?id=SkVhlh09tX). In *Proceedings of ICLR 2019*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=3358231780148394025): 1)
* Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. 2019. [Universal Transformers](https://openreview.net/pdf?id=HyzdRiR9Y7). In *Proceedings of ICLR 2019*. ([Citation](https://scholar.google.com/scholar?cites=8443376534582904234&as_sdt=2005&sciodt=0,5&hl=en): 12)
* Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Longyue Wang, Shuming Shi, and Tong Zhang. 2019. [Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement](https://arxiv.org/pdf/1902.05770.pdf). In *Proceedings of AAAI 2019*.
* Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/pdf/1901.02860). In *Proceedings of ACL 2019*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=7150055013029036741): 8)
* Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, and Zheng Zhang. 2019. [Star-Transformer](https://arxiv.org/pdf/1902.09113.pdf). In *Proceedings of NAACL 2019*.
* Sho Takase and Naoaki Okazaki. 2019. [Positional Encoding to Control Output Sequence Length](https://arxiv.org/pdf/1904.07418.pdf). In *Proceedings of NAACL 2019*.
* Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, and Zhaopeng Tu. 2019. [Information Aggregation for Multi-Head Attention with Routing-by-Agreement](https://arxiv.org/pdf/1904.03100.pdf). In *Proceedings of NAACL 2019*.
* Baosong Yang, Longyue Wang, Derek Wong, Lidia S. Chao, and Zhaopeng Tu. 2019. [Convolutional Self-Attention Networks](https://arxiv.org/pdf/1904.03107.pdf). In *Proceedings of NAACL 2019*.
* Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, and Zhaopeng Tu. 2019. [Modeling Recurrence for Transformer](https://arxiv.org/pdf/1904.03092.pdf). In *Proceedings of NAACL 2019*.
* Nikolaos Pappas and James Henderson. 2019. [Deep Residual Output Layers for Neural Language Generation](https://arxiv.org/pdf/1905.05513.pdf). In *Proceedings of ICML 2019*.
* David R. So, Chen Liang, and Quoc V. Le. 2019. [The Evolved Transformer](https://arxiv.org/pdf/1901.11117). In *Proceedings of ICML 2019*.
* Ben Peters, Vlad Niculae, and André F.T. Martins. 2019. [Sparse Sequence-to-Sequence Models](https://arxiv.org/pdf/1905.05702). In *Proceedings of ACL 2019*.
* Roberto Dessì and Marco Baroni. 2019. [CNNs found to jump around more skillfully than RNNs: Compositional generalization in seq2seq convolutional networks](https://arxiv.org/pdf/1905.08527). In *Proceedings of ACL 2019*.
* Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, and Armand Joulin. 2019. [Adaptive Attention Span in Transformers](https://arxiv.org/pdf/1905.07799). In *Proceedings of ACL 2019*.
* Yi Tay, Aston Zhang, Luu Anh Tuan, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, and Siu Cheung Hui. 2019. [Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks](https://arxiv.org/pdf/1906.04393). In *Proceedings of ACL 2019*.
* Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, and Lidia S. Chao. 2019. [Learning Deep Transformer Models for Machine Translation](https://arxiv.org/pdf/1906.01787). In *Proceedings of ACL 2019*.
* Fengshun Xiao, Jiangtong Li, Hai Zhao, Rui Wang, and Kehai Chen. 2019. [Lattice-Based Transformer Encoder for Neural Machine Translation](https://arxiv.org/pdf/1906.01282). In *Proceedings of ACL 2019*.
* Matthias Sperber, Graham Neubig, Ngoc-Quan Pham, and Alex Waibel. 2019. [Self-Attentional Models for Lattice Inputs](https://arxiv.org/pdf/1906.01617). In *Proceedings of ACL 2019*.
* Xing Wang, Zhaopeng Tu, Longyue Wang, and Shuming Shi. 2019. [Exploiting Sentential Context for Neural Machine Translation](https://arxiv.org/pdf/1906.01268). In *Proceedings of ACL 2019*.
* Kris Korrel, Dieuwke Hupkes, Verna Dankers, and Elia Bruni. 2019. [Transcoding compositionally: using attention to find more generalizable solutions](https://arxiv.org/pdf/1906.01234). In *Proceedings of ACL 2019*.
* Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2019. [Depth Growing for Neural Machine Translation](https://arxiv.org/pdf/1907.01968). In *Proceedings of ACL 2019*.
* Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2019. [Deep Equilibrium Models](https://arxiv.org/pdf/1909.01377.pdf). In *Proceedings of NeurIPS 2019*.
* Gonçalo M. Correia, Vlad Niculae, and André F.T. Martins. 2019. [Adaptively Sparse Transformers](https://arxiv.org/pdf/1909.00015). In *Proceedings of EMNLP 2019*.
* Yau-Shian Wang, Hung-Yi Lee, and Yun-Nung Chen. 2019. [Tree Transformer: Integrating Tree Structures into Self-Attention](https://arxiv.org/pdf/1909.06639). In *Proceedings of EMNLP 2019*.
* Mingxuan Wang, Jun Xie, Zhixing Tan, Jinsong Su, Deyi Xiong and Lei Li. 2019. [Towards Linear Time Neural Machine Translation with Capsule Networks](https://www.aclweb.org/anthology/D19-1074.pdf). In *Proceedings of EMNLP 2019*.
* Biao Zhang, Ivan Titov and Rico Sennrich. 2019. [Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention](https://www.aclweb.org/anthology/D19-1083.pdf). In *Proceedings of EMNLP 2019*.
* Jiatao Gu, Changhan Wang, and Junbo Zhao. 2019. [Levenshtein Transformer](https://papers.nips.cc/paper/9297-levenshtein-transformer). In *Proceedings of NeurIPS 2019*.
* Xin Sheng, Linli Xu, Junliang Guo, Jingchang Liu, Ruoyu Zhao and Yinlong Xu. 2020. [IntroVNMT: An Introspective Model for Variational Neural Machine Translation](https://www.aaai.org/ojs/index.php/AAAI/article/view/6411/6267). In *Proceedings of AAAI 2020*.
* Jian Li, Xing Wang, Baosong Yang, Shuming Shi, Michael R. Lyu and Zhaopeng Tu. 2020. [Neuron Interaction Based Representation Composition for Neural Machine Translation](https://arxiv.org/abs/1911.09877). In *Proceedings of AAAI 2020*.
* Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, and Jakob Grue Simonsen. 2020. [Encoding word order in complex embeddings](https://openreview.net/forum?id=Hke-WTVtwr). In *Proceedings of ICLR 2020*.
* Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. 2020. [Reformer: The Efficient Transformer](https://openreview.net/forum?id=rkgNKkHtvB). In *Proceedings of ICLR 2020*.
* Maha Elbayad, Jiatao Gu, Edouard Grave, and Michael Auli. 2020. [Depth-Adaptive Transformer](https://openreview.net/forum?id=SJg7KhVKPH). In *Proceedings of ICLR 2020*.
* Ofir Press, Noah A. Smith and Omer Levy. 2020. [Improving Transformer Models by Reordering their Sublayers](https://arxiv.org/abs/1911.03864). In *Proceedings of ACL 2020*.
* Yekun Chai, Shuo Jin and Xinwen Hou. 2020. [Highway Transformer: Self-Gating Enhanced Self-Attentive Networks](https://arxiv.org/abs/2004.08178). In *Proceedings of ACL 2020*.
* Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng and Weihua Luo. 2020. [Multiscale Collaborative Deep Models for Neural Machine Translation](https://arxiv.org/abs/2004.14021). In *Proceedings of ACL 2020*.
* Hendra Setiawan, Matthias Sperber, Udhyakumar Nallasamy and Matthias Paulik. 2020. [Variational Neural Machine Translation with Normalizing Flows](http://arxiv.org/abs/2005.13978). In *Proceedings of ACL 2020*.
* Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, and Cho-Jui Hsieh. 2020. [Learning to Encode Position for Transformer with Continuous Dynamical Model](https://arxiv.org/abs/2003.09229). In *Proceedings of ICML 2020*.
* Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. 2020. [On Layer Normalization in the Transformer Architecture](https://arxiv.org/abs/2002.04745). In *Proceedings of ICML 2020*.
* Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, and Julian McAuley. 2020. [ReZero is All You Need: Fast Convergence at Large Depth](https://arxiv.org/abs/2003.04887). *arXiv:2003.04887*.
* Yongjing Yin, Fandong Meng, Jinsong Su, Chulun Zhou, Zhengyuan Yang, Jie Zhou and Jiebo Luo. 2020. [A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.273/). In *Proceedings of ACL 2020*.
* Arya D. McCarthy, Xian Li, Jiatao Gu and Ning Dong. 2020. [Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.753/). In *Proceedings of ACL 2020*.
* Yong Wang, Longyue Wang, Victor Li, and Zhaopeng Tu. 2020. [On the Sparsity of Neural Machine Translation Models](https://www.aclweb.org/anthology/2020.emnlp-main.78/). In *Proceedings of EMNLP 2020*.
* Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, and Jingbo Zhu. 2020. [Shallow-to-Deep Training for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.72/). In *Proceedings of EMNLP 2020*.
* Jianhao Yan, Fandong Meng, and Jie Zhou. 2020. [Multi-Unit Transformers for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.77/). In *Proceedings of EMNLP 2020*.
* Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, and Weihua Luo. 2020. [Uncertainty-Aware Semantic Augmentation for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.216/). In *Proceedings of EMNLP 2020*.
* Xian Li, Asa Cooper Stickland, Yuqing Tang, and Xiang Kong. 2020. [Deep Transformers with Latent Depth](https://papers.nips.cc/paper/2020/file/1325cdae3b6f0f91a1b629307bf2d498-Paper.pdf). In *Proceedings of NeurIPS 2020*.
* Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. 2020. [Big Bird: Transformers for Longer Sequences](https://papers.nips.cc/paper/2020/file/c8512d142a2d849725f31a9a7a361ab9-Paper.pdf). In *Proceedings of NeurIPS 2020*.
* Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, and Dong Yu. 2020. [Token-level Adaptive Training for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.76/). In *Proceedings of EMNLP 2020*.
* Yufei Wang, Ian D. Wood, Stephen Wan, Mark Dras, and Mark Johnson. 2021. [Mention Flags (MF): Constraining Transformer-based Text Generators](https://aclanthology.org/2021.acl-long.9/). In *Proceedings of ACL 2021*.

### Attention Mechanism

* Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com.sg/scholar?cites=9430221802571417838&as_sdt=2005&sciodt=0,5&hl=en): 5,596)
* Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. [Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/pdf/1508.04025). In *Proceedings of EMNLP 2015*. ([Citation](https://scholar.google.com.sg/scholar?cites=12347446836257434866&as_sdt=2005&sciodt=0,5&hl=en): 1,466)
* Shi Feng, Shujie Liu, Nan Yang, Mu Li, Ming Zhou, and Kenny Q. Zhu. 2016. [Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation](https://www.aclweb.org/anthology/C16-1290). In *Proceedings of COLING 2016*. ([Citation](https://scholar.google.com/scholar?&cites=1624882767342343496&as_sdt=2005&sciodt=0,5&hl=en): 18)
* Haitao Mi, Zhiguo Wang, and Abe Ittycheriah. 2016. [Supervised Attentions for Neural Machine Translation](http://aclweb.org/anthology/D16-1249). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.sg/scholar?cites=16345118068023322142&as_sdt=2005&sciodt=0,5&hl=en): 43)
* Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. [A Structured Self-attentive Sentence Embedding](https://arxiv.org/abs/1703.03130). In *Proceedings of ICLR 2017*. ([Citation](https://scholar.google.com.sg/scholar?cites=3666844900655302515&as_sdt=2005&sciodt=0,5&hl=en): 216)
* Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. 2018. [DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding](https://arxiv.org/pdf/1709.04696.pdf). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=7311258646982866903&as_sdt=2005&sciodt=0,5&hl=en): 60)
* Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, and Chengqi Zhang. 2018. [Bi-directional Block Self-attention for Fast and Memory-efficient Sequence Modeling](https://arxiv.org/abs/1804.00857). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=7203374430207428965&as_sdt=2005&sciodt=0,5&hl=en): 13)
* Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Sen Wang, and Chengqi Zhang. 2018. [Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling](https://arxiv.org/abs/1801.10296). In *Proceedings of IJCAI 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=3809241292668177959&as_sdt=2005&sciodt=0,5&hl=en): 18)
* Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. [Self-Attention with Relative Position Representations](http://aclweb.org/anthology/N18-2074). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.sg/scholar?cites=5563767891081728261&as_sdt=2005&sciodt=0,5&hl=en): 24)
* Lesly Miculicich Werlen, Nikolaos Pappas, Dhananjay Ram, and Andrei Popescu-Belis. 2018. [Self-Attentive Residual Decoder for Neural Machine Translation](http://aclweb.org/anthology/N18-1124). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=10357155207431596394): 3)
* Xintong Li, Lemao Liu, Zhaopeng Tu, Shuming Shi, and Max Meng. 2018. [Target Foresight Based Attention for Neural Machine Translation](http://aclweb.org/anthology/N18-1125). In *Proceedings of NAACL 2018*.
* Biao Zhang, Deyi Xiong, and Jinsong Su. 2018. [Accelerating Neural Transformer via an Average Attention Network](http://aclweb.org/anthology/P18-1166). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=16436039193082710776&as_sdt=2005&sciodt=0,5&hl=en): 5)
* Tobias Domhan. 2018. [How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures](http://aclweb.org/anthology/P18-1167). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=16338550517026915979&as_sdt=2005&sciodt=0,5&hl=en): 3)
* Shaohui Kuang, Junhui Li, António Branco, Weihua Luo, and Deyi Xiong. 2018. [Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings](http://aclweb.org/anthology/P18-1164). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=13357719581808108940&as_sdt=2005&sciodt=0,5&hl=en): 1)
* Chaitanya Malaviya, Pedro Ferreira, and André F. T. Martins. 2018. [Sparse and Constrained Attention for Neural Machine Translation](http://aclweb.org/anthology/P18-2059). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com/scholar?cites=11257363334017043172&as_sdt=2005&sciodt=0,5&hl=en): 4)
* Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, and Tong Zhang. 2018. [Multi-Head Attention with Disagreement Regularization](http://aclweb.org/anthology/D18-1317). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=4230613606718109837&as_sdt=2005&sciodt=0,5&hl=en): 1)
* Wei Wu, Houfeng Wang, Tianyu Liu and Shuming Ma. 2018. [Phrase-level Self-Attention Networks for Universal Sentence Encoding](http://aclweb.org/anthology/D18-1408). In *Proceedings of EMNLP 2018*.
* Baosong Yang, Zhaopeng Tu, Derek F. Wong, Fandong Meng, Lidia S. Chao, and Tong Zhang. 2018. [Modeling Localness for Self-Attention Networks](https://arxiv.org/abs/1810.10182). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com/scholar?cites=16651306350908112709&as_sdt=2005&sciodt=0,5&hl=en): 2)
* Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, and Qi Su. 2018. [Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation](http://aclweb.org/anthology/D18-1331). In *Proceedings of EMNLP 2018*.
* Shiv Shankar, Siddhant Garg, and Sunita Sarawagi. 2018. [Surprisingly Easy Hard-Attention for Sequence to Sequence Learning](http://aclweb.org/anthology/D18-1065). In *Proceedings of EMNLP 2018*.
* Ankur Bapna, Mia Chen, Orhan Firat, Yuan Cao, and Yonghui Wu. 2018. [Training Deeper Neural Machine Translation Models with Transparent Attention](http://aclweb.org/anthology/D18-1338). In *Proceedings of EMNLP 2018*.
* Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, and Pascal Poupart. 2018. [Variational Attention for Sequence-to-Sequence Models](http://aclweb.org/anthology/C18-1142). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=1653411252630135531): 14)
* Maha Elbayad, Laurent Besacier, and Jakob Verbeek. 2018. [Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction](http://aclweb.org/anthology/K18-1010). In *Proceedings of CoNLL 2018*. ([Citation](https://scholar.google.com/scholar?cites=14016975442337015010&as_sdt=2005&sciodt=0,5&hl=en): 4)
* Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, and Alexander M. Rush. 2018. [Latent Alignment and Variational Attention](https://papers.nips.cc/paper/8179-latent-alignment-and-variational-attention.pdf). In *Proceedings of NeurIPS 2018*. ([Citation](https://scholar.google.com/scholar?client=safari&rls=en&oe=UTF-8&um=1&ie=UTF-8&lr&cites=6335407498429393003))
* Wenpeng Yin and Hinrich Schütze. 2019. [Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms](https://arxiv.org/pdf/1710.00519). *Transactions of the Association for Computational Linguistics*.
* Shiv Shankar and Sunita Sarawagi. 2019. [Posterior Attention Models for Sequence to Sequence Learning](https://openreview.net/pdf?id=BkltNhC9FX). In *Proceedings of ICLR 2019*.
* Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, and Zhaopeng Tu. 2019. [Context-Aware Self-Attention Networks](https://arxiv.org/pdf/1902.05766.pdf). In *Proceedings of AAAI 2019*.
* Reza Ghaeini, Xiaoli Z. Fern, Hamed Shahbazi, and Prasad Tadepalli. 2019. [Saliency Learning: Teaching the Model Where to Pay Attention](https://arxiv.org/pdf/1902.08649.pdf). In *Proceedings of NAACL 2019*.
* Sameen Maruf, André F. T. Martins, and Gholamreza Haffari. 2019. [Selective Attention for Context-aware Neural Machine Translation](https://arxiv.org/pdf/1903.08788.pdf). In *Proceedings of NAACL 2019*.
* Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, and Armand Joulin. 2019. [Adaptive Attention Span in Transformers](https://arxiv.org/pdf/1905.07799). In *Proceedings of ACL 2019*.
* Kris Korrel, Dieuwke Hupkes, Verna Dankers, and Elia Bruni. 2019. [Transcoding compositionally: using attention to find more generalizable solutions](https://arxiv.org/pdf/1906.01234). In *Proceedings of ACL 2019*.
* Jesse Vig. 2019. [A Multiscale Visualization of Attention in the Transformer Model](https://arxiv.org/pdf/1906.05714). In *Proceedings of ACL 2019*.
* Sathish Reddy Indurthi, Insoo Chung, and Sangha Kim. 2019. [Look Harder: A Neural Machine Translation Model with Hard Attention](https://www.aclweb.org/anthology/P19-1290). In *Proceedings of ACL 2019*.
* Mingzhou Xu, Derek F. Wong, Baosong Yang, Yue Zhang, and Lidia S. Chao. 2019. [Leveraging Local and Global Patterns for Self-Attention Networks](https://www.aclweb.org/anthology/P19-1295). In *Proceedings of ACL 2019*.
* Sarthak Jain and Byron C. Wallace. 2019. [Attention is not Explanation](https://arxiv.org/pdf/1902.10186.pdf). In *Proceedings of NAACL 2019*.
* Sarah Wiegreffe and Yuval Pinter. 2019. [Attention is not not Explanation](https://arxiv.org/pdf/1908.04626). In *Proceedings of EMNLP 2019*.
* Xing Wang, Zhaopeng Tu, Longyue Wang, and Shuming Shi. 2019. [Self-Attention with Structural Position Representations](https://arxiv.org/pdf/1909.00383). In *Proceedings of EMNLP 2019*.
* Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. [Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel](https://arxiv.org/pdf/1908.11775). In *Proceedings of EMNLP 2019*.
* Kehai Chen, Rui Wang, Masao Utiyama and Eiichiro Sumita. 2019. [Recurrent Position Embedding for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1139/). In *Proceedings of EMNLP 2019*.
* Weiqiu You, Simeng Sun and Mohit Iyyer. 2020. [Hard-Coded Gaussian Attention for Neural Machine Translation](https://arxiv.org/abs/2005.00742). In *Proceedings of ACL 2020*.
* Emanuele Bugliarello and Naoaki Okazaki. 2020. [Enhancing Machine Translation with Dependency-Aware Self-Attention](http://arxiv.org/abs/1909.03149). In *Proceedings of ACL 2020*.
* Yann Dubois, Gautier Dagan, Dieuwke Hupkes, and Elia Bruni. 2020. [Location Attention for Extrapolation to Longer Sequences](https://www.aclweb.org/anthology/2020.acl-main.39/). In *Proceedings of ACL 2020*.
* Michael Hahn. 2020. [Theoretical Limitations of Self-Attention in Neural Sequence Models](https://transacl.org/ojs/index.php/tacl/article/view/1815). *Transactions of the Association for Computational Linguistics*.
* Apoorv Vyas, Angelos Katharopoulos, and François Fleuret. 2020. [Fast Transformers with Clustered Attention](https://papers.nips.cc/paper/2020/file/f6a8dd1c954c8506aadc764cc32b895e-Paper.pdf). In *Proceedings of NeurIPS 2020*.
* Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, and Sanjiv Kumar. 2020. [O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers](https://papers.nips.cc/paper/2020/file/9ed27554c893b5bad850a422c3538c15-Paper.pdf). In *Proceedings of NeurIPS 2020*.
* Yu Lu, Jiali Zeng, Jiajun Zhang, Shuangzhi Wu, and Mu Li. 2021. [Attention Calibration for Transformer in Neural Machine Translation](https://aclanthology.org/2021.acl-long.103.pdf). In *Proceedings of ACL 2021*.

### Open Vocabulary

* Felix Hill, Kyunghyun Cho, Sébastien Jean, Coline Devin, and Yoshua Bengio. 2015. [Embedding Word Similarity with Neural Machine Translation](https://arxiv.org/pdf/1412.6448.pdf). In *Proceedings of ICLR 2015*. ([Citation](https://scholar.google.com.hk/scholar?cites=3941248209566557946&as_sdt=2005&sciodt=0,5&hl=en): 24)
* Thang Luong, Ilya Sutskever, Quoc Le, Oriol Vinyals, and Wojciech Zaremba. 2015. [Addressing the Rare Word Problem in Neural Machine Translation](http://aclweb.org/anthology/P15-1002). In *Proceedings of ACL 2015*. ([Citation](https://scholar.google.com.hk/scholar?cites=1855379039969159341&as_sdt=2005&sciodt=0,5&hl=en): 367)
* Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. [On Using Very Large Target Vocabulary for Neural Machine Translation](http://www.aclweb.org/anthology/P15-1001). In *Proceedings of ACL 2015*. ([Citation](https://scholar.google.com.hk/scholar?cites=13222564911222792417&as_sdt=2005&sciodt=0,5&hl=en): 455)
* Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. [Neural Machine Translation of Rare Words with Subword Units](https://arxiv.org/pdf/1508.07909.pdf). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=1307964014330144942&as_sdt=2005&sciodt=0,5&hl=en): 795)
* Minh-Thang Luong and Christopher D. Manning. 2016. [Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models](http://aclweb.org/anthology/P16-1100). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=7652846715026310814&as_sdt=2005&sciodt=0,5&hl=en): 173)
* Junyoung Chung, Kyunghyun Cho, and Yoshua Bengio. 2016. [A Character-level Decoder without Explicit Segmentation for Neural Machine Translation](http://aclweb.org/anthology/P16-1160). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=2193535701900882329&as_sdt=2005&sciodt=0,5&hl=en): 171)
* Jason Lee, Kyunghyun Cho, and Thomas Hofmann. 2017. [Fully Character-Level Neural Machine Translation without Explicit Segmentation](http://aclweb.org/anthology/Q17-1026). *Transactions of the Association for Computational Linguistics*. ([Citation](https://scholar.google.com.hk/scholar?cites=13463489320810094413&as_sdt=2005&sciodt=0,5&hl=en): 116)
* Yang Feng, Shiyue Zhang, Andi Zhang, Dong Wang, and Andrew Abel. 2017. [Memory-augmented Neural Machine Translation](http://aclweb.org/anthology/D17-1146). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=825727884820810695&as_sdt=2005&sciodt=0,5&hl=en): 9)
* Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, and Jingbo Zhu. 2017. [Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation](http://aclweb.org/anthology/D17-1150). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=18313642653606285813&as_sdt=2005&sciodt=0,5&hl=en): 5)
* Peyman Passban, Qun Liu, and Andy Way. 2018. [Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation](http://aclweb.org/anthology/N18-1006). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=13968879243228181963&as_sdt=2005&sciodt=0,5&hl=en): 5)
* Huadong Chen, Shujian Huang, David Chiang, Xinyu Dai, and Jiajun Chen. 2018. [Combining Character and Word Information in Neural Machine Translation Using a Multi-Level Attention](http://aclweb.org/anthology/N18-1116). In *Proceedings of NAACL 2018*.
* Frederick Liu, Han Lu, and Graham Neubig. 2018. [Handling Homographs in Neural Machine Translation](http://aclweb.org/anthology/N18-1121). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=8530214186708420865&as_sdt=2005&sciodt=0,5&hl=en): 8)
* Taku Kudo. 2018. [Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates](http://aclweb.org/anthology/P18-1007). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=10996996628614665108&as_sdt=2005&sciodt=0,5&hl=en): 17)
* Makoto Morishita, Jun Suzuki, and Masaaki Nagata. 2018. [Improving Neural Machine Translation by Incorporating Hierarchical Subword Features](http://aclweb.org/anthology/C18-1052). In *Proceedings of COLING 2018*.
* Yang Zhao, Jiajun Zhang, Zhongjun He, Chengqing Zong, and Hua Wu. 2018. [Addressing Troublesome Words in Neural Machine Translation](http://aclweb.org/anthology/D18-1036). In *Proceedings of EMNLP 2018*.
* Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, and Wolfgang Macherey. 2018. [Revisiting Character-Based Neural Machine Translation with Capacity and Compression](http://aclweb.org/anthology/D18-1461). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=1263295983934592415&as_sdt=2005&sciodt=0,5&hl=en): 1)
* Rebecca Knowles and Philipp Koehn. 2018. [Context and Copying in Neural Machine Translation](http://aclweb.org/anthology/D18-1339). In *Proceedings of EMNLP 2018*.
* Matthias Huck, Viktor Hangya, and Alexander Fraser. 2019. [Better OOV Translation with Bilingual Terminology Mining](https://www.aclweb.org/anthology/P19-1581). In *Proceedings of ACL 2019*.
* Changhan Wang, Kyunghyun Cho, Jiatao Gu. 2020. [Neural Machine Translation with Byte-Level Subwords](https://arxiv.org/abs/1909.03341). In *Proceedings of AAAI 2020*.
* Duygu Ataman, Wilker Aziz, Alexandra Birch. 2020. [A Latent Morphology Model for Open-Vocabulary Neural Machine Translation](https://openreview.net/forum?id=BJxSI1SKDH). In *Proceedings of ICLR 2020*.
* Xuanli He, Gholamreza Haffari and Mohammad Norouzi. 2020. [Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation](https://arxiv.org/abs/2005.06606). In *Proceedings of ACL 2020*.
* Yingqiang Gao, Nikola I. Nikolov, Yuhuang Hu and Richard H.R. Hahnloser. 2020. [Character-Level Translation with Self-attention](https://arxiv.org/abs/2004.14788). In *Proceedings of ACL 2020*.
* Ivan Provilkov, Dmitrii Emelianenko, Elena Voita. 2020. [BPE-Dropout: Simple and Effective Subword Regularization](https://arxiv.org/abs/1910.13267). In *Proceedings of ACL 2020*.
* Jindřich Libovický, Alexander Fraser. 2020. [Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems](https://www.aclweb.org/anthology/2020.emnlp-main.203/). In *Proceedings of EMNLP 2020*.
Training Objectives and Frameworks
* Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. [Sequence Level Training with Recurrent Neural Networks](https://arxiv.org/pdf/1511.06732). In *Proceedings of ICLR 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=4877899442083611721&as_sdt=2005&sciodt=0,5&hl=en): 373)
* Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. [Multi-task Sequence to Sequence Learning](https://arxiv.org/pdf/1511.06114). In *Proceedings of ICLR 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=6045967109711129604&as_sdt=2005&sciodt=0,5&hl=en): 282)
* Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. [Minimum Risk Training for Neural Machine Translation](http://aclweb.org/anthology/P16-1159). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=13568140432319924245&as_sdt=2005&sciodt=0,5&hl=en): 184)
* Sam Wiseman and Alexander M. Rush. 2016. [Sequence-to-Sequence Learning as Beam-Search Optimization](http://aclweb.org/anthology/D16-1137). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=8919612243620131744&as_sdt=2005&sciodt=0,5&hl=en): 141)
* Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma. 2016. [Dual Learning for Machine Translation](https://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf). In *Proceedings of NIPS 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=15841765927830550600&as_sdt=2005&sciodt=0,5&hl=en): 138)
* Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. 2017. [An Actor-Critic Algorithm for Sequence Prediction](https://arxiv.org/pdf/1607.07086). In *Proceedings of ICLR 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=5228204938243984917&as_sdt=2005&sciodt=0,5&hl=en): 167)
* Julia Kreutzer, Artem Sokolov, Stefan Riezler. 2017. [Bandit Structured Prediction for Neural Sequence-to-Sequence Learning](http://aclweb.org/anthology/P17-1138). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=2303245646235792457,8131913197545815057): 11)
* Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, and Tie-Yan Liu. 2017. [Dual Supervised Learning](https://arxiv.org/pdf/1707.00415.pdf). In *Proceedings of ICML 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=17907972833117899731): 29)
* Yingce Xia, Jiang Bian, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2017. [Dual Inference for Machine Learning](https://www.ijcai.org/proceedings/2017/0434.pdf). In *Proceedings of IJCAI 2017*. ([Citation](https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=15405750739898389436): 9)
* Di He, Hanqing Lu, Yingce Xia, Tao Qin, Liwei Wang, and Tieyan Liu. 2017. [Decoding with Value Networks for Neural Machine Translation](http://papers.nips.cc/paper/6622-decoding-with-value-networks-for-neural-machine-translation.pdf). In *Proceedings of NIPS 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=9924066051536654397&as_sdt=2005&sciodt=0,5&hl=en): 11)
* Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato. 2018. [Classical Structured Prediction Losses for Sequence to Sequence Learning](http://aclweb.org/anthology/N18-1033). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=7858632228846408271&as_sdt=2005&sciodt=0,5&hl=en): 20)
* Zihang Dai, Qizhe Xie, and Eduard Hovy. 2018. [From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction](http://aclweb.org/anthology/P18-1155). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?hl=zh-CN&as_sdt=0,5&sciodt=0,5&cites=73472736706758753&scipsc=): 1)
* Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. [Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets](http://aclweb.org/anthology/N18-1122). In *Proceedings of NAACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=14312548252804187966&as_sdt=2005&sciodt=0,5&hl=en): 43)
* Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc Le. 2018. [Semi-Supervised Sequence Modeling with Cross-View Training](http://aclweb.org/anthology/D18-1217). In *Proceedings of EMNLP 2018*.
* Lijun Wu, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2018. [A Study of Reinforcement Learning for Neural Machine Translation](http://aclweb.org/anthology/D18-1397). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9706797919793848294&as_sdt=2005&sciodt=0,5&hl=en): 2)
* Semih Yavuz, Chung-Cheng Chiu, Patrick Nguyen, and Yonghui Wu. 2018. [CaLcs: Continuously Approximating Longest Common Subsequence for Sequence Level Optimization](http://aclweb.org/anthology/D18-1406). In *Proceedings of EMNLP 2018*.
* Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2018. [Learning to Teach with Dynamic Loss Functions](https://papers.nips.cc/paper/7882-learning-to-teach-with-dynamic-loss-functions.pdf). In *Proceedings of NeurIPS 2018*.
* Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019. [Multi-Agent Dual Learning](https://openreview.net/pdf?id=HyGhN2A5tm). In *Proceedings of ICLR 2019*.
* Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, and Lawrence Carin. 2019. [Improving Sequence-to-Sequence Learning via Optimal Transport](https://openreview.net/pdf?id=S1xtAjR5tX). In *Proceedings of ICLR 2019*.
* Sachin Kumar and Yulia Tsvetkov. 2019. [Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs](https://openreview.net/pdf?id=rJlDnoA5Y7). In *Proceedings of ICLR 2019*.
* Xing Niu, Weijia Xu, and Marine Carpuat. 2019. [Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation](https://arxiv.org/pdf/1811.01116.pdf). In *Proceedings of NAACL 2019*.
* Weijia Xu, Xing Niu, and Marine Carpuat. 2019. [Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation](https://arxiv.org/pdf/1904.04079.pdf). In *Proceedings of NAACL 2019*.
* Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, and Massimo Piccardi. 2019. [ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems](https://arxiv.org/pdf/1904.02461.pdf). In *Proceedings of NAACL 2019*.
* Reuben Cohn-Gordon and Noah Goodman. 2019. [Lost in Machine Translation: A Method to Reduce Meaning Loss](https://arxiv.org/pdf/1902.09514.pdf). In *Proceedings of NAACL 2019*.
* Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, Barnabas Poczos, and Tom M. Mitchell. 2019. [Competence-based Curriculum Learning for Neural Machine Translation](https://arxiv.org/pdf/1903.09848.pdf). In *Proceedings of NAACL 2019*.
* Gaurav Kumar, George Foster, Colin Cherry, and Maxim Krikun. 2019. [Reinforcement Learning based Curriculum Optimization for Neural Machine Translation](https://arxiv.org/pdf/1903.00041.pdf). In *Proceedings of NAACL 2019*.
* Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit. 2019. [Insertion Transformer: Flexible Sequence Generation via Insertion Operations](http://proceedings.mlr.press/v97/stern19a/stern19a.pdf). In *Proceedings of ICML 2019*.
* Laura Jehl, Carolin Lawrence, and Stefan Riezler. 2019. [Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss](https://arxiv.org/pdf/1907.03748). *Transactions of the Association for Computational Linguistics*.
* Motoki Sato, Jun Suzuki, and Shun Kiyono. 2019. [Effective Adversarial Regularization for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1020). In *Proceedings of ACL 2019*.
* Kehai Chen, Rui Wang, Masao Utiyama, and Eiichiro Sumita. 2019. [Neural Machine Translation with Reordering Embeddings](https://www.aclweb.org/anthology/P19-1174). In *Proceedings of ACL 2019*.
* Bram Bulte and Arda Tezcan. 2019. [Neural Fuzzy Repair: Integrating Fuzzy Matches into Neural Machine Translation](https://www.aclweb.org/anthology/P19-1175). In *Proceedings of ACL 2019*.
* Mingming Yang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Min Zhang, and Tiejun Zhao. 2019. [Sentence-Level Agreement for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1296). In *Proceedings of ACL 2019*.
* Wen Zhang, Yang Feng, Fandong Meng, Di You, and Qun Liu. 2019. [Bridging the Gap between Training and Inference for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1426). In *Proceedings of ACL 2019*.
* John Wieting, Taylor Berg-Kirkpatrick, Kevin Gimpel, and Graham Neubig. 2019. [Beyond BLEU: Training Neural Machine Translation with Semantic Similarity](https://www.aclweb.org/anthology/P19-1427). In *Proceedings of ACL 2019*.
* Zonghan Yang, Yong Cheng, Yang Liu, Maosong Sun. 2019. [Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach](https://www.aclweb.org/anthology/P19-1623). In *Proceedings of ACL 2019*.
* Kyra Yee, Nathan Ng, Yann N. Dauphin, and Michael Auli. 2019. [Simple and Effective Noisy Channel Modeling for Neural Machine Translation](https://arxiv.org/pdf/1908.05731). In *Proceedings of EMNLP 2019*.
* Sarthak Garg, Stephan Peitz, Udhyakumar Nallasamy, and Matthias Paulik. 2019. [Jointly Learning to Align and Translate with Transformer Models](https://arxiv.org/pdf/1909.02074). In *Proceedings of EMNLP 2019*.
* Tianchi Bi, Hao Xiong, Zhongjun He, Hua Wu and Haifeng Wang. 2019. [Multi-agent Learning for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1079.pdf). In *Proceedings of EMNLP 2019*.
* Zaixiang Zheng, Shujian Huang, Zhaopeng Tu, Xin-Yu Dai, and Jiajun Chen. 2019. [Dynamic Past and Future for Neural Machine Translation](https://www.aclweb.org/anthology/D19-1086/). In *Proceedings of EMNLP 2019*.
* Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Cheng Xiang Zhai, Tie-Yan Liu. 2019. [Neural Machine Translation with Soft Prototype](https://papers.nips.cc/paper/8861-neural-machine-translation-with-soft-prototype). In *Proceedings of NeurIPS 2019*.
* Mingjun Zhao, Haijiang Wu, Di Niu and Xiaoli Wang. 2020. [Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models](https://sites.ualberta.ca/~dniu/Homepage/Publications_files/AAAI-ZhaoM.7640.pdf). In *Proceedings of AAAI 2020*.
* Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang and Dong Yu. 2020. [Modeling Fluency and Faithfulness for Diverse Neural Machine Translation](https://arxiv.org/abs/1912.00178). In *Proceedings of AAAI 2020*.
* Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend. 2020. [On the Weaknesses of Reinforcement Learning for Neural Machine Translation](https://openreview.net/forum?id=H1eCw3EKvH). In *Proceedings of ICLR 2020*.
* Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen. 2020. [Mirror-Generative Neural Machine Translation](https://openreview.net/forum?id=HkxQRTNYPH). In *Proceedings of ICLR 2020*.
* Angela Fan, Edouard Grave, Armand Joulin. 2020. [Reducing Transformer Depth on Demand with Structured Dropout](https://openreview.net/forum?id=SylO2yStDr). In *Proceedings of ICLR 2020*.
* Yikai Zhou, Baosong Yang, Derek F. Wong, Yu Wan and Lidia S. Chao. 2020. [Uncertainty-Aware Curriculum Learning for Neural Machine Translation](https://arxiv.org/abs/1903.09848). In *Proceedings of ACL 2020*.
* Hongfei Xu, Josef van Genabith, Deyi Xiong and Qiuhui Liu. 2020. [Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change](https://arxiv.org/abs/2005.02008). In *Proceedings of ACL 2020*.
* Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong and Jingyi Zhang. 2020. [Lipschitz Constrained Parameter Initialization for Deep Transformers](https://arxiv.org/abs/1911.03179). In *Proceedings of ACL 2020*.
* Xintong Li, Lemao Liu, Rui Wang, Guoping Huang and Max Meng. 2020. [Regularized Context Gates on Transformer for Machine Translation](https://arxiv.org/abs/1908.11020). In *Proceedings of ACL 2020*.
* Sheng Shen, Zhewei Yao, Amir Gholami, Michael Mahoney, and Kurt Keutzer. 2020. [Rethinking Batch Normalization in Transformers](https://arxiv.org/abs/2003.07845). In *Proceedings of ICML 2020*.
* Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao. 2020. [Norm-Based Curriculum Learning for Neural Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.41/). In *Proceedings of ACL 2020*.
* Rongxiang Weng, Heng Yu, Xiangpeng Wei, Weihua Luo. 2020. [Towards Enhancing Faithfulness for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.212/). In *Proceedings of EMNLP 2020*.
* Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen. 2020. [Self-Paced Learning for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.80/). In *Proceedings of EMNLP 2020*.
* Wenxiang Jiao, Xing Wang, Shilin He, Irwin King, Michael Lyu, Zhaopeng Tu. 2020. [Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.176/). In *Proceedings of EMNLP 2020*.
* Shuhao Gu, Jinchao Zhang, Fandong Meng, Yang Feng, Wanying Xie, Jie Zhou, Dong Yu. 2020. [Token-level Adaptive Training for Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.76/). In *Proceedings of EMNLP 2020*.
* Xiao Pan, Mingxuan Wang, Liwei Wu, Lei Li. 2021. [Contrastive Learning for Many-to-many Multilingual Neural Machine Translation](https://aclanthology.org/2021.acl-long.21/). In *Proceedings of ACL 2021*.
* Zehui Lin, Liwei Wu, Mingxuan Wang, Lei Li. 2021. [Learning Language Specific Sub-network for Multilingual Machine Translation](https://aclanthology.org/2021.acl-long.25.pdf). In *Proceedings of ACL 2021*.
Decoding
* Mingxuan Wang, Zhengdong Lu, Hang Li, and Qun Liu. 2016. [Memory-enhanced Decoder for Neural Machine Translation](http://aclweb.org/anthology/D16-1027). In *Proceedings of EMNLP 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=8953099567327192144&as_sdt=5,43&sciodt=0,43&hl=en): 30)
* Shonosuke Ishiwatari, Jingtao Yao, Shujie Liu, Mu Li, Ming Zhou, Naoki Yoshinaga, Masaru Kitsuregawa, and Weijia Jia. 2017. [Chunk-based Decoder for Neural Machine Translation](http://aclweb.org/anthology/P17-1174). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=12622466792413888553&as_sdt=5,43&sciodt=0,43&hl=en): 4)
* Hao Zhou, Zhaopeng Tu, Shujian Huang, Xiaohua Liu, Hang Li, and Jiajun Chen. 2017. [Chunk-Based Bi-Scale Decoder for Neural Machine Translation](http://aclweb.org/anthology/P17-2092). In *Proceedings of ACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=15037334213705032139&as_sdt=5,43&sciodt=0,43&hl=en): 6)
* Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, and Alex Smola. 2017. [Neural Machine Translation with Recurrent Attention Modeling](http://aclweb.org/anthology/E17-2061). In *Proceedings of EACL 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=5621977008323303060&as_sdt=5,43&sciodt=0,43&hl=en): 25)
* Markus Freitag and Yaser Al-Onaizan. 2017. [Beam Search Strategies for Neural Machine Translation](http://aclweb.org/anthology/W17-3207). In *Proceedings of the First Workshop on Neural Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=9963996198070293328&as_sdt=5,43&sciodt=0,43&hl=en): 14)
* Rajen Chatterjee, Matteo Negri, Marco Turchi, Marcello Federico, Lucia Specia, and Frédéric Blain. 2017. [Guiding Neural Machine Translation Decoding with External Knowledge](http://aclweb.org/anthology/W17-4716). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=16027327382881304751&as_sdt=5,43&sciodt=0,43&hl=en): 8)
* Cong Duy Vu Hoang, Gholamreza Haffari, and Trevor Cohn. 2017. [Towards Decoding as Continuous Optimisation in Neural Machine Translation](http://aclweb.org/anthology/D17-1014). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=3256665477810901088&as_sdt=5,43&sciodt=0,43&hl=en): 4)
* Yin-Wen Chang and Michael Collins. 2017. [Source-Side Left-to-Right or Target-Side Left-to-Right? An Empirical Comparison of Two Phrase-Based Decoding Algorithms](http://aclweb.org/anthology/D17-1157). In *Proceedings of EMNLP 2017*.
* Jiatao Gu, Kyunghyun Cho, and Victor O.K. Li. 2017. [Trainable Greedy Decoding for Neural Machine Translation](http://aclweb.org/anthology/D17-1210). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=8731447567218149379&as_sdt=2005&sciodt=0,5&hl=en): 18)
* Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, and Philipp Koehn. 2017. [Neural Lattice Search for Domain Adaptation in Machine Translation](http://www.aclweb.org/anthology/I17-2004). In *Proceedings of IJCNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cluster=1478484647323458623&hl=zh-CN&as_sdt=0,5): 4)
* Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, and Noam Shazeer. 2018. [Fast Decoding in Sequence Models Using Discrete Latent Variables](https://arxiv.org/pdf/1803.03382.pdf). In *Proceedings of ICML 2018*. ([Citation](https://scholar.google.com/scholar?cites=4042994175439965815&as_sdt=2005&sciodt=0,5&hl=en): 3)
* Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, and Hongji Wang. 2018. [Asynchronous Bidirectional Decoding for Neural Machine Translation](https://arxiv.org/pdf/1801.05122). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=8717464809531813198&as_sdt=2005&sciodt=0,5&hl=en): 10)
* Jiatao Gu, Daniel Jiwoong Im, and Victor O.K. Li. 2018. [Neural Machine Translation with Gumbel-Greedy Decoding](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17299/16059). In *Proceedings of AAAI 2018*. ([Citation](https://scholar.google.com/scholar?cites=13306026917760415053&as_sdt=2005&sciodt=0,5&hl=en): 5)
* Philip Schulz, Wilker Aziz, and Trevor Cohn. 2018. [A Stochastic Decoder for Neural Machine Translation](http://aclweb.org/anthology/P18-1115). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=2090499795836532737&as_sdt=2005&sciodt=0,5&hl=en): 3)
* Raphael Shu and Hideki Nakayama. 2018. [Improving Beam Search by Removing Monotonic Constraint for Neural Machine Translation](http://aclweb.org/anthology/P18-2054). In *Proceedings of ACL 2018*.
* Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su, and Qi Su. 2018. [Deconvolution-Based Global Decoding for Neural Machine Translation](http://aclweb.org/anthology/C18-1276). In *Proceedings of COLING 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=7984371866238647123&as_sdt=2005&sciodt=0,5&hl=en): 2)
* Chunqi Wang, Ji Zhang, and Haiqing Chen. 2018. [Semi-Autoregressive Neural Machine Translation](http://aclweb.org/anthology/D18-1044). In *Proceedings of EMNLP 2018*.
* Xinwei Geng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2018. [Adaptive Multi-pass Decoder for Neural Machine Translation](http://aclweb.org/anthology/D18-1048). In *Proceedings of EMNLP 2018*.
* Wen Zhang, Liang Huang, Yang Feng, Lei Shen, and Qun Liu. 2018. [Speeding Up Neural Machine Translation Decoding by Cube Pruning](http://aclweb.org/anthology/D18-1460). In *Proceedings of EMNLP 2018*.
* Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018. [A Tree-based Decoder for Neural Machine Translation](http://aclweb.org/anthology/D18-1509). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=9083843868999368969&as_sdt=2005&sciodt=0,5&hl=en): 1)
* Chenze Shao, Xilin Chen, and Yang Feng. 2018. [Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation](http://aclweb.org/anthology/D18-1510). In *Proceedings of EMNLP 2018*.
* Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, and Hai Zhao. 2018. [Exploring Recombination for Efficient Decoding of Neural Machine Translation](http://aclweb.org/anthology/D18-1511). In *Proceedings of EMNLP 2018*.
* Jetic Gū, Hassan S. Shavarani, and Anoop Sarkar. 2018. [Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing](http://aclweb.org/anthology/D18-1037). In *Proceedings of EMNLP 2018*.
* Yilin Yang, Liang Huang, and Mingbo Ma. 2018. [Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation](http://aclweb.org/anthology/D18-1342). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=7003078853740771503&as_sdt=2005&sciodt=0,5&hl=en): 3)
* Yun Chen, Victor O.K. Li, Kyunghyun Cho, and Samuel R. Bowman. 2018. [A Stable and Effective Learning Strategy for Trainable Greedy Decoding](http://aclweb.org/anthology/D18-1035). In *Proceedings of EMNLP 2018*.
* Wouter Kool, Herke van Hoof, and Max Welling. 2019. [Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement](http://proceedings.mlr.press/v97/kool19a/kool19a.pdf). In *Proceedings of ICML 2019*.
* Ashwin Kalyan, Peter Anderson, Stefan Lee, and Dhruv Batra. 2019. [Trainable Decoding of Sets of Sequences for Neural Sequence Models](http://proceedings.mlr.press/v97/kalyan19a/kalyan19a.pdf). In *Proceedings of ICML 2019*.
* Eldan Cohen and Christopher Beck. 2019. [Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models](http://proceedings.mlr.press/v97/cohen19a/cohen19a.pdf). In *Proceedings of ICML 2019*.
* Kartik Goyal, Chris Dyer, and Taylor Berg-Kirkpatrick. 2019. [An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search](https://arxiv.org/pdf/1904.06834.pdf). In *Proceedings of NAACL 2019*.
* Mingbo Ma, Renjie Zheng, and Liang Huang. 2019. [Learning to Stop in Structured Prediction for Neural Machine Translation](https://arxiv.org/pdf/1904.01032.pdf). In *Proceedings of NAACL 2019*.
* Han Fu, Chenghao Liu, and Jianling Sun. 2019. [Reference Network for Neural Machine Translation](https://www.aclweb.org/anthology/P19-1287). In *Proceedings of ACL 2019*.
* Long Zhou, Jiajun Zhang, and Chengqing Zong. 2019. [Synchronous Bidirectional Neural Machine Translation](https://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00256). *Transactions of the Association for Computational Linguistics*.
* Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, and Zhi-Hong Deng. 2019. [Fast Structured Decoding for Sequence Models](https://arxiv.org/pdf/1910.11555). In *Proceedings of NeurIPS 2019*.
* Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman and Kevin Gimpel. 2020. [ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation](http://arxiv.org/abs/2005.00850). In *Proceedings of ACL 2020*.
* Pinzhen Chen, Nikolay Bogoychev, Kenneth Heafield, and Faheem Kirefu. 2020. [Parallel Sentence Mining by Constrained Decoding](https://www.aclweb.org/anthology/2020.acl-main.152/). In *Proceedings of ACL 2020*.
* Julia Kreutzer, George Foster, Colin Cherry. 2020. [Inference Strategies for Machine Translation with Conditional Masking](https://www.aclweb.org/anthology/2020.emnlp-main.465/). In *Proceedings of EMNLP 2020*.
* Yuntian Deng, Alexander Rush. 2020. [Cascaded Text Generation with Markov Transformers](https://papers.nips.cc/paper/2020/file/01a0683665f38d8e5e567b3b15ca98bf-Paper.pdf). In *Proceedings of NeurIPS 2020*.
* Clara Meister, Ryan Cotterell, Tim Vieira. 2020. [If beam search is the answer, what was the question?](https://www.aclweb.org/anthology/2020.emnlp-main.170/). In *Proceedings of EMNLP 2020*.
* Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis. 2021. [Nearest Neighbor Machine Translation](https://openreview.net/pdf?id=7wCBOfJ8hJM). In *Proceedings of ICLR 2021*.
* Mathias Müller, Rico Sennrich. 2021. [Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation](https://aclanthology.org/2021.acl-long.22/). In *Proceedings of ACL 2021*.
* Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Meng Zhang. 2021. [Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation](https://aclanthology.org/2021.acl-long.23/). In *Proceedings of ACL 2021*.
* Yang Feng, Shuhao Gu, Dengji Guo, Zhengxin Yang, Chenze Shao. 2021. [Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation](https://aclanthology.org/2021.acl-long.223.pdf). In *Proceedings of ACL 2021*.
Low-resource Language Translation
* Rico Sennrich and Biao Zhang. 2019. [Revisiting Low-Resource Neural Machine Translation: A Case Study](https://arxiv.org/pdf/1905.11901). In *Proceedings of ACL 2019*.
* Danni Liu, Jan Niehues, James Cross, Francisco Guzman, Xian Li. 2021. [Improving Zero-Shot Translation by Disentangling Positional Information](https://aclanthology.org/2021.acl-long.101.pdf). In *Proceedings of ACL 2021*.
Semi-supervised Learning
* Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. [Improving Neural Machine Translation Models with Monolingual Data](https://arxiv.org/pdf/1511.06709). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=16647011114557315277&as_sdt=2005&sciodt=0,5&hl=en): 220)
* Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and Yang Liu. 2016. [Semi-Supervised Learning for Neural Machine Translation](http://aclweb.org/anthology/P16-1185). In *Proceedings of ACL 2016*. ([Citation](https://scholar.google.com.hk/scholar?cites=4238720597816763796&as_sdt=2005&sciodt=0,5&hl=en): 59)
* Tobias Domhan and Felix Hieber. 2017. [Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning](http://aclweb.org/anthology/D17-1158). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=3638267208501348823&as_sdt=2005&sciodt=0,5&hl=en): 11)
* Anna Currey, Antonio Valerio Miceli Barone, and Kenneth Heafield. 2017. [Copied Monolingual Data Improves Low-Resource Neural Machine Translation](http://aclweb.org/anthology/W17-4715). In *Proceedings of the Second Conference on Machine Translation*. ([Citation](https://scholar.google.com.hk/scholar?cites=5102771697654796737&as_sdt=2005&sciodt=0,5&hl=en): 14)
* Shuo Wang, Yang Liu, Chao Wang, Huanbo Luan, and Maosong Sun. 2019. [Improving Back-Translation with Uncertainty-based Confidence Estimation](https://arxiv.org/pdf/1909.00157). In *Proceedings of EMNLP 2019*.
* Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen. 2020. [Mirror-Generative Neural Machine Translation](https://openreview.net/forum?id=HkxQRTNYPH). In *Proceedings of ICLR 2020*.
Unsupervised Learning
* Nima Pourdamghani and Kevin Knight. 2017. [Deciphering Related Languages](http://aclweb.org/anthology/D17-1266). In *Proceedings of EMNLP 2017*. ([Citation](https://scholar.google.com.hk/scholar?cites=1168382888604094286&as_sdt=2005&sciodt=0,5&hl=en): 5)
* Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018. [Unsupervised Neural Machine Translation](https://openreview.net/pdf?id=Sy2ogebAW). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=6109181985493123662&as_sdt=2005&sciodt=0,5&hl=en): 78)
* Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018. [Unsupervised Machine Translation Using Monolingual Corpora Only](https://openreview.net/pdf?id=rkYTTf-AZ). In *Proceedings of ICLR 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=682955820897938264&as_sdt=2005&sciodt=0,5&hl=en): 78)
* Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. [Unsupervised Neural Machine Translation with Weight Sharing](http://aclweb.org/anthology/P18-1005). In *Proceedings of ACL 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=16608767535553803928&as_sdt=2005&sciodt=0,5&hl=en): 6)
* Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018. [Phrase-Based & Neural Unsupervised Machine Translation](http://aclweb.org/anthology/D18-1549). In *Proceedings of EMNLP 2018*. ([Citation](https://scholar.google.com.hk/scholar?cites=17725098892021008539&as_sdt=2005&sciodt=0,5&hl=en): 24)
* Iftekhar Naim, Parker Riley, and Daniel Gildea. 2018. [Feature-Based Decipherment for Machine Translation](http://aclweb.org/anthology/J18-3006). *Computational Linguistics*.
* Jiawei Wu, Xin Wang, and William Yang Wang. 2019. [Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation](https://arxiv.org/pdf/1904.02331.pdf). In *Proceedings of NAACL 2019*.
* Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, and Jonathan May. 2019. [Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation](https://arxiv.org/pdf/1906.05683). In *Proceedings of ACL 2019*.
* Jiaming Luo, Yuan Cao, and Regina Barzilay. 2019. [Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B](https://arxiv.org/pdf/1906.06718). In *Proceedings of ACL 2019*.
* Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, and Tie-Yan Liu. 2019. [Unsupervised Pivot Translation for Distant Languages](https://www.aclweb.org/anthology/P19-1017). In *Proceedings of ACL 2019*.
* Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2019. [An Effective Approach to Unsupervised Machine Translation](https://www.aclweb.org/anthology/P19-1019). In *Proceedings of ACL 2019*.
* Viktor Hangya and Alexander Fraser. 2019. [Unsupervised Parallel Sentence Extraction with Parallel Segment Detection Helps Machine Translation](https://www.aclweb.org/anthology/P19-1118). In *Proceedings of ACL 2019*.
* Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, and Tiejun Zhao. 2019. [Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation](https://www.aclweb.org/anthology/P19-1119). In *Proceedings of ACL 2019*.
* Jiatao Gu, Yong Wang, Kyunghyun Cho, and Victor O.K. Li. 2019. [Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations](https://www.aclweb.org/anthology/P19-1121). In *Proceedings of ACL 2019*.
* Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, and Pushpak Bhattacharyya. 2019. [Multilingual Unsupervised NMT using Shared Encoder and Language-Specific Decoders](https://www.aclweb.org/anthology/P19-1297). In *Proceedings of ACL 2019*.
* Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, and Shuai Ma. 2019. [Explicit Cross-lingual Pre-training for Unsupervised Machine Translation](https://www.aclweb.org/anthology/D19-1071.pdf). In *Proceedings of EMNLP 2019*.
* Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, and Tiejun Zhao. 2020. [Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation](https://arxiv.org/abs/2004.10171). In *Proceedings of ACL 2020*.
* Xiangyu Duan, Baijun Ji, Hao Jia, Min Tan, Min Zhang, Boxing Chen, Weihua Luo, and Yue Zhang. 2020. [Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences](https://www.aclweb.org/anthology/2020.acl-main.143/). In *Proceedings of ACL 2020*.
* Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, and Shuai Ma. 2020. [A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation](https://www.aclweb.org/anthology/2020.acl-main.320/). In *Proceedings of ACL 2020*.
* Alexandra Chronopoulou, Dario Stojanovski, and Alexander Fraser. 2020. [Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT](https://www.aclweb.org/anthology/2020.emnlp-main.214/). In *Proceedings of EMNLP 2020*.
* Jerin Philip, Alexandre Berard, Matthias Gallé, and Laurent Besacier. 2020. [Monolingual Adapters for Zero-Shot Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.361/). In *Proceedings of EMNLP 2020*.
* Dana Ruiter, Josef van Genabith, Cristina España-Bonet. 2020. [Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation](https://www.aclweb.org/anthology/2020.emnlp-main.202