Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation By Jointly Learning To Align and Translate. In ICLR, 1–15. 2014. arXiv:1409.0473.


Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. Convolutional sequence to sequence learning. CoRR, 2017. arXiv:1705.03122.


Yang Liu and Mirella Lapata. Learning structured text representations. CoRR, 2017. arXiv:1705.09207.


Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. In Proc of EMNLP. 2015.


Minh-Thang Luong, Ilya Sutskever, Quoc Le, Oriol Vinyals, and Wojciech Zaremba. Addressing the Rare Word Problem in Neural Machine Translation. In Proc of ACL. 2015.


Abigail See, Peter J. Liu, and Christopher D. Manning. Get to the point: summarization with pointer-generator networks. CoRR, 2017. arXiv:1704.04368.


Rico Sennrich and Barry Haddow. Linguistic input features improve neural machine translation. arXiv preprint arXiv:1606.02892, 2016.


Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, 2017. arXiv:1706.03762.


Xinyi Wang, Hieu Pham, Zihang Dai, and Graham Neubig. SwitchOut: an efficient data augmentation algorithm for neural machine translation. CoRR, 2018. arXiv:1808.07512.


Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, and others. Google's neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.


Biao Zhang, Deyi Xiong, and Jinsong Su. Accelerating neural transformer via an average attention network. CoRR, 2018. arXiv:1805.00631.