A PREPRINT - JULY 8, 2020
[Agić and Vulić, 2019]
Agić, Ž. and Vulić, I. (2019). JW300: A wide-coverage parallel corpus for low-resource
languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages
3204–3210, Florence, Italy. Association for Computational Linguistics.
[Aroonmanakun et al., 2007]
Aroonmanakun, W. et al. (2007). Thoughts on word and sentence segmentation in Thai.
In Proceedings of the Seventh Symposium on Natural Language Processing, Pattaya, Thailand, December 13–15,
pages 85–90.
[Bahdanau et al., 2014]
Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning
to align and translate. ArXiv, 1409.
[Byrne et al., 2019]
Byrne, B., Krishnamoorthi, K., Sankar, C., Neelakantan, A., Duckworth, D., Yavuz, S., Goodrich,
B., Dubey, A., Cedilnik, A., and Kim, K.-Y. (2019). Taskmaster-1: Toward a realistic and diverse dialog dataset.
arXiv preprint arXiv:1909.05358.
[Cettolo et al., 2015]
Cettolo, M., Niehues, J., Stüker, S., Bentivogli, L., Cattoni, R., and Federico, M. (2015). The
IWSLT 2015 evaluation campaign.
[Chen and Kan, 2011]
Chen, T. and Kan, M.-Y. (2011). Creating a live, public short message service corpus: The NUS
SMS corpus. Language Resources and Evaluation, 47.
[Christodoulopoulos and Steedman, 2015]
Christodoulopoulos, C. and Steedman, M. (2015). A massively parallel
corpus: The Bible in 100 languages. Language Resources and Evaluation, 49(2):375–395.
[Dolan and Brockett, 2005]
Dolan, W. B. and Brockett, C. (2005). Automatically constructing a corpus of sentential
paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005).
[Espl and Transducens, 2009]
Espl, M. and Transducens, G. (2009). Bitextor, a free/open-source software to harvest
translation memories from multilingual websites.
[Esplà et al., 2019]
Esplà, M., Forcada, M., Ramírez-Sánchez, G., and Hoang, H. (2019). ParaCrawl: Web-scale
parallel corpora for the languages of the EU. In Proceedings of Machine Translation Summit XVII Volume 2:
Translator, Project and User Tracks, pages 118–119, Dublin, Ireland. European Association for Machine Translation.
[Gehring et al., 2017]
Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. (2017). Convolutional
sequence to sequence learning. CoRR, abs/1705.03122.
[Hassan et al., 2018]
Hassan, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., Huang, X., Junczys-
Dowmunt, M., Lewis, W., Li, M., Liu, S., Liu, T.-Y., Luo, R., Menezes, A., Qin, T., Seide, F., Tan, X., Tian, F., Wu,
L., Wu, S., Xia, Y., Zhang, D., Zhang, Z., and Zhou, M. (2018). Achieving human parity on automatic Chinese to
English news translation. ArXiv, abs/1803.05567.
[Keskar et al., 2019]
Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C., and Socher, R. (2019). CTRL: A conditional
transformer language model for controllable generation.
[Koehn, 2005]
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In MT Summit,
volume 5, pages 79–86. Citeseer.
[Koehn and Knowles, 2017]
Koehn, P. and Knowles, R. (2017). Six challenges for neural machine translation. In
Proceedings of the First Workshop on Neural Machine Translation, pages 28–39, Vancouver. Association for
Computational Linguistics.
[Kudo and Richardson, 2018]
Kudo, T. and Richardson, J. (2018). SentencePiece: A simple and language independent
subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical
Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium. Association for
Computational Linguistics.
[Lison and Tiedemann, 2016]
Lison, P. and Tiedemann, J. (2016). OpenSubtitles2016: Extracting large parallel corpora
from movie and TV subtitles. In Proceedings of the Tenth International Conference on Language Resources and
Evaluation (LREC’16), pages 923–929, Portorož, Slovenia. European Language Resources Association (ELRA).