Dual Learning for Machine Translation and Beyond

Abstract

As discussed in earlier chapters, dual learning has been studied and applied in many applications, including machine translation, image translation, speech processing, text summarization, and code generation and commenting. In this chapter, we focus on machine translation, the first application in which dual learning was studied and one of the applications it fits best. We introduce several representative works based on the dual reconstruction principle for semi-supervised and unsupervised neural machine translation.
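
To make the dual reconstruction principle concrete before diving into the chapter, the following is a minimal sketch of one training round on a monolingual source sentence, in the spirit of the dual learning framework of [15]. The model interfaces (sample, log_prob, score) and the reward weight alpha are illustrative assumptions, not the exact formulation used in the works covered here.

    # A minimal sketch of dual reconstruction for machine translation,
    # in the spirit of [15]. The model objects are hypothetical stand-ins
    # for trained NMT models and a target-side language model.

    def dual_reconstruction_step(f_xy, f_yx, lm_y, x, alpha=0.5):
        """One round starting from a monolingual source sentence x.

        f_xy: forward model X -> Y, exposing .sample(x)
        f_yx: dual model Y -> X, exposing .log_prob(x, given=y)
        lm_y: target-side language model, exposing .score(y)
        alpha: illustrative weight trading off fluency against
               reconstruction (a hyperparameter in [15])
        """
        y_mid = f_xy.sample(x)                 # noisy forward translation
        r_lm = lm_y.score(y_mid)               # fluency reward from the LM
        r_rec = f_yx.log_prob(x, given=y_mid)  # reconstruction reward
        # Both models are updated by policy gradient [37] on this mixed
        # reward; the symmetric round starting from target-side
        # monolingual data is analogous.
        return alpha * r_lm + (1 - alpha) * r_rec

The key point is that neither reward requires a parallel sentence pair: fluency is judged by a monolingual language model, and reconstruction quality is judged against the original monolingual sentence itself.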

Notes

  1. We focus on text translation in this chapter.

  2. https://en.wikipedia.org/wiki/Georgetown-IBM_experiment.

  3. More than one million bilingual sentence pairs were used to pre-train the two models in [15].

  4. https://www.ethnologue.com/guides/how-many-languages.

  5. Since neither translation model is perfect, we can only obtain a noisy translation of a given sentence.

  6. The public implementation can be found at https://github.com/artetxem/vecmap; a usage sketch follows this list.
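
As a concrete illustration for note 6, here is a hedged sketch of invoking vecmap to map two sets of monolingual word embeddings into a shared space without bilingual supervision. The script name map_embeddings.py and the --unsupervised flag follow the repository's README at the time of writing; treat the exact command line as an assumption to be checked against the repository.

    # Hedged sketch: shell out to vecmap (https://github.com/artetxem/vecmap).
    # The script name and flag are taken from the project's README and
    # should be verified against the repository before use.
    import subprocess

    def map_embeddings_unsupervised(src_emb, trg_emb, src_out, trg_out):
        """Map two monolingual embedding files into a shared space with
        no bilingual supervision, following Artetxe et al. [1]."""
        subprocess.run(
            ["python3", "map_embeddings.py", "--unsupervised",
             src_emb, trg_emb, src_out, trg_out],
            check=True,
        )

    # Example with hypothetical file names:
    # map_embeddings_unsupervised("en.emb.txt", "fr.emb.txt",
    #                             "en.mapped.txt", "fr.mapped.txt")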

References

  1. Artetxe, M., Labaka, G., & Agirre, E. (2017). Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (pp. 451–462).

  2. Artetxe, M., Labaka, G., Agirre, E., & Cho, K. (2018). Unsupervised neural machine translation. In 6th International Conference on Learning Representations, ICLR 2018.

  3. Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015.

  4. Brown, P. F., Cocke, J., Pietra, S. A. D., Pietra, V. J. D., Jelinek, F., Lafferty, J., et al. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.

  5. Cao, R., Zhu, S., Liu, C., Li, J., & Yu, K. (2019). Semantic parsing with dual learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 51–64).

  6. Cao, R., Zhu, S., Yang, C., Liu, C., Ma, R., Zhao, Y., et al. (2020). Unsupervised dual paraphrasing for two-stage semantic parsing. Preprint. arXiv:2005.13485.

  7. Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., et al. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1724–1734).

  8. Conneau, A., Lample, G., Ranzato, M., Denoyer, L., & Jégou, H. (2017). Word translation without parallel data. Preprint. arXiv:1710.04087.

  9. Dietterich, T. G. (2002). Ensemble learning. In The Handbook of Brain Theory and Neural Networks (2nd ed., pp. 110–125). Cambridge, MA: MIT Press.

  10. Edunov, S., Ott, M., Auli, M., & Grangier, D. (2018). Understanding back-translation at scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 489–500).

  11. Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning (Vol. 70, pp. 1243–1252). JMLR.org.

  12. Gulcehre, C., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H.-C., et al. (2015). On using monolingual corpora in neural machine translation. Preprint. arXiv:1503.03535.

  13. Gulcehre, C., Firat, O., Xu, K., Cho, K., & Bengio, Y. (2017). On integrating a language model into neural machine translation. Computer Speech & Language, 45, 137–148.

  14. Hassan Awadalla, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., et al. (2018). Achieving human parity on automatic Chinese to English news translation. Preprint. arXiv:1803.05567.

  15. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T.-Y., et al. (2016). Dual learning for machine translation. In Advances in Neural Information Processing Systems (pp. 820–828).

  16. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. Preprint. arXiv:1503.02531.

  17. Jia, R., & Liang, P. (2016). Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 12–22).

  18. Kim, Y., & Rush, A. M. (2016). Sequence-level knowledge distillation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 1317–1327).

  19. Koehn, P. (2009). Statistical machine translation. New York: Cambridge University Press.

  20. Lample, G., Conneau, A., Denoyer, L., & Ranzato, M. (2018). Unsupervised machine translation using monolingual corpora only. In 6th International Conference on Learning Representations, ICLR 2018.

  21. Lample, G., Ott, M., Conneau, A., Denoyer, L., & Ranzato, M. (2018). Phrase-based & neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 5039–5049).

  22. Luo, F., Li, P., Yang, P., Zhou, J., Tan, Y., Chang, B., et al. (2019). Towards fine-grained text sentiment transfer. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 2013–2022).

  23. Luo, F., Li, P., Zhou, J., Yang, P., Chang, B., Sun, X., et al. (2019). A dual reinforcement learning framework for unsupervised text style transfer. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (pp. 5116–5122). AAAI Press.

  24. Meng, C., Ren, P., Chen, Z., Sun, W., Ren, Z., Tu, Z., et al. (2020). DukeNet: A dual knowledge interaction network for knowledge-grounded conversation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1151–1160).

  25. Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.

  26. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119).

  27. Nirenburg, S. (1989). Knowledge-based machine translation. Machine Translation, 4(1), 5–24.

  28. Nirenburg, S., Carbonell, J., Tomita, M., & Goodman, K. (1994). Machine translation: A knowledge-based approach. San Mateo, CA: Morgan Kaufmann Publishers Inc.

  29. Ranzato, M., Chopra, S., Auli, M., & Zaremba, W. (2015). Sequence level training with recurrent neural networks. Preprint. arXiv:1511.06732.

  30. Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 379–389).

  31. Sennrich, R., Haddow, B., & Birch, A. (2016). Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 86–96).

  32. Sestorain, L., Ciaramita, M., Buck, C., & Hofmann, T. (2018). Zero-shot dual machine translation. Preprint. arXiv:1805.10338.

  33. Shen, L., & Feng, Y. (2020). CDL: Curriculum dual learning for emotion-controllable response generation. Preprint. arXiv:2005.00329.

  34. Su, S.-Y., Huang, C.-W., & Chen, Y.-N. (2020). Towards unsupervised language understanding and generation by joint dual learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 671–680).

  35. Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association.

  36. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104–3112).

  37. Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (pp. 1057–1063).

  38. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008).

  39. Wang, Y., Xia, Y., He, T., Tian, F., Qin, T., Zhai, C. X., et al. (2019). Multi-agent dual learning. In 7th International Conference on Learning Representations, ICLR 2019.

  40. Yang, M., Zhao, Z., Zhao, W., Chen, X., Zhu, J., Zhou, L., et al. (2017). Personalized response generation via domain adaptation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1021–1024).

  41. Zelle, J. M., & Mooney, R. J. (1996). Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artificial Intelligence (pp. 1050–1055).

  42. Zhang, S., & Bansal, M. (2019). Addressing semantic drift in question generation for semi-supervised question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 2495–2509).

  43. Zhou, Z.-H. (2012). Ensemble methods: Foundations and algorithms. New York: CRC Press.

  44. Zhu, S., Cao, R., & Yu, K. (2020). Dual learning for semi-supervised natural language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 1936–1947.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Qin, T. (2020). Dual Learning for Machine Translation and Beyond. In: Dual Learning. Springer, Singapore. https://doi.org/10.1007/978-981-15-8884-6_4

  • DOI: https://doi.org/10.1007/978-981-15-8884-6_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-8883-9

  • Online ISBN: 978-981-15-8884-6

  • eBook Packages: Computer Science, Computer Science (R0)
