Dual Learning for Machine Translation and Beyond

Abstract

As discussed in earlier chapters, dual learning has been studied and applied in many applications, including machine translation, image translation, speech processing, text summarization, and code generation and commenting. In this chapter, we focus on machine translation, the first application in which dual learning was studied and one of the applications it fits best. We introduce several representative works based on the dual reconstruction principle for semi-supervised and unsupervised neural machine translation.
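
To make the dual reconstruction principle concrete before diving into the chapter, the following is a minimal sketch of one training round on a monolingual source sentence, in the spirit of the dual learning framework of [15]. The model interfaces (sample, log_prob, score) and the reward weight alpha are illustrative assumptions, not the exact formulation used in the works covered here.

    # A minimal sketch of dual reconstruction for machine translation,
    # in the spirit of [15]. The model objects are hypothetical stand-ins
    # for trained NMT models and a target-side language model.

    def dual_reconstruction_step(f_xy, f_yx, lm_y, x, alpha=0.5):
        """One round starting from a monolingual source sentence x.

        f_xy: forward model X -> Y, exposing .sample(x)
        f_yx: dual model Y -> X, exposing .log_prob(x, given=y)
        lm_y: target-side language model, exposing .score(y)
        alpha: illustrative weight trading off fluency against
               reconstruction (a hyperparameter in [15])
        """
        y_mid = f_xy.sample(x)                 # noisy forward translation
        r_lm = lm_y.score(y_mid)               # fluency reward from the LM
        r_rec = f_yx.log_prob(x, given=y_mid)  # reconstruction reward
        # Both models are updated by policy gradient [37] on this mixed
        # reward; the symmetric round starting from target-side
        # monolingual data is analogous.
        return alpha * r_lm + (1 - alpha) * r_rec

The key point is that neither reward requires a parallel sentence pair: fluency is judged by a monolingual language model, and reconstruction quality is judged against the original monolingual sentence itself.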

Notes

  1. We focus on text translation in this chapter.

  2. https://en.wikipedia.org/wiki/Georgetown-IBM_experiment.

  3. More than one million bilingual sentence pairs were used to pre-train the two models in [15].

  4. https://www.ethnologue.com/guides/how-many-languages.

  5. Since neither translation model is perfect, we can only obtain a noisy translation of a given sentence.

  6. The public implementation can be found at https://github.com/artetxem/vecmap; a usage sketch follows this list.
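
As a concrete illustration for note 6, here is a hedged sketch of invoking vecmap to map two sets of monolingual word embeddings into a shared space without bilingual supervision. The script name map_embeddings.py and the --unsupervised flag follow the repository's README at the time of writing; treat the exact command line as an assumption to be checked against the repository.

    # Hedged sketch: shell out to vecmap (https://github.com/artetxem/vecmap).
    # The script name and flag are taken from the project's README and
    # should be verified against the repository before use.
    import subprocess

    def map_embeddings_unsupervised(src_emb, trg_emb, src_out, trg_out):
        """Map two monolingual embedding files into a shared space with
        no bilingual supervision, following Artetxe et al. [1]."""
        subprocess.run(
            ["python3", "map_embeddings.py", "--unsupervised",
             src_emb, trg_emb, src_out, trg_out],
            check=True,
        )

    # Example with hypothetical file names:
    # map_embeddings_unsupervised("en.emb.txt", "fr.emb.txt",
    #                             "en.mapped.txt", "fr.mapped.txt")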

References

  1. Artetxe, M., Labaka, G., & Agirre, E. (2017). Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (pp. 451–462).

  2. Artetxe, M., Labaka, G., Agirre, E., & Cho, K. (2018). Unsupervised neural machine translation. In 6th International Conference on Learning Representations, ICLR 2018.

  3. Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015.

  4. Brown, P. F., Cocke, J., Pietra, S. A. D., Pietra, V. J. D., Jelinek, F., Lafferty, J., et al. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.

  5. Cao, R., Zhu, S., Liu, C., Li, J., & Yu, K. (2019). Semantic parsing with dual learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 51–64).

  6. Cao, R., Zhu, S., Yang, C., Liu, C., Ma, R., Zhao, Y., et al. (2020). Unsupervised dual paraphrasing for two-stage semantic parsing. Preprint. arXiv:2005.13485.

  7. Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., et al. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1724–1734).

  8. Conneau, A., Lample, G., Ranzato, M., Denoyer, L., & Jégou, H. (2017). Word translation without parallel data. Preprint. arXiv:1710.04087.

  9. Dietterich, T. G. (2002). Ensemble learning. In The Handbook of Brain Theory and Neural Networks (2nd ed., pp. 110–125). Cambridge, MA: MIT Press.

  10. Edunov, S., Ott, M., Auli, M., & Grangier, D. (2018). Understanding back-translation at scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 489–500).

  11. Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning (Vol. 70, pp. 1243–1252). JMLR.org.

  12. Gulcehre, C., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H.-C., et al. (2015). On using monolingual corpora in neural machine translation. Preprint. arXiv:1503.03535.

  13. Gulcehre, C., Firat, O., Xu, K., Cho, K., & Bengio, Y. (2017). On integrating a language model into neural machine translation. Computer Speech & Language, 45, 137–148.

  14. Hassan Awadalla, H., Aue, A., Chen, C., Chowdhary, V., Clark, J., Federmann, C., et al. (2018). Achieving human parity on automatic Chinese to English news translation. Preprint. arXiv:1803.05567.

  15. He, D., Xia, Y., Qin, T., Wang, L., Yu, N., Liu, T.-Y., et al. (2016). Dual learning for machine translation. In Advances in Neural Information Processing Systems (pp. 820–828).

  16. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. Preprint. arXiv:1503.02531.

  17. Jia, R., & Liang, P. (2016). Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 12–22).

  18. Kim, Y., & Rush, A. M. (2016). Sequence-level knowledge distillation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 1317–1327).

  19. Koehn, P. (2009). Statistical machine translation. New York: Cambridge University Press.

  20. Lample, G., Conneau, A., Denoyer, L., & Ranzato, M. (2018). Unsupervised machine translation using monolingual corpora only. In 6th International Conference on Learning Representations, ICLR 2018.

  21. Lample, G., Ott, M., Conneau, A., Denoyer, L., & Ranzato, M. (2018). Phrase-based & neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 5039–5049).

  22. Luo, F., Li, P., Yang, P., Zhou, J., Tan, Y., Chang, B., et al. (2019). Towards fine-grained text sentiment transfer. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 2013–2022).

  23. Luo, F., Li, P., Zhou, J., Yang, P., Chang, B., Sun, X., et al. (2019). A dual reinforcement learning framework for unsupervised text style transfer. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (pp. 5116–5122). AAAI Press.

  24. Meng, C., Ren, P., Chen, Z., Sun, W., Ren, Z., Tu, Z., et al. (2020). DukeNet: A dual knowledge interaction network for knowledge-grounded conversation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1151–1160).

  25. Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.

  26. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119).

  27. Nirenburg, S. (1989). Knowledge-based machine translation. Machine Translation, 4(1), 5–24.

  28. Nirenburg, S., Carbonell, J., Tomita, M., & Goodman, K. (1994). Machine translation: A knowledge-based approach. San Mateo, CA: Morgan Kaufmann Publishers Inc.

  29. Ranzato, M., Chopra, S., Auli, M., & Zaremba, W. (2015). Sequence level training with recurrent neural networks. Preprint. arXiv:1511.06732.

  30. Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 379–389).

  31. Sennrich, R., Haddow, B., & Birch, A. (2016). Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 86–96).

  32. Sestorain, L., Ciaramita, M., Buck, C., & Hofmann, T. (2018). Zero-shot dual machine translation. Preprint. arXiv:1805.10338.

  33. Shen, L., & Feng, Y. (2020). CDL: Curriculum dual learning for emotion-controllable response generation. Preprint. arXiv:2005.00329.

  34. Su, S.-Y., Huang, C.-W., & Chen, Y.-N. (2020). Towards unsupervised language understanding and generation by joint dual learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 671–680).

  35. Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Thirteenth Annual Conference of the International Speech Communication Association.

  36. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104–3112).

  37. Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (pp. 1057–1063).

  38. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008).

  39. Wang, Y., Xia, Y., He, T., Tian, F., Qin, T., Zhai, C. X., et al. (2019). Multi-agent dual learning. In 7th International Conference on Learning Representations, ICLR 2019.

  40. Yang, M., Zhao, Z., Zhao, W., Chen, X., Zhu, J., Zhou, L., et al. (2017). Personalized response generation via domain adaptation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1021–1024).

  41. Zelle, J. M., & Mooney, R. J. (1996). Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artificial Intelligence (pp. 1050–1055).

  42. Zhang, S., & Bansal, M. (2019). Addressing semantic drift in question generation for semi-supervised question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 2495–2509).

  43. Zhou, Z.-H. (2012). Ensemble methods: Foundations and algorithms. New York: CRC Press.

  44. Zhu, S., Cao, R., & Yu, K. (2020). Dual learning for semi-supervised natural language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 1936–1947.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Qin, T. (2020). Dual Learning for Machine Translation and Beyond. In: Dual Learning. Springer, Singapore. https://doi.org/10.1007/978-981-15-8884-6_4

  • DOI: https://doi.org/10.1007/978-981-15-8884-6_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-8883-9

  • Online ISBN: 978-981-15-8884-6

  • eBook Packages: Computer Science, Computer Science (R0)
