Improving Statistical Word Alignments with Morpho-syntactic Transformations

de Gispert, Adrià; Gupta, Deepa; Popović, Maja; Lambert, Patrik; Mariño, Jose B.; Federico, Marcello; Ney, Hermann; Banchs, Rafael

doi:10.1007/11816508_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Included in the following conference series:

International Conference on Natural Language Processing (in Finland)

Abstract

This paper presents a wide range of statistical word alignment experiments that incorporate morphosyntactic information. By transforming the parallel corpus according to POS tags, lemmas or stems, we explore which linguistic information helps reduce alignment error rates. To this end, alignments are evaluated against a human word alignment reference, with the aim of an improved machine translation training scheme that eventually leads to better SMT performance. Experiments are carried out on a Spanish–English European Parliament Proceedings parallel corpus, in both a large and a small data track. As expected, the improvements from introducing morphosyntactic information are larger when data is scarce, but a significant improvement is also achieved in the large data task, which indicates that certain linguistic knowledge remains relevant even when large amounts of data are available.
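As a rough illustration of the two ingredients named in the abstract, the sketch below applies a token-by-token corpus transformation and computes an alignment error rate (AER) against a reference with sure and possible links. It is a minimal sketch under stated assumptions, not the authors' implementation: the `LEMMAS` table, the function names `transform` and `aer`, and the toy link sets are all hypothetical, and the AER formula is the commonly used definition, 1 - (|A∩S| + |A∩P|) / (|A| + |S|), which this preview does not confirm as the exact variant used in the paper.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)

# Hypothetical toy lemma table; a real setup would obtain lemmas, stems
# or POS tags from a linguistic analysis tool.
LEMMAS: Dict[str, str] = {"las": "el", "casas": "casa", "blancas": "blanco"}


def transform(tokens: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each token by its mapped form, keeping positions one-to-one.

    Tokens missing from the mapping are left unchanged.
    """
    return [mapping.get(tok, tok) for tok in tokens]


def aer(hypothesis: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Alignment error rate of `hypothesis` against a manual reference.

    Sure links are treated as a subset of the possible links.
    """
    possible = possible | sure
    if not hypothesis and not sure:
        return 0.0  # nothing to align and nothing proposed
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / (len(hypothesis) + len(sure))


# Toy reference and hypothesis alignments (assumed values for illustration).
SURE: Set[Link] = {(0, 0), (1, 1)}
POSSIBLE: Set[Link] = {(2, 1)}
HYPOTHESIS: Set[Link] = {(0, 0), (1, 1), (2, 2)}

if __name__ == "__main__":
    print(transform(["las", "casas", "blancas"], LEMMAS))
    print(round(aer(HYPOTHESIS, SURE, POSSIBLE), 3))
```

In this toy case two of the three hypothesised links match the reference, so the error rate comes out at about 0.2. Because the transformation keeps one output token per input token, links found on the transformed text refer to the same positions in the original sentence pair.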

Author information

Authors and Affiliations

TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Adrià de Gispert, Patrik Lambert, Jose B. Mariño & Rafael Banchs
ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy
Deepa Gupta & Marcello Federico
Lehrstuhl für Informatik 6, RWTH Aachen University, Aachen, Germany
Maja Popović & Hermann Ney

Editor information

Editors and Affiliations

Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, FIN-20520, Turku, Finland
Tapio Salakoski
Turku Centre for Computer Science (TUCS) and Department of IT, University of Turku, Lemminkäisenkatu 14 A, 20520, Turku, Finland
Filip Ginter & Sampo Pyysalo
Department of Information Technology, University of Turku, Lemminkäisenkatu 14–18 A, FIN-20520, Turku, Finland
Tapio Pahikkala

About this paper

Cite this paper

de Gispert, A. et al. (2006). Improving Statistical Word Alignments with Morpho-syntactic Transformations. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_38

DOI: https://doi.org/10.1007/11816508_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)
