Abstract
Recent years have seen increasing interest in automatic metrics for the evaluation of generation systems. When a system can generate syntactic variation, automatic evaluation becomes more difficult. In this paper, we compare the performance of several automatic evaluation metrics using a corpus of automatically generated paraphrases. We show that these evaluation metrics can at least partially measure adequacy (similarity in meaning), but are not good measures of fluency (syntactic correctness). We make several proposals for improving the evaluation of generation systems that produce variation.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stent, A., Marge, M., Singhai, M. (2005). Evaluating Evaluation Methods for Generation in the Presence of Variation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_38
DOI: https://doi.org/10.1007/978-3-540-30586-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)