ISCA Archive Interspeech 2011

Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk

F. Jurčíček, S. Keizer, Milica Gašić, François Mairesse, B. Thomson, K. Yu, Steve Young

This paper describes a framework for the evaluation of spoken dialogue systems. Typically, evaluation of dialogue systems is performed in a controlled test environment with carefully selected and instructed users. However, this approach is very demanding. An alternative is to recruit a large group of users who evaluate the dialogue systems in a remote setting under virtually no supervision. Crowdsourcing technology, for example Amazon Mechanical Turk (AMT), provides an efficient way of recruiting subjects. This paper describes an evaluation framework for spoken dialogue systems using AMT users and compares the results obtained with those of a recent trial in which the systems were tested by locally recruited users. The results suggest that the use of crowdsourcing technology is feasible and can provide reliable results.


doi: 10.21437/Interspeech.2011-766

Cite as: Jurčíček, F., Keizer, S., Gašić, M., Mairesse, F., Thomson, B., Yu, K., Young, S. (2011) Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk. Proc. Interspeech 2011, 3061-3064, doi: 10.21437/Interspeech.2011-766

@inproceedings{jurcicek11_interspeech,
  author={F. Jurčíček and S. Keizer and Milica Gašić and François Mairesse and B. Thomson and K. Yu and Steve Young},
  title={{Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={3061--3064},
  doi={10.21437/Interspeech.2011-766}
}