Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations
This review provides an extensive overview of evaluation methods for task-oriented dialogue systems, with a focus on practical applications such as customer service. It identifies the wide variety of constructs and metrics used in previous work, discusses challenges in dialogue system evaluation, and proposes a research agenda for the field.