This study investigates the effect of user feedback from follow-up utterances on the evaluation of task-oriented dialogue systems (TDSs) by both crowdworkers and large language models (LLMs). The researchers conducted experiments with two setups: one that provided only the initial user query and the system's response, and another that included the user's follow-up utterance.
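To make the two setups concrete, the sketch below is a hypothetical illustration (not the authors' code) of how a rating context could be assembled with and without the follow-up utterance; the function name, prompt wording, and 1-5 scale are assumptions.

```python
def build_rating_context(user_query: str,
                         system_response: str,
                         follow_up: str | None = None) -> str:
    """Assemble the dialogue context shown to a rater (crowdworker or LLM).

    Setup 1: only the initial query and the system response.
    Setup 2: additionally includes the user's follow-up utterance.
    """
    lines = [
        f"User: {user_query}",
        f"System: {system_response}",
    ]
    if follow_up is not None:  # Setup 2 adds the follow-up turn
        lines.append(f"User (follow-up): {follow_up}")
    # The rated aspects (usefulness, relevance, interestingness) follow the
    # summary above; the exact instruction wording and scale are assumed.
    lines.append(
        "Rate the system response for usefulness, relevance, and "
        "interestingness on a 1-5 scale."
    )
    return "\n".join(lines)


# Example: the same exchange rated under both setups.
query = "Can you suggest a vegetarian recipe for dinner?"
response = "You could try a chickpea curry with rice."
feedback = "That sounds great, thanks!"

setup_1 = build_rating_context(query, response)            # no feedback
setup_2 = build_rating_context(query, response, feedback)  # with feedback
```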
The key findings are:
Both crowdworkers and LLMs exhibit sensitivity to user feedback from follow-up utterances: their ratings differ significantly between the two setups on all aspects except relevance.
Crowdworkers are more susceptible to user feedback when rating usefulness and interestingness, whereas LLMs are more influenced by feedback on interestingness and relevance.
User feedback leads to a more personalized assessment of usefulness by crowdworkers, aligning closely with the user's explicit feedback.
In cases of ambiguous or complex user requests, user feedback improves agreement among crowdworkers, helping to clarify the user's intent.
These findings highlight the importance of incorporating user feedback in the evaluation of dialogue systems and suggest the potential for automated feedback integration in future research.
Key insights distilled from the source content by Clemencia Si... at arxiv.org, 04-22-2024
https://arxiv.org/pdf/2404.12994.pdf