This paper investigates the limitations of commonly used automatic metrics in evaluating meeting summarization. Through a comprehensive literature review, the authors identify key challenges in meeting summarization, such as handling spoken language, speaker dynamics, coreference, discourse structure, and contextual turn-taking. They also define a taxonomy of observable errors that can arise when these challenges are not adequately addressed, including missing information, redundancy, wrong references, incorrect reasoning, hallucination, and incoherence.
The authors then conduct an empirical study on the QMSum dataset, annotating meeting transcripts and model-generated summaries to establish direct links between the challenges and the errors they produce. They evaluate nine widely used automatic metrics, spanning count-based (ROUGE, BLEU, METEOR), model-based (BERTScore, Perplexity, BLANC, LENS), and QA-based (QuestEval) approaches, to assess how well their scores align with human judgments of these errors.
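As a minimal sketch of how such metric scores are obtained, the snippet below computes ROUGE and BERTScore for a single candidate summary against a reference, using the rouge_score and bert_score packages; the example texts are invented, and the paper applies the metrics to QMSum summaries instead.

```python
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "The team agreed to ship the remote control with a rubber case."
candidate = "The group decided the remote should have a plastic case."

# Count-based: n-gram overlap between the candidate and the reference.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

# Model-based: contextual-embedding similarity via BERTScore.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print("BERTScore F1:", round(F1.item(), 3))
```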
The analysis reveals that current metrics struggle to capture the errors specific to meeting summarization. While some metrics show sensitivity to certain error types, such as ROUGE's correlation with missing information, most exhibit only weak to moderate correlations, and a significant portion either overlook or even reward errors. For instance, Perplexity tends to favor incorrect references, and LENS correlates positively with structural disorganization. The authors also find that the metrics generally fail to discern the severity of errors, underscoring the need for more refined evaluation methods in this domain.
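To make the reported correlation analysis concrete, here is a small sketch of the kind of computation involved: Spearman's rank correlation between per-summary metric scores and annotated error counts. All numbers below are invented placeholders, not the paper's data.

```python
from scipy.stats import spearmanr

# Hypothetical per-summary metric scores and annotated counts of one error
# type (e.g., missing information) across six summaries.
metric_scores = [0.42, 0.31, 0.55, 0.48, 0.22, 0.39]
error_counts = [2, 4, 1, 1, 5, 3]

rho, p_value = spearmanr(metric_scores, error_counts)
# A strongly negative rho means the metric penalizes this error type; rho near
# zero means it overlooks the error; a positive rho means it rewards it.
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```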
The authors conclude by discussing the potential of leveraging large language models with chain-of-thought or tree-of-thought prompting techniques to develop more effective evaluation metrics for meeting summarization. They also plan to expand the annotated dataset to support the community's efforts in advancing meeting summarization techniques and evaluation.
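Purely as an illustration of that direction, the sketch below shows what a chain-of-thought evaluation prompt could look like. The rubric wording and the generic `complete` callable are hypothetical assumptions for illustration, not the authors' implementation.

```python
# Hypothetical chain-of-thought rubric for an LLM-as-judge metric. The prompt
# wording and the `complete` callable are illustrative assumptions.
COT_PROMPT = """You are evaluating a meeting summary.

Transcript:
{transcript}

Summary:
{summary}

Think step by step:
1. List statements in the summary not supported by the transcript (hallucination).
2. List key decisions or action items absent from the summary (missing information).
3. Check speaker and entity attributions against the transcript (wrong references).
4. Judge the ordering and flow of the summary (structure and coherence).

Finally, rate each error type as none, minor, or major."""


def evaluate_summary(transcript: str, summary: str, complete) -> str:
    """`complete` is any callable that sends a prompt to an LLM and returns text."""
    return complete(COT_PROMPT.format(transcript=transcript, summary=summary))
```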
Key information obtained from: Frederic Kir..., arxiv.org, 04-18-2024. https://arxiv.org/pdf/2404.11124.pdf