Key Concepts
The paper introduces ROUGE-K, a keyword-oriented evaluation metric that measures how well summaries include essential words. Its experiments and analysis show that current summarization models often miss crucial keywords, which motivates evaluating keyword inclusion directly.
Summary
The study introduces ROUGE-K, an evaluation metric focusing on keywords in summaries. It reveals that existing metrics may overlook essential information and proposes methods to guide models to include more keywords without compromising overall quality.
The research highlights the significance of including keywords in summaries for efficient information conveyance. Human annotators prefer summaries that contain more keywords, because such summaries capture the important information better. The proposed ROUGE-K metric complements traditional metrics by providing a better indicator of summary relevance.
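To make the idea of a keyword-oriented score concrete, the following is a minimal sketch of a keyword-recall measure: the fraction of reference keywords that appear in a candidate summary. It is an illustration under stated assumptions, not the paper's implementation. The function names (`tokenize`, `contains_phrase`, `keyword_recall`), the lowercase alphanumeric tokenization, and the exact-phrase matching rule are all hypothetical choices, and the keyword list is assumed to be given rather than extracted with the paper's heuristic.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def keyword_recall(summary, keywords):
    """Fraction of keywords found in the summary (hypothetical stand-in for a keyword-oriented score)."""
    if not keywords:
        return 0.0
    summary_tokens = tokenize(summary)
    hits = sum(contains_phrase(summary_tokens, tokenize(kw)) for kw in keywords)
    return hits / len(keywords)


if __name__ == "__main__":
    summary = "We propose ROUGE-K, a keyword-oriented metric."
    keywords = ["ROUGE-K", "keyword", "summarization"]
    # Two of the three keywords occur in the summary.
    print(keyword_recall(summary, keywords))
```

A score of this kind is recall-oriented by design: it rewards covering the keywords and ignores everything else in the summary, which is why it is meant to sit alongside metrics such as ROUGE-1/2/L rather than replace them.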
Experiments show that strong baseline models frequently fail to include essential words in their summaries. The study also evaluates large language models using ROUGE-K and demonstrates how it can differentiate system performance effectively. Additionally, four approaches are proposed to enhance keyword inclusion in transformer-based models.
Overall, the research emphasizes the importance of keyword inclusion in summaries and introduces a new metric, ROUGE-K, to address this aspect comprehensively.
Stats
Hypothesis 1: 27.45
Hypothesis 2: 26.09
Table 1: An example where ROUGE and BERTScore (BS) can lead to misinterpretations.
Table 3: Agreement ratios (%) of each metric and human annotator on summary relevance.
Table 4: Statistics of datasets and extracted keywords.
Table 5: BART performance evaluated by ROUGE-1/2/-L and our ROUGE-K.
Table 6: Pearson Correlation between the number of words in summaries and evaluation metrics.
Table 7: Pearson Correlation between ROUGE-K and existing metrics.
Table 8: Results on SciTLDR, XSum, and ScisummNet.
Table 9: ROUGE-K scores on keywords seen (IN-SRC) vs. unseen (OUT-SRC) in source documents.
Figure 1: Overview of TDSum model.
Figure 2: ROUGE-K and keyword length.
Quotes
"We propose a simple heuristic that exploits the common structure of summarization datasets to extract keywords automatically."
"ROUGE-K provides a quantitative answer to how well do summaries include keywords."
"Our experiments reveal that current strong baseline models often miss essential information in their summaries."