FENICE addresses the challenge of factual inconsistencies in automatically generated summaries. It proposes an interpretable metric that aligns summary claims with supporting information in the source document, achieving state-of-the-art results in factuality evaluation. The metric is efficient, accurate, and applicable to long-form summarization.
Recent advancements in text summarization have led to high performance but have also raised concerns about factual inconsistencies. FENICE aims to overcome these challenges by providing an interpretable and efficient factuality-oriented metric. By extracting atomic facts (claims) from the summary and aligning them with the source document, FENICE achieves superior accuracy compared to existing metrics.
The approach involves extracting atomic facts (claims) from the summary and aligning each of them with specific sections of the input document via NLI-based alignment. This strategy enhances interpretability: the alignments highlight which passages support a claim, or reveal that no supporting passage exists, exposing hallucinations. FENICE outperforms other metrics on standard benchmarks and also performs well on long-form summarization, evaluated on data obtained through a human annotation process.
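The claim-alignment step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `entailment_score` function below is a hypothetical token-overlap placeholder, whereas FENICE uses a trained NLI model to score entailment between a document passage (premise) and a claim (hypothesis).

```python
def entailment_score(premise: str, claim: str) -> float:
    """Placeholder scorer: fraction of claim tokens present in the premise.
    A real NLI model's entailment probability would be used instead."""
    premise_tokens = set(premise.lower().split())
    claim_tokens = claim.lower().split()
    if not claim_tokens:
        return 0.0
    return sum(t in premise_tokens for t in claim_tokens) / len(claim_tokens)


def align_claims(document_sentences, claims):
    """For each claim, pick the document sentence with the highest
    entailment score; the overall factuality score here is the mean of
    the per-claim maxima (one simple aggregation choice)."""
    alignments = []
    for claim in claims:
        scores = [entailment_score(s, claim) for s in document_sentences]
        best = max(range(len(scores)), key=scores.__getitem__)
        alignments.append((claim, document_sentences[best], scores[best]))
    overall = sum(a[2] for a in alignments) / len(alignments)
    return alignments, overall


doc = [
    "The company reported record profits in 2023.",
    "Its CEO announced a new product line.",
]
claims = ["The company had record profits in 2023.", "The CEO resigned."]
alignments, score = align_claims(doc, claims)
for claim, sentence, s in alignments:
    print(f"{s:.2f}  {claim}  ->  {sentence}")
```

The per-claim best alignment is what makes the metric interpretable: a low-scoring claim points directly at a likely hallucination, while a high-scoring one points at its supporting passage.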
In experiments, FENICE demonstrates state-of-the-art performance on the AGGREFACT datasets, showing its effectiveness in evaluating both standard and long-form summarization factuality. By adapting the granularity of its alignments to the text, the metric remains robust and accurate when assessing factual consistency across different types and lengths of documents.
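The idea of adapting alignment granularity can be illustrated by generating candidate premises at several levels of context. The window sizes below are a hypothetical choice for illustration; the paper's exact granularity levels may differ.

```python
def multi_granularity_premises(sentences, window_sizes=(1, 3)):
    """Yield candidate premise spans at several granularities:
    single sentences, short multi-sentence windows, and finally the
    whole document. A claim is then aligned against all candidates,
    so evidence spread across sentences can still be matched."""
    for size in window_sizes:
        for i in range(len(sentences) - size + 1):
            yield " ".join(sentences[i:i + size])
    yield " ".join(sentences)  # coarsest level: the full document


sents = ["S1.", "S2.", "S3.", "S4."]
for premise in multi_granularity_premises(sents):
    print(premise)
```

Scoring a claim against coarser windows as well as single sentences helps with claims whose evidence is scattered, at the cost of extra NLI calls per claim.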