Current automatic metrics struggle to capture the nuances of meeting summarization: they often mask or reward errors and fail to reflect the severity of issues in generated summaries.
CREAM, a novel framework, addresses the unique challenges of evaluating meeting summaries by leveraging comparison-based metrics and an Elo ranking system to assess conciseness and completeness without requiring reference texts.
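The Elo mechanism at the heart of such a ranking is standard: each summary (or summarization system) holds a rating, and each pairwise comparison judgment updates both ratings toward the observed outcome. The sketch below shows the textbook Elo update, not CREAM's actual implementation; the function names, the K-factor of 32, and the starting rating of 1000 are illustrative assumptions.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Standard Elo expectation: probability that A beats B.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # score_a is the comparison outcome for A: 1.0 win, 0.5 tie, 0.0 loss.
    e_a = expected_score(r_a, r_b)
    # Ratings move toward the observed result; the update is zero-sum.
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Example: two summaries start at 1000; A wins one pairwise judgment.
a, b = update_elo(1000.0, 1000.0, 1.0)
# a -> 1016.0, b -> 984.0
```

Repeating such updates over many pairwise judgments (e.g. on conciseness or completeness) yields a relative ranking of summaries without any reference text.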