Key Concept
The MATEval framework enhances the reliability and efficiency of evaluating open-ended text generated by large language models through a multi-agent discussion process that integrates self-reflection and Chain-of-Thought strategies, along with feedback mechanisms to reach consensus.
Abstract
The paper introduces the MATEval framework, which aims to address the challenges in evaluating open-ended text generated by large language models (LLMs). The key aspects of the framework are:
Multi-Agent Approach: The framework runs a collaborative discussion among three agent roles: the Evaluator Agent, the Feedback Agent, and the Summarizer Agent. This multi-agent approach is designed to improve the reliability and depth of the text evaluation.
Self-Reflection and Chain-of-Thought (CoT) Strategies: The agents utilize a combination of self-reflection and CoT strategies to decompose the evaluation task, focus on specific sub-problems, and refine their assessments through iterative discussions.
Feedback Mechanism: After each discussion round, a Feedback Agent evaluates the quality and efficiency of the discussion, providing guidance to reduce repetition and resolve disagreements, thereby facilitating consensus among the agents.
Comprehensive Evaluation Report: The framework generates a detailed evaluation report, including error type identification, localization, in-depth explanations, and scoring. This report is provided in both a Q&A format for correlation analysis and a text-based format for practical model iteration in industrial scenarios.
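The discussion loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the round limit, the "consensus" signal, and the hard-coded stub agents are all assumptions made for the example, and a real system would replace the stubs with LLM calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: an evaluator sees the text and the prior transcript.
Evaluator = Callable[[str, List[str]], str]


def run_discussion(
    text: str,
    evaluators: Dict[str, Evaluator],
    feedback: Callable[[Dict[str, str]], str],
    summarizer: Callable[[str, Dict[str, str]], dict],
    max_rounds: int = 3,  # assumed cap; the summary does not give one
) -> Tuple[dict, List[str]]:
    """Run evaluator rounds until the feedback agent reports consensus."""
    transcript: List[str] = []
    replies: Dict[str, str] = {}
    for round_no in range(1, max_rounds + 1):
        # Every evaluator in a round sees the same history from earlier rounds.
        history = list(transcript)
        replies = {name: agent(text, history) for name, agent in evaluators.items()}
        for name, reply in replies.items():
            transcript.append(f"round {round_no} | {name}: {reply}")
        # The feedback agent judges the round and guides the next one.
        note = feedback(replies)
        transcript.append(f"round {round_no} | feedback: {note}")
        if note == "consensus":
            break
    return summarizer(text, replies), transcript


# Toy stand-ins for LLM-backed agents; their verdicts are placeholders.
def evaluator_a(text: str, history: List[str]) -> str:
    return "logical inconsistency"


def evaluator_b(text: str, history: List[str]) -> str:
    # Toy self-reflection: revise the verdict after seeing evaluator_a's point.
    if any(line.endswith("evaluator_a: logical inconsistency") for line in history):
        return "logical inconsistency"
    return "no error"


def feedback_agent(replies: Dict[str, str]) -> str:
    if len(set(replies.values())) == 1:
        return "consensus"
    return "disagreement: revisit the differing judgments"


def summarizer_agent(text: str, replies: Dict[str, str]) -> dict:
    verdicts = sorted(set(replies.values()))
    return {"text": text, "verdict": verdicts[0] if len(verdicts) == 1 else "unresolved"}


story = (
    "Bob and Mike had desired to go for a fishing trip to the lake. "
    "With clear skies at sunrise, they were free to play chess all day."
)
report, transcript = run_discussion(
    story,
    {"evaluator_a": evaluator_a, "evaluator_b": evaluator_b},
    feedback_agent,
    summarizer_agent,
)
print(report["verdict"])
print("\n".join(transcript))
```

In this toy run the two evaluators disagree in the first round, the feedback step flags the disagreement, and the second round ends in consensus, so the loop stops before the round limit.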
The experimental results on English and Chinese story text datasets, including a dataset based on Alipay's business data, demonstrate the effectiveness of the MATEval framework. It outperforms existing open-ended text evaluation methods and achieves the highest correlation with human evaluations, significantly improving the efficiency of model iteration in industrial applications.
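As a rough illustration of what "correlation with human evaluations" measures, the sketch below computes a rank correlation between evaluator scores and human scores. The summary does not say which coefficient the paper reports; Spearman's rank correlation is assumed here as a common choice, and the scores are made-up toy values, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores for five stories (illustrative only).
human_scores = [1, 2, 3, 4, 5]
model_scores = [2, 1, 3, 5, 4]
rho = spearman(human_scores, model_scores)
print(round(rho, 3))
```

A value near 1 means the evaluator ranks the stories almost the same way the human annotators do; a value near 0 means the rankings are unrelated.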
Statistics
"Bob and Mike had desired to go for a fishing trip to the lake."
"With clear skies at sunrise, they were free to play chess all day."
"Hoping for better weather in the morning, they went to sleep early."
"They packed up and brought the camper so everyone could stay the night."