Core Concepts
This work proposes Debatrix, an automated debate judging framework based on Large Language Models (LLMs) and designed to improve the analysis of multi-turn debates.
Abstract
Debatrix combines iterative chronological analysis with dimensional collaboration to produce systematic judgments. Its performance is evaluated on the PanelBench benchmark.
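The sketch below illustrates the two ideas, assuming the LLM is exposed as a plain function that maps a prompt string to a text response. It walks through the speeches in chronological order, has each dimension-specific judge update its own memory after every speech, and then merges the latest per-dimension analyses into a single verdict. The names `DimensionJudge` and `judge_debate`, the prompt wording, and the dimension labels are hypothetical illustrations and do not reproduce Debatrix's actual implementation or prompts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Assumption: an LLM is any function that takes a prompt and returns text.
LLM = Callable[[str], str]


@dataclass
class DimensionJudge:
    """Hypothetical judge that tracks one evaluation dimension over time."""
    dimension: str
    memory: list[str] = field(default_factory=list)  # earlier analyses, oldest first

    def analyze(self, side: str, text: str, llm: LLM) -> str:
        # Feed the judge's own earlier analyses back in, so each new speech
        # is judged in the context of everything said before it.
        context = "\n".join(self.memory) or "(none yet)"
        prompt = (
            f"Dimension: {self.dimension}\n"
            f"Previous analyses:\n{context}\n"
            f"New speech ({side}):\n{text}\n"
            "Update the analysis for this dimension."
        )
        analysis = llm(prompt)
        self.memory.append(analysis)
        return analysis


def judge_debate(
    speeches: list[tuple[str, str]], dimensions: list[str], llm: LLM
) -> tuple[str, list[DimensionJudge]]:
    """Hypothetical pipeline: chronological analysis, then a merged verdict."""
    judges = [DimensionJudge(d) for d in dimensions]
    # Iterative chronological analysis: one speech at a time, in order.
    for side, text in speeches:
        for judge in judges:
            judge.analyze(side, text, llm)
    # Dimensional collaboration (one possible reading): combine the latest
    # analysis from every dimension into a single final prompt.
    summary = "\n".join(f"[{j.dimension}] {j.memory[-1]}" for j in judges)
    verdict = llm(f"Per-dimension analyses:\n{summary}\nWhich side wins, and why?")
    return verdict, judges
```

With a stub in place of a real model, calling `judge_debate` on three speeches and two dimensions makes one LLM call per speech per dimension plus one final aggregation call. Keeping a separate memory per dimension is one plausible way to realize "dimensional collaboration"; the framework may divide the work differently.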
Introduction to Debatrix and PanelBench
Challenges in automated debate judging
Structure of Debatrix: Memories and Judges
Iterative Chronological Analysis in Debatrix
Dimensional Collaboration for Systematic Judgment
Performance comparison on PanelBench debates
Stats
"Large language models (LLM) such as ChatGPT and GPT-4 have shown a solid ability to evaluate text quality."
"Debatrix increased winner prediction accuracy compared to directly prompting the LLM with raw speeches."
"On PanelBench, Debatrix consistently outperforms all baseline models on both debate collections."
Quotes
"Debating is the formal process of gaining consensus among groups with different opinions."
"Automating debate assessment is helpful to improve debate quality in political, commercial, or educational scenarios."