This research paper introduces SEPARABILITY, a novel meta-evaluation metric designed to assess the reliability of human preference judgments in evaluating large language models (LLMs). The authors argue that traditional pairwise comparisons often suffer from inconsistencies, particularly when model outputs are very similar or exhibit high variability due to stochastic decoding.
The paper identifies two key factors contributing to this challenge: high cross-alignment (generations from different models are similar to one another) and low self-alignment (a single model's generations vary widely from sample to sample). SEPARABILITY addresses these factors by quantifying how distinguishable the two models' outputs are for a given input.
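To make the idea concrete, the sketch below shows one way a separability-style score could be computed from sampled generations. It is an illustrative assumption, not the paper's exact formulation: the alignment measure `similarity` (e.g., embedding cosine or ROUGE) is left abstract, and the score is taken as mean self-alignment minus mean cross-alignment.

```python
# Minimal sketch of a separability-style score for a single prompt.
# Assumptions (not taken from the paper): a user-supplied pairwise
# similarity function, and the score defined as mean self-alignment
# minus mean cross-alignment.
from itertools import combinations, product
from typing import Callable, List


def mean_pairwise(sims: List[float]) -> float:
    return sum(sims) / len(sims) if sims else 0.0


def separability(
    gens_a: List[str],                         # sampled generations from model A
    gens_b: List[str],                         # sampled generations from model B
    similarity: Callable[[str, str], float],   # e.g. embedding cosine or ROUGE-L
) -> float:
    # Self-alignment: similarity among generations of the same model.
    self_a = mean_pairwise([similarity(x, y) for x, y in combinations(gens_a, 2)])
    self_b = mean_pairwise([similarity(x, y) for x, y in combinations(gens_b, 2)])
    self_alignment = (self_a + self_b) / 2

    # Cross-alignment: similarity between generations from different models.
    cross_alignment = mean_pairwise(
        [similarity(x, y) for x, y in product(gens_a, gens_b)]
    )

    # High self-alignment and low cross-alignment mean the two models'
    # outputs are easy to tell apart, so a preference judgment on this
    # input should be more reliable.
    return self_alignment - cross_alignment
```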
The authors demonstrate the effectiveness of SEPARABILITY through experiments on various generation tasks and benchmarks, comparing different LLM pairs. Results show that instances with high SEPARABILITY scores receive more consistent preference ratings from both human and automated evaluators.
Furthermore, the paper explores the application of SEPARABILITY in ELO ratings, a popular method for ranking LLMs. By incorporating SEPARABILITY into the ELO update rule, the authors propose a more nuanced ranking system that accounts for the reliability of individual preference comparisons.
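One way such an adjustment could look, sketched under the assumption that SEPARABILITY simply scales the standard ELO K-factor (the paper's actual modified update rule may differ), is shown below: comparisons on hard-to-distinguish outputs move the ratings less.

```python
# Sketch of an ELO update scaled by separability (illustrative only;
# the exact modification proposed in the paper may differ).
def elo_update(
    rating_a: float,
    rating_b: float,
    outcome_a: float,      # 1.0 if A is preferred, 0.0 if B is, 0.5 for a tie
    separability: float,   # assumed to lie in [0, 1]; low = unreliable comparison
    k: float = 32.0,
) -> tuple[float, float]:
    # Expected score of A under the standard logistic ELO model.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    # Scale the usual K-factor by separability so that low-separability
    # comparisons barely change the ratings.
    delta = k * separability * (outcome_a - expected_a)
    return rating_a + delta, rating_b - delta
```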
The paper concludes that SEPARABILITY gives LLM developers and users a practical tool for identifying evaluation instances on which preference judgments are likely to be consistent, and for down-weighting unreliable comparisons when ranking models.
The authors suggest future research directions, including applying SEPARABILITY to filter preference tuning data for learning from human feedback.