This research paper investigates the adversarial vulnerability of pairwise evaluation using large language models (LLMs). The authors argue that while LLMs are increasingly used for automated evaluation of generated text, their reliability is compromised by biases, particularly in pairwise comparisons.
The paper compares pairwise evaluation, where an LLM compares two outputs directly, with pointwise evaluation, where each output is assessed independently. The study finds that while pairwise evaluation performs well on standard datasets, it struggles with adversarial examples, which are specifically designed to exploit LLM biases. In contrast, pointwise evaluation demonstrates greater robustness against these adversarial examples.
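To make the contrast concrete, here is a minimal sketch of the two protocols, assuming a generic `judge` callable that sends a prompt to an LLM and returns its text reply. The prompt wording and the 1-10 scale are illustrative assumptions, not the paper's exact setup.

```python
from typing import Callable

Judge = Callable[[str], str]  # prompt in, model reply out

def pairwise_evaluate(judge: Judge, instruction: str, output_a: str, output_b: str) -> str:
    """Pairwise protocol: compare two outputs directly and name the better one."""
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Response A:\n{output_a}\n\n"
        f"Response B:\n{output_b}\n\n"
        "Which response answers the instruction better? Reply with 'A' or 'B' only."
    )
    return judge(prompt).strip()

def pointwise_evaluate(judge: Judge, instruction: str, output: str) -> int:
    """Pointwise protocol: score a single output in isolation on a 1-10 scale."""
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Response:\n{output}\n\n"
        "Rate how well the response answers the instruction from 1 to 10. "
        "Reply with the number only."
    )
    return int(judge(prompt).strip())
```

Under the pointwise protocol, the preferred output is simply whichever receives the higher score; because each output is judged without seeing the other, biases triggered by direct comparison have less room to operate.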
The authors analyze the reasoning process of LLM evaluators and find that even when an evaluator makes an incorrect judgment, it can still identify the shortcomings of the low-quality output. This suggests that the issue lies not in the LLMs' inability to recognize flaws but rather in the amplification of biases within the pairwise evaluation setup.
To address this vulnerability, the authors propose PREPAIR, a hybrid approach that incorporates pointwise reasoning into pairwise evaluation. PREPAIR analyzes each output independently before making a final pairwise decision. Experimental results demonstrate that PREPAIR improves the performance of pairwise evaluators on adversarial datasets while maintaining comparable performance on standard datasets.
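The two-stage structure can be sketched as follows, reusing the generic `judge` callable from the sketch above. The exact prompts used by PREPAIR may differ; this only illustrates the idea of independent per-output analysis followed by a pairwise verdict conditioned on both analyses.

```python
from typing import Callable

Judge = Callable[[str], str]  # prompt in, model reply out

def analyze_pointwise(judge: Judge, instruction: str, output: str) -> str:
    """Stage 1: elicit an independent critique of a single output."""
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Response:\n{output}\n\n"
        "List the strengths and weaknesses of this response with respect to the instruction."
    )
    return judge(prompt)

def hybrid_pairwise_evaluate(judge: Judge, instruction: str, output_a: str, output_b: str) -> str:
    """Stage 2: decide between A and B, conditioning on the two independent analyses."""
    analysis_a = analyze_pointwise(judge, instruction, output_a)
    analysis_b = analyze_pointwise(judge, instruction, output_b)
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Response A:\n{output_a}\nAnalysis of A:\n{analysis_a}\n\n"
        f"Response B:\n{output_b}\nAnalysis of B:\n{analysis_b}\n\n"
        "Based on the analyses above, which response is better? Reply with 'A' or 'B' only."
    )
    return judge(prompt).strip()
```

Because each output is critiqued before the model ever sees its competitor, the final comparison is grounded in flaws and merits identified in isolation rather than in a direct side-by-side impression.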
The authors acknowledge that PREPAIR is not a definitive solution, as the ultimate goal is to enable LLMs to understand and adhere to human preference hierarchies even in adversarial scenarios. However, they emphasize the significance of their findings in highlighting the need for more robust LLM evaluation methods. The paper concludes by encouraging further research into strategies for enhancing evaluation reliability, particularly in adversarial contexts.
Source: Hawon Jeong et al., arxiv.org, 10-04-2024, https://arxiv.org/pdf/2406.12319.pdf