The paper examines the hypothesis that LLMs are not universally better at discriminating among their own previously generated alternatives than at generating initial responses, a phenomenon the authors call SELF-[IN]CORRECT. To test this, the authors propose a unified framework for comparing the generative and discriminative capabilities of LLMs across several tasks, including mathematics, world knowledge acquisition, truthful question answering, and instruction following.
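For concreteness, a minimal sketch of such a two-phase comparison is shown below. It assumes the model is exposed through two hypothetical callables, sample(question, k) for producing candidate answers and choose(question, candidates) for selecting among them, and that correctness can be checked by exact match against a gold answer; none of these names or choices come from the paper. The gap returned last corresponds to the DG-DIFF quantity referenced in the findings below.

```python
from typing import Callable, List, Tuple

def evaluate_both_phases(
    sample: Callable[[str, int], List[str]],   # hypothetical: generate k answers
    choose: Callable[[str, List[str]], str],   # hypothetical: pick one candidate
    dataset: List[Tuple[str, str]],            # (question, gold answer) pairs
    k: int = 4,
) -> Tuple[float, float, float]:
    """Return (generation score, discrimination score, DG-DIFF).

    Exact-match scoring is an illustrative simplification; real tasks
    typically need task-specific correctness checks.
    """
    gen_hits, gen_total, disc_hits = 0, 0, 0
    for question, gold in dataset:
        # Generation phase: the model produces k candidate answers on its own.
        candidates = sample(question, k)
        gen_hits += sum(c == gold for c in candidates)
        gen_total += len(candidates)
        # Discrimination phase: the model must pick the best of its own candidates.
        chosen = choose(question, candidates)
        disc_hits += int(chosen == gold)
    gen_score = gen_hits / gen_total
    disc_score = disc_hits / len(dataset)
    return gen_score, disc_score, disc_score - gen_score
```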
The key findings are:
For the majority of the tested LLMs and tasks, the difference in performance between the discrimination and generation phases (DG-DIFF) is small or negative, indicating that the models' discrimination capability is not reliably better than their generation capability.
This pattern holds even for LLMs that have been fine-tuned on instruction-following tasks, suggesting that SELF-[IN]CORRECT is not simply an artifact of suboptimal prompt design.
Further experiments reveal that SELF-[IN]CORRECT does not manifest in LLMs that are not pre-trained with autoregressive objectives, hinting at a potential connection between autoregressive pre-training and the observed limitations.
When the discrimination phase is made easier by using incorrect options that are clearly distinguishable from the correct answer, DG-DIFF improves, suggesting that the discrimination phase is sensitive to the distribution of the candidate answers (an illustrative sketch of this manipulation appears at the end of this summary).
The authors discuss the implications of SELF-[IN]CORRECT for existing self-improvement methods and highlight areas for further exploration, such as the impact of pre-training data and prompt length on the observed phenomenon.
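To make the "easier distractors" manipulation mentioned above concrete, the sketch below recomputes discrimination accuracy when the incorrect options are gold answers borrowed from other questions, which are usually easy to tell apart from the correct one. This construction, and the reuse of the hypothetical choose callable from the earlier sketch, are illustrative assumptions rather than the paper's exact procedure.

```python
import random
from typing import Callable, List, Tuple

def easier_discrimination_score(
    choose: Callable[[str, List[str]], str],   # hypothetical: pick one option
    dataset: List[Tuple[str, str]],            # (question, gold answer) pairs
    num_distractors: int = 3,
    seed: int = 0,
) -> float:
    """Discrimination accuracy when distractors are gold answers of *other*
    questions (an assumed, easy-to-distinguish construction)."""
    rng = random.Random(seed)
    hits = 0
    for i, (question, gold) in enumerate(dataset):
        # Borrow gold answers from unrelated questions as easy distractors.
        other_golds = [g for j, (_, g) in enumerate(dataset) if j != i]
        options = rng.sample(other_golds, num_distractors) + [gold]
        rng.shuffle(options)
        hits += int(choose(question, options) == gold)
    return hits / len(dataset)
```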
Key insights distilled from: Dongwei Jian... at arxiv.org, 04-09-2024, https://arxiv.org/pdf/2404.04298.pdf