The paper explores the hypothesis that LLMs are not universally better at discriminating among their own previously generated alternatives than they are at generating initial responses, a phenomenon the authors call SELF-[IN]CORRECT. The authors propose a unified framework to compare the generative and discriminative capabilities of LLMs across various tasks, including mathematics, world knowledge acquisition, truthful question answering, and instruction following.
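Concretely, the comparison amounts to measuring a generation accuracy and a discrimination accuracy over the same questions and taking their difference (the DG-DIFF referred to below). The following is a minimal sketch of that framing under simplifying assumptions, not the paper's exact protocol; `generate`, `discriminate`, and `is_correct` are hypothetical stand-ins for the model's sampling routine, its selection among its own candidates, and a task-specific answer checker.

```python
from typing import Callable, List, Sequence

def dg_diff(
    questions: Sequence[str],
    generate: Callable[[str, int], List[str]],      # samples k candidate answers
    discriminate: Callable[[str, List[str]], int],  # picks an index among candidates
    is_correct: Callable[[str, str], bool],         # task-specific answer checker
    k: int = 4,
) -> float:
    """Return discrimination accuracy minus generation accuracy (DG-DIFF sketch)."""
    gen_hits, disc_hits, disc_total = 0, 0, 0
    for q in questions:
        candidates = generate(q, k)
        # Generation phase: score the first sampled answer.
        gen_hits += is_correct(q, candidates[0])
        # Discrimination phase: only scored when the candidate pool is mixed,
        # i.e. contains at least one correct and one incorrect answer.
        labels = [is_correct(q, c) for c in candidates]
        if any(labels) and not all(labels):
            disc_total += 1
            disc_hits += labels[discriminate(q, candidates)]
    gen_acc = gen_hits / len(questions)
    disc_acc = disc_hits / disc_total if disc_total else 0.0
    return disc_acc - gen_acc
```

In this sketch the discrimination phase is scored only on questions whose candidate pool mixes correct and incorrect answers, since picking among all-correct or all-incorrect options would be trivial or impossible.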
The key findings are:
For the majority of the tested LLMs and tasks, the difference in performance between the discrimination and generation phases (DG-DIFF) is small or negative, indicating that the models' discrimination capability is not reliably better than their generation capability.
This pattern holds even for LLMs that have been fine-tuned on instruction-following tasks, suggesting that SELF-[IN]CORRECT is not simply an artifact of suboptimal prompt design.
Further experiments reveal that SELF-[IN]CORRECT does not manifest in LLMs that are not pre-trained with autoregressive objectives, hinting at a potential connection between autoregressive pre-training and the observed limitations.
When the discrimination phase is made easier by using incorrect options that are more easily distinguishable from the correct one, the models show improved DG-DIFF, suggesting that the discrimination phase is sensitive to the distribution of the candidate options (a sketch of one such manipulation appears at the end of this summary).
The authors discuss the implications of SELF-[IN]CORRECT for existing self-improvement methods and highlight areas for further exploration, such as the impact of pre-training data and prompt length on the observed phenomenon.
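On the point about easier discrimination: the paper's exact construction of more easily distinguishable incorrect options is not reproduced here. The sketch below assumes one simple recipe: pair each known-correct answer with a distractor drawn from the model's candidates for a different question, rather than with the model's own (often plausible) wrong answer to the same question. The names `easier_discrimination_pairs`, `own_candidates`, and `correct_answer` are illustrative, not from the paper.

```python
import random
from typing import Dict, List

def easier_discrimination_pairs(
    own_candidates: Dict[str, List[str]],  # question -> model's own sampled answers
    correct_answer: Dict[str, str],        # question -> a known-correct answer
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Build two-option discrimination sets whose incorrect member is easy to rule out."""
    rng = random.Random(seed)
    questions = list(own_candidates)
    pairs = {}
    for q in questions:
        # Borrow a distractor from a *different* question, which is usually
        # far off-distribution for the current one and hence easy to reject.
        others = [x for x in questions if x != q]
        if not others:  # need at least two questions to borrow a distractor
            continue
        distractor = rng.choice(own_candidates[rng.choice(others)])
        options = [correct_answer[q], distractor]
        rng.shuffle(options)
        pairs[q] = options
    return pairs
```

Feeding such pairs into the discrimination phase of the earlier sketch is one way to probe how DG-DIFF responds to the difficulty of the incorrect options.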
Key insights distilled from the source by Dongwei Jian... at arxiv.org, 04-09-2024
https://arxiv.org/pdf/2404.04298.pdf