Large Language Models Struggle to Refine Self-Generated Responses: An Empirical Study on Generative and Discriminative Capabilities
Large language models (LLMs) are not reliably better at discriminating among their own previously generated alternatives than they are at generating an initial response, which suggests limits on their capacity for self-improvement.