Large language models (LLMs) can leverage self-correction to improve their alignment and performance on tasks like mitigating social bias and defending against jailbreak attacks, particularly when equipped with accurate self-criticism mechanisms.
Contrary to prior beliefs, smaller Large Language Models (LLMs), specifically those with 3.8B parameters or more, are capable of moral self-correction, highlighting the significant role of safety alignment during fine-tuning.
This paper introduces Chain of Self-Correction (CoSC), a novel mechanism designed to improve the mathematical reasoning abilities of Large Language Models (LLMs) by enabling them to self-correct their reasoning process.
While moral self-correction instructions can improve the ethicality of Large Language Model outputs, this improvement may be superficial, relying on shortcuts rather than truly mitigating underlying biases stored within the model.
Large language models (LLMs) can self-correct without external feedback using a novel prompting method called Progressive Correction (PROCO), which employs an iterative verify-then-correct framework to refine responses by identifying key conditions and formulating verification questions.
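The verify-then-correct loop described above can be sketched in a few lines. The snippet below is a minimal, illustrative Python version assuming a generic `query_llm` callable; the prompt wording, stopping rule, and `max_rounds` parameter are assumptions for illustration, not the authors' actual PROCO implementation.

```python
from typing import Callable

def progressive_correction(problem: str,
                           query_llm: Callable[[str], str],
                           max_rounds: int = 3) -> str:
    """Iteratively refine an answer: answer -> verify -> correct (hypothetical sketch)."""
    # Initial response to the problem.
    answer = query_llm(f"Solve the problem step by step.\nProblem: {problem}")
    for _ in range(max_rounds):
        # Ask the model to identify a key condition and pose a verification
        # question whose answer should be consistent with the current response.
        verification_q = query_llm(
            "Identify a key condition in the problem and write a question that "
            f"checks whether the answer satisfies it.\nProblem: {problem}\n"
            f"Answer: {answer}"
        )
        verdict = query_llm(
            f"{verification_q}\nReply 'consistent' if the answer passes the "
            f"check; otherwise explain the inconsistency.\nAnswer: {answer}"
        )
        if verdict.strip().lower().startswith("consistent"):
            break  # verification passed; stop refining
        # Otherwise, revise the answer using the detected inconsistency.
        answer = query_llm(
            f"Revise the answer to fix this issue: {verdict}\n"
            f"Problem: {problem}\nPrevious answer: {answer}"
        )
    return answer
```

In this sketch, no external verifier or ground-truth feedback is used: the same model generates, verifies, and corrects, which mirrors the feedback-free setting the summary describes.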