Large language models (LLMs) can leverage self-correction to improve their alignment and performance on tasks like mitigating social bias and defending against jailbreak attacks, particularly when equipped with accurate self-criticism mechanisms.
Contrary to prior beliefs, smaller Large Language Models (LLMs), specifically those with 3.8B parameters or more, are capable of moral self-correction, highlighting the significant role of safety alignment during fine-tuning.
This paper introduces Chain of Self-Correction (CoSC), a novel mechanism designed to improve the mathematical reasoning abilities of Large Language Models (LLMs) by enabling them to self-correct their reasoning process.
While moral self-correction instructions can improve the ethicality of Large Language Model outputs, this improvement may be superficial, relying on shortcuts rather than truly mitigating underlying biases stored within the model.
Large language models (LLMs) can self-correct without external feedback using a novel prompting method called Progressive Correction (PROCO), which employs an iterative verify-then-correct framework to refine responses by identifying key conditions and formulating verification questions.
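The verify-then-correct loop described above can be sketched in a few lines. The snippet below is a minimal, illustrative Python version assuming a generic `query_llm` callable; the prompt wording, stopping rule, and `max_rounds` parameter are assumptions for illustration, not the authors' actual PROCO implementation.

```python
from typing import Callable

def progressive_correction(problem: str,
                           query_llm: Callable[[str], str],
                           max_rounds: int = 3) -> str:
    """Iteratively refine an answer: answer -> verify -> correct (hypothetical sketch)."""
    # Initial response to the problem.
    answer = query_llm(f"Solve the problem step by step.\nProblem: {problem}")
    for _ in range(max_rounds):
        # Ask the model to identify a key condition and pose a verification
        # question whose answer should be consistent with the current response.
        verification_q = query_llm(
            "Identify a key condition in the problem and write a question that "
            f"checks whether the answer satisfies it.\nProblem: {problem}\n"
            f"Answer: {answer}"
        )
        verdict = query_llm(
            f"{verification_q}\nReply 'consistent' if the answer passes the "
            f"check; otherwise explain the inconsistency.\nAnswer: {answer}"
        )
        if verdict.strip().lower().startswith("consistent"):
            break  # verification passed; stop refining
        # Otherwise, revise the answer using the detected inconsistency.
        answer = query_llm(
            f"Revise the answer to fix this issue: {verdict}\n"
            f"Problem: {problem}\nPrevious answer: {answer}"
        )
    return answer
```

In this sketch, no external verifier or ground-truth feedback is used: the same model generates, verifies, and corrects, which mirrors the feedback-free setting the summary describes.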