Basic Concepts
This research explores fine-tuning techniques to enhance causal reasoning in language models by leveraging counterfactual feedback, demonstrating that directly targeting causal consistency leads to significant improvements in reasoning performance.
Statistics
LLMs still perform substantially better on recall-based tasks than on tasks that explicitly require reasoning.
LLMs struggle with counterfactual questions compared to purely factual questions.
Fine-tuning generalizes best when performed inductively.
Quotes
"While the ever-increasing accuracy of these systems is now undeniable, it is still rather unclear to what extent this accuracy is due to effective recall of their training data vs. a genuine ability to reason by extracting, understanding, and adapting the fundamental concepts underlying that training data."
"We improve the causal reasoning of LLMs by adapting established methods of fine-tuning. In particular, we consider supervised fine-tuning (SFT) and direct preference optimization (DPO)."
"When the goal of fine-tuning is specifically to improve reasoning, a unique problem arises in evaluating the fine-tuned LLMs: we cannot just measure performance for a held-out set of test samples within the same reasoning task. If we do, it would be impossible to tell whether the LLM actually learned to reason or whether it is still recalling the demonstrations we have made during fine-tuning."