Core Concepts
Large language models can be trained to recognize the limits of their own knowledge and refuse to answer questions that are beyond their parametric knowledge.
Summary
The paper proposes a novel instruction tuning method called Refusal-Aware Instruction Tuning (R-Tuning) to address the hallucination issue in large language models (LLMs). The key insight is that traditional instruction tuning forces LLMs to complete sentences regardless of whether the model knows the answer, leading to the generation of non-existent facts.
R-Tuning consists of two main steps:
- Identifying the knowledge gap between the parametric knowledge of the pre-trained LLM and the instruction tuning data. This is done by comparing the model's predictions with the ground-truth labels and splitting the data into certain (D1) and uncertain (D0) sets.
- Constructing refusal-aware training data by appending uncertainty expressions (e.g., "I am unsure") to the uncertain questions, while keeping the original labels for the certain questions (a minimal sketch of both steps follows this list).
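The sketch below illustrates these two steps in Python. The `model.generate` interface, the exact-match correctness check, and the exact uncertainty phrase are assumptions for illustration, not the paper's released implementation.

```python
# Illustrative sketch of refusal-aware data construction.
# The model interface and the exact-match check are assumptions.

UNCERTAIN_SUFFIX = "I am unsure."  # uncertainty expression appended to D0 targets

def build_refusal_aware_data(model, qa_pairs):
    """Split (question, answer) pairs into certain (D1) and uncertain (D0) sets
    by comparing the model's own prediction with the ground truth, then append
    an uncertainty expression to the uncertain targets."""
    d1_certain, d0_uncertain = [], []
    for question, gold_answer in qa_pairs:
        prediction = model.generate(question)          # pre-trained model's own answer
        if prediction.strip() == gold_answer.strip():  # crude correctness check
            d1_certain.append({"prompt": question, "completion": gold_answer})
        else:
            d0_uncertain.append({"prompt": question,
                                 "completion": f"{gold_answer} {UNCERTAIN_SUFFIX}"})
    return d1_certain + d0_uncertain                   # refusal-aware training set
```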
The authors conduct single-task and multi-task experiments on various datasets. The results show that R-Tuning significantly outperforms the traditional instruction tuning approach in terms of accuracy on the questions the model is willing to answer, as well as the overall precision-recall tradeoff measured by the Average Precision (AP) score.
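For context, an AP score of this kind can be computed by ranking questions by the model's confidence and treating answer correctness as the binary label. A rough sketch using scikit-learn, where the confidence scores and correctness labels are assumed inputs rather than values from the paper:

```python
# Rough sketch of the precision-recall / AP evaluation. Assumes we already have
# a per-question confidence score and a binary correctness label.
from sklearn.metrics import average_precision_score

def ap_score(correct_labels, confidence_scores):
    """Average Precision over questions ranked by model confidence:
    high AP means the confident answers are disproportionately correct."""
    return average_precision_score(correct_labels, confidence_scores)

# Example with made-up values: 1 = answered correctly, 0 = answered incorrectly.
print(ap_score([1, 0, 1, 1, 0], [0.9, 0.2, 0.8, 0.7, 0.4]))
```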
Further analysis reveals that the refusal ability learned by R-Tuning is a meta-skill that can be generalized to other tasks. The authors also find that learning uncertainty during training leads to better uncertainty estimation and question-answering performance than directly applying uncertainty-based filtering on the test data.
Statistics
The parametric knowledge of the pre-trained model covers a large volume of factual knowledge, while the instruction tuning data may involve knowledge that falls outside it.
R-Tuning achieves higher accuracy on the willingly answered questions than the traditional instruction tuning approach.
R-Tuning achieves higher AP scores compared to the baselines, demonstrating a better precision-recall tradeoff.
The refusal ability learned by R-Tuning can be generalized to other tasks, indicating it is a meta-skill.
Learning uncertainty during training leads to better uncertainty estimation and question-answering performance than directly applying uncertainty-based filtering on the test data.
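As a point of comparison, test-time uncertainty filtering typically looks something like the sketch below: sample several answers, use agreement as a confidence proxy, and refuse when confidence falls below a threshold. The sampling interface, sample count, and threshold are assumptions for illustration, not the paper's exact baseline setup.

```python
# Illustrative test-time uncertainty filtering (the kind of baseline R-Tuning
# is compared against). Model interface, sample count, and threshold are assumptions.
from collections import Counter

def answer_with_filtering(model, question, n_samples=10, threshold=0.6):
    """Sample several answers, take agreement on the majority answer as a
    confidence proxy, and refuse to answer when confidence is below threshold."""
    samples = [model.generate(question, temperature=0.7) for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    confidence = count / n_samples
    if confidence < threshold:
        return "I am unsure."  # refuse instead of guessing
    return answer
```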
Quotes
"Training a model exclusively on correct answers inadvertently teaches it to guess rather than admit its ignorance. Consequently, if we never train the model to articulate "I don't know" as a response, it remains unequipped to do so when confronted with unknowns."
"One way to interpret our method is that it involves learning the uncertainty of the training data as part of instruction tuning. Further analysis surprisingly shows that learning uncertainty during training and then using it to filter and respond to questions yields better results than directly applying uncertainty filtering on test data."