Core Concepts
Higher sequence-level certainty is associated with less hallucination in Knowledge-Grounded Dialogue Generation (KGDG), and the paper proposes Certainty-based Response Ranking (CRR) methods that use this link to reduce hallucination.
Abstract
1. Abstract:
- Proposes sequence-level certainty as a common theme over hallucination in KGDG.
- Introduces Certainty-based Response Ranking (CRR) to mitigate hallucination during decoding.
2. Introduction:
- Discusses previous works on hallucination in KGDG.
- Defines model response hallucination and proposes sequence-level certainty as a solution.
3. Sequence-Level Certainty:
- Dissects sequence-level certainty into probabilistic and semantic certainty.
- Defines probabilistic certainty as the mean per-token log-probability over the entire generated sequence.
- Defines semantic certainty using an Agreement Score (AS) based on semantic entailment between response candidates.
4. Certainty-Based Response Ranking:
- Introduces Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR) methods.
- Ranks response candidates based on their certainty level to reduce hallucination.
5. Experiments:
- Tests CRR methods on different models, decoding methods, and datasets.
- Shows a negative correlation between certainty and hallucination probability.
6. Background on Uncertainty and Hallucination:
- Discusses previous works on uncertainty estimation and its relation to hallucination.
7. Conclusion:
- Validates the effectiveness of P-CRR and S-CRR in reducing model hallucination in KGDG.
8. Experimental Details:
- Details the task definition, training, and inference methods for KGDG models.
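The certainty measures and ranking step outlined in sections 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the `agrees` callback, and the sample log-probabilities are all assumptions made for the example. In particular, `toy_agrees` is a crude bag-of-words stand-in for the entailment model that a real Agreement Score would need, and the exact form of the Agreement Score in the paper may differ from the simple count used here.

```python
from typing import Callable, List, Sequence


def probabilistic_certainty(token_logprobs: Sequence[float]) -> float:
    """Mean log-probability of the tokens in one generated response."""
    if not token_logprobs:
        raise ValueError("response has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def p_crr(candidates: Sequence[str], logprobs: Sequence[Sequence[float]]) -> str:
    """Return the candidate with the highest probabilistic certainty."""
    scores = [probabilistic_certainty(lp) for lp in logprobs]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


def agreement_scores(
    candidates: Sequence[str], agrees: Callable[[str, str], bool]
) -> List[int]:
    """For each candidate, count how many OTHER candidates it agrees with.

    `agrees` is a hypothetical callback; in practice it would wrap an
    entailment (NLI) model. Counting agreements is an assumed scoring rule.
    """
    return [
        sum(1 for j, other in enumerate(candidates) if j != i and agrees(cand, other))
        for i, cand in enumerate(candidates)
    ]


def s_crr(candidates: Sequence[str], agrees: Callable[[str, str], bool]) -> str:
    """Return the candidate with the highest agreement score (first one on ties)."""
    scores = agreement_scores(candidates, agrees)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


def toy_agrees(a: str, b: str) -> bool:
    """Toy stand-in for entailment: same bag of words, ignoring case and punctuation."""
    def words(s: str) -> List[str]:
        cleaned = "".join(ch for ch in s.lower() if ch.isalnum() or ch.isspace())
        return sorted(cleaned.split())
    return words(a) == words(b)


# Made-up candidates and per-token log-probabilities, for illustration only.
demo_candidates = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "Lyon is the capital of France.",
]
demo_logprobs = [[-0.2, -0.4, -0.3], [-1.0, -1.2], [-0.1, -2.5, -2.0, -1.8]]

print("P-CRR picks:", p_crr(demo_candidates, demo_logprobs))
print("S-CRR picks:", s_crr(demo_candidates, toy_agrees))
```

Both rankers only reorder candidates that the decoder has already sampled, so they can sit on top of any decoding method that returns several responses. Taking the mean rather than the sum of token log-probabilities keeps longer responses from being penalized purely for their length.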
Stats
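Both the probabilistic certainty and the semantic certainty of model responses are negatively correlated with the model's probability of hallucination.
On GPT2-small, P-CRR and S-CRR both contribute to improved faithfulness.
On GPT2-medium, T5-base, and OpenLlama-3B, P-CRR and S-CRR increase the proportion of faithful responses.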
Quotes
"Empirical results reveal that a higher level of both types of sequence-level certainty in model responses is correlated with a lower level of hallucination."
"Through extensive experiments, we validate the effectiveness of the CRR methods in reducing model hallucination."
"Both P-CRR and S-CRR contribute to improvements in faithfulness."