Core Concepts
LLM-based classifiers effectively detect hallucination and coverage errors in retrieval-augmented generation for controversial topics.
Abstract
The article examines error detection methods for Large Language Models (LLMs) used as chatbots on controversial topics. It introduces the NPOV (neutral point of view) Response Task, in which a model generates a response from a provided set of perspectives. ROUGE, salience, and LLM-based classifiers are evaluated as error detectors, using synthetic error datasets for training and evaluation. Classifiers trained on synthetic errors perform well, and ROUGE is a strong baseline. Salience is effective for word-level error detection, capturing semantics better than ROUGE in some cases.
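To make the overlap-based baseline concrete, here is a minimal sketch in the spirit of ROUGE-1 recall: the fraction of a provided perspective's words that recur in the generated response can flag likely coverage errors. The function names and the threshold are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical ROUGE-1-recall-style coverage check (not the paper's code).
import re

def unigram_recall(reference: str, response: str) -> float:
    """Fraction of reference unigrams that also appear in the response."""
    ref_words = re.findall(r"\w+", reference.lower())
    resp_words = set(re.findall(r"\w+", response.lower()))
    if not ref_words:
        return 1.0
    return sum(1 for w in ref_words if w in resp_words) / len(ref_words)

def flags_coverage_error(perspective: str, response: str,
                         threshold: float = 0.5) -> bool:
    """Flag a perspective as uncovered when too few of its words
    recur in the response; threshold is an illustrative choice."""
    return unigram_recall(perspective, response) < threshold
```

A real ROUGE implementation would also handle stemming and longer n-grams; this sketch only shows why lexical overlap is a strong, cheap baseline for coverage detection.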
Stats
Our results demonstrate that LLM-based classifiers achieve high error detection performance.
The classifiers achieved ROC AUC scores of 95.3% for hallucination detection and 90.5% for coverage error detection.
Even without access to annotated data, good results were obtained on hallucination (84.0%) and coverage error (85.2%) detection.
Classifier performance improves with more training data, especially on full organic errors.
Salience performs on par with or better than ROUGE at detecting both hallucinated words and uncovered words.
Quotes
"Large Language Models have achieved state-of-the-art performance but struggle with factuality and bias."
"Our work focuses on response generation after pro and con arguments are provided to an LLM."
"Classifier performance improves with more training data."
"Salience is effective for word-level error detection."