Core Concepts
The author explores the challenge, in Open-Domain Question Answering (ODQA), of recognizing irrelevant text excerpts and abstaining from answering, highlighting the limitations of existing training strategies.
Summary
The content discusses the limitations of current Open-Domain Question Answering systems in handling irrelevant text excerpts. It emphasizes the importance of recognizing unanswerable questions and proposes a new approach that uses unanswerable question-passage pairs from the SQuAD 2.0 dataset to improve model accuracy.
The study reveals that models trained on random text excerpts struggle to generalize effectively, leading to a substantial decrease in predictive accuracy. By incorporating unanswerable questions from SQuAD 2.0, models achieve near-perfect performance when faced with challenging text excerpts.
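The augmentation strategy above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the helper function `mix_unanswerable`, the mixing ratio, and the inline example records are all assumptions; the field name `is_impossible` follows the public SQuAD 2.0 JSON schema.

```python
import random

def mix_unanswerable(answerable, unanswerable, ratio=0.5, seed=0):
    """Return a shuffled training set in which roughly `ratio` of the
    examples are unanswerable (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    # How many unanswerable examples are needed to hit the target ratio.
    n_unans = int(len(answerable) * ratio / (1 - ratio))
    sample = rng.sample(unanswerable, min(n_unans, len(unanswerable)))
    mixed = answerable + sample
    rng.shuffle(mixed)
    return mixed

# Tiny inline examples in SQuAD 2.0 style (invented for illustration).
answerable = [
    {"question": "Who wrote Hamlet?",
     "context": "Hamlet was written by Shakespeare.",
     "answers": ["Shakespeare"], "is_impossible": False},
]
unanswerable = [
    {"question": "Who wrote Hamlet?",
     "context": "The Eiffel Tower is in Paris.",
     "answers": [], "is_impossible": True},
]

train = mix_unanswerable(answerable, unanswerable, ratio=0.5)
```

Exposing the model to passages that do not support any answer is what lets it learn an explicit "abstain" option rather than always extracting a span.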
The analysis delves into the model's tendency to abstain from answering, extract a correct answer, or hallucinate a response when presented with text excerpts that are topically related to the question but do not actually contain the answer. The results suggest a confirmation bias in model behavior: models favor familiar answers even when the surrounding context does not support them.
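The three-way analysis above can be sketched as a simple labeling function. This is a hedged illustration, not the paper's evaluation code: the function name `categorize` and the use of normalized exact matching are assumptions.

```python
def normalize(text):
    """Lowercase and collapse whitespace for rough string matching."""
    return " ".join(text.lower().split())

def categorize(prediction, gold_answers, passage):
    """Label one prediction made on a (possibly irrelevant) passage as
    'abstain', 'correct', or 'hallucinate' (illustrative heuristic)."""
    if prediction is None or prediction.strip() == "":
        return "abstain"
    pred = normalize(prediction)
    if any(pred == normalize(g) for g in gold_answers):
        # Returning the known answer counts as correct only if the
        # passage actually supports it; otherwise the model is echoing
        # a familiar answer from an unrelated context.
        return "correct" if pred in normalize(passage) else "hallucinate"
    return "hallucinate"

print(categorize("", ["Shakespeare"],
                 "The Eiffel Tower is in Paris."))            # abstain
print(categorize("Shakespeare", ["Shakespeare"],
                 "The Eiffel Tower is in Paris."))            # hallucinate
```

The second call captures the confirmation bias described above: the answer string is right in isolation, but producing it from an unsupporting passage is a hallucination.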
Overall, the research underscores the critical need for ODQA systems to recognize and abstain from answering when faced with unanswerable questions, highlighting the significance of addressing this issue for developing trustworthy question-answering systems.
Statistics
Performance decreased from 98% to 1%.
Models achieve nearly perfect (≈100%) accuracy.
The rate of abstention drops to just 1.1%.
The rate of hallucination is non-trivial at 26.5%.
Quotes
"No relevant text excerpts could be found for 50.5% of initial seed questions."
"Models trained on random text excerpts exhibit a significant decrease in their ability to abstain from answering."
"The inclusion of unanswerable questions from SQuAD 2.0 results in flawless performance across various types of irrelevant text excerpts."