The article explores how retrieval augmentation affects language models (LMs), analyzed along two axes: question popularity and model size. It introduces a new QA dataset, WITQA, which pairs each question with supporting passages. Experiments with ten LMs and four retrievers reveal insights into recall ability, retrieval assistance, and error patterns. The findings suggest that larger models excel at recalling popular facts but struggle with less popular, long-tail details. Retrievers improve smaller models' accuracy but can override larger models' correct parametric recall. Selectively invoking retrieval based on question popularity significantly improves QA performance.
Key insights distilled from:
by Seiji Maekaw... on arxiv.org, 03-15-2024
https://arxiv.org/pdf/2402.13492.pdf