Key Concepts
This paper presents AutoNuggetizer, a fully automated evaluation framework for Retrieval-Augmented Generation (RAG) systems that uses large language models (LLMs) to create information "nuggets" and assign them to system-generated answers as a measure of answer quality. Initial results from the TREC 2024 RAG Track show a strong correlation between this automated approach and manual evaluation by human assessors, suggesting it can serve as a reliable and efficient alternative for evaluating and iterating on RAG systems.
Pradeep, R., Thakur, N., Upadhyay, S., Campos, D., Craswell, N., & Lin, J. (2024). Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework. arXiv preprint arXiv:2411.09607.
This paper introduces and evaluates the AutoNuggetizer framework, which applies a nugget-based evaluation methodology to RAG systems. The primary objective is to determine the effectiveness and reliability of this automated approach by comparing its results against manual evaluation by human assessors.
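To make the nugget-based idea concrete, the sketch below shows one simple way an answer could be scored once nuggets have been created and assigned. This is only an illustration: the `Nugget` class, the `nugget_recall` function, the "support" / "partial_support" / "not_support" labels, and the 0.5 partial credit are assumptions for this example, not the paper's exact scoring definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Nugget:
    """A single atomic fact expected in a good answer (illustrative structure)."""
    text: str
    vital: bool          # whether the nugget was marked as essential
    assignment: str      # "support", "partial_support", or "not_support"


def nugget_recall(nuggets: List[Nugget], vital_only: bool = False,
                  partial_credit: float = 0.5) -> float:
    """Score an answer as the fraction of nuggets it supports.

    Fully supported nuggets count 1.0, partially supported nuggets count
    `partial_credit`, and unsupported nuggets count 0.0. With
    `vital_only=True`, only nuggets flagged as vital are considered.
    """
    pool = [n for n in nuggets if n.vital] if vital_only else nuggets
    if not pool:
        return 0.0
    credit = {"support": 1.0, "partial_support": partial_credit, "not_support": 0.0}
    return sum(credit[n.assignment] for n in pool) / len(pool)


if __name__ == "__main__":
    # Hypothetical nuggets and LLM-assigned support labels for one answer.
    answer_nuggets = [
        Nugget("RAG combines retrieval with generation", vital=True, assignment="support"),
        Nugget("Nuggets are assigned by an LLM judge", vital=True, assignment="partial_support"),
        Nugget("TREC 2024 RAG Track provides the topics", vital=False, assignment="not_support"),
    ]
    print(f"all-nugget score:   {nugget_recall(answer_nuggets):.2f}")
    print(f"vital-nugget score: {nugget_recall(answer_nuggets, vital_only=True):.2f}")
```

In this toy scheme, an answer that supports every vital nugget scores 1.0 on the vital-only measure even if it misses minor details, which is the intuition behind weighting essential nuggets separately.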