Core Concepts
This paper introduces LONG2RAG, a new benchmark for evaluating how well large language models (LLMs) use retrieved information to generate long-form answers. It also proposes a new metric, Key Point Recall (KPR), which measures how well a model incorporates key points from the retrieved documents into its responses.
Qi, Z., Xu, R., Guo, Z., Wang, C., Zhang, H., & Xu, W. (2024). LONG2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall. arXiv preprint arXiv:2410.23000.
Existing evaluation methods fall short in assessing whether LLMs effectively utilize retrieved information when generating long-form answers in retrieval-augmented generation (RAG) systems; the LONG2RAG benchmark and the KPR metric are designed to fill this gap.
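The core idea behind KPR can be sketched as a simple recall computation: the fraction of key points extracted from the retrieved documents that the model's response covers. The sketch below is illustrative only; the paper uses an LLM-based judge to decide coverage, whereas the `is_covered` function here is a hypothetical naive substring check standing in for that judge.

```python
def is_covered(key_point: str, response: str) -> bool:
    """Hypothetical coverage judge (naive substring match).
    LONG2RAG itself uses an LLM-based check instead."""
    return key_point.lower() in response.lower()


def key_point_recall(key_points: list[str], response: str) -> float:
    """KPR = (# key points covered by the response) / (total key points)."""
    if not key_points:
        return 0.0
    covered = sum(is_covered(kp, response) for kp in key_points)
    return covered / len(key_points)


# Toy example: one of the two key points appears in the response.
key_points = [
    "transformers use self-attention",
    "training requires large corpora",
]
response = "Transformers use self-attention over tokens and need large corpora."
print(key_point_recall(key_points, response))  # 0.5 under this naive check
```

Swapping `is_covered` for an LLM entailment judgment recovers the metric's intent: rewarding responses that actually surface the salient content of the retrieved documents rather than merely being fluent.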