Core Concepts
Existing information retrieval systems mostly optimize for relevance to the question, ignoring diversity. This work proposes a benchmark and task to evaluate the ability of retrieval systems to surface diverse perspectives on complex and contentious questions.
Summary
The authors study the task of retrieving a set of documents that covers various perspectives on a complex and contentious question (e.g., "Will ChatGPT do more harm than good?"). They curate a Benchmark for Retrieval Diversity for Subjective questions (BERDS), where each example consists of a question and diverse perspectives associated with the question, sourced from survey questions and debate websites.
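For concreteness, a BERDS example can be thought of as a question paired with the set of perspectives a retriever should surface. The sketch below is purely illustrative; the field names and example values are assumptions, not the dataset's actual schema.

```python
# Illustrative sketch of a BERDS-style example.
# Field names and values are assumptions, not the dataset's actual schema.
from dataclasses import dataclass


@dataclass
class BerdsExample:
    question: str            # a complex, contentious question
    perspectives: list[str]  # diverse perspectives the retrieval should cover
    source: str              # e.g., a survey question or a debate website


example = BerdsExample(
    question="Will ChatGPT do more harm than good?",
    perspectives=[
        "ChatGPT will do more harm than good.",
        "ChatGPT will do more good than harm.",
    ],
    source="debate website",
)
```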
The authors evaluate different retrievers (BM25, DPR, CONTRIEVER) paired with various corpora (Wikipedia, a web snapshot, and a corpus constructed on the fly from pages returned by a search engine) on the BERDS dataset. They find that existing retrievers struggle to surface documents covering all perspectives, even when retrieving from the web.
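This finding implies a coverage-style metric: how many of the gold perspectives are represented somewhere in the top-k retrieved documents. The sketch below is a rough illustration under that assumption; `covers` is a placeholder judgment function (e.g., a trained classifier or an LLM judge), not the paper's exact evaluator.

```python
# Rough sketch of a diversity-coverage metric: the fraction of gold perspectives
# that appear in at least one of the top-k retrieved documents.
# `covers(document, perspective)` is a placeholder judgment function -- an assumption,
# not the benchmark's actual evaluator.
def perspective_coverage(retrieved_docs: list[str],
                         perspectives: list[str],
                         covers) -> float:
    covered = sum(
        1 for p in perspectives
        if any(covers(doc, p) for doc in retrieved_docs)
    )
    return covered / len(perspectives)
```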
To enhance the diversity of the retrieval results, the authors implement simple re-ranking and query expansion approaches. The query expansion approach, which first generates multiple perspectives with a large language model and then uses them to guide retrieval, yields strong gains for the dense base retriever (CONTRIEVER).
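As a rough illustration of the query-expansion idea, the sketch below generates perspectives with an LLM, issues one retrieval query per perspective, and merges the results. `generate_perspectives` and `retriever.search` are hypothetical placeholders standing in for an LLM prompt and a dense retriever such as CONTRIEVER; this is not the authors' implementation.

```python
# Minimal sketch of LLM-guided query expansion for diverse retrieval.
# `generate_perspectives` and `retriever.search` are hypothetical placeholders,
# not the authors' code or a specific library API.
def retrieve_with_expansion(question: str, retriever, generate_perspectives, k: int = 10):
    # Ask an LLM for multiple perspectives on the question.
    perspectives = generate_perspectives(question)
    # Issue the original question plus one expanded query per perspective.
    queries = [question] + [f"{question} {p}" for p in perspectives]
    results, seen = [], set()
    for q in queries:
        for doc in retriever.search(q, top_k=k):
            if doc.id not in seen:  # deduplicate across expanded queries
                seen.add(doc.id)
                results.append(doc)
    return results[:k]
```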
The authors further provide a rich analysis, studying the coverage of each corpus, retriever sycophancy, and whether retrievers prefer perspectives that support or oppose the input query.
Statistics
"Given a complex and contentious question (Xu et al., 2024), such as "Will ChatGPT do more harm than good?", a retrieval system should be able to surface diverse opinions in their top retrieval outputs."
"Surfacing diverse documents can be useful to the users directly, but also can be improve retrieval-augmented language models (RALMs)."
"Prompting large language models (LLMs) to generate an answer that encompasses diverse perspectives on its own is challenging (Sorensen et al., 2024; Hayati et al., 2023a), and retrieval-augmentation (Divekar and Durrett, 2024) can facilitate LLMs to generate more comprehensive answers that represent diverse perspectives."
Quotes
"Existing information retrieval (IR) tasks and systems mostly optimize for relevance to the question, ignoring diversity."
"Surfacing diverse documents can be useful to the users directly, but also can be improve retrieval-augmented language models (RALMs)."
"Prompting large language models (LLMs) to generate an answer that encompasses diverse perspectives on its own is challenging (Sorensen et al., 2024; Hayati et al., 2023a), and retrieval-augmentation (Divekar and Durrett, 2024) can facilitate LLMs to generate more comprehensive answers that represent diverse perspectives."