Core Concepts
Existing retrieval models struggle to comprehend exclusionary queries, in which users explicitly state what they do not want to retrieve. Generative retrieval models show distinct advantages over sparse and dense retrieval methods in handling such queries.
Abstract
The paper introduces ExcluIR, a new dataset and benchmark for evaluating the capability of retrieval models in handling exclusionary queries. Exclusionary queries are those where users explicitly express what information they do not want to retrieve.
The key highlights and insights from the paper are:
Existing retrieval models across architectures (sparse, dense, and generative) perform poorly on the ExcluIR benchmark, which shows how difficult exclusionary queries are to comprehend.
Training on the ExcluIR training set, which contains a large number of exclusionary queries, improves retrieval performance on the ExcluIR benchmark, but a significant gap to human performance remains.
Generative retrieval models have a natural advantage in handling exclusionary queries compared to sparse and dense retrieval methods. This is because the multi-level cross-attention mechanism in generative models allows them to focus on the exclusionary phrases in the query, effectively capturing the user's intent.
Late interaction models like ColBERT struggle to comprehend exclusionary queries, as their token-level relevance calculation is not well-suited for handling complex exclusionary semantics.
Expanding the training data domain and increasing the model size do not consistently lead to improved performance on ExcluIR, suggesting the need for more targeted training strategies and architectural innovations to address the challenges of exclusionary retrieval.
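The point about late interaction models can be made concrete with a toy version of token-level scoring. The sketch below assumes a MaxSim-style score (each query token is matched to its most similar document token, and the matches are summed) and uses hand-made, context-free token vectors. The vectors, the example query, and the names TOY_EMBEDDINGS and maxsim_score are illustrative assumptions, not taken from the paper; a real ColBERT model uses learned, contextualized embeddings, so this shows only the tendency, not the model's actual behavior.

```python
# Hypothetical, hand-made 3-d token vectors (NOT learned ColBERT embeddings).
TOY_EMBEDDINGS = {
    "apple": (1.0, 0.0, 0.0),
    "pie":   (0.0, 1.0, 0.0),
    "cider": (0.0, 0.0, 1.0),
    "not":   (0.0, 0.0, 0.0),  # carries no content direction in this toy space
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens, emb=TOY_EMBEDDINGS):
    """Late-interaction score: for each query token, take its best
    match among the document tokens, then sum over query tokens."""
    return sum(
        max(dot(emb[q], emb[d]) for d in doc_tokens)
        for q in query_tokens
    )


# Toy exclusionary query: the user wants apple content, but NOT pie.
query = ["apple", "not", "pie"]
doc_with_pie = ["apple", "pie"]       # should be excluded
doc_without_pie = ["apple", "cider"]  # should be preferred

score_with_pie = maxsim_score(query, doc_with_pie)
score_without_pie = maxsim_score(query, doc_without_pie)

print(score_with_pie, score_without_pie)
```

In this toy setting the document that mentions the excluded term scores higher, because the query token "pie" still finds a strong match and nothing in the summation lets "not" turn that match into a penalty.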
Stats
"Existing retrieval models with different architectures struggle to effectively comprehend exclusionary queries."
"Generative retrieval models have a natural advantage in handling exclusionary queries compared to sparse and dense retrieval methods."
"Late interaction models like ColBERT struggle to comprehend exclusionary queries, as their token-level relevance calculation is not well-suited for handling complex exclusionary semantics."
Quotes
"Exclusionary retrieval emphasizes a crucial need for precision and relevance in information retrieval. It shows how users leverage their knowledge and expectations to find information that meets their specific needs."
"Failure to understand exclusionary queries can present a potentially serious problem."
"Generative retrieval models adopt a sequence-to-sequence framework, such as T5 or BART, which estimates the probability of generating the document IDs given the query using a conditional probability model: P(d|q)."