Core Concepts
By leveraging the knowledge and reasoning capabilities of large language models (LLMs), we can generate multiple effective queries that improve retrieval for complex conversational information-seeking tasks.
Abstract
The paper presents methods for improving conversational passage retrieval by leveraging large language models (LLMs). The key points are:
Existing approaches to conversational information seeking (CIS) typically model the user's information need with a single rewritten query, which is limiting for complex requests that require reasoning over multiple facts.
The authors propose a "generate-then-retrieve" (GR) pipeline that first prompts the LLM to generate an answer to the user's query, and then uses that answer to produce multiple searchable queries.
Three GR-based approaches are proposed:
AD: Using the LLM-generated answer as a single long query.
QD: Prompting the LLM to directly generate multiple queries.
AQD: Generating an answer first, then using that to generate multiple queries.
AQDA: A variant of AQD that re-ranks the final results based on the generated answer.
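The four variants above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `llm` and `search` are hypothetical stand-ins for an LLM call (e.g. GPT-3.5) and a passage retriever, and the re-ranking score in AQDA is a toy token-overlap measure standing in for a real relevance model.

```python
def llm(prompt):
    # Placeholder LLM call; a real system would query a model such as GPT-3.5.
    return f"response to: {prompt}"

def search(query, k=10):
    # Placeholder retriever; a real system would use BM25 or a dense index.
    return [f"passage for '{query}' #{i}" for i in range(k)]

def ad(user_query):
    # AD: use the LLM-generated answer itself as one long query.
    answer = llm(f"Answer the question: {user_query}")
    return search(answer)

def qd(user_query, n=3):
    # QD: prompt the LLM to produce several searchable queries directly.
    queries = [llm(f"Searchable query {i} for: {user_query}") for i in range(n)]
    return [p for q in queries for p in search(q, k=5)]

def aqd(user_query, n=3):
    # AQD: generate an answer first, then derive multiple queries from it.
    answer = llm(f"Answer the question: {user_query}")
    queries = [llm(f"Searchable query {i} grounded in: {answer}") for i in range(n)]
    return [p for q in queries for p in search(q, k=5)]

def aqda(user_query, n=3):
    # AQDA: AQD plus re-ranking of the retrieved passages by the answer.
    answer = llm(f"Answer the question: {user_query}")
    passages = aqd(user_query, n)
    def overlap(p):
        # Toy similarity: shared tokens with the answer (a real system
        # would use embedding similarity or a cross-encoder).
        return len(set(p.split()) & set(answer.split()))
    return sorted(passages, key=overlap, reverse=True)
```

The key structural difference is visible in the code: AD issues one long query, QD issues several queries with no intermediate answer, and AQD/AQDA route the generated answer into query generation (and, for AQDA, into a final re-ranking step).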
Experiments on the TREC iKAT dataset show that the GR-based approaches, especially AQDA, significantly outperform the traditional "retrieve-then-generate" (RG) baselines.
The authors also address the issue of limited relevance judgments in the official iKAT dataset by creating a new assessment pool using GPT-3.5, which shows high agreement with human labels.
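The expanded assessment pool amounts to prompting an LLM for a graded relevance label per (query, passage) pair. A minimal sketch of that idea, with a hypothetical prompt format and a 0-2 grading scale assumed for illustration (the paper's actual prompt and scale may differ):

```python
def judge_relevance(llm, query, passage):
    """Ask an LLM for a graded relevance label for one (query, passage) pair."""
    prompt = (
        "On a scale of 0 (not relevant) to 2 (highly relevant), how relevant "
        "is the passage to the query? Reply with a single digit.\n"
        f"Query: {query}\nPassage: {passage}\nLabel:"
    )
    reply = llm(prompt)
    # Parse the first digit in the reply; default to 0 if none is found.
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 0
```

With labels like these collected over a pool of retrieved passages, agreement with the official human judgments can be checked with standard measures such as Cohen's kappa.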
Example
A complex request can be decomposed into multiple searchable queries, e.g. comparing candidate universities for a user based in Trento:
Travel distance between NYU and Trento
Travel distance between Columbia University and Trento
Travel distance between Rutgers University and Trento