Are Large Language Models Capable of Accurately Judging the Utility of Evidence for Open-Domain Question Answering?
Large language models (LLMs) can distinguish between the relevance and utility of passages in supporting open-domain question answering, and their utility judgments can provide more valuable guidance than relevance judgments in identifying the ground-truth evidence needed to answer a question. However, the performance of LLMs on utility judgments is affected by several factors in the instruction design, such as the input form of the passages (e.g., pointwise versus listwise), the order in which the question and passages are presented, and additional requirements such as chain-of-thought reasoning.
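To make the instruction-design factors concrete, the sketch below shows one way a utility-judgment prompt might be parameterized along the three dimensions the abstract names. This is a hypothetical illustration, not the paper's actual templates: the function name, template wording, and the flags `passages_first`, `listwise`, and `use_cot` are all assumptions introduced here.

```python
# Hypothetical prompt builder for LLM utility judgments; all wording and
# parameter names are illustrative assumptions, not the paper's prompts.

def build_utility_prompt(question, passages, passages_first=True,
                         listwise=True, use_cot=False):
    """Assemble an instruction asking an LLM to judge passage utility."""
    # Input form of passages: listwise (all passages at once)
    # versus pointwise (one passage per prompt).
    if listwise:
        passage_block = "\n".join(
            f"Passage {i + 1}: {p}" for i, p in enumerate(passages)
        )
        task = ("Judge which of the passages above are useful for "
                "answering the question, and list their numbers.")
    else:
        passage_block = f"Passage: {passages[0]}"
        task = ("Judge whether the passage above is useful for answering "
                "the question. Answer 'useful' or 'not useful'.")

    question_block = f"Question: {question}"

    # Input order: passages before the question, or question first.
    parts = ([passage_block, question_block] if passages_first
             else [question_block, passage_block])

    # Additional requirement: optional chain-of-thought reasoning.
    if use_cot:
        task += " Think step by step before giving your final judgment."

    return "\n\n".join(parts + [task])


if __name__ == "__main__":
    passages = [
        "The Eiffel Tower was completed in 1889.",
        "Paris hosted the 1900 Summer Olympics.",
    ]
    print(build_utility_prompt("When was the Eiffel Tower completed?",
                               passages, passages_first=True,
                               listwise=True, use_cot=True))
```

Varying these flags while holding the question and passages fixed is one plausible way to probe how each instruction-design factor affects an LLM's utility judgments.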