Large language models (LLMs) can distinguish between the relevance and utility of passages in supporting open-domain question answering, and their utility judgments can provide more valuable guidance than relevance judgments in identifying the ground-truth evidence needed to answer a question. However, LLM performance on utility judgments is sensitive to several instruction-design factors, such as the input form of the passages, the order in which the question and passages are presented, and additional requirements such as chain-of-thought reasoning.
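The sketch below illustrates, under assumed prompt wording, how the instruction-design factors named above (listwise vs. pointwise passage input, question/passage order, and an optional chain-of-thought requirement) could be varied when eliciting utility judgments from an LLM. The function name and templates are illustrative placeholders, not the paper's exact prompts.

```python
from typing import List


def build_utility_prompt(
    question: str,
    passages: List[str],
    listwise: bool = True,           # input form: all passages at once vs. one at a time
    question_first: bool = True,     # order of question relative to passages
    chain_of_thought: bool = False,  # ask for reasoning before the judgment
) -> str:
    """Assemble a utility-judgment prompt under one configuration of the factors."""
    passage_block = (
        "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
        if listwise
        else passages[0]  # pointwise: caller passes a single passage
    )
    instruction = (
        "Judge which passages are actually useful for answering the question, "
        "not merely topically relevant."
    )
    if chain_of_thought:
        instruction += " Think step by step before giving your final judgment."

    parts = [instruction]
    if question_first:
        parts += [f"Question: {question}", f"Passages:\n{passage_block}"]
    else:
        parts += [f"Passages:\n{passage_block}", f"Question: {question}"]
    parts.append("Useful passage numbers:" if listwise else "Useful (yes/no):")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_utility_prompt(
        "Who wrote The Selfish Gene?",
        ["Richard Dawkins published The Selfish Gene in 1976.",
         "Gene expression is regulated by transcription factors."],
        chain_of_thought=True,
    ))
```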
MFORT-QA leverages few-shot learning, chain-of-thought prompting, and retrieval-augmented generation to accurately answer complex questions by extracting relevant information from tables and associated hyperlinked contexts.
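As a rough illustration of such a pipeline, the sketch below retrieves table rows and hyperlinked passages relevant to a question and assembles a few-shot chain-of-thought prompt over the retrieved evidence. The term-overlap retrieval, exemplar, and prompt wording are hypothetical stand-ins, not the MFORT-QA authors' implementation.

```python
from typing import Dict, List


def retrieve_evidence(question: str,
                      table_rows: List[str],
                      linked_passages: Dict[str, str],
                      top_k: int = 3) -> List[str]:
    """Toy retrieval: rank table rows and hyperlinked passages by term overlap with the question."""
    q_terms = set(question.lower().split())
    candidates = table_rows + [f"{anchor}: {text}" for anchor, text in linked_passages.items()]
    scored = sorted(candidates,
                    key=lambda c: len(q_terms & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]


def build_fewshot_cot_prompt(question: str, evidence: List[str],
                             exemplars: List[str]) -> str:
    """Prepend few-shot chain-of-thought exemplars, then the retrieved evidence and the question."""
    evidence_block = "\n".join(f"- {e}" for e in evidence)
    return (
        "\n\n".join(exemplars)
        + f"\n\nEvidence:\n{evidence_block}"
        + f"\nQuestion: {question}\nLet's reason step by step:"
    )


if __name__ == "__main__":
    rows = ["Film | Parasite | Director | Bong Joon-ho | Year | 2019"]
    links = {"Bong Joon-ho": "Bong Joon-ho is a South Korean film director born in 1969."}
    exemplar = ("Evidence:\n- Country | France | Capital | Paris\n"
                "Question: What is the capital of France?\n"
                "Let's reason step by step: The row lists Paris as the capital. Answer: Paris")
    question = "When was the director of Parasite born?"
    evidence = retrieve_evidence(question, rows, links)
    print(build_fewshot_cot_prompt(question, evidence, [exemplar]))
```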