Core Concepts
Incorporating a small encoder model to encode longer contexts effectively and leveraging a cross-attention mechanism to improve the performance of open-domain question answering.
Abstract
The paper proposes a method to enhance open-domain question answering (ODQA) by leveraging a small encoder model to effectively encode longer contexts. The key highlights are:
The method utilizes a small encoder model to encode additional contexts beyond the maximum length that the original task model can handle. This is achieved by applying a cross-attention mechanism between the encoded contexts and the original task model's inputs.
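The cross-attention step described above can be sketched as follows. This is a minimal, hedged illustration (not the paper's actual implementation): the task model's hidden states attend over dense vectors produced by the small encoder, so information from contexts beyond the task model's input limit can still influence its representations. All names, shapes, and the single-head formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    # queries: task-model hidden states, shape (n_q, d)
    # keys_values: dense context vectors from the small encoder, shape (n_kv, d)
    # Single-head scaled dot-product attention; projections omitted for brevity.
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)                # each query attends over all context vectors
    return weights @ keys_values                      # (n_q, d) context-informed states

rng = np.random.default_rng(0)
d = 64
task_hidden = rng.normal(size=(8, d))     # tokens within the task model's own input window
encoded_ctx = rng.normal(size=(128, d))   # hypothetical encoder output for the extra context
out = cross_attention(task_hidden, encoded_ctx, d)
print(out.shape)
```

Note the asymmetry that makes this cheap: the long context passes only through the small encoder, while the large task model sees it as a fixed set of dense vectors rather than raw tokens.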
Experiments are conducted on two ODQA datasets (TriviaQA and Natural Questions) and evaluated on held-in, held-out, and in-context learning (ICL) settings. The results show that the proposed method consistently outperforms the baseline, which is fine-tuned on data with limited context length.
The computing resource requirements of the proposed method are close to the baseline's, and its runtime remains competitive. This is achieved by using an encoder model roughly 10x smaller than the original task model, avoiding the need for complex techniques to shrink the computation graph during backpropagation.
The paper also analyzes the impact of training strategies for the encoder model, demonstrating the importance of carefully optimizing the encoder parameters to maintain its encoding capability.
Further experiments show the effectiveness of the proposed method in a more challenging setting where the original task model is provided with only the question-answer pairs without any context information.
Overall, the paper presents a simple yet effective method to enhance ODQA by leveraging a small encoder model to encode longer contexts, which leads to improved performance across various evaluation settings.
Stats
The length of context the model can cover increases from 2k tokens (in text form) to a maximum of 10k tokens (in dense form, condensed by the encoder).
The run time of the proposed method remains competitive compared to the baseline, as shown in Figure 2.