Core Concepts
FecTek improves lexicon-based retrieval by enriching feature context representations and by using term-level knowledge to guide how term weights are learned.
Abstract
The paper presents FecTek, a method for improving the performance of lexicon-based retrieval. The key highlights are:
FecTek introduces two specialized components:
Feature Context Module (FCM): This module enriches the feature context representations of term weights by leveraging BERT's representations to determine dynamic weights for each element in the embedding.
Term-level Knowledge Guidance Module (TKGM): This module effectively utilizes term-level knowledge to guide the modeling process of term weights. Terms found in both the query and passage are assigned a label of 1, while the remaining terms are labeled as 0.
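The labeling rule used by the TKGM can be illustrated with a minimal sketch. It assumes labels are assigned per passage token and that terms are already tokenized; the function name term_level_labels and the toy tokens are hypothetical, and the paper may define the label space differently (for example, over the whole vocabulary).

```python
def term_level_labels(query_terms, passage_terms):
    """Label each passage term 1 if it also occurs in the query, else 0.

    Assumption: labels are produced per passage token; this is an
    illustration of the rule, not the paper's exact implementation.
    """
    query_vocab = set(query_terms)
    return [1 if term in query_vocab else 0 for term in passage_terms]


# Hypothetical, pre-tokenized inputs.
query = ["what", "is", "lexicon", "retrieval"]
passage = ["lexicon", "based", "retrieval", "matches", "terms"]
labels = term_level_labels(query, passage)
```

Here "lexicon" and "retrieval" occur in both texts, so only those two passage positions receive a label of 1.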
The text-level branch of FecTek, consisting of the FCM and a projector module, is responsible for acquiring term weights. The term-level branch, including the TKGM and another projector module, injects term-level knowledge into the system.
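To make the idea behind the FCM more concrete, the sketch below applies a channel-attention-style gate to token embeddings: one dynamic weight per embedding element, computed from the embeddings themselves. Everything specific here is an assumption for illustration, not the paper's architecture: the mean pooling over tokens, the two-layer bottleneck, the names feature_context_gate, w_down and w_up, and the toy numbers.

```python
import math


def _matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def feature_context_gate(token_embeddings, w_down, w_up):
    """Rescale every embedding element by a gate in (0, 1).

    Assumed design (squeeze-and-excitation style):
      1. mean-pool the token embeddings into one vector,
      2. pass it through a ReLU bottleneck (w_down),
      3. expand back (w_up) and apply a sigmoid to get per-element gates,
      4. multiply each token embedding element-wise by the gates.
    """
    num_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    pooled = [
        sum(tok[d] for tok in token_embeddings) / num_tokens for d in range(dim)
    ]
    hidden = [max(0.0, h) for h in _matvec(w_down, pooled)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in _matvec(w_up, hidden)]
    gated = [[x * g for x, g in zip(tok, gates)] for tok in token_embeddings]
    return gated, gates


# Toy stand-in for BERT output: 2 tokens, 4 features each.
embeddings = [[1.0, -2.0, 0.5, 3.0], [0.0, 1.0, -0.5, 1.0]]
w_down = [[0.5, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.0]]  # projects 4 -> 2
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]  # projects 2 -> 4
gated, gates = feature_context_gate(embeddings, w_down, w_up)
```

In a real model the two weight matrices would be learned, and the gated embeddings would then feed the projector that produces term weights.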
Evaluation on the MS MARCO benchmark shows that FecTek consistently outperforms previous state-of-the-art approaches in lexicon-based retrieval. When combined with distillation from a reranker, FecTek reaches 38.7% MRR@10.
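For readers unfamiliar with the metric, MRR@10 averages, over all queries, the reciprocal rank of the first relevant passage within the top 10 results (0 if none appears). The sketch below shows the standard computation; the function name mrr_at_k and the passage IDs are hypothetical and unrelated to the paper's evaluation code.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant item in the top k.

    ranked_lists:  one ranked list of passage IDs per query.
    relevant_sets: one set of relevant passage IDs per query.
    """
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, passage_id in enumerate(ranked[:k], start=1):
            if passage_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


# Hypothetical rankings for three queries.
rankings = [["p3", "p1", "p7"], ["p2", "p9"], ["p5"]]
relevant = [{"p1"}, {"p2"}, {"p8"}]
score = mrr_at_k(rankings, relevant)
```

The three queries contribute 1/2, 1 and 0 respectively, so the score is 0.5.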
Ablation studies confirm the effectiveness of the FCM and TKGM modules in improving the performance of FecTek. Utilizing a more powerful reranker model for distillation also yields greater performance gains.
Stats
Lexicon-based retrieval methods heavily depend on frequency-based term weight estimation, which often fails to adequately capture contextual information.
Existing neural retrieval methods emphasize capturing spatial context representations, while neglecting the importance of feature context representations.
Text-level contrastive learning approaches eliminate the need for term-level labeling but lack clear guidance from term-level knowledge.
Quotes
"To address the first challenge, we devised a feature context module (FCM) inspired by the remarkable improvements achieved through the application of channel attention in CNN models [10]. This module enriches the feature context representations of term weight effectively."
"Regarding the second problem, we developed a term-level knowledge guidance module (TKGM) as the central solution in FecTek."