Core Concepts
This paper introduces EMVB, a novel framework for efficient query processing in multi-vector dense retrieval. EMVB combines a highly efficient pre-filtering step based on optimized bit vectors, a column-wise SIMD max reduction for candidate passage retrieval, and a late interaction mechanism that couples product quantization with per-document term filtering.
Abstract
The paper presents EMVB, a novel framework for efficient multi-vector dense retrieval that advances over PLAID, the previous state-of-the-art approach. EMVB introduces four key contributions:
Pre-filtering of candidate passages: EMVB employs a highly efficient pre-filtering step using optimized bit vectors to quickly discard non-relevant passages, speeding up the candidate filtering phase.
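The bit-vector idea can be sketched as follows: each document term embedding is assigned to a centroid, each centroid owns a bit vector marking the documents that touch it, and a query ORs the bit vectors of its most promising centroids. This is a minimal illustrative sketch, not the paper's implementation; the `top_r` selection criterion and all function names are assumptions.

```python
import numpy as np

def build_centroid_bitvectors(doc_centroid_ids, n_centroids, n_docs):
    """For each centroid, a bit vector marking which documents contain
    at least one term embedding assigned to it (assumed index layout)."""
    bits = np.zeros((n_centroids, n_docs), dtype=bool)
    for doc_id, centroids in enumerate(doc_centroid_ids):
        bits[centroids, doc_id] = True
    return bits

def prefilter(query_centroid_scores, bits, top_r=4):
    """Keep documents whose terms hit one of the top_r closest centroids
    of at least one query term (hypothetical filtering criterion)."""
    survivors = np.zeros(bits.shape[1], dtype=bool)
    for scores in query_centroid_scores:      # one row per query term
        top = np.argsort(scores)[-top_r:]     # top_r closest centroids
        survivors |= bits[top].any(axis=0)    # OR the centroid bit vectors
    return np.flatnonzero(survivors)
```

Because the per-centroid masks are plain bit vectors, the OR over candidates is a cheap bitwise operation, which is what makes this filter fast compared with scoring every passage.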
Efficient centroid interaction: EMVB computes the centroid interaction more efficiently by leveraging SIMD instructions for a column-wise max reduction, reducing the latency of this step.
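The centroid interaction above can be illustrated with a small sketch: query-term-vs-centroid scores are computed once, then each candidate document gathers the columns for its centroid assignments and takes a max per query term. Here NumPy's vectorized reduction stands in for the hand-written SIMD kernel; the function name and shapes are assumptions.

```python
import numpy as np

def centroid_interaction(q_c_scores, doc_centroid_ids):
    """Approximate a document's score from precomputed query-vs-centroid
    dot products.
    q_c_scores: (n_query_terms, n_centroids) score matrix.
    doc_centroid_ids: centroid id of each term embedding in the document."""
    cols = q_c_scores[:, doc_centroid_ids]   # gather (n_query_terms, n_doc_terms)
    return cols.max(axis=1).sum()            # column-wise max, then sum
```

The max runs over contiguous gathered columns, which is exactly the access pattern a SIMD column-wise reduction exploits.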
Late interaction with Product Quantization: EMVB uses Product Quantization (PQ) to reduce the memory footprint of storing vector representations while enabling fast late interaction, providing up to 3.6x speedup compared to PLAID's residual compression.
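Product Quantization can be sketched in a few lines: vectors are split into m subspaces, each subspace stores a one-byte code into a small codebook, and query-time dot products become table lookups. This is a generic illustrative PQ sketch under asymmetric scoring, not EMVB's optimized kernel; all names and shapes are assumptions.

```python
import numpy as np

def pq_encode(x, codebooks):
    """codebooks: (m, k, d/m). Return one code per subspace (nearest centroid)."""
    m, k, sub = codebooks.shape
    parts = x.reshape(m, sub)
    return np.array([np.argmin(((codebooks[j] - parts[j]) ** 2).sum(1))
                     for j in range(m)], dtype=np.uint8)

def pq_dot(query, codes, codebooks):
    """Approximate <query, x> via per-subspace lookup tables (asymmetric PQ)."""
    m, k, sub = codebooks.shape
    q_parts = query.reshape(m, sub)
    tables = np.einsum('ms,mks->mk', q_parts, codebooks)  # (m, k) lookup tables
    return tables[np.arange(m), codes].sum()              # m table lookups
```

Storing one byte per subspace instead of full floats is where the memory saving comes from, and the lookup-table scoring is what enables the fast late interaction.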
Per-document term filtering for late interaction: EMVB introduces a dynamic per-document term filtering approach for the late interaction phase, further improving efficiency by up to 30%.
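The per-document term filtering can be sketched as follows: during late interaction, only document terms whose assigned centroid scores well against some query term are decompressed and scored, and the rest are skipped. The threshold criterion and function names here are hypothetical; this is a sketch of the idea, not the paper's dynamic selection rule.

```python
import numpy as np

def filtered_late_interaction(q_emb, doc_emb, q_c_scores, doc_centroid_ids,
                              thresh=0.5):
    """Late interaction (max-sim) over only the 'promising' document terms:
    those whose centroid beats `thresh` for at least one query term
    (hypothetical criterion)."""
    promising = (q_c_scores[:, doc_centroid_ids] >= thresh).any(axis=0)
    kept = doc_emb[promising]          # terms surviving the filter
    if kept.shape[0] == 0:
        return 0.0
    sims = q_emb @ kept.T              # (n_query_terms, n_kept_terms)
    return sims.max(axis=1).sum()      # max per query term, then sum
```

Skipping low-scoring terms shrinks the expensive similarity computation, which is where the reported late-interaction savings come from.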
The authors evaluate EMVB against PLAID on the MS MARCO and LoTTE datasets. Results show that EMVB is up to 2.8x faster than PLAID on the in-domain MS MARCO dataset, while reducing the memory footprint by 1.8x with no loss in retrieval quality. On the out-of-domain LoTTE dataset, EMVB offers up to 2.9x speedup with minimal retrieval quality degradation.
Stats
The MS MARCO dataset used in the experiments contains about 600M d-dimensional vectors, with d = 128.
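To put that collection size in perspective, a back-of-the-envelope calculation shows why compression matters; the PQ setting of 16 one-byte codes per vector below is illustrative, not a figure from the paper.

```python
# Back-of-the-envelope storage for 600M 128-dimensional vectors.
n, d = 600_000_000, 128
raw_fp32_gb = n * d * 4 / 1e9   # uncompressed float32: ~307 GB
pq_gb = n * 16 / 1e9            # PQ with 16 one-byte codes per vector
                                # (illustrative setting): ~10 GB
print(raw_fp32_gb, pq_gb)
```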
EMVB reduces the memory footprint by 1.8x compared to PLAID on the MS MARCO dataset.