Key Concepts
SPLATE is a lightweight adaptation of the ColBERTv2 model that learns a sparse vocabulary-based representation, enabling efficient candidate generation for late interaction retrieval pipelines.
Summary
The paper introduces SPLATE, a novel approach that efficiently implements the candidate generation step in late interaction retrieval pipelines built on contextual language models such as ColBERTv2.
The key insights are:
- SPLATE adapts the frozen token embeddings of ColBERTv2 by learning a lightweight "MLM adapter" module that maps the dense representations to a sparse vocabulary space. This allows SPLATE to leverage traditional sparse retrieval techniques for the candidate generation step.
- By bridging the gap between sparse and dense retrieval models, SPLATE can provide ColBERTv2 with a set of candidate documents to re-rank, while being particularly efficient in mono-CPU environments.
- Experiments show that SPLATE can achieve comparable effectiveness to the original ColBERTv2 pipeline, while greatly reducing the latency of the candidate generation step (e.g., from 186ms to around 10ms on MS MARCO).
- SPLATE also offers more interpretability, as the candidate generation operates directly in the vocabulary space, unlike previous optimized late interaction pipelines.
Overall, SPLATE demonstrates how to efficiently integrate sparse and dense retrieval models, providing a practical solution to deploy ColBERTv2-like architectures in resource-constrained environments.
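The "MLM adapter" idea above can be illustrated with a small sketch. This is a hedged toy reconstruction, not the paper's implementation: it assumes a SPLADE-style head in which frozen contextual token embeddings are projected onto the vocabulary, passed through a log(1 + ReLU) saturation, max-pooled over tokens, and finally pruned to the top-k vocabulary terms (kq for queries, kd for documents). All shapes, weights, and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, DIM, VOCAB = 8, 16, 100                  # toy sizes, not the paper's

token_embs = rng.normal(size=(SEQ_LEN, DIM))      # stand-in for frozen ColBERTv2 token embeddings
W_mlm = rng.normal(size=(DIM, VOCAB))             # lightweight adapter weights (learned in the paper)
b_mlm = np.zeros(VOCAB)

logits = token_embs @ W_mlm + b_mlm               # project each token onto the vocabulary
weights = np.log1p(np.maximum(logits, 0.0))       # SPLADE-style activation: log(1 + ReLU)
sparse_vec = weights.max(axis=0)                  # max-pool over tokens -> one vocab-space vector

def top_k_prune(vec, k):
    """Keep only the k largest vocabulary weights (kq for queries, kd for documents)."""
    pruned = np.zeros_like(vec)
    idx = np.argsort(vec)[-k:]
    pruned[idx] = vec[idx]
    return pruned

query_rep = top_k_prune(sparse_vec, k=5)          # e.g. kq = 5, as in the reported settings
```

Because the resulting vector lives in the vocabulary space, it can be indexed and searched with standard inverted-index machinery, which is what makes the candidate generation step cheap on CPU.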
Statistics
SPLATE with (kq, kd) = (5, 50) can retrieve over 90% of ColBERTv2's top-10 documents in its top-50 candidates.
SPLATE (e2e) with (kq, kd) = (10, 100) and k = 50 achieves 40.0 MRR@10 on MS MARCO dev, on par with ColBERTv2 and PLAID ColBERTv2.
The mean response time for SPLATE (R) with (kq, kd) = (5, 50) is only 2.9ms, compared to 186ms reported for the PLAID ColBERTv2 pipeline.
Quotes
"SPLATE is motivated by two core ideas: 1. PLAID [38] draws inspiration from traditional BoW retrieval to optimize the late interaction pipeline; 2. dense embeddings can seemingly be mapped to the vocabulary space [36]."
"By adapting ColBERT's frozen dense representations with a SPLADE module, SPLATE aims to approximate late interaction scoring with an efficient sparse dot product."
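The two-stage pipeline described in the quote can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's code: sparse vocabulary-space vectors score candidates with a cheap dot product, and the shortlist is then re-ranked with ColBERT-style late interaction (MaxSim over token embeddings). All data is random and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, DIM, N_DOCS, DOC_LEN, Q_LEN = 50, 8, 20, 6, 4

# Sparse vocab-space representations (as an MLM adapter would produce)
q_sparse = rng.random(VOCAB) * (rng.random(VOCAB) < 0.1)
d_sparse = rng.random((N_DOCS, VOCAB)) * (rng.random((N_DOCS, VOCAB)) < 0.1)

# Stage 1: candidate generation via an efficient sparse dot product
cand_scores = d_sparse @ q_sparse
candidates = np.argsort(cand_scores)[-5:][::-1]   # top-5 candidate documents

# Stage 2: late interaction re-ranking of the shortlist
q_tokens = rng.normal(size=(Q_LEN, DIM))          # dense query token embeddings
d_tokens = rng.normal(size=(N_DOCS, DOC_LEN, DIM))

def maxsim(q, d):
    """ColBERT late interaction: sum over query tokens of the max token similarity."""
    return (q @ d.T).max(axis=1).sum()

reranked = sorted(candidates, key=lambda i: maxsim(q_tokens, d_tokens[i]), reverse=True)
```

The key efficiency point is that stage 1 only touches nonzero vocabulary entries, while the expensive MaxSim computation in stage 2 runs over a small candidate set rather than the whole collection.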