Core Concepts
Unsupervised Multilingual Dense Retrieval (UMR) uses generative pseudo labeling to train multilingual dense retrievers without paired data.
Abstract
Dense retrieval methods show promise in multilingual information retrieval.
UMR introduces an unsupervised approach for training multilingual dense retrievers.
The framework consists of two stages: unsupervised multilingual reranking and knowledge-distilled retriever training.
Experimental results on XOR-TYDI QA demonstrate the effectiveness of UMR.
Contributions include proposing UMR, outperforming supervised baselines, and analyzing the impact of different components.
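The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the reranker scores each retrieved passage by the log-likelihood a multilingual language model assigns to the query given that passage, and that the retriever is distilled with a KL-divergence objective. Both are common choices but are assumptions here, and the function names and numeric values are hypothetical.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(query_log_likelihoods, temperature=1.0):
    """Stage 1 (sketch): turn per-passage log-likelihoods into a soft
    target distribution over the retrieved passages.
    Assumption: each value is log p(query | passage) from a language model."""
    return softmax(query_log_likelihoods, temperature)

def distillation_loss(teacher_probs, retriever_scores):
    """Stage 2 (sketch): KL(teacher || student), where the student
    distribution is a softmax over the retriever's similarity scores.
    Assumption: KL divergence is the distillation objective."""
    student_probs = softmax(retriever_scores)
    return sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )

# Hypothetical log-likelihoods for three retrieved passages.
log_likelihoods = [-12.0, -15.5, -20.0]
targets = pseudo_labels(log_likelihoods)

# Hypothetical retriever similarity scores for the same passages.
loss = distillation_loss(targets, [0.3, 0.9, 0.1])
```

In this sketch the loss is zero when the retriever's score distribution matches the pseudo labels and grows as the two diverge, so minimizing it moves the retriever toward the reranker's ranking without any human-annotated query-passage pairs.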
Stats
Dense retrieval methods have shown promising performance in multilingual information retrieval.
UMR introduces an unsupervised approach for training multilingual dense retrievers.
Experimental results on XOR-TYDI QA show the effectiveness of UMR.
Quotes
"Our approach leverages the sequence likelihood estimation capabilities of multilingual language models to acquire pseudo labels for training dense retrievers."
"UMR outperforms supervised baselines, showcasing the potential of training multilingual retrievers without paired data."