Core Concept
QueryBuilder, an interactive system, allows novice users to efficiently create fine-grained queries for cross-lingual information retrieval by leveraging an English development corpus and a combination of probabilistic and neural information retrieval models.
Summary
The QueryBuilder system addresses the challenge of rapid and efficient corpus exploration and query generation for users dealing with overarching analytic tasks. It provides a novel user interface that displays relevant sentences to the user and allows the user to provide relevance feedback, which is then used to refine the queries.
The system uses a combination of a fast probabilistic information retrieval model and a BERT-based neural information retrieval model to retrieve relevant sentences. The probabilistic model uses both lexical and event-related features, while the neural model captures high-level semantic meaning.
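To make the idea of a fast probabilistic retrieval model concrete, here is a minimal sketch that scores sentences with Okapi BM25. This is an assumption for illustration only: the summary does not say which probabilistic model QueryBuilder uses, and the event-related features it mentions are not modeled here. The function name `bm25_scores` and the parameters `k1` and `b` are hypothetical choices with commonly used default values.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (lexical features only)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for d in tokenized for t in set(d))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization dampens the advantage of long documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "protest in the capital",
    "election results announced",
    "protest leaders arrested after protest",
]
scores = bm25_scores("protest", docs)
```

A scorer of this kind only needs term counts, which is what makes near real-time lexical search feasible. Capturing higher-level semantic similarity is the role the summary assigns to the neural model.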
The user workflow involves two main steps:
- Initial Query Creation: The user uses keywords or phrases to search the English development corpus and identifies a small set of representative relevant sentences.
- Query Enrichment: The user leverages the BERT-based neural IR system to retrieve sentences similar to the existing set of relevant sentences, allowing them to quickly curate a larger and richer set of sentences to provide more context for the query.
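The two steps above can be sketched as follows. This is a toy illustration under stated assumptions, not the system's implementation: `embed` uses bag-of-words counts as a stand-in for the BERT-based sentence representation, and `keyword_search` and `enrich` are hypothetical helper names.

```python
import math
from collections import Counter

def embed(sentence):
    """Toy stand-in for a neural sentence encoder: a bag-of-words count vector."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def keyword_search(corpus, keywords):
    """Step 1: return sentences that contain any of the keywords."""
    wanted = {k.lower() for k in keywords}
    return [s for s in corpus if wanted & set(s.lower().split())]

def enrich(corpus, relevant, top_k=2):
    """Step 2: rank unseen sentences by similarity to the relevant set."""
    centroid = Counter()
    for s in relevant:
        centroid.update(embed(s))
    candidates = [s for s in corpus if s not in relevant]
    ranked = sorted(candidates, key=lambda s: cosine(embed(s), centroid), reverse=True)
    return ranked[:top_k]

corpus = [
    "Flooding displaced thousands of residents in the coastal region",
    "Heavy rain and flooding closed roads in the coastal region",
    "The central bank raised interest rates again",
    "Residents were evacuated after heavy rain caused the river to overflow",
]
seed = keyword_search(corpus, ["flooding"])       # initial query creation
suggestions = enrich(corpus, seed, top_k=1)       # query enrichment
```

In this example the enrichment step surfaces the evacuation sentence, which never mentions the keyword "flooding" but shares vocabulary with the seed sentences. A neural encoder would extend the same idea to sentences that share meaning rather than words.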
Experiments on an Arabic-English cross-lingual information retrieval task show that with a small amount of effort (at most 10 minutes per sub-topic), novice users can form useful fine-grained queries that outperform queries based on the overarching task alone by about 12% in nDCG. The system also offers useful capabilities compared with the more labor-intensive, expert-driven query generation process followed at NIST.
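For readers unfamiliar with the metric, nDCG (normalized discounted cumulative gain) rewards rankings that place relevant documents near the top. The sketch below uses the common linear-gain, log2-discount formulation. That choice is an assumption, since the summary does not state which nDCG variant or cutoff the experiments used. For simplicity, the ideal ranking is computed from the judged list passed in rather than from all relevant documents in the collection.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance labels of retrieved documents, in ranked order (1 = relevant).
imperfect = ndcg([1, 0, 1])   # a relevant document is ranked below an irrelevant one
perfect = ndcg([1, 1, 0])     # all relevant documents ranked first
```

A ranking with every relevant document at the top scores 1.0, and pushing a relevant document down the list lowers the score.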
Statistics
The English document collection consists of about 750K news articles.
There are 8 overarching analytic tasks, each with 5-9 sub-topics (analytic requests), for a total of 54 analytic requests.
The Arabic document collection consists of about 865K documents.
Quotes
"QueryBuilder performs near real-time retrieval of documents based on user-entered search terms; the user looks through the retrieved documents and marks sentences as relevant to the information needed."
"The final product is a fine-grained query used in Cross-Lingual Information Retrieval (CLIR)."
"Our experiments using analytic tasks and requests from the IARPA BETTER IR datasets show that with a small amount of effort (at most 10 minutes per sub-topic), novice users can form useful fine-grained queries including in languages they don't understand."