Key Concepts
Retrieving the tables that contain the information needed to answer a question is critical to open-domain question-answering (QA) systems over tables. However, many questions require retrieving multiple tables and joining them through a join plan that cannot be discerned from the user query itself. The proposed method uncovers useful join relations during table retrieval to improve retrieval performance.
Summary
The paper addresses the problem of table retrieval for open-domain question answering, a critical step in using structured tables as a knowledge source. Previous methods assume the answer can be found either in a single table or in multiple tables identified through query decomposition or rewriting. These approaches are insufficient, because many questions require retrieving multiple tables and joining them through a join plan that cannot be discerned from the user query.
To address this, the paper introduces a method that uncovers useful join relations for any query and database during table retrieval. It uses a novel re-ranking approach, formulated as a mixed-integer program, that considers not only table-query relevance but also table-table relevance, which requires inferring join relationships.
The key components are:
- Query-table relevance: The method computes both coarse-grained and fine-grained relevance scores between the query and candidate tables.
- Table-table relevance: The method computes compatibility scores between tables to infer join relationships, considering both column relevance and the likelihood of satisfying key-foreign key constraints.
- Re-ranking: The method formulates an optimization problem to select the best set of tables by jointly maximizing the query-table relevance and table-table relevance.
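The joint selection described in the three components above can be illustrated with a small sketch. This is not the paper's mixed-integer program: it swaps in an exhaustive search over table subsets, which is only practical for a handful of candidates. The function name `select_tables`, the `weight` parameter, the fixed subset size `k`, and all scores and table names are hypothetical values chosen for illustration.

```python
from itertools import combinations


def select_tables(query_scores, pair_scores, k, weight=1.0):
    """Pick the k-table subset that maximizes query relevance plus join compatibility.

    query_scores: dict mapping table name -> query-table relevance (assumed given).
    pair_scores:  dict mapping frozenset({a, b}) -> table-table compatibility
                  (assumed given; missing pairs count as 0.0).
    Exhaustive search stands in for the mixed-integer program in the summary.
    """
    best_subset, best_value = None, float("-inf")
    for subset in combinations(sorted(query_scores), k):
        relevance = sum(query_scores[t] for t in subset)
        compatibility = sum(
            pair_scores.get(frozenset(pair), 0.0)
            for pair in combinations(subset, 2)
        )
        value = relevance + weight * compatibility
        if value > best_value:
            best_subset, best_value = subset, value
    return set(best_subset), best_value


# Hypothetical scores: "album" looks more relevant to the query than "concert",
# but "concert" is the table that joins "singer" with "stadium".
query_scores = {"singer": 0.9, "concert": 0.4, "stadium": 0.5, "album": 0.6}
pair_scores = {
    frozenset({"singer", "concert"}): 0.8,
    frozenset({"concert", "stadium"}): 0.7,
    frozenset({"singer", "album"}): 0.1,
}

chosen, value = select_tables(query_scores, pair_scores, k=3)
print(sorted(chosen))
```

With these made-up numbers, ranking by query relevance alone would return `singer`, `album`, and `stadium`, whereas the joint objective selects `concert`, `singer`, and `stadium`, because the bridging table contributes high compatibility with both of its neighbors. A real instance would hand the same objective to a mixed-integer solver rather than enumerate subsets.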
The proposed method outperforms state-of-the-art approaches for table retrieval by up to 9.3% in F1 score and for end-to-end QA by up to 5.4% in accuracy on the Spider and Bird datasets.
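For readers unfamiliar with how an F1 score applies to table retrieval, the following sketch computes it over sets of table names. It assumes a set-based definition (precision and recall over retrieved versus gold tables for one question); the summary does not state the exact evaluation protocol, and `retrieval_f1` and the example tables are hypothetical.

```python
def retrieval_f1(retrieved, gold):
    """Set-based F1 between retrieved and gold table names (assumed definition)."""
    retrieved, gold = set(retrieved), set(gold)
    hits = len(retrieved & gold)
    if hits == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = hits / len(retrieved)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two of three retrieved tables are correct, and two of three gold tables are found.
score = retrieval_f1(
    {"singer", "concert", "album"},
    {"singer", "concert", "stadium"},
)
print(round(score, 3))
```

Here precision and recall are both 2/3, so the F1 score is also 2/3.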
Statistics
Many questions require retrieving and joining multiple tables, but the join plan cannot be discerned from the user query.
The proposed method outperforms baselines by up to 9.3% in F1 score for table retrieval and up to 5.4% in accuracy for end-to-end QA.
Quotes
"To answer a question or verify a fact relevant to structural tables, we may need multiple tables."
"Retrieving multiple tables requires understanding the relationship among candidate tables, which is independent of how the query is framed."
"The relationship among tables needs to be inferred. Previous works in Text-to-SQLs have considered table joins when generating SQL expressions, but they all assumed that the key-foreign-key constraints (join relationships) had already been provided with the database."