Key Concepts
The robustness of state-of-the-art tabular question answering models is limited when applied to scientific tables and associated text, as demonstrated by the low performance on the SciTabQA dataset.
Summary
The authors introduce a new dataset called "SciTabQA" for evaluating tabular question answering (QA) models on scientific tables and associated text. The dataset consists of 822 question-answer pairs from 198 table-description pairs in the computer science domain.
The authors benchmark three state-of-the-art tabular QA models - TAPAS, TAPEX, and OmniTab - on the SciTabQA dataset. The best model achieves an F1 score of only 0.462, indicating the challenging nature of the dataset.
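The F1 scores reported here are presumably the token-overlap F1 standard in QA evaluation, which credits partial matches between a predicted answer and the gold answer. A minimal sketch of that metric (not the authors' exact scorer) is:

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string,
    as commonly used in QA evaluation (a sketch, not the paper's scorer)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Empty-answer edge case: score 1.0 only if both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

An average of this per-question score around 0.462 means predictions typically recover fewer than half of the gold answer tokens.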
Surprisingly, the authors observe that adding the table caption and description information to the input actually degrades the overall performance of the models. However, they find that for questions that require both textual and tabular information to answer, adding the caption and description helps improve the performance.
The authors also analyze the effect of input truncation on the models' performance and find that it is a significant factor, especially for the TAPAS model. They further show that the models trained on the SciTabQA dataset perform much better than those directly transferred from the WikiTableQuestions dataset.
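Truncation matters because encoder models such as TAPAS accept a fixed input budget (typically 512 tokens), so long scientific tables lose cells before the model ever sees them. A hypothetical sketch of the effect follows; the serialization scheme and function names are illustrative, not the models' actual preprocessing:

```python
def serialize_and_truncate(table, caption, max_tokens=512):
    """Flatten a table (list of rows of cells) plus its caption into a
    whitespace-token list, cut to a fixed budget. Illustrates how large
    tables overflow a fixed-length encoder input; not the real tokenizer."""
    tokens = caption.split()
    for row in table:
        for cell in row:
            tokens.extend(str(cell).split())
    truncated = len(tokens) > max_tokens
    return tokens[:max_tokens], truncated
```

For a table of, say, 100 rows and 10 cells each, everything past the budget is silently dropped, so answers located in later rows become unanswerable regardless of model quality.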
Statistics
The dataset consists of 822 question-answer pairs from 198 table-description pairs in the computer science domain.