Visual Grounding Methods for Visual Question Answering Fail to Improve Performance for the Right Reasons
Existing visual grounding methods for Visual Question Answering (VQA) improve performance not through better visual grounding, but through a regularization effect that prevents overfitting to linguistic priors.