The article introduces a novel approach to knowledge-based visual question answering (KB-VQA) by proposing a Knowledge Condensation model and a Knowledge Reasoning model. The Knowledge Condensation model distills relevant information from retrieved lengthy passages, while the Knowledge Reasoning model integrates this condensed knowledge to predict answers accurately. Experimental results show significant performance improvements compared to existing methods on OK-VQA and A-OKVQA datasets.
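The condense-then-reason pipeline described above can be sketched in a toy form. All names, the word-overlap relevance scoring, and the answer step below are illustrative assumptions, not the paper's actual models:

```python
# Hypothetical sketch of a two-stage KB-VQA pipeline.
# The real Knowledge Condensation / Knowledge Reasoning models are learned;
# here, word overlap and string joining stand in for them.

def condense_knowledge(question, passages, top_k=2):
    """Toy condensation: keep the top_k passages sharing the most words
    with the question (a crude stand-in for learned relevance)."""
    q_words = set(question.lower().split())
    scored = sorted(
        passages,
        key=lambda p: -len(q_words & set(p.lower().split())),
    )
    return scored[:top_k]

def reason_over_knowledge(question, condensed):
    """Toy reasoning: concatenate the condensed evidence that an answer
    predictor would consume alongside the image and question."""
    return " ".join(condensed)

passages = [
    "A golden retriever is a breed of dog.",
    "The Eiffel Tower is in Paris.",
    "Dogs are domesticated mammals.",
]
question = "What breed of dog is this?"
condensed = condense_knowledge(question, passages)
context = reason_over_knowledge(question, condensed)
```

The point of the sketch is the interface: lengthy retrieved passages are reduced to a compact, question-relevant subset before any answer prediction happens.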
Key insights distilled from the paper by Dongze Hao, J... (arxiv.org, 03-18-2024): https://arxiv.org/pdf/2403.10037.pdf