A Comprehensive Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
PDF-MVQA is a new dataset for examining semantically hierarchical layout structures in text-dominant documents. It supports the development of models that can navigate and interpret real-world documents at the multi-page or whole-document level.