High-Resolution Visual Document Assistant (HRVDA): Bridging the Gap Between Multimodal Language Models and Visual Document Understanding
HRVDA bridges the gap between multimodal large language models (MLLMs) and visual document understanding. It uses two components, a content filtering mechanism and an instruction filtering module, to process high-resolution document images efficiently.
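The summary names the two filtering stages but gives no detail on how they work, so the following is only a minimal sketch of the general idea, not HRVDA's implementation. It assumes that content filtering discards image patches with no visible content and that instruction filtering keeps the patches most relevant to the user's instruction. The variance threshold, the cosine-similarity ranking, and all function and parameter names (`content_filter`, `instruction_filter`, `min_variance`, `keep`) are hypothetical stand-ins for what would be learned modules in a real model.

```python
import math


def patch_variance(patch):
    """Variance of the pixel intensities in one patch (a flat list of floats)."""
    mean = sum(patch) / len(patch)
    return sum((p - mean) ** 2 for p in patch) / len(patch)


def content_filter(patches, min_variance=1e-3):
    """Return indices of patches that appear to carry content.

    Assumption: near-uniform patches (blank page regions) have close to
    zero variance. This heuristic is illustrative only.
    """
    return [i for i, p in enumerate(patches) if patch_variance(p) > min_variance]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def instruction_filter(indices, patch_embeddings, instruction_embedding, keep=2):
    """Among the surviving patches, keep the `keep` most similar to the instruction."""
    ranked = sorted(
        indices,
        key=lambda i: cosine(patch_embeddings[i], instruction_embedding),
        reverse=True,
    )
    return sorted(ranked[:keep])  # restore reading order


# Toy data: five 2x2 patches (flattened) and made-up 2-d embeddings.
patches = [
    [1.0, 1.0, 1.0, 1.0],  # blank
    [0.0, 1.0, 0.0, 1.0],
    [0.2, 0.9, 0.1, 0.8],
    [1.0, 1.0, 1.0, 1.0],  # blank
    [0.0, 0.0, 1.0, 1.0],
]
patch_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.6, 0.8]]
instruction_embedding = [0.0, 1.0]

kept = content_filter(patches)
selected = instruction_filter(kept, patch_embeddings, instruction_embedding, keep=2)
```

In this toy run the two blank patches are dropped first, and the instruction stage then picks from the remaining three. Running the cheap content stage before the instruction stage means the similarity ranking only has to score patches that survived the first pass, which is where the efficiency gain on high-resolution inputs would come from under these assumptions.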