JDocQA is a large-scale Japanese document question answering dataset that requires understanding of both textual and visual information to answer questions.
ViTLP proposes visually guided generative text-layout pre-training to enhance document understanding by optimizing hierarchical language and layout modeling objectives.
ANLS* is a versatile metric for evaluating generative models in document processing tasks.
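ANLS* generalizes the standard ANLS (Average Normalized Levenshtein Similarity) metric commonly used in document VQA. As a point of reference, here is a minimal sketch of plain ANLS: each prediction is scored against its reference answers by normalized edit distance, scores below a similarity threshold (conventionally 0.5) are zeroed out, and the per-question maxima are averaged. Function names and the exact thresholding convention here are illustrative, not taken from the ANLS* paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,        # deletion
                curr[j - 1] + 1,    # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def anls(predictions: list[str], references: list[list[str]], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a batch of questions.

    For each question, take the best similarity against any reference answer;
    similarities whose normalized distance exceeds `tau` are zeroed.
    """
    total = 0.0
    for pred, refs in zip(predictions, references):
        best = 0.0
        for ref in refs:
            denom = max(len(pred), len(ref)) or 1
            nl = levenshtein(pred.lower(), ref.lower()) / denom
            if nl < tau:
                best = max(best, 1.0 - nl)
        total += best
    return total / len(predictions) if predictions else 0.0
```

For example, an exact match scores 1.0, while a one-character slip on a five-character answer scores 0.8; ANLS* extends this family of scores to handle unanswerable questions and list-valued answers in generative settings.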