1. Abstract:
Transformer-based language models struggle with long sequences because the cost of self-attention grows quadratically with input length.
GRAM extends single-page document models to multi-page settings without requiring computationally intensive pretraining.
2. Introduction:
Most DocVQA research targets single-page documents, whereas multi-page DocVQA (MP-DocVQA) better reflects real-world documents.
MP-DocVQA has received limited attention, partly due to the scarcity of suitable datasets.
3. GRAM:
Introduces learnable document tokens (doc tokens) that enable global reasoning across pages.
A bias adaptation method amplifies the contribution of the doc tokens during finetuning (both mechanisms are sketched below).
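A minimal PyTorch sketch of how doc tokens and bias adaptation could fit together, based only on the summary above. The class name, the single-head global attention step, the number of doc tokens, and the exact placement of the learned bias are illustrative assumptions, not GRAM's actual implementation (the paper interleaves per-page and global layers; here everything is collapsed into one global step for clarity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalDocAttention(nn.Module):
    """Hypothetical sketch: each page's encoded tokens are prefixed with
    learnable doc tokens, a single global attention step lets doc tokens
    exchange information across pages, and a learnable bias ("bias
    adaptation") raises the attention paid to doc-token positions."""

    def __init__(self, dim: int, num_doc_tokens: int = 8):
        super().__init__()
        self.num_doc_tokens = num_doc_tokens
        self.doc_tokens = nn.Parameter(torch.randn(num_doc_tokens, dim) * 0.02)
        self.qkv = nn.Linear(dim, 3 * dim)
        # Bias adaptation (assumed form): one learnable logit added to
        # attention scores toward doc tokens, tuned during finetuning.
        self.doc_bias = nn.Parameter(torch.zeros(1))

    def forward(self, pages: torch.Tensor) -> torch.Tensor:
        # pages: (num_pages, page_len, dim), already encoded per page.
        P, L, D = pages.shape
        doc = self.doc_tokens.unsqueeze(0).expand(P, -1, -1)   # (P, T, D)
        x = torch.cat([doc, pages], dim=1)                     # (P, T+L, D)

        # Flatten all pages so doc tokens can attend globally.
        x = x.reshape(1, P * (self.num_doc_tokens + L), D)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) / D ** 0.5            # (1, N, N)

        # Add the learned bias to logits at doc-token key positions.
        is_doc = torch.zeros(P, self.num_doc_tokens + L,
                             dtype=torch.bool, device=pages.device)
        is_doc[:, : self.num_doc_tokens] = True
        attn = attn + self.doc_bias * is_doc.reshape(1, 1, -1)

        out = F.softmax(attn, dim=-1) @ v                      # (1, N, D)
        out = out.reshape(P, self.num_doc_tokens + L, D)
        return out[:, self.num_doc_tokens:]                    # page tokens only

# Usage: 3 pages of 50 tokens each, hidden size 64 -> (3, 50, 64)
layer = GlobalDocAttention(dim=64)
out = layer(torch.randn(3, 50, 64))
```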
4. Experiments:
GRAM outperforms existing methods on the MP-DocVQA and DUDE benchmarks.
An ablation study quantifies the contribution of the doc tokens, the bias adaptation, and the compression transformer to performance (a sketch of the compression step follows).
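A hedged sketch of what a compression transformer could look like: a small set of learnable latent queries cross-attends to a long page sequence and returns a much shorter one, cutting the cost of downstream global reasoning. The class name, latent count, and head count are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CompressionTransformer(nn.Module):
    """Hypothetical sketch: compress a long token sequence into a fixed,
    smaller number of latent tokens via cross-attention."""

    def __init__(self, dim: int, num_latents: int = 32, num_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, page_tokens: torch.Tensor) -> torch.Tensor:
        # page_tokens: (batch, page_len, dim) -> (batch, num_latents, dim)
        B = page_tokens.size(0)
        q = self.latents.unsqueeze(0).expand(B, -1, -1)
        compressed, _ = self.cross_attn(q, page_tokens, page_tokens)
        return self.norm(q + compressed)

# Usage: 512 page tokens compressed to 32 latents -> (2, 32, 768)
c_former = CompressionTransformer(dim=768)
short_seq = c_former(torch.randn(2, 512, 768))
```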
5. Conclusion:
GRAM efficiently handles multi-page documents without extensive pretraining.
Statistics:
"Extensive experiments showcase GRAM’s state-of-the-art performance."
"Proposed NLP-based solutions can be divided into two main directions."
"Introduced document learnable tokens and bias adaptation."
"Results for DUDE can be broken apart to several types of questions."