Hierarchical Multimodal Pre-training for Enhanced Webpage Understanding
The authors introduce WebLM, a multimodal pre-training network designed to improve understanding of visually rich webpages by integrating their hierarchical structure. Empirical results show that WebLM outperforms previous models on webpage understanding tasks.