Enhancing Web Navigation with Dual-View Contextualized Representations of HTML Elements
Representing each HTML element through a "dual view" of its textual and visual context in the webpage screenshot can significantly improve the performance of web navigation agents.