WebVoyager is a web agent powered by a Large Multimodal Model (LMM) that completes user instructions end-to-end by interacting with real-world websites. The work also introduces a new benchmark for evaluating open-ended web agents, on which WebVoyager performs strongly and reliably. At each step, the agent observes the current page through a screenshot and its textual content, then chooses an action such as clicking, typing, or scrolling. Because it uses both visual and textual signals, WebVoyager outperforms the baselines it is compared against across a range of website tasks. The study also proposes an automatic evaluation protocol that uses GPT-4V to judge whether an online agent has completed its task.
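The observe-act cycle and the GPT-4V-based evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text action format (e.g. `click [3]`), the names `parse_action`, `run_agent` and `auto_evaluate`, the `last_k` parameter, and the `observe`/`model`/`execute`/`judge` callables are all hypothetical stand-ins for a real browser driver and a real multimodal model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical type: an observation is (screenshot bytes, page text).
Observation = Tuple[bytes, str]


@dataclass
class Action:
    kind: str              # "click", "type", "scroll" or "answer"
    target: Optional[str]  # element label such as "3", "WINDOW", or None
    text: Optional[str]    # typed text, scroll direction, or final answer


# Hypothetical action grammar (not the paper's exact format), e.g.
# "click [3]", "type [5]; hello", "scroll [WINDOW]; down", "answer; 42".
_ACTION_RE = re.compile(
    r"^(click|type|scroll|answer)\s*(?:\[(\w+)\])?\s*(?:;\s*(.*))?$",
    re.IGNORECASE,
)


def parse_action(reply: str) -> Action:
    """Turn the model's one-line text reply into a structured action."""
    match = _ACTION_RE.match(reply.strip())
    if match is None:
        raise ValueError(f"unparseable action: {reply!r}")
    kind, target, text = match.groups()
    return Action(kind.lower(), target, text)


def run_agent(
    task: str,
    observe: Callable[[], Observation],
    model: Callable[[str, bytes, str, List[str]], str],
    execute: Callable[[Action], None],
    max_steps: int = 15,
) -> Tuple[Optional[str], List[bytes]]:
    """Observe the page, ask the model for one action, execute it; repeat."""
    history: List[str] = []
    screenshots: List[bytes] = []
    for _ in range(max_steps):
        screenshot, page_text = observe()      # visual + textual signals
        screenshots.append(screenshot)
        reply = model(task, screenshot, page_text, history)
        history.append(reply)
        action = parse_action(reply)
        if action.kind == "answer":            # agent declares it is done
            return action.text, screenshots
        execute(action)                        # click / type / scroll
    return None, screenshots                   # step budget exhausted


def auto_evaluate(
    task: str,
    screenshots: List[bytes],
    answer: Optional[str],
    judge: Callable[[str, List[bytes], Optional[str]], str],
    last_k: int = 3,
) -> bool:
    """Ask a multimodal judge (e.g. GPT-4V) whether the task was completed."""
    verdict = judge(task, screenshots[-last_k:], answer)
    return verdict.strip().upper() == "SUCCESS"


if __name__ == "__main__":
    # Stubbed model and browser so the loop runs without network access.
    replies = iter(["type [1]; WebVoyager", "click [2]", "answer; found it"])
    executed: List[Action] = []
    answer, shots = run_agent(
        "Find the paper",
        lambda: (b"<png bytes>", "[1] search box [2] search button"),
        lambda task, shot, text, hist: next(replies),
        executed.append,
    )
    print(answer)                         # found it
    print([a.kind for a in executed])     # ['type', 'click']
    print(auto_evaluate("Find the paper", shots, answer,
                        lambda t, s, a: "SUCCESS"))  # True
```

Passing the browser, the model, and the judge in as plain callables keeps the loop testable with stubs; a real setup would wrap a browser automation tool and an LMM API behind these functions.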
Key insights distilled from:
by Hongliang He... on arxiv.org, 03-01-2024
https://arxiv.org/pdf/2401.13919.pdf