Large multimodal models like GPT-4V can serve as powerful generalist web agents, as demonstrated by SEEACT, which integrates visual understanding with acting on the web.
Large language model-based agents are evaluated on their ability to perform knowledge-work tasks on the ServiceNow platform; the results highlight a performance gap and the need for further exploration.
Large Multimodal Models (LMMs) empower WebVoyager to excel in real-world web tasks.