Key Concept
Large multimodal models such as GPT-4V can serve as powerful generalist web agents, as demonstrated by SEEACT, which integrates visual understanding with acting on the web.
Statistics
"GPT-4V presents a great potential for web agents—it can successfully complete 51.1% of tasks on live websites."