Core Concept
Large multimodal models like GPT-4V can serve as powerful generalist web agents, as demonstrated by SEEACT's integration of visual understanding and acting on the web.
Statistics
"GPT-4V presents a great potential for web agents—it can successfully complete 51.1% of tasks on live websites."