Large multimodal models like GPT-4V can serve as powerful generalist web agents, as demonstrated by SEEACT's integration of visual understanding and acting on the web.