Benchmarking Multihop Multimodal Internet Agents for Realistic Web Tasks
Autonomous embodied agents can navigate and complete complex user tasks by hopping across evolving, real-world multimodal websites, but current state-of-the-art models struggle with the long-chain multihop reasoning that such tasks require.