Navigation with VLM Framework: Achieving Human-Level Exploration in Open Scenes for Any Language Goal
This paper introduces NavVLM, a framework that uses Vision-Language Models (VLMs) to let an agent navigate toward any language goal, whether specific or non-specific, in open scenes without prior training, and that achieves human-level exploration performance.