Key challenges include understanding free-form natural language instructions and generalizing to unseen environments in a zero-shot manner.
OpenFMNav leverages foundation models for effective language-guided exploration and exploitation.
Introduction:
Object navigation, locating a specified object in an unseen environment, is a prerequisite for robots to interact with the physical world.
Existing methods face challenges with free-form instructions and generalization to diverse environments.
Related Work:
Embodied navigation tasks vary in goal specifications.
Object navigation is particularly challenging because it requires semantic recognition of target objects, not just geometric path planning.
Method:
The pipeline has four foundation-model modules: ProposeLLM extracts proposed goal objects from the free-form instruction; DiscoverVLM discovers novel candidate objects from observations; PerceptVLM detects and segments objects to build a semantic map; and ReasonLLM conducts common-sense reasoning over that map to decide where to explore next.
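The four-module pipeline above can be sketched as follows. This is a minimal illustrative stub, not the authors' implementation: the real system queries large language and vision-language models, whereas the function names, signatures, and stand-in logic here (keyword matching, dictionary lookups) are assumptions made purely to show the data flow between modules.

```python
# Hypothetical sketch of the OpenFMNav data flow; each stub stands in for
# a foundation-model call in the real system.

def propose_llm(instruction, vocabulary):
    """ProposeLLM: extract proposed goal objects from a free-form instruction.
    Stand-in: keyword matching against a known object vocabulary."""
    words = [w.strip(".,!?").lower() for w in instruction.split()]
    return [w for w in words if w in vocabulary]

def discover_vlm(visible_objects, proposed):
    """DiscoverVLM: surface candidate objects seen in observations that the
    instruction did not mention, enabling open-set goals."""
    return [obj for obj in visible_objects if obj not in proposed]

def percept_vlm(detections, candidates):
    """PerceptVLM: detect and segment candidate objects; here we simply
    filter precomputed (label -> position) detections."""
    return {label: pos for label, pos in detections.items() if label in candidates}

def reason_llm(goal, semantic_map):
    """ReasonLLM: common-sense reasoning over the semantic map to choose the
    next target. Stand-in: return the goal's mapped position if known."""
    return semantic_map.get(goal)

def navigate(instruction, visible_objects, detections, vocabulary):
    """Run one pass of the four-stage pipeline and return a goal position."""
    proposed = propose_llm(instruction, vocabulary)
    discovered = discover_vlm(visible_objects, proposed)
    semantic_map = percept_vlm(detections, proposed + discovered)
    goal = proposed[0] if proposed else None
    return reason_llm(goal, semantic_map)
```

For example, `navigate("Find the sofa in the living room.", ["sofa", "lamp"], {"sofa": (3, 4), "lamp": (1, 2)}, {"sofa", "tv"})` proposes `["sofa"]`, discovers `["lamp"]`, maps both, and returns the sofa's position `(3, 4)`.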
Experiments:
OpenFMNav outperforms baselines on success rate (SR) and SPL (Success weighted by Path Length).
Ablation studies show the importance of individual components, including GPT-4, chain-of-thought (CoT) prompting, DiscoverVLM, and scoring prompting.
Navigation in the Real World:
Demonstrations on a real robot validate that the method understands free-form instructions and performs open-set zero-shot navigation effectively.