The paper presents BehAV, a novel approach for autonomous robot navigation in outdoor scenes, guided by human instructions and leveraging Vision-Language Models (VLMs). The key components of BehAV are:
Human Instruction Decomposition: BehAV uses a Large Language Model (LLM) to decompose high-level human instructions into navigation actions, navigation landmarks, behavioral actions, and behavioral targets.
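A minimal sketch of this decomposition step, assuming an OpenAI chat model and a JSON output schema; the prompt wording, model name, and field names below are illustrative assumptions, not the authors' actual prompt:

```python
# Illustrative sketch: decompose a free-form instruction into the four categories
# (navigation actions/landmarks, behavioral actions/targets) with an LLM.
# The prompt, model name, and JSON keys are assumptions for demonstration only.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

instruction = "Walk to the red building, stay on the sidewalk, and slow down near pedestrians."

prompt = (
    "Decompose the instruction into JSON with keys: "
    "'navigation_actions', 'navigation_landmarks', "
    "'behavioral_actions', 'behavioral_targets'.\n"
    f"Instruction: {instruction}"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
)
decomposed = json.loads(response.choices[0].message.content)
# e.g. {'navigation_actions': ['walk to'], 'navigation_landmarks': ['red building'],
#       'behavioral_actions': ['stay on', 'slow down near'],
#       'behavioral_targets': ['sidewalk', 'pedestrians']}
print(decomposed)
```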
Behavioral Cost Map Generation: BehAV constructs a behavioral cost map that captures both the probable locations of the behavioral targets and the desirability of the associated behavioral actions. This is achieved by using a lightweight VLM (CLIPSeg) to generate segmentation maps for behavioral targets, and then combining them with the behavioral action costs obtained from the LLM.
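An illustrative sketch of such a cost map using the public CLIPSeg checkpoint; the target prompts, action costs, and per-pixel combination rule here are assumptions rather than the paper's exact formulation:

```python
# Illustrative sketch: CLIPSeg segmentation maps for each behavioral target,
# weighted by an LLM-assigned cost for the associated behavioral action.
# The cost values and the max-combination rule are assumptions.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("camera_frame.jpg").convert("RGB")  # hypothetical onboard RGB frame
# behavioral target -> action cost (higher = less desirable), e.g. from the LLM stage
action_costs = {"grass": 0.8, "sidewalk": 0.1, "pedestrian": 1.0}

inputs = processor(
    text=list(action_costs.keys()),
    images=[image] * len(action_costs),
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits           # (num_targets, H, W)
probs = torch.sigmoid(logits)                 # per-target segmentation confidence

weights = torch.tensor(list(action_costs.values())).view(-1, 1, 1)
cost_map = (weights * probs).max(dim=0).values  # worst-case behavioral cost per pixel
```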
Visual Landmark Estimation: BehAV utilizes VLMs to identify landmarks from the navigation instructions and generate navigation goals.
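A hedged sketch of landmark localization using an open-vocabulary detector (OWL-ViT) as a stand-in; the paper's actual VLM pipeline and the projection from pixels to a navigation goal may differ:

```python
# Illustrative sketch: locate a language-specified landmark in the camera frame.
# OWL-ViT is used here as a stand-in open-vocabulary detector; the detected box
# centre gives a pixel goal for a downstream goal-projection module (not shown).
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("camera_frame.jpg").convert("RGB")  # hypothetical onboard RGB frame
landmark = "red building"                              # from the LLM decomposition stage

inputs = processor(text=[[landmark]], images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes
)[0]

if len(detections["scores"]) > 0:
    best = detections["scores"].argmax()
    x0, y0, x1, y1 = detections["boxes"][best].tolist()
    pixel_goal = ((x0 + x1) / 2, (y0 + y1) / 2)  # landmark centre in image coordinates
```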
Behavior-Aware Planning: BehAV introduces a novel unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. The planner also incorporates a behavior-aware gait-switching mechanism that adjusts the robot's gait when specific behavioral instructions call for it.
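A minimal sampling-based MPC sketch that trades off goal progress against the behavioral cost map; the unicycle model, cost weights, and sampling scheme are illustrative assumptions, not BehAV's exact planner:

```python
# Illustrative sketch: sample control sequences, roll out a simple unicycle model,
# and score each rollout by behavioral cost (from the cost map) plus distance to the
# landmark goal. Behavioral guidelines enter as soft costs, not hard constraints.
import numpy as np

def plan(state, goal, cost_map, resolution=0.1, horizon=20, n_samples=256, dt=0.1):
    """state = (x, y, theta); goal = (x, y); cost_map is a 2-D grid at `resolution` m/cell."""
    # Sample candidate control sequences: (linear velocity v, angular velocity w)
    v = np.random.uniform(0.0, 1.0, size=(n_samples, horizon))
    w = np.random.uniform(-1.0, 1.0, size=(n_samples, horizon))

    x = np.full(n_samples, state[0])
    y = np.full(n_samples, state[1])
    th = np.full(n_samples, state[2])
    cost = np.zeros(n_samples)

    for t in range(horizon):
        # Roll out a simple unicycle model
        x = x + v[:, t] * np.cos(th) * dt
        y = y + v[:, t] * np.sin(th) * dt
        th = th + w[:, t] * dt

        # Behavioral cost: look up each predicted position in the cost map
        rows = np.clip((y / resolution).astype(int), 0, cost_map.shape[0] - 1)
        cols = np.clip((x / resolution).astype(int), 0, cost_map.shape[1] - 1)
        cost += cost_map[rows, cols]

    # Goal-reaching cost at the end of the horizon
    cost += 5.0 * np.hypot(x - goal[0], y - goal[1])

    best = np.argmin(cost)
    return v[best, 0], w[best, 0]   # apply the first control of the best sampled sequence
```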
The evaluation of BehAV on a quadruped robot across diverse real-world scenarios demonstrates a 22.49% improvement in alignment with human-teleoperated actions, as measured by Fréchet distance, and a 40% higher navigation success rate compared to state-of-the-art methods.
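For reference, the alignment metric can be read as a discrete Fréchet distance between the robot's trajectory and the human-teleoperated one; the snippet below is illustrative, and the authors may use a different variant or sampling:

```python
# Discrete Fréchet distance between two 2-D trajectories (lower = closer alignment).
import numpy as np

def discrete_frechet(P, Q):
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    ca = np.full((n, m), -1.0)  # memoization table

    def c(i, j):
        if ca[i, j] >= 0:
            return ca[i, j]
        d = np.linalg.norm(P[i] - Q[j])
        if i == 0 and j == 0:
            ca[i, j] = d
        elif i == 0:
            ca[i, j] = max(c(0, j - 1), d)
        elif j == 0:
            ca[i, j] = max(c(i - 1, 0), d)
        else:
            ca[i, j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return ca[i, j]

    return c(n - 1, m - 1)

# Example: the planned path tracks the human-driven path to within ~0.2 m.
robot_path = [(0, 0), (1, 0.1), (2, 0.2)]
human_path = [(0, 0), (1, 0.0), (2, 0.0)]
print(discrete_frechet(robot_path, human_path))   # ~0.2
```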