The paper presents BehAV, a novel approach for autonomous robot navigation in outdoor scenes, guided by human instructions and leveraging Vision-Language Models (VLMs). The key components of BehAV are:
Human Instruction Decomposition: BehAV uses a Large Language Model (LLM) to decompose a high-level human instruction into four elements: navigation actions, navigation landmarks, behavioral actions, and behavioral targets (see the decomposition sketch after this list).
Behavioral Cost Map Generation: BehAV constructs a behavioral cost map that captures both the probable locations of the behavioral targets and the desirability of the associated behavioral actions. This is achieved by using a lightweight VLM (CLIPSeg) to generate segmentation maps for the behavioral targets, then weighting them by the behavioral action costs obtained from the LLM (see the cost-map sketch after this list).
Visual Landmark Estimation: BehAV uses VLMs to identify the landmarks named in the navigation instructions and to generate intermediate navigation goals from them (see the landmark sketch after this list).
Behavior-Aware Planning: BehAV introduces a novel unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. The planner also incorporates a behavior-aware gait-switching mechanism that adjusts the robot's gait when specific behavioral instructions apply (see the planner sketch after this list).
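A minimal sketch of the instruction-decomposition step. The prompt wording and the `query_llm` callable are assumptions (stand-ins for whatever LLM endpoint and prompt the authors actually use); only the four output fields come from the summary above.

```python
import json

# Hypothetical prompt template; BehAV's actual prompt is not reproduced here.
DECOMPOSE_PROMPT = """Decompose the following navigation instruction into JSON with keys
"nav_actions", "nav_landmarks", "behavioral_actions", "behavioral_targets".
Instruction: {instruction}
Return only the JSON object."""

def decompose_instruction(instruction, query_llm):
    """Split a human instruction into navigation and behavioral components.

    `query_llm` is an assumed callable: prompt string -> response string.
    """
    response = query_llm(DECOMPOSE_PROMPT.format(instruction=instruction))
    parts = json.loads(response)
    # e.g. "Go to the red building, but stay on the crosswalk" might yield:
    # {"nav_actions": ["go to"], "nav_landmarks": ["red building"],
    #  "behavioral_actions": ["stay on"], "behavioral_targets": ["crosswalk"]}
    return parts
```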
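A sketch of the behavioral cost-map step, using the public Hugging Face CLIPSeg checkpoint (CIDAS/clipseg-rd64-refined). The `action_costs` weights stand in for the LLM-derived behavioral action costs, and the simple weighted sum is an assumption; the paper's exact combination and normalization may differ.

```python
import torch
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

def behavioral_cost_map(image, targets, action_costs):
    """Combine per-target segmentation maps with LLM-derived action costs.

    image: a PIL RGB frame; targets: list of behavioral target phrases;
    action_costs: list of costs in [0, 1], one per target (assumed scale).
    """
    inputs = processor(text=targets, images=[image] * len(targets),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # (N, 352, 352) for N prompts
    if logits.dim() == 2:                # single prompt comes back squeezed
        logits = logits.unsqueeze(0)
    probs = torch.sigmoid(logits)        # where each target probably is
    costs = torch.tensor(action_costs).view(-1, 1, 1)
    return (probs * costs).sum(dim=0).clamp(0.0, 1.0)
```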
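One plausible way to turn a detected landmark into a steering goal, reusing the CLIPSeg model above: threshold the landmark's segmentation mask and convert the mask centroid into a bearing. The pinhole-camera conversion and the `hfov_deg` parameter are assumptions, not the paper's grounding procedure.

```python
import numpy as np

def landmark_goal(image, landmark, hfov_deg=90.0, threshold=0.5):
    """Estimate a heading (radians, left negative) toward a named landmark."""
    inputs = processor(text=[landmark], images=[image], return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask = torch.sigmoid(logits).squeeze().numpy() > threshold
    if not mask.any():
        return None  # landmark not visible in this frame
    ys, xs = np.nonzero(mask)
    u = xs.mean() / mask.shape[1]               # normalized centroid column
    return (u - 0.5) * np.deg2rad(hfov_deg)     # pixel column -> bearing
```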
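A sampling-based sketch of one behavior-aware MPC iteration: sample velocity sequences, roll out unicycle dynamics, score each rollout on terminal goal distance plus accumulated behavioral cost, and pick the cheapest. The weights, horizon, gait names, and gait-switching threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mpc_step(state, goal, cost_map, to_grid, horizon=20, samples=256, dt=0.1,
             w_goal=1.0, w_behavior=5.0, gait_cost_thresh=0.6, seed=0):
    """One behavior-aware MPC iteration for a unicycle-model robot.

    state: (x, y, theta); goal: (x, y) in the same frame.
    cost_map: 2-D array of behavioral costs in [0, 1].
    to_grid: assumed callable mapping world (x, y) arrays to map indices.
    Returns the first control (v, omega) and a suggested gait.
    """
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.0, 1.0, (samples, horizon))    # linear velocities
    w = rng.uniform(-1.0, 1.0, (samples, horizon))   # angular velocities
    x, y, th = (np.full(samples, s) for s in state)
    cost = np.zeros(samples)
    for t in range(horizon):                         # roll out dynamics
        th = th + w[:, t] * dt
        x = x + v[:, t] * np.cos(th) * dt
        y = y + v[:, t] * np.sin(th) * dt
        i, j = to_grid(x, y)
        cost += w_behavior * cost_map[i, j]          # behavioral penalty
    cost += w_goal * np.hypot(goal[0] - x, goal[1] - y)
    best = int(np.argmin(cost))
    # Behavior-aware gait switching: cautious gait in high-cost regions.
    i0, j0 = to_grid(np.array([state[0]]), np.array([state[1]]))
    gait = "cautious" if cost_map[i0, j0] > gait_cost_thresh else "trot"
    return (v[best, 0], w[best, 0]), gait
```

Because the rollout cost is a plain weighted sum rather than a hard constraint, the planner stays "unconstrained" in the sense named above: behavioral rules shape the cost landscape instead of restricting the feasible set.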
The evaluation of BehAV on a quadruped robot across diverse real-world scenarios demonstrates a 22.49% improvement in alignment with human-teleoperated actions, as measured by Fréchet distance, and a 40% higher navigation success rate compared to state-of-the-art methods.
Key insights taken from the original content at arxiv.org, by Kasun Weerak..., 09-26-2024.
https://arxiv.org/pdf/2409.16484.pdf