The paper proposes VLM-Social-Nav, a novel approach for social robot navigation that integrates vision-language models (VLMs) with optimization-based or scoring-based local motion planners and a state-of-the-art perception model.
The key highlights are:
VLM-Social-Nav leverages a VLM to analyze and reason about the current social interaction and generate an immediate preferred robot action to guide the motion planner. This enables robots to detect social entities efficiently and make real-time decisions on socially compliant robot behavior.
The paper introduces a VLM-based scoring module that translates the current robot observation and textual instructions into a relevant social cost term, which is then used by the bottom-level motion planner to output appropriate robot actions.
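To make the architecture concrete, here is a minimal sketch of how a VLM-derived social cost term could be combined with conventional planner costs in a scoring-based local planner (e.g., DWA-style). All function and variable names are illustrative assumptions, and the VLM-preferred action is simply passed in as a `(v, w)` command rather than queried from an actual model:

```python
import math

# Hypothetical sketch: fold a VLM-suggested action into a scoring-based
# local planner. The VLM output is assumed to already be parsed into a
# preferred (linear velocity, angular velocity) command.

def social_cost(candidate, vlm_action, weight=1.0):
    """Penalize deviation of a candidate (v, w) command from the
    VLM-preferred (v, w) command."""
    dv = candidate[0] - vlm_action[0]
    dw = candidate[1] - vlm_action[1]
    return weight * math.hypot(dv, dw)

def total_cost(candidate, goal_cost, obstacle_cost, vlm_action, w_social=1.0):
    # Conventional planner terms plus the VLM-derived social term.
    return goal_cost + obstacle_cost + social_cost(candidate, vlm_action, w_social)

def select_action(candidates, planner_costs, vlm_action):
    """Pick the candidate command with the lowest combined cost.

    candidates:    list of (v, w) velocity commands
    planner_costs: list of (goal_cost, obstacle_cost) pairs, one per candidate
    """
    scored = [
        (total_cost(c, g, o, vlm_action), c)
        for c, (g, o) in zip(candidates, planner_costs)
    ]
    return min(scored)[1]
```

For example, if the VLM recommends slowing down and yielding to the right, candidates close to that command are scored more favorably even when a faster straight-line command has a slightly lower goal cost, which is how the social term can steer the planner toward socially compliant behavior.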
The approach is evaluated in four different real-world indoor social navigation scenarios. VLM-Social-Nav achieves at least a 36.37% improvement in average success rate and a 20.00% reduction in average collision rate compared to other methods. A user study also shows that VLM-Social-Nav generates the most socially compliant navigation behavior.
Key insights distilled from the source content
by Daeun Song, J... at arxiv.org, 04-02-2024
https://arxiv.org/pdf/2404.00210.pdf