The paper presents the GSON framework, which integrates RGB cameras and 2D LiDAR data for dynamic crowd perception, combining human detection, foot detection, and tracking. The key innovation is the use of Large Multimodal Models (LMMs) for zero-shot reasoning about the social structure among pedestrians, which the planning system then leverages to generate socially aware navigation.
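The visual-prompting idea can be illustrated with a small sketch: each tracked pedestrian is overlaid with a numeric marker in the image, the LMM is asked to cluster the markers into social groups, and its textual reply is parsed back into group lists. The prompt wording, the JSON reply format, and the helper names below are assumptions for illustration, not the paper's actual implementation.

```python
import json

def build_grouping_prompt(tracks):
    """Build a hypothetical grouping query for an LMM, assuming each
    pedestrian has already been annotated in the image with the numeric
    marker stored in track["id"] (illustrative sketch only)."""
    ids = ", ".join(str(t["id"]) for t in tracks)
    return (
        f"The image shows pedestrians labelled {ids}. "
        "Group the labels by social relationship (e.g. walking together, "
        "talking, queueing). Reply as JSON, "
        'e.g. {"groups": [[1, 2], [3]]}.'
    )

def parse_groups(reply, known_ids):
    """Parse the assumed JSON reply, discarding any IDs the tracker
    does not actually know about (LMMs can hallucinate labels)."""
    groups = json.loads(reply)["groups"]
    return [[i for i in g if i in known_ids] for g in groups]
```

A downstream planner would consume the parsed groups as sets of track IDs that should not be separated by the robot's path.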
The perception module first builds a robust pipeline for pedestrian detection and tracking, then applies visual prompting with LMMs to predict the social relationships and grouping of individuals in the scene. The planning module consists of a global path planner, a mid-level planner, and a local motion planner. The mid-level planner uses the estimated social group information to update the cost map and generate a revised path that avoids disrupting the social structure. The local planner then generates a safe trajectory for the robot to execute, using a combination of Model Predictive Control (MPC) and Control Barrier Functions (CBFs).
The proposed method is validated through extensive experiments in simulation and the real world, involving complex social interaction scenarios. The results demonstrate that GSON outperforms baseline methods in minimizing perturbations to the social structure, while maintaining comparable performance on traditional navigation metrics.
Source: Shangyi Luo et al., arXiv, 2024-09-27. https://arxiv.org/pdf/2409.18084.pdf