Core Concepts
Next-generation robotic systems should incorporate pragmatic reasoning to understand user intentions and achieve desired goals.
Abstract
Introduction:
Interactive Object Grasping (IOG) involves identifying and grasping objects through human-robot natural language interaction.
Current IOG systems rely on users specifying the target object's category, which limits their ability to understand anything beyond literal instructions.
Pragmatic Reasoning:
Humans often rely on context to convey intentions and achieve communicative goals, a phenomenon known as pragmatics.
The proposed Pragmatic-IOG task focuses on interpreting intention-oriented utterances that do not specify the target object's category.
System Overview:
The PROGrasp system combines modules for visual grounding, question asking, object grasping, and answer interpretation to perform pragmatic inference.
Experimental results show effectiveness in offline (target object discovery) and online (IOG with a physical robot arm) settings.
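The interplay of the modules named above can be sketched as a simple dialogue loop. This is a minimal illustration, not the authors' implementation: the function names (`ground`, `ask`, `get_answer`, `interpret`, `is_confirmation`), the bounding-box representation, and the `max_rounds` cutoff are all assumptions introduced here, and the real modules are learned models rather than plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class DialogueState:
    utterance: str  # intention-oriented instruction, e.g. "I am thirsty"
    history: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)


def run_dialogue(
    utterance: str,
    ground: Callable[[DialogueState], Box],               # visual grounding (assumed interface)
    ask: Callable[[DialogueState, Box], str],              # question asking (assumed interface)
    get_answer: Callable[[str], str],                      # the user's reply
    interpret: Callable[[DialogueState, Box, str], Box],   # answer interpretation (assumed interface)
    is_confirmation: Callable[[str], bool],
    max_rounds: int = 3,                                   # assumed cutoff, not from the source
) -> Tuple[Box, DialogueState]:
    """Propose a candidate object, then refine it over several dialogue rounds."""
    state = DialogueState(utterance)
    box = ground(state)  # initial guess from the utterance alone
    for _ in range(max_rounds):
        question = ask(state, box)
        answer = get_answer(question)
        state.history.append((question, answer))
        if is_confirmation(answer):
            break  # the user accepted the current candidate
        # Otherwise revise the candidate using the answer and the dialogue so far.
        box = interpret(state, box, answer)
    return box, state
```

The returned box would then be passed to the object grasping module. Keeping the modules as injected callables makes it possible to exercise the loop offline with scripted answers before connecting a physical robot arm.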
Dataset:
The Intention-oriented Multi-modal Dialogue (IM-Dial) dataset contains images and dialogues about everyday objects and is used for training and evaluation.
Implementation Details:
The modules are trained from the OFA-Large model with the AdamW optimizer, using sequence-to-sequence learning and module-specific input/output formats.
Comparative Analysis:
PROGrasp outperforms baselines in both offline experiments (object identification accuracy) and online experiments (object grasping success rate).
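The two metrics mentioned above can be computed along the following lines. This is a sketch under stated assumptions: the summary does not say how a correct identification is decided, so the intersection-over-union (IoU) criterion with a 0.5 threshold is an assumption borrowed from common visual grounding practice, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def identification_accuracy(
    predicted: Sequence[Box],
    ground_truth: Sequence[Box],
    threshold: float = 0.5,  # assumed threshold, not stated in the source
) -> float:
    """Offline metric: fraction of predictions that overlap the target enough."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)


def grasp_success_rate(outcomes: Sequence[bool]) -> float:
    """Online metric: fraction of trials in which the target was grasped."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

When comparing systems with these metrics, it helps to state whether a reported gain is an absolute difference in percentage points or a relative improvement, since the two can differ considerably.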
Qualitative Analysis:
Visualizations demonstrate PROGrasp's ability to correct target object identification through pragmatic reasoning in multiple rounds of dialogue.
Supplementary Materials:
The supplementary materials explain the IM-Dial dataset collection process, give further implementation details, and compare PROGrasp with multimodal foundation models such as GPT-4V(ision).
Stats
PROGrasp achieved a 35% improvement in accuracy over the baselines in offline experiments.
PROGrasp improved the success rate by 17% in online experiments.