The paper introduces a novel approach called IMRL (Integrated Multi-Dimensional Representation Learning) to enhance the robustness and generalizability of behavior cloning (BC) for food acquisition in robotic assistive feeding.
The key highlights are:
IMRL integrates visual, physical, temporal, and geometric representations to provide a richer understanding of food beyond surface-level visual information. It learns to classify food types and capture their physical properties (e.g., liquid, solid, granular, semi-solid, mixture), models the temporal dynamics of acquisition actions, and extracts geometric information such as optimal scooping points and bowl fullness.
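The summary does not specify how the four representations are fused, but a common pattern is to encode each modality separately and combine the embeddings with a small MLP. Below is a minimal PyTorch sketch of that idea; the class name, embedding dimensions, and concatenate-then-MLP design are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class IMRLEncoder(nn.Module):
    """Fuses per-modality embeddings into one food-state representation.

    All dimensions and the fusion design are assumptions for illustration;
    the paper may use a different architecture.
    """

    def __init__(self, visual_dim=256, physical_dim=32,
                 temporal_dim=64, geometric_dim=16, fused_dim=128):
        super().__init__()
        in_dim = visual_dim + physical_dim + temporal_dim + geometric_dim
        self.fuse = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, fused_dim),
        )

    def forward(self, visual, physical, temporal, geometric):
        # Each argument is a (batch, dim) embedding from its own upstream
        # encoder: image features, food-type/physical-property features,
        # a summary of recent actions, and scooping-point / bowl-fullness
        # estimates, respectively.
        x = torch.cat([visual, physical, temporal, geometric], dim=-1)
        return self.fuse(x)
```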
The enhanced representations enable IMRL to adaptively adjust scooping strategies based on the context, improving the robot's capability to handle diverse food acquisition scenarios, including unseen foods and bowl configurations.
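Context-adaptive scooping follows naturally once the BC policy is conditioned on the fused representation. A minimal sketch of such a policy head, assuming the `IMRLEncoder` output above and a 6-DoF end-effector action (both assumptions; the paper's action parameterization for the UR3 may differ):

```python
import torch
import torch.nn as nn

class ScoopingPolicy(nn.Module):
    """BC head mapping the fused food-state representation to an action."""

    def __init__(self, fused_dim=128, action_dim=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(fused_dim, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, fused_state):
        return self.head(fused_state)

# Behavior cloning regresses the policy output onto expert actions:
policy = ScoopingPolicy()
fused = torch.randn(8, 128)        # batch of fused representations
expert_action = torch.randn(8, 6)  # demonstrated scooping actions
loss = nn.functional.mse_loss(policy(fused), expert_action)
```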
Experiments on a real UR3 robot demonstrate that IMRL achieves up to a 35% improvement in success rate compared to the best-performing baseline. IMRL also shows strong zero-shot generalization abilities, maintaining high performance on unseen foods and bowl types.
Ablation studies confirm the effectiveness of each representation module (visual-physical, temporal, geometric) in enhancing the overall performance of the behavior cloning policy.
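One straightforward way to run such module-wise ablations, under the interfaces assumed in the sketches above, is to zero out a module's embedding before fusion:

```python
import torch

def ablate(embeddings, disabled):
    """Zero out the embeddings of the named modules for an ablation run.

    `embeddings` maps module name -> (batch, dim) tensor; `disabled` is
    a set of module names such as {"temporal"}. These interfaces are
    assumed for illustration, not taken from the paper.
    """
    return {name: torch.zeros_like(emb) if name in disabled else emb
            for name, emb in embeddings.items()}

# Example: evaluate the policy with the temporal module removed.
embs = {k: torch.randn(8, d) for k, d in
        [("visual", 256), ("physical", 32),
         ("temporal", 64), ("geometric", 16)]}
masked = ablate(embs, {"temporal"})
```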
The paper addresses a key limitation of existing methods, which rely primarily on surface-level geometric information derived from visual cues and therefore lack adaptability and robustness, especially when handling foods with similar physical properties but different appearances.