Self-Explainable Affordance Learning with Embodied Captions for Robotic Manipulation
This work introduces the concept of Self-Explainable Affordance (SEA) learning, which enables robots not only to localize affordance regions on objects but also to generate corresponding embodied captions that articulate the intended action and target object. This approach addresses key challenges in visual affordance learning, such as action ambiguity and multi-object complexity.
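Since the abstract describes a model with two coupled outputs (an affordance region and an embodied caption), a minimal sketch of such a dual-head architecture may help fix ideas. This is not the authors' implementation: the module names, dimensions, toy vocabulary, and the GRU caption decoder are all assumptions made for illustration.

```python
# A minimal sketch (assumed, not the paper's architecture) of a dual-head
# model in the spirit of SEA learning: one head localizes an affordance
# region, the other generates an "embodied caption" (action + object tokens).

import torch
import torch.nn as nn

class SEASketch(nn.Module):
    def __init__(self, vocab_size: int = 32, embed_dim: int = 128):
        super().__init__()
        # Shared visual encoder: image -> spatial feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Affordance head: per-pixel logits for the affordance heatmap.
        self.affordance_head = nn.Conv2d(embed_dim, 1, kernel_size=1)
        # Caption head: a pooled visual feature conditions a small GRU decoder
        # that emits tokens such as "<pour> <kettle>" (toy vocabulary assumed).
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.to_vocab = nn.Linear(embed_dim, vocab_size)

    def forward(self, image: torch.Tensor, caption_in: torch.Tensor):
        feats = self.encoder(image)                        # (B, D, H', W')
        heatmap_logits = self.affordance_head(feats)       # (B, 1, H', W')
        pooled = feats.mean(dim=(2, 3))                    # (B, D) global context
        tokens = self.token_embed(caption_in)              # (B, T, D)
        out, _ = self.decoder(tokens, pooled.unsqueeze(0)) # condition on image
        caption_logits = self.to_vocab(out)                # (B, T, V)
        return heatmap_logits, caption_logits

# Toy forward pass: batch of 2 RGB images with teacher-forced caption prefixes.
model = SEASketch()
img = torch.randn(2, 3, 64, 64)
cap = torch.randint(0, 32, (2, 5))
heat, caps = model(img, cap)
print(heat.shape, caps.shape)  # torch.Size([2, 1, 16, 16]) torch.Size([2, 5, 32])
```

Tying the caption decoder to the same visual features that produce the heatmap is what would let the generated text explain the localized region, which is the coupling the abstract emphasizes.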