Nasiriany, S., Kirmani, S., Ding, T., Smith, L., Zhu, Y., Driess, D., Sadigh, D., & Xiao, T. (2024). RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation. arXiv preprint arXiv:2411.02704.
This paper examines affordances as intermediate representations for robot manipulation policies, with the aim of improving generalization and data efficiency when learning new tasks.
The researchers developed RT-Affordance (RT-A), a hierarchical model that first predicts an affordance plan based on language instructions and the initial scene. This plan, represented as a sequence of robot end-effector poses at key task stages, is then used to condition the robot's policy. The model is trained using a combination of robot trajectories, web-scale datasets with spatial and affordance labels, and a small set of in-domain images annotated with affordances.
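The two-stage structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `KeyPose`, `predict_affordance_plan`, `step_toward`, and `rollout` are hypothetical, the learned affordance model is replaced by a stub that returns a fixed plan, the learned policy is replaced by a simple bounded-step controller, and poses are reduced to 3D positions (orientation and gripper state are omitted).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class KeyPose:
    """One entry of a (hypothetical) affordance plan: a task stage and a target position."""
    stage: str
    position: Vec3  # assumed to be in metres in the robot base frame


def predict_affordance_plan(instruction: str, initial_image: object) -> List[KeyPose]:
    """Stand-in for the high-level model.

    A real model would map the instruction and initial image to key poses;
    this stub ignores both inputs and returns a fixed two-stage plan.
    """
    return [
        KeyPose("pre-grasp", (0.4, 0.0, 0.3)),
        KeyPose("grasp", (0.4, 0.0, 0.1)),
    ]


def step_toward(current: Vec3, target: Vec3, max_step: float) -> Vec3:
    """Move from current toward target by at most max_step; snap to target when close."""
    delta = tuple(t - c for c, t in zip(current, target))
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= max_step:
        return target
    scale = max_step / dist
    return tuple(c + d * scale for c, d in zip(current, delta))


def rollout(plan: List[KeyPose], start: Vec3, max_step: float = 0.05) -> List[Vec3]:
    """Stand-in for the low-level policy: visit each key pose of the plan in order."""
    trajectory = [start]
    current = start
    for key_pose in plan:
        while current != key_pose.position:
            current = step_toward(current, key_pose.position, max_step)
            trajectory.append(current)
    return trajectory


plan = predict_affordance_plan("pick up the mug", initial_image=None)
trajectory = rollout(plan, start=(0.0, 0.0, 0.3))
```

The point of the sketch is the interface: the plan is computed once from the instruction and the initial scene, and the low-level controller only ever sees the plan and the current state.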
Affordances offer a powerful intermediate representation for robot manipulation policies, enabling improved generalization, data efficiency, and robustness. The hierarchical approach of RT-Affordance, combined with leveraging diverse data sources, presents a promising direction for scalable and generalizable robot learning.
This research contributes to the field of robot learning by introducing a novel approach to policy representation that addresses limitations of existing methods. The use of affordances and the ability to learn from readily available data sources have the potential to significantly advance the development of more versatile and adaptable robots.
While RT-Affordance demonstrates strong performance, future work could explore generalization to entirely novel motions or skills beyond the training data. Additionally, investigating the integration of affordances with other policy representations could further enhance robot capabilities.
Key insights distilled from: Soroush Nasi... at arxiv.org, 11-06-2024
https://arxiv.org/pdf/2411.02704.pdf