Task-Oriented Hierarchical Object Decomposition for Visuomotor Control in Robotics (HODOR)
This paper introduces HODOR, a task-oriented, hierarchical, object-centric visual representation for robot manipulation. HODOR leverages pre-trained vision and language models to selectively represent task-relevant scene entities at multiple levels of detail, improving learning efficiency and out-of-distribution generalization.
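
To make the idea of a task-oriented, hierarchical, object-centric representation more concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes each scene entity carries a task-relevance score (in practice this might come from a pre-trained language or vision model; here it is hard-coded) and an optional list of finer-grained parts. The names `Entity`, `relevance`, `parts`, `select_entities`, `threshold`, and `max_depth` are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entity:
    """One scene entity (an object or a part of one). All fields are hypothetical."""
    name: str
    relevance: float  # assumed task-relevance score in [0, 1]; hard-coded here
    parts: List["Entity"] = field(default_factory=list)  # finer level of detail


def select_entities(
    roots: List[Entity], threshold: float = 0.5, max_depth: int = 2
) -> List[Tuple[int, str]]:
    """Return (depth, name) pairs for entities judged task-relevant.

    A relevant entity is kept and expanded into its parts, up to max_depth.
    An irrelevant entity is dropped together with everything beneath it.
    """
    selected: List[Tuple[int, str]] = []

    def visit(entity: Entity, depth: int) -> None:
        if entity.relevance < threshold:
            return  # prune this entity and its whole subtree
        selected.append((depth, entity.name))
        if depth < max_depth:
            for part in entity.parts:
                visit(part, depth + 1)

    for root in roots:
        visit(root, 0)
    return selected


# Toy scene for a hypothetical "pick up the mug" task.
scene = [
    Entity("mug", 0.9, parts=[Entity("handle", 0.8), Entity("rim", 0.2)]),
    Entity("table", 0.6),
    Entity("poster", 0.1, parts=[Entity("text", 0.9)]),
]

print(select_entities(scene))
# → [(0, 'mug'), (1, 'handle'), (0, 'table')]
```

In this toy scene the poster is dropped entirely, even though one of its parts scores highly, because selection proceeds top-down. The mug is kept and refined into its handle. This is one possible way to read "selective" and "multiple levels of detail"; the actual method may differ.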