VLMs can be leveraged for robotic manipulation through a point-based affordance representation, as demonstrated by MOKA.
MOKA uses a Vision-Language Model (VLM) to solve manipulation tasks specified by free-form language descriptions: the VLM predicts point-based affordances on the observed scene, and those points are then translated into robot motions. By bridging affordance prediction with motion generation in this way, MOKA enables effective control of robots across diverse environments.
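To make this pipeline concrete, below is a minimal Python sketch in the spirit of a point-based affordance approach: a VLM is asked to return a few named points (a grasp point, a target point, and a waypoint) for a given instruction, and those points are converted into a coarse motion plan. The prompt format, point names, and helper functions (`query_vlm`, `affordance_to_motion`) are illustrative assumptions, not MOKA's actual interface.

```python
from dataclasses import dataclass
import json


@dataclass
class PointAffordance:
    """Point-based affordance predicted by the VLM (pixel coordinates)."""
    grasp_point: tuple[float, float]    # where to grasp the object
    target_point: tuple[float, float]   # where to place or move it
    waypoint: tuple[float, float]       # intermediate point for the motion


def query_vlm(image_path: str, instruction: str) -> str:
    """Stub for a VLM call (hypothetical interface).

    A real system would send the image together with a prompt asking the
    model to answer with affordance points as JSON.
    """
    prompt = (
        f"Task: {instruction}\n"
        "Return JSON with pixel coordinates for 'grasp_point', "
        "'target_point', and 'waypoint'."
    )
    # Placeholder response; replace with an actual VLM API call.
    return json.dumps({
        "grasp_point": [212.0, 340.0],
        "target_point": [480.0, 305.0],
        "waypoint": [350.0, 220.0],
    })


def parse_affordance(vlm_response: str) -> PointAffordance:
    """Parse the VLM's JSON answer into a structured affordance."""
    data = json.loads(vlm_response)
    return PointAffordance(
        grasp_point=tuple(data["grasp_point"]),
        target_point=tuple(data["target_point"]),
        waypoint=tuple(data["waypoint"]),
    )


def affordance_to_motion(aff: PointAffordance) -> list[dict]:
    """Convert 2D affordance points into a coarse motion plan.

    A real robot stack would lift these pixels to 3D using a depth camera
    and calibrated extrinsics before commanding the arm.
    """
    return [
        {"action": "move_above", "pixel": aff.grasp_point},
        {"action": "grasp", "pixel": aff.grasp_point},
        {"action": "move_through", "pixel": aff.waypoint},
        {"action": "release", "pixel": aff.target_point},
    ]


if __name__ == "__main__":
    response = query_vlm("scene.png", "put the scissors into the drawer")
    affordance = parse_affordance(response)
    for step in affordance_to_motion(affordance):
        print(step)
```

The key design choice this sketch illustrates is that the VLM never outputs low-level actions; it only marks points in image space, and a separate, simple procedure maps those points to motions.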