The paper introduces the LABOR (LAnguage-model-based Bimanual ORchestration) agent, which uses a Large Language Model (LLM) to orchestrate control of the bimanual robot NICOL across a range of everyday tasks.
The key highlights are:
The LABOR agent uses two types of control policy for the bimanual robot: sequential control for uncoordinated and asymmetric processes, and simultaneous control for symmetric processes. At each step, the LLM explicitly chooses which policy to apply.
The LLM receives a prompt containing background information about the robot's workspace and manipulation rules, together with a skill library of primitive actions. From this prompt, the LLM generates a chain of skills that accomplishes the task and iteratively corrects mistakes in that chain.
The LABOR agent is evaluated on the NICOL robot in three everyday tasks: ControlScissors, ServeWater, and HoldBowl. In these experiments, the agent using GPT-4 achieves high success rates; its failures stem mainly from difficulties with spatial reasoning, temporal coordination, and skill association.
Compared with a less capable LLM such as GPT-3.5, the GPT-4-based LABOR agent performs markedly better at orchestrating long-horizon bimanual manipulation, generating skill chains comparable to optimal solutions designed by humans.
The work shows the potential of powerful LLMs for versatile bimanual control in complex robotic tasks and points toward more advanced robot capabilities.
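To make the orchestration loop described in the highlights above more concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes that each step in a skill chain carries the control policy the LLM chose (sequential or simultaneous), and it replaces the real LLM call with a hypothetical stub, `fake_llm`. The skill names in `SKILL_LIBRARY`, the rule in `validate`, and the functions `execute` and `orchestrate` are all illustrative assumptions; the paper's actual primitives and error-correction mechanism may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class ControlPolicy(Enum):
    SEQUENTIAL = "sequential"      # arms act one after the other (uncoordinated/asymmetric steps)
    SIMULTANEOUS = "simultaneous"  # both arms act in the same step (symmetric steps)


@dataclass
class SkillStep:
    policy: ControlPolicy
    left: Optional[str]   # primitive skill for the left arm, or None if idle
    right: Optional[str]  # primitive skill for the right arm, or None if idle


# Hypothetical skill library; the paper's actual primitives may differ.
SKILL_LIBRARY = {"move_to", "grasp", "lift", "pour", "place"}


def validate(chain: List[SkillStep]) -> List[str]:
    """Collect error messages; an empty list means the chain passed."""
    errors = []
    for i, step in enumerate(chain):
        for arm, skill in (("left", step.left), ("right", step.right)):
            if skill is not None and skill not in SKILL_LIBRARY:
                errors.append(f"step {i}: unknown {arm} skill '{skill}'")
        # Toy rule (assumption): a simultaneous step must command both arms.
        if step.policy is ControlPolicy.SIMULTANEOUS and None in (step.left, step.right):
            errors.append(f"step {i}: simultaneous step must command both arms")
    return errors


def execute(chain: List[SkillStep]) -> List[str]:
    """Return the command log a controller would issue for the chain."""
    log = []
    for step in chain:
        if step.policy is ControlPolicy.SIMULTANEOUS:
            log.append(f"both:{step.left}+{step.right}")  # one joint command for both arms
        else:
            for arm, skill in (("left", step.left), ("right", step.right)):
                if skill is not None:
                    log.append(f"{arm}:{skill}")  # one arm at a time
    return log


Proposer = Callable[[str, List[str]], List[SkillStep]]


def orchestrate(propose: Proposer, task: str, max_rounds: int = 3) -> Tuple[List[SkillStep], int]:
    """Ask the proposer for a chain, feeding validation errors back until it passes."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        chain = propose(task, feedback)
        feedback = validate(chain)
        if not feedback:
            return chain, round_no
    raise RuntimeError(f"no valid chain after {max_rounds} rounds: {feedback}")


def fake_llm(task: str, feedback: List[str]) -> List[SkillStep]:
    """Hypothetical stand-in for the LLM: wrong on the first try, fixed once given feedback."""
    if not feedback:
        return [SkillStep(ControlPolicy.SIMULTANEOUS, "grasp", None)]
    return [
        SkillStep(ControlPolicy.SEQUENTIAL, "move_to", "move_to"),
        SkillStep(ControlPolicy.SIMULTANEOUS, "grasp", "grasp"),
        SkillStep(ControlPolicy.SIMULTANEOUS, "lift", "lift"),
    ]


if __name__ == "__main__":
    chain, rounds = orchestrate(fake_llm, "hold the bowl with both hands")
    print(rounds, execute(chain))
```

Running the script, the stub first proposes an invalid simultaneous step that commands only one arm. The validation error is fed back, and the second proposal passes, so the loop finishes in two rounds. Keeping the policy choice on each step, rather than fixing it for the whole task, mirrors the idea that the LLM selects sequential or simultaneous control step by step.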
Source: arxiv.org