Core Concepts
Language models can collaborate effectively with each other and with human partners on complex tasks that require coordination, communication, and task planning.
Abstract
The paper introduces a collaborative blocks-world environment (COBLOCK) to evaluate the collaboration abilities of large language models (LLMs). In COBLOCK, two agents (either human or LLM) with complementary goals and skills work together to build a target structure.
The key highlights and insights are:
The COBLOCK environment has three types of collaboration tasks with increasing levels of interdependence between the agents' goals and skills: independent tasks, skill-dependent tasks, and goal-dependent tasks.
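The three task types above can be summarized in a small sketch; the enum names and the coordination check are illustrative labels chosen here, not identifiers from the paper:

```python
from enum import Enum

class TaskType(Enum):
    """COBLOCK's three task types, ordered by increasing interdependence."""
    INDEPENDENT = 1      # each agent can complete its own subgoal alone
    SKILL_DEPENDENT = 2  # an agent lacks a skill its subgoal needs, so the partner must help
    GOAL_DEPENDENT = 3   # an agent's subgoal depends on parts the partner must build first

def requires_coordination(task: TaskType) -> bool:
    """Skill- and goal-dependent tasks force the agents to coordinate."""
    return task is not TaskType.INDEPENDENT
```

The ordering matters because, as the experiments below show, baseline agents degrade as interdependence increases.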
To guide the LLM agents in COBLOCK, the authors propose a chain-of-thought (CoT) prompting approach that includes three key steps:
Modeling the partner agent's state and intent to understand their needs
Reflecting on past actions and communication to identify and correct errors
Predicting the next action based on the world state, partner state, and reflection
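The three reasoning steps above might be assembled into a prompt roughly as follows; this is a minimal sketch, and the function name, fields, and wording are illustrative rather than the paper's exact prompt:

```python
def build_cot_prompt(world_state: str, my_goal: str,
                     partner_messages: str, action_history: str) -> str:
    """Assemble a chain-of-thought prompt covering the three reasoning steps:
    partner modeling, self-reflection, and next-action prediction."""
    return (
        f"World state:\n{world_state}\n\n"
        f"Your goal:\n{my_goal}\n\n"
        f"Partner's recent messages:\n{partner_messages}\n\n"
        f"Your past actions:\n{action_history}\n\n"
        "Think step by step:\n"
        "1. Partner modeling: infer the partner's current state and intent.\n"
        "2. Reflection: check your past actions and communication for errors "
        "and note how to correct them.\n"
        "3. Action: choose your next action based on the world state, the "
        "partner's inferred state, and your reflection.\n"
    )
```

Keeping the three steps as explicitly numbered instructions encourages the model to emit its intermediate reasoning before committing to an action.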
Experiments show that baseline LLM agents struggle on the skill-dependent and goal-dependent tasks, for example by prioritizing the partner's goals over their own. The proposed approach, with partner-state modeling and self-reflection, significantly improves collaboration performance, yielding higher task success rates, better workload balance, and fewer timesteps to completion.
While human-machine collaboration achieves slightly higher success rates than machine-machine collaboration, humans often take on more of the workload when the LLM agent struggles, especially in the more challenging goal-dependent tasks.
The findings and the COBLOCK environment provide valuable insights and resources for future research on communication, coordination, and collaboration in multi-agent settings.
Stats
The number of blocks in the target structure is 10.
The number of unique colors in the target structure is 6.
Quotes
"To test LLM's ability to collaborate, we design a blocks-world environment, where two agents, each having unique goals and skills, build a target structure together."
"We further adopt chain-of-thought prompts that include intermediate reasoning steps to model the partner's state and identify and correct execution errors."
"Both human-machine and machine-machine experiments show that LLM agents have strong grounding capacities, and our approach significantly improves the evaluation metric."