Core Concepts
ReDel is an open-source toolkit for building, analyzing, and debugging recursive multi-agent systems, in which large language model (LLM) agents solve complex tasks by delegating parts of them to sub-agents.
Abstract
This research paper introduces ReDel, an open-source toolkit for building and analyzing recursive multi-agent systems powered by LLMs. The authors argue that although LLMs are increasingly used for complex tasks, existing tools lack support for recursive multi-agent systems, in which agents decide at runtime to delegate tasks to new agents.
ReDel addresses this gap by providing:
- Tool Usage: A modular interface for writing Python tools that LLM agents can call, such as a web browser or a client for a specific database.
- Delegation Schemes: Built-in and customizable strategies for agents to delegate tasks to sub-agents, enabling different workflows like synchronous or asynchronous delegation.
- Events & Logging: An event-driven architecture that logs all actions and decisions, allowing for detailed post-hoc analysis of the system's behavior.
- Web Interface: A user-friendly interface to interact with the system in real-time, visualize the delegation graph, replay past runs, and debug errors.
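To make the recursive pattern concrete, here is a minimal, self-contained Python sketch. It is not ReDel's API: the `Agent`, `EventLog`, `delegate`, and `parallel` names are hypothetical, and a string-splitting stub stands in for the LLM's decision to delegate. It illustrates three ideas from the list: delegation exposed as a tool that spawns a child agent, a synchronous versus an asynchronous delegation scheme, and an event log that records every spawn, tool call, and result for later analysis.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical sketch, NOT ReDel's API. A stub policy replaces the LLM.


@dataclass
class Event:
    agent_id: str
    kind: str    # "spawn", "tool_call", or "result"
    detail: str


@dataclass
class EventLog:
    """Append-only record of everything agents do, for post-hoc replay."""
    events: list[Event] = field(default_factory=list)

    def emit(self, agent_id: str, kind: str, detail: str) -> None:
        self.events.append(Event(agent_id, kind, detail))


class Agent:
    def __init__(self, agent_id: str, task: str, log: EventLog,
                 depth: int = 0, max_depth: int = 2, parallel: bool = True):
        self.agent_id, self.task, self.log = agent_id, task, log
        self.depth, self.max_depth, self.parallel = depth, max_depth, parallel
        self.n_children = 0

    async def delegate(self, subtask: str) -> str:
        """Tool exposed to every agent: spawn a sub-agent and await its answer."""
        self.n_children += 1
        child_id = f"{self.agent_id}.{self.n_children}"
        self.log.emit(self.agent_id, "spawn", f"{child_id}: {subtask}")
        child = Agent(child_id, subtask, self.log,
                      self.depth + 1, self.max_depth, self.parallel)
        return await child.run()

    async def run(self) -> str:
        # Stub policy standing in for the LLM: split compound tasks on " and ".
        parts = [p.strip() for p in self.task.split(" and ")]
        if len(parts) > 1 and self.depth < self.max_depth:
            if self.parallel:
                # Asynchronous scheme: start all children, then wait for all.
                results = await asyncio.gather(*(self.delegate(p) for p in parts))
            else:
                # Synchronous scheme: each child finishes before the next starts.
                results = [await self.delegate(p) for p in parts]
            answer = "; ".join(results)
        else:
            # Leaf: do the work directly (a real tool call would go here).
            self.log.emit(self.agent_id, "tool_call", f"solve({self.task!r})")
            answer = f"done({self.task})"
        self.log.emit(self.agent_id, "result", answer)
        return answer


log = EventLog()
answer = asyncio.run(Agent("root", "find flights and book hotel", log).run())
print(answer)  # done(find flights); done(book hotel)
for event in log.events:
    print(event)
```

Setting `parallel=False` switches the sketch to the synchronous scheme. The flat event list is the kind of record from which a replay or a delegation-graph view could be built.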
The authors evaluate ReDel on three benchmarks: FanOutQA, TravelPlanner, and WebArena. Recursive systems built with ReDel outperform single-agent baselines and in some cases surpass the previous state of the art, for example on TravelPlanner's CS-Micro metric.
The paper also identifies two common failure modes in recursive multi-agent systems: overcommitment, where an agent attempts a task too complex to complete on its own, and undercommitment, where an agent delegates a task without doing any work itself. ReDel's visualization tools help expose these failures, which the authors present as a starting point for future research on making such systems more robust.
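Once runs are logged, both failure modes can be flagged mechanically. The sketch below assumes a simple per-agent record and uses assumed, illustrative criteria rather than the paper's exact definitions: an agent counts as undercommitted if it hands its whole task, unchanged, to a single child and makes no tool calls, and as overcommitted if it never delegates, makes several tool calls, and fills most of its context window. `AgentRecord`, the 0.9 fill threshold, the 8192-token window, and the "fraction of runs with at least one flagged agent" rate are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """Hypothetical per-agent summary extracted from an event log."""
    task: str
    child_tasks: list[str] = field(default_factory=list)
    tool_calls: int = 0
    context_tokens: int = 0      # tokens in the agent's context at the end
    context_limit: int = 8192    # assumed context window size


def is_undercommitted(a: AgentRecord) -> bool:
    # Assumed criterion: passes its whole task to one child, does no work itself.
    return a.tool_calls == 0 and a.child_tasks == [a.task]


def is_overcommitted(a: AgentRecord, fill: float = 0.9) -> bool:
    # Assumed criterion: never delegates, makes several tool calls,
    # and ends with its context window mostly full.
    return (not a.child_tasks and a.tool_calls > 1
            and a.context_tokens >= fill * a.context_limit)


def failure_rates(runs: list[list[AgentRecord]]) -> tuple[float, float]:
    """Fraction of runs containing at least one over- / undercommitted agent."""
    over = sum(any(is_overcommitted(a) for a in run) for run in runs)
    under = sum(any(is_undercommitted(a) for a in run) for run in runs)
    return over / len(runs), under / len(runs)


runs = [
    # Root forwards its task verbatim to one child: undercommitment.
    [AgentRecord("plan trip", child_tasks=["plan trip"]),
     AgentRecord("plan trip", tool_calls=1, context_tokens=500)],
    # Single agent with many tool calls and a nearly full context: overcommitment.
    [AgentRecord("q", tool_calls=6, context_tokens=8000)],
    # Healthy split into two subtasks.
    [AgentRecord("q", child_tasks=["a", "b"]),
     AgentRecord("a", tool_calls=1, context_tokens=300),
     AgentRecord("b", tool_calls=2, context_tokens=900)],
    # Small task solved directly.
    [AgentRecord("x", tool_calls=1, context_tokens=100)],
]
over_rate, under_rate = failure_rates(runs)
print(over_rate, under_rate)  # 0.25 0.25
```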
The authors conclude that ReDel can support further research on, and development of, LLM-powered multi-agent systems, and they encourage others to explore and apply it.
Stats
ReDel with GPT-4o achieves 67.49% on TravelPlanner's CS-Micro metric (commonsense-constraint micro pass rate), compared with 61.1% for the previous state of the art.
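For scale, the TravelPlanner gap works out as follows (simple arithmetic on the two reported figures, not a number stated in the source):

```latex
\[
\Delta_{\text{abs}} = 67.49 - 61.1 = 6.39 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{6.39}{61.1} \approx 0.105 \;(\approx 10.5\%)
\]
```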
ReDel with GPT-4o shows a 22.7% overcommitment rate on FanOutQA, while GPT-3.5-turbo exhibits a 40.8% rate on the same benchmark.
ReDel with GPT-3.5-turbo has an undercommitment rate of 44.8% on WebArena, highlighting the challenge of over-delegation.
Quotes
"In a recursive multi-agent system, rather than a human defining the layout of multiple agents, a single root agent is given a tool to spawn additional agents."
"ReDel is the only fully open-source toolkit that supports dynamic multi-agent systems with a rich event-driven base and web interface."
"We find that overcommitment commonly occurs when an agent performs multiple tool calls and fills its context window with retrieved information."