Core Concepts
MMAC-Copilot is a collaborative framework that leverages the collective expertise of diverse agents to enhance the interaction capabilities of autonomous virtual agents with operating systems.
Abstract
MMAC-Copilot is a novel framework designed to improve the interaction capabilities of autonomous virtual agents with operating systems. It consists of specialized agents, including Planner, Librarian, Programmer, Viewer, Video Analyst, and Mentor, each with distinct roles and capabilities.
The key highlights of MMAC-Copilot are:
Collaborative Approach: The framework uses a team collaboration chain in which each agent contributes insights from its own domain of expertise. This helps mitigate hallucinations that arise from gaps in any single agent's domain knowledge.
Multi-Modal Processing: MMAC-Copilot integrates various modalities, such as text, images, and videos, to provide a more comprehensive understanding of the operating system environment and user requests.
Dynamic Planning and Refinement: The framework employs a dynamic planning process, where the initial coarse-grained plan is continuously refined by the specialized agents based on real-time feedback and visual information.
Benchmark Evaluation: MMAC-Copilot was evaluated on the GAIA benchmark, where it outperformed existing systems by 6.8% on average. It was also tested on the newly introduced Visual Interaction Benchmark (VIBench), which focuses on non-API-interactable applications across diverse domains, where it performed strongly across the different methods of interacting with systems and applications.
The results demonstrate MMAC-Copilot's potential in advancing the field of autonomous virtual agents through its innovative approach to agent collaboration and multi-modal processing.
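As a rough illustration of the team collaboration chain and plan refinement described above, the following Python sketch has a planner produce a coarse-grained plan and then lets each specialized agent annotate the subtasks it can help with. Everything in it is an assumption made for illustration: the `Subtask` type, the `planner`, `make_agent`, `collaboration_chain`, and `unrefined` functions, the semicolon-based plan splitting, and the keyword routing are hypothetical and are not taken from the MMAC-Copilot implementation, which presumably relies on language and vision models rather than string matching.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Subtask:
    """One step of the plan, plus the notes agents attach while refining it."""
    description: str
    notes: List[str] = field(default_factory=list)


# Hypothetical agent interface: an agent inspects a subtask and may annotate it.
Agent = Callable[[Subtask], None]


def planner(request: str) -> List[Subtask]:
    """Coarse-grained planning stand-in: one subtask per ';'-separated clause."""
    return [Subtask(part.strip()) for part in request.split(";") if part.strip()]


def make_agent(name: str, keyword: str) -> Agent:
    """Build a toy agent that only contributes when its keyword appears."""
    def agent(task: Subtask) -> None:
        if keyword in task.description.lower():
            task.notes.append(f"{name}: can handle '{keyword}' step")
    return agent


def collaboration_chain(request: str, agents: List[Agent]) -> List[Subtask]:
    """Pass every subtask of the initial plan through each agent in turn."""
    plan = planner(request)
    for task in plan:
        for agent in agents:
            agent(task)
    return plan


def unrefined(plan: List[Subtask]) -> List[Subtask]:
    """Subtasks no agent claimed; a reviewer could send these back for replanning."""
    return [task for task in plan if not task.notes]


if __name__ == "__main__":
    # Keyword-to-agent pairing is illustrative only.
    agents = [
        make_agent("Librarian", "search"),
        make_agent("Viewer", "click"),
        make_agent("Video Analyst", "video"),
    ]
    request = "Search the docs for the export API; Click the Save button; Summarize the demo video"
    for task in collaboration_chain(request, agents):
        print(task.description, "->", task.notes)
```

The point of the sketch is the shape of the chain: no single agent has to cover every domain, and any subtask left unannotated is visible as a gap instead of being silently guessed at.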
Stats
MMAC-Copilot achieved an average score of 25.91% on the GAIA benchmark, outperforming the closest competing system, FRIDAY, by 6.8%.
On the Visual Interaction Benchmark (VIBench), MMAC-Copilot achieved an average score of 70.32%, roughly double the 35.07% scored by FRIDAY, the previous best system.
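The reported numbers can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative. Note that the summary does not say whether the 6.8% GAIA margin is a difference in percentage points or a relative improvement, so the code computes FRIDAY's implied GAIA score under both readings without asserting which one is intended.

```python
# Scores as reported in this summary (percent).
MMAC_GAIA = 25.91
MMAC_VIBENCH = 70.32
FRIDAY_VIBENCH = 35.07
GAIA_MARGIN = 6.8  # ambiguous: percentage points or relative percent

# VIBench: absolute gap in percentage points and the score ratio.
vibench_gap_points = MMAC_VIBENCH - FRIDAY_VIBENCH   # 35.25 points
vibench_ratio = MMAC_VIBENCH / FRIDAY_VIBENCH        # about 2.0x

# GAIA: FRIDAY's implied score under each reading of "6.8%".
friday_gaia_if_points = MMAC_GAIA - GAIA_MARGIN                   # 19.11
friday_gaia_if_relative = MMAC_GAIA / (1 + GAIA_MARGIN / 100)     # about 24.26

print(f"VIBench gap: {vibench_gap_points:.2f} points ({vibench_ratio:.2f}x)")
print(f"FRIDAY on GAIA if margin is in points: {friday_gaia_if_points:.2f}%")
print(f"FRIDAY on GAIA if margin is relative:  {friday_gaia_if_relative:.2f}%")
```

The two GAIA readings differ by about five points, so the distinction matters when comparing these results against other reports.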
Quotes
"MMAC-Copilot enhances the interaction capabilities of autonomous virtual agents with operating systems by leveraging multi-modality in processing to tasks."
"The team collaboration chain allows participating agents to adapt the initial plan crafted based on their domain expertise, mitigating the hallucination associated with knowledge domain gaps."