Key Concepts
ZSC-Eval, a comprehensive toolkit and benchmark, generates diverse evaluation partners, selects representative partners, and measures generalization performance to accurately evaluate multi-agent zero-shot coordination capabilities.
Summary
The paper introduces ZSC-Eval, a comprehensive evaluation toolkit and benchmark for multi-agent zero-shot coordination (ZSC) algorithms. ZSC is a cooperative multi-agent reinforcement learning challenge that aims to train an agent to work with diverse, unseen partners during deployment.
Key highlights:
- Generation of evaluation partner candidates: ZSC-Eval generates a diverse set of behavior-preferring agents using event-based reward functions to approximate the distribution of deployment-time partners.
- Selection of evaluation partners: ZSC-Eval selects a representative subset of evaluation partners by maximizing Best Response Diversity (BR-Div), ensuring that the selected partners demand diverse coordination skills from the ego agent.
- Measurement of ZSC capability: ZSC-Eval introduces the Best Response Proximity (BR-Prox) metric, which measures how closely the ego agent's performance with each evaluation partner approaches that of the partner's approximate best response. This captures the generalization gap and balances the evaluation results across partners of differing difficulty.
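The selection and measurement steps above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the helper names (`select_partners`, `br_prox`), the use of pairwise L1 distance between best-response return profiles as a stand-in for BR-Div, and the ego-to-BR return ratio as a stand-in for BR-Prox are all assumptions made here for clarity.

```python
# Hypothetical sketch of ZSC-Eval's selection and measurement ideas.
# Assumes each candidate partner has a precomputed "return profile" for its
# approximate best response (BR), and the ego agent has a return per partner.
from itertools import combinations
from statistics import mean


def select_partners(br_returns, k):
    """Pick k partner ids whose BR return profiles are maximally spread.

    br_returns: dict mapping partner id -> list of BR returns on probe tasks.
    Pairwise L1 distance between profiles stands in for BR-Div here.
    """
    def diversity(subset):
        return sum(
            sum(abs(a - b) for a, b in zip(br_returns[i], br_returns[j]))
            for i, j in combinations(subset, 2)
        )
    return max(combinations(br_returns, k), key=diversity)


def br_prox(ego_returns, br_returns, partners):
    """Mean ratio of ego return to approximate-BR return over the partners.

    A value near 1.0 means the ego agent coordinates with each evaluation
    partner almost as well as that partner's dedicated best response does.
    """
    return mean(ego_returns[p] / mean(br_returns[p]) for p in partners)
```

In this toy, a high BR-Div subset forces the ego agent to exhibit varied skills, and BR-Prox near 1.0 indicates a small generalization gap; the actual metrics in ZSC-Eval are defined over trained policies rather than fixed return profiles.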
The paper verifies the effectiveness of ZSC-Eval by demonstrating that the generated evaluation partners exhibit more diverse high-level behaviors than those used in current evaluation methods. It also shows that the evaluation results from ZSC-Eval are highly consistent with human evaluation in the Overcooked environment.
The paper further provides benchmark results for current ZSC algorithms in Overcooked and Google Research Football, and analyzes why these algorithms fail to generate sufficiently diverse, expert-level training partners. It also proposes guidelines for designing ZSC testbeds, suggesting that increasing the complexity of coordination and task requirements is crucial for effectively evaluating ZSC capabilities.
Statistics
The paper does not contain any specific numerical data or statistics to extract. The key insights are qualitative in nature, focusing on the design and evaluation of the ZSC-Eval toolkit.
Quotations
"ZSC-Eval consists of: 1) Generation of evaluation partner candidates through behavior-preferring rewards to approximate deployment-time partners' distribution; 2) Selection of evaluation partners by Best-Response Diversity (BR-Div); 3) Measurement of generalization performance with various evaluation partners via the Best-Response Proximity (BR-Prox) metric."
"We use ZSC-Eval to benchmark ZSC algorithms in Overcooked and Google Research Football environments and get novel empirical findings."
"We also conduct a human experiment of current ZSC algorithms to verify the ZSC-Eval's consistency with human evaluation."