Core Concepts
Diversity confers resilience in natural systems, yet traditional multi-agent reinforcement learning techniques often enforce homogeneity. This work introduces a novel metric, System Neural Diversity (SND), to quantify behavioral heterogeneity in multi-agent systems, enabling the measurement and control of diversity.
Summary
This paper introduces System Neural Diversity (SND), a novel metric to measure behavioral heterogeneity in multi-agent reinforcement learning (MARL) systems.
The key highlights are:
- SND is the first diversity metric that can be computed in closed form for continuous stochastic action distributions, avoiding approximations.
- SND satisfies desirable properties, such as being invariant to the number of equidistant agents and providing a measure of behavioral redundancy.
- Experiments on static and dynamic cooperative multi-robot tasks show that SND enables the measurement of previously unobservable performance and resilience properties of multi-agent systems.
- SND can be used to explicitly control for a target diversity during training, bootstrapping the search for optimal policies and enabling the emergence of novel strategies.
The authors first define a pairwise inter-agent behavioral distance using the Wasserstein metric, which captures the distance between the stochastic action distributions of the agents. They then aggregate these pairwise distances into the system-level SND metric.
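The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each agent's action distribution is taken to be a Gaussian with diagonal covariance (for which the 2-Wasserstein distance has a well-known closed form), distances are evaluated for a single observation, and the pairwise distances are aggregated by a plain mean over unordered agent pairs. The function names `wasserstein2_diag_gaussian` and `system_diversity` are hypothetical.

```python
import itertools

import numpy as np


def wasserstein2_diag_gaussian(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between two diagonal-covariance Gaussians.

    Closed form: sqrt(||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2),
    where sigma_* are per-dimension standard deviations.
    """
    mu_a, sigma_a = np.asarray(mu_a, dtype=float), np.asarray(sigma_a, dtype=float)
    mu_b, sigma_b = np.asarray(mu_b, dtype=float), np.asarray(sigma_b, dtype=float)
    squared = np.sum((mu_a - mu_b) ** 2) + np.sum((sigma_a - sigma_b) ** 2)
    return float(np.sqrt(squared))


def system_diversity(means, stds):
    """Mean pairwise behavioral distance over all unordered agent pairs.

    Assumption: mean aggregation; `means[i]` and `stds[i]` describe agent i's
    action distribution for one shared observation.
    """
    n = len(means)
    if n < 2:
        return 0.0  # a single agent has no one to differ from
    distances = [
        wasserstein2_diag_gaussian(means[i], stds[i], means[j], stds[j])
        for i, j in itertools.combinations(range(n), 2)
    ]
    return float(np.mean(distances))


if __name__ == "__main__":
    # Two agents whose action means differ and whose spreads match.
    means = [[0.0, 0.0], [3.0, 4.0]]
    stds = [[1.0, 1.0], [1.0, 1.0]]
    print(system_diversity(means, stds))
```

In practice the distance would be averaged over many observations collected from rollouts; that step is omitted here to keep the sketch short.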
The paper compares SND to the state-of-the-art Hierarchic Social Entropy (HSE) metric and shows that SND has two desirable properties that HSE lacks: it is invariant to the number of equidistant agents, and it provides a measure of behavioral redundancy.
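The invariance to equidistant agents can be checked with a short worked equation. The sketch below assumes that SND aggregates the pairwise distances d(i, j) by a mean over the n(n-1)/2 unordered agent pairs; the symbols d, n, and c are notation introduced here for illustration.

```latex
% Assumed aggregation: mean of pairwise behavioral distances
\[
\mathrm{SND} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} d(i, j)
\]
% If all agents are equidistant, d(i, j) = c for every pair i < j:
\[
\mathrm{SND} = \frac{2}{n(n-1)} \cdot \frac{n(n-1)}{2} \cdot c = c
\]
```

Under this assumption the value depends only on the common distance c and not on n, so adding further agents at the same mutual distance leaves the metric unchanged.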
Experiments on static tasks, such as a multi-agent goal navigation problem, demonstrate that heterogeneous policies can outperform homogeneous ones when the task requires specialized behaviors. In dynamic tasks, where the environment undergoes repeated disturbances, the authors show that SND can reveal latent resilience skills acquired by the agents, while other proxies like task performance fail to do so.
Finally, the paper shows how SND can be used to control diversity by enforcing a desired heterogeneity set-point or range. This approach can bootstrap the exploration phase, finding optimal policies faster and enabling novel, more efficient MARL paradigms.
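The summary does not say how the set-point or range is enforced. One possible way to picture it, offered as an assumption rather than the authors' method, is a penalty that is zero while the measured SND stays inside the target range and grows linearly once it leaves. The names `diversity_penalty` and `reward_with_diversity_target`, and the `weight` parameter, are hypothetical.

```python
def diversity_penalty(snd, low, high, weight=1.0):
    """Penalty for a measured diversity value outside the range [low, high].

    Assumption: a linear penalty. Setting low == high gives a set-point.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if snd < low:
        return weight * (low - snd)  # too homogeneous
    if snd > high:
        return weight * (snd - high)  # too heterogeneous
    return 0.0  # inside the target range


def reward_with_diversity_target(task_reward, snd, low, high, weight=1.0):
    """Task reward minus the diversity penalty (hypothetical shaping term)."""
    return task_reward - diversity_penalty(snd, low, high, weight)


if __name__ == "__main__":
    # Measured diversity of 3.0 against a target range of [1.0, 2.0].
    print(reward_with_diversity_target(1.0, snd=3.0, low=1.0, high=2.0, weight=0.5))
```

A range rather than a single set-point leaves the learner free to pick any diversity level within the bounds, which may matter when the best level is not known in advance.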
Statistics
The reward is proportional to the reduction, between consecutive timesteps, in the errors from the reference velocity and the team distance.
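As a rough illustration of such a progress-based reward, the sketch below assumes that the two errors are non-negative scalars summed with equal weight and scaled by a constant. The function name `progress_reward` and the `scale` parameter are hypothetical, and the source does not give the exact weighting.

```python
def progress_reward(prev_vel_err, prev_dist_err, vel_err, dist_err, scale=1.0):
    """Reward proportional to the drop in total error between two timesteps.

    Assumption: velocity and team-distance errors are summed with equal weight.
    Positive when the total error shrinks, negative when it grows.
    """
    previous_total = prev_vel_err + prev_dist_err
    current_total = vel_err + dist_err
    return scale * (previous_total - current_total)


if __name__ == "__main__":
    # Total error falls from 3.0 to 2.0 over one timestep.
    print(progress_reward(1.0, 2.0, 0.5, 1.5))
```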
Quotes
"Diversity is key to collective intelligence (Woolley et al., 2015) and commonplace in natural systems (Kellert, 1997)."
"Just as biologists and ecologists have demonstrated the role of functional diversity in ecosystem survival (Cadotte et al., 2011), it has also been shown to provide resilience and performance benefits in Multi-Agent Reinforcement Learning (MARL) (Bettini et al., 2023)."
"Developing a principled diversity measure would allow us to directly quantify previously unobservable properties of the system (such as resilience) as well as enable its control (e.g., in a closed-loop fashion)."