The Laser Learning Environment (LLE) is a collaborative multi-agent reinforcement learning environment in which coordination is central to the agents' success. LLE has three key properties: it requires perfect coordination between agents, it exhibits zero-incentive dynamics, and its state space contains bottlenecks that the agents must pass through to succeed.
The authors test several state-of-the-art value-based MARL algorithms, including Independent Q-Learning (IQL), Value Decomposition Networks (VDN), and QMIX, on LLE. The results show that even though these algorithms achieve perfect coordination, they consistently fail at the full collaborative task because they are unable to escape the state-space bottlenecks.
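To make the value-decomposition idea concrete, here is a minimal NumPy sketch of the VDN factorisation, in which the joint action-value is the sum of per-agent utilities. The function names and the two-agent example are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def vdn_joint_q(per_agent_qs, actions):
    # VDN factorisation: Q_tot(s, a_1..a_n) = sum_i Q_i(s, a_i).
    # per_agent_qs: one (n_actions,) array of utilities per agent.
    # actions: the action index chosen by each agent.
    return sum(float(q[a]) for q, a in zip(per_agent_qs, actions))

def greedy_joint_action(per_agent_qs):
    # Because Q_tot is additive, the joint argmax decomposes into
    # independent per-agent argmaxes -- the key property VDN exploits
    # to keep decentralised execution tractable.
    return [int(np.argmax(q)) for q in per_agent_qs]

# Illustrative two-agent example with two actions each.
qs = [np.array([1.0, 3.0]), np.array([0.5, 2.0])]
best = greedy_joint_action(qs)          # each agent picks its own argmax
q_tot = vdn_joint_q(qs, best)           # summed utility of that joint action
```

QMIX generalises this by replacing the sum with a monotonic mixing network, which preserves the same per-agent argmax decomposition.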
The authors discuss how Q-learning extensions, such as prioritized experience replay and n-step returns, hinder exploration in environments with zero-incentive dynamics. They also find that intrinsic curiosity via random network distillation (RND) is not sufficient to help agents escape the state-space bottlenecks.
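The RND mechanism mentioned above can be sketched as follows: the intrinsic reward is the prediction error of a trainable predictor against a frozen, randomly initialised target network, so rarely visited observations yield high error and thus high novelty bonus. This is a simplified sketch with single linear layers standing in for the networks; the dimensions and learning rate are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

class RND:
    """Random Network Distillation (simplified): intrinsic reward is the
    squared prediction error of a trained predictor against a fixed
    random target. Linear maps stand in for neural networks here."""

    def __init__(self, obs_dim, feat_dim=8, lr=1e-2):
        self.target = rng.normal(size=(obs_dim, feat_dim))   # frozen
        self.predictor = np.zeros((obs_dim, feat_dim))       # trained
        self.lr = lr

    def intrinsic_reward(self, obs):
        # High error on unfamiliar observations -> exploration bonus.
        err = obs @ self.predictor - obs @ self.target
        return float(np.mean(err ** 2))

    def update(self, obs):
        # One gradient-descent step on the squared prediction error.
        err = obs @ self.predictor - obs @ self.target
        self.predictor -= self.lr * np.outer(obs, err)

rnd = RND(obs_dim=4)
obs = np.ones(4)
before = rnd.intrinsic_reward(obs)
for _ in range(200):          # repeated visits shrink the bonus
    rnd.update(obs)
after = rnd.intrinsic_reward(obs)
```

The bonus decays as an observation is revisited, which is exactly why, per the paper's finding, RND alone cannot sustain the coordinated exploration needed to traverse LLE's bottlenecks.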
The authors conclude that LLE is a relevant benchmark for future work in cooperative MARL and highlight the need for novel methods to solve this problem.