Key Concepts
Formulating the multi-agent graph bandit problem, proposing the Multi-G-UCB algorithm, and analyzing its expected regret.
Summary
In this paper, the authors introduce Cooperative Multi-Agent Graph Bandits, an extension of the classical multi-armed bandit problem to a multi-agent setting: N cooperative agents travel on a connected graph with K nodes and observe rewards at the nodes they visit. The authors propose a learning algorithm, Multi-G-UCB, that efficiently balances the exploration-exploitation trade-off in this setting. Their theoretical analysis bounds the expected regret of Multi-G-UCB by O(γN log(T)[√(KT) + DK]). Numerical simulations show that the algorithm outperforms alternative methods in scenarios such as drone-enabled internet access and factory production. The paper concludes by highlighting future research directions, including decentralized learning and instance-dependent regret analysis.
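To make the UCB-style idea concrete, here is a minimal, hypothetical sketch of a single agent on a graph bandit: it keeps per-node sample means and visit counts, targets the node with the highest UCB index, and walks to it along a shortest path. This is an illustrative simplification, not the paper's Multi-G-UCB algorithm; the graph, reward means, and exploration constant below are invented for the example.

```python
import math
import random
from collections import deque

def ucb_index(mean, count, t, c=2.0):
    """Standard UCB1-style index: empirical mean plus a confidence bonus."""
    if count == 0:
        return float("inf")  # force at least one visit to each node
    return mean + math.sqrt(c * math.log(t) / count)

def shortest_path(adj, src, dst):
    """BFS shortest path on an unweighted graph given as an adjacency dict."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    path, u = [], dst
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

# Toy 4-node line graph with Bernoulli rewards (means are made up).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
true_means = {0: 0.2, 1: 0.5, 2: 0.8, 3: 0.4}

random.seed(0)
counts = {k: 0 for k in adj}
means = {k: 0.0 for k in adj}
pos, t = 0, 0
for _ in range(200):
    # Pick the target node by UCB index, then traverse to it,
    # sampling a reward at every node along the way.
    target = max(adj, key=lambda k: ucb_index(means[k], counts[k], t + 1))
    for node in shortest_path(adj, pos, target)[1:] or [pos]:
        t += 1
        r = 1.0 if random.random() < true_means[node] else 0.0
        counts[node] += 1
        means[node] += (r - means[node]) / counts[node]
        pos = node

best = max(counts, key=counts.get)
print(best)  # over time, visits should concentrate on the best node
```

Unlike a standard bandit, the agent here pays a traversal cost in time steps to switch arms, which is the core tension the graph-bandit formulation captures; the D (graph diameter) term in the regret bound reflects this travel overhead.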
Statistics
The expected regret of Multi-G-UCB is bounded by O(γN log(T)[√(KT) + DK]).