Core Concepts
Proposing a multiplayer bandit model for IoT systems to maximize data rates while ensuring fairness.
Abstract
In this technical report, a multiplayer multi-armed bandit (MMAB) model is introduced for intelligent Internet of Things (IoT) systems, with a focus on facilitating data collection while incorporating fairness considerations. A distributed cooperative bandit algorithm, DC-ULCB, is designed to let servers collaboratively select sensors so as to maximize data rates while maintaining fairness. Extensive simulations validate the superiority of DC-ULCB over existing algorithms in both maximizing reward and ensuring fairness. The report also reviews related work on multiplayer multi-armed bandits and distributed cooperative learning algorithms.
The proposed MMAB model accounts for inter-server communication and fair sensor selection in intelligent IoT systems. The DC-ULCB algorithm allows servers to select sensors fairly and efficiently, maximizing the total reward. The regret analysis shows that DC-ULCB achieves logarithmic upper bounds on both reward regret and fairness regret. Simulations demonstrate that DC-ULCB outperforms competing algorithms on both measures.
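The report does not reproduce DC-ULCB's pseudocode here, but ULCB-style algorithms build on the classic upper-confidence-bound index: each server keeps an empirical mean data rate and a pull count per sensor, and selects the sensor whose mean plus exploration bonus is largest. The sketch below illustrates that generic index (the function names, the bonus constant `c`, and the tie-free selection rule are illustrative assumptions, not the paper's exact formulation):

```python
import math

def ucb_index(mean, count, t, c=2.0):
    # Upper confidence bound for one sensor's data rate:
    # empirical mean plus an exploration bonus that shrinks
    # as the sensor is sampled more often.
    if count == 0:
        return float("inf")  # unexplored sensors are tried first
    return mean + math.sqrt(c * math.log(t) / count)

def select_sensor(means, counts, t):
    # Pick the sensor with the largest UCB index at round t.
    n = len(means)
    return max(range(n), key=lambda i: ucb_index(means[i], counts[i], t))
```

The logarithmic bonus term is what yields the logarithmic regret bounds reported for DC-ULCB: suboptimal sensors are sampled only often enough to keep their confidence intervals separated from the best ones.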
The study also examines how connection probability affects DC-ULCB's performance, finding that higher connectivity among servers yields lower regret. The algorithm is further evaluated with and without fairness considerations, showing that fair sensor selection leads to lower regret and fewer collisions.
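Collisions, where two servers poll the same sensor, waste throughput, which is why fairness-aware selection reduces regret. A common way multiplayer bandit algorithms avoid collisions is to have each server take a distinct rank in a shared ordering of sensor indices; the paper's exact assignment rule is not given here, so the following is only a minimal sketch of that idea, where `my_rank` is assumed to be agreed upon through server communication:

```python
def rank_based_selection(ucb_indices, my_rank):
    # Sort sensors by UCB index (descending); each server takes the
    # sensor at its own rank, so servers with distinct ranks never collide.
    order = sorted(range(len(ucb_indices)), key=lambda i: -ucb_indices[i])
    return order[my_rank]
```

Rotating the ranks across rounds is one simple way such a rule can also serve fairness, since every server eventually samples every highly ranked sensor.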
Stats
λ1 = 1
δ0/N ≤ δ0/M
ϵg decreases as |λx| decreases
Quotes
"DC-ULCB significantly outperforms all competitors in both reward regret and fairness regret."
"DC-ULCB achieves order-optimal logarithmic reward/fairness regret upper bounds."