This paper proposes a distributed online learning algorithm called ε-EXP3 that achieves sublinear regret in multi-stage systems with end-to-end bandit feedback.
The authors propose novel algorithms for distributed estimation and learning that enable networked agents to collectively estimate unknown statistical properties of their privately observed samples while preserving the privacy of their signals and network neighborhoods.