This paper proposes a distributed online learning algorithm called ε-EXP3 that achieves sublinear regret in multi-stage systems with end-to-end bandit feedback.