Improved Theoretical Guarantees for Thompson Sampling in Stochastic Bandits
We derive a new problem-dependent regret bound for Thompson Sampling with Gaussian priors that significantly improves on the existing bound. Additionally, we propose two parameterized Thompson Sampling-based algorithms, TS-MA-α and TS-TD-α, that achieve a favorable trade-off between utility (regret) and computation (number of posterior samples drawn).
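
For orientation, a minimal sketch of standard Thompson Sampling with Gaussian posteriors follows. It is not TS-MA-α or TS-TD-α, whose details are not given here. The function name, the unit-variance Gaussian reward model, the initial round-robin pulls, and the posterior N(empirical mean, 1/n) are all illustrative assumptions. The sketch tracks both quantities in the trade-off: cumulative pseudo-regret and the number of posterior samples drawn.

```python
import numpy as np


def gaussian_thompson_sampling(means, horizon, seed=0):
    """Simulate vanilla Thompson Sampling with Gaussian posteriors.

    Hypothetical helper for illustration only.
    Returns (pull counts per arm, cumulative pseudo-regret,
             number of posterior samples drawn).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    k = len(means)
    counts = np.zeros(k, dtype=int)   # pulls per arm
    sums = np.zeros(k)                # summed rewards per arm
    best = means.max()
    regret = 0.0
    n_samples = 0

    for t in range(horizon):
        if t < k:
            # Assumption: pull each arm once so every posterior is defined.
            arm = t
        else:
            # One posterior sample per arm: N(empirical mean, 1 / n_i).
            empirical = sums / counts
            samples = rng.normal(empirical, 1.0 / np.sqrt(counts))
            n_samples += k
            arm = int(np.argmax(samples))

        # Assumption: rewards are Gaussian with unit variance.
        reward = rng.normal(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return counts, regret, n_samples
```

Calling `gaussian_thompson_sampling([0.1, 0.9], horizon=2000)` draws one sample per arm in every round after initialisation, so the sampling cost grows linearly in both the horizon and the number of arms. This per-round cost is what a regret-versus-computation trade-off would aim to reduce.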