Core Concepts
This thesis develops new mathematical tools for statistical sequential decision-making, with a focus on applications to personalized healthcare recommendations, particularly in the context of postoperative patient follow-up.
Abstract
This introductory chapter provides an overview of the key mathematical concepts and models underlying statistical sequential decision-making, with a focus on stochastic bandits.
The chapter first introduces the general mathematical setting and notation used throughout the thesis. It then presents an overview of stochastic bandits, which model how an agent in an uncertain environment learns an optimal sequence of actions (a policy) so as to maximize the rewards it observes.
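To make the bandit interaction loop concrete, the sketch below runs the classical UCB1 index policy on Bernoulli arms. This is a minimal, self-contained illustration of the agent/environment protocol, not one of the algorithms developed in this thesis; the arm means and horizon are arbitrary choices for the example.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means.

    At each round the agent pulls the arm maximizing an optimistic index
    (empirical mean plus an exploration bonus) and observes a 0/1 reward.
    Returns the total collected reward and the per-arm pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of times each arm was pulled
    sums = [0.0] * n_arms   # cumulative reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialize the indices
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

After enough rounds, the exploration bonus shrinks on frequently pulled arms, and the policy concentrates its pulls on the arm with the highest mean reward.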
The chapter also covers the statistical models commonly used to describe the reward distributions in bandit problems, such as sub-Gaussian, bounded, and exponential family distributions.
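For concreteness, the sub-Gaussian condition mentioned above can be stated as follows (this is the standard definition, recalled here rather than a result of the thesis): a real random variable $X$ is $\sigma$-sub-Gaussian if its centered moment generating function is dominated by that of a Gaussian with variance $\sigma^2$,

```latex
\forall \lambda \in \mathbb{R}, \qquad
\mathbb{E}\bigl[\exp\bigl(\lambda \,(X - \mathbb{E}[X])\bigr)\bigr]
\;\le\; \exp\!\left(\frac{\lambda^2 \sigma^2}{2}\right).
```

Bounded random variables are a special case: by Hoeffding's lemma, a variable supported on $[a, b]$ is $\frac{b-a}{2}$-sub-Gaussian.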
Finally, the chapter delves into the critical concept of concentration of measure, which is pivotal for the design and analysis of provably efficient bandit algorithms. Original results on Bregman and empirical Chernoff concentration are presented.
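As a standard illustration of how concentration of measure enters bandit analysis (the classical Chernoff–Hoeffding bound, not one of the original results presented in the chapter): if $X_1, \dots, X_n$ are i.i.d. $\sigma$-sub-Gaussian random variables with mean $\mu$, then for any $\varepsilon > 0$,

```latex
\mathbb{P}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu \ge \varepsilon\right)
\;\le\; \exp\!\left(-\frac{n\,\varepsilon^2}{2\sigma^2}\right).
```

Bounds of this type quantify how fast empirical reward estimates converge to the true means, which is what makes exploration bonuses, and hence provably efficient bandit algorithms, possible.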
Overall, this chapter lays the necessary mathematical foundations for the more advanced topics covered in the subsequent parts of the thesis.