In this study, the authors explore the integration of the epsilon-greedy policy with Thompson Sampling to balance exploration and exploitation in Bayesian optimization. The research focuses on improving the performance of Thompson Sampling by randomly switching between exploration and exploitation strategies according to an epsilon value. By incorporating the epsilon-greedy policy, the study aims to optimize costly objective functions efficiently. The empirical evaluations demonstrate that epsilon-greedy Thompson Sampling, with an appropriate epsilon value, outperforms its two extreme variants and competes effectively with other approaches.
Thompson sampling (TS) is a stochastic policy used to address the exploitation-exploration dilemma in multi-armed bandit problems. When applied to Bayesian optimization (BO), TS selects new input points by optimizing random sample paths drawn from the posterior distribution of the unknown objective function. The study considers two extremes of TS for BO: generic TS, which optimizes a single sample path and favors exploration, and sample-average TS, which optimizes an average over many sample paths and favors exploitation. By incorporating the epsilon-greedy policy, which randomly switches between these extremes based on a small value of epsilon (ε), the study aims to improve the exploitation strategy of TS.
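The switching rule described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a Gaussian-process surrogate evaluated on a discrete grid, and assumes that with probability ε the method uses generic TS (one sample path, exploration) and otherwise sample-average TS (the mean of Ns sample paths, exploitation). The kernel length scale, jitter, and helper names are all hypothetical choices for the sketch.

```python
import numpy as np

def gp_posterior(X_obs, y_obs, X_grid, length=0.2, noise=1e-6):
    """Posterior mean and covariance of a GP with an RBF kernel,
    evaluated on a 1-D grid (hyperparameters are illustrative)."""
    def k(A, B):
        d = A[:, None] - B[None, :]
        return np.exp(-0.5 * (d / length) ** 2)
    K = k(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = k(X_grid, X_obs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs
    cov = k(X_grid, X_grid) - Ks @ Kinv @ Ks.T
    return mu, cov

def eps_greedy_ts(X_obs, y_obs, X_grid, eps=0.1, n_paths=50, rng=None):
    """Select the next evaluation point: with probability eps use
    generic TS (one posterior sample path -> exploration), otherwise
    sample-average TS (mean of n_paths sample paths -> exploitation)."""
    rng = np.random.default_rng() if rng is None else rng
    mu, cov = gp_posterior(X_obs, y_obs, X_grid)
    cov = cov + 1e-8 * np.eye(len(X_grid))  # jitter for stable sampling
    if rng.random() < eps:
        path = rng.multivariate_normal(mu, cov)            # explore
    else:
        paths = rng.multivariate_normal(mu, cov, size=n_paths)
        path = paths.mean(axis=0)                          # exploit
    return X_grid[np.argmax(path)]                         # maximization
```

In a BO loop, `eps_greedy_ts` would be called once per iteration, the returned point evaluated on the costly objective, and the observation appended to `X_obs`, `y_obs` before refitting. As ε → 1 the rule reduces to generic TS, and as ε → 0 to sample-average TS, which is the switching behavior the summary describes.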
The research highlights that a proper selection of ε can significantly impact the performance of epsilon-greedy TS. Additionally, experiments with varying numbers of sample paths (Ns) show how this choice affects optimization results. The computational cost analysis indicates that, for suitable ε values and a sufficient number of sample paths, the method's efficiency is comparable to that of traditional approaches.
Overall, this study provides valuable insights into enhancing Bayesian optimization techniques by integrating reinforcement learning strategies like epsilon-greedy policies with existing methodologies.
Key Insights Distilled From arxiv.org
by Bach Do, Ruda... on arxiv.org, 03-04-2024
https://arxiv.org/pdf/2403.00540.pdf