The paper explores continuous-time and state-space optimal stopping (OS) problems from a reinforcement learning (RL) perspective. It begins by formulating the stopping problem using randomized stopping times, where the decision maker's control is represented by the probability of stopping within a given time. To encourage exploration and facilitate learning, the authors introduce a regularized version of the problem by penalizing the performance criterion with the cumulative residual entropy of the randomized stopping time.
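As a rough illustration of this formulation (a schematic sketch, not the paper's exact statement), let X denote the underlying state process, G the stopping payoff, r a discount rate, and lambda > 0 a temperature; the nondecreasing control \xi_t \in [0, 1] gives the probability of having stopped by time t, so 1 - \xi_t plays the role of the survival function of the randomized stopping time. An entropy-regularized criterion of this type can then be written as

\[
  \sup_{\xi}\; \mathbb{E}\!\left[ \int_0^{\infty} e^{-r t}\, G(X_t)\, \mathrm{d}\xi_t
  \;+\; \lambda \int_0^{\infty} \phi\bigl(1 - \xi_t\bigr)\, \mathrm{d}t \right],
  \qquad \phi(p) := -\,p \log p,
\]

where the second integral plays the role of the cumulative residual entropy \(-\int_0^\infty \mathbb{P}(\tau > t)\log \mathbb{P}(\tau > t)\,\mathrm{d}t\) of the randomized stopping time, written here with a positive sign as an exploration bonus.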
The regularized problem takes the form of an (n+1)-dimensional degenerate singular stochastic control problem with finite fuel. The authors address it through the dynamic programming principle, which enables them to identify the unique optimal exploratory strategy. For a specific real option problem, they derive a semi-explicit solution to the regularized problem, allowing them to assess the impact of entropy regularization and to analyze the vanishing-entropy limit.
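To indicate the structure only (this is a generic finite-fuel variational inequality consistent with the sketch above, not the paper's equation), write V(x, c) for the value function, with x the n-dimensional state and c \in [0, 1] the amount of stopping probability already spent (the fuel). The dynamic programming principle then leads to an HJB equation with a gradient constraint in the fuel direction, degenerate because the fuel variable carries no diffusion:

\[
  \max\Bigl\{ \mathcal{L}_x V(x,c) - r\,V(x,c) + \lambda\,\phi(1 - c),\;\;
              G(x) + \partial_c V(x,c) \Bigr\} = 0,
  \qquad (x, c) \in \mathbb{R}^n \times [0, 1),
\]

with \(\mathcal{L}_x\) the generator of X; where the first term attains the maximum one continues, and where the gradient constraint is active the stopping probability is optimally spent.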
Finally, the authors propose a reinforcement learning algorithm based on policy iteration. They show both policy improvement and policy convergence results for the proposed algorithm.
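Below is a minimal, self-contained policy-iteration sketch for an entropy-regularized stopping problem on a discretized state grid, intended only to illustrate the alternation of policy evaluation and policy improvement; the grid, payoff g, transition kernel P, discount factor gamma, and temperature lam are hypothetical choices and do not come from the paper.

import numpy as np

# Generic policy-iteration sketch for an entropy-regularized stopping problem
# on a discretized grid. Illustrative stand-in only, not the authors' algorithm.

n_states = 50
gamma = 0.98          # per-step discount factor
lam = 0.1             # entropy-regularization temperature

x = np.linspace(0.0, 2.0, n_states)   # state grid
g = np.maximum(x - 1.0, 0.0)          # stopping payoff (call-style, for illustration)

# Random-walk transition kernel on the grid, reflecting at the boundary.
P = np.zeros((n_states, n_states))
for i in range(n_states):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n_states - 1)] += 0.5

def evaluate(p_stop, n_iter=500):
    """Policy evaluation: fixed-point iteration for the value of a randomized
    stopping rule p_stop (probability of stopping in the current state)."""
    ent = -(p_stop * np.log(p_stop + 1e-12)
            + (1.0 - p_stop) * np.log(1.0 - p_stop + 1e-12))
    v = np.zeros(n_states)
    for _ in range(n_iter):
        cont = gamma * (P @ v)                      # continuation value
        v = p_stop * g + (1.0 - p_stop) * cont + lam * ent
    return v

def improve(v):
    """Policy improvement: entropy-regularized greedy step, i.e. a softmax
    (a sigmoid, since there are two actions) between stopping and continuing."""
    cont = gamma * (P @ v)
    return 1.0 / (1.0 + np.exp(-(g - cont) / lam))

p_stop = np.full(n_states, 0.5)        # start from a fully exploratory rule
for it in range(30):
    v = evaluate(p_stop)
    p_new = improve(v)
    if np.max(np.abs(p_new - p_stop)) < 1e-6:
        break
    p_stop = p_new

print("policy-iteration sweeps:", it + 1)
print("stopping probabilities on a coarse subgrid:", np.round(p_stop[::10], 3))

In entropy-regularized problems of this kind, the improvement step is typically a softmax between stopping and continuing, which is what the sigmoid in improve implements here.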
Key insights distilled from the paper by Jodi Dianett... (arxiv.org, 10-03-2024): https://arxiv.org/pdf/2408.09335.pdf