Optimal Policy Learning with Observational Data: Estimating Reward, Accounting for Risk, and Potential Limitations
This paper discusses optimal policy learning (OPL) with observational data, focusing on estimation, risk preferences, and potential failures. It reviews the key approaches to estimating the reward function and the optimal policy, analyzes how the decision-maker's risk preferences affect the optimal choice, and highlights the limitations of data-driven decision-making.