Optimal Policy Learning for Balancing Short-Term and Long-Term Rewards
This paper proposes a principled policy learning approach that balances short-term and long-term rewards while addressing two challenges: confounding bias and missing long-term outcomes.