This paper develops SharpeRatio@k, a new evaluation metric for off-policy evaluation (OPE) estimators. It argues that existing metrics such as MSE and Regret fail to capture the risk-return tradeoff in policy selection, and its experiments show that SharpeRatio@k identifies the estimators that are efficient with respect to that tradeoff.
Published as a conference paper at ICLR 2024, this study by Haruka Kiyohara et al. from Cornell University and Tokyo Institute of Technology introduces SharpeRatio@k as a new metric for evaluating Off-Policy Evaluation (OPE) estimators. The research addresses the limitations of existing metrics in capturing the risk-return tradeoff during policy selection.
The authors propose SharpeRatio@k to assess OPE efficiency by balancing return and risk when deploying top-k policies. Through benchmark experiments, they show that conventional metrics like MSE and Regret do not adequately evaluate an estimator's performance in terms of risk-return dynamics.
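For reference, the metric can be written out as a Sharpe-style ratio. The notation below is not given in this summary and is an assumption based on how the metric is usually stated: \hat{J} is the OPE estimator, \pi_b the behavior (data-collecting) policy, J(\pi) the true value of policy \pi, and \pi_1, ..., \pi_k the k candidate policies that \hat{J} ranks highest.

```latex
\[
\mathrm{SharpeRatio@}k(\hat{J})
  = \frac{\mathrm{best@}k(\hat{J}) - J(\pi_b)}{\mathrm{std@}k(\hat{J})},
\]
\[
\mathrm{best@}k(\hat{J}) = \max_{i \in \{1,\dots,k\}} J(\pi_i),
\qquad
\mathrm{std@}k(\hat{J})
  = \sqrt{\frac{1}{k} \sum_{i=1}^{k}
      \Bigl( J(\pi_i) - \frac{1}{k} \sum_{j=1}^{k} J(\pi_j) \Bigr)^{2}}.
\]
```

Under this reading, the numerator is the return gained over the behavior policy by deploying the top-k portfolio, and the denominator is the risk taken on by testing all k policies online.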
In two example scenarios, SharpeRatio@k distinguishes between high-risk and low-risk OPE estimators effectively, offering valuable insights into policy portfolio formation. The study emphasizes the importance of considering both return and risk in evaluating OPE estimators for optimal policy selection during online deployment.
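A small numerical sketch may make the contrast concrete. The function below is an illustrative implementation, not SCOPE-RL's API: the name `sharpe_ratio_at_k`, its arguments, and all policy values are hypothetical, and it assumes the metric is the best true value among the top-k policies minus the behavior policy's value, divided by the population standard deviation of the top-k true values.

```python
import numpy as np


def sharpe_ratio_at_k(estimated_values, true_values, behavior_value, k):
    """Sharpe-style ratio of the top-k policies chosen by an OPE estimator.

    Hypothetical helper for illustration only (not part of SCOPE-RL).
    """
    estimated = np.asarray(estimated_values, dtype=float)
    true = np.asarray(true_values, dtype=float)
    # Rank candidates by the estimator's output, highest estimate first.
    top_k = np.argsort(-estimated, kind="stable")[:k]
    deployed = true[top_k]        # true values of the policies tested online
    best = deployed.max()         # return: best policy found in the portfolio
    risk = deployed.std()         # risk: population std (ddof=0) of the top-k
    if risk == 0.0:
        raise ValueError("top-k policies have identical values; ratio undefined")
    return (best - behavior_value) / risk


# Made-up true values of five candidate policies and of the behavior policy.
true_values = [1.0, 1.2, 0.2, 1.1, 0.3]
behavior_value = 0.8

# Two hypothetical estimators. Both rank the best policy (index 1) first,
# but the "risky" one also promotes two poor policies into its top 3.
risky_estimates = [0.9, 1.5, 1.4, 0.7, 1.3]
safe_estimates = [1.0, 1.3, 0.4, 1.1, 0.5]

sr_risky = sharpe_ratio_at_k(risky_estimates, true_values, behavior_value, k=3)
sr_safe = sharpe_ratio_at_k(safe_estimates, true_values, behavior_value, k=3)
print(f"risky: {sr_risky:.3f}, safe: {sr_safe:.3f}")
```

Both estimators include the best policy in their top 3, so a regret-style metric computed on the top 3 would score them the same. The ratio is higher for the second estimator only because its portfolio avoids the two poor policies.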
The authors also release open-source software, SCOPE-RL, which implements SharpeRatio@k and supports benchmark experiments on various OPE estimators and RL tasks, with a focus on their risk-return tradeoffs.