Efficient Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy
This paper introduces DRUnknown, a novel doubly-robust off-policy evaluation (OPE) estimator that efficiently estimates the value of a target policy when both the logging policy and the value function are unknown.
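For reference, a standard doubly-robust (DR) estimator combines two parts. The first is a direct-method term from a reward model. The second is an importance-weighted correction of that model's error, and here it uses an estimated logging policy in place of the unknown true one. The sketch below is the generic DR estimator for a contextual bandit with discrete contexts and actions. It is not DRUnknown: the paper's choice of value-function model and its efficiency argument are not reproduced. The names `fit_logging_policy` and `dr_value`, the discrete-context setup, and the frequency-count estimate of the logging policy are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical setup: contexts and actions are small integers.
Policy = Callable[[int], List[float]]  # context -> action probabilities
RewardModel = Callable[[int, int], float]  # (context, action) -> predicted reward


def fit_logging_policy(contexts: Sequence[int], actions: Sequence[int],
                       n_actions: int) -> Dict[int, List[float]]:
    """Estimate mu(a|x) from empirical action frequencies in each context
    (the maximum-likelihood estimate for a categorical logging policy)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for x, a in zip(contexts, actions):
        counts[x][a] += 1
    return {x: [c[a] / sum(c.values()) for a in range(n_actions)]
            for x, c in counts.items()}


def dr_value(contexts: Sequence[int], actions: Sequence[int],
             rewards: Sequence[float], target: Policy,
             mu_hat: Dict[int, List[float]], q_hat: RewardModel,
             n_actions: int) -> float:
    """Generic doubly-robust estimate of the target policy's value."""
    total = 0.0
    for x, a, r in zip(contexts, actions, rewards):
        pi = target(x)
        # Direct-method term: model-predicted reward averaged over the target policy.
        dm = sum(pi[b] * q_hat(x, b) for b in range(n_actions))
        # Correction: importance-weighted residual of the model on the logged action.
        # mu_hat[x][a] > 0 because action a was observed in context x.
        w = pi[a] / mu_hat[x][a]
        total += dm + w * (r - q_hat(x, a))
    return total / len(rewards)


# Usage on a toy log with two contexts and two actions.
contexts = [0, 0, 0, 1, 1, 1]
actions = [0, 1, 0, 1, 1, 0]
rewards = [1.0, 0.0, 1.0, 0.5, 0.5, 0.0]
mu_hat = fit_logging_policy(contexts, actions, n_actions=2)
always_action_0: Policy = lambda x: [1.0, 0.0]
reward_table = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
print(dr_value(contexts, actions, rewards, always_action_0, mu_hat,
               lambda x, a: reward_table[(x, a)], n_actions=2))  # 0.5
```

If the reward model matches every logged reward exactly, the correction term vanishes and the estimate equals the direct method. If `q_hat` is set to zero, the estimate reduces to inverse propensity scoring with the estimated propensities.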