Δ-OPE: Off-Policy Estimation with Pairs of Recommendation Policies
The difference between the value of a target recommendation policy and a production policy can often be estimated with significantly reduced variance compared to estimating the value of each policy individually.