Data Poisoning Attacks Significantly Degrade the Reliability of Off-Policy Policy Evaluation Methods
Existing off-policy policy evaluation methods are highly vulnerable to small adversarial perturbations in the training data, which can lead to large errors in the estimated policy values.
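The toy sketch below shows one way a small perturbation can produce a large estimation error. It assumes the ordinary importance-sampling (IS) estimator, which the text above does not name. The logged data, the function names `is_estimate` and `poison_top_k`, and the attack itself (raising the rewards of the highest-weight samples) are hypothetical illustrations, not the attack studied in this work.

```python
from typing import List, Sequence


def is_estimate(rewards: Sequence[float],
                target_probs: Sequence[float],
                behavior_probs: Sequence[float]) -> float:
    """Ordinary importance-sampling (IS) estimate of the target policy's value.

    Each logged reward is reweighted by pi(a|x) / mu(a|x), the ratio of the
    target-policy and behavior-policy probabilities of the logged action.
    """
    total = sum(r * pt / pb for r, pt, pb in zip(rewards, target_probs, behavior_probs))
    return total / len(rewards)


def poison_top_k(rewards: Sequence[float],
                 target_probs: Sequence[float],
                 behavior_probs: Sequence[float],
                 k: int,
                 delta: float) -> List[float]:
    """Hypothetical attack: add `delta` to the rewards of the k samples with
    the largest importance weights, leaving all other samples untouched."""
    weights = [pt / pb for pt, pb in zip(target_probs, behavior_probs)]
    top = sorted(range(len(rewards)), key=lambda i: weights[i], reverse=True)[:k]
    poisoned = list(rewards)
    for i in top:
        poisoned[i] += delta
    return poisoned


# Hypothetical log of 100 samples: the behavior policy mu picks action 0 with
# probability 0.99 and action 1 with probability 0.01; the target policy pi
# always picks action 1. The last sample is the single logged action-1 sample.
n = 100
behavior_probs = [0.99] * (n - 1) + [0.01]   # mu(logged action)
target_probs = [0.0] * (n - 1) + [1.0]       # pi(logged action)
rewards = [0.5] * n

clean = is_estimate(rewards, target_probs, behavior_probs)
poisoned_rewards = poison_top_k(rewards, target_probs, behavior_probs, k=1, delta=0.1)
poisoned = is_estimate(poisoned_rewards, target_probs, behavior_probs)

# Compare the shift in the IS estimate with the shift in the plain average
# of the logged rewards under the same perturbation.
is_shift = poisoned - clean
mean_shift = sum(poisoned_rewards) / n - sum(rewards) / n
print(f"IS estimate: {clean:.3f} -> {poisoned:.3f} (shift {is_shift:.3f})")
print(f"plain mean shift: {mean_shift:.4f}")
```

Perturbing one of 100 rewards by 0.1 moves the IS estimate by about 0.1. The plain average of the logged rewards moves by only about 0.001. The amplification comes from the perturbed sample's importance weight of 1.0 / 0.01 = 100, and it grows as the behavior policy assigns less probability to the actions the target policy prefers.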