The paper introduces a potential outcome framework to model the interference caused by data training loops in A/B tests. It demonstrates that the standard data-driven pipeline in recommendation systems, where user interactions are fed back into the training of machine learning models, can violate the Stable Unit Treatment Value Assumption (SUTVA) and thereby bias A/B test estimates.
To address this challenge, the paper proposes a weighted training approach. The key idea is to train an additional model that predicts the probability of each data point appearing in either the treatment or control group. These predicted probabilities are then used to assign weights to the training data, allowing the machine learning models to be trained in a way that effectively recovers the original treatment and control data distributions.
The paper provides theoretical justification for the proposed approach, showing that the weighted training method achieves the minimum variance among all estimators that do not cause shifts in the training distributions. Extensive simulation studies demonstrate the lower bias and variance of the weighted training approach compared to other methods, such as data pooling, snapshot, and data splitting.
The paper also discusses the limitations of the data splitting method, which can suffer from high variance due to reduced data efficiency and compromised external validity, since each arm's model is trained on only a fraction of the data. Additionally, the weighted training approach is shown to incur only slightly higher experimentation costs than the global treatment and control regimes, while the data splitting method exhibits the highest costs.