Core Concepts
Using linear regression, including its double-machine-learning variant, the Partially Linear Model (PLM), to rank treatments by effectiveness can lead to incorrect conclusions when treatment effects are heterogeneous. Alternative methods such as Augmented Inverse-Propensity Weighting (AIPW) are better suited to such settings.
This research paper investigates the limitations of linear regression, particularly the Partially Linear Model (PLM), in ranking treatment effectiveness when treatment effects vary across individuals (treatment effect heterogeneity).
Research Objective:
The paper examines whether the rankings of treatment effects derived from PLM, a popular method in causal inference, align with the true rankings of average treatment effects, especially in the presence of treatment effect heterogeneity.
Methodology:
The authors use a combination of theoretical analysis, numerical examples, and Monte Carlo simulations to demonstrate the potential for ranking reversals when using PLM. They compare PLM's performance with AIPW, a more robust estimator under heterogeneity.
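The kind of Monte Carlo comparison described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' simulation design. The data-generating process, the propensities, the effect sizes, and the helper names (`simulate`, `cell_mean`, `plm_estimate`, `aipw_estimate`) are all hypothetical. With a single binary covariate X, the nuisance functions can be estimated exactly by cell means, so no cross-fitting is used here.

In the hypothetical design below, treatment 1 has an ATE of 0 but concentrates its positive effect where the propensity is 0.5. Its PLM target is therefore (0.25 * 1 + 0.0475 * (-1)) / (0.25 + 0.0475), or about 0.68. Treatment 2 has a constant effect of 0.5.

```python
import numpy as np


def simulate(n, p_x, e, tau, rng):
    """Draw (X, D, Y) from a hypothetical DGP with one binary covariate X.

    e[k] is the propensity P(D=1 | X=k); tau[k] is the treatment effect for X=k.
    """
    x = rng.binomial(1, p_x, size=n)
    d = rng.binomial(1, e[x])
    # Baseline depends on X; treatment adds tau(X); unit-variance Gaussian noise.
    y = 1.0 * x + d * tau[x] + rng.normal(size=n)
    return x, d, y


def cell_mean(v, x):
    """Estimate E[v | X] with the two cell means (exact for a binary X)."""
    means = np.array([v[x == 0].mean(), v[x == 1].mean()])
    return means[x]


def plm_estimate(x, d, y):
    """Partially linear model: regress residualized Y on residualized D."""
    d_res = d - cell_mean(d, x)
    y_res = y - cell_mean(y, x)
    return np.sum(d_res * y_res) / np.sum(d_res**2)


def aipw_estimate(x, d, y):
    """AIPW estimate of the ATE with saturated (cell-mean) nuisance models."""
    e_hat = cell_mean(d, x)
    mu1 = np.array([y[(x == k) & (d == 1)].mean() for k in (0, 1)])[x]
    mu0 = np.array([y[(x == k) & (d == 0)].mean() for k in (0, 1)])[x]
    psi = (mu1 - mu0
           + d * (y - mu1) / e_hat
           - (1 - d) * (y - mu0) / (1 - e_hat))
    return psi.mean()


rng = np.random.default_rng(0)
# Hypothetical designs: (propensity by X, treatment effect by X), with P(X=1) = 0.5.
designs = {
    "treatment 1": (np.array([0.5, 0.05]), np.array([1.0, -1.0])),  # ATE = 0
    "treatment 2": (np.array([0.5, 0.5]), np.array([0.5, 0.5])),    # ATE = 0.5
}
results = {}
for name, (e, tau) in designs.items():
    x, d, y = simulate(200_000, 0.5, e, tau, rng)
    results[name] = (plm_estimate(x, d, y), aipw_estimate(x, d, y))
    print(f"{name}: PLM = {results[name][0]:.3f}, AIPW = {results[name][1]:.3f}")
```

With these settings, the PLM estimate for treatment 1 should land near 0.68, above treatment 2's 0.5. The AIPW estimates should land near the true ATEs of 0 and 0.5. This reproduces a ranking reversal in miniature.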
Key Findings:
The study shows that PLM estimates Weighted Average Treatment Effects (WATE) whose rankings can contradict the true rankings of Average Treatment Effects (ATE), producing "ranking reversals." The discrepancy arises from the overlap weighting inherent in linear models. Each individual is weighted by e(X)(1 - e(X)), where e(X) is the propensity score, so individuals with propensity scores close to 0.5 receive the most weight. The paper derives a necessary and sufficient condition for ranking reversals to occur, highlighting the roles of treatment effect heterogeneity and the covariance between the regression weights and the treatment effects.
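The overlap-weighting argument can be written out as a short derivation. This sketch assumes unconfoundedness and overlap. It also assumes that each treatment is compared against control in its own binary PLM, since several treatment dummies in one regression can change the weights. Here τ(X) = E[Y(1) - Y(0) | X] and e(X) = P(D = 1 | X). The first line is the standard probability limit of the PLM coefficient. The second follows by subtracting ATE = E[τ(X)]. The third is one way to express a reversal between treatments j and k with ATE_j < ATE_k; the paper's own statement of the condition may be written differently.

```latex
\beta_{\mathrm{PLM}} = \frac{\mathbb{E}[w(X)\,\tau(X)]}{\mathbb{E}[w(X)]},
\qquad w(X) = e(X)\bigl(1 - e(X)\bigr)

\beta_{\mathrm{PLM}} - \mathrm{ATE} = \frac{\operatorname{Cov}\bigl(w(X), \tau(X)\bigr)}{\mathbb{E}[w(X)]}

\beta_j > \beta_k \iff
\mathrm{ATE}_k - \mathrm{ATE}_j <
\frac{\operatorname{Cov}(w_j, \tau_j)}{\mathbb{E}[w_j]} - \frac{\operatorname{Cov}(w_k, \tau_k)}{\mathbb{E}[w_k]}
```

If τ(X) is constant, the covariance terms vanish and PLM recovers the ATE. Reversals therefore require heterogeneity that is correlated with the overlap weights.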
Main Conclusions:
The authors conclude that while PLM is a valuable tool in causal inference, it can be unreliable for ranking treatments when substantial treatment effect heterogeneity exists. They recommend AIPW as a more appropriate alternative in such cases, as it provides consistent ATE estimates regardless of heterogeneity.
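For reference, the standard textbook form of the AIPW estimator is shown below. The summary does not specify which nuisance estimators the authors use (or whether they cross-fit). The hatted quantities are therefore generic placeholders for fitted outcome models and a fitted propensity score. The inverse-propensity terms reweight each unit toward the full population, so the estimator targets the unweighted ATE rather than the overlap-weighted estimand that PLM converges to.

```latex
\hat{\tau}_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left[
\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
+ \frac{D_i\,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
- \frac{(1 - D_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}
\right]
```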
Significance:
This research has significant implications for decision-making in fields like healthcare, economics, and online platforms where accurate treatment ranking is crucial. It cautions against relying solely on PLM for ranking treatments and emphasizes the importance of considering treatment effect heterogeneity.
Limitations and Future Research:
The study primarily focuses on binary treatments and a single binary covariate. Future research could explore the generalizability of these findings to more complex scenarios with multiple treatments, continuous covariates, and different heterogeneity patterns. Additionally, investigating the performance of other causal inference methods under heterogeneity would be beneficial.
Stats
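With linear propensity scores, the PLM regression coefficients were 2.7714 and -1.8095 for treatments 1 and 2, respectively.
In contrast, IPW and AIPW each correctly recovered the ATEs of 0 and 0.5 for treatments 1 and 2, respectively. PLM therefore ranks treatment 1 above treatment 2, whereas the true ATEs rank treatment 2 above treatment 1: a ranking reversal.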
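Detecting a ranking reversal amounts to comparing the sign of each pairwise difference under two sets of estimates. The sketch below applies that check to the figures reported above. The helper name `ranking_reversals` is hypothetical, and ties (a zero difference) are deliberately not counted as reversals.

```python
from itertools import combinations


def ranking_reversals(est_a, est_b):
    """Return treatment pairs ordered one way by est_a and the opposite way by est_b."""
    reversals = []
    for i, j in combinations(sorted(est_a), 2):
        diff_a = est_a[i] - est_a[j]
        diff_b = est_b[i] - est_b[j]
        # A strictly negative product means the two estimators disagree on the order.
        if diff_a * diff_b < 0:
            reversals.append((i, j))
    return reversals


plm = {"treatment 1": 2.7714, "treatment 2": -1.8095}  # reported PLM coefficients
ate = {"treatment 1": 0.0, "treatment 2": 0.5}         # ATEs recovered by IPW/AIPW
print(ranking_reversals(plm, ate))  # [('treatment 1', 'treatment 2')]
```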