Selection Bias in Target Trial Emulation

Addressing Selection Bias from Missing Eligibility Criteria in Electronic Health Record-Based Target Trial Emulation Studies with Time-to-Event Outcomes

Core Concepts

Failing to account for missing eligibility criteria in electronic health record (EHR)-based target trial emulation studies can introduce selection bias and lead to overstated estimates of an intervention's effect, particularly when studying time-to-event outcomes.

Abstract

Bibliographic Information: Benz, L., Mukherjee, R., Wang, R. et al. Adjusting for Selection Bias Due to Missing Eligibility Criteria in Emulated Target Trials. arXiv preprint arXiv:2406.16830v2 (2024).
Research Objective: To address the issue of selection bias arising from missing eligibility criteria in electronic health record (EHR)-based target trial emulation (TTE) studies, particularly those focusing on time-to-event outcomes.
Methodology: The authors propose an inverse probability weighting (IPW) framework to handle missing eligibility data. They evaluate it in simulation studies based on the DURABLE EHR database, focusing on the effect of bariatric surgery on microvascular outcomes, and then apply it to a real-world analysis of bariatric surgery outcomes in the same database.
Key Findings: Simulations demonstrate that failing to account for missing eligibility criteria can lead to substantial bias in effect estimates. The proposed IPW method effectively mitigates this bias, yielding more accurate estimates of treatment effects. In the data application, accounting for missing eligibility criteria attenuated the estimated effect of bariatric surgery on microvascular outcomes, highlighting the importance of addressing this bias in real-world studies.
Main Conclusions: Missing eligibility criteria pose a significant threat to the validity of EHR-based TTE studies. The proposed IPW approach provides a practical and effective solution for addressing this challenge and improving the accuracy of causal effect estimates.
Significance: This research addresses a crucial gap in the TTE literature by providing a method for handling missing eligibility data, a common challenge in EHR-based studies. This work has significant implications for improving the reliability and validity of observational studies using EHR data.
Limitations and Future Research: The authors acknowledge that their IPW approach relies on a missing-at-random (MAR) assumption. Future research could explore alternative methods, such as multiple imputation or combinations of IPW and imputation, to handle missing data under less restrictive assumptions. It would also be useful to examine how the method performs in more complex scenarios with multiple sources of bias and missing data.
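The core idea of weighting for missing eligibility can be illustrated with a toy example. This sketch is not the authors' implementation. It assumes eligibility is missing at random given a single discrete covariate `x` (a hypothetical stand-in for, say, how often a patient has labs drawn), estimates P(R = 1 | X) as the stratum-specific proportion of subject-trials whose eligibility was ascertained, and weights each ascertained, eligible subject-trial by the inverse of that probability. For simplicity the outcome is a deterministic continuous value rather than a time-to-event outcome, and treatment is balanced within strata so no confounding adjustment is needed. The record layout and the helpers `make_toy_records`, `ascertainment_probs`, and `effect_estimate` are hypothetical.

```python
from collections import defaultdict

# Each record is one subject-trial: (x, r, eligible, treated, y)
#   x        : discrete covariate assumed to predict ascertainment (hypothetical)
#   r        : 1 if eligibility could be ascertained, else 0
#   eligible : True/False when r == 1, None when r == 0
#   treated  : 1 if treated, 0 if not
#   y        : outcome (continuous here, not time-to-event, for simplicity)


def ascertainment_probs(records):
    """Estimate P(R = 1 | X) as the proportion of ascertained subject-trials per stratum."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_ascertained, n_total]
    for x, r, *_ in records:
        counts[x][0] += r
        counts[x][1] += 1
    return {x: n_r / n for x, (n_r, n) in counts.items()}


def effect_estimate(records, use_weights=True):
    """Treated-minus-untreated difference in (weighted) mean outcome among
    subject-trials whose eligibility was ascertained and who are eligible."""
    probs = ascertainment_probs(records)
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [sum of w * y, sum of w]
    for x, r, eligible, treated, y in records:
        if r == 1 and eligible:
            w = 1.0 / probs[x] if use_weights else 1.0  # inverse probability of ascertainment
            totals[treated][0] += w * y
            totals[treated][1] += w
    means = {arm: wy / w for arm, (wy, w) in totals.items()}
    return means[1] - means[0]


def make_toy_records():
    """Deterministic toy data: eligibility is ascertained for 20% of x = 0 and 80% of x = 1."""
    records = []
    # (x, n_total, n_ascertained, baseline outcome, treatment effect)
    for x, n, n_r, base, effect in [(0, 1000, 200, 1.0, 0.1), (1, 1000, 800, 2.0, 0.5)]:
        records += [(x, 0, None, 0, 0.0)] * (n - n_r)             # eligibility unknown
        records += [(x, 1, False, 0, 0.0)] * (n_r // 2)           # ascertained, ineligible
        records += [(x, 1, True, 1, base + effect)] * (n_r // 4)  # eligible, treated
        records += [(x, 1, True, 0, base)] * (n_r // 4)           # eligible, untreated
    return records


records = make_toy_records()
print(round(effect_estimate(records, use_weights=False), 2))  # complete case: 0.42
print(round(effect_estimate(records), 2))                     # IPW: 0.3
```

Under the MAR assumption, half of the subject-trials in each stratum are eligible whether or not eligibility was recorded, so the true effect among all eligible subject-trials is 0.5 × 0.1 + 0.5 × 0.5 = 0.3. The complete-case estimate overweights stratum `x = 1`, where eligibility is ascertained more often and, in this toy, the effect is larger, giving 0.42; weighting by 1 / P(R = 1 | X) restores the stratum balance. With a time-to-event outcome, the same weights would instead be passed to a weighted survival estimator.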

Stats

Under the shortest lookback windows in the data application, eligibility could not be ascertained for over 80% of subject-trials.
Increasing lookback windows for both BMI and A1c increased the number of subjects for whom eligibility could be ascertained.
Failure to account for the possibility of selection bias due to missing eligibility data may lead one to overstate effect sizes by up to 10%.
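These figures hinge on how eligibility is ascertained from intermittently recorded measurements. A minimal sketch, assuming a hypothetical rule (BMI at or above `bmi_min` and HbA1c at or above `a1c_min`; the default thresholds are placeholders, not the paper's criteria), takes the most recent value of each measurement within a lookback window ending at the trial start date. Eligibility is treated as three-valued: `False` as soon as any available value fails its threshold, `None` (not ascertainable) when a needed value is missing, and `True` otherwise. The function names are hypothetical.

```python
from datetime import date, timedelta


def most_recent_in_window(history, index_date, lookback_days):
    """Latest value recorded in [index_date - lookback_days, index_date], or None."""
    start = index_date - timedelta(days=lookback_days)
    in_window = [(d, v) for d, v in history if start <= d <= index_date]
    if not in_window:
        return None
    return max(in_window, key=lambda dv: dv[0])[1]


def ascertain_eligibility(bmi_history, a1c_history, index_date, lookback_days,
                          bmi_min=35.0, a1c_min=6.5):
    """Hypothetical eligibility rule; returns True, False, or None (not ascertainable)."""
    checks = []
    for history, threshold in ((bmi_history, bmi_min), (a1c_history, a1c_min)):
        value = most_recent_in_window(history, index_date, lookback_days)
        checks.append(None if value is None else value >= threshold)
    if False in checks:
        return False  # a single failed criterion already rules the subject out
    if None in checks:
        return None   # at least one criterion cannot be evaluated in the window
    return True


index = date(2010, 6, 1)
bmi = [(date(2009, 1, 15), 41.2)]
a1c = [(date(2010, 3, 1), 7.1)]
for days in (90, 365, 730):
    print(days, ascertain_eligibility(bmi, a1c, index, days))
# 90 None / 365 None / 730 True: longer lookbacks make eligibility ascertainable
```

Using the most recent value in the window is one convention among several (an average over the window is an alternative). Longer windows raise the share of ascertainable subject-trials but rely on older, possibly outdated measurements.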

Quotes

"Missing data in variables that define study eligibility criteria presents an important challenge."
"Despite the popularity of the target trial framework, very few works have considered the problem of selection bias due to missing eligibility criteria."
"Given that one of the first steps in many retrospective EHR-based observational studies is determining the study eligible population for analysis, any setting where eligibility is not readily ascertainable for all subjects might be susceptible to selection bias."

Key Insights Distilled From

Adjusting for Selection Bias Due to Missing Eligibility Criteria in Emulated Target Trials

by Luke Benz, R... at arxiv.org 10-08-2024

https://arxiv.org/pdf/2406.16830.pdf

Deeper Inquiries

How can researchers leverage emerging data sources, such as patient-generated health data or data from wearable sensors, to mitigate the issue of missing eligibility criteria in EHR-based TTE studies?

Emerging data sources such as patient-generated health data (PGHD) and wearable sensor data could help address missing eligibility criteria in EHR-based target trial emulation (TTE) studies in several ways:

Enriching Eligibility Variable Data: PGHD, encompassing data actively provided by patients (e.g., through health apps, online surveys, or patient portals), can supplement EHRs with crucial missing information. For instance, patients might regularly log their weight or blood glucose levels in a diabetes management app, filling in gaps present in their EHR data and facilitating more accurate eligibility determination for a TTE study on diabetes interventions. Similarly, wearable sensors can passively collect continuous physiological data like heart rate, sleep patterns, or activity levels, potentially providing valuable insights into eligibility criteria related to physical activity or sleep disorders.

Enhancing Eligibility Ascertainment: Wearable sensor data, by capturing physiological parameters in real-time, can offer a more dynamic and nuanced understanding of a patient's health status compared to intermittent EHR entries. This can be particularly valuable for eligibility criteria that are sensitive to temporal changes, such as activity levels or sleep quality. For example, in a TTE study evaluating the effectiveness of a physical therapy program for patients with osteoarthritis, wearable sensors tracking activity levels could provide objective and continuous data, improving the accuracy of identifying eligible patients compared to relying solely on infrequent clinic visits documented in EHRs.

Validating Eligibility Criteria: PGHD can be instrumental in validating eligibility criteria derived from EHRs. For example, patient-reported outcomes (PROs) captured through surveys or symptom trackers can provide valuable context to clinical measures in EHRs. In a TTE study investigating the impact of antidepressants, incorporating patient-reported depression severity scales could offer a more comprehensive assessment of eligibility compared to relying solely on diagnostic codes or medication prescriptions in EHRs.
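When supplementary sources are added, it helps to record where each value came from, so that eligibility can be re-derived with and without patient-generated values. A minimal sketch, assuming both sources have already been linked to the same patient and reduced to `(date, value)` pairs; the function `merge_sources` and the rule that a same-day EHR value takes precedence are illustrative choices, not an established standard.

```python
from datetime import date


def merge_sources(ehr, pghd):
    """Merge (date, value) readings from the EHR and from patient-generated data.

    Each reading is tagged with its source; when both sources report a value on
    the same date, the EHR value is kept.
    """
    merged = {d: (v, "pghd") for d, v in pghd}
    merged.update({d: (v, "ehr") for d, v in ehr})  # same-day EHR values overwrite PGHD
    return sorted((d, v, source) for d, (v, source) in merged.items())


ehr_weights = [(date(2009, 1, 15), 118.0)]
app_weights = [(date(2009, 1, 15), 117.2), (date(2010, 4, 20), 114.5)]
for row in merge_sources(ehr_weights, app_weights):
    print(row)
# (datetime.date(2009, 1, 15), 118.0, 'ehr')
# (datetime.date(2010, 4, 20), 114.5, 'pghd')
```

Keeping the `source` tag makes it straightforward to run the eligibility step once on EHR data alone and once on the merged data, and to compare how many subject-trials become ascertainable. Because whether patients log data may itself depend on their health, adding PGHD also changes the missingness mechanism that any weighting model has to capture.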
However, incorporating these emerging data sources also presents challenges:

Data Integration: Combining data from disparate sources like EHRs, PGHD, and wearables requires robust data integration methods to address variations in data formats, timestamps, and identifiers.

Data Quality and Validity: Ensuring the accuracy, reliability, and validity of PGHD and sensor data is paramount. This involves addressing potential biases in patient reporting, sensor accuracy variations, and missing data patterns.

Privacy and Security: Safeguarding patient privacy and data security is crucial when integrating sensitive health information from multiple sources.
Despite these challenges, the potential of PGHD and wearable sensor data to enhance EHR-based TTE studies is significant. By strategically integrating these data sources and addressing the associated methodological considerations, researchers can improve the accuracy of eligibility determination, reduce selection bias, and strengthen the validity of their findings.

Could the reliance on inverse probability weighting, known to potentially increase variance in estimates, be mitigated by employing alternative approaches like doubly robust estimation methods?

Yes, doubly robust estimation methods can mitigate the potential increase in variance associated with inverse probability weighting (IPW). Compared with IPW alone, they can offer greater robustness to model misspecification and, when the outcome model is well specified, greater efficiency, which is particularly useful with missing data or complex confounding structures.
Here's how doubly robust estimation works:

Dual Modeling: Doubly robust methods involve specifying two models:

Treatment Model: Predicts the probability of receiving the treatment (exposure) based on covariates.
Outcome Model: Predicts the outcome based on covariates and treatment status.

Robustness to Misspecification: The key advantage of doubly robust estimation is that it provides consistent estimates of the treatment effect if at least one of the two models (treatment or outcome) is correctly specified. This is in contrast to IPW, which relies solely on the correct specification of the treatment model.

Efficiency Gain: When both models are correctly specified, doubly robust methods can lead to more efficient estimates (i.e., narrower confidence intervals) compared to IPW.
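The augmented inverse probability weighted (AIPW) estimator makes the "either model may be correct" property concrete. This sketch uses a point outcome rather than a time-to-event outcome. It takes fitted values from a treatment model (`e_hat`, estimated P(A = 1 | X)) and from an outcome model (`m1_hat`, `m0_hat`, estimated E[Y | A = a, X]) and combines them as the outcome-model prediction plus an inverse-probability-weighted residual. The function name and the toy data are hypothetical; in practice the nuisance models would be fitted to data rather than written down by hand.

```python
def aipw_ate(y, a, e_hat, m1_hat, m0_hat):
    """AIPW estimate of E[Y(1)] - E[Y(0)] from per-subject fitted nuisance values.

    y, a           : outcomes and binary treatment indicators
    e_hat          : fitted P(A = 1 | X) from the treatment model
    m1_hat, m0_hat : fitted E[Y | A = 1, X] and E[Y | A = 0, X] from the outcome model
    """
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, m1_hat, m0_hat):
        mu1 = m1 + ai * (yi - m1) / ei              # outcome prediction + weighted residual
        mu0 = m0 + (1 - ai) * (yi - m0) / (1 - ei)
        total += mu1 - mu0
    return total / len(y)


# Toy data: Y = 1 + 2X + 0.5A exactly, so the true effect is 0.5.
# Treatment is confounded: P(A = 1 | X = 0) = 0.25, P(A = 1 | X = 1) = 0.75.
x = [0, 0, 0, 0, 1, 1, 1, 1]
a = [1, 0, 0, 0, 1, 1, 1, 0]
y = [1 + 2 * xi + 0.5 * ai for xi, ai in zip(x, a)]

true_e = [0.25 if xi == 0 else 0.75 for xi in x]
wrong_e = [0.5] * len(x)                       # misspecified treatment model
true_m1 = [1.5 + 2 * xi for xi in x]
true_m0 = [1.0 + 2 * xi for xi in x]
wrong_m = [0.0] * len(x)                       # misspecified outcome model

print(round(aipw_ate(y, a, wrong_e, true_m1, true_m0), 6))  # outcome model right: 0.5
print(round(aipw_ate(y, a, true_e, wrong_m, wrong_m), 6))   # treatment model right: 0.5
print(round(aipw_ate(y, a, wrong_e, wrong_m, wrong_m), 6))  # both wrong: 1.5
```

Each of the first two calls recovers 0.5 because one of the two models is correct; when both are wrong the estimate drifts to 1.5. Extending this to the missing-eligibility setting would also require weighting by the inverse probability that eligibility is ascertained, and time-to-event outcomes call for survival-specific versions of the estimator.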
In the context of TTE studies with missing eligibility criteria, doubly robust estimation can be particularly beneficial:

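Addressing Selection Bias: Doubly robust methods can be used to adjust for both confounding and selection bias due to missing eligibility data. Rather than entering eligibility status as a covariate (it is constant among the eligible subjects being analyzed), this involves adding a model for the probability that eligibility is ascertained (a selection or missingness model) alongside the treatment and outcome models, so that the analyzed subject-trials are reweighted toward the full eligible population.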

Handling Missing Covariates: Doubly robust methods can also accommodate missing data in other covariates, further enhancing their utility in EHR-based studies where missing data is common.
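One way to write the weights when both treatment confounding and missing eligibility are handled by weighting is as a product of an ascertainment weight and a treatment weight, assuming eligibility data are missing at random given covariates $X$. Here $R_i$ indicates that eligibility for subject-trial $i$ could be ascertained, $E_i$ indicates eligibility, and $A_i$ is the treatment received; this notation is introduced for illustration, and the weight is applied only to subject-trials with $R_i = 1$ and $E_i = 1$. An AIPW or TMLE estimator would add outcome-model terms on top of these weights.

```latex
w_i \;=\; \frac{1}{\widehat{\Pr}(R_i = 1 \mid X_i)} \times
          \frac{1}{\widehat{\Pr}(A = A_i \mid X_i,\, R_i = 1,\, E_i = 1)},
\qquad \text{for subject-trials with } R_i = 1,\ E_i = 1.
```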
Examples of doubly robust estimation methods include:

Augmented Inverse Probability Weighted (AIPW) Estimators
Targeted Maximum Likelihood Estimation (TMLE)
While doubly robust methods offer advantages, they also come with complexities:

Model Specification: Careful consideration is needed when specifying both the treatment and outcome models, as misspecification of both models can still lead to biased estimates.

Computational Demands: Doubly robust methods can be computationally more intensive than IPW, especially in large datasets.
Overall, doubly robust estimation methods provide a valuable tool for mitigating the limitations of IPW and improving the accuracy and efficiency of treatment effect estimates in TTE studies, particularly when dealing with missing eligibility criteria and complex confounding.

If our understanding of disease progression and treatment eligibility evolves over time, how can TTE studies be designed to adapt to these changes and maintain the validity of their findings?

The evolving nature of medical knowledge, including our understanding of disease progression and treatment eligibility, poses a challenge for the long-term validity of TTE studies. Here are strategies to enhance the adaptability and robustness of TTE designs:

Dynamic Eligibility Criteria: Instead of fixed eligibility criteria, consider incorporating dynamic or time-varying eligibility criteria that can be updated as new knowledge emerges. This might involve:

Regularly reviewing and refining eligibility criteria based on updated clinical guidelines, expert consensus, or emerging research findings.
Using biomarkers that are sensitive to early disease progression as part of the eligibility assessment, measured at or before time zero (eligibility cannot depend on later treatment response without introducing bias). This allows adjustments based on an evolving understanding of disease markers.

Sensitivity Analyses: Conduct thorough sensitivity analyses to assess the impact of changing eligibility criteria on the study findings. This might involve:

Varying the eligibility criteria within plausible ranges based on current uncertainties in disease definition or treatment guidelines.
Comparing results obtained using different versions of eligibility criteria to gauge the sensitivity of the findings to these changes.
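Varying the criteria can be scripted so that the same pipeline is re-run under each candidate definition. This sketch assumes a hypothetical single BMI criterion and uses a crude treated-minus-untreated difference in means purely as a stand-in for whatever adjusted (for example, weighted) estimator the study actually uses; the record fields and function names are illustrative.

```python
def crude_difference(subjects):
    """Treated-minus-untreated difference in mean outcome (stand-in for the real estimator)."""
    treated = [s["y"] for s in subjects if s["treated"]]
    control = [s["y"] for s in subjects if not s["treated"]]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def sensitivity_over_thresholds(subjects, thresholds):
    """Re-apply a hypothetical BMI eligibility rule at each threshold and report
    (threshold, eligible sample size, estimate)."""
    results = []
    for t in thresholds:
        # Subjects with missing BMI are dropped: a complete-case selection.
        eligible = [s for s in subjects if s["bmi"] is not None and s["bmi"] >= t]
        results.append((t, len(eligible), crude_difference(eligible)))
    return results


subjects = [
    {"bmi": 32.0, "treated": True, "y": 1.0},
    {"bmi": 33.0, "treated": False, "y": 1.0},
    {"bmi": 38.0, "treated": True, "y": 3.0},
    {"bmi": 37.0, "treated": False, "y": 1.0},
    {"bmi": None, "treated": True, "y": 5.0},
]
for row in sensitivity_over_thresholds(subjects, (30.0, 35.0, 40.0)):
    print(row)
# (30.0, 4, 1.0)
# (35.0, 2, 2.0)
# (40.0, 0, None)
```

Reporting the eligible sample size next to each estimate shows how much the population itself shifts between definitions. Subject-trials with missing BMI silently drop out of every version here, which is the complete-case selection this document warns about; in a real analysis the ascertainment weights would be re-estimated under each definition.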

Data-Driven Approaches: Leverage machine learning and data-driven approaches to develop more adaptive and personalized eligibility criteria. This could involve:

Training algorithms on large datasets to identify patterns and predictors of treatment response that can inform eligibility criteria.
Developing risk prediction models that incorporate evolving knowledge of disease progression and treatment effectiveness to guide eligibility assessment.

Prospective TTE Designs: Whenever feasible, consider prospective TTE designs that allow for real-time adaptation to changing eligibility criteria. This might involve:

Pre-specifying a mechanism for updating eligibility criteria based on pre-defined triggers, such as changes in clinical guidelines or the emergence of new treatment options.
Reassessing eligibility at the start of each emulated trial (for example, in a sequence of nested trials), so that each trial's population reflects the criteria in force at its time zero.
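Pre-specifying how criteria may change is easier to audit when every version is written down with the date from which it applies, and each emulated trial looks up the version in force at its own time zero. This sketch assumes a single hypothetical BMI threshold per version; the version labels, dates, thresholds, and the `CriteriaVersion`/`criteria_for` names are placeholders, not clinical guidance or values from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CriteriaVersion:
    label: str
    effective_from: date  # first trial start date to which this version applies
    bmi_min: float        # hypothetical eligibility threshold


# Placeholder versions, fixed in the protocol before any outcome data are examined.
VERSIONS = (
    CriteriaVersion("v1", date(2005, 1, 1), 40.0),
    CriteriaVersion("v2", date(2012, 1, 1), 35.0),
)


def criteria_for(trial_start, versions=VERSIONS):
    """Return the latest version whose effective date is on or before trial_start."""
    in_force = [v for v in versions if v.effective_from <= trial_start]
    if not in_force:
        raise ValueError(f"no eligibility criteria in force on {trial_start}")
    return max(in_force, key=lambda v: v.effective_from)


for start in (date(2010, 6, 1), date(2013, 3, 1)):
    version = criteria_for(start)
    print(start, version.label, version.bmi_min)
# 2010-06-01 v1 40.0
# 2013-03-01 v2 35.0
```

Because each subject-trial is evaluated only with the version in force at its time zero, the rule never uses information from after that subject-trial's start, and the version label can be stored alongside the results for later sensitivity analyses.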

Collaboration and Data Sharing: Foster collaboration and data sharing among researchers to facilitate the development and validation of evolving eligibility criteria. This can help establish a more robust and generalizable framework for TTE studies in the face of changing medical knowledge.
By embracing these adaptive strategies, researchers can enhance the resilience of TTE studies to the evolving landscape of medical knowledge, ensuring that their findings remain relevant and informative over time.