This paper introduces ipd, an open-source R package designed for conducting statistical inference on data predicted by artificial intelligence and machine learning (AI/ML) algorithms.
The increasing use of AI/ML predictions as outcomes in statistical analyses presents significant statistical challenges. This trend is driven by rapid advances in these algorithms and by practical constraints, for example when the true outcome is expensive or time-consuming to measure. Treating predicted values as if they were observed outcomes can bias estimates and understate uncertainty, which leads to invalid inferences. The ipd package addresses these challenges by implementing several recent methods for Inference on Predicted Data (IPD).
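To make the bias concrete, here is a minimal Python sketch. It is written in Python rather than R and does not use the ipd package. The setup is hypothetical: true outcomes follow Y ~ Normal(1, 1), and a predictor systematically shrinks them, f = 0.8 * Y + noise. The function name `simulate_bias` and all parameter values are illustrative assumptions.

```python
import random
import statistics

def simulate_bias(n_labeled=500, n_unlabeled=10_000, seed=1):
    """Contrast a naive mean of predictions with the classical labeled-only mean.

    Hypothetical setup (assumption): Y ~ Normal(1, 1), and the prediction
    model shrinks toward zero, f = 0.8 * Y + Normal(0, 0.2) noise.
    """
    rng = random.Random(seed)

    def draw(n):
        ys = [rng.gauss(1.0, 1.0) for _ in range(n)]
        fs = [0.8 * y + rng.gauss(0.0, 0.2) for y in ys]
        return ys, fs

    y_lab, _ = draw(n_labeled)    # small labeled set: true Y is observed
    _, f_unl = draw(n_unlabeled)  # large unlabeled set: only predictions are used
    return {
        "classical": statistics.fmean(y_lab),  # unbiased, but only n_labeled points
        "naive": statistics.fmean(f_unl),      # predictions treated as outcomes
    }

est = simulate_bias()
print(est)
```

Under these assumptions, the naive mean of predictions lands near 0.8 rather than the true mean of 1, no matter how many unlabeled predictions are used. The labeled-only (classical) mean is unbiased but relies on far fewer observations.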
The package provides a user-friendly wrapper function, ipd, that allows users to apply various IPD methods, including:
These methods address the challenges of IPD by:
The ipd function supports several estimands, including the population mean, quantiles, and the coefficients of linear and logistic regression models.
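For the population mean, one widely used IPD idea is prediction-powered inference (PPI). It averages the predictions on the unlabeled data and then adds a "rectifier": the mean residual Y - f on the labeled data, which removes the predictor's average bias. The Python sketch below illustrates that idea on hypothetical simulated data. `ppi_mean` and its arguments are hypothetical names, and this is not presented as the ipd package's implementation.

```python
import math
import random
import statistics

def ppi_mean(y_lab, f_lab, f_unl, z=1.96):
    """PPI-style estimate of E[Y] with a normal-approximation confidence interval.

    theta_hat = mean(f_unl) + mean(y_lab - f_lab); the second term (the
    "rectifier") estimates the predictor's average bias from labeled data.
    """
    n, N = len(y_lab), len(f_unl)
    residuals = [y - f for y, f in zip(y_lab, f_lab)]
    theta = statistics.fmean(f_unl) + statistics.fmean(residuals)
    # The labeled and unlabeled samples are independent, so their variances add.
    se = math.sqrt(statistics.variance(f_unl) / N + statistics.variance(residuals) / n)
    return theta, (theta - z * se, theta + z * se)

# Hypothetical data (assumption): Y ~ Normal(1, 1); predictions f = 0.8 * Y + noise.
rng = random.Random(7)

def draw(n):
    ys = [rng.gauss(1.0, 1.0) for _ in range(n)]
    return ys, [0.8 * y + rng.gauss(0.0, 0.2) for y in ys]

y_lab, f_lab = draw(500)   # labeled set: both Y and f are observed
_, f_unl = draw(10_000)    # unlabeled set: only f is used

theta, (lo, hi) = ppi_mean(y_lab, f_lab, f_unl)
print(f"PPI estimate {theta:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

The rectifier makes the estimate consistent for E[Y] even though the predictions themselves are biased. How much narrower the interval is than a labeled-only interval depends on how closely f tracks Y.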
The ipd package offers several features to facilitate analysis:
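simdat function to generate simulated data for method exploration.

The paper provides a simple example demonstrating the package's use for linear regression. Using simulated data, it compares the IPD methods against three benchmark regressions. The oracle regression uses the true outcomes, which are unavailable in practice. The naive regression treats the predicted outcomes as observed. The classical regression uses only the labeled data with true outcomes.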
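The comparison described above can be mimicked in a few lines of Python. The sketch is hypothetical throughout. Its data-generating process is not the package's simdat, and the "rectified" slope is a simple PPI-style correction (the naive slope plus the slope of the labeled residuals Y - f on X). It is not necessarily what any ipd method computes.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def draw(rng, n, beta=1.0):
    """Hypothetical DGP: Y = beta*X + Z + noise. The prediction model sees
    X and Z but under-weights X, predicting f = 0.7*X + Z."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + zi + rng.gauss(0.0, 0.5) for xi, zi in zip(x, z)]
    f = [0.7 * xi + zi for xi, zi in zip(x, z)]
    return x, y, f

rng = random.Random(2024)
x_l, y_l, f_l = draw(rng, 500)    # labeled set: true Y observed
x_u, y_u, f_u = draw(rng, 5_000)  # unlabeled set: y_u is used only by the oracle

naive = ols_slope(x_u, f_u)
estimates = {
    "oracle": ols_slope(x_u, y_u),     # true outcomes on unlabeled data (unavailable in practice)
    "naive": naive,                    # predictions treated as observed outcomes
    "classical": ols_slope(x_l, y_l),  # labeled data only
    # Naive slope plus a correction estimated from labeled residuals Y - f.
    "rectified": naive + ols_slope(x_l, [y - f for y, f in zip(y_l, f_l)]),
}
for name, value in estimates.items():
    print(f"{name:>10}: {value:.3f}")
```

In this setup the naive slope settles near 0.7 instead of the true value of 1. The rectified slope recovers a value near 1 while using true outcomes from only 500 labeled points. A full IPD method would also supply a valid standard error, which this sketch omits.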
The ipd package provides researchers and practitioners with a valuable tool for conducting valid statistical inference when using AI/ML predicted outcomes. The authors hope that the package will continue to be developed and expanded as the field of IPD advances.
Key insights distilled from: Stephen Sale... (arxiv.org, 10-15-2024)
https://arxiv.org/pdf/2410.09665.pdf