The paper introduces a novel generative model, EHR-D3PM, for synthesizing realistic electronic health record (EHR) data. EHRs are a rich data source enabling numerous applications in computational medicine, but their sensitive nature raises privacy concerns that limit their potential use cases.
The authors explore the use of generative models to synthesize artificial yet realistic EHRs. Diffusion-based methods have shown promise for other data modalities, but their application to EHR generation remains underexplored. Because EHR medical-code data is discrete and tabular, high-quality generation is challenging, especially for continuous diffusion models, which operate on real-valued inputs.
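A continuous diffusion model adds Gaussian noise, which has no natural meaning for a present/absent medical code. A discrete diffusion model instead corrupts each code by resampling its category. The summary does not give EHR-D3PM's transition matrix or noise schedule, so the sketch below assumes the uniform-transition forward process from the original D3PM formulation (Austin et al., 2021). The function names, the binary code encoding, and the constant `betas` schedule are all hypothetical and illustrative, not the paper's implementation.

```python
import math
import random

# Minimal sketch of a discrete-diffusion forward (noising) process, assuming the
# uniform transition matrix Q_t = (1 - beta_t) * I + (beta_t / K) * 11^T from D3PM.
# This is an illustration, not EHR-D3PM's actual schedule or implementation.


def cumulative_keep_prob(betas, t):
    """alpha_bar_t = prod_{s=1..t} (1 - beta_s); equals 1 at t = 0."""
    return math.prod(1.0 - b for b in betas[:t])


def marginal_probs(x0, alpha_bar, num_classes):
    """Closed-form q(x_t = k | x_0) = alpha_bar * [k == x0] + (1 - alpha_bar) / K."""
    uniform = (1.0 - alpha_bar) / num_classes
    return [alpha_bar + uniform if k == x0 else uniform for k in range(num_classes)]


def corrupt(codes, betas, t, num_classes=2, rng=random):
    """Sample x_t ~ q(x_t | x_0) independently for each code slot.

    `codes` is a list of category indices; for a hypothetical binary EHR
    encoding, 1 = code present and 0 = code absent.
    """
    alpha_bar = cumulative_keep_prob(betas, t)
    noisy = []
    for x0 in codes:
        if rng.random() < alpha_bar:
            noisy.append(x0)  # with prob. alpha_bar: keep the original category
        else:
            noisy.append(rng.randrange(num_classes))  # otherwise: resample uniformly
    return noisy


if __name__ == "__main__":
    betas = [0.02] * 100          # hypothetical constant noise schedule
    record = [0, 1, 0, 0, 1, 0]   # hypothetical patient: codes 1 and 4 present
    print(corrupt(record, betas, t=50, rng=random.Random(0)))
    print(marginal_probs(1, cumulative_keep_prob(betas, 50), 2))
```

Sampling "keep with probability alpha_bar, otherwise draw uniformly" reproduces the closed-form marginal exactly, so any noise level t can be reached in one step without simulating the chain. The reverse (denoising) model that a discrete diffusion model learns is not shown.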
EHR-D3PM leverages a discrete diffusion model to enable both unconditional and conditional generation of synthetic EHR data. The key contributions are:
EHR-D3PM uses an architecture that captures correlations among features, which improves generation quality and yields state-of-the-art performance, particularly for instances of rare medical conditions.
EHR-D3PM is extended to conditional generation by applying energy-guided Langevin dynamics at the latent layer, producing EHR samples associated with specific medical conditions.
Experiments show that synthetic EHR data generated by EHR-D3PM yields downstream predictive performance comparable to real data, and that combining synthetic with real data can further improve model performance.
EHR-D3PM significantly outperforms existing generative baselines across a comprehensive set of fidelity and utility metrics, while exhibiting a lower membership-inference risk.
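The conditional-generation contribution relies on energy-guided Langevin dynamics at the latent layer. The summary does not specify EHR-D3PM's energy function, latent dimensionality, or step sizes, so the following is only a minimal sketch under stated assumptions: a latent vector `z` with a standard-normal prior, plus a hypothetical logistic classifier (weights `w`, bias `b`) that scores the target condition, combined into one energy scaled by a hypothetical `guidance` weight. The names `energy_grad` and `langevin_sample` are illustrative, and decoding the latent back into discrete codes is omitted.

```python
import math
import random

# Minimal sketch of energy-guided (unadjusted) Langevin dynamics on a latent vector.
# The energy below is hypothetical: a standard-normal prior plus a logistic
# "condition" classifier. EHR-D3PM's actual latent layer and energy are not given here.


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def energy_grad(z, w, b, guidance):
    """Gradient of E(z) = 0.5 * ||z||^2 - guidance * log sigmoid(w . z + b)."""
    a = sum(wi * zi for wi, zi in zip(w, z)) + b
    coef = guidance * (1.0 - sigmoid(a))  # d/da log sigmoid(a) = 1 - sigmoid(a)
    return [zi - coef * wi for zi, wi in zip(z, w)]


def langevin_sample(z0, w, b, guidance=1.0, step=0.05, n_steps=200, rng=random):
    """Run z <- z - (step / 2) * grad E(z) + sqrt(step) * N(0, I) for n_steps."""
    z = list(z0)
    noise_scale = math.sqrt(step)
    for _ in range(n_steps):
        g = energy_grad(z, w, b, guidance)
        z = [zi - 0.5 * step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for zi, gi in zip(z, g)]
    return z


if __name__ == "__main__":
    rng = random.Random(42)
    w, b = [1.0, 0.0], 0.0  # hypothetical classifier: condition depends on z[0] only
    guided = [langevin_sample([0.0, 0.0], w, b, guidance=5.0, rng=rng)[0] for _ in range(200)]
    plain = [langevin_sample([0.0, 0.0], w, b, guidance=0.0, rng=rng)[0] for _ in range(200)]
    print("mean z[0], guided:  ", sum(guided) / len(guided))
    print("mean z[0], unguided:", sum(plain) / len(plain))
```

Raising `guidance` shifts samples toward latents the classifier associates with the condition, while `guidance=0` reduces to plain sampling from the prior. The update is unadjusted (no Metropolis correction), which keeps the sketch short at the cost of a small step-size-dependent bias.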
Key insights distilled from
by Zixiang Chen... at arxiv.org, 04-19-2024
https://arxiv.org/pdf/2404.12314.pdf