Core Concepts
TrialSynth, a Variational Autoencoder (VAE) combined with Hawkes Processes, can generate high-fidelity synthetic sequential clinical trial data that outperforms existing methods in terms of downstream utility and privacy preservation.
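To make the Hawkes Process ingredient concrete, here is a minimal sketch of a classical univariate Hawkes process with an exponential kernel, simulated by Ogata-style thinning. It is an illustration only, not TrialSynth's architecture: the summary does not specify how the paper parameterizes the process, and the names `hawkes_intensity`, `simulate_hawkes`, `mu`, `alpha`, and `beta` are assumptions chosen for this example.

```python
import math
import random


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity: mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning (illustrative parameters)."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The kernel only decays between events, so the intensity right after
        # the current time is an upper bound until the next accepted event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)  # propose the next candidate time
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


# Each accepted event temporarily raises the rate of further events.
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
# Time gaps between consecutive events, the quantity a sequential model must predict.
gaps = [b - a for a, b in zip(events, events[1:])]
```

The self-exciting term is what lets a past event (say, an adverse event) make follow-up events more likely in the near future. A multi-type variant would keep one intensity per event type.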
Summary
The paper introduces TrialSynth, a novel model that combines Variational Autoencoder (VAE) and Hawkes Process techniques to generate synthetic sequential clinical trial data.
Key highlights:
- Existing methods for generating synthetic clinical trial data have focused on static context information, but many high-value applications require generating synthetic time-sequential event data with high fidelity.
- TrialSynth leverages Hawkes Processes, which are well suited to predicting both the type of the next event and the time gap before it, the two quantities needed to capture the structure of sequential clinical trial data.
- TrialSynth outperforms alternative approaches to generating sequential event data on 7 real-world clinical trial datasets, in terms of both downstream utility (measured by binary classification ROC AUC) and privacy preservation (measured by ML Inference Score and Distance to Closest Record).
- The authors also propose two variants of TrialSynth that can leverage additional information about known event types to further improve performance.
- Experiments demonstrate that TrialSynth can generate high-fidelity synthetic data that is hard to distinguish from real data, while preserving patient privacy.
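As a rough illustration of the Distance to Closest Record (DCR) metric mentioned above, the sketch below computes, for each synthetic record, the distance to its nearest real record. The summary does not say how the paper featurizes event sequences or which distance it uses, so Euclidean distance over fixed-length numeric vectors is an assumption here, as is the helper name `distance_to_closest_record`.

```python
import math


def distance_to_closest_record(synthetic, real):
    """For each synthetic record, return the Euclidean distance to the nearest real record.

    Records are assumed to be equal-length numeric vectors (an assumption of
    this sketch, not something the summary specifies).
    """
    return [min(math.dist(s, r) for r in real) for s in synthetic]


real = [(0.0, 0.0), (3.0, 4.0)]
synthetic = [(0.0, 0.0), (3.0, 0.0)]
print(distance_to_closest_record(synthetic, real))  # [0.0, 3.0]
```

A distance of 0 means a synthetic record exactly duplicates a real one, which is the kind of leakage this style of metric is meant to surface. Larger distances suggest the generator is not simply memorizing training records.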
Statistics
The average number of events per patient across the 7 datasets ranges from 4.5 to 36.9.
The proportion of patients who did not experience the death event (positive label) ranges from 1.9% to 95.1%.
Quotes
"Analyzing data from past clinical trials is part of the ongoing effort to optimize the design, implementation, and execution of new clinical trials and more efficiently bring life-saving interventions to market."
"Though proposed methods for generating synthetic clinical trial data have focused on static context information for each subject (e.g., demographics), many of the highest value applications, including control arm augmentation require generating synthetic time-sequential event data that has high fidelity."