This study investigates the potential of GPT-3.5, a large language model, in generating and coding discharge summaries for data augmentation in automated ICD-10 coding tasks.
The researchers first selected a set of low-population ICD-10 codes from the MIMIC-IV dataset and generated 9,606 synthetic discharge summaries based on their descriptions using GPT-3.5. These synthetic documents were then combined with the original MIMIC-IV training set to create an augmented dataset.
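The selection and augmentation steps can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the `max_count` threshold, the `vocabulary` argument, and the `synthetic` flag are all hypothetical, and the summary does not state what cutoff defined a "low-population" code.

```python
from collections import Counter


def select_low_population_codes(train_labels, vocabulary, max_count):
    """Return codes from `vocabulary` that label at most `max_count` training documents.

    `train_labels` is a list of per-document code lists. The threshold is an
    assumption made for illustration; the source does not give the cutoff.
    """
    # Count each code once per document, even if it is listed twice.
    counts = Counter(code for codes in train_labels for code in set(codes))
    # Counter returns 0 for unseen codes, so codes absent from training are kept.
    return sorted(code for code in vocabulary if counts[code] <= max_count)


def augment(train_docs, synthetic_docs):
    """Concatenate real and synthetic documents, tagging each with its origin."""
    real = [{**doc, "synthetic": False} for doc in train_docs]
    synthetic = [{**doc, "synthetic": True} for doc in synthetic_docs]
    return real + synthetic
```

Passing an explicit vocabulary means a code that never occurs in the training labels is still selected, which is the situation of the code the summary describes as absent from the original training data.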
Local neural network models (CAML, LAAT, and Multi-Res CNN) were trained on both the baseline and augmented datasets and evaluated on a held-out test set. The results show that while the overall performance of the augmented models slightly decreased compared to the baseline, their performance on the low-population "generation" codes and their families improved, including correctly predicting one code absent from the original training data. The augmented models also exhibited lower out-of-family error rates, indicating that the synthetic data helped reduce mispredictions outside the relevant code families.
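One way to make the out-of-family error rate concrete is sketched below. It rests on two assumptions that the summary does not confirm: that a code's family is its three-character ICD-10 category, and that the rate is the share of predicted codes whose family matches no gold code in the same document. The paper's exact definition may differ.

```python
def family(code):
    """Assumed family: the three-character ICD-10 category, e.g. 'E11.9' -> 'E11'."""
    return code.replace(".", "")[:3]


def out_of_family_error_rate(gold, predicted):
    """Share of predicted codes whose family matches no gold code in the same document.

    `gold` and `predicted` are parallel lists of per-document code lists.
    The choice of denominator (all predicted codes) is an assumption.
    """
    total = 0
    out_of_family = 0
    for gold_codes, predicted_codes in zip(gold, predicted):
        gold_families = {family(code) for code in gold_codes}
        for code in predicted_codes:
            total += 1
            if family(code) not in gold_families:
                out_of_family += 1
    # Return 0.0 when nothing was predicted, to avoid dividing by zero.
    return out_of_family / total if total else 0.0
```

Under this definition, predicting `E11.65` for a document labelled `E11.9` is wrong but stays inside the family, whereas predicting `K21.9` for the same document counts as an out-of-family error.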
The researchers also evaluated GPT-3.5's ability to directly code discharge summaries, both on real MIMIC-IV data and on the self-generated synthetic data. While GPT-3.5 performed reasonably well on the synthetic data when provided with the code descriptions, its performance on the real MIMIC-IV data was substantially lower, suggesting that the model struggles to identify codes without explicit prompting.
Finally, four clinical experts evaluated the quality of the GPT-3.5-generated discharge summaries. They found that the synthetic documents correctly described the prompted medical conditions and procedures but lacked variety, supporting information, and narrative coherence compared to real discharge summaries. The experts highlighted the need for improvements in generating realistic patient histories, prioritizing critical diagnoses, and maintaining coherence between different aspects of the clinical note.
In conclusion, this study demonstrates the potential of using GPT-3.5 to generate synthetic discharge summaries for data augmentation in automated ICD-10 coding, particularly for improving performance on rare codes. However, the generated documents still fall short of the standards required for clinical use, highlighting the need for further advancements in large language model-based medical text generation.
Key Insights Extracted From
by Matú... at arxiv.org, 09-17-2024
https://arxiv.org/pdf/2401.13512.pdf