Core Concepts
SMP-BERT, a novel prompt learning method, can effectively extract detailed phenotypic information from Crohn's disease radiology reports in Hebrew, a low-resource language, outperforming traditional fine-tuning approaches.
Abstract
This study introduces SMP-BERT, a novel prompt learning method for extracting structured information from free-text radiology reports of Crohn's disease (CD) patients, particularly in low-resource languages such as Hebrew.
Key highlights:
- The dataset consists of over 8,000 CD radiology reports in Hebrew, with 700 manually annotated for the presence or absence of various phenotypes.
- SMP-BERT leverages a pre-training task called Section Matching Prediction (SMP) to learn the logical connection between the "Findings" and "Impression" sections of radiology reports.
- During inference, SMP-BERT pairs the "Findings" section with a prompt asserting the presence or absence of a specific phenotype and assesses how well the two align.
- SMP-BERT with additional tuning outperformed standard fine-tuning, improving the median F1 score by 49% and the median AUC by 5%.
- SMP-BERT demonstrated robust performance even with limited training data, particularly for rare phenotypes, addressing the common challenge of data imbalance in medical datasets.
- The study highlights the potential of prompt learning techniques to enable efficient and accurate information extraction from radiology reports in low-resource languages, paving the way for more inclusive and scalable AI-driven healthcare applications.
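The Section Matching Prediction task and the prompt-based inference step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Report` class, the one-negative-per-report sampling, the English prompt templates, and the `match_score` callable (a stand-in for the pre-trained model's matching score) are all assumptions introduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical container for the two report sections."""
    findings: str
    impression: str


def build_smp_pairs(reports, seed=0):
    """Build (findings, impression, label) triples for SMP-style pre-training.

    label 1: both sections come from the same report.
    label 0: the impression is drawn from a different report.
    One negative per report is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    pairs = []
    for i, report in enumerate(reports):
        pairs.append((report.findings, report.impression, 1))
        others = [j for j in range(len(reports)) if j != i]
        if others:
            j = rng.choice(others)
            pairs.append((report.findings, reports[j].impression, 0))
    return pairs


def build_prompts(findings, phenotype):
    """Pair the findings with a 'present' and an 'absent' prompt.

    The template wording is hypothetical; the study's prompts are in Hebrew.
    """
    return {
        "present": (findings, f"{phenotype}: present"),
        "absent": (findings, f"{phenotype}: absent"),
    }


def predict_phenotype(findings, phenotype, match_score):
    """Return True if the 'present' prompt aligns better than the 'absent' one.

    match_score(findings, prompt) -> float is assumed to be supplied by the
    caller, for example a wrapper around a model pre-trained on SMP pairs.
    """
    prompts = build_prompts(findings, phenotype)
    score_present = match_score(*prompts["present"])
    score_absent = match_score(*prompts["absent"])
    return score_present > score_absent
```

In this sketch no phenotype-specific classification head is needed at inference time: the same matching scorer is reused for every phenotype by changing only the prompt text, which is one plausible reason such an approach could cope with rare phenotypes.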
Stats
There are 137 positive instances for "Ileum Bowel Wall Thickening", which is almost half the dataset.
There are only 19 positive cases for "Rectum Bowel Wall Thickening".
Quotes
"SMP-BERT greatly surpassed traditional fine-tuning methods in performance, notably in detecting infrequent conditions (AUC: 0.99 vs 0.94, F1: 0.84 vs 0.34)."
"SMP-BERT empowers more accurate AI diagnostics available for low-resource languages."