The authors present the Corpus of Annotated Medical Imaging Reports (CAMIR), a new annotated corpus of 609 radiology reports spanning the Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and Positron Emission Tomography-Computed Tomography (PET-CT) modalities. The reports are annotated with a granular event-based schema that captures clinical indications, lesions, and medical problems, and most arguments are normalized to predefined SNOMED-CT concepts.
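To make the idea of an event-based schema concrete, here is a minimal sketch of how one annotated event might be represented in code. It is an illustration only, not the CAMIR data format: the class names, the `trigger` field, the `concept` field, the example sentence, and the placeholder concept label are all assumptions. Only the event categories (indication, lesion, medical problem) and the argument names Count and Size Trend come from the summary above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # character offsets [start, end) into the report text


@dataclass
class Argument:
    role: str                      # e.g. "Count" or "Size Trend"
    span: Span
    concept: Optional[str] = None  # normalized concept label; hypothetical field


@dataclass
class Event:
    event_type: str                # "Indication", "Lesion", or "Medical Problem"
    trigger: Span                  # hypothetical: the span that anchors the event
    arguments: List[Argument] = field(default_factory=list)


def find_span(text: str, phrase: str) -> Span:
    """Locate a phrase in the text and return its character offsets."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# Invented example sentence, not taken from the corpus.
report = "Two new hepatic lesions, increased in size."

event = Event(
    event_type="Lesion",
    trigger=find_span(report, "lesions"),
    arguments=[
        Argument("Count", find_span(report, "Two")),
        # Placeholder label standing in for a predefined normalized concept.
        Argument("Size Trend", find_span(report, "increased"),
                 concept="PLACEHOLDER_INCREASING"),
    ],
)

for arg in event.arguments:
    start, end = arg.span
    print(arg.role, "->", report[start:end], "| concept:", arg.concept)
```

Storing character offsets rather than raw strings keeps each argument tied to its exact location in the report, which is what a span-based evaluation needs.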
The annotation process involved four medical students, with guidance from senior radiology experts. The corpus exhibits high inter-annotator agreement, exceeding 0.70 F1 for most argument types. Exceptions include Size Trend, Count, and Characteristic, which are relatively infrequent or linguistically diverse.
To extract the CAMIR events, the authors explored two BERT-based language models: mSpERT, which jointly extracts all event information, and PL-Marker++, a multi-stage approach that the authors augmented for the CAMIR schema. PL-Marker++ achieved the highest overall performance, with an F1 score of 0.759 on the held-out test set, and significantly outperformed mSpERT.
The authors discuss annotation quality, model performance, and the validation of the span-overlap criterion used for evaluation. They also highlight the potential for CAMIR to support a wide range of secondary-use applications in the radiology domain, such as cohort discovery, epidemiology, image retrieval, automated follow-up tracking, computer-vision applications, decision support, and report summarization.
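A span-overlap criterion and the resulting F1 score can be sketched as follows. This is a simplified illustration, not the authors' evaluation code: it assumes a prediction counts as correct when it has the same label as a gold span and shares at least one character with it, and it matches each gold span at most once. The function names and the example spans are hypothetical.

```python
from typing import List, Tuple

LabeledSpan = Tuple[str, int, int]  # (label, start, end), end exclusive


def spans_overlap(a: LabeledSpan, b: LabeledSpan) -> bool:
    """True if the two spans share at least one character position."""
    return a[1] < b[2] and b[1] < a[2]


def overlap_prf(gold: List[LabeledSpan],
                pred: List[LabeledSpan]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 under a relaxed span-overlap match."""
    matched_gold = set()
    true_positives = 0
    for p in pred:
        for i, g in enumerate(gold):
            # Assumed rule: same label, overlapping span, gold used only once.
            if i not in matched_gold and g[0] == p[0] and spans_overlap(g, p):
                matched_gold.add(i)
                true_positives += 1
                break
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented spans for illustration only.
gold = [("Count", 0, 3), ("Size Trend", 25, 34), ("Characteristic", 8, 15)]
pred = [("Count", 0, 7), ("Size Trend", 20, 30)]

precision, recall, f1 = overlap_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Both predictions overlap a gold span with the same label, so precision is 1.0, while one of three gold spans is missed, giving a recall of 2/3 and an F1 of 0.8. An exact-match criterion would score both predictions as wrong, which is why the choice of matching rule needs its own validation.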
Source: arxiv.org