Core Concepts
GPT-4V shows promise but remains of limited effectiveness in interpreting real-world chest radiographs.
Abstract
The study evaluates GPT-4 with Vision (GPT-4V) for generating radiological findings from real-world chest radiographs. It compares the model's performance in zero-shot and few-shot settings, assessing how well it detects ICD-10 codes and their laterality. The study aims to close the gap in applying multimodal LLMs to real-world chest radiographs.
Abstract:
The study explores GPT-4V for automated image-text pair generation.
Introduction:
Importance of generating radiological findings from chest radiographs.
Materials & Methods:
Retrospective study with 100 annotated chest radiographs.
Results:
GPT-4V performance varied between zero-shot and few-shot settings.
Statistical Analysis:
Evaluation metrics used to assess GPT-4V's performance.
Stats
In the zero-shot setting, GPT-4V attained a G&R+/R+ of 12.3% on the NIH dataset and 25.0% on the MIDRC dataset. In the few-shot setting, it attained an F1 score of 11.1% on the NIH dataset and 34.3% on the MIDRC dataset. Because the two settings are reported here with different metrics, these figures are not directly comparable.
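
For readers unfamiliar with the F1 score cited above, the following is a minimal sketch of how such a detection score could be computed for a single radiograph. It assumes the model output and the radiologist annotation are each reduced to a set of ICD-10 codes. The function name `detection_metrics` and the example codes are illustrative and not taken from the study, and the summary does not state how the study averaged scores across images.

```python
def detection_metrics(predicted, reference):
    """Return (PPV, TPR, F1) for one image.

    predicted: ICD-10 codes produced by the model (assumed input format)
    reference: ICD-10 codes from the radiologist annotation (assumed input format)
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)

    # PPV (precision): share of predicted codes that are correct.
    ppv = true_positives / len(predicted) if predicted else 0.0
    # TPR (recall): share of reference codes that were found.
    tpr = true_positives / len(reference) if reference else 0.0
    # F1: harmonic mean of PPV and TPR, defined as 0 when both are 0.
    f1 = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0
    return ppv, tpr, f1


# Hypothetical example: one of three predicted codes matches the annotation.
ppv, tpr, f1 = detection_metrics(
    predicted={"J18.9", "J90", "I51.7"},
    reference={"J90", "J98.11"},
)
print(f"PPV={ppv:.3f} TPR={tpr:.3f} F1={f1:.3f}")
```

Using sets means each code counts once per image. Scoring laterality would need an additional field per finding, which this sketch leaves out.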