Core Concepts
The core idea of this paper is to bootstrap the understanding of 3D chest CT images by distilling knowledge from a pre-trained 2D chest X-ray expert model, leveraging language as a high-quality supervision signal to address the limited availability of paired CT-report data.
Abstract
The paper explores the feasibility of leveraging language as a naturally high-quality supervision signal for chest CT imaging, given the limited availability of extensively annotated large-scale multi-disease datasets.
The key highlights are:
The authors propose to bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model. A language-guided retrieval method is used to match each 3D CT image with its semantically closest 2D X-ray image, enabling knowledge distillation.
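The language-guided retrieval step can be sketched as follows — a minimal NumPy illustration assuming report embeddings from some text encoder are already available. The function names and dimensions are hypothetical, not the paper's actual implementation:

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def retrieve_xray_partners(ct_report_emb, xray_report_emb):
    # For each CT study, pick the X-ray whose report embedding is
    # semantically closest; the matched X-ray is then fed to the
    # frozen 2D expert, whose features serve as a distillation target.
    sim = cosine_sim(ct_report_emb, xray_report_emb)
    return sim.argmax(axis=1)

# Toy example: 3 CT reports, 5 candidate X-ray reports (8-dim embeddings).
rng = np.random.default_rng(0)
ct_emb = rng.normal(size=(3, 8))
xr_emb = rng.normal(size=(5, 8))
partners = retrieve_xray_partners(ct_emb, xr_emb)  # one X-ray index per CT
```

Matching in report-embedding space rather than pixel space is what makes the pairing "language-guided": two images are considered close when their clinical descriptions are close.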
To address the challenge of similar semantic diagnoses across patients, the authors introduce a robust contrastive learning (RoCo) approach that identifies and corrects false negative pairs. They also use an entity-focused masking (EFM) strategy to enhance the recognition of important entities and attributes in the reports.
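The two ideas above can be illustrated with a minimal NumPy sketch. The paper's exact formulation is not given here; the function names (`roco_contrastive_loss`, `entity_focused_mask`) and the simple similarity-threshold rule for spotting false negatives are assumptions standing in for the actual method:

```python
import numpy as np

def roco_contrastive_loss(img_emb, txt_emb, report_sim, tau=0.07, thresh=0.9):
    # Image-text contrastive loss that corrects likely false negatives:
    # off-diagonal pairs whose reports are nearly identical semantically
    # are excluded from the denominator instead of being pushed apart.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau
    n = len(logits)
    false_neg = (report_sim > thresh) & ~np.eye(n, dtype=bool)
    logits = np.where(false_neg, -np.inf, logits)  # drop false negatives
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()

def entity_focused_mask(tokens, entity_ids, mask_token="[MASK]", p=0.5, rng=None):
    # Preferentially mask tokens flagged as clinical entities/attributes,
    # forcing the model to recover them from the image.
    rng = rng or np.random.default_rng()
    return [mask_token if i in entity_ids and rng.random() < p else t
            for i, t in enumerate(tokens)]

# Toy batch: 4 paired CT/report embeddings; reports 0 and 2 nearly identical.
rng = np.random.default_rng(1)
img_emb = rng.normal(size=(4, 16))
txt_emb = rng.normal(size=(4, 16))
report_sim = np.eye(4)
report_sim[0, 2] = report_sim[2, 0] = 0.95  # likely false-negative pair
loss = roco_contrastive_loss(img_emb, txt_emb, report_sim)
masked = entity_focused_mask(["mild", "pleural", "effusion"], {1, 2}, p=1.0)
```

Masking false negatives out of the denominator is one simple correction strategy; it prevents the loss from penalizing agreement between two patients who genuinely share the same diagnosis.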
The proposed BIUD model, trained on over 12,000 pairs of chest CT images and radiology reports, demonstrates superior performance compared to existing methods across multiple scenarios, including zero-shot learning, report generation, and fine-tuning. Notably, a reader study indicates that the model's zero-shot diagnostic capability rivals that of radiologists in specific tasks.
The authors are the first to show that a vision-language pre-training-based method can achieve performance comparable to an experienced radiologist in diagnosing certain primary diseases in 3D CT imaging, significantly reducing the time and resource expenditure.
Stats
"Over 12,000 pairs of chest CT images and radiology reports" were used to train the BIUD model.
The MIMIC-CXR dataset, containing over 300,000 image-report pairs, was used to train the X-ray expert model.
Quotes
"Radiologists highly desire fully automated versatile AI for medical imaging interpretation. However, the lack of extensively annotated large-scale multi-disease datasets has hindered the achievement of this goal."
"We bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model."
"We introduce a robust contrastive learning that identifies and corrects these false negatives."