Core Concepts
Integrating expert annotations in the form of radiologist eye-gaze heatmaps enhances multi-modal contrastive learning in medical imaging.
Abstract
The eCLIP model improves contrastive multi-modal learning for medical imaging by integrating expert annotations, addressing the challenges of data scarcity and the modality gap. It makes efficient use of scarce expert annotations through mixup augmentation and shows consistent improvements in embedding quality across a range of tasks. Its workflow adds a heatmap processor and a mixup strategy without altering CLIP's core architecture. Detailed evaluations show improved alignment and uniformity of the learned embeddings, demonstrating that high-quality annotations can be harnessed for richer multi-modal analysis in medical imaging.
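The mixup strategy mentioned above can be illustrated with a minimal sketch. The function name and the choice of mixing embeddings (rather than pixels) are assumptions for illustration; the standard mixup recipe draws a mixing coefficient from a Beta distribution and linearly interpolates the two inputs:

```python
import numpy as np

def mixup_embeddings(z_img, z_expert, alpha=0.3, rng=None):
    """Interpolate standard image embeddings with expert
    (heatmap-informed) embeddings, as in standard mixup.

    lam ~ Beta(alpha, alpha); small alpha biases lam toward 0 or 1,
    so most mixed samples stay close to one of the two sources.
    (Hypothetical helper; not the paper's exact implementation.)
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    z_mix = lam * z_img + (1.0 - lam) * z_expert
    return z_mix, lam
```

Because the interpolation is linear, one scarce expert-annotated sample can be blended with many ordinary samples, stretching a small annotation budget across the training set.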
Stats
Models like CLIP have been trained on internet-scale datasets estimated to encompass hundreds of millions of image-text pairs.
The Open-I dataset includes X-rays paired with corresponding radiology reports.
The MIMIC-CXR dataset pairs chest X-rays with free-text radiology reports.
The EGD-CXR dataset provides normalized eye-gaze heatmaps for 1080 datapoints.
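A normalized eye-gaze heatmap of the kind EGD-CXR provides can be rendered from raw fixation data by placing a duration-weighted Gaussian at each fixation point and rescaling. This is a generic sketch under assumed inputs (a list of `(x, y, duration)` fixations), not the dataset's own preprocessing code:

```python
import numpy as np

def gaze_heatmap(fixations, shape, sigma=10.0):
    """Render eye-gaze fixations as a heatmap normalized to [0, 1].

    fixations: iterable of (x, y, duration) tuples (assumed format).
    shape:     (height, width) of the output map.
    sigma:     spread of the Gaussian placed at each fixation.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=float)
    for x, y, dur in fixations:
        # Longer fixations contribute more weight to the map.
        heat += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heat / heat.max()
```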
Quotes
"We introduce eCLIP, an enhanced version of the CLIP model that integrates expert annotations in the form of radiologist eye-gaze heatmaps."
"eCLIP showcases consistent improvements in embedding quality across several tasks."
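Embedding quality of the kind this quote refers to is commonly quantified with the alignment and uniformity metrics of Wang and Isola: alignment measures how close positive (paired) embeddings are, and uniformity measures how evenly embeddings spread over the unit hypersphere. A minimal sketch, assuming L2-normalized embeddings as NumPy arrays:

```python
import numpy as np

def alignment(z1, z2):
    """Mean squared distance between paired embeddings; lower is better."""
    return np.mean(np.sum((z1 - z2) ** 2, axis=1))

def uniformity(z, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs;
    lower (more negative) means embeddings spread more uniformly."""
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    mask = ~np.eye(len(z), dtype=bool)  # exclude self-pairs
    return np.log(np.mean(np.exp(-t * sq[mask])))
```

Both metrics are computed on held-out pairs; an improvement in both at once indicates the modality gap between image and text embeddings is shrinking without the embeddings collapsing together.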
"Processing the eye-gaze data from radiologists provides heatmaps indicative of clinical interest areas aligned with details present in radiology reports."