Key concepts
Constructing PubMedVision, a large-scale, high-quality medical multimodal dataset, to significantly boost the multimodal capabilities of language models in medical applications.
Summary
The authors propose a method to construct PubMedVision, a large-scale, high-quality medical multimodal dataset, to improve the performance of multimodal large language models (MLLMs) in medical applications.
Key highlights:
- The authors refined high-quality data from a large pool of medical image-text pairs collected from PubMed and used an MLLM-powered reformatting method to further improve data quality. The result is the PubMedVision dataset of 1.3 million medical visual question-answering (VQA) samples.
- Experiments show that PubMedVision significantly boosts the multimodal capabilities of MLLMs, with marked improvements on medical VQA benchmarks, the MMMU Health & Medicine track, and traditional medical imaging tasks.
- The authors developed HuatuoGPT-Vision, a specialized 34B-parameter medical MLLM that outperforms other open-source models on multiple medical multimodal benchmarks.
- The authors conducted expert evaluations and empirical tests confirming that PubMedVision has higher data quality than datasets built with other data construction methods.
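The refine-then-reformat process summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the filter rule (`keep_pair`, a minimum word count), the deduplication by image path, the prompt wording in `build_reformat_request`, the `Q:`/`A:` output format, and all class and function names are hypothetical assumptions. The one idea taken from the source is the "unblinded" setup, in which the model receives the image together with its surrounding text. `fake_mllm` is a stand-in for a real MLLM call such as GPT-4V.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ImageTextPair:
    image_path: str
    caption: str
    inline_mention: str  # hypothetical: article sentence that cites the figure


@dataclass
class VQASample:
    image_path: str
    question: str
    answer: str


def keep_pair(pair: ImageTextPair, min_words: int = 10) -> bool:
    """Hypothetical quality filter: drop pairs whose text is too short to describe the image."""
    text = f"{pair.caption} {pair.inline_mention}"
    return len(text.split()) >= min_words


def deduplicate(pairs: list[ImageTextPair]) -> list[ImageTextPair]:
    """Keep only the first pair seen for each image path."""
    seen: set[str] = set()
    unique = []
    for pair in pairs:
        if pair.image_path not in seen:
            seen.add(pair.image_path)
            unique.append(pair)
    return unique


def build_reformat_request(pair: ImageTextPair) -> dict:
    """'Unblinded' request: the model sees the image along with its text context."""
    instruction = (
        "Using the image and the context below, write one medical question about "
        "the image and a detailed answer. Format: 'Q: ...' then a new line 'A: ...'."
    )
    return {
        "image": pair.image_path,
        "text": f"{instruction}\nCaption: {pair.caption}\nContext: {pair.inline_mention}",
    }


def parse_vqa(image_path: str, model_output: str) -> VQASample:
    """Split the model's 'Q: ...\\nA: ...' output into a VQA sample (requires Python 3.9+)."""
    q_part, a_part = model_output.split("\nA:", 1)
    return VQASample(image_path, q_part.removeprefix("Q:").strip(), a_part.strip())


def build_dataset(
    pairs: list[ImageTextPair], mllm: Callable[[dict], str], min_words: int = 10
) -> list[VQASample]:
    """Deduplicate, filter, then reformat each surviving pair with the MLLM."""
    samples = []
    for pair in deduplicate(pairs):
        if keep_pair(pair, min_words):
            output = mllm(build_reformat_request(pair))
            samples.append(parse_vqa(pair.image_path, output))
    return samples


# Usage with a stand-in model; a real pipeline would send the request to an MLLM.
def fake_mllm(request: dict) -> str:
    return (
        "Q: What does the CT image show?\n"
        "A: A hypodense unilocular cystic lesion at the head of the pancreas."
    )


pairs = [
    ImageTextPair(
        "fig3.png",
        "Axial CT of the pancreas.",
        "The hypodense unilocular cystic lesion was revealed at the head of pancreas.",
    ),
    ImageTextPair("fig3.png", "Duplicate entry.", "Same figure again."),
    ImageTextPair("fig1.png", "CT.", ""),  # too little text: filtered out
]
samples = build_dataset(pairs, fake_mllm)
```

Passing the model in as a callable keeps the filtering and parsing logic testable without network access; swapping `fake_mllm` for a real client is the only change a live run would need.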
Statistics
"Density of the cystic lesion is 2.4 Hounsfield Unit (HU)."
"Only the head and uncinate segment of the pancreas was visualized and the hypodense unilocular cystic lesion was revealed at the head of pancreas ( Fig. 3 )."
Quotes
"The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of medical vision-text data, stemming from data privacy concerns and high annotation costs."
"To tackle this, we refined medical image-text pairs from PubMed and employed MLLMs (GPT-4V) in an 'unblinded' capacity to denoise and reformat the data, resulting in the creation of the PubMedVision dataset with 1.3 million medical VQA samples."