The Latent Prompt Assist (LaPA) model uses latent prompts to filter and extract clinically relevant information from unimodal and multimodal features, improving performance on medical visual question answering tasks.
OmniMedVQA is a large-scale and diverse medical visual question answering dataset that can comprehensively evaluate the capabilities of large vision-language models in the medical domain.
A novel vision-language model integrates a radiology domain-adapted language model with a biomedical vision encoder, achieving state-of-the-art performance on medical visual question answering benchmarks.