Multimodal Medical Answer Generation using Large Language Models: Results from the WangLab Submission to MEDIQA-M3G 2024
The WangLab team explored two standalone solutions for the MEDIQA-M3G 2024 Multilingual and Multimodal Medical Answer Generation shared task, placing 1st and 2nd in the English category. The first solution made two consecutive API calls to the Claude 3 Opus model; the second trained a joint image-disease-label embedding model using CLIP. Both solutions demonstrated the potential of large language models and multimodal approaches for medical visual question answering, while also highlighting the significant challenges that remain in this domain.
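The joint image-disease-label embedding approach mentioned above can be sketched in CLIP style: an image encoder and a label encoder map into a shared space and are trained with a symmetric contrastive loss. This is a minimal illustrative sketch only, not the submission's actual model; the toy encoders, embedding size, and batch setup are all assumptions standing in for CLIP's pretrained towers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointImageLabelModel(nn.Module):
    """CLIP-style joint embedding: images and disease labels share one space.
    The tiny MLP image encoder is a hypothetical stand-in for a pretrained
    CLIP vision tower; the label encoder is one learned vector per disease."""
    def __init__(self, num_labels: int, embed_dim: int = 64):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 32 * 32, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
        )
        self.label_encoder = nn.Embedding(num_labels, embed_dim)
        # Learned temperature, as in CLIP.
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, images, label_ids):
        img = F.normalize(self.image_encoder(images), dim=-1)
        lab = F.normalize(self.label_encoder(label_ids), dim=-1)
        # Cosine-similarity logits between every image and every label in the batch.
        return self.logit_scale.exp() * img @ lab.t()

def clip_loss(logits):
    # Symmetric cross-entropy: each image should match its own label and vice versa.
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

model = JointImageLabelModel(num_labels=50)
images = torch.randn(8, 3, 32, 32)            # dummy batch of images
labels = torch.randperm(50)[:8]               # 8 distinct disease-label ids
loss = clip_loss(model(images, labels))
loss.backward()                               # gradients for one contrastive step
```

At inference time, such a model scores an image against every candidate disease label and returns the nearest labels in the shared embedding space.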