Leveraging Multimodal Large Language Models to Enhance Cross-lingual Cross-modal Retrieval
The authors propose a novel two-stream solution, LECCR, that incorporates multimodal large language models (MLLMs) to improve the alignment between visual and non-English representations in cross-lingual cross-modal retrieval tasks.
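
To make the task concrete, here is a minimal sketch of cross-lingual cross-modal retrieval itself, not of LECCR's architecture, which the summary above does not detail. It assumes that a non-English caption and a set of images have already been encoded into a shared embedding space, and ranks the images by cosine similarity to the caption. The function names (`cosine`, `rank_images`), the file names, and the toy three-dimensional vectors are all hypothetical; in a real system the embeddings would come from trained encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(query_emb, image_embs):
    """Return image names sorted from most to least similar to the query."""
    scores = {name: cosine(query_emb, emb) for name, emb in image_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy image embeddings (assumed values, for illustration only).
image_embs = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
    "beach.jpg": [0.1, 0.9, 0.2],
}

# Hypothetical embedding of a non-English caption describing a dog.
query_emb = [0.8, 0.2, 0.1]

ranking = rank_images(query_emb, image_embs)
print(ranking)
```

Better alignment between visual and non-English representations means that, in a ranking like this one, the matching image scores higher relative to the distractors.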