Core Concepts
GEMEL, a Generative Multimodal Entity Linking framework, enhances MEL performance efficiently by leveraging LLMs and visual information.
Abstract
Multimodal Entity Linking (MEL) aims to map mentions with multimodal contexts to entities.
GEMEL is a parameter-efficient, LLM-based framework for MEL tasks.
It achieves state-of-the-art results on the WikiDiverse and WikiMEL datasets.
It mitigates the popularity bias in LLM predictions, which improves performance.
It is compatible with various LLMs and vision encoders.
Stats
GEMEL fine-tunes only about 0.3% of the model parameters, yet it still shows significant accuracy gains.
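To make the "about 0.3% of parameters" figure concrete, the sketch below shows how such a trainable fraction could be computed. It assumes that the LLM and vision encoder stay frozen and that only a small connecting module is trained; the summary does not spell out this split. The component names and parameter counts are hypothetical and chosen only for illustration, not taken from GEMEL.

```python
# Hypothetical parameter counts, chosen only for illustration; they are
# NOT figures reported for GEMEL.
PARAM_COUNTS = {
    "llm": 7_000_000_000,            # assumed frozen
    "vision_encoder": 300_000_000,   # assumed frozen
    "adapter": 22_000_000,           # assumed to be the only trainable part
}
TRAINABLE = {"adapter"}


def trainable_fraction(param_counts, trainable):
    """Return the share of all parameters that receive gradient updates."""
    total = sum(param_counts.values())
    tuned = sum(n for name, n in param_counts.items() if name in trainable)
    return tuned / total


fraction = trainable_fraction(PARAM_COUNTS, TRAINABLE)
print(f"{fraction:.2%} of parameters are trainable")
```

With these assumed sizes the trainable share comes out at roughly 0.3%, the same order of magnitude as the figure quoted above. Swapping in a different LLM or vision encoder changes the denominator, so the exact percentage depends on the chosen backbone.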