Enhancing Few-Shot Relation Extraction with Visual Information
The author proposes a multi-modal few-shot relation extraction model that leverages both textual and visual semantic information to improve extraction performance. By integrating image-guided attention, object-guided attention, and hybrid feature attention, the model captures the semantic interactions between visual regions of images and their corresponding text.
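The summary does not specify implementation details, but the general shape of an image-guided attention block can be sketched. Below is a minimal PyTorch sketch in which text tokens act as queries attending over detected visual region features (keys/values); the class name, dimensions, and the residual-plus-LayerNorm fusion are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class ImageGuidedAttention(nn.Module):
    """Illustrative cross-modal block: text tokens attend over visual regions.

    This is a plausible sketch of "image-guided attention", not the
    paper's verified implementation.
    """

    def __init__(self, text_dim: int = 768, visual_dim: int = 2048,
                 num_heads: int = 8):
        super().__init__()
        # Project visual region features into the text embedding space
        # (assumed dimensions: BERT-style 768-d text, ResNet-style 2048-d regions).
        self.visual_proj = nn.Linear(visual_dim, text_dim)
        # Text tokens are queries; projected visual regions are keys/values.
        self.attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, text_feats: torch.Tensor,
                region_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:   (batch, num_tokens,  text_dim)
        # region_feats: (batch, num_regions, visual_dim)
        visual = self.visual_proj(region_feats)
        attended, _ = self.attn(query=text_feats, key=visual, value=visual)
        # Residual connection keeps the original textual signal.
        return self.norm(text_feats + attended)


# Example: 16 text tokens attending over 36 detected object regions.
text = torch.randn(2, 16, 768)
regions = torch.randn(2, 36, 2048)
fused = ImageGuidedAttention()(text, regions)
print(fused.shape)  # torch.Size([2, 16, 768])
```

Object-guided and hybrid feature attention could follow the same query/key/value pattern with the roles of the modalities swapped or combined, though the summary leaves those designs unspecified.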