Visually-Aware Context Modeling for News Image Captioning: Enhancing Caption Generation with Face-Naming Module and CLIP Retrieval
A face-naming module, CLIP retrieval, and CoLaM help News Image Captioning models make effective use of visual inputs, significantly improving caption quality.
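
To make the retrieval idea concrete, here is a minimal sketch of ranking article sentences against an image by cosine similarity, which is the general pattern behind CLIP-style retrieval. It is not the paper's implementation: the function names `cosine` and `retrieve_top_k` are hypothetical, and the vectors are toy values standing in for embeddings that would, by assumption, come from CLIP's image and text encoders.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    image_vec: Sequence[float],
    sentence_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k sentences most similar to the image, best first."""
    scores = [cosine(image_vec, s) for s in sentence_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy embeddings (assumption): real ones would be produced by an image
# encoder and a text encoder that share one embedding space.
image_embedding = [1.0, 0.0]
sentence_embeddings = [
    [0.0, 1.0],   # unrelated to the image
    [1.0, 0.1],   # closely aligned with the image
    [0.5, 0.5],   # partially aligned
    [-1.0, 0.0],  # opposite direction
]

top_sentences = retrieve_top_k(image_embedding, sentence_embeddings, k=2)
```

In this sketch the retrieved indices would select the article sentences passed to the captioning model as context; how the paper combines them with the face-naming module and CoLaM is not specified here.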