MeaCap proposes a novel Memory-Augmented framework for zero-shot image captioning, achieving state-of-the-art performance by integrating textual memory and visual-related fusion scores.
The proposed MeaCap framework draws on memory to generate high-quality captions in zero-shot image captioning.
Proposes a novel Memory-Augmented zero-shot image Captioning framework (MeaCap) that generates concept-centered captions with high consistency and fewer hallucinations.
A solution that leverages retrieval augmentation and caption-level strategies to effectively enhance zero-shot image captioning performance on the NICE 2024 dataset.
Effectively improves image captioning through retrieval augmentation and caption grading methods, addressing the differences in style and content of the new NICE 2024 dataset.
IFCap, a novel approach for zero-shot image captioning, addresses the modality gap between image and text data by performing Image-like Retrieval and integrating retrieved captions with input features through a Fusion Module. Additionally, it employs Frequency-based Entity Filtering to enhance caption quality by extracting frequently occurring entities from retrieved captions.
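
The frequency-based filtering step described for IFCap can be illustrated with a minimal sketch. The summary above does not specify how entities are extracted or what threshold is used, so everything here is an assumption made for illustration: the function name `frequent_entities`, the `min_count` parameter, and the small stopword list that stands in for proper noun extraction are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a stand-in for real noun/entity extraction,
# which the summary does not describe.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and",
             "is", "are", "at", "to", "near"}


def frequent_entities(retrieved_captions, min_count=2):
    """Return candidate entities that occur in at least `min_count` captions.

    Each caption contributes at most one count per word, so a word repeated
    inside a single caption is not over-weighted.
    """
    counts = Counter()
    for caption in retrieved_captions:
        # Lowercase and keep alphabetic tokens only (assumed tokenization).
        tokens = set(re.findall(r"[a-z]+", caption.lower()))
        counts.update(tokens - STOPWORDS)
    # Keep only words shared by enough retrieved captions.
    return sorted(word for word, n in counts.items() if n >= min_count)


captions = [
    "A dog runs on the grass",
    "A brown dog plays with a ball",
    "The dog sits on grass near a tree",
]
print(frequent_entities(captions))
```

In this toy example only the words shared by at least two of the retrieved captions survive, which mirrors the idea that entities recurring across retrieved captions are more likely to be present in the image.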