Key Concepts
The paper proposes a novel method for semi-supervised image captioning that uses Wasserstein Graph Matching to make efficient use of undescribed images.
Summary
The paper discusses the challenges of image captioning and introduces SSIC-WGM, a semi-supervised image captioning method based on Wasserstein Graph Matching. It targets the common real-world setting in which described images are scarce while undescribed images are abundant. The method relies on inter-modal and intra-modal consistency to improve the mapping function between visual and linguistic features.
Index:
- Introduction to Image Captioning
- Challenges in Image Captioning
- Proposed Method: SSIC-WGM
- Encoder-Decoder Model
- Inter-Modal Consistency with Scene Graphs
- Wasserstein Distance on Graphs
- Intra-Modal Consistency with Data Augmentation
- Overall Objective and Loss Function
- Experiments and Results
- Comparison with Baseline Methods
- Ablation Study
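The outline lists "Wasserstein Distance on Graphs" as part of the inter-modal consistency component. As a rough illustration only, the sketch below computes an entropy-regularised (Sinkhorn) approximation of the Wasserstein distance between two sets of node embeddings, for example the nodes of two scene graphs. The function name `sinkhorn_distance`, the squared-Euclidean cost, the uniform node weights, and the values of `eps` and `n_iters` are all assumptions made for this example; the summary does not say how SSIC-WGM computes the distance.

```python
import numpy as np


def sinkhorn_distance(x, y, eps=0.1, n_iters=200):
    """Approximate the Wasserstein distance between two sets of node embeddings.

    Hypothetical helper for illustration: uniform node weights, squared
    Euclidean cost, and entropy regularisation strength `eps` are assumptions.
    Returns the transport cost and the transport plan.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # Pairwise squared Euclidean cost between every node in x and every node in y.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    # Uniform mass on the nodes of each graph.
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)

    # Gibbs kernel; very large cost / eps ratios can underflow in this plain version.
    kernel = np.exp(-cost / eps)

    # Sinkhorn iterations: alternately rescale to match column and row marginals.
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)

    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan


if __name__ == "__main__":
    # Toy "visual" and "textual" node embeddings (made-up numbers).
    visual_nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    textual_nodes = visual_nodes + np.array([1.0, 0.0])
    distance, plan = sinkhorn_distance(visual_nodes, textual_nodes)
    print(distance)
    print(plan.round(3))
```

In a training setup, a distance of this kind could serve as a penalty that is small when the two graphs have similar node sets. The plain-NumPy version above is meant only to make the computation concrete and is not differentiable as written.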
Statistics
Existing approaches are mostly supervised, but real-world applications typically have few described images and many undescribed ones.
The proposed SSIC-WGM method uses Wasserstein Graph Matching for semi-supervised image captioning.
SSIC-WGM combines inter-modal and intra-modal consistency for efficient use of undescribed images.
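One common way to combine such terms, shown here purely as an assumed form and not as the paper's actual objective, is a weighted sum of a supervised captioning loss on described images and the two consistency terms on undescribed images. The symbols $\mathcal{L}_{\text{sup}}$, $\mathcal{L}_{\text{inter}}$, $\mathcal{L}_{\text{intra}}$ and the weights $\lambda_{1}, \lambda_{2}$ are hypothetical names introduced for this illustration:

```latex
\mathcal{L}
  = \mathcal{L}_{\text{sup}}
  + \lambda_{1}\,\mathcal{L}_{\text{inter}}
  + \lambda_{2}\,\mathcal{L}_{\text{intra}},
\qquad \lambda_{1}, \lambda_{2} \ge 0
```

Under this assumed form, setting $\lambda_{1} = \lambda_{2} = 0$ would reduce training to the purely supervised case.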
Quotes
"Image captioning aims to automatically generate natural descriptions for the given images."
"The key challenge of semi-supervised image captioning is to design reasonable supervisions for qualifying the generated sentences."