Addressing Challenges in Image-Text Retrieval with Soft-Label Alignment
The author proposes Cross-modal and Uni-modal Soft-label Alignment (CUSA), a method that targets two problems in image-text retrieval: the inter-modal matching missing problem and the intra-modal semantic loss problem. CUSA uses uni-modal pre-trained models to supply soft-label supervision signals.
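To make the idea of soft-label supervision concrete, here is a minimal sketch of one way a cross-modal alignment term could be computed. The summary above does not give the loss formulation, so everything here is an assumption for illustration: the function names, the temperature `tau`, and the choice of KL divergence between a teacher distribution (text-to-text similarities from a uni-modal model) and a student distribution (image-to-text similarities from the retrieval model).

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]

def softmax(row, tau):
    """Temperature-scaled softmax over one row of similarities."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / tau) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_alignment_loss(student_img, student_txt, teacher_txt, tau=0.1):
    """Hypothetical soft-label term: mean KL(teacher || student) over the batch.

    student_img, student_txt: embeddings from the retrieval model (assumed).
    teacher_txt: embeddings of the same captions from a uni-modal
                 pre-trained text model (assumed), used as soft labels.
    """
    student_sim = similarity_matrix(student_img, student_txt)  # image-to-text
    teacher_sim = similarity_matrix(teacher_txt, teacher_txt)  # text-to-text
    loss = 0.0
    for s_row, t_row in zip(student_sim, teacher_sim):
        q = softmax(s_row, tau)  # student distribution over captions
        p = softmax(t_row, tau)  # soft labels from the teacher
        loss += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return loss / len(student_sim)
```

Under these assumptions, a caption that the teacher considers close to the paired caption receives non-zero target probability instead of being treated as a hard negative, which is the intuition behind using soft labels for unannotated matches.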