Efficient Image-Text Retrieval via Multi-Teacher Cross-Modal Alignment Distillation
The authors propose Multi-teacher Cross-modal Alignment Distillation (MCAD), a technique that combines the strengths of single-stream and dual-stream models for efficient image-text retrieval. It achieves high retrieval performance without increasing inference complexity.
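
As a rough illustration of what multi-teacher distillation for retrieval could look like, the Python sketch below fuses the image-text score distributions of a single-stream teacher and a dual-stream teacher, then measures how far a student's distribution is from the fused target with a KL divergence. The function names, the fusion weight, the temperature, and the choice of KL divergence are all assumptions made for illustration; the summary does not specify MCAD's actual fusion scheme or training objectives.

```python
import math


def softmax(row, temperature=1.0):
    """Turn one row of raw scores into a probability distribution."""
    scaled = [x / temperature for x in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teachers(single_scores, dual_scores, weight=0.5, temperature=1.0):
    """Hypothetical fusion: weighted average of the two teachers' row-wise
    softmax distributions. `weight` is the share given to the single-stream
    teacher (an illustrative assumption, not a value from the paper)."""
    fused = []
    for s_row, d_row in zip(single_scores, dual_scores):
        p_single = softmax(s_row, temperature)
        p_dual = softmax(d_row, temperature)
        fused.append(
            [weight * a + (1.0 - weight) * b for a, b in zip(p_single, p_dual)]
        )
    return fused


def distillation_loss(student_scores, target_dist, temperature=1.0):
    """Mean KL(target || student) over rows, where each row holds the scores
    of one image against every candidate text."""
    total = 0.0
    for s_row, t_row in zip(student_scores, target_dist):
        q = softmax(s_row, temperature)
        total += sum(t * math.log(t / qi) for t, qi in zip(t_row, q) if t > 0)
    return total / len(student_scores)


# Toy 2x2 score matrices (rows: images, columns: texts); values are made up.
single_teacher = [[4.0, 1.0], [0.5, 3.0]]
dual_teacher = [[2.0, 1.5], [1.0, 2.5]]
student = [[1.0, 0.8], [0.9, 1.1]]

target = fuse_teachers(single_teacher, dual_teacher)
loss = distillation_loss(student, target)
```

In this sketch only the student is needed at retrieval time, so the teachers add cost during training but not during inference, which is consistent with the summary's claim about inference complexity.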