Multimodal Prompt Tuning for Efficient Zero-shot Instruction Learning
Multimodal Prompt Tuning (MMPT) integrates visual prompts into the vision encoder and textual prompts into the language processor, enabling efficient and accurate multimodal adaptation for zero-shot instruction learning.
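
To make the idea of modality-specific prompts concrete, the following is a minimal sketch, not the authors' implementation. It assumes that prompts are extra vectors prepended to each backbone's input sequence, which is a common prompt-tuning choice; the summary above does not say how MMPT injects them. All names and sizes (`init_prompts`, `prepend_prompts`, `VISION_DIM`, `TEXT_DIM`, the prompt counts) are hypothetical, and plain Python lists stand in for tensors.

```python
import random


def init_prompts(num_prompts, dim, seed=0):
    """Create small random prompt vectors (the only trainable values in this sketch)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompts(prompts, embeddings):
    """Return a new sequence with the prompt vectors placed before the input embeddings."""
    for vec in embeddings:
        if prompts and len(vec) != len(prompts[0]):
            raise ValueError("prompt and embedding dimensions must match")
    return [list(p) for p in prompts] + [list(e) for e in embeddings]


# Hypothetical sizes, chosen only for illustration.
VISION_DIM, TEXT_DIM = 8, 6

# Assumption: one prompt set per modality, each matching its backbone's width.
visual_prompts = init_prompts(num_prompts=4, dim=VISION_DIM, seed=1)
textual_prompts = init_prompts(num_prompts=3, dim=TEXT_DIM, seed=2)

# Stand-ins for the inputs of the frozen backbones: 16 image patches, 5 text tokens.
patch_embeddings = [[0.0] * VISION_DIM for _ in range(16)]
token_embeddings = [[0.0] * TEXT_DIM for _ in range(5)]

# Visual prompts go to the vision encoder input, textual prompts to the language side.
vision_input = prepend_prompts(visual_prompts, patch_embeddings)
language_input = prepend_prompts(textual_prompts, token_embeddings)

# Only the prompt values would be updated during tuning; the backbones stay fixed.
trainable = sum(len(p) for p in visual_prompts + textual_prompts)
print(len(vision_input), len(language_input), trainable)
```

In this sketch the number of tuned values depends only on the prompt counts and embedding widths, not on the size of either backbone, which is where the efficiency of prompt tuning generally comes from.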