Core Concepts
Introducing the DualAdapter approach to enhance few-shot learning and domain generalization in Vision-Language Models.
Abstract
This article introduces the concept of dual learning for fine-tuning Vision-Language Models (VLMs) through the DualAdapter approach, which leverages both positive and negative perspectives to improve recognition accuracy on downstream tasks. The article discusses the challenges current VLMs face when transferring to downstream tasks, the design of DualAdapter, its inference process, and a similarity-based label refinement technique for handling noisy few-shot samples. Extensive experiments across 15 datasets validate that DualAdapter outperforms existing methods.
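To make the dual-perspective idea concrete, here is a minimal sketch of how a positive and a negative adapter could be fused at inference time: positive cache affinities raise a class score, negative cache affinities lower it, and both are combined with the zero-shot CLIP logits. The function name, argument shapes, and the `alpha`/`beta` fusion weights are illustrative assumptions, not the paper's actual API (the official code is linked below).

```python
import numpy as np

def dual_adapter_logits(test_feat, pos_cache, neg_cache,
                        pos_labels, neg_labels, clip_logits,
                        alpha=1.0, beta=1.0):
    """Fuse zero-shot CLIP logits with positive and negative cache scores.

    Hypothetical sketch: all feature vectors are assumed L2-normalized,
    so dot products are cosine similarities; `pos_labels`/`neg_labels`
    are one-hot matrices mapping cached samples to classes.
    """
    pos_affinity = test_feat @ pos_cache.T    # similarity to positive cache
    neg_affinity = test_feat @ neg_cache.T    # similarity to negative cache
    pos_scores = pos_affinity @ pos_labels    # per-class positive evidence
    neg_scores = neg_affinity @ neg_labels    # per-class negative evidence
    # Positive evidence raises a class score; negative evidence lowers it.
    return clip_logits + alpha * pos_scores - beta * neg_scores
```

A usage note: with this formulation, setting `beta=0` recovers a purely positive cache-based adapter, which makes the contribution of the negative branch easy to ablate.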
Structure:
Introduction to Large-scale pre-trained Vision-Language Models (VLMs)
Challenges faced by current VLMs in transferring to downstream tasks
Introduction of DualAdapter approach for few-shot adaptation of VLMs from positive and negative perspectives
Inference process of DualAdapter for unified predictions using both positive and negative adapters
Similarity-based label refinement technique to address noisy samples during few-shot adaptation
Experimental results validating the effectiveness of DualAdapter across 15 diverse recognition datasets
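The label refinement step above can be illustrated with a small sketch: each few-shot sample's label is reweighted by its cosine similarity to its class prototype, so atypical or noisy samples contribute less to the cache. This particular weighting scheme (mean-feature prototypes, softmax-style per-class normalization) is an assumption for illustration, not necessarily the exact formulation used in DualAdapter.

```python
import numpy as np

def refine_labels(features, labels, num_classes):
    """Downweight noisy few-shot samples by intra-class similarity.

    Illustrative sketch: each one-hot label is rescaled by how similar
    the sample is to its class prototype (the mean class feature), and
    the weights are normalized to sum to 1 within each class.
    """
    one_hot = np.eye(num_classes)[labels]              # (N, C)
    counts = one_hot.sum(axis=0)                       # samples per class
    protos = (one_hot.T @ features) / counts[:, None]  # class prototypes
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Cosine similarity of each sample to its own class prototype.
    sims = np.einsum('nd,nd->n', feats, protos[labels])
    weights = np.exp(sims)
    class_sums = one_hot.T @ weights                   # per-class totals
    weights /= class_sums[labels]                      # normalize per class
    return one_hot * weights[:, None]                  # soft labels (N, C)
```

In this sketch, a mislabeled sample far from its class prototype receives a low weight, which is the intended effect of similarity-based refinement during few-shot adaptation.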
Stats
"Our extensive experimental results across 15 datasets validate that the proposed DualAdapter outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks while achieving competitive computational efficiency."
"Code is available at https://github.com/zhangce01/DualAdapter."