Core Concepts
Increasing the information density of prompts can significantly improve the generalization ability of vision-language models, and because each parameter then carries more information, it drastically reduces the number of tunable parameters needed.
Abstract
The paper proposes a novel concept called "Information Density" (ID) to quantify the concentration of essential and non-redundant information in the prompt matrix. The authors observe a strong correlation between the ID of prompts and the generalization performance of vision-language models during fine-tuning.
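This summary does not state how ID is computed. Purely as an illustration, one hedged way to proxy "concentration of essential, non-redundant information" is to measure how much of the prompt matrix's singular-value mass sits in its leading directions; the function name, the choice of k, and the use of singular values at all are assumptions here, not the paper's definition.

```python
import torch

def spectral_density(prompt: torch.Tensor, k: int = 1) -> float:
    """Illustrative proxy for information density (NOT the paper's formula).

    Measures how much of the prompt matrix's singular-value mass is
    concentrated in its top-k singular values: a matrix whose energy
    sits in a few directions carries little redundancy per parameter.
    """
    s = torch.linalg.svdvals(prompt)          # singular values, descending
    return (s[:k].sum() / s.sum()).item()

# A rank-1 prompt concentrates all of its mass in one direction (density ~ 1.0),
# while a random full-rank prompt spreads it out (density < 1.0).
dense = torch.randn(4, 1) @ torch.randn(1, 512)
diffuse = torch.randn(4, 512)
print(spectral_density(dense), spectral_density(diffuse))
```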
Inspired by this observation, the authors introduce Dense Information Prompt (DIP), which aims to enhance the information density of prompts to improve generalization. DIP achieves this by decomposing the prompt matrix into a product of two smaller matrices, significantly reducing the number of tunable parameters compared to classic prompt tuning methods.
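A minimal PyTorch-style sketch of the factorization idea described above. The class name, the sizes (4 prompt tokens, 512-dimensional embeddings), and the rank are illustrative assumptions; the paper's actual initialization scheme and regularization module (mentioned below) are not reproduced here.

```python
import torch
import torch.nn as nn

class LowRankPrompt(nn.Module):
    """Sketch of a factored (dense-information) prompt.

    Instead of learning a full n x d prompt matrix, the prompt is
    parameterized as the product of two smaller matrices A (n x r) and
    B (r x d), shrinking the tunable parameters from n*d to r*(n + d).
    """

    def __init__(self, n_tokens: int = 4, dim: int = 512, rank: int = 1):
        super().__init__()
        # Two small factors replace the full prompt matrix.
        self.A = nn.Parameter(torch.randn(n_tokens, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(rank, dim) * 0.02)

    def forward(self) -> torch.Tensor:
        # Reconstruct the n_tokens x dim prompt on the fly.
        return self.A @ self.B

prompt = LowRankPrompt(n_tokens=4, dim=512, rank=1)
full_params = 4 * 512                                            # 2048 for a classic learned prompt
factored_params = sum(p.numel() for p in prompt.parameters())    # 4*1 + 1*512 = 516
print(full_params, factored_params)
```

With these illustrative sizes the factored prompt has 516 tunable parameters versus 2,048 for a full 4x512 prompt matrix, which lands in the ballpark of the "0.5K" figure quoted at the end of this summary, though the paper's actual configuration may differ.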
The paper also introduces a special initialization method and a lightweight regularization module to further improve the performance of DIP without increasing the parameter count or inference cost. Comprehensive experiments on various tasks, including base-to-new generalization, domain generalization, cross-dataset transfer, and few-shot learning, demonstrate the superiority of DIP over state-of-the-art prompt tuning methods, while using only a fraction of the parameters.
Key highlights:
- Proposed the concept of "Information Density" (ID) to quantify the concentration of essential information in the prompt matrix.
- Observed a strong correlation between ID and the generalization performance of vision-language models.
- Introduced Dense Information Prompt (DIP) to enhance the ID of prompts, improving generalization while drastically reducing the number of tunable parameters.
- Proposed a special initialization method and a lightweight regularization module to further boost the performance of DIP.
- Comprehensive experiments show DIP outperforms state-of-the-art prompt tuning methods across various tasks, using only a fraction of the parameters.
Stats
The paper reports the following key metrics:
- Accuracy on base and new classes in the base-to-new generalization setting
- Average accuracy across 11 datasets in the base-to-new generalization setting
- Accuracy on target datasets in the domain generalization setting
- Accuracy on 10 datasets in the cross-dataset transfer setting
- Few-shot learning accuracy at various numbers of shots
Quotes
"Increasing the information density and thus using the fewest but most essential parameters to finish generalization, without causing catastrophic forgetting or overfitting to the small dataset."
"DIP aims to enhance the model's generalization ability by increasing information density. Additionally, due to the increased information carried by each parameter unit, our approach can significantly reduce the required number of parameters."
"We are the first to explore how to effectively adapt vision-language models using an extremely small number of parameters, i.e. 0.5K."