Core Concepts
Including label-relevant semantic information in prompts improves the performance of pre-trained vision-language models.
Abstract
The content examines how label-relevant semantic information in prompts affects pre-trained vision-language models. It introduces CPKP, a method that leverages ontological knowledge graphs and confounder pruning to enhance prompt learning, and details the architecture, training process, and methodology behind the approach. An ablation study compares CPKP with a variant that omits confounder pruning, and the algorithm pipeline for training and testing CPKP is outlined.
The content covers:
Introduction to Vision-Language Models
Prompt Design Strategies
Knowledge Graphs and Graph Representation Learning
Methodology: Learnable Knowledge Prompt, Ontology-enhanced Knowledge Embedding, Confounder-pruned Graph Representation, Variants of CPKP
Algorithm Pipeline for Training and Testing CPKP
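To make the prompt-learning setup concrete, below is a minimal, hypothetical sketch of the learnable-prompt idea that CPKP builds on: learnable context vectors are combined with a class-name embedding to form a text feature, and classes are scored by cosine similarity against an image feature. All dimensions, names, and the mean-pooling stand-in for a frozen text encoder are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 8   # embedding dimension (assumed for illustration)
n_context = 4   # number of learnable context vectors (assumed)

# Learnable context vectors shared across classes; in real prompt
# learning these would be optimised by gradient descent.
context = rng.normal(size=(n_context, embed_dim))

def build_prompt(class_embedding: np.ndarray) -> np.ndarray:
    """Concatenate the learnable context with a class-name embedding,
    then pool into one text feature (mean pooling stands in for the
    frozen text encoder)."""
    tokens = np.vstack([context, class_embedding[None, :]])
    return tokens.mean(axis=0)

def classify(image_feature: np.ndarray,
             class_embeddings: np.ndarray) -> int:
    """Score each class by cosine similarity between the image feature
    and that class's prompt-derived text feature; return the argmax."""
    scores = []
    for emb in class_embeddings:
        text_feature = build_prompt(emb)
        cos = text_feature @ image_feature / (
            np.linalg.norm(text_feature) * np.linalg.norm(image_feature))
        scores.append(cos)
    return int(np.argmax(scores))

# Toy data: three class embeddings and an image feature near class 1.
class_embeddings = rng.normal(size=(3, embed_dim))
image_feature = class_embeddings[1] + 0.1 * rng.normal(size=embed_dim)
print(classify(image_feature, class_embeddings))
```

CPKP's contribution, per the summary above, is to make the textual side of this pipeline carry label-relevant semantic information drawn from an ontological knowledge graph, with confounder pruning applied to the graph representation.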
Stats
"CPKP outperforms the manual-prompt method by 4.64% and the learnable-prompt method by 1.09% on average."
"Empirically, CPKP demonstrates stronger robustness than benchmark methods to domain shifts."
Quotes
"Introducing label-relevant semantic information in prompts boosts the performance of pre-trained vision-language models."