Core Concepts
AutoVP is an end-to-end framework that automates visual prompting design choices, outperforming existing methods by up to 6.7% in accuracy. It also serves as a comprehensive benchmark for evaluating visual prompting performance.
Abstract
AutoVP is an innovative framework that automates visual prompting design choices, optimizing input scaling, prompts, pre-trained model selection, and output label-mapping strategies. It outperforms existing methods and provides a unified benchmark for evaluating visual prompting performance across diverse image-classification tasks.
Visual prompting (VP) is a parameter-efficient fine-tuning approach for adapting pre-trained vision models to downstream tasks. AutoVP addresses the lack of systematic VP frameworks and benchmarks by proposing an end-to-end, extensible framework that jointly tunes the main VP design choices. The study examines how each component, input scaling, the visual prompt itself, the choice of pre-trained classifier, and the output label mapping, affects VP performance.
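The input-scaling and visual-prompt components described above can be sketched as follows. In pad-style visual prompting, the downstream image is resized to occupy the center of the model's input canvas, and the surrounding border is filled with learnable prompt pixels. This is a minimal illustrative sketch, not AutoVP's actual implementation; the function name, the per-channel prompt parameterization, and the 224-pixel canvas are assumptions for illustration.

```python
import numpy as np

def apply_visual_prompt(image, prompt_pad, canvas_size=224):
    """Place an image at the center of a frame of learnable prompt pixels.

    image:      (H, W, C) array, the (already input-scaled) downstream image.
    prompt_pad: (C,) array of learnable per-channel prompt values
                (a simplification; real prompts are per-pixel parameters).
    Returns a (canvas_size, canvas_size, C) array fed to the frozen model.
    """
    h, w, c = image.shape
    # Fill the whole canvas with the prompt values.
    canvas = np.tile(prompt_pad, (canvas_size, canvas_size, 1)).astype(float)
    # Overwrite the center with the original image pixels.
    top = (canvas_size - h) // 2
    left = (canvas_size - w) // 2
    canvas[top:top + h, left:left + w, :] = image
    return canvas
```

During training, only the prompt values (and the output mapping) are updated; the pre-trained model stays frozen, which is what makes VP parameter-efficient.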
Key highlights include the joint optimization of prompts, pre-trained model selection, and output mapping strategies in AutoVP. Experimental results show significant accuracy improvements over existing VP methods across 12 diverse image-classification tasks. AutoVP's modular design allows different components to be swapped in and combined to find the best-performing VP configuration.
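One of the output-mapping strategies a VP framework can choose is frequency-based label mapping: each downstream class is assigned the source-model class most often predicted for its training samples. The sketch below is a simplified, hypothetical version (it maps greedily and ignores collisions between target classes); function and variable names are assumptions.

```python
from collections import Counter

def frequency_label_mapping(source_preds, target_labels, num_target_classes):
    """Map each target class to the source class most frequently
    predicted for its samples (greedy, collisions not resolved).

    source_preds:  list of source-class predictions, one per sample.
    target_labels: list of true downstream labels, aligned with source_preds.
    Returns a dict {target_class: source_class}.
    """
    mapping = {}
    for t in range(num_target_classes):
        preds_for_t = [p for p, y in zip(source_preds, target_labels) if y == t]
        # Most common source prediction wins for this target class.
        mapping[t] = Counter(preds_for_t).most_common(1)[0][0]
    return mapping
```

At inference time, the frozen model's predicted source class is translated through this mapping to produce the downstream label, so no new classification head needs to be trained.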
AutoVP's data-scalability analysis shows that it consistently outperforms Linear Probing (LP) across datasets at varying training-data percentages. The framework is also robust on corrupted datasets and achieves larger accuracy gains on out-of-distribution tasks than non-visual-prompt approaches.
Stats
Accuracy (%):
SVHN: 93.0
CIFAR10: 87.8
Flowers102: 85.4
EuroSAT: 83.7
Pets: 82.7
GTSRB: 81.5
ISIC: 67.4
CIFAR100: 63.7
UCF101: 55.9
DTD: 54.8
Quotes
"AutoVP introduces an end-to-end framework for automating visual prompting design choices."
"Experimental results demonstrate significant improvements in accuracy compared to existing VP methods."