Key Concepts
Dynamic Tuning (DyT) improves both parameter and inference efficiency for ViT adaptation.
Summary
Dynamic Tuning (DyT) is an approach that improves both parameter and inference efficiency when adapting Vision Transformers (ViTs). By dynamically selecting the informative tokens to process, DyT reduces redundant computation during inference while maintaining performance. Several model variants are explored to identify the best practice for DyT. A mixture-of-experts adapter (MoE-adapter) further enhances token processing efficiency. Experiments on image, video, and semantic segmentation tasks validate DyT's effectiveness for efficient model adaptation.
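The token-selection idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear scoring rule, the 0.5 threshold, the residual update, and every name below (`dynamic_block`, `w_score`, `b_score`, `heavy_fn`, `mlp`) are assumptions made for illustration only. It shows one way a per-token score could decide which tokens pass through an expensive sub-layer while the rest skip it unchanged.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_block(tokens, w_score, b_score, heavy_fn, threshold=0.5):
    """Apply heavy_fn only to tokens whose score exceeds the threshold.

    tokens:  (N, D) array of token embeddings
    w_score: (D,) weights of a hypothetical linear scorer
    b_score: scalar bias of that scorer
    Returns the updated tokens and the boolean selection mask.
    """
    scores = sigmoid(tokens @ w_score + b_score)  # one score per token, shape (N,)
    mask = scores > threshold                     # True = token is processed
    out = tokens.copy()
    if mask.any():
        # Residual update for selected tokens; skipped tokens stay unchanged.
        out[mask] = tokens[mask] + heavy_fn(tokens[mask])
    return out, mask

# Toy setup (all sizes and weights are arbitrary assumptions).
rng = np.random.default_rng(0)
N, D, H = 8, 4, 16
tokens = rng.normal(size=(N, D))
w_score = rng.normal(size=D)
w1 = rng.normal(size=(D, H)) * 0.1
w2 = rng.normal(size=(H, D)) * 0.1

def mlp(x):
    # Stand-in for an expensive sub-layer: a two-layer ReLU MLP.
    return np.maximum(x @ w1, 0.0) @ w2

out, mask = dynamic_block(tokens, w_score, 0.0, mlp)
kept_fraction = mask.mean()  # share of tokens that paid for the MLP
```

In this toy setting, the cost of the expensive sub-layer scales with `kept_fraction`, which is the intuition behind saving inference FLOPs by skipping uninformative tokens.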
Statistics
DyT achieves performance comparable or superior to existing parameter-efficient fine-tuning (PEFT) methods.
DyT requires only 71% to 85% of the FLOPs of existing PEFT methods on the VTAB-1K benchmark.