Core Concepts
Lightweight ConvNets can efficiently identify discriminative patches, enabling one-step patch pruning with PaPr for faster inference across various architectures.
Abstract
Deep neural networks have evolved from ConvNets to ViTs in pursuit of higher accuracy.
PaPr introduces a novel method for one-step patch pruning that requires no re-training.
Lightweight ConvNets excel at identifying key image regions efficiently.
PaPr significantly reduces redundant patches while maintaining accuracy.
Extensive experiments show that PaPr outperforms existing patch-reduction methods.
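The core idea above can be sketched in a few lines: score each patch with a cheap proxy (in PaPr, a downsampled feature map from a lightweight ConvNet), then keep only the top-scoring patches in a single step before the expensive backbone runs. The snippet below is a minimal, hypothetical numpy illustration, not the official PaPr implementation; the `prune_patches` function, its shapes, and the precomputed `saliency` input are all assumptions for demonstration.

```python
import numpy as np

def prune_patches(patches, saliency, keep_ratio=0.3):
    """One-step patch pruning sketch (illustrative, not the official PaPr code).

    patches:  (N, D) array of N flattened image patches.
    saliency: (N,) per-patch scores, e.g. from a lightweight ConvNet's
              downsampled feature map (assumed precomputed here).
    Returns the kept patches and their original indices.
    """
    k = max(1, int(round(len(patches) * keep_ratio)))
    # Keep the k most discriminative patches in one step -- no re-training.
    keep = np.argsort(saliency)[-k:]
    keep.sort()  # preserve original spatial order for the downstream backbone
    return patches[keep], keep

# Toy example: 16 patches of dim 8 with random proxy saliency scores.
rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 8))
saliency = rng.random(16)
kept, idx = prune_patches(patches, saliency, keep_ratio=0.3)
print(kept.shape)  # (5, 8): roughly 70% of patches removed
```

Because selection is a single top-k pass over precomputed scores, it adds negligible cost relative to the FLOPs saved in the pruned backbone.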
Stats
PaPr removes about 70% of redundant patches in videos, achieving up to 3.7x FLOPs reduction with almost no impact on accuracy.
PaPr improves ToMe's accuracy by 4.5% under the same computational budget.
PaPr improves ToMe's accuracy by an additional 22.3% under a computational budget similar to ViT-B.