Zero-Shot Token Pruning for Efficient Vision Transformer Inference
Zero-TPrune is a zero-shot token pruning method that leverages the attention graph of pre-trained Transformer models to prune unimportant and redundant (similar) tokens, yielding significant computational savings without accuracy loss.
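The core idea of scoring tokens via the attention graph can be sketched as follows. This is a hypothetical illustration, not the paper's exact algorithm: it runs a simple PageRank-style importance iteration over a (row-stochastic) attention matrix and keeps the top-scoring tokens; the function names `token_importance` and `prune_tokens` are invented for this sketch.

```python
import numpy as np

def token_importance(attn, iters=10):
    """Estimate per-token importance by iterating a PageRank-style
    update over the attention matrix (rows = queries, cols = keys,
    rows sum to 1). A sketch of the attention-graph idea, not the
    paper's exact weighted scheme."""
    n = attn.shape[0]
    s = np.full(n, 1.0 / n)   # start from a uniform distribution
    for _ in range(iters):
        s = s @ attn          # propagate importance along attention edges
        s /= s.sum()          # renormalize to a distribution
    return s

def prune_tokens(tokens, attn, keep):
    """Keep the `keep` most important tokens, preserving their order."""
    scores = token_importance(attn)
    idx = np.sort(np.argsort(scores)[-keep:])
    return tokens[idx], idx

# Toy usage: random softmax attention over 8 tokens, keep the top 4.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))
attn = np.exp(logits)
attn /= attn.sum(axis=1, keepdims=True)
tokens = rng.normal(size=(8, 4))   # 8 tokens, 4-dim embeddings
kept, idx = prune_tokens(tokens, attn, keep=4)
```

Because the method only reads attention maps that the pre-trained model already produces, no fine-tuning is needed, which is what makes the approach "zero-shot".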