Efficient Vision Transformer with Selective Attention Layer Removal
The core message of this paper is that uninformative attention layers in vision transformers can be absorbed into the MLP layers that follow them, reducing computational cost without compromising accuracy.
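As a rough illustration (not the authors' code), the sketch below shows one way such selective removal could look in PyTorch: attention sublayers at chosen depths are skipped and their weights freed, leaving the residual MLP path to carry those blocks; the `Block`/`remove_attention` names, the choice of layer indices, and the dimensions are all hypothetical, and the paper's actual integration procedure may differ.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A standard pre-norm transformer block: x + Attn(LN(x)), then x + MLP(LN(x))."""
    def __init__(self, dim=384, heads=6, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )
        self.attn_removed = False  # set True to skip the attention sublayer

    def forward(self, x):
        if not self.attn_removed:
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
        # once attention is removed, only the residual MLP path remains in this block
        return x + self.mlp(self.norm2(x))

def remove_attention(blocks, indices):
    """Mark attention sublayers at `indices` as removed and drop their parameters."""
    for i in indices:
        blocks[i].attn_removed = True
        blocks[i].attn = None  # free the weights so they cost no memory or FLOPs

# demo: prune three (hypothetically uninformative) attention layers from a 12-block stack
blocks = nn.ModuleList(Block() for _ in range(12))
x = torch.randn(2, 197, 384)  # (batch, tokens, dim), e.g. ViT-S/16 with a CLS token
before = sum(p.numel() for p in blocks.parameters())
remove_attention(blocks, indices=[3, 7, 11])
after = sum(p.numel() for p in blocks.parameters())
for b in blocks:
    x = b(x)
print(f"params: {before / 1e6:.1f}M -> {after / 1e6:.1f}M, output shape {tuple(x.shape)}")
```

In a setting like this, the remaining MLPs would typically be fine-tuned after removal so they can take over the pruned layers' function, which is presumably what "integrated into their subsequent MLP layers" entails.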