Core Concepts
ParFormer enhances feature extraction in transformers by combining different token mixers with a convolutional attention patch embedding (CAPE).
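The exact mixer designs are not detailed here, but the idea of combining heterogeneous token mixers can be sketched as follows. This is a minimal, illustrative NumPy example (not ParFormer's actual implementation): one branch performs global attention-style mixing, the other performs local convolution-style mixing, and their outputs are fused with a residual connection. All function names and the fixed 0.5 fusion weight are assumptions for illustration; the real model uses learned projections and weights.

```python
import numpy as np

def attention_mixer(x):
    # Global token mixing via single-head self-attention
    # (query/key/value projections omitted for brevity). x: (tokens, dim).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def conv_mixer(x, kernel_size=3):
    # Local token mixing via an averaging convolution along the token
    # axis (a stand-in for a learned depthwise convolution).
    pad = kernel_size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([padded[i:i + kernel_size].mean(axis=0)
                     for i in range(x.shape[0])])

def mixed_token_block(x):
    # Apply both mixers to the same input and fuse their outputs,
    # rather than stacking them sequentially.
    return x + 0.5 * (attention_mixer(x) + conv_mixer(x))

tokens = np.random.randn(8, 16)   # 8 tokens, 16-dim embeddings
out = mixed_token_block(tokens)
print(out.shape)                  # (8, 16)
```

The fusion step preserves the input shape, so such blocks can be stacked like standard transformer layers.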
Statistics
This work presents ParFormer as an enhanced transformer architecture.
A comprehensive evaluation shows that ParFormer outperforms both CNN-based and state-of-the-art transformer-based architectures on image classification.
The proposed CAPE (convolutional attention patch embedding) is shown to benefit the overall MetaFormer architecture, yielding a 0.5% increase in accuracy.