Inspired by biological evolution, the authors explain the rationality of Vision Transformers by analogy with evolutionary algorithms (EAs). Drawing on effective EA variants, they propose a novel pyramid EA-inspired Vision Transformer (EATFormer) that achieves state-of-the-art performance on various computer vision tasks.
The key innovations are an EA-based Transformer (EAT) block, a Global and Local Interaction (GLI) module, a Multi-Scale Region Aggregation (MSRA) module, a Modulated Deformable MSA (MD-MSA, where MSA is multi-head self-attention), and a Task-Related Head (TRH).