ACC-ViT introduces Atrous Attention, which combines regional and sparse attention to consolidate information more effectively. The design is inspired by atrous (dilated) convolution and aims to balance local and global information. The model outperforms MaxViT on ImageNet-1K while using fewer parameters. Evaluation across finetuning, linear probing, and zero-shot learning shows ACC-ViT's versatility. The ablation study highlights the importance of shared MLP layers and adaptive gating for the performance improvement.
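The idea of sampling a feature map sparsely, as atrous convolution does, and then fusing several branches with a gate can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `atrous_partition`, `atrous_merge`, and `gated_fusion` are hypothetical, the attention computation itself is omitted, and the softmax gate is only an assumed stand-in for the adaptive gating mentioned above.

```python
import numpy as np

def atrous_partition(x, rate):
    """Split an (H, W, C) map into rate*rate sparse sub-grids.

    Each sub-grid keeps every `rate`-th pixel, starting from a different
    offset, which mimics the sampling pattern of a dilated convolution.
    """
    h, w, _ = x.shape
    assert h % rate == 0 and w % rate == 0, "H and W must be divisible by rate"
    return [x[i::rate, j::rate] for i in range(rate) for j in range(rate)]

def atrous_merge(parts, rate):
    """Inverse of atrous_partition: scatter the sub-grids back in place."""
    hp, wp, c = parts[0].shape
    out = np.empty((hp * rate, wp * rate, c), dtype=parts[0].dtype)
    k = 0
    for i in range(rate):
        for j in range(rate):
            out[i::rate, j::rate] = parts[k]
            k += 1
    return out

def gated_fusion(branches, logits):
    """Weighted sum of branch outputs using softmax gate weights (assumed form)."""
    logits = np.asarray(logits, dtype=float)
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()
    return sum(w * b for w, b in zip(weights, branches))

# Toy usage: an 8x8 single-channel map, dilation rate 2.
x = np.arange(64, dtype=float).reshape(8, 8, 1)
parts = atrous_partition(x, 2)      # four 4x4 sparse views of x
restored = atrous_merge(parts, 2)   # identical to x
fused = gated_fusion([x, restored], logits=[0.0, 0.0])  # equal gate weights
```

In a real model, attention would be applied within each sparse sub-grid before merging, and the gate logits would be predicted from the input rather than fixed.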
Source: arxiv.org