ACC-ViT introduces Atrous Attention, which combines regional and sparse attention for better information consolidation. Inspired by atrous (dilated) convolution, it balances local and global context effectively. The model outperforms MaxViT on ImageNet-1K while using fewer parameters. Evaluations across finetuning, linear probing, and zero-shot learning demonstrate ACC-ViT's versatility, and an ablation study highlights the contribution of shared MLP layers and adaptive gating to the performance gains.
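The atrous-convolution analogy can be illustrated with a minimal sketch: subsample a feature map at increasing dilation rates so that each branch covers a wider context with the same number of tokens. This is only an assumption-laden illustration of dilated sampling — the function name `atrous_grids`, the dilation rates, and the shapes are hypothetical and do not reproduce the paper's exact attention formulation.

```python
import numpy as np

def atrous_grids(x, dilations=(1, 2, 4)):
    """Sample dilated (atrous) token grids from a square feature map.

    x: (H, W, C) array. For each dilation d, keep every d-th token
    along both spatial axes, yielding progressively sparser grids that
    span the full map -- a rough sketch of how atrous sampling trades
    density for receptive field, not ACC-ViT's actual implementation.
    """
    return [x[::d, ::d, :] for d in dilations]

# Toy 8x8 single-channel feature map.
x = np.arange(8 * 8, dtype=float).reshape(8, 8, 1)
grids = atrous_grids(x)
print([g.shape for g in grids])  # [(8, 8, 1), (4, 4, 1), (2, 2, 1)]
```

Each grid could then be attended over independently and the branch outputs fused (e.g., by the adaptive gating the ablation study credits); the sketch stops at the sampling step.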
Key insights from the paper by Nabil Ibteha... on arxiv.org, 03-08-2024
https://arxiv.org/pdf/2403.04200.pdf