ACC-ViT introduces Atrous Attention, which combines regional and sparse attention to consolidate information more effectively. Inspired by atrous convolution, this design balances local detail and global context. The model outperforms MaxViT on ImageNet-1K while using fewer parameters. Evaluations across finetuning, linear probing, and zero-shot learning demonstrate ACC-ViT's versatility, and an ablation study highlights the importance of shared MLP layers and adaptive gating for its performance.
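To make the idea concrete, below is a minimal PyTorch sketch of how an atrous-style attention block could be wired up. It is an illustrative assumption of mine, not the paper's implementation: windowed self-attention is run over interleaved sub-grids at several dilation rates, the branch outputs are fused by a learned gate, and a single shared MLP processes the result. The class name `AtrousAttentionSketch`, the dilation rates, and the gating layer are hypothetical choices, not taken from ACC-ViT.

```python
# Illustrative sketch only (not the authors' code): dilated window attention
# branches fused by a learned gate, followed by one shared MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AtrousAttentionSketch(nn.Module):
    """Hypothetical module: self-attention over interleaved sub-grids at
    several dilation rates, adaptively gated and passed through a shared MLP."""

    def __init__(self, dim: int, num_heads: int = 4, dilations=(1, 2, 4)):
        super().__init__()
        self.dilations = dilations
        # One attention module reused for every dilation branch (assumption).
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Adaptive gating: one scalar weight per branch from pooled features.
        self.gate = nn.Linear(dim, len(dilations))
        # Shared MLP applied once to the fused output (assumption).
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def _branch(self, x: torch.Tensor, d: int) -> torch.Tensor:
        # x: (B, H, W, C); H and W must be divisible by the dilation d.
        B, H, W, C = x.shape
        # Split the grid into d*d interleaved sub-grids (atrous sampling).
        xs = x.view(B, H // d, d, W // d, d, C).permute(0, 2, 4, 1, 3, 5)
        tokens = xs.reshape(B * d * d, (H // d) * (W // d), C)
        out, _ = self.attn(tokens, tokens, tokens)
        # Scatter tokens back to their original spatial positions.
        out = out.reshape(B, d, d, H // d, W // d, C).permute(0, 3, 1, 4, 2, 5)
        return out.reshape(B, H, W, C)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate weights from globally pooled features, one per dilation branch.
        gates = F.softmax(self.gate(x.mean(dim=(1, 2))), dim=-1)  # (B, n_branches)
        branches = [self._branch(x, d) for d in self.dilations]   # local -> global
        fused = sum(g.view(-1, 1, 1, 1) * b
                    for g, b in zip(gates.unbind(dim=-1), branches))
        return x + self.mlp(fused)  # residual connection


# Quick shape check: an 8x8 feature map is divisible by all dilations above.
x = torch.randn(2, 8, 8, 64)
print(AtrousAttentionSketch(dim=64)(x).shape)  # torch.Size([2, 8, 8, 64])
```

The sketch keeps all three pieces the summary mentions: dilated (atrous) sampling of attention windows, adaptive gating to weight the local-to-global branches, and an MLP shared across branches rather than duplicated per branch.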
Key insights extracted from the original content by Nabil Ibteha... at arxiv.org, 03-08-2024: https://arxiv.org/pdf/2403.04200.pdf