The paper investigates the potential of convolutional neural networks (CNNs) for retinal disease (RD) tasks, an area where CNNs have recently been overshadowed by advances in vision transformers (ViTs). The authors start from a standard MobileNetV2 and systematically optimize its key components: channel configuration, data augmentation, dropout, optimizer, and activation function.
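The component-wise optimization described above amounts to a search over a small design space. The sketch below illustrates the idea as a plain grid search; the option lists and the scoring function are hypothetical stand-ins (the paper tunes each component by training on real RD benchmarks, not with a heuristic like this):

```python
from itertools import product

# Hypothetical design space mirroring the components the paper tunes;
# the concrete values are illustrative, not the paper's choices.
search_space = {
    "width_mult":   [0.75, 1.0, 1.4],        # channel configuration
    "augmentation": ["basic", "strong"],     # data augmentation recipe
    "dropout":      [0.0, 0.2, 0.5],
    "optimizer":    ["SGD", "AdamW"],
    "activation":   ["ReLU6", "GELU", "Mish"],
}

def evaluate(config):
    """Stand-in for training the model with `config` and returning
    validation accuracy on an RD benchmark (here: a dummy heuristic)."""
    score = 0.80
    score += 0.02 * (config["optimizer"] == "AdamW")
    score += 0.01 * (config["activation"] != "ReLU6")
    score -= 0.05 * (config["dropout"] == 0.5)
    return score

# Exhaustively evaluate every combination and keep the best one.
best_config, best_score = None, float("-inf")
keys = list(search_space)
for values in product(*search_space.values()):
    config = dict(zip(keys, values))
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config["optimizer"], round(best_score, 2))
```

In practice each component would be tuned one at a time (holding the others fixed) rather than over the full cross-product, which keeps the number of training runs linear in the number of options.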
The optimized model, named nnMobileNet, is shown to surpass ViT-based models across various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. Notably, nnMobileNet achieves this superior performance without requiring extensive pretraining on large-scale external datasets, unlike many ViT-based models.
The authors argue that CNNs are inherently more suited for RD tasks due to their ability to capture fine-grained local features and hierarchical information, which are crucial for detecting small and heterogeneous lesions. The visual interpretability analysis further confirms that nnMobileNet can accurately localize diabetic lesions, outperforming both ViT-based and other CNN-based methods.
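This summary does not name the interpretability method used; a common choice for localizing lesions with a CNN is Grad-CAM, whose core weighting step is sketched below in NumPy. All array contents here are random placeholders, not outputs of nnMobileNet:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM core step: weight each feature map by its spatially
    averaged gradient, sum the maps, and keep only positive evidence.

    activations, gradients: arrays of shape (C, H, W) taken from the
    last convolutional layer (hypothetical values in this sketch).
    """
    weights = gradients.mean(axis=(1, 2))             # (C,) channel importances
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted sum
    cam = np.maximum(cam, 0)                          # ReLU: positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam

# Toy example with random feature maps and gradients
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)
```

The resulting low-resolution heatmap is upsampled to the input image size and overlaid on the fundus photograph; for a well-tuned CNN, high-activation regions should coincide with the diabetic lesions.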
The findings challenge the widely held belief that ViTs are superior to CNNs for medical imaging tasks and highlight the importance of carefully optimizing CNNs for improved performance. The authors recommend that future model development in RD tasks should consider the strengths of both CNNs and ViTs, with an emphasis on data characteristics and model fine-tuning.
Key insights distilled from: Wenhui Zhu, P... at arxiv.org, 04-11-2024
https://arxiv.org/pdf/2306.01289.pdf