Harun, M.Y., Lee, K., Gallardo, J., Krishnan, G., & Kanan, C. (2024). What Variables Affect Out-of-Distribution Generalization in Pretrained Models? Advances in Neural Information Processing Systems, 37.
This research investigates the factors that influence the "tunnel effect" in pretrained deep neural networks, a phenomenon in which deeper layers compress representations and hinder out-of-distribution (OOD) generalization. The phenomenon challenges the assumption that deeper layers are universally transferable.
The authors run extensive experiments, using linear probes to measure how various factors affect OOD generalization across deep neural network (DNN) architectures. The factors are image resolution, data augmentation, the number of classes and samples in the training dataset, architecture (CNN vs. ViT), depth, degree of over-parameterization, stem size, and spatial reduction. Three metrics quantify the strength of the tunnel effect: % OOD performance retained, the Pearson correlation between the in-distribution (ID) and OOD accuracy curves, and ID/OOD alignment. The statistical analysis uses paired Wilcoxon signed-rank tests and SHAP (SHapley Additive exPlanations) to estimate each variable's contribution to OOD generalization.
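The first two tunnel-effect metrics can be computed directly from the per-layer accuracies of the linear probes. The sketch below is illustrative and rests on assumptions not stated in this summary: it takes "% OOD performance retained" to be the OOD accuracy of the last probed layer divided by the best OOD accuracy across all probed layers, and it treats the ID and OOD accuracy curves as equal-length lists ordered from shallow to deep. The accuracy values are invented, and ID/OOD alignment is omitted because its formula is not given here.

```python
import math
from typing import Sequence


def percent_ood_retained(ood_acc: Sequence[float]) -> float:
    """Assumed definition: OOD accuracy of the last probed layer as a
    percentage of the best OOD accuracy over all probed layers."""
    return 100.0 * ood_acc[-1] / max(ood_acc)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant curve")
    return cov / (sx * sy)


# Hypothetical linear-probe accuracies (%) per layer, ordered shallow -> deep.
id_acc = [40.0, 55.0, 68.0, 79.0, 88.0, 93.0]
ood_acc = [30.0, 41.0, 47.0, 45.0, 38.0, 33.0]

print(f"% OOD retained:   {percent_ood_retained(ood_acc):.1f}")  # 70.2
print(f"Pearson(ID, OOD): {pearson(id_acc, ood_acc):.3f}")
```

Under these assumptions, a low retained percentage combined with a weak or negative correlation would suggest a strong tunnel: ID accuracy keeps rising with depth while OOD accuracy peaks early and then declines.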
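The paired Wilcoxon signed-rank test compares two matched sets of measurements without assuming normality, for example the same metric for models that differ in only one variable. The following standard-library sketch uses the large-sample normal approximation with a tie correction. The function name, the pairing (low vs. high resolution), and the data are hypothetical; for small samples, a routine that supports the exact null distribution, such as `scipy.stats.wilcoxon`, is preferable.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Returns (W_plus, p_value). Zero differences are discarded and tied
    absolute differences receive their average rank.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| in ascending order, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over tie groups, used to correct the variance
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal tail
    return w_plus, p


# Hypothetical % OOD retained for 8 configurations, each trained at two resolutions.
low_res = [62.0, 55.5, 70.1, 48.3, 66.0, 59.2, 51.7, 73.4]
high_res = [78.4, 69.0, 81.2, 64.9, 75.5, 72.8, 60.3, 85.0]
w_plus, p = wilcoxon_signed_rank(high_res, low_res)
print(f"W+ = {w_plus}, p = {p:.4f}")
```

Because every hypothetical pair improves at the higher resolution, W+ takes its maximum value n(n+1)/2 = 36, and the test rejects "no difference" at the 5% level.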
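SHAP attributes a prediction to its input variables using Shapley values from cooperative game theory: each variable receives its average marginal contribution over all orderings of the variables. The summary does not specify which model the authors explained, so the sketch below assumes a hypothetical surrogate that predicts % OOD performance retained from three binary training variables, with made-up coefficients. It enumerates every coalition exactly and fills in "absent" variables from a single baseline input. Exact enumeration costs 2^n model evaluations, so practical SHAP tools rely on approximations or model-specific algorithms; here n = 3.

```python
import itertools
import math
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to one baseline input.

    A coalition S is scored by feeding the model x's values for the features
    in S and the baseline's values for all other features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical surrogate predicting % OOD retained from three 0/1 variables:
# z[0] = high resolution, z[1] = augmentation, z[2] = many classes.
# The coefficients are invented for illustration.
def surrogate(z: List[float]) -> float:
    return 50.0 + 12.0 * z[0] + 8.0 * z[1] + 10.0 * z[2] + 4.0 * z[0] * z[1]


phi = shapley_values(surrogate, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [14.0, 10.0, 10.0]
```

The attributions sum to the gap between the surrogate's prediction at `x` and at the baseline (34 here), and the interaction term's contribution of 4 is split equally between the two variables that share it.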
The tunnel effect is not universal; it depends heavily on the diversity of the training data. Increasing this diversity, especially through higher-resolution images, data augmentation, and more classes, can mitigate the tunnel effect and improve OOD generalization. These findings challenge earlier assumptions about the universality of the features learned in deeper layers and underline the importance of diverse training datasets for building robust, generalizable deep learning models.
This research clarifies which factors drive OOD generalization in deep learning, most notably by showing that the tunnel effect is not universal. It argues for moving beyond toy datasets such as CIFAR toward more diverse, higher-resolution datasets for training and evaluating deep learning models, so that the models remain robust and generalize to real-world scenarios.
Future research should develop theoretical frameworks that explain the tunnel effect and investigate whether it appears in non-vision, multimodal, and biased datasets. Other promising directions include studying self-supervised learning (SSL) backbones and developing techniques that mitigate tunnel formation in continual learning.
by Md Yousuf Ha... at arxiv.org 10-28-2024
https://arxiv.org/pdf/2405.15018.pdf