The paper investigates the generalization behavior of deep neural networks on out-of-distribution (OOD) data. It presents empirical evidence contradicting the widely held belief that increasing the size of the training data mixture always improves a model's OOD generalization performance.
The key findings are:
For small distribution shifts, the generalization error decreases as the training set grows, mirroring in-distribution behavior. For substantial distribution shifts, however, the error may not decrease monotonically and can remain high even as the training data is enlarged.
The authors propose a novel definition of OOD data as samples lying outside the convex hull of the training data mixture. They then establish new generalization error bounds that distinguish the in-distribution and OOD cases, and their analysis of these bounds reveals the main factors behind the non-decreasing OOD error trends.
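To make the convex-hull definition concrete, the sketch below tests whether a point lies inside the convex hull of a set of training-domain feature vectors by solving a small linear-programming feasibility problem. The function name `in_convex_hull`, the feature-space embedding of domains, and the toy coordinates are illustrative assumptions, not details from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """Test whether x is a convex combination of the rows of `points`:
    find lambda >= 0 with sum(lambda) = 1 and lambda @ points = x."""
    n = points.shape[0]
    c = np.zeros(n)  # objective is irrelevant; we only check feasibility
    A_eq = np.vstack([points.T, np.ones(n)])  # equality constraints stacked
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

# Toy example: three training domains embedded as 2-D feature means.
train_domains = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_convex_hull(np.array([0.3, 0.3]), train_domains))  # True  -> in-distribution
print(in_convex_hull(np.array([2.0, 2.0]), train_domains))  # False -> OOD under this definition
```

The LP route avoids constructing an explicit hull, so the same test extends unchanged to the higher-dimensional feature spaces typical of deep networks.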
The authors explore popular OOD techniques like data augmentation, pre-training, and algorithm tuning. They demonstrate that the effectiveness of these methods can be explained by their ability to expand the coverage of the training data mixture and its associated convex hull.
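As a rough illustration of the hull-expansion argument, the following sketch measures the convex-hull area of a toy 2-D feature set before and after a simple noise-based augmentation; the jitter augmentation and all numbers are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)

# Toy 2-D "features" of the original training mixture.
features = rng.normal(size=(200, 2))

# A simple stand-in for data augmentation: jitter each point in feature space.
augmented = features + rng.normal(scale=0.8, size=features.shape)

hull_before = ConvexHull(features)
hull_after = ConvexHull(np.vstack([features, augmented]))

# In 2-D, `volume` is the hull's area; augmentation enlarges the region
# that counts as in-distribution under the convex-hull view.
print(f"hull area before: {hull_before.volume:.2f}")
print(f"hull area after:  {hull_after.volume:.2f}")
```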
Inspired by this analysis of data diversity, the authors propose a data selection algorithm that picks samples with substantial mutual differences so as to expand the training mixture. The algorithm outperforms random selection, especially at large training set sizes.
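The paper's exact selection procedure is not reproduced here; the generic farthest-point-sampling sketch below captures the idea of greedily choosing the candidate most different from everything already selected. The helper `diverse_select` and the random pool are hypothetical.

```python
import numpy as np

def diverse_select(features, k, seed=0):
    """Greedy farthest-point selection: repeatedly add the candidate
    farthest (in Euclidean distance) from the current selection, so the
    chosen samples differ substantially and stretch the training mixture."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(features.shape[0]))]  # random starting sample
    # Minimum distance from each candidate to the selected set so far.
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # most different remaining candidate
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(selected)

# Usage: pick 10 maximally spread samples from a pool of 500.
pool = np.random.default_rng(1).normal(size=(500, 8))
print(diverse_select(pool, k=10))
```

Greedy max-min selection is a standard diversity heuristic; any distance in a learned feature space could stand in for the Euclidean one used here.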
Overall, the paper provides a deeper theoretical understanding of OOD generalization in deep learning and offers insights for designing more effective OOD techniques.
Source: Songming Zha... et al., arxiv.org, 04-24-2024, https://arxiv.org/pdf/2312.16243.pdf