Core Concepts
The early period of training significantly impacts out-of-distribution generalization in neural networks.
Abstract
This paper examines how the early period of training affects out-of-distribution (OOD) generalization in neural networks. It studies the effect of gradual unfreezing on OOD performance, the relationship between learning dynamics and OOD generalization, and the best time to remove training interventions to improve OOD results. The findings are validated empirically across several datasets and model architectures.
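To make the idea of gradual unfreezing concrete, here is a minimal sketch of an unfreezing schedule. It is an illustration under stated assumptions, not the paper's implementation: the function names, the top-down order (output block first), and the rule "one extra block every k steps" are all hypothetical choices made for this example.

```python
def trainable_blocks(step, num_blocks, k):
    """Return the indices of the blocks that are trainable at `step`.

    Assumptions (not taken from the source): block num_blocks - 1 is the
    output head and trains from step 0; one further block, moving toward
    the input, is unfrozen every k steps.
    """
    if k <= 0:
        # k <= 0 is treated here as "no intervention": every block trains.
        return list(range(num_blocks))
    num_unfrozen = min(num_blocks, 1 + step // k)
    return list(range(num_blocks - num_unfrozen, num_blocks))


def fully_unfrozen_step(num_blocks, k):
    """First step at which every block is trainable under this schedule."""
    return max(0, (num_blocks - 1) * k)


if __name__ == "__main__":
    # With 4 blocks and k = 100, print the trainable set at a few steps.
    for step in (0, 100, 250, 300):
        print(step, trainable_blocks(step, num_blocks=4, k=100))
```

In this sketch, k controls how long the intervention lasts: a larger k keeps most parameters frozen for more of the early training period, and `fully_unfrozen_step` gives the point at which the intervention is effectively removed.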
Abstract:
- Differences in the early period of training significantly affect performance on in-distribution (ID) tasks.
- Neural networks are sensitive to out-of-distribution data.
- The study investigates the relationship between learning dynamics and OOD generalization during early training.
Introduction:
- Modifications to optimization processes shape early training periods.
- Limited work on how early training impacts OOD generalization.
Data Extraction:
- "selecting the number of trainable parameters at different times during training has a minuscule impact on ID results but greatly affects generalization to OOD data."
- "the trace of Fisher Information and sharpness may be used as indicators for the removal of interventions during the early period of training for better OOD generalization."
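As a rough illustration of the first indicator, the sketch below computes the trace of the Fisher Information for a toy binary logistic regression model, where it has a closed form. The model, the function names, and the data are assumptions made for this example; the source does not say how the quantity is estimated for deep networks, where it is typically approximated from per-example gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fisher_trace(weights, inputs):
    """Trace of the Fisher Information for binary logistic regression.

    With p = sigmoid(w . x), the gradient of log p(y | x) with respect to
    w is (y - p) * x, and the expectation of (y - p)^2 under the model is
    p * (1 - p). The per-example Fisher matrix is therefore
    p * (1 - p) * x x^T, whose trace is p * (1 - p) * ||x||^2.
    The result is averaged over the given inputs.
    """
    total = 0.0
    for x in inputs:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total += p * (1.0 - p) * sum(xi * xi for xi in x)
    return total / len(inputs)


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.0, 2.0]]  # toy inputs, hypothetical
    print(fisher_trace([0.0, 0.0], data))    # uncertain model
    print(fisher_trace([10.0, 10.0], data))  # confident model
```

In this toy model the trace is largest when the predictions are uncertain and shrinks as the model becomes confident. Logging such a quantity over the first training steps is one way to track it as a signal for when to remove an intervention.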