Core Concepts
Neural networks exhibit tempered overfitting behavior in noisy interpolation learning scenarios, with different loss functions leading to varying degrees of generalization.
Abstract
Neural networks show "tempered" overfitting behavior in noisy interpolation learning.
Provides the first rigorous analysis of overfitting for minimum-norm interpolation in regression, focusing on univariate two-layer ReLU networks.
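As a rough numerical illustration of this setup (not the paper's construction), one can train a wide two-layer ReLU network on noisy one-dimensional data until it interpolates, using a tiny weight decay as a crude stand-in for preferring a small-norm interpolant; all names and hyperparameters below are illustrative assumptions.

```python
# Illustrative sketch: near-interpolation of noisy 1D data with a wide
# two-layer ReLU network, with tiny weight decay as a crude proxy for
# selecting a small-norm interpolant.
import torch

torch.manual_seed(0)
n, width = 30, 512
x = torch.rand(n, 1)                                       # univariate inputs in [0, 1]
y = torch.sin(2 * torch.pi * x) + 0.3 * torch.randn(n, 1)  # noisy labels

net = torch.nn.Sequential(
    torch.nn.Linear(1, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-6)

for _ in range(20_000):                                    # train until (near) zero training error
    opt.zero_grad()
    loss = torch.mean((net(x) - y) ** 2)
    loss.backward()
    opt.step()

# Monte-Carlo estimate of the L1 and L2 test risks of the (near-)interpolant.
x_test = torch.rand(50_000, 1)
y_test = torch.sin(2 * torch.pi * x_test) + 0.3 * torch.randn(50_000, 1)
with torch.no_grad():
    err = net(x_test) - y_test
    print(f"train MSE: {loss.item():.2e}")
    print(f"L1 test risk: {err.abs().mean().item():.3f}")
    print(f"L2 test risk: {(err ** 2).mean().item():.3f}")
```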
Introduction:
Overfitting can be catastrophic or benign, even in interpolation learning with noisy data.
Prior theoretical study has been largely limited to linear and kernel methods, motivating the investigation of neural networks.
Minimum Norm Linear Prediction:
Minimum-norm prediction has been extensively studied for ridgeless linear regression with noisy labels and sub-Gaussian features.
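In the linear case, the minimum-norm (ridgeless) interpolator has a closed form via the Moore-Penrose pseudoinverse; the sketch below (illustrative, not taken from the paper) computes it for an overparameterized problem with Gaussian features and noisy labels.

```python
# Minimum-norm ("ridgeless") least-squares interpolation: w_hat = pinv(X) @ y
# is the smallest-L2-norm solution of X w = y in the overparameterized regime.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 500                                  # fewer samples than features
X = rng.standard_normal((n, d))                  # Gaussian (hence sub-Gaussian) features
w_star = np.zeros(d)
w_star[0] = 1.0                                  # simple ground-truth predictor
y = X @ w_star + 0.5 * rng.standard_normal(n)    # noisy labels

w_hat = np.linalg.pinv(X) @ y                    # minimum-norm interpolator
print("training residual:", np.linalg.norm(X @ w_hat - y))   # ~ 0: interpolates the noise
print("parameter error ||w_hat - w*||^2:", np.linalg.norm(w_hat - w_star) ** 2)
```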
Tempered Overfitting Behavior:
Neural networks exhibit tempered overfitting: the risk does not approach the Bayes-optimal risk, but it also avoids catastrophic divergence.
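Using the taxonomy popularized by Mallinar et al. [2022], and writing R(f̂_n) for the risk of the interpolating predictor and R* for the Bayes risk (notation assumed here), the three regimes are:

```latex
% Overfitting taxonomy for an interpolating predictor \hat{f}_n (Bayes risk R^*)
\text{benign:}\qquad       R(\hat{f}_n) \;\longrightarrow\; R^{*} \\
\text{tempered:}\qquad     R(\hat{f}_n) \;\longrightarrow\; c, \quad R^{*} < c < \infty \\
\text{catastrophic:}\qquad R(\hat{f}_n) \;\longrightarrow\; \infty
```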
Effect of Overfitting on Lp Risk:
For Lp losses with p < 2, overfitting is tempered with high probability, while for p ≥ 2 it becomes catastrophic.
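Here the Lp risk of a predictor f is understood in the standard sense (notation assumed):

```latex
R_p(f) \;=\; \mathbb{E}_{(x,y)}\!\left[\,\lvert f(x) - y \rvert^{p}\,\right]
```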
Convergence vs. Expectation:
While the risk converges in probability for p < 2, its expectation remains infinite, highlighting the delicate nature of overfitting behavior.
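A toy example (not the paper's quantity) shows how a random quantity can converge in probability to a finite value while its expectation stays infinite, which is the kind of gap alluded to here; the distribution below is an illustrative stand-in.

```python
# Toy example: Y_n = 1 + Z/n with Z ~ Pareto(alpha = 1), i.e. P(Z > t) = 1/t.
# Y_n -> 1 in probability as n grows, yet E[Z] = inf, so E[Y_n] = inf for every n.
import numpy as np

rng = np.random.default_rng(0)
for n in [10, 100, 1_000, 10_000]:
    u = rng.random(200_000)
    z = 1.0 / (1.0 - u)                     # inverse-CDF sampling of Pareto(1)
    y = 1.0 + z / n
    # The median stabilizes near 1, while the empirical mean stays unstable
    # because the underlying expectation does not exist.
    print(f"n={n:6d}  median={np.median(y):.4f}  empirical mean={y.mean():.2f}")
```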
I.I.D. Samples vs. Samples on a Grid:
Catastrophic overfitting for Lp losses with p ≥ 2 is driven by fluctuations in the spacings between training points, which are absent when the samples lie on a regular grid.
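The contrast is easy to simulate (illustrative only): gaps between sorted i.i.d. uniform samples fluctuate over several orders of magnitude and include unusually small spacings, whereas grid spacings are constant.

```python
# Spacings between n points in [0, 1]: i.i.d. uniform draws vs. a regular grid.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

gaps_iid = np.diff(np.sort(rng.random(n)))      # random: smallest gaps fall far below 1/n
gaps_grid = np.diff(np.linspace(0.0, 1.0, n))   # constant: all equal to 1 / (n - 1)

print(f"i.i.d.: min gap = {gaps_iid.min():.2e}, max gap = {gaps_iid.max():.2e}")
print(f"grid  : min gap = {gaps_grid.min():.2e}, max gap = {gaps_grid.max():.2e}")
```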
Stats
Mallinar et al. [2022] conducted simulations with neural networks and observed "tempered" overfitting.
Quotes
"Noisy interpolation learning entails achieving zero training error with noisy data while allowing for good generalization."
"We provide the first rigorous analysis of the overfitting behavior of regression with minimum norm."