Core Concepts
Understanding the cost of overfitting in kernel ridge regression from an agnostic perspective.
Abstract
This work studies the cost of overfitting in noisy kernel ridge regression (KRR), distinguishing benign, tempered, and catastrophic overfitting. It analyzes how overfitting affects generalization performance and characterizes the conditions that separate the three regimes. The analysis rests on a Gaussian universality ansatz and closed-form risk estimates, and a range of examples illustrates benign, tempered, and catastrophic behavior.
Introduction:
The ability of large, interpolating neural networks to generalize has challenged the classical understanding of overfitting.
Recent work has made progress on understanding overfitting in the more tractable setting of kernel methods.
Problem Formulation:
Kernel ridge regression is cast as a bi-criterion problem, trading off empirical (training) error against RKHS norm.
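As a concrete illustration of that trade-off, here is a minimal KRR sketch, assuming a Gaussian kernel and the common convention of scaling the ridge penalty by the sample size (other conventions drop the factor of n). Small λ drives the training error toward zero (interpolation); large λ prioritizes the norm penalty:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def krr_fit_predict(X, y, X_test, lam=1e-2, gamma=1.0):
    """KRR in closed form: alpha = (K + n*lam*I)^{-1} y, f(x) = k(x, X) @ alpha.
    (Whether lam is scaled by n varies by convention.)"""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return rbf_kernel(X_test, X, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

# Training error is monotone in lam: tiny lam nearly interpolates the
# (noisy) labels, large lam smooths the fit at the cost of training error.
mse = lambda lam: np.mean((krr_fit_predict(X, y, X, lam=lam) - y) ** 2)
mse_small, mse_large = mse(1e-6), mse(1.0)
print(mse_small, mse_large)
```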
Mercer's Decomposition:
Decomposing the kernel into eigenvalues and orthonormal eigenfunctions via Mercer's theorem.
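A minimal numerical illustration of Mercer's decomposition K(x, x') = Σᵢ λᵢ φᵢ(x) φᵢ(x'), using a toy kernel K(x, x') = (1 + xx')/2 whose eigenpairs under x ~ N(0, 1) are known (φ₀ = 1 and φ₁ = x, each with eigenvalue 1/2). The Nyström method recovers them from a sampled Gram matrix:

```python
import numpy as np

# Toy kernel K(x, x') = (1 + x*x')/2 with x ~ N(0, 1):
# Mercer eigenfunctions phi_0 = 1, phi_1 = x; eigenvalues 1/2 and 1/2.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
K = (1.0 + np.outer(x, x)) / 2.0

# Nystrom approximation: eigenvalues of the Gram matrix divided by n
# converge to the Mercer eigenvalues as n grows.
eigs = np.sort(np.linalg.eigvalsh(K))[::-1] / n
print(eigs[:3])  # first two near 0.5, the rest numerically zero (rank-2 kernel)
```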
Closed-Form Risk Estimate:
Prior theoretical work provides closed-form equations that estimate the test risk of KRR from the kernel spectrum and the target's coefficients.
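A sketch of one common form of such an estimate (the "omniscient" risk estimate from the Gaussian-universality literature): an effective regularization κ solves a self-consistent equation in the spectrum, and the test risk is an overfitting factor times noise-plus-bias. The exact equation and normalization vary slightly across papers, so treat the details below as assumptions:

```python
import numpy as np

def effective_reg(spec, n, lam, lo=1e-12, hi=1e12, iters=200):
    """Solve n = lam/kappa + sum_i spec_i/(spec_i + kappa) for kappa
    by bisection in log-space (the LHS is decreasing in kappa)."""
    f = lambda k: lam / k + np.sum(spec / (spec + k)) - n
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return np.sqrt(lo * hi)

def risk_estimate(spec, coefs, n, lam, noise_var):
    """Closed-form test-risk estimate: overfitting factor E0 times
    (noise variance + bias from modes the model has not learned)."""
    kappa = effective_reg(spec, n, lam)
    learn = spec / (spec + kappa)            # per-mode "learnability"
    e0 = n / (n - np.sum(learn ** 2))        # overfitting factor, > 1
    bias = np.sum(((1 - learn) * coefs) ** 2)
    return e0 * (noise_var + bias)

spec = 1.0 / np.arange(1, 2001) ** 2         # power-law spectrum lam_i = i^-2
coefs = 1.0 / np.arange(1, 2001) ** 1.5      # illustrative target coefficients
r_interp = risk_estimate(spec, coefs, n=100, lam=0.0, noise_var=0.25)
r_ridge = risk_estimate(spec, coefs, n=100, lam=1e-2, noise_var=0.25)
print(r_interp, r_ridge)  # interpolation inflates the risk above the noise floor
```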
Cost of Overfitting:
Defining and analyzing the cost of overfitting under different scenarios.
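As we read the paper's agnostic framing, the cost at sample size n is the ratio between the test risk of the interpolating (ridgeless) solution and that of the optimally tuned ridge solution, for any fixed target; a sketch of that definition (the precise normalization is an assumption here):

```latex
% Cost of overfitting at sample size n, for any fixed target f^*:
\mathrm{cost}(n) \;=\;
  \frac{\mathbb{E}\, R\big(\hat f_{\lambda = 0}\big)}
       {\inf_{\lambda \ge 0} \mathbb{E}\, R\big(\hat f_{\lambda}\big)}
  \;\ge\; 1 .
% Benign: cost(n) -> 1;   tempered: cost(n) stays bounded;
% catastrophic: cost(n) -> infinity.
```

The ratio is at least 1 because λ = 0 is itself a candidate in the infimum.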
Benign Overfitting:
Conditions for benign overfitting based on effective rank analysis.
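The effective ranks in question are, as usually defined in the benign-overfitting literature (a sketch under that assumption), tail quantities of the kernel spectrum: a flat tail has large effective ranks, while a fast-decaying (summable) tail keeps them small:

```python
import numpy as np

def effective_ranks(spec, k):
    """Effective ranks of a sorted positive spectrum past index k:
    r_k = tail_sum / lam_{k+1},  R_k = tail_sum^2 / tail_sq_sum.
    (Definitions as in the benign-overfitting literature.)"""
    tail = spec[k:]
    return tail.sum() / tail[0], tail.sum() ** 2 / (tail ** 2).sum()

# Flat (isotropic) tail: both ranks equal the tail dimension.
r_flat, R_flat = effective_ranks(np.ones(100), 0)
print(r_flat, R_flat)  # 100.0 100.0

# Fast power-law decay i^-2: the tail is summable, so both ranks stay O(1),
# and benign overfitting (which needs large ranks relative to n) fails.
r_dec, R_dec = effective_ranks(1.0 / np.arange(1, 101) ** 2, 0)
print(r_dec, R_dec)
```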
Tempered Overfitting:
Analysis and estimation of tempered overfitting based on effective rank ratios.
Catastrophic Overfitting:
Discussion and conditions leading to catastrophic overfitting.
Application: Inner-Product Kernels:
The general results are applied to inner-product kernels in the polynomial regime, with a detailed analysis of the resulting spectra.
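For inner-product kernels on the sphere S^(d-1), the Mercer eigenfunctions are spherical harmonics, with one eigenvalue per degree ℓ repeated with a large multiplicity; a small sketch using the standard dimension formula, which grows like d^ℓ/ℓ! in the polynomial regime n ≍ d^ℓ:

```python
from math import comb

def harmonic_multiplicity(d, ell):
    """Dimension of the space of degree-ell spherical harmonics on S^(d-1):
    N(d, ell) = C(d+ell-1, ell) - C(d+ell-3, ell-2)."""
    if ell == 0:
        return 1
    if ell == 1:
        return d
    return comb(d + ell - 1, ell) - comb(d + ell - 3, ell - 2)

# Sanity check: on S^2 (d = 3) the multiplicity of degree ell is 2*ell + 1.
print([harmonic_multiplicity(3, l) for l in range(5)])  # [1, 3, 5, 7, 9]
# In high dimension N(d, ell) ~ d^ell / ell!, e.g. degree 2 is d(d+1)/2 - 1:
print(harmonic_multiplicity(10, 2))  # 54
```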
For detailed proofs and further insights, refer to the full paper.
Stats
We study the cost of overfitting in noisy kernel ridge regression (KRR).
The test error can approach Bayes optimality even when models interpolate noisy training data.
Effective ranks play a crucial role in characterizing the cost of overfitting across different settings.
Quotes
"The ability of large neural networks to generalize has significantly challenged our understanding."
"We take an 'agnostic' view by considering costs as a function of sample size for any target function."
"Our analysis provides a refined characterization of benign, tempered, and catastrophic overfitting."