The paper analyzes the Empirical Risk Minimization (ERM) algorithm for PAC learning in the agnostic setting, where the target function may not be in the hypothesis class. The key insights are:
The authors derive an improved distribution-dependent error exponent for the PAC error probability under some stability assumptions on the hypothesis class and the target function.
They show that, under these assumptions and for sufficiently small deviations from the optimal risk, the error exponent for agnostic learning can match the error exponent for realizable learning (where the target function lies in the hypothesis class).
The error exponent analysis decomposes the PAC error probability into two terms: the error incurred in realizable learning and the additional error incurred in the agnostic setting.
The authors explicitly construct the distributions needed to compute the improved agnostic error exponent, which is shown to be the KL divergence between the true distribution and the set of distributions under which ERM outputs a suboptimal hypothesis (i.e., the minimum divergence to that set); a formal sketch of this characterization is given after this list.
The improved error exponent is better than the classical agnostic bound, scaling linearly in the deviation from the optimal risk rather than quadratically; a numerical illustration follows below.
The results open up new research directions, such as identifying explicit conditions under which practical hypothesis classes, like neural networks, satisfy the assumptions.
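To make the decomposition and the KL-divergence characterization above concrete, one standard way to formalize the error exponent is sketched below; the notation (P for the data distribution, \hat{h}_n for the ERM output on n samples, \mathcal{H} for the hypothesis class, \varepsilon for the excess-risk tolerance) is our own shorthand and may differ from the paper's.

```latex
% Error exponent of the PAC error probability (notation assumed, not the paper's).
\[
  E(\varepsilon) \;=\; \lim_{n\to\infty} -\frac{1}{n}
  \log \Pr\Big( L_P(\hat{h}_n) > \min_{h\in\mathcal{H}} L_P(h) + \varepsilon \Big)
\]
% Sanov-style characterization: the exponent is the smallest KL divergence from P
% to the set of (empirical) distributions under which ERM returns an
% \varepsilon-suboptimal hypothesis.
\[
  E(\varepsilon) \;=\; \min_{Q \in \mathcal{A}_\varepsilon} D(Q \,\|\, P),
  \qquad
  \mathcal{A}_\varepsilon
  = \big\{\, Q : L_P\big(\mathrm{ERM}(Q)\big) > \min_{h\in\mathcal{H}} L_P(h) + \varepsilon \,\big\}.
\]
```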
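The following sketch illustrates, on a made-up three-outcome domain with two hypotheses, how such a Sanov-style exponent can be computed by a grid search over the probability simplex, and how it compares with a quadratic, Hoeffding-style reference exponent. All distributions, loss vectors, and constants are illustrative assumptions, not values from the paper.

```python
import math

# Illustrative sketch only: the domain, distribution P, and loss vectors below
# are made-up values, not taken from the paper.
P = [0.5, 0.3, 0.2]            # true distribution over a 3-point domain
loss_star = [0.0, 0.0, 1.0]    # 0-1 losses of the best hypothesis (risk 0.2 under P)
loss_bad = [1.0, 0.0, 0.0]     # 0-1 losses of a suboptimal hypothesis (risk 0.5 under P)

def risk(loss, Q):
    """Expected loss of a hypothesis under distribution Q."""
    return sum(q * l for q, l in zip(Q, loss))

def kl(Q, P):
    """KL divergence D(Q || P) in nats, with the 0 * log 0 = 0 convention."""
    return sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)

# Sanov-style exponent: minimum D(Q || P) over the empirical types Q under which
# ERM (with pessimistic tie-breaking) prefers the suboptimal hypothesis.
step = 0.001
n = int(round(1 / step))
exponent = float("inf")
for i in range(n + 1):
    for j in range(n + 1 - i):
        Q = [i * step, j * step, 1.0 - (i + j) * step]
        if risk(loss_bad, Q) <= risk(loss_star, Q):
            exponent = min(exponent, kl(Q, P))

eps = risk(loss_bad, P) - risk(loss_star, P)   # excess risk of the bad hypothesis
hoeffding = 2 * (eps / 2) ** 2                 # rough exponent from Hoeffding + union bound
print(f"excess risk eps           = {eps:.3f}")
print(f"Sanov-style exponent      = {exponent:.4f}")   # larger exponent = faster decay
print(f"Hoeffding-style exponent  = {hoeffding:.4f}")
```

On this toy instance the KL-based exponent comes out larger than the quadratic reference, mirroring the linear-versus-quadratic comparison above (a larger exponent means faster decay of the error probability).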
Source: Adi Hendel, M..., arxiv.org, 05-03-2024, https://arxiv.org/pdf/2405.00792.pdf