The paper establishes the strict positive definiteness of the Neural Tangent Kernel (NTK) for feedforward neural networks of any depth L ≥ 2 whose activation function is continuous, almost everywhere differentiable, and non-polynomial.
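For reference, the finite-width NTK of a network $f(\cdot\,;\theta)$ is the Gram kernel of its parameter gradients (this is the standard definition; the notation is assumed here rather than taken from the paper):

$$\Theta(x, x') = \big\langle \nabla_\theta f(x;\theta),\ \nabla_\theta f(x';\theta) \big\rangle,$$

which in the infinite-width limit converges to a deterministic kernel that remains fixed throughout training.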
The key insights are:
For networks with biases (β ≠ 0), the NTK is shown to be strictly positive definite for any non-polynomial activation function and any depth L ≥ 2.
For networks without biases (β = 0), the NTK is shown to be strictly positive definite provided the training inputs are pairwise non-proportional, again for any non-polynomial activation function and any depth L ≥ 2 (the precise sense of strict positive definiteness is sketched below).
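In both statements, "strictly positive definite" carries its usual kernel-theoretic meaning; a minimal sketch, with notation assumed: for pairwise distinct inputs $x_1, \dots, x_n$ and any nonzero coefficient vector $c \in \mathbb{R}^n$,

$$\sum_{i,j=1}^{n} c_i c_j\, \Theta(x_i, x_j) > 0,$$

equivalently, the NTK Gram matrix on any finite training set is nonsingular. In the bias-free case the distinctness assumption is strengthened to pairwise non-proportionality.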
The proofs rely on a novel characterization of polynomial functions: if a continuous function satisfies a certain vanishing condition on its finite differences, then it must be a polynomial.
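A classical relative of this criterion states that a continuous function whose (k+1)-st order finite differences $\Delta_h^{k+1} f$ vanish for every step size $h$ must be a polynomial of degree at most $k$. The sketch below illustrates that idea numerically; the example functions, point, and step size are illustrative choices, not taken from the paper:

```python
from math import comb
import numpy as np

def forward_diff(f, x, h, order):
    # k-th order forward difference: sum_{j=0}^{k} (-1)^(k-j) C(k,j) f(x + j*h)
    return sum((-1) ** (order - j) * comb(order, j) * f(x + j * h)
               for j in range(order + 1))

# Degree-3 polynomial: every 4th-order difference vanishes (up to rounding).
poly = lambda x: 2 * x**3 - x + 5
print(forward_diff(poly, 0.7, 0.3, 4))     # ~1e-14

# Non-polynomial activation such as tanh: no finite order annihilates it.
print(forward_diff(np.tanh, 0.7, 0.3, 4))  # clearly nonzero
```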
This strict positive definiteness result is significant because it bears directly on the memorization capacity and generalization behavior of wide neural networks trained via gradient descent.
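Concretely, strict positive definiteness means the smallest eigenvalue of the NTK Gram matrix on the training set is positive, and this eigenvalue is what controls convergence of gradient descent in the kernel regime. Below is a minimal empirical check for a two-layer network; the tanh activation, width m, bias scale β, and 1/√m NTK-style scaling are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, beta = 3, 512, 0.5        # input dim, width, bias scale (assumed values)
W = rng.normal(size=(m, d))     # first-layer weights
b = rng.normal(size=m)          # first-layer biases
a = rng.normal(size=m)          # output weights

def param_grad(x):
    """Gradient of f(x) = a . tanh(W x + beta*b) / sqrt(m) w.r.t. (W, b, a)."""
    z = W @ x + beta * b
    s = np.tanh(z)
    ds = 1.0 - s**2                                   # tanh'
    gW = (a * ds)[:, None] * x[None, :] / np.sqrt(m)  # d f / d W
    gb = a * ds * beta / np.sqrt(m)                   # d f / d b
    ga = s / np.sqrt(m)                               # d f / d a
    return np.concatenate([gW.ravel(), gb, ga])

# Pairwise distinct (generic) training inputs.
X = rng.normal(size=(8, d))
G = np.array([param_grad(x) for x in X])
K = G @ G.T                     # empirical NTK Gram matrix, shape (8, 8)
print("lambda_min =", np.linalg.eigvalsh(K).min())    # positive in practice
```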