The paper establishes a connection between Tensor Network (TN)-constrained kernel machines and Gaussian Processes (GPs). It proves that the outputs of Canonical Polyadic Decomposition (CPD)- and Tensor Train (TT)-constrained kernel machines converge to a fully characterized GP when i.i.d. priors are placed over their parameters.
The key insights are:
CPD- and TT-constrained kernel machines converge to a GP in the limit of large TN rank. The limiting GP is fully characterized by the chosen basis functions and the prior covariances (illustrated by the first sketch below).
For the same number of model parameters, TT-constrained models converge to the GP polynomially faster than CPD models as the input dimension grows, owing to the hierarchical structure of the TT format.
The priors specified in the theorems provide a sensible initialization strategy for gradient-based optimization of TN-constrained kernel machines, mitigating the vanishing/exploding-gradient issue (see the second sketch below).
In the finite-rank regime, both CPD and TT models can learn nonlinear combinations of the provided basis functions, potentially capturing latent patterns in the data that the fixed GP representation cannot.
The authors empirically demonstrate both the convergence to the GP and the GP-like behavior of these models through numerical experiments on regression tasks.
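
A minimal Monte Carlo sketch (not the authors' code) of the first insight: sampling the parameters of a CPD-constrained kernel machine from an i.i.d. Gaussian prior and evaluating its output at a fixed input yields a distribution that becomes increasingly Gaussian as the rank R grows. The polynomial basis `phi` and the prior variance R^(-1/D) are illustrative assumptions chosen so the output variance stays O(1); the exact priors are those specified in the paper's theorems.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 4, 5                       # input dimension, basis functions per dimension
x = rng.uniform(size=D)           # a fixed test input

def phi(xd):
    """Illustrative per-dimension basis: polynomial features 1, xd, ..., xd^(M-1)."""
    return xd ** np.arange(M)

features = [phi(x[d]) for d in range(D)]

def cpd_prior_samples(R, n_samples=10000):
    """Prior draws of f(x) = sum_r prod_d <w_{d,r}, phi_d(x_d)> with i.i.d. factor entries."""
    outs = np.empty(n_samples)
    for s in range(n_samples):
        terms = np.ones(R)
        for d in range(D):
            # factor matrix for dimension d: entries i.i.d. N(0, R^(-1/D))
            Wd = rng.normal(scale=R ** (-0.5 / D), size=(M, R))
            terms *= Wd.T @ features[d]          # R inner products for dimension d
        outs[s] = terms.sum()
    return outs

for R in (1, 4, 16, 64):
    f = cpd_prior_samples(R)
    # Excess kurtosis -> 0 as the prior output distribution approaches a Gaussian.
    excess_kurtosis = ((f - f.mean()) ** 4).mean() / f.var() ** 2 - 3.0
    print(f"R={R:3d}  var={f.var():.3f}  excess kurtosis={excess_kurtosis:+.2f}")
```

The printed excess kurtosis shrinks toward zero as R increases, which is one simple numerical proxy for the convergence of the prior output distribution to a Gaussian.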
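The second sketch illustrates the initialization insight for a TT-constrained kernel machine: scaling the variance of the TT-core entries with the rank keeps the forward output scale stable as the input dimension D grows, whereas naive unit-variance initialization explodes exponentially in D, which is the mechanism behind the vanishing/exploding-gradient issue noted above. The unit-norm cosine basis and the N(0, 1/R) core prior are illustrative choices, not the paper's exact prescription; the paper's theorems give the exact prior covariances.

```python
import numpy as np

rng = np.random.default_rng(1)
M, R = 5, 8                        # basis functions per dimension, TT rank

def phi(xd):
    """Illustrative basis, normalized so the variance bookkeeping stays simple."""
    f = np.cos(np.pi * np.arange(M) * xd)
    return f / np.linalg.norm(f)

def tt_output(x, scale):
    """f(x) as a left-to-right contraction of random TT cores with i.i.d. N(0, scale^2) entries."""
    D = len(x)
    v = rng.normal(scale=scale, size=(M, R)).T @ phi(x[0])        # first core: (M, R)
    for d in range(1, D - 1):
        Gd = rng.normal(scale=scale, size=(R, M, R))              # interior cores: (R, M, R)
        v = v @ np.einsum('imj,m->ij', Gd, phi(x[d]))
    GD = rng.normal(scale=scale, size=(R, M))                     # last core: (R, M)
    return float(v @ (GD @ phi(x[-1])))

for D in (2, 4, 8, 16):
    x = rng.uniform(size=D)
    naive  = np.std([tt_output(x, 1.0) for _ in range(500)])        # unit-variance entries
    scaled = np.std([tt_output(x, R ** -0.5) for _ in range(500)])  # prior-scaled entries, N(0, 1/R)
    print(f"D={D:2d}  output std (naive) = {naive:10.3e}   output std (prior-scaled) = {scaled:.3f}")
```

With unit-variance cores the output standard deviation grows roughly like R^((D-1)/2), while the rank-scaled prior keeps it of order one for every D, so gradient-based training starts from a well-conditioned regime.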
Source: Frederiek We..., arXiv, 2024-03-29, https://arxiv.org/pdf/2403.19500.pdf