Core concepts
The authors propose an inexpensive stochastic iterative method for optimization on the generalized Stiefel manifold that avoids expensive eigenvalue decompositions and matrix inversions. The method has a lower per-iteration cost, requires only matrix multiplications, and matches the convergence rates of its Riemannian counterparts.
Summary
The content discusses an optimization problem on the random generalized Stiefel manifold, which arises in many applications such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP).
The authors propose a stochastic iterative method, called the "landing" method, that solves the optimization problem without enforcing the constraint exactly at every iteration. Instead, the method produces iterates that converge to a critical point on the generalized Stiefel manifold, which is defined in expectation.
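A single landing-style step can be sketched as follows. The iterate moves along a combination of a descent direction and the gradient of the constraint penalty N(X) = ||XᵀBX − I||²_F / 4, so infeasible iterates are pulled back toward the manifold using only matrix products. The skew-symmetric descent component below is an illustrative choice in the spirit of landing methods, not necessarily the authors' exact formula, and the step sizes `eta` and `lam` are placeholder values.

```python
import numpy as np

def landing_update(X, grad_f, B, eta=0.01, lam=1.0):
    """One landing-style step for min f(X) s.t. X^T B X = I (a sketch).

    Combines an illustrative skew-symmetric descent component with a
    gradient step on the penalty N(X) = ||X^T B X - I||_F^2 / 4, whose
    gradient is B X (X^T B X - I). Only matrix multiplications are used:
    no eigendecomposition and no matrix inversion.
    """
    p = X.shape[1]
    BX = B @ X
    G = grad_f(X)
    # Illustrative descent component: skew(G X^T) applied to B X
    psi = (G @ X.T - X @ G.T) @ BX
    # Gradient of the constraint penalty N(X)
    grad_N = BX @ (X.T @ BX - np.eye(p))
    return X - eta * (psi + lam * grad_N)
```

With a zero objective gradient, the step reduces to gradient descent on the penalty alone, so the constraint violation ||XᵀBX − I|| shrinks at each iteration for a small enough step size.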
The key highlights of the proposed method are:
- It has lower per-iteration cost compared to Riemannian methods, as it only requires matrix multiplications and does not need expensive eigenvalue decompositions or matrix inversions.
- It can handle stochastic estimates of the feasible set (the matrix B), while Riemannian approaches and infeasible optimization techniques are not well-suited for this setting.
- It has the same convergence rates as its Riemannian counterparts, both in the deterministic and stochastic cases.
- It is particularly efficient when the matrices are well-conditioned and the variance of the samples is small.
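To illustrate the stochastic-constraint setting in the second bullet: in applications like CCA, the constraint matrix B is a covariance E[zzᵀ], so a minibatch yields an unbiased estimate that an update rule can consume directly. The sketch below shows such an estimate; it costs only one matrix multiplication, whereas a Riemannian retraction would need a factorization of B at each step.

```python
import numpy as np

def minibatch_B(Z):
    """Unbiased minibatch estimate B_hat = Z^T Z / m of a covariance
    E[z z^T], where Z stacks m samples row-wise (m x n). Symmetric and
    positive semidefinite by construction; usable in place of B."""
    m = Z.shape[0]
    return Z.T @ Z / m

rng = np.random.default_rng(1)
Z = rng.standard_normal((256, 4))   # 256 samples of a 4-dim variable
B_hat = minibatch_B(Z)
```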
The authors provide a detailed theoretical analysis of the proposed landing method, proving its convergence to a critical point under suitable assumptions. They also demonstrate the effectiveness of the method on various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and GEVP.
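For reference, the GEVP mentioned above can be solved directly by whitening with a Cholesky factor of B and calling a dense symmetric eigensolver; the resulting X is B-orthonormal, i.e. it satisfies the generalized Stiefel constraint XᵀBX = I. This baseline incurs exactly the factorization and inversion costs that the landing method is designed to avoid.

```python
import numpy as np

# Baseline (expensive) solution of the GEVP A x = w B x via whitening:
# with B = L L^T (Cholesky), the standard eigenproblem for L^{-1} A L^{-T}
# gives eigenvectors U, and X = L^{-T} U satisfies X^T B X = I.
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M + M.T                        # symmetric
N = rng.standard_normal((6, 6))
B = N @ N.T + 6 * np.eye(6)        # symmetric positive definite
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
w, U = np.linalg.eigh(Linv @ A @ Linv.T)
X = Linv.T @ U                     # B-orthonormal generalized eigenvectors
```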
Stats
The condition number of the matrix B is denoted as κ.
The variance of the stochastic gradient ∇f_ξ(X) is denoted as σ²_G.
The variance of the stochastic matrix B_ζ is denoted as σ²_B.
The eigenvalues of the matrices A and B are denoted as αi and βi, respectively.
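Tying the notation together: since B is symmetric positive definite, its condition number κ is the ratio of its extreme eigenvalues, β_max / β_min, which is the quantity behind the "well-conditioned" efficiency remark above. A quick check:

```python
import numpy as np

rng = np.random.default_rng(3)
N = rng.standard_normal((5, 5))
B = N @ N.T + np.eye(5)            # symmetric positive definite
beta = np.linalg.eigvalsh(B)       # eigenvalues beta_i, sorted ascending
kappa = beta[-1] / beta[0]         # condition number of B
```

For an SPD matrix this agrees with the 2-norm condition number, since the singular values coincide with the eigenvalues.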