Key Concepts
This work proposes the first learning-based algorithms that optimize both the locations and values of the non-zero entries in sketching matrices, leading to significant improvements in accuracy and efficiency over classical sketching techniques and previous learning-based approaches.
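For concreteness, a classical CountSketch matrix places a single non-zero entry of value ±1 in each column, at a uniformly random row; the learned approaches in this work replace these random choices with optimized positions and values. A minimal NumPy illustration of the classical baseline (function name and parameters are ours):

```python
import numpy as np

def countsketch_matrix(m, n, rng=None):
    """Classical CountSketch: one non-zero (+1 or -1) per column,
    with its row position chosen uniformly at random."""
    rng = np.random.default_rng(rng)
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)        # random positions
    signs = rng.choice([-1.0, 1.0], size=n)  # random values
    S[rows, np.arange(n)] = signs
    return S

S = countsketch_matrix(8, 100, rng=0)
```

The learned algorithms keep this one-non-zero-per-column sparsity pattern but choose `rows` and `signs` to suit the training data instead of sampling them.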
Summary
The paper presents novel algorithms for learning sketching matrices that outperform classical sketching techniques and previous learning-based approaches. The key contributions are:
- Greedy Search Algorithm:
  - Iteratively constructs a sketching matrix by greedily optimizing the positions of its non-zero entries.
  - Achieves good accuracy, at the cost of slower training.
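The greedy idea can be sketched as follows: for each column of S in turn, try every row position for that column's non-zero entry and keep the one that minimizes a downstream loss. The loss below (low-rank approximation error via projection onto the sketch's row space) is a stand-in; the paper's exact objective and candidate ordering may differ.

```python
import numpy as np

def low_rank_loss(S, A, k):
    """Proxy loss: error of the rank-k approximation of A recovered
    from the sketch S @ A (project A onto the row space of S @ A,
    then truncate to rank k)."""
    Q, _ = np.linalg.qr((S @ A).T)              # basis of row space of S @ A
    P = A @ Q @ Q.T
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    Pk = (U[:, :k] * s[:k]) @ Vt[:k]
    return np.linalg.norm(A - Pk)

def greedy_positions(A, m, k):
    """Place one non-zero per column of S, greedily choosing each
    column's row to minimize the proxy loss."""
    n = A.shape[0]
    S = np.zeros((m, n))
    for j in range(n):                          # one column at a time
        losses = []
        for i in range(m):                      # try every row position
            S[i, j] = 1.0
            losses.append(low_rank_loss(S, A, k))
            S[i, j] = 0.0
        S[int(np.argmin(losses)), j] = 1.0
    return S

A = np.random.default_rng(0).normal(size=(10, 4))
S_greedy = greedy_positions(A, m=3, k=2)
```

The nested loop makes the cost of training clear: one loss evaluation per (column, row) pair, which is why the greedy variant is accurate but slow.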
- Inner Product Algorithm for Low-Rank Approximation:
  - Samples rows based on ridge leverage scores and assigns the remaining rows to hash buckets.
  - Optimizes both the positions and the values of the non-zero entries.
  - Provably achieves better worst-case guarantees than classical sketching.
  - Runs much faster than previous methods while maintaining similar accuracy.
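The sampling-plus-hashing structure can be illustrated as below: rows of A with the largest ridge leverage scores each get a dedicated sketch row, and the remaining rows are hashed into the leftover buckets with random signs. This is a simplified sketch under our own assumptions (half the budget for sampled rows, ±1 values); the paper additionally optimizes the non-zero values.

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    """tau_i = a_i^T (A^T A + lam * I)^{-1} a_i for each row a_i of A."""
    G = A.T @ A + lam * np.eye(A.shape[1])
    return np.einsum('ij,ji->i', A @ np.linalg.inv(G), A.T)

def leverage_plus_hash_sketch(A, m, lam, rng=None):
    """Dedicate sketch rows to the heaviest rows of A by ridge leverage
    score; hash all other rows into the remaining buckets."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    tau = ridge_leverage_scores(A, lam)
    n_heavy = m // 2                            # assumed budget split
    heavy = np.argsort(tau)[-n_heavy:]
    light = np.setdiff1d(np.arange(n), heavy)
    S = np.zeros((m, n))
    S[np.arange(n_heavy), heavy] = 1.0          # sampled heavy rows
    buckets = rng.integers(n_heavy, m, size=light.size)
    S[buckets, light] = rng.choice([-1.0, 1.0], size=light.size)
    return S

A = np.random.default_rng(1).normal(size=(40, 5))
S_lh = leverage_plus_hash_sketch(A, m=6, lam=0.1, rng=1)
```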
- Optimizing the Subspace Embedding Property for Second-Order Optimization:
  - Observes that the subspace embedding property is the key requirement for sketching in second-order optimization.
  - Optimizes the sketch matrix to use fewer rows by focusing on rows with large leverage scores.
  - Provably achieves a quadratic improvement in the number of rows and an exponential improvement in the failure probability compared to classical sketching.
  - In practice, learns the indices of the heavy rows directly, avoiding the need to compute leverage scores.
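The subspace embedding property says that (1−ε)‖Ax‖² ≤ ‖SAx‖² ≤ (1+ε)‖Ax‖² for all x, which is equivalent to ‖UᵀSᵀSU − I‖₂ ≤ ε for an orthonormal basis U of A's column space. The sketch below measures this ε and checks it on the paper's own motivating example (A with its first d rows equal to the identity and the rest zero, sketched by keeping exactly those d heavy rows); the helper names are ours.

```python
import numpy as np

def leverage_scores(A):
    """l_i = ||U_i||^2, where U is an orthonormal basis of col(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return (U ** 2).sum(axis=1)

def embedding_error(S, A):
    """Smallest eps with (1-eps)||Ax||^2 <= ||SAx||^2 <= (1+eps)||Ax||^2:
    the spectral norm of U^T S^T S U - I."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    M = (S @ U).T @ (S @ U) - np.eye(U.shape[1])
    return np.linalg.norm(M, 2)

# The paper's example: first d rows of A are the identity, the rest zero.
n, d = 50, 5
A = np.zeros((n, d))
A[:d, :d] = np.eye(d)
S = np.zeros((d, n))
S[:, :d] = np.eye(d)            # keep exactly the d heavy rows
eps = embedding_error(S, A)     # 0: a perfect subspace embedding with d rows
```

Here the d heavy rows have leverage score 1 and all others have score 0, so a sketch with only d rows suffices, whereas a random sparse sketch needs Ω(d²) rows on this input.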
The paper also provides extensive empirical evaluations on real-world datasets, demonstrating significant improvements in accuracy and efficiency over classical and previous learning-based sketching techniques.
Statistics
The paper does not state standalone numerical figures in the text to support its key claims; instead, the results are reported as tables and plots comparing the performance of the different sketching algorithms.
Quotes
"Clearly this is sub-optimal. Indeed, suppose the input matrix A is an n × d matrix with first d rows equal to the d × d identity matrix, and remaining rows equal to 0. A random sketching matrix S with a single non-zero per column is known to require m = Ω(d²) rows in order for S · A to preserve the rank of A [NN14]; this follows by a birthday paradox argument. On the other hand, it is clear that if S is a d × n matrix with first d rows equal to the identity matrix, then ∥S · Ax∥2 = ∥Ax∥2 for all vectors x, and so S preserves not only the rank of A but all important spectral properties."
"Lemma 6.2 implies that if the loss function over Atrain is small and the distribution of Atest is similar to Atrain, it is reasonable to expect that S is a good subspace embedding of Atest. Here we use the Frobenius norm rather than operator norm in the loss function because it will make the optimization problem easier to solve, and our empirical results also show that the performance of the Frobenius norm is better than that of the operator norm."
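The quoted point about the Frobenius norm making optimization easier can be illustrated with a small gradient-descent sketch. We assume the loss has the form ‖UᵀSᵀSU − I‖²_F (U an orthonormal basis of a training matrix's column space) and train only the values of S, keeping the learned positions fixed; the paper's exact loss, normalization, and training set may differ.

```python
import numpy as np

def fro_loss(S, A):
    """Assumed Frobenius-norm subspace-embedding loss: ||U^T S^T S U - I||_F^2."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    M = (S @ U).T @ (S @ U) - np.eye(U.shape[1])
    return np.linalg.norm(M) ** 2

def train_sketch_values(A, S0, steps=300, lr=0.02):
    """Gradient descent on the values of S with the sparsity pattern of S0
    held fixed. The gradient of the loss above is 4 * S U M U^T."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    mask = (S0 != 0).astype(float)
    S = S0.copy()
    for _ in range(steps):
        M = (S @ U).T @ (S @ U) - np.eye(U.shape[1])
        S -= lr * (4 * S @ U @ M @ U.T) * mask  # project onto the pattern
    return S

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 3))
m, n = 6, 30
S0 = np.zeros((m, n))                           # random CountSketch start
S0[rng.integers(0, m, size=n), np.arange(n)] = rng.choice([-1.0, 1.0], size=n)
S1 = train_sketch_values(A, S0)
```

Being smooth and cheap to differentiate, this Frobenius objective is directly amenable to gradient methods, which is the practical advantage over the operator norm noted in the quote.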