Core Concepts
This paper proposes a randomized algorithm to efficiently compute the generalized singular value decomposition (GSVD) of two data matrices, with applications in comparative analysis of genome-scale expression data sets.
Abstract
The paper presents a randomized algorithm for computing the GSVD of two data matrices, G1 and G2. The GSVD is a valuable tool for comparative analysis of genome-scale expression data sets. The key highlights are:
The algorithm first uses a randomized method to approximately extract the column bases of G1 and G2, reducing the overall computational cost.
It then calculates the generalized singular values (GSVs) of the compressed matrix pair, which are used to quantify the similarities and dissimilarities between the two data sets.
The accuracy of the basis extraction and of the comparative analysis quantities (angular distances, generalized fractions of eigenexpression, and generalized normalized Shannon entropy) is rigorously analyzed.
The proposed algorithm is applied to both synthetic data sets and practical genome-scale expression data sets, showing significant speedups compared to other GSVD algorithms while maintaining sufficient accuracy for comparative analysis tasks.
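The two-stage idea in the highlights above (compress each matrix onto a randomized column basis, then compute GSVs of the small pair) can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the function names `randomized_basis` and `gsv_pairs`, the Gaussian range finder, the oversampling value, and the synthetic test data are all assumptions made for illustration. The pairs are computed by stacking the two matrices and splitting an orthonormal basis of the stack, and the angular distance is assumed to follow the usual definition arctan(alpha/beta) - pi/4.

```python
import numpy as np

def randomized_basis(G, k, oversample=10, seed=None):
    """Orthonormal basis approximating the column space of G (Gaussian range finder)."""
    rng = np.random.default_rng(seed)
    cols = min(k + oversample, min(G.shape))         # never sample more than G can span
    omega = rng.standard_normal((G.shape[1], cols))  # random test matrix
    Q, _ = np.linalg.qr(G @ omega)                   # reduced QR of the sampled columns
    return Q

def gsv_pairs(A, B, tol=1e-10):
    """Cosine/sine pairs (alpha, beta), alpha**2 + beta**2 = 1, for A and B
    with the same number of columns; the GSVs are the ratios alpha / beta."""
    m1 = A.shape[0]
    P, s, _ = np.linalg.svd(np.vstack([A, B]), full_matrices=False)
    r = int(np.sum(s > tol * s[0]))                  # numerical rank of the stacked pair
    alpha = np.clip(np.linalg.svd(P[:m1, :r], compute_uv=False), 0.0, 1.0)
    beta = np.sqrt(1.0 - alpha**2)
    return alpha, beta

# Synthetic pair with known values: G1 = U1 diag(c) X, G2 = U2 diag(s) X.
rng = np.random.default_rng(0)
r, n = 8, 60
c = np.linspace(0.95, 0.2, r)
s = np.sqrt(1.0 - c**2)
U1, _ = np.linalg.qr(rng.standard_normal((500, r)))
U2, _ = np.linalg.qr(rng.standard_normal((400, r)))
X = rng.standard_normal((r, n))
G1 = (U1 * c) @ X
G2 = (U2 * s) @ X

# Stage 1: randomized column bases. Stage 2: GSVs of the compressed pair.
Q1 = randomized_basis(G1, k=r, seed=1)
Q2 = randomized_basis(G2, k=r, seed=2)
alpha, beta = gsv_pairs(Q1.T @ G1, Q2.T @ G2)

# Angular distance in [-pi/4, pi/4]; positive means more weight in the first data set.
theta = np.arctan2(alpha, beta) - np.pi / 4
print("max |alpha - c|:", np.abs(alpha - c).max())
```

Because the synthetic matrices here have exact rank 8, the compressed pair reproduces the planted values to roughly machine precision. With real expression data the spectrum does not cut off sharply, so the choice of `k` and `oversample` would trade accuracy against cost.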
Stats
The paper reports the runtime and absolute errors of the generalized singular values for various synthetic data set sizes.
Quotes
"The randomized algorithm for basis extraction aims to find an orthonormal basis sets for U and V in eq. (1.1) with non-zero GSVs αi and βj, respectively."
"The approximation accuracy of the basis extraction is analyzed in theorem 3.5 and the accuracy mainly depends on the decay property of the GSVs."
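The dependence on decay can be seen in a small experiment. The sketch below is only an analogy: it uses the ordinary singular values of a single matrix and a generic Gaussian range finder, not the GSVs or the bound of theorem 3.5, and the helper name `range_finder_error` and the two test spectra are assumptions chosen for illustration.

```python
import numpy as np

def range_finder_error(sigma, k, oversample=5, seed=0):
    """Spectral-norm error ||A - Q Q^T A|| for a matrix A with singular values sigma."""
    rng = np.random.default_rng(seed)
    n = sigma.size
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal factors
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = (U * sigma) @ V.T                              # A = U diag(sigma) V^T
    omega = rng.standard_normal((n, k + oversample))   # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ omega)                     # approximate column basis
    return np.linalg.norm(A - Q @ (Q.T @ A), 2)

j = np.arange(1, 101)
err_fast = range_finder_error(0.5 ** j, k=10)   # geometric decay
err_slow = range_finder_error(1.0 / j, k=10)    # slow algebraic decay
print("fast decay error:", err_fast)
print("slow decay error:", err_slow)
```

With the same basis size, the rapidly decaying spectrum leaves a much smaller residual than the slowly decaying one, since the residual can never fall below the first singular value not captured by the basis.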