AcceleratedLiNGAM scales causal discovery by parallelizing LiNGAM analysis on GPUs. It addresses the speed limitations of traditional methods while keeping statistical guarantees on large-scale datasets. The paper covers implementation details, experimental results on gene expression data and stock indices, and the potential for further optimization.
Existing causal discovery methods based on combinatorial optimization or search are slow, which hinders their application to large datasets. Recent approaches address this limitation by formulating causal discovery as structure learning with continuous optimization, but they lack statistical guarantees. AcceleratedLiNGAM instead parallelizes existing methods efficiently, achieving up to a 32-fold speed-up over sequential implementations.
DirectLiNGAM recursively performs regression and conditional independence tests between pairs of variables to establish a causal ordering in linear non-Gaussian acyclic models. The complexity of DirectLiNGAM is O(d^3), where d is the number of variables. Parallelization allows the causal ordering sub-procedure in DirectLiNGAM to be computed efficiently using GPU kernels.
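The causal ordering sub-procedure described above can be sketched on the CPU with NumPy. This is a minimal illustration, not the paper's GPU implementation. It assumes a pairwise independence score built from a maximum-entropy approximation of differential entropy, a common choice in DirectLiNGAM implementations. The function names (`entropy`, `residual`, `root_score`, `causal_order`, `make_chain`) and the synthetic three-variable chain are hypothetical and chosen only for this example.

```python
import numpy as np


def entropy(u):
    # Maximum-entropy approximation of the differential entropy of a
    # standardized variable u (constants are an assumption of this sketch).
    k1, k2, gamma = 79.047, 7.4129, 0.37457
    return ((1 + np.log(2 * np.pi)) / 2
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-(u ** 2) / 2)) ** 2)


def standardize(x):
    return (x - x.mean()) / x.std()


def residual(xi, xj):
    # Residual of the least-squares regression of xi on xj.
    slope = np.cov(xi, xj, bias=True)[0, 1] / np.var(xj)
    return xi - slope * xj


def root_score(X, i, candidates):
    # Score how plausible variable i is as the next root: 0 is best,
    # more negative means i looks dependent on its regression residuals.
    score = 0.0
    xi = standardize(X[:, i])
    for j in candidates:
        if j == i:
            continue
        xj = standardize(X[:, j])
        ri_j = standardize(residual(xi, xj))
        rj_i = standardize(residual(xj, xi))
        diff = (entropy(xj) + entropy(ri_j)) - (entropy(xi) + entropy(rj_i))
        score -= min(0.0, diff) ** 2
    return score


def causal_order(X):
    # Repeatedly pick the most exogenous variable, then regress it out
    # of every remaining variable.
    X = np.array(X, dtype=float)
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        # Each (i, j) pair below is independent of the others, which is
        # the kind of work that lends itself to parallel execution.
        scores = [root_score(X, i, remaining) for i in remaining]
        root = remaining[int(np.argmax(scores))]
        order.append(root)
        remaining.remove(root)
        for i in remaining:
            X[:, i] = residual(X[:, i], X[:, root])
    return order


def make_chain(n=10000, seed=0):
    # Hypothetical test data: x0 -> x1 -> x2 with uniform (non-Gaussian) noise.
    rng = np.random.default_rng(seed)
    e = rng.uniform(-1.0, 1.0, size=(n, 3))
    x0 = e[:, 0]
    x1 = 0.8 * x0 + e[:, 1]
    x2 = 0.7 * x1 + e[:, 2]
    return np.column_stack([x0, x1, x2])
```

Calling `causal_order(make_chain())` returns a list of column indices ordered from cause to effect. The pairwise scores inside `root_score` do not depend on one another, so that step is a natural candidate for a GPU kernel, while the outer loop over roots stays sequential.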
The paper applies LiNGAM analysis to gene expression data with genetic interventions, using DirectLiNGAM, and to U.S. stock data, using VarLiNGAM. It compares AcceleratedLiNGAM with continuous optimization-based structure learning methods such as DCD-FG on Perturb-CITE-seq datasets.
Source: arxiv.org