The paper presents SpComm3D, a framework for enabling sparsity-aware communication and minimal memory footprint in distributed-memory sparse kernels. Existing 3D algorithms for sparse kernels like SDDMM and SpMM suffer from limited scalability due to reliance on bulk sparsity-agnostic communication, which leads to unnecessary bandwidth and memory consumption.
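For readers unfamiliar with the two kernels, the following is a minimal single-process sketch of their standard definitions, assuming a coordinate-list (COO) sparse matrix and dense matrices stored as lists of rows. The function names and data layout are illustrative assumptions, not SpComm3D's API. Note that only the rows of the dense matrices indexed by nonzeros of `S` are ever read, which is the property sparsity-aware communication exploits.

```python
# Reference definitions of SDDMM and SpMM on a COO sparse matrix.
# All names here are hypothetical and chosen for illustration only.

def sddmm(S, A, B):
    """Sampled dense-dense matrix multiply.

    S: list of (i, j, value) nonzeros; A and B: lists of rows of length k.
    Returns (i, j, value * dot(A[i], B[j])) on the sparsity pattern of S.
    """
    out = []
    for i, j, v in S:
        dot = sum(a * b for a, b in zip(A[i], B[j]))
        out.append((i, j, v * dot))
    return out


def spmm(S, B, n_rows):
    """Sparse-times-dense product C = S @ B, returned as a list of dense rows."""
    k = len(B[0])
    C = [[0.0] * k for _ in range(n_rows)]
    for i, j, v in S:
        for c in range(k):
            C[i][c] += v * B[j][c]  # only row j of B is touched
    return C


S = [(0, 1, 2.0), (1, 0, 1.0), (1, 2, 3.0)]   # 2 x 3 sparse matrix
A = [[1.0, 0.0], [0.0, 1.0]]                  # 2 x 2 dense matrix
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # 3 x 2 dense matrix

print(sddmm(S, A, B))
print(spmm(S, B, n_rows=2))
```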
SpComm3D decouples local computation from communication, so the best available accelerated kernel can be chosen for the computation. It performs sparse communication with minimal or no communication buffers, which further reduces memory consumption. The framework provides several options for enabling true zero-copy communication in MPI.
The paper outlines the communication and memory inefficiencies of existing 2D and 3D algorithms for SDDMM and SpMM, and carefully defines the minimum required communication for correctness. It then utilizes the SpComm3D framework to build efficient sparsity-aware 3D algorithms for these kernels.
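The gap between sparsity-agnostic and sparsity-aware communication can be illustrated with a toy model. The sketch below assumes a simplified 1D contiguous row-block partition of a square sparse matrix `S` and a dense matrix `B` (the paper itself targets 3D layouts, so this is not its algorithm), and counts how many remote rows of `B` each rank actually needs versus how many a bulk exchange would deliver. The helper names `owner` and `rows_to_receive` are hypothetical.

```python
# Toy 1D model of required communication. This is an illustrative assumption,
# not the 3D scheme from the paper.

def owner(row, n_rows, n_procs):
    """Rank owning a row under a contiguous block partition."""
    block = -(-n_rows // n_procs)  # ceiling division
    return row // block


def rows_to_receive(S, n_rows, n_procs):
    """For each rank, the set of remote rows of B its local nonzeros reference."""
    needed = [set() for _ in range(n_procs)]
    for i, j, _ in S:
        p = owner(i, n_rows, n_procs)        # rank holding nonzero (i, j)
        if owner(j, n_rows, n_procs) != p:   # row j of B lives on another rank
            needed[p].add(j)
    return needed


n, procs = 8, 4
S = [(0, 1, 1.0), (1, 6, 1.0), (2, 3, 1.0), (4, 0, 1.0),
     (5, 5, 1.0), (7, 6, 1.0), (7, 2, 1.0)]

aware = rows_to_receive(S, n, procs)
block = -(-n // procs)
# Bulk (sparsity-agnostic) exchange: every rank receives all remote rows.
agnostic = [n - block for _ in range(procs)]

print([sorted(s) for s in aware])
print(sum(len(s) for s in aware), "rows vs", sum(agnostic), "rows")
```

In this example the sparsity-aware count is a small fraction of the bulk count because most nonzeros reference rows that are either local or shared by few ranks; the actual savings depend on the sparsity pattern and the partitioning.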
Experimental evaluations on up to 1800 processors demonstrate that SpComm3D has superior scalability and outperforms state-of-the-art sparsity-agnostic methods with up to 20x improvement in terms of communication, memory, and runtime of SDDMM and SpMM.
Key insights extracted from the paper by Nabil Abubak... at arxiv.org, 05-01-2024
https://arxiv.org/pdf/2404.19638.pdf