The paper addresses the challenges of processing massive graphs using the push-relabel algorithm on GPUs. The authors identify two key issues: 1) the significant memory consumption of representing the residual graph, and 2) the inherent workload imbalance in the traditional parallel push-relabel algorithm.
To address the memory consumption challenge, the authors propose two enhanced Compressed Sparse Row (CSR) data structures: Reversed CSR (RCSR) and Bidirectional CSR (BCSR). RCSR stores the backward (reverse) edges of the residual graph in a separate CSR. This reduces the memory complexity from the O(V^2) of an adjacency-matrix representation to O(V+E). BCSR further improves memory access efficiency by aggregating each vertex's incoming and outgoing neighbors into a single adjacency list.
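The memory argument can be made concrete with a minimal Python sketch. It assumes the graph arrives as an edge list of (src, dst, capacity) triples. The function names build_csr, build_rcsr and build_bcsr, the slots output and the twin index array are hypothetical illustrations, not the paper's code. RCSR is modeled here as a forward CSR plus a separate reversed CSR holding the backward residual edges, which start with capacity 0. BCSR is modeled as one CSR whose row for each vertex holds both its outgoing and its incoming residual edges. In both layouts the arrays grow with V+E rather than V^2.

```python
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (src, dst, capacity)


def build_csr(n: int, edges: List[Edge]):
    """Pack edges into CSR arrays: n+1 offsets plus one column/capacity per edge.

    Returns (offsets, cols, caps, slots); slots[i] is the position of edges[i].
    """
    offsets = [0] * (n + 1)
    for u, _, _ in edges:               # count the out-degree of each source
        offsets[u + 1] += 1
    for v in range(n):                  # prefix sum -> row start positions
        offsets[v + 1] += offsets[v]
    cols = [0] * len(edges)
    caps = [0] * len(edges)
    cursor = offsets[:-1]               # slicing copies; next free slot per row
    slots = []
    for u, v, c in edges:
        s = cursor[u]
        cols[s], caps[s] = v, c
        cursor[u] += 1
        slots.append(s)
    return offsets, cols, caps, slots


def build_rcsr(n: int, edges: List[Edge]):
    """RCSR-style layout (assumed): a forward CSR for the original edges plus a
    separate reversed CSR for the backward residual edges (capacity 0)."""
    forward = build_csr(n, edges)
    backward = build_csr(n, [(v, u, 0) for u, v, _ in edges])
    return forward, backward


def build_bcsr(n: int, edges: List[Edge]):
    """BCSR-style layout (assumed): each vertex's outgoing and incoming residual
    edges share one contiguous CSR row; twin[s] is the slot of the reverse edge."""
    both = list(edges) + [(v, u, 0) for u, v, _ in edges]
    offsets, cols, caps, slots = build_csr(n, both)
    m = len(edges)
    twin = [0] * len(both)
    for i in range(m):                  # link each forward edge to its backward twin
        twin[slots[i]] = slots[m + i]
        twin[slots[m + i]] = slots[i]
    return offsets, cols, caps, twin


# Toy graph: 0 -> 1 (cap 3), 0 -> 2 (cap 2), 1 -> 2 (cap 1)
edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1)]
forward, backward = build_rcsr(3, edges)
offsets, cols, caps, twin = build_bcsr(3, edges)
# offsets == [0, 2, 4, 6]; vertex 1's row (slots 2..3) holds neighbor 2 (out) and 0 (in)
```

In the BCSR sketch, a thread that scans vertex 1 reads one contiguous slice of cols and caps to see every residual neighbor. In the RCSR sketch, the same scan has to read two separate arrays.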
To tackle the workload imbalance, the authors introduce a novel vertex-centric approach with two-level parallelism. First, all threads are used to scan the vertices and add active ones to an Active Vertex Queue (AVQ). Then, a tile of threads is assigned to process each active vertex, allowing them to parallelize the search for the minimum-height neighbor vertex.
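The two levels can be simulated sequentially in Python to show the data flow. This is a sketch under stated assumptions, not the authors' CUDA kernel. The names TILE_SIZE, build_avq, tile_min_neighbor and process_vertex are hypothetical. A vertex is assumed to be active when it has positive excess and a height below n. Each lane of a tile scans a strided share of the vertex's row, and the lanes combine their local minima with a halving tree reduction, which stands in for a warp-level reduction on a GPU. The push-or-relabel step follows the standard rule: push to the lowest residual neighbor if the vertex sits above it, otherwise relabel to one above it. The toy graph is hard-coded in a bidirectional CSR form, where twin[s] is the slot of the reverse edge.

```python
from typing import List, Optional, Tuple

TILE_SIZE = 4  # hypothetical tile width; must be a power of two for the reduction


def build_avq(excess: List[int], height: List[int], source: int, sink: int) -> List[int]:
    """Level 1: one 'thread' per vertex tests activity; active vertices form the AVQ.
    (A sequential comprehension standing in for a parallel scan + compaction.)"""
    n = len(excess)
    return [v for v in range(n)
            if v not in (source, sink) and excess[v] > 0 and height[v] < n]


def tile_min_neighbor(v: int, offsets: List[int], cols: List[int],
                      caps: List[int], height: List[int]) -> Optional[Tuple[int, int]]:
    """Level 2: TILE_SIZE lanes cooperatively find v's lowest residual neighbor.
    Returns (neighbor height, slot), or None if v has no residual edge."""
    start, end = offsets[v], offsets[v + 1]
    INF = (float("inf"), -1)
    lane_min = [INF] * TILE_SIZE
    for lane in range(TILE_SIZE):                 # each lane takes every TILE_SIZE-th slot
        for s in range(start + lane, end, TILE_SIZE):
            if caps[s] > 0:                       # only residual edges are candidates
                lane_min[lane] = min(lane_min[lane], (height[cols[s]], s))
    stride = TILE_SIZE // 2
    while stride > 0:                             # halving tree reduction across lanes
        for lane in range(stride):
            lane_min[lane] = min(lane_min[lane], lane_min[lane + stride])
        stride //= 2
    return None if lane_min[0] == INF else lane_min[0]


def process_vertex(v, offsets, cols, caps, twin, excess, height) -> None:
    """One push-or-relabel step on active vertex v, driven by the tile search."""
    best = tile_min_neighbor(v, offsets, cols, caps, height)
    if best is None:
        return
    h_min, s = best
    if height[v] > h_min:                         # push toward the lowest neighbor
        delta = min(excess[v], caps[s])
        caps[s] -= delta
        caps[twin[s]] += delta
        excess[v] -= delta
        excess[cols[s]] += delta
    else:                                         # relabel to one above it
        height[v] = h_min + 1


# Toy graph 0->1 (cap 3), 0->2 (cap 2), 1->2 (cap 1); source 0, sink 2.
# State after the usual initialization (height[source] = n, source edges saturated).
offsets = [0, 2, 4, 6]
cols = [1, 2, 2, 0, 0, 1]
twin = [3, 4, 5, 0, 1, 2]
caps = [0, 0, 1, 3, 2, 0]
excess = [-5, 3, 2]
height = [3, 0, 0]

avq = build_avq(excess, height, 0, 2)
while avq:
    for v in avq:          # on a GPU, one tile per queued vertex would run concurrently
        process_vertex(v, offsets, cols, caps, twin, excess, height)
    avq = build_avq(excess, height, 0, 2)

print(excess[2])  # 3, the max-flow value of this toy graph
```

In this sketch each lane touches only every TILE_SIZE-th slot of a row. The scan of a high-degree vertex is therefore split across the tile instead of being serialized on a single thread, which is the kind of imbalance the two-level scheme targets.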
The authors evaluate their WBPR (workload-balanced push-relabel) algorithm on both maximum flow and bipartite matching tasks using real-world and synthetic graphs. Compared with the state of the art, WBPR achieves speedups of up to 7.31x on maximum flow and up to 2.29x on bipartite matching. The authors attribute these gains to reduced memory consumption and a better-balanced workload on the GPU.
Key ideas extracted from arxiv.org
by Chou-Ying Hs... on arxiv.org, 04-02-2024
https://arxiv.org/pdf/2404.00270.pdf