The paper addresses the challenges of processing massive graphs using the push-relabel algorithm on GPUs. The authors identify two key issues: 1) the significant memory consumption of representing the residual graph, and 2) the inherent workload imbalance in the traditional parallel push-relabel algorithm.
To address the memory consumption challenge, the authors propose two enhanced compressed sparse row (CSR) data structures: Reversed CSR (RCSR) and Bidirectional CSR (BCSR). RCSR reduces the memory complexity from O(V^2) to O(V+E) by storing backward edges separately. BCSR further improves memory access efficiency by aggregating each vertex's incoming and outgoing neighbors.
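To make the O(V+E) claim concrete, here is a minimal sketch of a BCSR-like layout. It is an illustration under stated assumptions, not the authors' exact structure: the function name `build_bidirectional_csr` and the `partner` array (which links each edge slot to its reverse slot) are hypothetical. Each vertex owns one contiguous range of slots covering both its outgoing edges and the backward residual edges of its incoming edges.

```python
def build_bidirectional_csr(num_vertices, edges):
    """Build a residual graph in a BCSR-like layout (illustrative sketch).

    edges: list of (u, v, capacity) tuples.
    Returns (offsets, targets, residual, partner), where slots
    offsets[v]..offsets[v+1]-1 hold every residual edge leaving v.
    """
    # Every input edge needs one slot at u (forward) and one at v (backward).
    degree = [0] * num_vertices
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1

    # Prefix sum gives the start of each vertex's slot range: O(V) storage.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    # Per-slot arrays hold 2 * E entries in total: O(E) storage.
    total_slots = 2 * len(edges)
    targets = [0] * total_slots
    residual = [0] * total_slots
    partner = [0] * total_slots  # hypothetical: index of the reverse slot

    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for u, v, cap in edges:
        fwd = cursor[u]
        cursor[u] += 1
        bwd = cursor[v]
        cursor[v] += 1
        targets[fwd], residual[fwd], partner[fwd] = v, cap, bwd
        targets[bwd], residual[bwd], partner[bwd] = u, 0, fwd
    return offsets, targets, residual, partner
```

With this layout, a push along slot `s` updates `residual[s]` and `residual[partner[s]]` without searching for the reverse edge, and a vertex scans all of its residual neighbors in one contiguous pass.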
To tackle the workload imbalance, the authors introduce a novel vertex-centric approach with two-level parallelism. First, all threads scan the vertices and add the active ones to an Active Vertex Queue (AVQ). Then, a tile of threads is assigned to each active vertex, so that the tile's threads search for the minimum-height neighbor vertex in parallel.
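The two phases can be sketched as a sequential Python simulation. This is not the authors' GPU kernel: `TILE_SIZE`, the function names, and the `partner` array are hypothetical, the lanes of a tile are simulated with an ordinary loop, and a real implementation would need atomic updates because several tiles run concurrently. The example graph has edges 0→1 (capacity 3), 0→2 (capacity 2), and 1→2 (capacity 1), with the source's outgoing edges already saturated.

```python
TILE_SIZE = 4  # hypothetical tile width; on a GPU this would be a thread tile


def build_active_queue(excess, source, sink):
    # Phase 1: every thread checks one vertex; active ones join the AVQ.
    return [v for v in range(len(excess))
            if v not in (source, sink) and excess[v] > 0]


def tile_min_height_neighbor(v, offsets, targets, residual, height):
    # Phase 2: each lane scans a strided share of v's slots,
    # then the per-lane results are reduced to a single minimum.
    start, end = offsets[v], offsets[v + 1]
    lane_best = []
    for lane in range(TILE_SIZE):
        best = (float("inf"), -1)  # (neighbor height, slot index)
        for slot in range(start + lane, end, TILE_SIZE):
            if residual[slot] > 0:
                best = min(best, (height[targets[slot]], slot))
        lane_best.append(best)
    return min(lane_best)


def process_vertex(v, offsets, targets, residual, partner, height, excess):
    min_h, slot = tile_min_height_neighbor(v, offsets, targets, residual, height)
    if slot < 0:
        return  # no residual neighbor to push to
    if height[v] > min_h:
        # Push: move as much excess as the residual capacity allows.
        delta = min(excess[v], residual[slot])
        residual[slot] -= delta
        residual[partner[slot]] += delta
        excess[v] -= delta
        excess[targets[slot]] += delta
    else:
        # Relabel: lift v just above its lowest residual neighbor.
        height[v] = min_h + 1


def run_push_relabel(offsets, targets, residual, partner, height, excess,
                     source, sink):
    while True:
        avq = build_active_queue(excess, source, sink)
        if not avq:
            return excess[sink]
        for v in avq:  # on a GPU, one tile per queue entry, in parallel
            process_vertex(v, offsets, targets, residual, partner, height, excess)


# Example state after saturating the source's outgoing edges.
offsets = [0, 2, 4, 6]
targets = [1, 2, 0, 2, 0, 1]
partner = [2, 4, 0, 5, 1, 3]
residual = [0, 0, 3, 1, 2, 0]
height = [3, 0, 0]   # source starts at height |V|
excess = [0, 3, 2]
first_queue = build_active_queue(excess, source=0, sink=2)
max_flow = run_push_relabel(offsets, targets, residual, partner, height,
                            excess, source=0, sink=2)
```

Assigning a tile per active vertex rather than a single thread means a high-degree vertex no longer stalls its neighbors in the schedule: the cost of scanning its edge list is split across the lanes of the tile.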
The authors evaluate their workload-balanced push-relabel (WBPR) algorithm on both maximum flow and bipartite matching tasks using real-world and synthetic graphs. Compared with state-of-the-art approaches, WBPR achieves speedups of up to 7.31x on maximum flow and up to 2.29x on bipartite matching by reducing memory consumption and balancing the workload on the GPU.