The paper studies the problem of efficiently scheduling a computational DAG on multiple processors. In contrast to previous works that have focused on relatively simple models, the authors analyze this problem in a more realistic model that captures many real-world aspects, such as communication costs, synchronization costs, and the hierarchical structure of modern processing architectures.
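The cost aspects listed above can be made concrete with a small cost function. The sketch below is a minimal sketch under our own simplifying assumptions, not the paper's exact model. Each node gets a processor and a superstep. A value needed on another processor is sent once, in the communication phase of the superstep that computes it. Each superstep costs its heaviest work load, plus `g` times the h-relation, plus a synchronization latency. A hypothetical matrix `lam[p][q]` scales communication volume to mimic NUMA distances. The names `bsp_numa_cost`, `g`, `latency` and `lam` are illustrative.

```python
from collections import defaultdict


def bsp_numa_cost(work, comm, edges, proc, step, lam, g=1.0, latency=5.0):
    """Cost of a DAG schedule in a simplified BSP model with NUMA weights.

    This is an illustrative assumption, not the paper's exact cost model.
    work[v], comm[v]: computation and communication weight of node v
    edges: (u, v) pairs of the DAG
    proc[v], step[v]: processor and superstep assigned to node v
    lam[p][q]: hypothetical NUMA multiplier for sending from p to q
    """
    # A value u needed on a remote processor q is sent there once,
    # in the communication phase of the superstep that computes u.
    transfers = set()
    for u, v in edges:
        if proc[u] != proc[v]:
            if step[u] >= step[v]:
                raise ValueError(f"edge {u}->{v}: remote input arrives too late")
            transfers.add((u, proc[v]))
        elif step[u] > step[v]:
            raise ValueError(f"edge {u}->{v}: node scheduled before its input")

    total = 0.0
    for s in range(max(step.values()) + 1):
        # Computation phase: the most loaded processor sets the length.
        load = defaultdict(float)
        for v, p in proc.items():
            if step[v] == s:
                load[p] += work[v]
        work_cost = max(load.values(), default=0.0)

        # Communication phase: h-relation over NUMA-weighted volumes.
        sent, recv = defaultdict(float), defaultdict(float)
        for u, q in transfers:
            if step[u] == s:
                volume = comm[u] * lam[proc[u]][q]
                sent[proc[u]] += volume
                recv[q] += volume
        h = max([*sent.values(), *recv.values()], default=0.0)

        # Every superstep ends with a synchronization costing `latency`.
        total += work_cost + g * h + latency
    return total


work = {"a": 2, "b": 3, "c": 1}
comm = {"a": 1, "b": 1, "c": 1}
edges = [("a", "b"), ("a", "c")]
proc = {"a": 0, "b": 0, "c": 1}
step = {"a": 0, "b": 1, "c": 1}
lam = [[1, 2], [2, 1]]  # remote transfers cost twice as much
print(bsp_numa_cost(work, comm, edges, proc, step, lam))  # 17.0
```

With uniform multipliers (`[[1, 1], [1, 1]]`) the same schedule costs 16.0, so the NUMA factor alone adds one unit of communication cost here.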
The authors extend the well-established Bulk Synchronous Parallel (BSP) model of parallel computing with non-uniform memory access (NUMA) effects. They then develop a range of new scheduling algorithms to minimize the scheduling cost in this more complex setting: several initialization heuristics, a hill-climbing local search method, and several approaches that formulate (and solve) the scheduling problem as an Integer Linear Program (ILP).
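The idea behind the hill-climbing step can be illustrated with a minimal first-improvement local search. The sketch below is our own assumption, not the paper's exact neighborhood. A move reassigns a single node to another (processor, superstep) pair. Invalid schedules get infinite cost. The search stops when no single move lowers the cost. To keep the block self-contained, it uses a hypothetical simplified cost `schedule_cost` with unit communication weights and no NUMA factors. `hill_climb` and its parameters are illustrative names.

```python
import itertools
import math
from collections import defaultdict


def schedule_cost(work, edges, proc, step, g=1.0, latency=2.0):
    """Simplified BSP cost (illustrative): unit comm weights, no NUMA.

    Returns math.inf if the schedule violates a precedence constraint.
    """
    for u, v in edges:
        if step[u] > step[v] or (proc[u] != proc[v] and step[u] == step[v]):
            return math.inf
    transfers = {(u, proc[v]) for u, v in edges if proc[u] != proc[v]}
    total = 0.0
    for s in set(step.values()):  # empty supersteps cost nothing
        load = defaultdict(float)
        for v in work:
            if step[v] == s:
                load[proc[v]] += work[v]
        sent, recv = defaultdict(int), defaultdict(int)
        for u, q in transfers:
            if step[u] == s:
                sent[proc[u]] += 1
                recv[q] += 1
        h = max([*sent.values(), *recv.values()], default=0)
        total += max(load.values()) + g * h + latency
    return total


def hill_climb(work, edges, proc, step, num_procs, max_rounds=100):
    """First-improvement local search over single-node moves."""
    proc, step = dict(proc), dict(step)  # do not mutate the caller's schedule
    best = schedule_cost(work, edges, proc, step)
    num_steps = max(step.values()) + 1
    for _ in range(max_rounds):
        improved = False
        for v in work:
            for p, s in itertools.product(range(num_procs), range(num_steps)):
                old = proc[v], step[v]
                proc[v], step[v] = p, s
                cost = schedule_cost(work, edges, proc, step)
                if cost < best:
                    best, improved = cost, True  # keep the improving move
                else:
                    proc[v], step[v] = old  # undo the move
        if not improved:
            break  # local optimum: no single move helps
    return proc, step, best


# A chain a -> b -> c spread over two processors and three supersteps.
work = {"a": 1, "b": 1, "c": 1}
edges = [("a", "b"), ("b", "c")]
start_proc = {"a": 0, "b": 1, "c": 0}
start_step = {"a": 0, "b": 1, "c": 2}
print(schedule_cost(work, edges, start_proc, start_step))  # 11.0
print(hill_climb(work, edges, start_proc, start_step, num_procs=2)[2])  # 5.0
```

On this toy chain the search ends with all three nodes on one processor in a single superstep, which removes every transfer and all but one synchronization. On realistic DAGs such a search usually stops in a local optimum, which is one reason to combine it with good initial schedules.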
The authors combine these algorithms into a single framework and conduct experiments on a diverse set of real-world computational DAGs. The results show that the resulting scheduler significantly outperforms both academic and practical baselines. Even without NUMA effects, the scheduler finds solutions whose cost is 24-44% lower on average than that of the baselines. With NUMA effects, it achieves an improvement of up to 2.5x over the baselines. The authors also develop a multilevel scheduling algorithm, which yields improvements of up to almost 5x in the special case where the problem is dominated by very high communication costs.
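Multilevel schedulers commonly coarsen the DAG, schedule the smaller graph, and then project the schedule back and refine it. The details of the paper's variant may differ from this generic pattern. The coarsening phase can be sketched as follows, using an assumed contraction rule. Edges are contracted in order of decreasing communication weight of their source node, so that heavily communicating nodes end up scheduled together. Any contraction that would make the contracted graph cyclic is rejected. `coarsen`, `target`, `quotient_edges` and `is_acyclic` are hypothetical names. The acyclicity test uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def quotient_edges(edges, cluster):
    """Edges of the contracted graph (self-loops dropped)."""
    return {(cluster[u], cluster[v]) for u, v in edges if cluster[u] != cluster[v]}


def is_acyclic(nodes, qedges):
    """True if the contracted graph is still a DAG."""
    sorter = TopologicalSorter({n: set() for n in nodes})
    for a, b in qedges:
        sorter.add(b, a)  # b depends on a
    try:
        list(sorter.static_order())
        return True
    except CycleError:
        return False


def coarsen(nodes, edges, comm, target):
    """Contract heavy edges (assumed rule) until at most `target` clusters remain."""
    cluster = {v: v for v in nodes}  # each node starts as its own cluster
    for u, v in sorted(edges, key=lambda e: comm[e[0]], reverse=True):
        if len(set(cluster.values())) <= target:
            break
        cu, cv = cluster[u], cluster[v]
        if cu == cv:
            continue
        trial = {x: cu if c == cv else c for x, c in cluster.items()}
        # Merging must keep the coarse graph a DAG, or it cannot be scheduled.
        if is_acyclic(set(trial.values()), quotient_edges(edges, trial)):
            cluster = trial
    return cluster


# Merging a and c directly would create the cycle {a, c} -> b -> {a, c}.
print(coarsen(["a", "b", "c"], [("a", "c"), ("a", "b"), ("b", "c")],
              {"a": 10, "b": 1, "c": 1}, target=2))
# {'a': 'a', 'b': 'a', 'c': 'c'}
```

Rebuilding the quotient graph for every candidate merge is quadratic and only suitable for small examples; a practical implementation would check reachability incrementally.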
Key insights extracted from the paper by Pál ... (arxiv.org, 04-24-2024)
https://arxiv.org/pdf/2404.15246.pdf