The paper studies the problem of efficiently scheduling a computational DAG on multiple processors. Whereas previous work has focused on relatively simple models, the authors analyze the problem in a more realistic model that captures real-world aspects such as communication costs, synchronization costs, and the hierarchical structure of modern processing architectures.
The authors extend the well-established Bulk Synchronous Parallel (BSP) model of parallel computing with non-uniform memory access (NUMA) effects. They then develop a range of new scheduling algorithms to minimize the scheduling cost in this more complex setting: several initialization heuristics, a hill-climbing local search method, and several approaches that formulate and solve the scheduling problem as an Integer Linear Program (ILP).
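To make the cost model and the local search more concrete, here is a minimal Python sketch of a BSP-style schedule cost with NUMA multipliers, together with a naive hill-climbing loop. It is an illustration under stated assumptions, not the authors' implementation. All names (`is_valid`, `bsp_cost`, `hill_climb`, `numa`, `g`, `L`) are hypothetical. The sketch assumes that each value is sent eagerly in the communication phase of the superstep that computes it, that the communication cost of a superstep is `g` times the largest volume any processor sends or receives, and that every superstep pays a fixed latency `L`. The paper's exact model and move set may differ.

```python
def is_valid(edges, proc, step):
    """An edge (u, v) needs u no later than v on the same processor,
    and strictly earlier if the value must cross processors."""
    for u, v in edges:
        if proc[u] == proc[v]:
            if step[u] > step[v]:
                return False
        elif step[u] >= step[v]:
            return False
    return True


def bsp_cost(work, comm, edges, proc, step, numa, g, L):
    """Sum over supersteps of: max work + g * max send/receive volume + L.
    numa[p][q] scales the cost of sending from processor p to q (assumed)."""
    n_procs = len(numa)
    n_steps = max(step.values()) + 1
    load = [[0] * n_procs for _ in range(n_steps)]
    sent = [[0] * n_procs for _ in range(n_steps)]
    recv = [[0] * n_procs for _ in range(n_steps)]
    for v, w in work.items():
        load[step[v]][proc[v]] += w
    # Each value is sent at most once to each processor that needs it.
    transfers = {(u, proc[v]) for u, v in edges if proc[u] != proc[v]}
    for u, q in transfers:
        p = proc[u]
        volume = comm[u] * numa[p][q]
        sent[step[u]][p] += volume  # eager sending: assumed, see lead-in
        recv[step[u]][q] += volume
    total = 0
    for s in range(n_steps):
        h = max(max(sent[s][p], recv[s][p]) for p in range(n_procs))
        total += max(load[s]) + g * h + L
    return total


def hill_climb(work, comm, edges, proc, step, numa, g, L):
    """Move one node at a time to another processor and/or an adjacent
    superstep; keep a move only if it is valid and strictly cheaper."""
    proc, step = dict(proc), dict(step)
    best = bsp_cost(work, comm, edges, proc, step, numa, g, L)
    improved = True
    while improved:
        improved = False
        for v in work:
            for p in range(len(numa)):
                for s in (step[v] - 1, step[v], step[v] + 1):
                    if s < 0 or (p, s) == (proc[v], step[v]):
                        continue
                    old = (proc[v], step[v])
                    proc[v], step[v] = p, s
                    if is_valid(edges, proc, step):
                        cost = bsp_cost(work, comm, edges, proc, step, numa, g, L)
                        if cost < best:
                            best, improved = cost, True
                            continue
                    proc[v], step[v] = old  # undo a rejected move
    return proc, step, best


# Toy DAG: nodes 0 and 1 both feed node 2; two processors.
work = {0: 2, 1: 3, 2: 1}
comm = {0: 1, 1: 1, 2: 1}
edges = [(0, 2), (1, 2)]
uniform = [[0, 1], [1, 0]]
parallel = ({0: 0, 1: 1, 2: 0}, {0: 0, 1: 0, 2: 1})
serial = ({0: 0, 1: 0, 2: 0}, {0: 0, 1: 0, 2: 0})
print(bsp_cost(work, comm, edges, *parallel, uniform, g=1, L=2))  # 9
print(bsp_cost(work, comm, edges, *serial, uniform, g=1, L=2))    # 8
```

In this toy instance the fully serial schedule is cheaper than the two-processor one because the latency and communication terms outweigh the saved work. Single-node moves can also stall in a local optimum, which is one reason to pair local search with good initialization heuristics or an exact method such as an ILP.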
The authors combine these algorithms into a single framework and evaluate it on a diverse set of real-world computational DAGs. The resulting scheduler significantly outperforms both academic and practical baselines. Even without NUMA effects, it finds schedules whose cost is 24%-44% lower on average than the baselines. With NUMA effects, the improvement over the baselines reaches a factor of up to 2.5. The authors also develop a multilevel scheduling algorithm, which yields an improvement of up to almost a factor of 5 in the special case where very high communication costs dominate the problem.
Source: arxiv.org