Core Concept
We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into k stages and minimizing the running time of the bottleneck stage, including communication.
Summary
The authors address the problem of maximizing throughput for pipelined DNN inference by partitioning the computation graph into k stages. The key challenges are:
- Minimizing the communication overhead between stages, as inter-stage bandwidth is typically much lower than intra-stage bandwidth.
- Balancing the running time across all stages to avoid a bottleneck that limits the overall throughput.
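The tension between these two challenges can be illustrated with a small sketch. The cost model here is an assumption made for illustration, not the paper's exact definition: a stage's running time is the sum of its nodes' compute times plus the transfer time of every edge that crosses its boundary, charged to both the sending and the receiving stage. The function name `bottleneck_time` and the toy graph are hypothetical.

```python
from collections import defaultdict


def bottleneck_time(work, edges, stage_of, bandwidth):
    """Return (bottleneck, per-stage times) for a pipeline partition.

    work:      {node: compute time}
    edges:     {(u, v): amount of data sent from u to v}
    stage_of:  {node: stage index}; edges must never point to an earlier stage
    bandwidth: inter-stage data per unit time (assumed uniform across stages)
    """
    times = defaultdict(float)
    for node, t in work.items():
        times[stage_of[node]] += t
    for (u, v), size in edges.items():
        su, sv = stage_of[u], stage_of[v]
        if su > sv:
            raise ValueError(f"edge {u}->{v} goes backwards in the pipeline")
        if su != sv:
            # Assumption: sender and receiver both pay for a cut edge.
            cost = size / bandwidth
            times[su] += cost
            times[sv] += cost
    return max(times.values()), dict(times)


# Toy chain a -> b -> c -> d with one heavy edge (b -> c).
work = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 5.0}
edges = {("a", "b"): 10.0, ("b", "c"): 40.0, ("c", "d"): 10.0}

balanced = {"a": 0, "b": 0, "c": 1, "d": 1}   # equal compute, cuts the heavy edge
cheap_cut = {"a": 0, "b": 0, "c": 0, "d": 1}  # unequal compute, cuts a light edge

for name, partition in [("balanced", balanced), ("cheap_cut", cheap_cut)]:
    bottleneck, per_stage = bottleneck_time(work, edges, partition, bandwidth=10.0)
    print(name, bottleneck, per_stage)
```

In this toy instance the compute-balanced partition has the worse bottleneck, because it cuts the heavy edge. Steady-state throughput is the reciprocal of the bottleneck time, so both challenges have to be weighed together rather than one after the other.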
The authors formulate this as the Max-Throughput Partitioning Problem (MTPP), which is shown to be NP-hard. They propose:
- Novel mixed-integer programming (MIP) formulations to compute strong lower bounds on the optimal solution, including a "three-superblock" relaxation and a "guess the bottleneck block" approach.
- A fast and practical pipeline partitioning algorithm called SliceGraph that combines dynamic programming with a biased random-key genetic algorithm.
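As a rough illustration of the dynamic-programming ingredient, the sketch below splits one fixed topological order into at most k contiguous stages so that the bottleneck is minimized. It is not the authors' SliceGraph implementation: the cost model (compute plus boundary-crossing traffic divided by a uniform bandwidth) and the name `split_order` are assumptions, and the outer search over different orders, which is where a genetic algorithm could plausibly be used, is not shown.

```python
from functools import lru_cache


def split_order(order, work, edges, k, bandwidth):
    """Split `order` into at most k contiguous stages, minimizing the bottleneck.

    Returns (bottleneck time, tuple of stage start indices).
    """
    n = len(order)
    pos = {node: i for i, node in enumerate(order)}

    def segment_cost(i, j):
        # Hypothetical cost of a stage holding order[i:j]:
        # compute time plus traffic on every edge crossing its boundary.
        total = sum(work[order[p]] for p in range(i, j))
        for (u, v), size in edges.items():
            u_in = i <= pos[u] < j
            v_in = i <= pos[v] < j
            if u_in != v_in:
                total += size / bandwidth
        return total

    @lru_cache(maxsize=None)
    def best(j, stages):
        # Minimal bottleneck for order[:j] using exactly `stages` non-empty stages.
        if stages == 1:
            return segment_cost(0, j), (0,)
        result = (float("inf"), ())
        for i in range(stages - 1, j):  # i is where the last stage starts
            prev, cuts = best(i, stages - 1)
            candidate = max(prev, segment_cost(i, j))
            if candidate < result[0]:
                result = (candidate, cuts + (i,))
        return result

    return min(best(n, s) for s in range(1, min(k, n) + 1))


# Toy chain a -> b -> c -> d with one heavy edge (b -> c).
work = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 5.0}
edges = {("a", "b"): 10.0, ("b", "c"): 40.0, ("c", "d"): 10.0}
print(split_order(["a", "b", "c", "d"], work, edges, k=2, bandwidth=10.0))
```

For a fixed order this recurrence is exact under the assumed cost model, since each stage's cost depends only on its own endpoints. The quality of the final partition therefore depends on which order is supplied.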
Extensive experiments on a diverse testbed of 369 production DNN models show that the MIP-based lower bounds are substantially stronger than standard combinatorial bounds. For example, for k=16 pipeline stages, the MIP formulations raised the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. This closes the optimality gap by a factor of 9.855.
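The gap-closure factor can be checked from the two reported bounds. The sketch below assumes the optimality gap is defined as one minus the normalized lower bound; the function name `gap_closure` is illustrative.

```python
def gap_closure(lb_before, lb_after):
    """Ratio of optimality gaps, with gap assumed to be 1 - normalized lower bound."""
    gap_before = 1.0 - lb_before
    gap_after = 1.0 - lb_after
    return gap_before / gap_after


ratio = gap_closure(0.4598, 0.9452)
print(f"gap shrinks by a factor of about {ratio:.2f}")
```

With the rounded inputs the ratio comes out near 9.86, which agrees with the reported 9.855 up to rounding of the published bounds.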
The authors demonstrate that SliceGraph is highly effective in practice, producing solutions that are on average 95.5% of optimal for k ≤ 16, while running orders of magnitude faster than the MIP-based lower-bound computations.
Statistics
Geometric mean, across the production dataset, of the best available lower bound from the MIP hierarchy, normalized by the best solution found using BRKGA:
For k=2: 0.9901
For k=4: 0.9737
For k=8: 0.9588
For k=16: 0.9452
For k=32: 0.8749
For k=64: 0.7874
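These ratios translate directly into a bound on how far the best partition found can be from optimal. Since the lower bound never exceeds the optimal bottleneck time, a ratio r implies the best solution is at most 1/r times the optimum. The sketch below applies that conversion to the reported geometric means, so the results hold in the geometric-mean sense rather than for each model individually; the helper name `max_excess` is illustrative.

```python
# Reported geometric-mean ratios: lower bound / best solution found.
ratios = {2: 0.9901, 4: 0.9737, 8: 0.9588, 16: 0.9452, 32: 0.8749, 64: 0.7874}


def max_excess(ratio):
    """Upper bound on (best found / optimal) - 1, given lower bound / best found."""
    return 1.0 / ratio - 1.0


for k, r in ratios.items():
    print(f"k={k:>2}: best found is at most {max_excess(r):.1%} above optimal")
```

The guarantee loosens as k grows, which reflects either weaker lower bounds, weaker solutions, or both at larger stage counts; the ratios alone cannot distinguish these cases.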
Quotes
"Evaluated via geometric means across our production testbed with k = 16 pipeline stages, our MIP formulations raised the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. In other words, our improved lower bounds closed the optimality gap by a factor of 9.855x."