Core Concepts
The optimal scheduling policy depends on the scaling behavior of the system. When the system has ample spare capacity, deferring parallelizable work matters more; when the system is heavily loaded, prioritizing short jobs matters more.
Abstract
The paper considers a system with k homogeneous servers and ℓ classes of parallelizable jobs. Each job class i has an associated job size distribution S_i and parallelizability level c_i, where c_i is the maximum number of servers a class-i job can utilize.
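The model above can be captured in a small data structure. This is a hedged sketch with hypothetical names (`JobClass`, `mean_size`, `c` are not from the paper); it also assumes the common linear-speedup convention that a job running on n ≤ c_i servers completes work at rate n, which the summary does not state explicitly.

```python
from dataclasses import dataclass

@dataclass
class JobClass:
    """One of the l job classes in the k-server model (hypothetical naming).

    mean_size: E[S_i], the mean of the class's job size distribution.
    c: parallelizability level c_i, the max servers a class-i job can
       occupy; assuming linear speedup, a job on n <= c servers
       completes work at rate n.
    """
    mean_size: float
    c: int
```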
The key insights are:
When all job classes have the same exponential size distribution, the Least-Parallelizable-First (LPF) policy, which prioritizes jobs from the least parallelizable classes, is optimal for minimizing mean response time.
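A minimal sketch of how LPF allocates servers, under assumed conventions: jobs are `(job_id, c_i)` pairs, ties are broken arbitrarily, and each job may receive fewer than c_i servers if capacity runs out. The function name and representation are illustrative, not from the paper.

```python
def lpf_assign(jobs, k):
    """Allocate k servers under Least-Parallelizable-First.

    jobs: list of (job_id, c_i), where c_i is the max servers the job
          can use. Jobs with smaller c_i get strict priority.
    Returns {job_id: servers_assigned} for the jobs that got servers.
    """
    assignment = {}
    free = k
    # Least parallelizable classes first.
    for job_id, c in sorted(jobs, key=lambda j: j[1]):
        if free == 0:
            break
        give = min(c, free)  # a job never uses more than c_i servers
        assignment[job_id] = give
        free -= give
    return assignment
```

For example, with k = 5 and jobs of levels 4, 1, and 2, the level-1 and level-2 jobs are served fully and the level-4 job receives the remaining 2 servers.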
In the conventional heavy-traffic regime as ρ→1, the Shortest-Expected-Remaining-Processing-Time (SERPT) policy, which prioritizes jobs with the shortest expected remaining processing time, is asymptotically optimal.
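SERPT can be sketched the same way, with priority keyed on expected remaining processing time instead of parallelizability. Note that for exponential job sizes the memoryless property makes the expected remaining time equal to the class mean, so SERPT reduces to a static priority over classes. The representation below (`(job_id, c_i, exp_remaining)` tuples) is an illustrative assumption.

```python
def serpt_assign(jobs, k):
    """Allocate k servers under SERPT.

    jobs: list of (job_id, c_i, exp_remaining), where exp_remaining is
          the job's expected remaining processing time (for exponential
          sizes, just the class mean, by memorylessness).
    Returns {job_id: servers_assigned} for the jobs that got servers.
    """
    assignment = {}
    free = k
    # Shortest expected remaining processing time first.
    for job_id, c, _t in sorted(jobs, key=lambda j: j[2]):
        if free == 0:
            break
        give = min(c, free)
        assignment[job_id] = give
        free -= give
    return assignment
```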
In lighter-load scaling regimes (Sub-Halfin-Whitt), LPF is asymptotically optimal because deferring parallelizable work is more important than prioritizing short jobs when there is a high probability of having idle servers.
In heavier-load scaling regimes (Super-NDS), SERPT is asymptotically optimal because minimizing queueing time becomes the dominant concern when the probability of having idle servers vanishes.
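The regime dichotomy above can be made concrete by writing the load as 1 − ρ = k^(−α) and reading off α, using the conventional cutoffs α = 1/2 (Halfin-Whitt) and α = 1 (NDS). This classifier is my own illustrative sketch, not a procedure from the paper, and the exact regime boundaries are an assumption.

```python
import math

def regime(k, rho):
    """Classify the scaling regime of a k-server system at load rho.

    Solves 1 - rho = k**(-alpha) for alpha. Assumed cutoffs:
    alpha < 1/2 is Sub-Halfin-Whitt (idle servers likely, LPF wins);
    alpha > 1 is Super-NDS (idle servers vanish, SERPT wins).
    """
    alpha = -math.log(1 - rho) / math.log(k)
    if alpha < 0.5:
        return "Sub-Halfin-Whitt: prefer LPF"
    if alpha > 1.0:
        return "Super-NDS: prefer SERPT"
    return "intermediate regime"
```

For instance, k = 10000 servers at ρ = 0.9 gives α ≈ 0.25 (Sub-Halfin-Whitt), while k = 100 at ρ = 0.9999 gives α ≈ 2 (Super-NDS).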
The paper also discusses practical considerations, such as how to schedule when the scaling behavior is unknown and the challenges of scheduling with non-exponential job size distributions.