The Job Scheduling Problem

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

External

Internal

Overview

The job scheduling problem is a canonical example for the use of a greedy algorithm.

The Problem

Assume there is a one shared resource (a CPU) and there are n jobs that must be processed bu the shared resource. Each job has two known parameters:

  • a weight wj or priority, which qualifies its "importance". The jobs with higher weight deserve to be processed before the jobs with lower weight.
  • a length ℓj, which codifies the processing time.

The completion time of a job j Cj is defined as:

Cj = Wj + ℓj

where Wj is the wait time, or how much the job j had to wait before being scheduled. The wait time is given by the sum of the lengths for all jobs scheduled before j.

The question we need to resolve algorithmically is in what order we should sequence the job to maximize the objective function defined by the weighted sum of completion times:

    n
min ∑ wjCj
   j=1

The Greedy Algorithm

The process through which to come up with a greedy algorithm first involves looking at a special case of the problem, where is reasonably intuitive what should be the optimal thing to do.

We could assume that all jobs have the same length and different weights. The objective function will be ℓw1 + 2ℓw2 + ... nℓwn = ℓ(w1 + 2w2 + ... nwn) so the obvious way to minimize it would be to scheduled the job with the biggest weight first, followed by the job with the second biggest weight and so on.

Another special case is a sequence in which all jobs have the same weight, but different lengths. The objective function is ℓ1w + (ℓ1+ℓ2)w + ... + (ℓ1+ℓ2+...+ℓn)w =(nℓ1+(n-1)ℓ2 + ... + ℓn). To minimize the function, we should schedule the shortest job first, because the execution time will be reflected in the completion time of all subsequent jobs, then the second shortest job and the longest job last.

We then move beyond special case towards the general case. The analysis of the particular cases indicates that we should favor jobs with the biggest weight and jobs with the shortest length. An obvious way to aggregate the weight and a length in a single score that gives an indication on possible priority in scheduling is:

 wj
────
 ℓj

It is obvious that larger the score, the highest priority in scheduling should be.

The weakness of greedy algorithm design, however, is that is really easy to come up with alternative "obvious" scores that might do the wrong thing. It is not obvious that a greedy algorithm that schedules based on the above score is correct - always minimizes the objective function -, as opposite to an algorithm that uses wj-ℓj as score, for example. In this case, the ratio score happens to be the correct way to go, as demonstrated by the correctness proof below.

To summarize:


A valid greedy algorithm to schedule jobs characterized by their weight w and length ℓ in such a way to minimize the objective function defined by the weighted sum of completion times is to compute the weight per length ratio and schedule the jobs with the highest scores first.

Correctness Proof

Proof by exchange argument:

https://www.coursera.org/learn/algorithms-greedy/lecture/Jo6gK/a-greedy-algorithm
https://www.coursera.org/learn/algorithms-greedy/lecture/mrXwO/correctness-proof-part-ii
https://www.coursera.org/learn/algorithms-greedy/lecture/YuoAV/handling-ties-advanced-optional