This is an old revision of the document!

Chapter 4

Chapter 4 Front Matter

Right off the bat, the authors state that defining a greedy algorithm is exceedingly difficult. A greedy algorithm is characterized by the way that it creates a solution to a problem, which is building up a solution in small steps which select a local maximum of a certain aspect of the inputs. Interestingly, this implies about the original problem that there is a rule that governs local decisions which helps make the optimal solution to the problem.

There are two main strategies for proving the sometimes surprising conclusion that the greedy algorithm produces an optimal solution. The first is referred to as greedy stays ahead. This strategy measures the greedy algorithm's progress step-by-step and proves that the greedy algorithm produces a better solution at every step than any other algorithm is able to. The other strategy to prove a greedy solution's optimality is the exchange argument. This begins with any solution to the problem and transforms it into the greedy solution all the while not making the solution less optimal.

This section was pretty easy to understand since it did not actually delve too deep into technical concepts or rely heavily on notation.

Section 4.1 Interval Scheduling: Greedy Stays Ahead

This section examines the Interval Scheduling Problem, which aims to find the largest subset of intervals that do not overlap one another in time. We must come upon a rule around which to design the greedy algorithm, and a few are suggested.

One possible rule is selecting the earliest starting times, but this will not work because the interval that begins first may end last, so none of the other intervals will be completed. In this example, as long as there are two or more other intervals that do not overlap, using the aforementioned rule will not yield an optimal solution.

Another rule that could be used is selecting the intervals in order of size – smallest to largest. This rule could lead us to select only one interval which, while small, overlaps two non-overlapping intervals and so it yields a subset of size 1 which is not the optimal solution.

A third solution is choosing intervals in order of fewest conflicts. This one will not work either because it does not take into account end times, which are important in anchoring this solution.

The rule that will work is selection of intervals based solely on which intervals end first. It can be proven that this solution is optimal by proving that the greedy algorithm will provide a solution which contains the same number of intervals as the optimal solution. The greedy algorithm will start by choosing the first interval to end, and then by choosing an interval that ends at the same time as the one in the same index position in the optimal set or earlier. The greedy algorithm stays ahead here by choosing the same interval as the one in the optimal set as its latest possible interval for that index position. Thus, the worst set that the greedy can pick is an optimal set and by definition it is also the best set the algorithm can pick. This means that the greedy algorithm always pick the optimal set.

It runs in O(n log n) time. The algorithm must sort the inputs based on ending time, an O(n log n) operation. It then creates an array in linear time and then selects the first interval and iterates through the rest of the intervals, selecting intervals whose start is greater than the previously selected interval's finish.

If we wanted to schedule all intervals using the fewest resources possible, the algorithm would only need a slight modification. This modification involves assigning a different label to each interval that overlaps and thus each label corresponds to a different “layer” or resource that the interval will occupy for the specified time.

Section 4.2 Scheduling to Minimize Lateness: an Exchange Argument

This problem gives a set of requests which all may begin at time s and must use the resource for a specific interval of time. The difference from the previous problem is that instead of a start and end time, these requests only come with a deadline before which they must be completed. These criteria allow us a few possible rules by which we may define the greedy algorithm.

We may order the jobs by increasing length. This will not work because it is insensitive to deadlines. So one short job with a long deadline will be scheduled before a long job with a short deadline which may lead to the second job being scheduled after the first and the second job being late while the first job has excess slack.

Maybe they should be scheduled in order of slack, or the difference between the deadline and the length of the interval. This seems better since it takes deadline into account, however it will still not work. The counter example to this solution is a set of intervals, one which takes one hour and is due after two hours and one which takes three hours and is due after three hours. The second interval mentioned would be scheduled first since it has no slack and the other interval would be two hours late. If the two intervals were switched, that would result in minimizing maximum lateness, since the second interval would only be one hour late.

The rule we should use is scheduling the intervals with the earliest deadlines first. The algorithm will schedule one task as soon as the previous one is completed, and so there will be no idle time. Additionally, two different schedules with neither inversions nor idle time will only differ only in the order which they schedule jobs with identical deadlines and jobs with the same deadlines are scheduled consecutively. This does not affect maximum lateness.

Since there is an optimal schedule that has no inversions and no idle time and all schedules with no inversions and no idle time have the same maximum lateness, the schedule produced by the greedy algorithm which will have no inversions and no idle time will be an optimal schedule.

This algorithm will run in O(n log n) time, since the costliest step in the algorithm will be sorting the intervals by deadline.

This was a dense section which took me a while to get through due to the level of technicality and specification of some of the proofs. It required a good deal of attention and focus at some points, including the proof for statement 4.9.

Section 4.5 The Minimum Spanning Tree Problem

A minimum spanning tree is a tree that includes the smallest subset of edges that connect all the edges in the original graph. Additionally, in order to determine the minimum spanning tree of a graph all the edges in the graph must be strictly greater than zero and of distinct lengths.

This section proposes three algorithms to solve the graph for its minimum spanning tree, all of which will return a correct MST. The first algorithm sketched out is Kruskal's Algorithm, which successively adds edges starting with the least costly and adding edges so long as they do not create a cycle with edges that have already been inserted.

The next algorithm outlined is Prim's Algorithm. This one is similar to Dijkstra's in that it starts at a specified node and add the shortest edge that leads out from the current node to a node not yet visited. This algorithm also can not create a cycle since that would involve adding an edge that leads from the current node to an already-explored node.

The last algorithm discussed in this section is the Reverse-Delete Algorithm. As its name suggests, this algorithm works by deleting edges in the reverse order that they were added in the previous algorithms: from most expensive to least. It will not delete and edge that should otherwise be deleted according to the cost ordering if that edge will create a disconnected graph and this algorithm will terminate when the only remaining edges would disconnect the graph if deleted.

Additionally, the cut and cycle properties are treated by this section. The cut property says that S is any subset of nodes and e is the minimum cost edge with exactly one endpoint in S. Then the MST must contain e. The cycle property on the other hand says that C is any cycle and f is the max cost edge in C. Then the MST can not contain f.

having the cut property, we can now say that Kruskal's and Prim's Algorithms produce minimum spanning trees of their graphs. Prim's algorithm works simply by starting at any node in the graph and applying the cut property, adding the cheapest edge in the cutset of explores set s then adding the newly explored node to s. Kruskal's employs both properties; each edge falls into one of the two cases. Once the edges are ordered from smallest to largest, they will either complete a cycle or not. If the current edge being examined does complete a cycle, then it will be discarded because of the cycle property. If it does not, then it will be added to the MST by the cut property.

The cycle property gives us insight into why the reverse delete algorithm must be true. When an edge e is deleted, it lies on a cycle c and since the edges are ordered and e is the first edge on the cycle that is encountered, then e must be the most costly edge on c and by the cycle property, it does not belong in a minimum spanning tree.

Prim's algorithm can be implemented so that it runs in O(m log n) time for m edges and n nodes. This is plain due to its similarities with Dijkstra's algorithm which runs in the same time. Basically, using a heap-based priority queue gives O(log n) time for ExtractMin and ChangeKey, both of which run inside a m-time loop yielding a total run time of O(m log n).

This section was quite dense to cut through due to the lengthy and notation-dependent proofs. The pictures helped somewhat but I did have to resort to class notes to demystify some of what was trying to be conveyed in the proofs.

Table of Contents

Chapter 4

Chapter 4 Front Matter

Section 4.1 Interval Scheduling: Greedy Stays Ahead

Section 4.2 Scheduling to Minimize Lateness: an Exchange Argument

Section 4.5 The Minimum Spanning Tree Problem