

Chapter 4

Section 4.1 (Interval Scheduling)

This section deals with the Interval Scheduling problem, and implements a greedy algorithm to solve it.

The interval scheduling problem deals with scheduling job requests. We want to schedule as many jobs as possible, but to do so we need to pick only jobs that do not overlap. Each given job has a beginning and end time.

The solution to this problem is deceptively simple: we start with the request that has the earliest end time. Then we delete all requests that conflict with this accepted request, and choose the remaining job with the next earliest end time. We repeat this process until there are no more requests to accept or delete, and the accepted jobs form the optimal set. It is important to note that this algorithm works because all jobs have equal value, so our only concern is accepting the highest number of jobs.
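As a sketch, the greedy rule above can be implemented by sorting once by finish time rather than repeatedly deleting conflicting requests; the (start, finish) pair representation is an assumption for illustration, not from the text:

```python
def interval_schedule(jobs):
    """Greedy interval scheduling: jobs is a list of (start, finish) pairs.
    Repeatedly accept the compatible job with the earliest finish time."""
    accepted = []
    last_finish = float("-inf")
    # Sorting by finish time lets us scan once instead of deleting conflicts.
    for start, finish in sorted(jobs, key=lambda job: job[1]):
        if start >= last_finish:          # job does not overlap the last accepted one
            accepted.append((start, finish))
            last_finish = finish
    return accepted

print(interval_schedule([(0, 6), (1, 4), (3, 5), (5, 7), (5, 9), (8, 9)]))
# prints [(1, 4), (5, 7), (8, 9)]
```

The single sort is also what gives the O(n log n) running time mentioned below.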

This section then goes on to prove that this algorithm is optimal. First, it proves by induction that the greedy solution "stays ahead" of any other solution: f(i_r) ≤ f(j_r), where i_r is the r-th job greedy accepts and j_r is the r-th job in an optimal schedule. Then, it proves by contradiction that the set the greedy algorithm returns is optimal. With these two points proven, we can now say that the solution returned is correct and optimal. This algorithm runs in O(n log n) time.

This section also describes the Interval Partitioning problem, which is similar to the Interval Scheduling problem. This algorithm mostly operates under the same process described above, except that jobs are processed in order of start time, and when two jobs overlap, the conflicting jobs are put onto different "resources" (think scheduling lectures in two different classrooms). Therefore, the optimal number of "resources" needed to schedule the jobs is equal to the maximum number of overlapping jobs at any one point in time (the "depth" of the set of intervals).
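One common way to sketch this in Python (again assuming jobs are (start, finish) pairs) is to process jobs by start time and track each resource's earliest free time in a min-heap; a resource is reused whenever its earliest finish time is at or before the next job's start:

```python
import heapq

def min_resources(jobs):
    """Interval partitioning: return the number of resources (classrooms)
    needed so that no two jobs on the same resource overlap."""
    finish_heap = []                       # earliest finish time on each resource
    for start, finish in sorted(jobs):     # process jobs in order of start time
        if finish_heap and finish_heap[0] <= start:
            heapq.heapreplace(finish_heap, finish)   # reuse a freed resource
        else:
            heapq.heappush(finish_heap, finish)      # open a new resource
    return len(finish_heap)
```

The heap size never exceeds the depth of the intervals, matching the optimality claim above.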

Section 4.2 (Minimize Lateness)

For this problem, we have a single resource at our disposal (as in the Interval Scheduling problem). All jobs have deadlines instead of start and end times, and each requires a certain amount of unbroken time to complete. Therefore, each job must be scheduled in a non-overlapping interval. We try to minimize the maximum amount of time any job finishes past its deadline (its lateness).

To solve this problem with a greedy algorithm, we simply sort the jobs by deadline and run them in that order with no idle time. The intuition is to make sure jobs with earlier deadlines get completed sooner than jobs with later deadlines.
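A minimal sketch of this earliest-deadline-first rule, assuming each job is given as a (length, deadline) pair (a representation chosen here for illustration):

```python
def schedule_min_lateness(jobs):
    """jobs: list of (length, deadline) pairs. Schedule jobs back to back
    in order of deadline; return the schedule and its maximum lateness."""
    time, max_late = 0, 0
    schedule = []
    for length, deadline in sorted(jobs, key=lambda job: job[1]):
        start, time = time, time + length          # run the job with no idle time
        schedule.append((start, time, deadline))
        max_late = max(max_late, time - deadline)  # lateness = finish - deadline
    return schedule, max_late
```

For example, jobs [(3, 2), (1, 5)] are run as (0, 3) and (3, 4), and the first job finishes 1 unit late, so the maximum lateness is 1.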

This section proves that the greedy algorithm is optimal by establishing four essential facts about the problem. First, we prove that there is an optimal schedule with no idle time. Next, we prove that all schedules with no inversions (a job with an earlier deadline being scheduled AFTER a job with a later one) and no idle time have the same maximum lateness; this is a direct proof showing that the ordering of jobs sharing the same deadline does not affect the maximum lateness. Then, we prove that there is an OPTIMAL schedule with no inversions and no idle time, via an exchange argument: swapping an adjacent inverted pair of jobs never increases the maximum lateness. To conclude the section, we prove that the schedule produced by our greedy algorithm has the optimal maximum lateness. This proof builds on the ones above: because there is an optimal schedule with no inversions and no idle time, and all such schedules have the same maximum lateness, the schedule our algorithm returns MUST be optimal.

Section 4.4 (Shortest Path)

This section uses greedy algorithms to solve the Shortest Path problem. We are given a starting node s and a graph of nodes and edges. Each edge has a length designating the cost of traversing it. Our goal is to find the least-costly path from s to another given node t.

Edsger Dijkstra proposed a greedy solution to the Shortest Path problem in 1959. His algorithm actually finds the shortest path from s to every other node in the graph, not just t; once these are known, the overall least-costly route from s to t follows immediately.

The specific algorithm is as follows: first, we examine all edges leaving s and record the cost of reaching each neighbor. We maintain a priority queue of nodes, keyed by the cheapest known cost of reaching each one. We repeatedly extract the node with the smallest key and explore its edges, all the while keeping track of the cost to get to each individual node. Whenever we find a less costly path to a node, we update (decrease) its key. After all shortest paths have been found, we can easily read off the shortest path from s to any end node t.

By using a heap-based priority queue, this algorithm can run in O(m log n) time.
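A heap-based sketch of the algorithm in Python; the adjacency-list representation (node → list of (neighbor, cost) pairs) is an assumption for illustration. This version pushes duplicate heap entries and skips stale ones, a common substitute for an explicit ChangeKey operation:

```python
import heapq

def dijkstra(graph, s):
    """graph: dict mapping node -> list of (neighbor, edge_cost) pairs.
    Returns a dict of shortest-path distances from s to every reachable node."""
    dist = {s: 0}
    heap = [(0, s)]                         # priority queue keyed by tentative cost
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):   # stale entry; a cheaper path was found
            continue
        for v, cost in graph.get(u, []):
            new_d = d + cost
            if new_d < dist.get(v, float("inf")):
                dist[v] = new_d             # found a less costly path to v
                heapq.heappush(heap, (new_d, v))
    return dist
```

Recording each node's predecessor alongside its distance would also recover the paths themselves, not just their costs.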

Section 4.5 (Minimum Spanning Tree)

This section deals with minimum spanning trees and the various algorithms that can produce one.

The first algorithm is called Kruskal's Algorithm. This algorithm starts with all of the nodes and no edges. Then, the algorithm adds the least costly edges first, unless adding an edge would create a cycle. In that case, the edge is not added and the next least-costly edge is considered. The graph returned is a minimum spanning tree.
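A sketch of Kruskal's Algorithm in Python, using a simple union-find structure (previewed in Section 4.6) to detect cycles; the (cost, u, v) edge representation and 0-indexed nodes are assumptions for illustration:

```python
def kruskal(n, edges):
    """edges: list of (cost, u, v) tuples over nodes 0..n-1.
    Add cheapest edges first, skipping any edge that would create a cycle."""
    parent = list(range(n))                # union-find: each node starts alone

    def find(x):
        while parent[x] != x:              # walk up to the component's root,
            parent[x] = parent[parent[x]]  # compressing the path as we go
            x = parent[x]
        return x

    mst = []
    for cost, u, v in sorted(edges):       # consider least costly edges first
        ru, rv = find(u), find(v)
        if ru != rv:                       # u and v are in different components,
            parent[ru] = rv                # so this edge cannot create a cycle
            mst.append((cost, u, v))
    return mst
```

An edge joining two nodes already in the same component is exactly one that would close a cycle, which is why the union-find check suffices.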

The second algorithm is called Prim's Algorithm. This algorithm is similar to Dijkstra's algorithm, but slightly less complex. We begin with a single root node, then add the least costly edge leaving that node, bringing a new node into the tree. We repeat this process, each time adding the least costly edge connecting any explored node to an unexplored node. We repeat until every node has been explored, and the returned set of edges will be a minimum spanning tree.
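A heap-based sketch of Prim's Algorithm; as with the Dijkstra sketch, the adjacency-list representation (node → list of (neighbor, cost) pairs) is an assumption:

```python
import heapq

def prim(graph, root):
    """graph: dict node -> list of (neighbor, cost) pairs; graph is connected.
    Grow a tree from root, always attaching the cheapest edge that connects
    an explored node to an unexplored one."""
    explored = {root}
    heap = [(cost, root, v) for v, cost in graph[root]]
    heapq.heapify(heap)                     # edges leaving the tree, cheapest first
    tree = []
    while heap:
        cost, u, v = heapq.heappop(heap)
        if v in explored:
            continue                        # edge now leads inside the tree; skip it
        explored.add(v)
        tree.append((u, v, cost))
        for w, c in graph[v]:
            if w not in explored:
                heapq.heappush(heap, (c, v, w))
    return tree
```

The resemblance to Dijkstra's algorithm is visible here: both repeatedly extract the minimum-key element from a heap, but Prim's keys each node by a single edge cost rather than a cumulative path cost.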

The final algorithm resembles Kruskal's Algorithm done in reverse. For this algorithm, we are given a connected graph and delete the most expensive edge. We repeat this process, making sure each time that deleting the edge does not disconnect the graph. The remaining graph will be a minimum spanning tree.
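This reverse-delete idea can be sketched directly, with a breadth-first search standing in for the connectivity check (an inefficient but simple choice; the edge representation is again an assumption):

```python
def connected(n, edges):
    """Check that all n nodes are reachable from node 0 via the given edges."""
    adj = {i: [] for i in range(n)}
    for _, u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, frontier = {0}, [0]
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return len(seen) == n

def reverse_delete(n, edges):
    """edges: list of (cost, u, v) over nodes 0..n-1; graph assumed connected.
    Delete the most expensive edge whenever removal leaves the graph connected."""
    kept = sorted(edges, reverse=True)
    for edge in sorted(edges, reverse=True):   # try costliest edges first
        candidate = [e for e in kept if e != edge]
        if connected(n, candidate):            # safe to delete this edge
            kept = candidate
    return kept
```

Each surviving edge is one whose removal would disconnect the graph at the time it was considered, which is what forces the result to be a spanning tree.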

The section then proves correctness and optimality of the algorithms, which can be found on pages 144-149.

The section concludes with the implementation and runtime of Kruskal's and Prim's Algorithms. Using a heap-based priority queue to hold the nodes, we can achieve a running time of O(m log n): the ExtractMin and ChangeKey operations each take O(log n) time, and each is performed at most m times, yielding O(m log n) overall.

Section 4.6 (Union-Find Structure)

Section 4.7 (Clustering)

courses/cs211/winter2018/journals/hornsbym/chapter_4.1520303759.txt.gz · Last modified: by hornsbym