Chapter 4
4.1 Interval Scheduling
The motivation for interval scheduling is to schedule as many jobs as possible with no overlap between the intervals of time for each job. When designing a greedy algorithm for this problem, we base the algorithm on one simple rule, e.g. selecting jobs that start the earliest, or jobs with the smallest intervals. In this problem, the simple rule that produces an optimal solution is selecting jobs that finish the earliest. The greedy algorithm looks like this:
R = set of all jobs, A = empty set
while R is not empty:
    choose the job i in R with the smallest finishing time
    add i to A
    delete all jobs from R that are not compatible with i
return A
In this algorithm, we simply add the job i that ends as soon as possible, then remove all jobs that conflict with i, repeating until no jobs remain. After sorting the jobs by finishing time, the selection loop takes only O(n), so the overall run time is O(nlogn). We prove this algorithm returns a globally optimal solution using a Greedy Stays Ahead proof: by induction, the r-th job greedy selects finishes no later than the r-th job of any optimal set O. Then, by contradiction, if O had more jobs scheduled than A, the extra job would be compatible with everything in A, so greedy would have added some further job instead of stopping. Thus, A must contain as many jobs as O.
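The pseudocode above can be written as a short Python sketch. One assumption here: rather than explicitly deleting incompatible jobs from R, the loop just skips any job that starts before the last selected job finishes, which is equivalent and makes the single pass O(n) after sorting.

```python
# Greedy interval scheduling sketch: jobs are (start, finish) tuples.
def interval_schedule(jobs):
    selected = []
    last_finish = float("-inf")
    # Consider jobs in order of finishing time (the greedy rule).
    for start, finish in sorted(jobs, key=lambda j: j[1]):
        if start >= last_finish:  # compatible with everything chosen so far
            selected.append((start, finish))
            last_finish = finish
    return selected
```

For example, `interval_schedule([(0, 3), (2, 5), (4, 7), (1, 8)])` selects `(0, 3)` and `(4, 7)`, skipping the two jobs that overlap them.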
A similar problem is the Interval Partitioning Problem, where we want to schedule all lectures in as few classrooms as possible. An important note about this problem is that the number of classrooms needed is at least the depth of the set of lecture intervals (the maximum number of intervals overlapping at any single point in time), and the greedy algorithm in fact achieves exactly this depth. Overall, the readability of this section was 7/10.
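A minimal sketch of interval partitioning, assuming lectures are (start, finish) tuples: process lectures by start time, reuse the classroom that frees up earliest if it is available, and open a new room only when every existing room is busy. The number of rooms opened then equals the depth.

```python
import heapq

def partition_intervals(lectures):
    free = []           # min-heap of (finish_time, room) for rooms in use
    assignment = []     # (lecture, room) pairs
    num_rooms = 0
    for start, finish in sorted(lectures):       # by start time
        if free and free[0][0] <= start:
            _, room = heapq.heappop(free)        # reuse the earliest-freed room
        else:
            room = num_rooms                     # all rooms busy: open a new one
            num_rooms += 1
        heapq.heappush(free, (finish, room))
        assignment.append(((start, finish), room))
    return num_rooms, assignment
```

On `[(0, 3), (1, 4), (2, 5), (3, 6)]` the depth is 3 (the first three lectures all overlap), and the sketch opens exactly 3 rooms, with the fourth lecture reusing the room freed at time 3.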
4.2 Scheduling to Minimize Lateness
The motivation for this problem is to schedule jobs with corresponding deadlines so as to minimize the maximum lateness. Jobs are allowed to be late (scheduled so that they will end past their deadline), but the largest lateness must be as small as possible. Again, we must design a greedy algorithm based on a simple rule, e.g. schedule the longest jobs first. The best rule for this problem is earliest deadline first. Algorithm:
order the jobs in ascending order of their deadlines
f = s  (the common start time)
for each job i = 1, ..., n:
    assign job i to the time interval from s(i) = f to f(i) = f + t(i)
    let f = f + t(i)
return the set of scheduled intervals [s(i), f(i)] for i = 1, ..., n
The schedule produced has no gaps between jobs. On a different note, an inversion occurs in a schedule when a job i with deadline d(i) is scheduled before a job j with d(j) < d(i); by construction, the greedy schedule has no inversions. To prove the greedy algorithm is optimal, we do an exchange proof: swapping adjacent inverted pairs in an optimal schedule O never increases the maximum lateness, and after all swaps we obtain a schedule identical to the one our greedy algorithm would have produced, so greedy is optimal too. Overall, the readability of this section is 6/10.
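The earliest-deadline-first algorithm above can be sketched in a few lines of Python. An assumption here: jobs are given as (length t, deadline d) pairs and the common start time s is 0; the function also tracks the maximum lateness, since that is the quantity being minimized.

```python
# Earliest-deadline-first sketch: jobs are (length, deadline) pairs, start at s = 0.
def schedule_min_lateness(jobs):
    f = 0
    intervals, max_lateness = [], 0
    for t, d in sorted(jobs, key=lambda j: j[1]):   # ascending deadline
        s_i, f_i = f, f + t                         # job runs on [s_i, f_i]
        intervals.append((s_i, f_i))
        max_lateness = max(max_lateness, f_i - d)   # lateness is 0 if on time
        f = f_i                                     # no gap before the next job
    return intervals, max_lateness
```

For example, with jobs `[(3, 2), (1, 5)]` the deadline-2 job runs first on [0, 3] and is 1 late, the other runs on [3, 4] and is on time, so the maximum lateness is 1.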
4.4 Shortest Paths in Graphs
The motivation is simple: to find the shortest path between two points s and t. The greedy algorithm for this problem is Dijkstra's algorithm. Within the algorithm, we use a set of explored nodes, an array to keep track of distances, and a priority queue to keep track of unexplored nodes in order to determine the shortest path between any given two nodes. The textbook's initial description of the algorithm does not utilize a priority queue, however. As we discussed in class, the run time of Dijkstra's algorithm with a heap-based priority queue is O(mlogn). Is this run time more or less efficient than the run time of the algorithm within the textbook? Overall, the readability of this section is 7/10.
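A minimal heap-based Dijkstra sketch, assuming the graph is a dict mapping each node to a list of (neighbor, weight) pairs with non-negative weights; these names are illustrative, not the textbook's notation. Stale heap entries are skipped rather than decreased in place, which is the idiomatic way to use Python's heapq.

```python
import heapq

def dijkstra(graph, s):
    dist = {s: 0}
    pq = [(0, s)]               # priority queue of (tentative distance, node)
    explored = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in explored:
            continue            # stale entry; u was already finalized
        explored.add(u)
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist
```

On the graph `{'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('t', 6)], 'b': [('t', 3)]}`, the shortest distance from s to t is 6, going through a and b rather than taking either direct-looking route.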
4.5 The Minimum Spanning Tree Problem
Motivation: We have a set of nodes and we want to build a connected graph with these nodes as cheaply as possible. The edges of this graph will have non-negative weights. There are three optimal algorithms that can solve this problem: Prim's Algorithm, Reverse-Delete Algorithm, and Kruskal's Algorithm. Prim's Algorithm is very similar to Dijkstra's; it starts with a root node then greedily grows a tree outwards, at each step adding the node reachable by the cheapest edge out of the tree. Reverse-Delete Algorithm orders edges in decreasing value and deletes the next largest edge so long as it does not disconnect the graph. Kruskal's algorithm does almost the opposite of Reverse-Delete, ordering edges in increasing value and adding the next cheapest edge so long as it does not create a cycle. The correctness of these algorithms depends on the proofs of the Cut Property and the Cycle Property. Prim's Algorithm runs in O(mlogn) time. Readability: 7/10.
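A sketch of Prim's Algorithm using a heap of candidate edges, assuming the same adjacency-dict representation as above (node -> list of (neighbor, weight)). Each edge is pushed and popped at most once, giving the O(mlogn) bound.

```python
import heapq

def prim_mst(graph, root):
    in_tree = {root}
    # Candidate edges leaving the tree, keyed by weight.
    edges = [(w, root, v) for v, w in graph[root]]
    heapq.heapify(edges)
    mst, total = [], 0
    while edges and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(edges)
        if v in in_tree:
            continue                      # both endpoints already in the tree
        in_tree.add(v)                    # cheapest edge crossing the cut
        mst.append((u, v, w))
        total += w
        for x, wx in graph[v]:
            if x not in in_tree:
                heapq.heappush(edges, (wx, v, x))
    return mst, total
```

On a triangle with edge costs a-b: 1, b-c: 2, a-c: 4, the tree keeps the two cheap edges for a total cost of 3, consistent with the Cycle Property (the most expensive edge of the cycle is excluded).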
4.6 Implementing Kruskal's Algorithm
In order to implement Kruskal's Algorithm, we require a new data structure altogether. This specialized data structure is called Union-Find. It allows for rapid searching and updating of a set of connected components within a graph (disjoint sets). It supports 3 operations: MakeUnionFind(S) - O(n), Find(u) - O(logn), and Union(A,B) - O(1). It can be implemented using pointers, one for each node within the set S. With this data structure, we can implement Kruskal's algorithm in O(mlogn) time, since we must first sort the edges of the graph by increasing value. Overall, the readability of this section is 7/10.
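A sketch of the pointer-based Union-Find and Kruskal's algorithm on top of it. Assumptions: edges are (weight, u, v) triples; union-by-size keeps the pointer chains short, which is what gives Find its O(logn) cost, and Union itself is O(1) once the two roots are in hand.

```python
class UnionFind:
    def __init__(self, nodes):            # MakeUnionFind: O(n)
        self.parent = {v: v for v in nodes}
        self.size = {v: 1 for v in nodes}

    def find(self, u):                    # O(log n): follow parent pointers
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, a, b):                # O(1) given the two roots a, b
        if self.size[a] < self.size[b]:
            a, b = b, a                   # point the smaller set at the larger
        self.parent[b] = a
        self.size[a] += self.size[b]

def kruskal(nodes, edges):
    uf = UnionFind(nodes)
    mst = []
    for w, u, v in sorted(edges):         # the O(m log n) sort dominates
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                      # adding (u, v) creates no cycle
            uf.union(ru, rv)
            mst.append((u, v, w))
    return mst
```

On the same triangle as before, `kruskal(['a', 'b', 'c'], [(1, 'a', 'b'), (2, 'b', 'c'), (4, 'a', 'c')])` accepts the two cheap edges and rejects the third, which would close a cycle.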
4.7 Clustering
Motivation: We wish to organize data into coherent groups. One such way to categorize data, in the context of geographical points, is differentiating clusters by distance. This gives way to the Clusterings of Maximum Spacing problem. Given a set U of objects, we divide U into k groups. The spacing of a k-clustering is defined as the minimum distance between any pair of points lying in different clusters; maximizing the spacing therefore means choosing the k groups so that the two closest points in different groups are as far apart as possible. The algorithm is similar to the Minimum Spanning Tree algorithms, specifically Kruskal's: merge the closest pair of clusters repeatedly, stopping once exactly k clusters remain. Readability: 5/10.
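A sketch of maximum-spacing k-clustering as Kruskal's algorithm stopped early, assuming pairwise distances are given as (distance, p, q) triples. A simple parent-pointer structure (as in the Union-Find above) tracks the components.

```python
def max_spacing_clusters(points, dists, k):
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    components = len(points)
    for d, p, q in sorted(dists):         # Kruskal's order: closest pairs first
        if components == k:
            break                         # stop with exactly k clusters left
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rq] = rp               # merge the two closest clusters
            components -= 1
    clusters = {}
    for p in points:                      # group points by their representative
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())
```

For points 1, 2, 10, 11 on a line with k = 2, the cheap merges 1-2 and 10-11 happen first and the algorithm stops before bridging the big gap, yielding the clusters {1, 2} and {10, 11}.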
4.8 Huffman Codes
Our problem is to encode symbols (letters of the alphabet) into bits. However, there are two important stipulations we must meet. First, we want our encoding to be as concise as possible (minimize ABL, the average bits per letter). This means that frequently used letters like e, t, a, o, and i should have shorter encodings than less frequently used letters. Second, there should be no ambiguity in our encoding. To meet these two requirements, we use Optimal Prefix Codes, where no encoded letter is a prefix of any other and the length of each encoding is inversely related to the letter's frequency. An algorithm that creates optimal prefix codes would utilize a binary tree. More specifically, we would take the two lowest-frequency letters, make them leaf nodes in the tree, have their parent be their combined frequencies, then recursively repeat for the remaining letters. Huffman's Algorithm:
if S has two letters:
    encode one letter as 0 and the other letter as 1
else:
    let y* and z* be the two lowest-frequency letters
    form a new alphabet S' by deleting y* and z* and replacing
        them with a new letter w of frequency f(y*) + f(z*)
    recursively construct a prefix code for S' with tree T'
    define a prefix code for S as follows:
        start with T'
        take the leaf labeled w and add two children below it
            labeled y* and z*
The run time of this algorithm can be O(klogk) when it uses a priority queue, where k is the number of letters in the alphabet. The readability of this section is 6/10.
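The recursion above can be sketched iteratively with a priority queue, which is where the O(klogk) bound comes from. One implementation detail that is an artifact of Python rather than the algorithm: a tie-breaking counter keeps the heap from ever comparing the dict payloads.

```python
import heapq
from itertools import count

def huffman(freqs):
    tie = count()  # tie-breaker so equal frequencies never compare the dicts
    heap = [(f, next(tie), {letter: ""}) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # the two lowest-frequency "letters"
        f2, _, c2 = heapq.heappop(heap)
        # Merging = hanging both subtrees under a new parent w of freq f1 + f2,
        # which prepends one more bit to every code below it.
        merged = {x: "0" + code for x, code in c1.items()}
        merged.update({x: "1" + code for x, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

With frequencies a: 5, b: 2, c: 1, the rare letters b and c get 2-bit codes and the common letter a gets a 1-bit code, and no code is a prefix of another.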
