Chapter 4: Greedy Algorithms

Introduction

This section opens by discussing some of the merits of being ‘greedy’. It defines a greedy algorithm as one that builds a solution in small steps, each step making a decision chosen to optimize some local criterion. Various greedy algorithms can be created for the same problem. When a greedy algorithm succeeds, it suggests that there is a certain ‘rule’ governing the decision at each step that leads to an optimal solution. While it is simple to actually create a greedy algorithm, proving that it produces an optimal solution is much more difficult. There are various approaches to analyzing greedy algorithms, which will be discussed in the other sections of this chapter. I would give this section a 10/10 readability as it was very short and a good introduction to what this chapter is talking about.

4.1: Interval Scheduling: The Greedy Algorithm Stays Ahead

This section goes over the Greedy Algorithm Stays Ahead approach and demonstrates it using the Interval Scheduling problem. Overall I would give it an 8/10 for readability. This is exactly what we did in class, so this section was very much a review. The interval scheduling problem can be described as a set of n tasks, where task i starts at time s(i) and finishes at time f(i). A subset of tasks is compatible if no two tasks in the subset overlap in time, and we are looking for the largest compatible subset possible. Compatible sets of the largest possible size are considered optimal.

To develop a greedy algorithm, we want to come up with a rule for selecting our first task. We would then reject all other tasks that are not compatible with the chosen task, look at the next compatible task, and again reject the tasks that conflict with it. This process is repeated until we have analyzed all tasks for compatibility. The difficult part in developing this greedy algorithm is not the process itself; it is choosing the rule that will give us an optimal solution. To find this rule, we consider some of the most obvious methods of picking a task. One rule is to pick the task with the earliest start time. This does not yield an optimal solution, as a task that starts early might span a long interval and therefore conflict with many others. By starting with the task that takes the shortest interval of time, we still might not get the optimal solution. We can find the optimal solution with the rule of accepting the task that finishes earliest, which leaves as much of the resource as possible free for the remaining tasks. This allows us to maximize our subset of tasks.
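To make this concrete, here is a minimal Python sketch of the earliest-finish-time rule. The function name and the (start, finish) pair format are my own illustration, not from the book:

```python
# Earliest-finish-time greedy for interval scheduling.
# Intervals are (start, finish) pairs; the names here are illustrative.

def interval_schedule(intervals):
    # Sort by finish time: the core greedy rule from this section.
    intervals = sorted(intervals, key=lambda iv: iv[1])
    chosen = []
    last_finish = float("-inf")
    for start, finish in intervals:
        # Accept the interval only if it is compatible with the last one chosen.
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# Example: picks (1, 3), (4, 6), (7, 9) rather than the long interval (2, 8).
print(interval_schedule([(1, 3), (2, 8), (4, 6), (7, 9)]))
```

Sorting dominates the work here, which matches the O(n log n) runtime discussed below.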

While this earliest-finish-time rule is optimal, that is not clear just from looking at the problem, so we need to prove it. We consider an optimal set of intervals, which we can call O, and show that our algorithm “stays ahead” of it: partial solutions of the greedy algorithm are compared to the corresponding prefixes of O, and we prove by induction that the r-th interval the greedy algorithm chooses finishes no later than the r-th interval of O. We then want to show that the size of our subset equals the size of the optimal solution’s subset. To show why staying ahead implies the optimality of the greedy subset, we use a proof by contradiction: if O contained more intervals than our subset, the greedy algorithm would still have had room to accept another one. In this scenario, our algorithm has a runtime of O(n log n). This is because we must initially sort the requests in order of their finish time, which takes O(n log n) time. It then takes O(n) time to iterate through the intervals and construct the solution.

A similar problem is where we have several resources and want to schedule all tasks using as few resources as possible. This is also similar to what we covered in class. To demonstrate this problem, we can think of each task as a lecture that needs to be scheduled in a classroom for an interval of time. The depth of a set of intervals is the maximum number of intervals that overlap at any single point in time, so the number of resources needed is at least the depth. One might worry that other, longer-range conflicts could force the use of additional resources beyond the depth, but for intervals this never happens: we can design a greedy algorithm that creates a schedule using a number of resources equal to the depth. In this case, we use the start times of the intervals to organize them. We then assign each interval a ‘label’ (a resource) that hasn’t already been assigned to any previous interval that overlaps it.
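Here is a short sketch of that labeling idea in Python, again assuming (start, finish) pairs. The min-heap of finish times is an implementation convenience of mine for finding a free room quickly, not something the book prescribes:

```python
import heapq

# Greedy interval partitioning: sort lectures by start time and assign each to
# any room that is already free, opening a new room only when none is free.
# A min-heap keyed on each room's current finish time makes the check fast.

def partition_intervals(intervals):
    intervals = sorted(intervals)          # order by start time
    free_check = []                        # heap of (finish_time, room_label)
    num_rooms = 0
    assignment = []
    for start, finish in intervals:
        if free_check and free_check[0][0] <= start:
            _, label = heapq.heappop(free_check)   # reuse a freed room
        else:
            label = num_rooms                      # open a new room
            num_rooms += 1
        assignment.append(((start, finish), label))
        heapq.heappush(free_check, (finish, label))
    return num_rooms, assignment

# Three lectures overlap at time 2.5, so three rooms (the depth) are needed.
rooms, plan = partition_intervals([(0, 3), (1, 4), (2, 5), (4, 7)])
print(rooms, plan)
```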

4.2: Scheduling to Minimize Lateness: An Exchange Argument

We now look at greedy algorithms through the exchange argument. Here, we go back to the use of a single resource and a set of n tasks that each want to use the resource for a certain interval of time. Now each request i has a deadline d(i), and requires a contiguous period of length t(i) to complete. We want to schedule the tasks to minimize lateness. This problem seems much easier to think about intuitively, as it reminds me of scheduling things to be completed in general, like homework. Just as in the solution, when I do my homework it is most logical to complete items that are due first, rather than doing what takes the shortest amount of time or using some other method.

A request is considered late if it finishes after its deadline, and its lateness is the amount of time by which it overruns. Here, we want to minimize the maximum lateness. As already mentioned, the optimal rule here is pretty intuitive, schedule based on earliest deadline first; however, the proof isn’t so simple. First, we order the jobs by their deadline. Then we start each job as soon as the previous one completes, so the greedy schedule has no gaps in time; it can also be shown that there is an optimal schedule with no idle time. Considering an optimal schedule O, we want to modify O one step at a time, eventually transforming it into a schedule identical to our result from the greedy algorithm without ever increasing the maximum lateness. This is an exchange argument. An inversion occurs when a job is scheduled before another job that has an earlier deadline. By definition, our greedy schedule has no inversions. We can then show that all schedules with no inversions and no idle time have the same maximum lateness, and that there is an optimal schedule with no inversions and no idle time. Swapping the two jobs in an inversion of O removes the inversion without increasing the maximum lateness, so repeating this transforms O into our greedy schedule, and the schedule obtained by the greedy algorithm is optimal.
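As a concrete sketch, here is the earliest-deadline-first schedule in Python, under the assumption that every job is available at time 0 and is given as a (length, deadline) pair; the names are illustrative:

```python
# Earliest-deadline-first schedule for minimizing maximum lateness.
# Each job is a (length, deadline) pair; all jobs are available at time 0.

def edf_schedule(jobs):
    jobs = sorted(jobs, key=lambda job: job[1])   # order by deadline
    time = 0
    max_lateness = 0
    schedule = []
    for length, deadline in jobs:
        start, time = time, time + length         # run back-to-back: no idle time
        schedule.append((start, time, deadline))
        max_lateness = max(max_lateness, time - deadline)
    return schedule, max_lateness

# Example: jobs (1, 2), (2, 4), (3, 6) finish at times 1, 3, 6 with no lateness.
print(edf_schedule([(3, 6), (1, 2), (2, 4)]))
```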

This problem can also be extended to scenarios where each task additionally has an earliest possible start time that we must respect. Readability was 7/10.

4.4: Shortest Paths in a Graph

Greedy algorithms can also be applied to graphs, for finding shortest paths and for constructing minimum-cost spanning trees. Since graphs are used to model networks, we often want to find the shortest path between nodes in a graph in the most efficient manner. Given a starting node s, what is the shortest path from s to each other node? Edsger Dijkstra used a simple greedy algorithm to solve this problem. The algorithm finds the length of the shortest path from s to every other node, which then allows it to produce the paths as well. It maintains a set of “explored” vertices for which the shortest-path distance from s has already been determined. At each step it considers paths that travel through the explored region and then cross a single edge to an unexplored node, chooses the node for which the length of such a path is minimized, and adds that node to the explored region. We can prove by induction that this algorithm finds the shortest path from s to each node it adds, and that it will have included all reachable nodes when it terminates.

For Dijkstra’s algorithm to work, the edge lengths must be nonnegative. It can also be observed that it is a “continuous version” of breadth-first search. Now, we look at the runtime of this algorithm. We know the main loop executes n−1 times for a graph with n nodes. If we find the next node to add by checking each edge, each iteration takes O(m) time, for O(mn) time overall. The use of additional data structures, however, can improve this runtime. If we use a priority queue to hold the unexplored nodes, keyed by their current distance estimates, the algorithm performs O(m) priority-queue operations; with a heap-based priority queue each operation takes O(log n) time, so the overall time for this algorithm is O(m log n).
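A minimal heap-based version in Python might look like the following. The adjacency-dict graph format and names are assumptions for the sketch, and stale heap entries are simply skipped rather than using a decrease-key operation:

```python
import heapq

# Heap-based Dijkstra's algorithm. The graph is an adjacency dict
# {node: [(neighbor, length), ...]} with nonnegative edge lengths.

def dijkstra(graph, s):
    dist = {s: 0}
    heap = [(0, s)]                       # (distance estimate, node)
    explored = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in explored:
            continue                      # stale entry; node already settled
        explored.add(u)
        for v, length in graph.get(u, []):
            nd = d + length
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist                           # shortest-path distance from s

graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 6)], "b": [("t", 3)]}
print(dijkstra(graph, "s"))               # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```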

4.5: The Minimum Spanning Tree Problem

This section applies an exchange argument to solve the Minimum Spanning Tree Problem. The problem occurs when there is a set of locations that can be represented as nodes and we want to create a network on top of those locations. The network should be connected, having a path between each pair of locations, with the goal of creating the “cheapest” network possible. Any pair of locations can be linked together for a specified, positive cost, and the links can be represented through edges on the graph. To create the “cheapest” network, we want to keep the total cost of the edges we build as small as possible. Here, we can assume that the underlying graph is connected.

We know that the minimum-cost solution to this problem must be a tree (V, T), where V is the set of locations and T is the set of edges in the solution. (V, T) must be connected by definition, so we need to show that it does not contain a cycle. To prove that there are no cycles, we use a proof by contradiction. Suppose the solution contains a cycle and consider some edge on it; the network without that edge is still connected, as any path using that edge can be rerouted around the rest of the cycle instead. The result still solves the problem and is less expensive, which gives us a contradiction. A set of edges that forms a tree reaching every node is called a spanning tree of the graph.

There are three greedy algorithms that each create a minimum spanning tree. The first of these is Kruskal's Algorithm, which starts without edges, building the spanning tree by inserting edges in order of increasing cost. Edges are only inserted as long as they do not create a cycle. The next algorithm is similar to Dijkstra's Algorithm for paths: starting with a root node s, at each step add the node that can be attached to the current tree as cheaply as possible. This is known as Prim's Algorithm. The third algorithm starts with the full graph and deletes edges in order of decreasing cost, as long as we do not disconnect the current graph. This is the Reverse-Delete Algorithm.

Consider a subset S of the nodes that is neither empty nor equal to all of V. The cut property says it is “safe” to include an edge in the minimum spanning tree when it is the cheapest edge with one end in S and the other in V−S: every minimum spanning tree contains that edge (assuming, as the book does for simplicity, that all edge costs are distinct). It does seem slightly confusing that this is considered the “cut” property, as one would think the “cut” property would occur when you want to get rid of an edge (cut it). The cut property helps us prove the optimality of Kruskal's and Prim's Algorithms: both algorithms only include an edge when it is justified by the cut property, and both produce a minimum spanning tree of the graph. The cycle property, in contrast, helps us guarantee that an edge is not in the minimum spanning tree. The cycle property is as follows: if C is a cycle in G, and e is the most expensive edge belonging to C, then e does not belong to any minimum spanning tree of G. This property helps us prove the optimality of the Reverse-Delete Algorithm. Using these two properties, we can say that any algorithm that builds a spanning tree by repeatedly including edges when justified by the cut property and deleting edges when justified by the cycle property produces a minimum spanning tree.

We can implement both Kruskal's and Prim's Algorithms in O(m log n) time.
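As an illustration, here is a heap-based sketch of Prim's Algorithm in Python, structurally very close to the Dijkstra sketch above; the adjacency-list format and names are my assumptions. (A Kruskal sketch appears in the next section, once union-find is available.)

```python
import heapq

# Heap-based Prim's algorithm: repeatedly attach the node that can be
# reached from the current tree by the cheapest edge.
# Graph format is {node: [(neighbor, cost), ...]}.

def prim(graph, root):
    in_tree = set()
    tree_edges = []
    heap = [(0, root, None)]              # (cost, node, parent in tree)
    while heap:
        cost, u, parent = heapq.heappop(heap)
        if u in in_tree:
            continue                      # stale entry; node already attached
        in_tree.add(u)
        if parent is not None:
            tree_edges.append((parent, u, cost))
        for v, c in graph.get(u, []):
            if v not in in_tree:
                heapq.heappush(heap, (c, v, u))
    return tree_edges

graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 6)],
    "c": [("a", 4), ("b", 2), ("d", 3)],
    "d": [("b", 6), ("c", 3)],
}
print(prim(graph, "a"))   # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 3)]
```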

Overall, this section was difficult for me to follow. I had a difficult time understanding some of this during class, and the textbook reading did not help very much. Overall readability was about a 4 out of 10.

4.6: Implementing Kruskal's Algorithm: The Union-Find Data Structure

The Union-Find data structure stores a representation of connected components in a way that allows us to quickly and efficiently search and update these components. This turns out to be exactly the data structure needed to implement Kruskal's Algorithm. It allows us to maintain disjoint sets. Find(u) returns the name of the set containing node u. Union(A, B) takes two sets and merges them into a single set. When we consider adding an edge to the graph, first we must test whether its two endpoints are already in the same connected component. If they are not, then we perform a Union to combine the two components into one. This only works for adding edges, not deleting them.

In the simplest implementation, we create an array Component with n elements, where Component[s] is the name of the set containing s; to initialize the union-find, we place each element in its own set. Find is then O(1), but a Union may have to relabel many elements. We can optimize by choosing the name of the larger set as the name of the union, so that only the elements of the smaller set are relabeled; a single Union still takes O(n) time in the worst case, but any sequence of k Union operations takes at most O(k log k) time. We can improve the structure further by using pointers, with each set represented as a tree of pointers up to its representative. The pointer version implements Union in O(1) time, but Find is no longer constant; rather, it takes O(log n) time following pointers up to the root.
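A small Python sketch of the pointer-based structure with union-by-size might look like this; the class name and interface are my own illustration, not the book's notation:

```python
# Pointer-based union-find with union by size: Union is O(1) pointer work
# after two Finds, and Find follows at most O(log n) pointers, because a
# node's set at least doubles in size each time its root changes.

class UnionFind:
    def __init__(self, elements):
        self.parent = {x: x for x in elements}   # each element starts alone
        self.size = {x: 1 for x in elements}

    def find(self, x):
        while self.parent[x] != x:               # follow pointers to the root
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:        # point the smaller set's
            ra, rb = rb, ra                      # root at the larger set's
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

uf = UnionFind(range(6))
uf.union(0, 1); uf.union(2, 3); uf.union(1, 3)
print(uf.find(0) == uf.find(2))   # True: {0, 1, 2, 3} is one component
```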

Using the union-find data structure, we can implement Kruskal's Algorithm in O(m log n) time.
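Putting the pieces together, here is a sketch of Kruskal's Algorithm reusing the UnionFind class from the sketch above; the (cost, u, v) edge format is an assumption of mine:

```python
# Kruskal's algorithm: insert edges in order of increasing cost, skipping
# any edge whose endpoints are already in the same component (which is
# exactly the cycle test). Requires the UnionFind class sketched above.

def kruskal(nodes, edges):
    uf = UnionFind(nodes)
    tree = []
    for cost, u, v in sorted(edges):            # the O(m log n) sort dominates
        if uf.find(u) != uf.find(v):            # edge joins two components,
            uf.union(u, v)                      # so it cannot create a cycle
            tree.append((u, v, cost))
    return tree

edges = [(1, "a", "b"), (2, "b", "c"), (4, "a", "c"), (3, "c", "d"), (6, "b", "d")]
print(kruskal(["a", "b", "c", "d"], edges))
# [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 3)]
```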

4.7: Clustering

This section discusses clustering. Clustering can be used whenever one is attempting to organize a collection of objects into coherent groups. There are various ways to group a collection; one natural way is by distance, such as with points on a plane. Minimum spanning trees turn out to play a central part in finding clusterings of maximum spacing. If we want to divide the objects into k groups, a k-clustering is a partition of the objects into k nonempty sets. The spacing of a k-clustering is the minimum distance between any pair of points in different clusters.

To find a clustering of maximum spacing, we want to merge nearby points into the same cluster as rapidly as possible. We grow a graph edge by edge, in order of increasing distance, with connected components corresponding to clusters. Since we never add an edge between two points already in the same component, this process never creates a cycle, but rather builds a union of trees. This procedure is exactly Kruskal's Minimum Spanning Tree Algorithm, stopped just before it would merge down to fewer than k components; equivalently, the clusters are what remains of the minimum spanning tree after deleting its k−1 most expensive edges.
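Here is a sketch of that stopped-early Kruskal procedure in Python, reusing the UnionFind class from Section 4.6; the point and distance formats are assumptions of mine:

```python
from itertools import combinations

# Max-spacing k-clustering: run Kruskal's algorithm over all pairwise
# distances and stop once exactly k connected components (clusters) remain.
# Requires the UnionFind class sketched in Section 4.6.

def k_clustering(points, distances, k):
    # distances: list of (dist, p, q) triples over pairs of points.
    uf = UnionFind(points)
    components = len(points)
    spacing = None
    for dist, p, q in sorted(distances):
        if uf.find(p) == uf.find(q):
            continue                   # p and q already share a cluster
        if components == k:
            spacing = dist             # closest pair in different clusters
            break
        uf.union(p, q)
        components -= 1
    return uf, spacing

points = {"p0": 0, "p1": 1, "p2": 2, "p3": 10}
distances = [(abs(points[p] - points[q]), p, q)
             for p, q in combinations(points, 2)]
uf, spacing = k_clustering(list(points), distances, 2)
print(spacing)   # 8: the gap between cluster {p0, p1, p2} and cluster {p3}
```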

4.8: Huffman Codes and Data Compression

This chapter discusses data compression, where the problem is encoding symbols using bits. To convert various “alphabets” into language computers understand, we use bits. Typically, we use a system with a fixed number of bits per letter: 5 bits gives 2^5 = 32 distinct symbols, enough for the English alphabet and a few extra symbols. Taking Computer Organization really helps in understanding how binary and bits work, so this is a somewhat familiar topic for me. But while a fixed-length code spends exactly 5 bits on every symbol, what we really care about is the average number of bits per symbol, as some symbols will be used much more often than others. It is a waste to use so many bits on letters that are used very frequently. In fact, it makes a lot of sense to use fewer bits on letters used more frequently and more bits on those used less frequently. This is where data compression comes into play. The big question is how to find the optimal way of compressing the data.

One example of how data has been compressed is Morse code, which translates each letter into a sequence of dots and dashes. However, some letters' encodings are prefixes of others, which creates ambiguity; Morse code removes the ambiguity by adding a space between letters. If we were to turn this directly into bits, the ambiguity would remain, as certain encodings are prefixes of others. What we need instead is a prefix code, an encoding in which no codeword is a prefix of any other. Decoding such a code works as follows: start reading bits from left to right; once the bits seen so far match an encoding, output that letter, then continue with the next bit and repeat.
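That left-to-right decoding procedure is easy to sketch in Python. The example code table below is my own illustration, and it only works because no codeword is a prefix of another:

```python
# Left-to-right decoding of a prefix code: accumulate bits until they match
# a codeword, emit that letter, and start over with the next bit.

def decode(bits, code):
    decode_map = {word: letter for letter, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in decode_map:       # matched a full codeword
            out.append(decode_map[buffer])
            buffer = ""
    return "".join(out)

print(decode("010110", {"a": "0", "b": "10", "c": "11"}))   # "abca"
```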

Now, we must account for letter frequencies to construct the optimal encoding, and we can use a greedy algorithm to find the optimal prefix code. First, we can build a tree to represent the encodings. We create a binary tree where each time we go from a node to its left child we append a 0, and every time we go from a node to its right child we append a 1. The letters sit at the leaves, so the length of each symbol's encoding in bits is its depth in the tree. We want to build the tree so that the resulting code is optimal.

The first thought is to attempt to build this tree top-down: if we knew the shape of the optimal tree, we could simply fill in its leaves, labeling them in order of decreasing frequency as we move down through the tree. The key insight, however, is to build the tree bottom-up instead: repeatedly merge the two lowest-frequency letters (or subtrees) into a single node whose frequency is their sum. This is Huffman's Algorithm, which gives us a Huffman code.

Huffman's Algorithm performs k−1 merges for an alphabet of k letters. Implemented naively, each merge takes O(k) time to find the two lowest-frequency trees, which overall gives an O(k^2) runtime. However, if a priority queue is used to implement this algorithm, each merge costs O(log k), and the runtime drops down to a nice O(k log k).
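Here is a priority-queue sketch of Huffman's Algorithm in Python; the tuple-based tree representation and the tie-breaking counter are my own implementation choices:

```python
import heapq
from itertools import count

# Huffman's algorithm with a priority queue: repeatedly merge the two
# lowest-frequency trees, giving the O(k log k) runtime mentioned above.
# The counter breaks ties so heapq never compares the trees themselves.

def huffman(freqs):
    tiebreak = count()
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    code = {}

    def walk(tree, prefix):                    # left edge = 0, right edge = 1
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix
    walk(heap[0][2], "")
    return code

# Frequent letters get short codewords: here 'a' ends up with a 1-bit code.
print(huffman({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))
```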