Chapter 4: Greedy Algorithms

Section 1: Interval Scheduling: The Greedy Algorithm Stays Ahead

Sometimes greed works. An algorithm is greedy if it builds up a solution in small steps, choosing at each step the decision that myopically optimizes some underlying criterion. When a greedy algorithm gives an optimal or almost-optimal solution, it says something about the problem's underlying structure. Problem: how to find cases where greedy works and prove that it works. First way: greedy stays ahead. Second way: exchange argument. Uses: shortest paths in a graph, minimum spanning tree, Huffman codes.

Interval Scheduling: start time s(i), end time f(i) for an event i. A subset is compatible if no events overlap. Compatible sets of max size are optimal. Idea for using greedy with this: select a first request i1, rule out anything that's not compatible with i1, and repeat. What rule should you use to pick one? Some rules seem reasonable but don't work: pick the earliest start time first, pick the smallest interval first, pick the one with the fewest "conflicts" (none of those work … the book explains why and we did in class, too). Actually pick the one that finishes first (the book formalizes this). Analysis: if you have an "optimal" solution, call the first event in it O1, then O2, etc. through Om. Let the greedy algorithm's solution be A1 through Ak. Prove that k=m. Compare the start/end times of O1 to A1: A1 has to finish before or at the same time as O1. This is going to be true for every pair of corresponding A's and O's. The book goes into a more detailed explanation, but basically it works and Greedy's solution is, at every point, "ahead of" any other solution (i.e. after the same number of events, Greedy has finished earlier). Running time: if you sort the events in order of finishing time, you can get it to run in O(nlogn) time. Variants/Extensions: what if you don't know all of the events that you're trying to schedule? And what if they're weighted (i.e. the optimal soln may not be to have as many events as possible but to get as much weight as possible, e.g. people are paying for the use of a space)? Another problem: the Interval Partitioning Problem - trying to get all of the events scheduled in the least "depth" (ex. scheduling lectures in as few classrooms as possible). The book gives a greedy algorithm to solve this and shows that the number of classrooms needed is exactly the depth (the maximum number of events that overlap at any single point in time).
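Here's a quick Python sketch I made of the earliest-finish-time rule so I remember how simple it is (the function name and the (start, finish) tuple representation are mine, not the book's):

# Greedy interval scheduling: repeatedly take the compatible event
# that finishes earliest. Each event is a (start, finish) pair.
def schedule(events):
    selected = []
    last_finish = float("-inf")
    # Sorting by finish time dominates the running time: O(n log n).
    for start, finish in sorted(events, key=lambda e: e[1]):
        if start >= last_finish:  # compatible with everything chosen so far
            selected.append((start, finish))
            last_finish = finish
    return selected

# Made-up example: greedy skips (2, 5) because (1, 3) already finished first.
print(schedule([(1, 3), (2, 5), (4, 7)]))  # [(1, 3), (4, 7)]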

The arguments about how the problem can be solved locally are really interesting. It seems like such a difficult problem, and the idea that it can be solved without looking at the entire situation, but only at one part at a time, is realllllly fascinating.

This section was a little long, but it was very readable and presented really interesting problems that seem like something that could/would come up in the real world, which made it more fun to read.

Section 2: Scheduling to Minimize Lateness: An Exchange Argument

Similar problem: requests with a deadline di and a length ti; each one needs a contiguous time interval of length ti. Schedule the requests one after the other to minimize the maximum lateness (i.e. how long after its deadline a request completes). We order them by Earliest Deadline First (the book has an algorithm, but it's pretty straightforward). Analysis: 1. There's an optimal solution with no idle time. If there's idle time, you could just slide the jobs forward and you wouldn't make anything any later. "A schedule A' has an inversion if a job i with deadline di is scheduled before another job j with an earlier deadline dj<di." If there are multiple jobs with the same deadline, there can be different schedules w/ the same lateness (proof p. 128). "There's an optimal solution with no inversions and no idle time" (proof p. 129). If we swap the two jobs in an inversion, we can only decrease the overall lateness (or leave it alone). Then the optimality of this algorithm "immediately follows" (they give a short explanation).
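A tiny sketch of Earliest Deadline First, just to convince myself the whole algorithm really is just the ordering (representing jobs as (length, deadline) pairs is my choice; the book builds the full schedule, I only track the max lateness):

# Earliest Deadline First: run jobs back to back in order of deadline
# and report the maximum lateness. Each job is a (length, deadline) pair.
def max_lateness(jobs):
    time = 0
    worst = 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):
        time += length  # this job finishes at `time`
        worst = max(worst, time - deadline)
    return worst

# Made-up example: the two deadline-2 jobs can't both be on time,
# so a max lateness of 1 is the best possible, and EDF achieves it.
print(max_lateness([(1, 2), (2, 2), (3, 6)]))  # 1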

I've convinced myself that given the algorithm that orders events by the earliest finish time, this algorithm that orders jobs by the earliest deadline “makes sense.” They're definitely really similar problems, and it's interesting that their solutions are also sort of similar.

Pretty easy to read. I think the pictures in this section will be really helpful to review later (it's not one of those sections where you need to see the diagrams "move" in order to get it).

Section 3: Optimal Caching: A More Complex Exchange Argument

The problem: you can have a small amount of data that requires very little time to access and a large amount of data that requires more time to access; how do you decide what pieces of data to keep in the quick-access memory? (Memory Hierarchy). This is a really common problem with computers. Caching is the process of storing a small amount of data in fast memory to avoid having to access slow memory as often. Cache maintenance determines what to keep in the cache and what to evict from the cache when new stuff needs to be brought in. In real life, cache maintenance has to work without knowledge of the future, but you can get insight by first solving the problem with that knowledge. If you know the future, then the Farthest-in-Future Algorithm solves the problem with the fewest "misses" (a miss is when something you need is not in cache and you have to "go get it"). This algorithm looks at what's coming up and evicts the piece of data that's not needed until the farthest in the future. The book goes into other algorithms and shows that a subtle switch of "decisions" about which items to evict sometimes doesn't make a difference to the optimality of the solution (this is the exchange argument here). All of these algorithms only go get a piece of data when it's needed (this is called "reduced"). There are algorithms that get things before they're needed (nonreduced). The book proves that there's always a reduced schedule that's just as good as any nonreduced schedule. Using this, the book proves that Farthest-in-Future is optimal. Extension to real life: since you can't actually know the future in real situations, this algorithm won't work. It turns out that Least-Recently-Used seems to work pretty well.
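A little sketch of Farthest-in-Future, assuming (like the book's clairvoyant setting) that the whole request sequence is known up front; the miss counting and all the names are mine:

# Farthest-in-Future eviction: on a miss with a full cache, evict the item
# whose next request is farthest away (or that is never requested again).
def farthest_in_future(requests, cache_size):
    cache = set()
    misses = 0
    for i, item in enumerate(requests):
        if item in cache:
            continue  # hit: nothing to do
        misses += 1
        if len(cache) == cache_size:
            def next_use(c):
                # index of c's next request, or "infinitely far" if none
                later = [j for j in range(i + 1, len(requests)) if requests[j] == c]
                return later[0] if later else float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return misses

# Made-up a/b/c-style sequence: 3 unavoidable first-time misses plus 1 more.
print(farthest_in_future(list("abcbcab"), 2))  # 4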

Least-Recently-Used is really non-intuitive for the type of examples given in the book (i.e. lists of a, b, c, etc.), but it's interesting that it works so well in real situations!

This is (I think) the first time I've read a section before we've discussed it in class. It definitely took more concentration to understand the subtleties of the proofs, but it was still really easy to read.

Section 4: Shortest Paths in a Graph

Greedy algorithms can be applied to graphs. This section is about finding shortest paths in graphs. Shortest paths are useful to know because graphs are often used to model travel networks or communication networks, so it makes sense to ask what the shortest way to get from one point to another is. Dijkstra's Algorithm tells you the shortest path from some node s to every other node: keep track of an "explored part" S, which initially contains just s, with d(s) = 0. Determine the shortest path that can be taken out of S, add the node at the end of that path to S, record its distance, and continue. The book formally presents the algorithm and shows diagrams (the class note diagrams are easier to look at). Then they go on to prove that the algorithm works (by induction: everything already in S has a correct shortest-path distance, so the next node added to S must be correct too). Using a priority queue, you can get this algorithm to run in O(mlogn) time.
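Here's Dijkstra's Algorithm with Python's heapq as the priority queue (the adjacency-list dictionary and the example graph are made up by me; skipping "stale" heap entries is a common stand-in for a real decrease-key operation, not something the book spells out):

import heapq

# Dijkstra's algorithm with a heap: O(m log n).
# `graph` maps each node to a list of (neighbor, edge length) pairs.
def dijkstra(graph, s):
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already settled with a shorter path
        for v, length in graph[u]:
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length
                heapq.heappush(heap, (dist[v], v))
    return dist  # shortest distance from s to every reachable node

# Made-up example: the shortest path from s to v goes through u.
g = {"s": [("u", 1), ("v", 4)], "u": [("v", 2)], "v": []}
print(dijkstra(g, "s"))  # {'s': 0, 'u': 1, 'v': 3}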

On p. 140, they talk about how Dijkstra's algorithm is a “continuous” version of BFS and that it's like water waves traveling through pipes. I'm having a hard time seeing that analogy. Hmm.

I thought this was one of the least readable sections so far, which is surprising, because the material seemed really easy to understand, and at least easier to explain than other sections, when we did it in class. Also, the length of this section (and one of the others in this chapter) made it harder to read. But overall, it was still very good.

Section 5: The Minimum Spanning Tree Problem

How do you build a "communication network" on top of a connected graph such that there's a path between every pair of nodes, but so that it's done as cheaply as possible? The solution is going to be a tree, because if there were any cycles, you could get rid of any one of the edges in the cycle, still reach every node, and decrease the overall cost (proof p. 142) (hence, the minimum spanning TREE). This problem has a bunch of intuitive greedy algorithms that actually work. (1) Start without any edges and insert edges from the graph in order of increasing cost, as long as adding an edge doesn't create a cycle (Kruskal's Algorithm). (2) Start with some node s and grow the tree greedily by attaching the node that can be added as cheaply as possible (Prim's Algorithm … modified Dijkstra's). (3) Start with the full graph and delete edges starting with the most expensive, unless deleting an edge would disconnect the graph (Reverse-Delete Algorithm). The Cut Property: given some subset S of nodes that is not empty and not all of V, the cheapest edge with one end in S and one end in V-S will definitely be in the minimum spanning tree (proof p. 145; this assumes all edge costs are distinct, which the book does). That lets you easily prove Kruskal's and Prim's Algorithms correct. The Cycle Property: the most expensive edge in any cycle in the graph will definitely not belong to the minimum spanning tree (proof p. 147). That lets you prove the Reverse-Delete Algorithm correct. With a heap-based priority queue, you can get Prim's Algorithm to run in O(mlogn) time (Kruskal's will also run in the same time, see next section). Note: this problem was introduced in the context of a communication network, but it's probably not the right kind of solution to a network problem, because cutting any one edge disconnects the network, and with so few edges there are likely to be congestion issues.
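A sketch of Prim's Algorithm with a heap, using the same made-up adjacency-list style as the Dijkstra sketch above (Kruskal's gets its own sketch in the next section once Union-Find shows up):

import heapq

# Prim's algorithm: grow the tree from s, always attaching the cheapest
# edge with exactly one end already in the tree. O(m log n) with a heap.
def prim(graph, s):
    in_tree = {s}
    tree_edges = []
    heap = [(cost, s, v) for v, cost in graph[s]]
    heapq.heapify(heap)
    while heap:
        cost, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # both ends already in the tree; adding it would make a cycle
        in_tree.add(v)
        tree_edges.append((u, v, cost))
        for w, c in graph[v]:
            if w not in in_tree:
                heapq.heappush(heap, (c, v, w))
    return tree_edges

# Made-up undirected example (each edge listed from both endpoints).
g = {"a": [("b", 1), ("c", 3)], "b": [("a", 1), ("c", 2)], "c": [("a", 3), ("b", 2)]}
print(prim(g, "a"))  # [('a', 'b', 1), ('b', 'c', 2)]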

If this kind of solution isn't good for networks, what is it actually useful for? There has to be something, right?

Super easy to read! The only thing is that we emphasized the "naming" of the Cut Property and Cycle Property more than they did. They called them the cut/cycle property, but they didn't bold those words or make them titles or anything (I missed them on my first read-through).

Section 6: Implementing Kruskal's Algorithm: The Union-Find Data Structure

The Union-Find data structure stores a representation of connected components in a way that supports rapid searching and updating (i.e. you start with a graph that has only nodes and no edges and add edges, keeping track of connected components the whole time). This is what you need to efficiently implement Kruskal's Algorithm. The Union-Find data structure maintains disjoint sets such that Find(u) gives the "name" of the connected component that u is in. If Find(u) = Find(v), then they're in the same connected component, so in Kruskal's Algorithm we would NOT add an edge between them. Operations: MakeUnionFind(S) - for a set S, returns a Union-Find where every element is its own separate set, in O(n) time (i.e. no edges yet); Find(u) - already explained (works in O(logn) time); Union(A,B) - for two sets A and B, merges them into a single set in O(logn) time. A simple implementation uses an array that keeps track of which component each element is in and, when two components are combined, relabels all of the elements of the smaller component. With this, Find takes O(1) time. A single Union can take as long as O(n), but any sequence of k Unions takes at most O(klogk) time, so for Kruskal's the Unions take only O(nlogn) overall. You can also implement this using pointers, which is a little more complicated (but it's all explained around p. 154). That lets you do a Union in O(1) time, but makes Find take O(logn) time. The book discusses a way (path compression) to make Find run a little faster in certain cases, but it won't really help in any of the applications they/we use the data structure for (because other operations determine the running time). The book then goes into exactly how to implement Kruskal's Algorithm with the Union-Find data structure so that Kruskal's takes O(mlogn) time.
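Here's a small Union-Find plus Kruskal's sketch (union by size with a parent dictionary instead of actual pointers, no path compression; the class/function names and the edge tuples are my own, not the book's):

# A small Union-Find with union by size: a sketch of the pointer-based idea,
# written with a parent dictionary instead of explicit pointers.
class UnionFind:
    def __init__(self, elements):  # MakeUnionFind(S): every element its own set
        self.parent = {x: x for x in elements}
        self.size = {x: 1 for x in elements}

    def find(self, u):  # follow parent pointers up to the set's name: O(log n)
        while self.parent[u] != u:
            u = self.parent[u]
        return u

    def union(self, a, b):  # point the smaller set's root at the larger set's root
        ra, rb = self.find(a), self.find(b)
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

# Kruskal's Algorithm on edges given as (cost, u, v): sort by cost, then add an
# edge only if its endpoints are in different components. O(m log n) overall.
def kruskal(nodes, edges):
    uf = UnionFind(nodes)
    tree = []
    for cost, u, v in sorted(edges):
        if uf.find(u) != uf.find(v):
            uf.union(u, v)
            tree.append((u, v, cost))
    return tree

# Made-up example: the expensive (3, "a", "c") edge would close a cycle, so it's skipped.
print(kruskal(["a", "b", "c"], [(1, "a", "b"), (2, "b", "c"), (3, "a", "c")]))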

It's really cool that this specific data structure can make this run in faster time! This is something I had trouble seeing earlier, and this section definitely helped.

The stuff about the pointers was really hard to read. That's probably partly because we didn't get into that in class and because pointers bring back nightmares from 209 (haha just kidding). Anyway, it just took more concentration to understand that part.

Section 7: Clustering

Oh hey, the answer to when else a minimum spanning tree can be useful! (We did this in class … so why did I still ask that question earlier, haha?) Clustering comes up when you've got a collection of things and you're trying to separate them into coherent groups. How do you tell how similar objects are? The distance function - the farther apart two objects are, the less similar they are (the closer objects are, the more likely they are to be in the same cluster). If you're trying to find a clustering of maximum spacing, then you want the smallest distance between points in different clusters to be as big as possible. They explain the "long way" of doing this, but basically if you want k clusters, you should delete the k-1 most expensive edges from the minimum spanning tree. The spacing is then the cost of the cheapest deleted edge, and it's as large as possible. They also prove that it works.
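A sketch of the max-spacing idea: run Kruskal's but stop once only k components remain, which comes to the same thing as deleting the k-1 most expensive MST edges (the crude relabel-everything dictionary here stands in for a real Union-Find; the names and the example points are mine):

# Max-spacing k-clustering: add edges in order of increasing distance
# (i.e. run Kruskal's), but stop once only k clusters remain.
def k_clusters(points, distances, k):
    comp = {p: p for p in points}  # every point starts as its own cluster
    clusters = len(points)
    for dist, u, v in sorted(distances):
        if comp[u] == comp[v]:
            continue  # already in the same cluster, so this isn't a gap between clusters
        if clusters == k:
            return dist, comp  # dist is the spacing: the smallest gap between clusters
        old, new = comp[v], comp[u]
        for x in comp:  # relabel everything in v's old cluster
            if comp[x] == old:
                comp[x] = new
        clusters -= 1
    return None, comp

# Made-up points on a line at positions 0, 1, 2, 10, split into 2 clusters:
# the spacing comes out to 8 (from position 2 to position 10).
d = [(1, "p1", "p2"), (1, "p2", "p3"), (2, "p1", "p3"),
     (8, "p3", "p4"), (9, "p2", "p4"), (10, "p1", "p4")]
print(k_clusters(["p1", "p2", "p3", "p4"], d, 2))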

They talk about the distance function being one way to cluster, but there must be other things to base it on. Also, in the context of pictures that they mentioned, the pixel stuff they talked about is probably not the best way to cluster pictures into similar groups the way you'd really want to in real life. Interesting stuff.

Pretty easy to read. They followed the explanation of the "long way" (i.e. the non-minimum-spanning-tree approach) through to the solution further than we did before they started talking about the minimum spanning tree. I found that interesting.

Section 8: Huffman Codes and Data Compression

This problem concerns data compression. In order to represent text symbols, you have to change them into bits. One way to do this would be to use a fixed number of bits to represent each letter (ex. if you've got 32 symbols to represent, you'd use 5 bits for each letter). But we want to reduce the average number of bits that it takes to represent letters. You can take advantage of the nonuniform frequencies with which the letters come up to do this (ex. the vowels come up a lot more than the letter 'z', so we really want the representations for vowels to be short, but z can take more than 5 bits if it needs to). Morse code is one good example of this, but it has dots, dashes, and pauses (so that's three symbols, not just two like binary). If you converted Morse code into 0s and 1s without having pauses around, you'd start to run into prefix ambiguity issues. Ex. if 'a' is encoded as 00, 'b' is encoded as 001, and 'c' is encoded as 1, then 001 could mean ac or just b. They introduce some notation for making the problem precise. Mostly what's important is that ABL is the average bit length (p. 166 if I need to review). We can use binary trees to encode symbols with different bit lengths without having that prefix issue: all of the leaves are symbols, none of the other nodes are symbols, and the path to a leaf/symbol is its encoding, where taking a "left" path is a 0 and taking a "right" path is a 1. Pictures are helpful for this (see slides). The book proves that the optimal tree for this is full. They then go into the "intuitive" way to try to create this tree, which is top-down, but that doesn't work. The actual solution for creating this tree is bottom-up. There will always be at least two leaves at the bottom layer of the tree (because it's definitely full), so the two smallest frequencies should be siblings. You combine them into a mini tree whose frequency is the sum of the two. Then you take the two smallest frequencies out of all the remaining frequencies plus this new combined frequency, put those two into a tree together, and basically keep doing this recursively. The book says this more formally, proves that it has to be optimal (p. 174), and explains why the overall running time is O(klogk) if k is the number of symbols you're encoding. Sometimes this isn't actually the best way to encode data; they go into some scenarios where it wouldn't have optimal compression.
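Here's a heap-based sketch of Huffman's algorithm (my own representation: each heap entry carries a partial code table instead of an actual tree node, plus a counter so ties never make Python try to compare the dictionaries):

import heapq
from itertools import count

# Huffman's algorithm: repeatedly merge the two lowest-frequency trees.
# Returns {symbol: bitstring}. O(k log k) for k symbols.
def huffman(freqs):
    tiebreak = count()  # unique tiebreaker so equal frequencies don't compare the tables
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # the two smallest frequencies...
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}  # ...become siblings:
        merged.update({s: "1" + code for s, code in t2.items()})  # left = 0, right = 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Made-up frequencies: the most common symbol gets the shortest code.
print(huffman({"a": 0.45, "b": 0.30, "c": 0.15, "d": 0.10}))
# {'a': '0', 'b': '11', 'c': '101', 'd': '100'}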

Huffman codes may not always be the best for every set of data, but they are still used for compressing text, aren't they? The frequencies of letters in the English language are pretty consistent across different texts.

This was a long section, but it was pretty easy to read, except the math they introduced with summations was a little hard to figure out.

Section 9: Minimum-Cost Arborescences: A Multi-Phase Greedy Algorithm