

Chapter 4: Greedy Algorithms

Section 1: Interval Scheduling: The Greedy Algorithm Stays Ahead

  • Summary:

Sometimes greed works. An algorithm is greedy if it builds up a solution in small steps, making a myopic choice at each step to optimize some underlying criterion. When a greedy algorithm gives an optimal or almost-optimal solution, that says something about the problem's underlying structure. The problem is to find cases where greedy works and to prove that it works. First way: the greedy algorithm “stays ahead.” Second way: an exchange argument. Uses: shortest paths in a graph, minimum spanning trees, Huffman codes.

Interval Scheduling: each event i has a start time s(i) and an end time f(i). A subset of events is compatible if no two of them overlap, and compatible sets of maximum size are optimal. Idea for using greedy here: select a first request i1, rule out anything that's not compatible with i1, and repeat. What rule should you use to pick one? Several rules seem reasonable but don't work: pick the earliest start time first, pick the smallest interval first, pick the one with the fewest “conflicts” (none of those work … the book explains why and we did in class, too). What actually works is to pick the one that finishes first (the book formalizes this; see the sketch below).

Analysis: if you have an “optimal” solution, call its events O1, O2, etc. through Om, and let the greedy algorithm's solution be A1 through Ak. Prove that k=m. Compare the start/end times of O1 to A1: A1 has to finish before or at the same time as O1, and this is true for every pair of corresponding A's and O's. The book goes into a more detailed explanation, but basically greedy's solution is, at every point, ahead of any other solution (i.e. after the same number of events, greedy's last finish time is no later). Running time: if you sort the events in order of finishing time, you can get it to run in O(nlogn) time.

Variants/Extensions: what if you don't know all of the events that you're trying to schedule in advance? And what if they're weighted (i.e. the optimal solution may not be the one with as many events as possible but the one with as much total weight as possible, e.g. because they're paying for the use of a space)? Another problem: the Interval Partitioning Problem - trying to get all of the events scheduled with the least “depth” (ex. scheduling lectures in as few classrooms as possible). The book gives a greedy algorithm for this and shows that the number of classrooms it uses is exactly the depth (the maximum number of events overlapping at any single point in time), which is the best possible.
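Here's a minimal sketch of the earliest-finish-time rule in Python. This isn't the book's pseudocode: the (start, finish) tuple representation and the function name are my own choices for illustration.

<code python>
# Greedy interval scheduling: always take the compatible event that finishes first.
def schedule_intervals(intervals):
    """Return a maximum-size set of pairwise-compatible (start, finish) intervals."""
    chosen = []
    last_finish = float("-inf")
    # Sort by finish time, then take each interval that starts no earlier
    # than the finish time of the last interval chosen.
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# Example: three events, two of which can be scheduled together.
print(schedule_intervals([(1, 4), (3, 5), (4, 7)]))  # [(1, 4), (4, 7)]
</code>

The sort dominates the work, which matches the O(nlogn) running time mentioned above.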

  • Insights and Questions:

The arguments about how the problem can be solved locally are really interesting. It seems like such a difficult problem, and the idea that it can be solved by looking at only one part of the situation at a time, rather than the whole thing, is really fascinating.

  • Readability:

This section was a little long, but it was very readable and presented really interesting problems that seem like something that could/would come up in the real world, which made it more fun to read.

Section 2: Scheduling to Minimize Lateness: An Exchange Argument

  • Summary:

Similar problem: requests, each with a deadline di and a required contiguous processing time ti. Schedule the requests one after another to minimize the maximum lateness (i.e. how long after its deadline a request completes). We order them by Earliest Deadline First (the book has an algorithm, but it's pretty straightforward; a sketch is below). Analysis: (1) There's an optimal solution with no idle time; if there's idle time, you could just slide the jobs forward and you wouldn't make anything any later. (2) “A schedule A' has an inversion if a job i with deadline di is scheduled before another job j with an earlier deadline dj<di.” If there are multiple jobs with the same deadline, there can be different schedules with the same lateness (proof p. 128). (3) “There's an optimal solution with no inversions and no idle time” (proof p. 129). If we swap the two jobs in an inversion, we can only decrease the overall lateness (or leave it alone). The optimality of this algorithm then “immediately follows” (they give a short explanation).
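A small sketch of the Earliest Deadline First ordering, again not the book's pseudocode: it assumes jobs are given as (length ti, deadline di) pairs starting at time 0, and the function name is just for illustration.

<code python>
# Minimize maximum lateness: run jobs back to back in order of deadline.
def max_lateness_edf(jobs):
    """Schedule (length, deadline) jobs with no idle time; return the max lateness."""
    time = 0
    worst = 0
    for length, deadline in sorted(jobs, key=lambda job: job[1]):
        time += length                      # this job finishes at `time`
        worst = max(worst, time - deadline)
    return worst

# Two jobs: the earlier deadline goes first, so they finish at times 2 and 5
# and neither is late.
print(max_lateness_edf([(2, 2), (3, 6)]))  # 0
</code>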

  • Insights and Questions:

I've convinced myself that given the algorithm that orders events by the earliest finish time, this algorithm that orders jobs by the earliest deadline “makes sense.” They're definitely really similar problems, and it's interesting that their solutions are also sort of similar.

  • Readability:

Pretty easy to read. I think the pictures in this section will be really helpful to review later (it's not one of those sections where you need to see the diagrams “move” in order to get it).

Section 3: Optimal Caching: A More Complex Exchange Argument

  • Summary:

The problem: you have a small amount of memory that takes very little time to access and a large amount of memory that takes more time to access; how do you decide which pieces of data to keep in the quick-access memory? (This is the memory hierarchy, and it's a really common problem with computers.) Caching is the process of storing a small amount of data in fast memory to avoid having to access slow memory as often. Cache maintenance determines what to keep in the cache and what to evict from it when new stuff needs to be brought in. In real life, cache maintenance has to work without knowledge of the future, but you can gain insight by first solving the version of the problem where the future is known. If you know the future, the Farthest-in-Future algorithm produces the fewest “misses” (a miss is when something you need is not in the cache and you have to “go get it”): it looks at what's coming up and evicts the piece of data that's not needed until the farthest point in the future (sketch below).

The book goes into other schedules and shows that a subtle swap of “decisions” about which items to evict sometimes doesn't make a difference to the optimality of the solution (this is the exchange argument here). All of these schedules only bring in a piece of data when it's needed (this is called “reduced”); there are schedules that bring things in before they're needed (nonreduced). The book proves that there's always a reduced schedule that's just as good as a nonreduced one, and using this, it proves that Farthest-in-Future is optimal. Extension to real life: since you can't actually know the future in real situations, this algorithm won't work directly. It turns out that Least-Recently-Used (evict the item that was accessed longest ago) seems to work pretty well.
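A rough simulation of the Farthest-in-Future eviction rule, assuming the whole request sequence is known up front. The cache size k and the miss counting follow the section's description, but the function name and the example sequence are mine.

<code python>
# Offline caching: on a miss with a full cache, evict the item whose next use
# is farthest in the future (or that is never requested again).
def farthest_in_future_misses(requests, k):
    """Simulate a cache of size k with Farthest-in-Future eviction; return the miss count."""
    cache = set()
    misses = 0
    for i, item in enumerate(requests):
        if item in cache:
            continue                        # hit: nothing to evict or fetch
        misses += 1
        if len(cache) >= k:
            def next_use(c):
                try:
                    return requests.index(c, i + 1)   # position of c's next request
                except ValueError:
                    return float("inf")               # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return misses

# With a cache of size 2, this sequence incurs 4 misses (a, b, c, then a again).
print(farthest_in_future_misses(["a", "b", "c", "b", "c", "a", "b"], 2))  # 4
</code>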

  • Insights and Questions:

Least-Recently-Used is really non-intuitive for the type of examples given in the book (i.e. short sequences of a, b, c, etc.), but it's interesting that it works so well in real situations!

  • Readability:

This is (I think) the first time I've read a section before we've discussed it in class. It definitely took more concentration to understand the subtleties of the proofs, but it was still really easy to read.

Section 4: Shortest Paths in a Graph

  • Summary:
  • Insights and Questions:
  • Readability:

Section 5: The Minimum Spanning Tree Problem

  • Summary:
  • Insights and Questions:
  • Readability:

Section 6: Implementing Kruskal's Algorithm: The Union-Find Data Structure

  • Summary:
  • Insights and Questions:
  • Readability:

Section 7: Clustering

  • Summary:
  • Insights and Questions:
  • Readability:

Section 9: Minimum-Cost Arborescences: A Multi-Phase Greedy Algorithm

  • Summary:
  • Insights and Questions:
  • Readability: