====== Chapter 6: Dynamic Programming ======

==== Intro ====

* **Summary:** There isn't always a greedy algorithm or a divide-and-conquer algorithm that solves a problem efficiently. Dynamic programming is a more powerful and more subtle technique. "...the basic idea is drawn from the intuition of divide and conquer and is essentially the opposite of the greedy strategy: one implicitly explores the space of all of the possible solutions, by carefully decomposing things into a series of subproblems, and then building up correct solutions to larger and larger subproblems." "Dangerously close to brute-force."

===== Section 1: Weighted Interval Scheduling: A Recursive Procedure =====

* **Summary:** This is like the interval scheduling problem, but every job has a weight/value and we want to maximize the sum of the values. Again, we sort the jobs in order of finish time (larger indexes mean later finish times). Notation: for a job j, p(j) is the largest index i < j such that job i is compatible with job j (i.e. job i ends before job j starts). Start off by saying something obvious about an optimal solution: it either includes the nth job (the one with the latest finish time) or it doesn't. If it does include the nth job, then nothing that comes after p(n) could possibly be in the optimal solution, so the optimal solution is the nth job plus the optimal solution for the smaller problem consisting of all the jobs compatible with n. If the nth job is not in the optimal solution, then the optimal solution is just the optimal solution for the smaller problem of all the jobs except the nth one. Whichever of those two gives a better value is actually optimal, i.e. OPT(j) = max(v_j + OPT(p(j)), OPT(j-1)), and job j belongs to an optimal solution iff v_j + OPT(p(j)) >= OPT(j-1). If we just ran this straight up with recursion (they give an algorithm, but it's pretty clear), it would take exponential time. BUT we can memoize! The plain recursion only takes exponential time because of redundancy: we recompute some of those OPT(x)'s a bunch of times. Instead, we compute each one once and store it in an array M[] so that every later time we need it we only do a lookup instead of a recursive computation. And since we fill in each entry only once (i.e. we do OPT(0), OPT(1), OPT(2), ..., OPT(n), but each one only once), it takes O(n) time (wayyy better than exponential!). I already kind of mentioned how you'd recover an actual solution (not just a value) from the result (job j is in there if v_j + OPT(p(j)) >= OPT(j-1)); they give an algorithm for it and show that it also only takes O(n) time. A sketch of both the memoized recursion and the iterative table fill is below.
* **Insights and Questions:** None.
* **Readability:** Awesome! This book is great.
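* **Code sketch:** A minimal Python sketch of the recurrence, showing both the memoized recursion and the iterative table fill. The function name, the (start, finish, value) job format, and the bisect trick for computing p(j) are my own choices for illustration, not the book's pseudocode.
<code python>
import bisect
from functools import lru_cache

def weighted_interval_scheduling(jobs):
    """jobs: list of (start, finish, value). Returns (optimal value, chosen job indices, 1-based)."""
    jobs = sorted(jobs, key=lambda job: job[1])       # sort by finish time
    n = len(jobs)
    starts = [s for s, f, v in jobs]
    finishes = [f for s, f, v in jobs]

    # p[j] = largest (1-based) index of a job compatible with job j, i.e. the
    # number of jobs that finish no later than job j starts.
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = bisect.bisect_right(finishes, starts[j - 1])

    # Memoized recursion: OPT(j) = max(v_j + OPT(p(j)), OPT(j-1)), with OPT(0) = 0.
    @lru_cache(maxsize=None)
    def opt(j):
        if j == 0:
            return 0
        return max(jobs[j - 1][2] + opt(p[j]), opt(j - 1))

    # Equivalent iterative version: fill the table M from the smallest subproblem up.
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = max(jobs[j - 1][2] + M[p[j]], M[j - 1])
    assert M[n] == opt(n)

    # Trace back: job j is in an optimal solution iff v_j + M[p[j]] >= M[j-1].
    chosen, j = [], n
    while j > 0:
        if jobs[j - 1][2] + M[p[j]] >= M[j - 1]:
            chosen.append(j)
            j = p[j]
        else:
            j -= 1
    return M[n], list(reversed(chosen))


# Tiny usage example (hypothetical jobs).
print(weighted_interval_scheduling([(0, 3, 2), (1, 5, 4), (4, 6, 4), (5, 8, 7)]))
# -> (11, [2, 4]); jobs are numbered 1..n in order of finish time
</code>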
===== Section 2: Principles of Dynamic Programming: Memoization or Iteration over Subproblems =====

* **Summary:** The key to the efficient algorithm is the array M. Once we have that, we can find the solution in O(n) time, and we can compute the entries directly (iteratively) instead of recursively. They talk about this in the context of the previous problem if I need any more details (p. 259). This iterative way of computing the array gives a general template for designing dynamic programming algorithms. Properties that need to be satisfied: (i) there are only a polynomial number of subproblems; (ii) the solution to the original problem can be easily computed from the solutions to the subproblems (e.g. the original problem might actually be one of the subproblems); (iii) there's a natural ordering on subproblems from "smallest" to "largest," together with an easy-to-compute recurrence that allows one to determine the solution to a subproblem from the solutions to some number of smaller subproblems. Sometimes it's easier to define the recurrence first and then think about the structure of an optimal solution, and sometimes the other way around (a chicken-and-egg-esque problem).
* **Insights and Questions:** No biggies. This is pretty abstract, and I get how it applies to our problems (but I might not look at it this way to figure a problem out).
* **Readability:** Hmm. I guess I get why they included this, but I don't really think it helped me understand the other examples any better. But it was easy to read and made sense.

===== Section 3: Segmented Least Squares: Multi-way Choices =====

* **Summary:** The last problem had a binary decision (either include the job or not); this problem has a multi-way decision. It also shows how a clean algorithmic definition can formalize a fuzzy problem. Say you've got a bunch of points in the plane and you want to find the line of best fit. The error of a line is the sum of its squared "distances" to the points, so we want to find the line with minimum error (they give a closed-form formula for it). But that formula doesn't handle the case where the points seem to lie on a sequence of two or more lines (i.e. one line would give lots of error, but two would be pretty accurate). So now the problem is to fit the points well with as few line segments as possible. They give formal notation and details, but basically the "penalty" of a partition is the number of segments you use times a constant multiplier C, plus the error of the optimal line through each segment (and we want to minimize that, of course). Observation: the last point p_n belongs to a single segment in the optimal partition, and that segment begins at some earlier point p_i. If we can figure out what i is, we can remove the points p_i, ..., p_n and solve recursively on the leftover points. OPT(j) is the value of the optimal solution for points 1 through j, and e(i,j) is the minimum error of any line with respect to p_i through p_j. If the last segment is p_i, ..., p_n, then the value of the optimal solution is OPT(n) = e(i,n) + C + OPT(i-1); in general OPT(j) is the minimum over i of e(i,j) + C + OPT(i-1). So you compute the optimal value for every prefix, and the segment p_i, ..., p_j belongs to an optimal solution for the first j points iff the minimum is obtained using that index i (see p. 265 for more of the algorithm and the formal statements). There are O(n^2) pairs (i,j) to analyze, and we need an O(n) computation to get e(i,j) for each pair, so we end up with an O(n^3) running time (or just O(n^2) if you've already precomputed all of the e(i,j) values). A sketch of the table-filling algorithm is below.
* **Insights and Questions:** How do you pick a good C value? If it's too big, you don't get enough segments, but if it's too small you'd get a ton of segments!
* **Readability:** Great! I actually understood this much better by reading it than by doing it in class (which is unusual).
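* **Code sketch:** A minimal Python sketch of the recurrence OPT(j) = min over i of [e(i,j) + C + OPT(i-1)]. The helper `segment_error`, the 0-based indexing, and the input format (a list of (x, y) points sorted by x) are my own assumptions, not the book's presentation; the e(i,j) precomputation here is the naive O(n^3) version.
<code python>
def segment_error(points, i, j):
    """Minimum squared error of a single line fit to points[i..j] (0-based, inclusive)."""
    pts = points[i:j + 1]
    n = len(pts)
    sx = sum(x for x, y in pts)
    sy = sum(y for x, y in pts)
    sxx = sum(x * x for x, y in pts)
    sxy = sum(x * y for x, y in pts)
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom != 0 else 0.0  # all x equal: fall back to flat line
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)


def segmented_least_squares(points, C):
    """Return (minimum penalty, segments as (i, j) index pairs)."""
    n = len(points)
    # e[i][j] = error of the best single line through points i..j.
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            e[i][j] = segment_error(points, i, j)

    # OPT[j] = minimum penalty for the first j points; OPT[0] = 0 for the empty prefix.
    OPT = [0.0] * (n + 1)
    choice = [0] * (n + 1)   # choice[j] = start index of the last segment ending at point j-1
    for j in range(1, n + 1):
        OPT[j], choice[j] = min((e[i][j - 1] + C + OPT[i], i) for i in range(j))

    # Trace back to recover the segments.
    segments, j = [], n
    while j > 0:
        i = choice[j]
        segments.append((i, j - 1))
        j = i
    return OPT[n], list(reversed(segments))
</code>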
===== Section 4: Subset Sums and Knapsacks: Adding a Variable =====

* **Summary:** This is like problems we've solved before, but the "obvious" set of subproblems isn't enough, so we create more by adding a new variable to the recurrence. Problem: we've got a single machine that can process jobs and a set of requests {1, 2, ..., n}. We can only use the resource between times 0 and W, and each request i requires a time w_i to process. The goal is to keep the machine as busy as possible up to the cutoff W. That is, we've got n items, each with a weight w_i, we have a bound W, and we want a subset S of the items whose total weight is as large as possible without exceeding W -- the "Subset Sum Problem." This is a special case of the Knapsack Problem, where each request also has a value v_i in addition to its weight and you're trying to maximize the total value. The strategy we used before, where the optimal solution depended only on whether the last request was accepted or rejected, doesn't work anymore; we need a better algorithm. We need to know the value of OPT(n-1) and also the value of the best solution that uses the first n-1 items with a total allowed weight of W - w_n. That means we have to solve a lot more subproblems. So the solution is either going to be OPT(n, W) = OPT(n-1, W), if item n is not in the optimal solution, OR OPT(n, W) = w_n + OPT(n-1, W - w_n), if item n is in the optimal solution, since the available capacity decreases when we put the nth item in. The first case is forced when the item's too big to fit; otherwise we pick whichever choice gives the better solution. So this problem amounts to filling in a table of subproblems (we compute an entry for every possible weight limit paired with every prefix of the items), and we end up with a running time proportional to the number of entries in the table, or O(nW). We can then recover the optimal set of items in O(n) time by tracing back through the table. Then they get into the Knapsack Problem itself, which we've already basically given a solution to (if I need more I can look back at class notes). A sketch of the table-filling algorithm for both versions is below.
* **Insights and Questions:** None.
* **Readability:** They think the Knapsack problem is more complicated than the first problem they introduce?!? I don't think that other problem was necessary to introduce Knapsack; I think it made everything more confusing. Knapsack is pretty clear (probably because it's a physical problem) ... and it felt very jammed with two such different problems in the same section. The connection between the two problems was interesting, though.
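* **Code sketch:** A minimal Python sketch of the O(nW) table for Knapsack, with Subset Sum as the special case where each item's value equals its weight. It assumes integer weights, and the function and variable names are my own, not the book's pseudocode.
<code python>
def knapsack(weights, values, W):
    """Return (best value, chosen item indices) for the 0/1 knapsack with capacity W."""
    n = len(weights)
    # OPT[i][w] = best value using only the first i items with capacity w.
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for w in range(W + 1):
            if wi > w:
                OPT[i][w] = OPT[i - 1][w]                 # item i can't fit
            else:
                OPT[i][w] = max(OPT[i - 1][w],            # leave item i out
                                vi + OPT[i - 1][w - wi])  # put item i in

    # Trace back through the table to recover the chosen items in O(n) time.
    chosen, w = [], W
    for i in range(n, 0, -1):
        if OPT[i][w] != OPT[i - 1][w]:    # item i must have been taken
            chosen.append(i - 1)
            w -= weights[i - 1]
    return OPT[n][W], list(reversed(chosen))


def subset_sum(weights, W):
    # Subset Sum = Knapsack where each value equals the weight.
    return knapsack(weights, weights, W)


# Tiny usage example (hypothetical data).
print(knapsack([2, 3, 4], [3, 4, 6], 6))   # -> (9, [0, 2])
print(subset_sum([2, 3, 4], 6))            # -> (6, [0, 2])
</code>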
===== Section 5: RNA Secondary Structure: Dynamic Programming on Intervals =====

* **Summary:** This problem will require adding a new variable, like in the Knapsack Problem. DNA is "zipped" together by complementary "base-pairing": if you look at DNA as bases chosen from {A, C, G, T}, where A-T pair up and C-G pair up, then those pairings hold the DNA together. RNA is only one "strand" of bases, so it loops around and pairs back up with itself; the set of pairs formed is called the secondary structure. So we're faced with the problem of having a strand of RNA, viewed as a sequence of n symbols drawn from {A, C, G, U} (the ith symbol is denoted b_i). A and U match up and C and G match up (each base can only be matched to one other base). Here are the conditions for matchings: no sharp turns, the elements of a pair are either A and U or C and G, no base appears in more than one pair, and the pairs are noncrossing (see the book p. 274 or notes for clarification on the conditions). We want to figure out how to get a secondary structure with as many base pairs as possible for a given strand of RNA. In a first attempt at a solution, you start at the end of the strand and try to match that base to some other base. You run into problems because your subproblems are the optimal solution for the RNA before the attempted match and the optimal solution for the part of the RNA between the two bases you just matched (the other option is that the last base doesn't match to anything, but that case doesn't cause problems). This is an issue because usually we call OPT(#) with a single number, but here we need to specify both a start point and an end point in the call. So we just pass in two numbers, a start and an end point, to OPT. See the book p. 276 or our notes for the specifics of the algorithm, but it's essentially the same as the others, choosing either to pair the last base with something or not to. If we pair it, we get 1 + OPT(the stuff before the first base in the pairing) + OPT(the stuff between the two paired bases). If not, we just take OPT(the stuff before the base we're not pairing). We keep everything in a memoized table (indexed by intervals) and trace back through it to get the actual secondary structure.
* **Insights and Questions:** I'm not sure why this was so much more complicated than previous problems. I mean, it seems intuitive that you would need two variables for the start and end points, and I'm not sure why they needed to show the initial, incorrect attempt at a solution (I didn't think it made anything clearer).
* **Readability:** I thought it was very readable; even though I don't have any idea about biology-type stuff, I understood their explanation of the problem without much trouble.

===== Section 6: Sequence Alignment =====

* **Summary:** Sequence alignment is a problem that comes up when you're comparing strings. For example, it's used online a lot: when you misspell a word a little, the site asks, "Did you mean ___?" Ex. ocurrance and occurrence. We see intuitively that they are similar, but how does a computer see that they're similar? Well, we can almost line them up letter by letter: o-currance vs. occurrence, with one gap and one mismatched vowel (a vs. e). So we use a model that determines similarity based on the number of mismatches and the number of gaps. How do we decide how to line them up? It's kind of easy when we're looking at English words, but when you get into seemingly random strings like the DNA/RNA in biology, it becomes a lot less clear. They determine similarity based on the optimal alignment between two strings X and Y, where there's a gap penalty (delta, which I will write as "d") and a mismatch cost (alpha_pq, which is 0 if the letters match and can have different values for different mismatches -- e.g. a vowel-for-vowel mismatch might not incur as much of a penalty as others). The cost of an alignment M is the sum of its gap and mismatch costs, and we look for an alignment of minimum cost. The recurrence for the minimum-cost alignment is OPT(i,j) = min[alpha(x_i, y_j) + OPT(i-1, j-1), d + OPT(i-1, j), d + OPT(i, j-1)]. That is, we start at the back of each string. We can either take a gap cost at the end of one string and then get the minimum alignment of all but the last character of the other string (there are two options here, the second two terms in the recurrence), or we can match the two last letters and take the cost of aligning them (which is 0 if they match and nonzero if they don't) and then get the minimum alignment of the rest of the strings. They go on to analyze the correctness of the algorithm and show that it runs in O(mn) time and space (m and n are the number of letters in X and Y respectively). They also explain how you would get the optimal alignment back out of the memoized table (by getting a shortest path between opposite corners of the matrix ... more info on p. 284). A sketch of the table-filling version is below.
* **Insights and Questions:** I am really curious about how they pick values for delta and alpha. They explain that a lot of thought goes into it, and I definitely see how that could be. It could totally change how you align words.
* **Readability:** Great! It was a good review of the algorithm, but I'll be using my class notes to review the process of getting the optimal alignment back out, because their version was harder to follow than what we did (it almost seemed rushed).
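* **Code sketch:** A minimal Python sketch of the alignment table and the trace-back. The uniform gap penalty (delta = 2) and flat mismatch cost (3) are made-up illustration values (the book lets alpha depend on the letter pair), and the names are my own, not the book's pseudocode.
<code python>
def sequence_alignment(X, Y, delta=2, alpha=lambda p, q: 0 if p == q else 3):
    """Return (cost, alignment) where alignment is a list of matched (i, j) index pairs."""
    m, n = len(X), len(Y)
    # OPT[i][j] = minimum cost of aligning the first i letters of X with the first j of Y.
    OPT = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        OPT[i][0] = i * delta        # a prefix of X aligned entirely against gaps
    for j in range(1, n + 1):
        OPT[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            OPT[i][j] = min(alpha(X[i - 1], Y[j - 1]) + OPT[i - 1][j - 1],  # match/mismatch
                            delta + OPT[i - 1][j],                          # x_i against a gap
                            delta + OPT[i][j - 1])                          # y_j against a gap

    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if OPT[i][j] == alpha(X[i - 1], Y[j - 1]) + OPT[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif OPT[i][j] == delta + OPT[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return OPT[m][n], list(reversed(pairs))


# Tiny usage example (hypothetical costs): one gap plus one vowel mismatch.
print(sequence_alignment("ocurrance", "occurrence"))
</code>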
===== Section 7: Sequence Alignment in Linear Space via Divide and Conquer =====

* **Summary:** In the previous section, they showed us how to compare two strings, but it took O(mn) time AND O(mn) space for the table. The amount of space required turns out to be the big issue for problems involving really long strings (the running time is not really the problem). So in this section they show how to do the alignment in O(m + n) space. If you don't need the optimal alignment, just the optimal value, then you can get away with O(m) space: collapse the table into an m x 2 array, where column 0 holds the values of the "previous" column of the full table and column 1 holds the current column's values. But then we can't recover the optimal alignment from that information alone. Before, we were looking for a shortest path from the top-left corner to the bottom-right corner of the table. This algorithm is really hard to explain without subscripts and lots of equations (so see p. 287), but essentially the insight is that the shortest path we were looking at before must cross the middle column somewhere, say at row q, and the minimum path from the top-left corner to that crossing point plus the minimum path from the bottom-right corner to that crossing point is the shortest path we're looking for. That q is the "minimizing index." So we end up recursing on only (roughly) the top-left quarter of the table/graph (if the path crosses the middle column at row q, you don't need to look at any rows below q on the left half) and the bottom-right quarter (by a similar argument). The algorithm builds up the shortest corner-to-corner path as these crossing points are discovered (so we keep track of the actual optimal alignment, not just the optimal value). They go on to prove that this still runs in O(mn) time. A sketch of the space-efficient value computation and the divide step is below.
* **Insights and Questions:** None really. It's interesting that the space constraint is the bigger issue in this case. I would not have thought about that.
* **Readability:** As usual, I found this section quite readable, but not quite as good as most. I will need to go over this a few times to truly 'get' it. I think the class notes are probably going to be more useful for that because they're more picture-heavy.
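* **Code sketch:** A rough Python sketch of the linear-space idea (a divide-and-conquer scheme usually credited to Hirschberg): split one string in half, compute prefix and suffix alignment costs in linear space, find the minimizing split point in the other string, and recurse on the two smaller corner blocks. My version splits X (rows) rather than following the book's column-by-column presentation, uses string slicing for clarity rather than true in-place indices, and the names and default costs are my own assumptions.
<code python>
def prefix_costs(X, Y, delta, alpha):
    """Cost of aligning all of X against every prefix of Y, using O(len(Y)) space."""
    prev = [j * delta for j in range(len(Y) + 1)]
    for i in range(1, len(X) + 1):
        cur = [i * delta] + [0] * len(Y)
        for j in range(1, len(Y) + 1):
            cur[j] = min(alpha(X[i - 1], Y[j - 1]) + prev[j - 1],
                         delta + prev[j],
                         delta + cur[j - 1])
        prev = cur
    return prev


def linear_space_alignment(X, Y, delta=2, alpha=lambda p, q: 0 if p == q else 3):
    """Return the matched index pairs of a minimum-cost alignment, in linear space."""
    if len(X) == 0:
        return []
    if len(X) == 1:
        # Trivial case: either X[0] is aligned to its cheapest partner in Y, or it takes a gap.
        best_j = min(range(len(Y)), key=lambda j: alpha(X[0], Y[j]), default=None)
        if best_j is not None and alpha(X[0], Y[best_j]) <= 2 * delta:
            return [(0, best_j)]
        return []
    mid = len(X) // 2
    # Costs of aligning X[:mid] with every prefix of Y, and X[mid:] with every suffix of Y.
    forward = prefix_costs(X[:mid], Y, delta, alpha)
    backward = prefix_costs(X[mid:][::-1], Y[::-1], delta, alpha)
    # The "minimizing index": where the optimal corner-to-corner path crosses the middle.
    split = min(range(len(Y) + 1), key=lambda j: forward[j] + backward[len(Y) - j])
    left = linear_space_alignment(X[:mid], Y[:split], delta, alpha)
    right = linear_space_alignment(X[mid:], Y[split:], delta, alpha)
    return left + [(i + mid, j + split) for i, j in right]


# Tiny usage example: same matched pairs as the full-table version would give.
print(linear_space_alignment("ocurrance", "occurrence"))
</code>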
===== Section 8: Shortest Paths in a Graph =====

* **Summary:** This section discusses how to find shortest paths in a graph that contains negative edge costs (Dijkstra's Algorithm only works for graphs with nonnegative edge costs). First there's the problem of figuring out whether the graph has a negative cycle (i.e. a cycle where the sum of the edge costs around it is negative). Then we find the shortest path P from s to t when the graph has no negative cycles. Things that don't work: Dijkstra's Algorithm no longer works (it greedily takes the edge of least cost out of the start node, but that path might be more expensive than a path that starts with a more costly edge and is compensated for by a negative-cost edge later in the path), and modifying the costs by adding a constant to them also doesn't work (you add more overall cost to longer paths from s to t than to shorter paths; see also the counterexample from class). The Bellman-Ford Algorithm uses dynamic programming to correctly find shortest paths. If G has no negative cycles, then there is a shortest path from s to t that is simple (i.e. doesn't repeat nodes) and hence has at most n-1 edges (p. 293). So we're going to compute OPT(i, v), the minimum cost of a v-t path that uses at most i edges (so ultimately we want OPT(n-1, s)). Two choices: the path uses at most i-1 edges, in which case we're looking at OPT(i-1, v), or it uses i edges and its first edge is (v, w), in which case the cost is the cost of the (v, w) edge + OPT(i-1, w); for the second case we have to consider every edge (v, w) out of v, i.e. every node w that could come next. They go into the algorithm. The running time is O(n^3), because the memoized table has n^2 entries, each of which can take O(n) time to compute. Tracing back to find the shortest path then takes O(in) time, where i is the number of edges in the path. They prove that you can put a tighter bound of O(mn) on the running time if there are significantly fewer edges than n^2. They then show that you can significantly improve the space bound by storing, for each node v, only the length of the shortest path from v to t found so far instead of the whole table, and having each node also maintain the first node on its current path to t (so the paths themselves can still be recovered). There's also a chance that this improves the running time in practice, because you can stop as soon as none of the entries change in an iteration (but it doesn't change the worst-case running time). A sketch of this space-efficient version is below.
* **Insights and Questions:** None really. I am curious about the other algorithm they mentioned that starts with the idea that subproblem i could be to find the shortest path using only i nodes ... which they say doesn't immediately work but can be made to work (but then they just switch to the Bellman-Ford Algorithm). It's also really interesting that this was one of the first applications of dynamic programming, but one of the last that we study.
* **Readability:** Awesome! I really enjoyed this section!
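* **Code sketch:** A minimal Python sketch of the space-efficient Bellman-Ford variant described above: one distance value M[v] per node, a `first` pointer for path reconstruction, n-1 rounds, and early stopping when nothing changes. The dictionary-based graph format and the example graph are my own, not the book's.
<code python>
INF = float("inf")

def bellman_ford(edges, nodes, t):
    """edges: dict mapping v -> list of (w, cost) pairs. Returns (dist-to-t, first-hop) maps."""
    n = len(nodes)
    M = {v: (0 if v == t else INF) for v in nodes}   # M[v] = best v-t cost found so far
    first = {v: None for v in nodes}                 # first node after v on its current path to t

    for _ in range(n - 1):                           # a simple path has at most n-1 edges
        updated = False
        for v in nodes:
            for w, cost in edges.get(v, []):
                if M[w] + cost < M[v]:               # relax edge (v, w)
                    M[v] = M[w] + cost
                    first[v] = w
                    updated = True
        if not updated:                              # early stopping: nothing changed this round
            break
    return M, first


def extract_path(first, s, t):
    path, v = [s], s
    while v != t and first[v] is not None:
        v = first[v]
        path.append(v)
    return path


# Tiny usage example with a negative edge (hypothetical graph, no negative cycles).
edges = {"s": [("a", 6), ("b", 4)], "a": [("t", 2)], "b": [("a", -3), ("t", 5)]}
M, first = bellman_ford(edges, ["s", "a", "b", "t"], "t")
print(M["s"], extract_path(first, "s", "t"))   # -> 3 ['s', 'b', 'a', 't']
</code>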
===== Section 9: Shortest Paths and Distance Vector Protocols =====

* **Summary:** Routers in a communication network are an important application of the shortest-path problem. Nodes correspond to routers, and there is an edge between v and w if the two routers are connected directly; the cost of an edge is the delay on the link between the two routers. The problem is to find a path with minimum delay from a source node s to a destination t. Since the delays are definitely nonnegative, you could theoretically use Dijkstra's Algorithm, but that requires global knowledge of the network. The Bellman-Ford Algorithm doesn't: each node just needs information from its neighbors (i.e. each neighbor's current path length to t). To improve on Bellman-Ford, they propose switching from a pull-based algorithm (in each iteration, the nodes "pull" the values from their neighbors) to a push-based algorithm (a value "pulled" in the earlier algorithm can't change unless the node it's being pulled from has changed, so instead a node notifies its neighbors whenever its own value changes). Not all values need to be pushed each time, so the running time improves a little. They give a concrete version of this algorithm and then discuss the real situation, where routers update at different speeds, so the algorithm runs asynchronously (a node whose value changes becomes "active" until it has pushed its new value to its neighbors). The asynchronous version works, too. This algorithm computes shortest paths to a single destination, but in real life you'd be interested in the distances to every other node, so each node maintains a vector of distances to every other node in the network; this scheme is called a distance vector protocol. A sketch of the push-based update rule is below. There are a few problems with this approach. The algorithm assumes that edge costs stay the same and that edges are never deleted, but that's not true in real life. If an edge gets deleted, the nodes on a cycle in the graph can just keep updating their information based on each other and never notice that the edge has been deleted (see the example from the slides); it's hard to tell whether a destination is just temporarily "down" or really disconnected. To avoid this problem, designers of network routing schemes use path vector protocols (store the whole path to everything else) instead of distance vector protocols. That way nodes can avoid the counting-to-infinity problem, but they require a lot more storage. The path vector approach is used in the Border Gateway Protocol in the Internet core.
* **Insights and Questions:** How do nodes/paths get "added" in this algorithm? If we initially "know" which nodes are connected in order to figure out what needs to be "pushed" to, then, if something is added later, how do we tell the nodes that they need to push to this one? I guess we just update their list of things to push to ...
* **Readability:** As always, very easy to read :)
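* **Code sketch:** A rough Python sketch of the push-based idea for a single destination t with nonnegative delays: a node only notifies the nodes with links into it when its own estimate changes, and nodes with stale values sit idle. This is my own simplified, queue-driven version (the real protocol is asynchronous and keeps a vector per destination), not the book's code.
<code python>
from collections import deque

INF = float("inf")

def push_based_shortest_paths(edges, nodes, t):
    """edges: dict v -> list of (w, delay). Returns (dist-to-t, next-hop) maps."""
    # Reverse adjacency: who needs to be notified when w's estimate changes.
    in_neighbors = {v: [] for v in nodes}
    for v in nodes:
        for w, delay in edges.get(v, []):
            in_neighbors[w].append((v, delay))

    dist = {v: (0 if v == t else INF) for v in nodes}
    next_hop = {v: None for v in nodes}

    active = deque([t])                      # t's value (0) is "news" to its in-neighbors
    while active:
        w = active.popleft()
        for v, delay in in_neighbors[w]:     # push w's new value to the nodes with a link to w
            if dist[w] + delay < dist[v]:
                dist[v] = dist[w] + delay
                next_hop[v] = w
                active.append(v)             # v changed, so v must now push to its own in-neighbors
    return dist, next_hop


# Tiny usage example (hypothetical delays).
edges = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("t", 7)], "b": [("t", 1)]}
dist, nxt = push_based_shortest_paths(edges, ["s", "a", "b", "t"], "t")
print(dist["s"], nxt)   # -> 4, with s -> a -> b -> t
</code>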