===== Section 2: Principles of Dynamic Programming: Memoization or Iteration over Subproblems =====
This section sums up the basic principles of dynamic programming.
The fundamental idea is to iterate through all subproblems, building up their solutions, rather than computing them recursively. As we solve the subproblems, we record each answer so that it is computed only once.
The basic outline of the properties a set of subproblems should satisfy:
I, There are only a polynomial number of subproblems.
II, The solution to the original problem can be easily computed from the solutions to the subproblems.
III, There is a natural ordering of the subproblems (for example, by finishing time in Weighted Interval Scheduling).
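To make the outline concrete, here is a minimal sketch of the iterative approach using the Weighted Interval Scheduling example from the chapter; the tuple representation and the use of `bisect` to compute p(j) are my own assumptions, not the book's pseudocode.

```python
import bisect

def weighted_interval(jobs):
    # jobs: list of (start, finish, value) tuples
    jobs = sorted(jobs, key=lambda j: j[1])          # order by finish time
    finishes = [f for _, f, _ in jobs]
    # p[j]: number of jobs finishing no later than job j starts
    p = [bisect.bisect_right(finishes, s) for s, _, _ in jobs]
    M = [0] * (len(jobs) + 1)                        # M[j] = OPT(first j jobs)
    for j in range(1, len(jobs) + 1):
        s, f, v = jobs[j - 1]
        M[j] = max(M[j - 1], v + M[p[j - 1]])        # skip job j, or take it
    return M[len(jobs)]
```

Each subproblem OPT(j) is solved exactly once, in the natural order j = 1..n, which is exactly properties I-III above.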

This section concisely summarizes the logic behind dynamic programming. I think it is quite useful to review before looking at more complicated examples.

Readability is 8.

===== Section 3: Segmented Least Squares: Multi-way Choices =====
This section presents another problem solved with dynamic programming, one that is slightly different from the Weighted Interval Scheduling problem.
The book first spends several paragraphs explaining the problem. In short, we want a best fit for a set of discrete points, where the fit is composed of several line segments. However, every segment added incurs a penalty C. The objective OPT is the total squared error plus the penalties from the added segments, and we want to minimize it.
The relationship between the OPT of the nth point and the previous ones is:
OPT(n) = min_{1<=i<=n} ( e(i,n) + C + OPT(i-1) ),
where e(i,n) is the minimum squared error of a single line fit through points i through n.
Notice that this time, in order to find OPT(n), we actually need all the values OPT(i) for i before n, because it is entirely possible that a segment from point n-1 to point n is not as good as a segment from point n-i to point n. This is where the difference between this problem and the Weighted Interval Scheduling problem comes from.
Hence, the running time of the algorithm to compute OPT(n) is O(n^3), dominated by computing the errors e(i,j). Tracing back to find the segments takes O(n^2) time, because for each segment we scan over the candidate breakpoints.
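The recurrence can be sketched as follows; the closed-form least-squares fit inside the loop and the re-summation of errors (which is what makes this O(n^3)) are my own assumed implementation, not the book's pseudocode.

```python
def segmented_least_squares(points, C):
    # points: list of (x, y) sorted by x; C: per-segment penalty
    n = len(points)
    # e[i][j]: min squared error of one line through points i..j (1-indexed)
    e = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        sx = sy = sxx = sxy = 0.0
        for j in range(i, n + 1):
            x, y = points[j - 1]
            sx += x; sy += y; sxx += x * x; sxy += x * y
            m = j - i + 1
            denom = m * sxx - sx * sx
            a = (m * sxy - sx * sy) / denom if denom else 0.0  # slope
            b = (sy - a * sx) / m                               # intercept
            e[i][j] = sum((py - a * px - b) ** 2
                          for px, py in points[i - 1:j])
    # M[j] = OPT(j): best cost for the first j points
    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = min(e[i][j] + C + M[i - 1] for i in range(1, j + 1))
    return M[n]
```

Precomputing the e(i,j) values instead of re-summing them would bring the error computation down to O(n^2), leaving the DP fill itself at O(n^2).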
The readability is 7.

===== Section 4: Subset Sums and Knapsacks: Adding a Variable =====
This is another variation of the dynamic programming problem.
The book first introduces the problem. We have n items, each with a weight w_i and a value v_i, and a knapsack with weight limit W. How do we pick items so that we maximize the total value without exceeding the limit?
The optimal relationship is:
if w_i > w, obviously adding item i would make the knapsack overweight, so OPT(i,w) = OPT(i-1,w); otherwise:
OPT(i,w) = max( OPT(i-1,w), v_i + OPT(i-1, w-w_i) ).
If not adding the item gives a better result than adding it, we leave it out; otherwise we take it.
This problem is different from the previous two because we now track two variables, the weight (not to exceed) and the value (to maximize), instead of only one quantity to optimize. On the surface it looks a little like the Weighted Interval problem, because it is a binary decision: either include the item or not. However, looking closer, to decide whether to include item n we need to see if there is a way to make better use of the weight without including it. So we must calculate the optimal value at each weight w from 1 to W while allowing the 1st item, then the 1st and 2nd items, and so on up to all n items. This takes O(nW) time. Tracing back to find which items to add takes linear time.
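The two-variable table and the traceback can be sketched like this (the 0-based item indices returned by the traceback are my own convention):

```python
def knapsack(weights, values, W):
    n = len(weights)
    # OPT[i][w]: best value using items 1..i with capacity w
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i - 1] > w:                   # item i cannot fit
                OPT[i][w] = OPT[i - 1][w]
            else:                                    # skip item i, or take it
                OPT[i][w] = max(OPT[i - 1][w],
                                values[i - 1] + OPT[i - 1][w - weights[i - 1]])
    # trace back in linear time to recover the chosen items (0-based indices)
    chosen, w = [], W
    for i in range(n, 0, -1):
        if OPT[i][w] != OPT[i - 1][w]:               # item i was taken
            chosen.append(i - 1)
            w -= weights[i - 1]
    return OPT[n][W], chosen[::-1]
```

Filling the (n+1) x (W+1) table is the O(nW) cost mentioned above; note this is pseudo-polynomial, since it depends on the magnitude of W, not its bit length.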
Readability is 7.

===== Section 5: RNA Secondary Structure: Dynamic Programming over Intervals =====
This is another, slightly more complicated problem.
The problem is RNA base-pair matching. A valid secondary structure must satisfy three conditions: the two ends of each pair are separated by at least four bases (no sharp turns), no two pairs cross, and only A-U and C-G bases match. We consider the possible matchings and rule out the ones that are definitely not optimal until the optimal value is determined. We try matching any two bases that are at least four positions apart; if they cannot pair, we simply move on to the next candidate. The difficulty of this problem is that once a new pair appears, it imposes further constraints (the no-crossing and four-base conditions) on everything else. In fact, adding a pair splits the original problem into two subproblems (not evenly), each defined on an interval of the sequence. That is why we call it "dynamic programming over intervals."
There are O(n^2) subproblems OPT(i,j), one per interval, and each takes O(n) time to evaluate, so the total running time is O(n^3).
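A sketch of the interval recurrence the book develops, OPT(i,j) = max( OPT(i,j-1), max_t ( 1 + OPT(i,t-1) + OPT(t+1,j-1) ) ), where t ranges over positions that may pair with j; the string encoding and helper structure are my own assumptions.

```python
def rna_max_pairs(s):
    # s: RNA string over {A, U, C, G}; returns max number of valid base pairs
    n = len(s)
    pairs = {('A', 'U'), ('U', 'A'), ('C', 'G'), ('G', 'C')}
    # OPT[i][j]: max pairs in s[i..j] (0-indexed, inclusive)
    OPT = [[0] * n for _ in range(n)]
    def opt(i, j):                           # treat empty intervals as 0
        return OPT[i][j] if 0 <= i <= j < n else 0
    for length in range(6, n + 1):           # a pair needs j - i >= 5
        for i in range(n - length + 1):
            j = i + length - 1
            best = opt(i, j - 1)             # case 1: base j stays unpaired
            for t in range(i, j - 4):        # case 2: t pairs with j (t < j - 4)
                if (s[t], s[j]) in pairs:
                    # pairing (t, j) splits the interval into two subintervals
                    best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
            OPT[i][j] = best
    return OPT[0][n - 1] if n else 0
```

Intervals are filled in order of increasing length, which is the natural ordering of these subproblems.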
Readability is 6.

===== Section 6: Sequence Alignment =====
This section introduces a fairly commonly seen problem: sequence alignment, which appears in Google search suggestions and in computational molecular biology. We want to measure how "similar" two strings are, charging a gap penalty delta for every unmatched symbol and a mismatch cost alpha for every aligned pair of unequal symbols.
The algorithm given by the book uses the recurrence:
OPT(i,j) = min( alpha_{x_i y_j} + OPT(i-1,j-1), delta + OPT(i-1,j), delta + OPT(i,j-1) ).
The first possibility is to have x_i and y_j matched with each other, the second is to leave x_i unmatched, and the third is to leave y_j unmatched.
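The recurrence filled into a table, with uniform costs assumed for simplicity (the book allows arbitrary alpha and delta):

```python
def alignment_cost(x, y, delta=1, mismatch=1):
    # OPT[i][j]: min cost of aligning x[:i] with y[:j]
    m, n = len(x), len(y)
    OPT = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        OPT[i][0] = i * delta                # x[:i] aligned entirely to gaps
    for j in range(1, n + 1):
        OPT[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            alpha = 0 if x[i - 1] == y[j - 1] else mismatch
            OPT[i][j] = min(alpha + OPT[i - 1][j - 1],   # match x_i with y_j
                            delta + OPT[i - 1][j],       # x_i unmatched
                            delta + OPT[i][j - 1])       # y_j unmatched
    return OPT[m][n]
```

With delta = mismatch = 1 this is exactly edit distance; the table costs O(mn) time and O(mn) space, which motivates the next section.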
To prove the correctness of the dynamic programming, the book argues that in any optimal alignment either (i,j) is a matched pair, or position i is unmatched, or position j is unmatched, so the three cases of the recurrence cover all possibilities.

However, in a real-world scenario, we are not simply given the target to compare with what we type in. In other words, how does the computer know which candidate word to compare the typed word against in the first place?

Readability is 7.

===== Section 7: Sequence Alignment in Linear Space via Divide and Conquer =====
This section gives an improvement of the algorithm from Section 6.
In the real world, especially in molecular biology, there are many elements to be aligned, and the algorithm from the last section takes too much space (O(mn) for the table). Hence, we hope to find a more space-efficient algorithm.
The first idea that comes to mind (in the book) is to record only the current column and the one right before it, because those hold the only information relevant to calculating the optimal value.
However, a problem appears: we cannot recover the path leading to the optimal value that way. The book then further develops the idea using divide and conquer. The algorithm is based on this theorem:
for strings of lengths m and n, fix any column k in {0,...,n} and let q be an index that minimizes the quantity f(q,k) + g(q,k), where f(q,k) is the cost of the best alignment path from (0,0) to (q,k) and g(q,k) is the cost of the best path from (q,k) to (m,n). Then there is a minimum-cost corner-to-corner path that passes through (q,k).
This, together with working the problem backward from (m,n) to (0,0) as well as forward in the original way, allows us to use divide and conquer. For one of the sequences, say the one of length n, we always divide at the midpoint k = n/2; the other sequence, of length m, we divide at the value of q. Along the way, we record each q.
This indeed takes O(m+n) space. The recurrence for the running time is T(m,n) <= T(q, n/2) + T(m-q, n/2) + O(mn), which solves to O(mn) time, the same as before.
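The core divide step can be sketched as two space-efficient passes, one forward and one backward, followed by the minimization over q; the function names and uniform costs are my own assumptions.

```python
def forward_costs(x, y, delta=1, mismatch=1):
    # Cost of aligning x[:q] with all of y, for every q, keeping only
    # the current row and the previous one (O(len(y)) space).
    n = len(y)
    prev = [j * delta for j in range(n + 1)]
    out = [prev[n]]                          # out[q] = f(q, len(y))
    for i in range(1, len(x) + 1):
        cur = [i * delta] + [0] * n
        for j in range(1, n + 1):
            alpha = 0 if x[i - 1] == y[j - 1] else mismatch
            cur[j] = min(alpha + prev[j - 1], delta + prev[j],
                         delta + cur[j - 1])
        out.append(cur[n])
        prev = cur
    return out

def best_split(x, y, delta=1, mismatch=1):
    # Split y at its midpoint k; find q minimizing f(q,k) + g(q,k).
    k = len(y) // 2
    f = forward_costs(x, y[:k], delta, mismatch)
    # backward costs g(q,k) via a forward pass on the reversed suffixes
    g = forward_costs(x[::-1], y[k:][::-1], delta, mismatch)[::-1]
    q = min(range(len(x) + 1), key=lambda i: f[i] + g[i])
    return q, k, f[q] + g[q]
```

By the theorem above, f(q,k) + g(q,k) at the returned q equals the full alignment cost, so recursing on the two halves around (q,k) reconstructs the optimal path in linear space.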
I really like the idea, and the readability is 9.

===== Section 8: Shortest Paths in a Graph =====
This is a variation of the shortest-paths problem, which we previously solved only for graphs with positive edges. In the extended problem we allow negative-weight edges; however, we still do not allow a negative cycle, because a path could loop around such a cycle forever, driving the cost toward negative infinity and becoming meaningless.
When there are negative edges, it is easy to come up with a counterexample showing that our old greedy algorithm no longer works. The book then tries to modify the greedy algorithm by adding a constant M to each edge, big enough to make them all positive, which is also easily seen to be wrong. Hence, we need to turn to dynamic programming.
The recurrence is:
OPT(i,v) = min( OPT(i-1,v), min_w ( OPT(i-1,w) + c_{vw} ) ),
where OPT(i,v) is the cost of the cheapest path from v to the destination using at most i edges. This algorithm, like many other dynamic programming algorithms, fills in a table: since a shortest path needs at most n-1 edges, it runs in O(n^3) time (O(mn) with a sharper analysis) and uses O(n^2) space, which can be reduced to O(n) by keeping just one current value M[v] per node.
In addition, to save both time and space, we can set a stopping signal: when two consecutive iterations produce identical columns of values, nothing will change afterward, so we stop.
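The space-saving version with the stopping signal can be sketched as follows (the edge-list representation is my own assumption):

```python
def shortest_path_costs(n, edges, t):
    # edges: list of (v, w, cost) directed edges; returns the min cost
    # from every node to target t, assuming no negative cycles.
    INF = float('inf')
    M = [INF] * n                            # one current value per node
    M[t] = 0
    for _ in range(n - 1):                   # paths use at most n - 1 edges
        updated = False
        for v, w, c in edges:
            if M[w] + c < M[v]:              # relax edge v -> w
                M[v] = M[w] + c
                updated = True
        if not updated:                      # stopping signal: converged
            break
    return M
```

Keeping only the single array M is exactly the O(n^2)-to-O(n) space reduction described above; the early exit implements the "two identical columns" stopping condition.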
The readability is 8.

===== Section 9: Shortest Paths and Distance Vector Protocols =====
This section introduces an algorithm that improves on the shortest-paths algorithm from Section 8.
First we observe the shortcoming of the original algorithm: it is a pull-based algorithm, in which at each iteration every node contacts all its neighbors and "pulls" their latest values, even when nothing has changed. The improvement is a push-based algorithm: a node notifies its neighbors only when its own value changes, so no work is wasted on unchanged values. This gives the distance vector protocol.
The section then discusses some concerns associated with the distance vector protocol. In a real-life scenario, if one of the edges is deleted and this completely cuts off all connections between a node a and the destination node b, the distance vector protocol will never stop updating M[a] and never realize that no path is available. This, of course, can be problematic.
The way to solve it is to keep track of the bigger picture: each node stores knowledge not only of its neighbors but of a path through the entire network, to ensure a path actually exists. The extra storage takes more space but is indeed necessary.
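A simplified, centralized sketch of the push-based idea (real routers run this asynchronously and independently, so this synchronous worklist model is my own assumption):

```python
def distance_vector(n, edges, t):
    # edges: list of (v, w, cost) undirected links; returns each node's
    # distance-to-t. Only nodes whose value just changed push updates.
    INF = float('inf')
    cost = {}
    neighbors = {v: [] for v in range(n)}
    for v, w, c in edges:
        cost[(v, w)] = cost[(w, v)] = c
        neighbors[v].append(w)
        neighbors[w].append(v)
    M = [INF] * n
    M[t] = 0
    active = {t}                             # nodes whose value just changed
    while active:
        w = active.pop()
        for v in neighbors[w]:               # w pushes its new value to v
            if cost[(v, w)] + M[w] < M[v]:
                M[v] = cost[(v, w)] + M[w]
                active.add(v)                # v changed, so v pushes next
    return M
```

Note that if the graph becomes disconnected from t, unreachable nodes simply keep M[v] = infinity here; the counting-to-infinity problem described above arises in the real distributed setting, where a node keeps updating from stale neighbor values.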
Readability is 7.