It is different from the previous two problems because now we look at two variables, both the weight (not to exceed) and the value (to maximize), instead of only one quantity to optimize. On the surface it is a little bit like the weighted interval problem, because each item is a single yes/no decision: either include the item or not. However, when we look closer, to decide whether to include item n we need to see if there is a way to make better use of the weight without including it. So we calculate the optimal value at each weight w from 1 to W while allowing the 1st item, then the 1st+2nd items, the 1st+2nd+3rd items, ... up to all n items. This takes O(n*W) time. Tracing back to find which items to add takes linear time.
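As a sketch of this table-filling idea in Python (the item lists in the usage example are illustrative, not from the book):

```python
def knapsack(values, weights, W):
    """0/1 knapsack: maximize total value with total weight <= W."""
    n = len(values)
    # opt[i][w] = best value using only the first i items with capacity w
    opt = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            opt[i][w] = opt[i - 1][w]          # skip item i-1
            if weights[i - 1] <= w:            # or include it, if it fits
                opt[i][w] = max(opt[i][w],
                                values[i - 1] + opt[i - 1][w - weights[i - 1]])
    # trace back in O(n) time to recover which items were chosen
    chosen, w = [], W
    for i in range(n, 0, -1):
        if opt[i][w] != opt[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return opt[n][W], chosen
```

For example, `knapsack([3, 4, 5], [2, 3, 4], 5)` picks the first two items for total value 7.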
Readability is 7.

==== Section 5: RNA Secondary Structure: Dynamic Programming over Intervals ====
This is another slightly more complicated problem.
The problem is RNA base matching, which must satisfy three conditions: the two ends of a matched pair are separated by at least 4 elements, no two pairs cross, and only A-U and C-G match. We try out the possible matchings and filter out the ones that are definitely not optimal until only one is left. We try matching any two molecules that are far enough apart; if they do not match, we just go on to the next pair. The difficulty of this problem is that once a new pair appears, it imposes further constraints (no crossing, the 4-element rule) on everything else. In fact, in some cases the original problem is divided into two (uneven) subproblems. That is why we call it "dynamic programming over intervals."
Even though, realistically, …
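The interval recurrence described above can be sketched in Python like this (the sample strings in the usage example are my own, not from the book):

```python
def rna_max_pairs(s):
    """Max number of base pairs in an RNA secondary structure.

    Pairs must be complementary (A-U or C-G), separated by more than
    4 positions (no sharp turns), and non-crossing.
    """
    def match(a, b):
        return {a, b} in ({'A', 'U'}, {'C', 'G'})

    n = len(s)
    # opt[i][j] = max pairs within s[i..j]; short intervals hold 0 pairs
    opt = [[0] * n for _ in range(n)]
    for length in range(6, n + 1):           # interval s[i..j] of this length
        for i in range(n - length + 1):
            j = i + length - 1
            best = opt[i][j - 1]             # case 1: j stays unpaired
            for t in range(i, j - 4):        # case 2: j pairs with some t
                if match(s[t], s[j]):        # splits into two subintervals
                    left = opt[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt[0][n - 1] if n else 0
```

For example, `rna_max_pairs("GGGAAAACCC")` finds the 3 nested G-C pairs.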
Readability is 6.

==== Section 6: Sequence Alignment ====
This section introduces a fairly commonly seen problem: sequence alignment, which appears in Google search suggestions and in computational molecular biology. We want to measure how "similar" two sequences are.
The algorithm given by the book uses the recurrence:
OPT(i,j) = min( α(x_i, y_j) + OPT(i-1, j-1), δ + OPT(i-1, j), δ + OPT(i, j-1) )
The first possibility is to have i and j matched with each other (paying the mismatch cost α), the second is to skip i, and the third is to skip j (each paying the gap cost δ).
To prove the correctness of the dynamic programming, we argue by cases: in an optimal alignment, either positions i and j are matched to each other, or at least one of them is left unmatched, and the three terms of the recurrence cover exactly these cases.
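A minimal sketch of the recurrence in Python, assuming illustrative penalties (gap δ = 2, mismatch cost 3; the book leaves these as parameters):

```python
def alignment_cost(x, y, delta=2, alpha=lambda a, b: 0 if a == b else 3):
    """Minimum alignment cost between strings x and y.

    Fills the (m+1) x (n+1) table of the recurrence: match/mismatch
    x[i] with y[j] (cost alpha), or skip either one (gap cost delta).
    """
    m, n = len(x), len(y)
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        opt[i][0] = i * delta              # align a prefix of x against nothing
    for j in range(n + 1):
        opt[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            opt[i][j] = min(alpha(x[i - 1], y[j - 1]) + opt[i - 1][j - 1],
                            delta + opt[i - 1][j],     # skip x[i-1]
                            delta + opt[i][j - 1])     # skip y[j-1]
    return opt[m][n]
```

For example, `alignment_cost("abc", "abd")` is 3: one mismatch is cheaper than two gaps.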

However, in a real-world scenario we are not simply given the target to compare with what we type in. In other words, how can the computer know which words to compare our input against? It has to align the typed word against many candidate words and return the closest ones.

Readability is 7.

==== Section 7: Sequence Alignment in Linear Space via Divide and Conquer ====
This section gives a space improvement of the sequence alignment algorithm from the previous section.
In the real world, especially in molecular biology, there are many elements to be aligned, and the algorithm from the last section takes too much space (O(mn) for the table). Hence, we hope to find a more space-efficient algorithm.
The first idea that comes to mind (in the book) is to record only the current column and the one right before it, because those hold all the information needed to calculate the optimal value.
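The two-column idea can be sketched like this (assuming the same kind of illustrative penalties, δ = 2 and mismatch cost 3; note that it returns only the optimal value):

```python
def alignment_cost_linear_space(x, y, delta=2, mismatch=3):
    """Alignment cost in O(n) space.

    Keeps only the previous column and the current one; this finds
    the optimal value but cannot recover the alignment itself.
    """
    m, n = len(x), len(y)
    prev = [j * delta for j in range(n + 1)]   # column for the empty prefix
    for i in range(1, m + 1):
        cur = [i * delta] + [0] * n            # column for prefix x[0..i)
        for j in range(1, n + 1):
            a = 0 if x[i - 1] == y[j - 1] else mismatch
            cur[j] = min(a + prev[j - 1],      # match/mismatch
                         delta + prev[j],      # skip x[i-1]
                         delta + cur[j - 1])   # skip y[j-1]
        prev = cur                             # discard the older column
    return prev[n]
```

It computes the same values as the full table, using only two columns at a time.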
However, a problem appears: we cannot record the path leading to the optimal value that way. The book then elaborates the idea using divide and conquer. The algorithm is based on this theorem: for sequences of lengths m and n, fix k in {0..n} and let q be an index that minimizes the quantity f(q,k) + g(q,k), where f(q,k) is the cost of the best path from (0,0) to (q,k) and g(q,k) is the cost of the best path from (q,k) to (m,n); then there is a minimum-cost corner-to-corner path passing through (q,k).
This, together with working the problem backward from (m,n) to (0,0) as well as forward in the original way, allows us to use divide and conquer. For one of the sequences, wlog say the one of length n, we always divide it in half; the other sequence we divide at the value of q. Along the way, we record each q.
It indeed takes O(m+n) space. The running-time recurrence is T(m,n) ≤ cmn + T(q, n/2) + T(m-q, n/2), which solves to T(m,n) = O(mn), so we keep the same running time as before.
I really like the idea, and the readability is 9.

==== Section 8: Shortest Paths in a Graph ====
This is a variation of the shortest-paths-in-a-graph problem, which previously allowed only positive edges. In this extended problem we allow negative-weight edges; however, we still do not allow a negative cycle, because a path could loop around such a cycle forever, driving the cost to negative infinity and making the problem meaningless.
When there are negative edges, it is easy to come up with a counterexample showing that our old greedy algorithm no longer works. The book then tries to modify the greedy algorithm by adding a constant M, big enough to make every edge positive, which is also easily seen to be wrong (it penalizes paths with more edges). Hence, we need to turn to dynamic programming.
The recurrence is:
OPT(i,v) = min( OPT(i-1,v), min over w of ( c_{vw} + OPT(i-1,w) ) )
where OPT(i,v) is the cost of a cheapest path from v to the destination that uses at most i edges.
This algorithm, like many other dynamic programming algorithms, fills in a table of subproblem values column by column; the table takes O(n^2) space, and the computation runs in O(mn) time if each round only scans the actual edges.
In addition, to save both time and space, we set a stopping signal: when two consecutive columns come out identical, the values have converged and we can stop early.
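A sketch of this dynamic program in Python, collapsing the table to a single array (a standard space saving) while preserving the stopping test: a pass that changes nothing means two consecutive columns would have been identical.

```python
def bellman_ford(n, edges, t):
    """Shortest-path cost from every node to target t.

    Allows negative edge weights but assumes no negative cycles.
    edges is a list of (v, w, cost) triples for edge v -> w.
    """
    INF = float('inf')
    M = [INF] * n
    M[t] = 0
    for _ in range(n - 1):               # at most n-1 passes are needed
        changed = False
        for v, w, cost in edges:
            if M[w] + cost < M[v]:       # a cheaper path through w
                M[v] = M[w] + cost
                changed = True
        if not changed:                  # stopping signal: nothing updated
            break
    return M
```

For example, with edges `(0,1,4)`, `(0,2,2)`, `(2,1,-3)` and target 1, node 0's cheapest route goes through the negative edge, for cost -1.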
The readability is 8.

==== Section 9: Shortest Paths and Distance Vector Protocols ====
This section introduces an algorithm that improves on the shortest-paths algorithm from Section 8.
First we observe the shortcoming of the original algorithm: it is a pull-based algorithm, in which, in each iteration, every node contacts all its neighbors and "pulls" their newest values, even when nothing has changed. The improvement is a push-based algorithm: a node sends its new value to its neighbors only when that value actually changes, so no effort is wasted on updates that cannot make a difference.
It then goes on to discuss some concerns associated with the distance vector protocol. In a real-life scenario, if one of the edges is deleted, which totally cuts off all connections between a node a and the destination node b, the distance vector protocol will never cease updating M[a] and never realize there is no path available. This, of course, can be problematic.
The way to solve it is to keep track of the bigger picture: each node stores knowledge not only of its neighbors but of the entire network, to ensure a path actually exists. The extra storage takes more space but is indeed necessary.
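The push-based idea can be sketched with a worklist: a node's value is propagated to its in-neighbors only when it changes (the graph encoding here is my own, and this sketch ignores the asynchrony of a real protocol):

```python
from collections import defaultdict, deque

def push_based_shortest_paths(n, edges, t):
    """Push-based shortest paths to target t (distance-vector style).

    edges is a list of (v, w, cost) triples for edge v -> w;
    assumes no negative cycles.
    """
    INF = float('inf')
    incoming = defaultdict(list)        # incoming[w] = nodes v with v -> w
    cost = {}
    for v, w, c in edges:
        incoming[w].append(v)
        cost[(v, w)] = c
    M = [INF] * n
    M[t] = 0
    queue = deque([t])                  # nodes whose value just changed
    while queue:
        w = queue.popleft()
        for v in incoming[w]:           # push w's new value to its in-neighbors
            if M[w] + cost[(v, w)] < M[v]:
                M[v] = M[w] + cost[(v, w)]
                queue.append(v)         # v changed, so it must push too
    return M
```

Nodes untouched by any update do no work at all, which is exactly the savings over the pull-based version.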
Readability is 7.