==== Chapter 6: Dynamic Programming ====

This chapter is about dynamic programming, which is a more powerful approach than greedy or divide-and-conquer methods. The basic idea is kind of the opposite of the greedy approach. You implicitly explore all possible solutions to a problem by breaking it up into subproblems, then building up correct solutions to larger and larger subproblems. This may seem close to brute force, but it is not, because of the way it explores the possible solutions. It never explicitly looks at every solution, but carefully decomposes the problem so that we can find an optimal solution with a good running time.

=== 6.1 Weighted Interval Scheduling: A Recursive Procedure ===

This problem is an extension of the interval scheduling problem that we developed a greedy solution for previously. The difference is that each interval has a weight/value, and we want to maximize the total value of the accepted set of intervals. No greedy algorithm is known to solve this problem, but we can set it up similarly, with the n jobs ordered by finish time. For each job j, we define p(j) to be the largest index i < j such that interval i is compatible (i.e., disjoint) with j.

To start considering our optimal solution, we know that job n is either in the optimal solution or not. If it is, then the optimal value is the value of n plus opt(p(n)). If n is not in the optimal solution, then the optimal value is opt(n-1). In general, opt(j) = max(v(j) + opt(p(j)), opt(j-1)). This intuitively develops into a recursive algorithm. However, the running time is bad (exponential) because the recursive calls recompute the same subproblems over and over. The fix to this problem is memoization, which is the process of storing the value of each subproblem the first time we compute it and then just looking it up later. We can implement this process in O(n) time (once the jobs are sorted) and then recover the set of intervals in the optimal solution by tracing back through the array of subproblem values. The algorithms are on pages 256-258.

This process is confusing at first, but makes more sense as we go through it. It is actually pretty intuitive and very useful.

=== 6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems ===

This section presents a slightly different solution to the Weighted Interval Scheduling Problem that will then act as a template for future dynamic programming problems. This solution is iterative and gets rid of the need for memoized recursion. The important aspect of the previous algorithm is the array M of the values of the solutions to the subproblems. We can actually compute the values of M using an iterative approach instead of a recursive one. By starting at the beginning of M and incrementing j, we can easily compute M[j] = max(v(j) + M[p(j)], M[j-1]), as in Section 6.1. I think the book didn't do a great job of explaining this iterative approach, but it made sense in class, so it was ok. In general, we build dynamic programming algorithms using the iterative approach rather than the recursive memoization one.

The iterative solution to the Weighted Interval Scheduling Problem leads to a rough template for dynamic programming algorithms. Basically, you need to be able to find a polynomial number of subproblems such that you can easily get the overall solution from the solutions to the subproblems, and there is some natural or intuitive ordering of the subproblems (e.g., smallest to largest) with an easy recurrence. It is often good to work out the recurrence by thinking about what the optimal solution would look like before determining the subproblems.
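To make the iterative version concrete, here is a minimal sketch in Python. This is my own code, not the book's (the book's Compute-Opt/M-Compute-Opt/Find-Solution pseudocode is on pages 256-258); the job-tuple representation and the binary search used to compute p(j) are my choices.

<code python>
import bisect

def weighted_interval_scheduling(jobs):
    """Return (max total value, chosen jobs) for jobs given as (start, finish, value)."""
    jobs = sorted(jobs, key=lambda job: job[1])   # order jobs by finish time
    finishes = [f for _, f, _ in jobs]
    n = len(jobs)

    # p[j] = largest index i < j whose interval is compatible with job j
    # (0 means no compatible earlier job); jobs are 1-indexed in p and M.
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        start = jobs[j - 1][0]
        p[j] = bisect.bisect_right(finishes, start, 0, j - 1)

    # Fill M iteratively: M[j] = max(v(j) + M[p(j)], M[j-1]).
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        value = jobs[j - 1][2]
        M[j] = max(value + M[p[j]], M[j - 1])

    # Trace back through M to recover the optimal set of intervals.
    chosen, j = [], n
    while j > 0:
        value = jobs[j - 1][2]
        if value + M[p[j]] >= M[j - 1]:   # job j is in some optimal solution
            chosen.append(jobs[j - 1])
            j = p[j]
        else:
            j -= 1
    return M[n], list(reversed(chosen))
</code>

For example, weighted_interval_scheduling([(0, 3, 2), (1, 5, 4), (4, 6, 4)]) returns (6, [(0, 3, 2), (4, 6, 4)]): the first and last jobs together beat the middle one.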
=== 6.3 Segmented Least Squares: Multi-Way Choice ===

This problem is derived from the process of finding a line of best fit for a set of points. A single line of best fit is not very helpful if the points curve in such a way that it looks like 2 or more lines would be needed to make the error small. In this case, we consider the problem of finding a set of line segments that gives us the best fit. In order to avoid the trivial solution (where you would have n-1 lines connecting 2 points each), we create a penalty for each partition, which is the number of segments times a constant C > 0, plus the sum of the error values of the segments. Thus, our goal is to find the partition with the minimum penalty.

The key point in this problem is to observe that the last point, call it n, belongs to exactly one segment of the optimal partition, and that segment begins at some earlier point i. The cost of that last segment from i to n is e(i,n) + C, where e is the error, so opt(n) = e(i,n) + C + opt(i-1) for the right choice of i. The optimal solution for each subproblem j is obtained by taking the minimum of e(i,j) + C + opt(i-1) over all choices of i. The algorithm is on page 265. Similarly to the weighted interval problem, we can trace back through the array to get the set of segments in opt(n). Given the error values e(i,j), the algorithm runs in O(n^2) time.

I think this algorithm would be kind of hard to come up with on my own, but by looking at the similarities to the weighted interval problem, it is getting easier to see and understand the general approach for dynamic programming algorithms.

=== 6.4 Subset Sums and Knapsacks: Adding a Variable ===

In the Subset Sum Problem we have n requests, a bound W, and weights w(i), and we need to select the subset of items with the largest total weight subject to the bound W. Like any dynamic programming problem so far, we need to find small subproblems that then make the original solution easier to find. The subproblems we used for the weighted interval scheduling problem will not work in this case - we actually need more subproblems. We need a subproblem for each item i and each possible value of the remaining weight. Basically, we are saying that if w(i) > W, then opt(i,W) = opt(i-1,W). Otherwise, opt(i,W) is the maximum of opt(i-1,W) and w(i) + opt(i-1,W-w(i)); the second option corresponds to item i being in the subset. The algorithm is on page 269. It uses a 2-dimensional array as an extension of previous problems and runs in O(nW) time, which is pseudo-polynomial because it depends on the magnitude of W rather than just the input size (a short sketch of the table-filling loop appears after Section 6.5 below).

The Knapsack Problem is an extension where each item i has both a weight and a value, and there is a maximum weight for the knapsack. The idea is to maximize the total value you can put in the knapsack, subject to the weight constraint. The recurrence is the same as for Subset Sum, except that item i contributes its value v(i) instead of its weight, and it also runs in O(nW) time.

=== 6.5 RNA Secondary Structure: Dynamic Programming Over Intervals ===

This section discusses a problem where it is not easy, or even possible, to naturally break the problem into subproblems in the usual way. The problem here looks at RNA strands. An RNA strand consists of a sequence of the symbols {A,C,G,U}, where A,U and G,C can pair up. The secondary structure is formed when the strand wraps around and pairs up with itself, with some restrictions. Each pair must be separated by at least 4 bases (i.e., if (i,j) is a pair, then i < j - 4).
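As promised in Section 6.4, here is a minimal sketch of the Subset Sum table-filling loop in Python. This is my own code, assuming integer weights; the book's version (page 269) fills the same 2-dimensional array of opt(i,W) values.

<code python>
def subset_sum(weights, W):
    """Largest achievable total weight of a subset, subject to the bound W."""
    n = len(weights)
    # opt[i][w] = best total using only the first i items with capacity w
    opt = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w_i = weights[i - 1]
        for w in range(W + 1):
            if w_i > w:
                # item i cannot fit, so it must be left out
                opt[i][w] = opt[i - 1][w]
            else:
                # max of leaving item i out vs. putting it in
                opt[i][w] = max(opt[i - 1][w],
                                w_i + opt[i - 1][w - w_i])
    return opt[n][W]
</code>

For example, subset_sum([2, 3, 4], 6) returns 6 (taking the items of weight 2 and 4). For the Knapsack version, the only change is replacing w_i + opt[i-1][w - w_i] with v_i + opt[i-1][w - w_i], where v_i is item i's value.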