We previously developed a recurrence based on a fundamentally binary choice: either an interval n belonged to the solution or not. The recurrence in this problem will involve what might be called “multi-way choices”. At each step we have a polynomial number of possibilities to consider for the structure of the optimal solution. The Problem. Goal: trying to pass a line of best fit through a set of data plotted on a two-dimensional set of axes. These points have distinct x-coordinates. We say that the error of this best fit line, L with respect to the set of points in the plane is the sum of its squared “distances” to the points in the plane. Our main goal here is to find the line with minimum error but, there are cases in which two lines would give a less error instead of using one line. Thus, we need to seek a set of lines(as few as possible) that gives us minimum error as our new goal. Formulating the Problem. We need to partition our points into segments. Then compute the line minimizing the error with respect to the points in each segment, according to the formulas above. The penalty of a partition is defined to be a sum of the following terms:
- Number of segments, times a fixed given multiplier C>0
- For each segment, the error value of the optimal line through that segment.
our goal is to now find a partition of minimum penalty. If we increase the number of segments, we reduce the penalty terms in part(2) of the definition, but we increase the term in part(1). We shall use dynamic programming to find a partition of minimum penalty in time polynomial in n. Designing the Algorithm. We want a polynomial number of sub problems, the solutions of which should yield a solution to the original problem; and we should be able to build up solutions to these sub problems using a recurrence. The last point belongs to a single segment in the optimal partition, and that segments starts early before the last point. We know OPT(0)=0 as a boundary case, then we observe that the value of the optimal solution is OPT(n)=e(i,n)+C+OPT(i-1) if the last segment of the optimal partition is pi,….,pn. Algorithm on Page 265.
Analyzing the Algorithm. To consider the running time of Segmented-Least-Squares , first we need to compute the values of all the least-squares errors. There are O(n^2) pairs, and for every pair we need to compute its error which will take O(n) time for each pair thus an O(n^3) running time. But there are n iterations for values j=1,…,n. And for each value of j, we have to determine the minimum in the recurrence which takes O(n) time, thus an overall O(n^2) time once all the error values have been determined.
This section was pretty hard, thus I give it a scale of 7.5/10.