Chapter 6.3: Segmented Least Squares
In previous problems, we've developed recurrence relations based on a binary choice, either n belonged to the optimal solution or it didn't. This problem, however, involves multiway choices; i.e. at each step, we have a polynomial number of possibilities to consider for the structure of the optimal solution.
A good example problem is one where we try to find the line(s) of best fit for a set of points. We are given a set of points P={(x_1,y_1), (x_2,y_2),…, (x_n,y_n)} with x_1<x_2<…<x_n. We will use p_i to denote the point (x_i,y_i). We must first partition P into some number of segments. The penalty of a partition is the sum of the following terms:
- The number of segments into which we partition P, times a fixed, given multiplier C>0
- For each segment, the error value of the optimal line through that segment
Our goal is to find a partition of minimum penalty.
Algorithm
In this problem, it is important to remember that the last point p_n belongs to a single segment in the optimal partition, and that segment begins at some earlier point p_i. Knowing that the last segment is p_i,…,p_n, we can remove those points from consideration and solve the problem on the remaining points p_1,…p_i-1. So the algorithm looks like this:
Segmented-Least-Squares(n):
- M[0…n]
- M[0]=0
- For all pairs i < = j:
- compute the error e_i,j for the segment p_i,…,p_j
(end for)
- For j=1,2,…,n:
- compute M[j]=min(e_i,j+C+OPT(i-1)) for 1 < = i < = j
- return M[n]
Find-Segments(j):
- if j=0 then return nothing
- else:
- find an i that minimizes e_i,j+C+M[i-1]
- return the segment {p_i,…,p_j} and the result of Find-Segments(i-1)
Computing all e_i,j values is O(n^3) running time, and the remaining problem runs in O(n^2) time.
I would rate this section a 7. I understood it, but I feel like they could have done a better job setting up the problem. I think the motivation for the problem was explained better in class than in this section.