Table of Contents
6.3. Segmented Least Squares: Multi-way Choices
With this problem, at each step we have a polynomial number of possibilities to consider the structure of the optimal solution.
The Problem
Suppose our data consists of a set P of n points in the plane,:
(x1y1),(x2y2),…,(xnyn), where x1 < x2,…,< xn
Given a line L with equation y = ax + b, we say that an “error” of L with respect to P is the sum of all of its squared distances to the points in P:
Error(L,P) = ∑ from i =1 to n (yi - axi- b) 2
Thus naturally, we are bound to finding the line with minimum error.
The solution turns out to be a line y = ax + b, where:
a = [(n∑i xiyi) - (∑ixi)(∑i yi)]/[(n∑i xi2) - (∑i xi)2]
And b = (∑i yi- a∑i xi)/n
However, our problem is different from the above mentioned which these (above) formulas solve.
Formulating our problem
Given a sequence of data points, we need to identify a few points in the sequence at which a discrete change occurs.–> In our specific case, a change from one linear approximation to another.
So, we have a set of points P = {(x1,y1),(x2,y2),…,(xn,yn)} with x1 < x2,…, xn.
pi denotes the point (xi,yi)
We must first partition P into some number of segments.
Each segment is a subset of P that represents a contiguous set of x-coordinates of the form {pi,pi+1,…,pj-1,pj} for some indices i≤j.
For each segment S in our partition P, we compute the line minimizing the error with respect to the points in S using the formula we found above(where we wrote the value of a and b in the line y = ax + b).
The penalty of a partition is defined to be the sum of:
The number of segments into which we partition P, times a fixed, given multiplier C>0.
For each segment, the error value of the optimal line through that segment.
Our goal is to find the a partition of minimum penalty.
Designing the Algorithm
INPUT: n, p1,…,pN, C
Segmented-Least-Squares()
M[0] = 0
e[0][0] = 0
for j = 1 to n
for i = 1 to j
e[i][j] = least square error for the segment pi,…,pj
for j = 1 to n
M[j] = min 1 ≤ i ≤ j (e[i][j] + c + M[i-1])
return M[n]
Just like the Weighted Interval Scheduling Problem, we need to find an algorithm that gives us the solution itself.
Algorithm to give for the solution itself
FindSegments(j):
if j = 0:
output nothing
else:
Find an i that minimizes ei,j + C + M[i-1]
Output the segment {pi, …, pj} and the result of
FindSegments(i-1)End if
Algorithm Analysis
The running time of the algorithm is O(n2) since filling in the array M[j] takes us O(n) time for each j.
I give this section 7/10.