However, the square of the vertical distance is only one of many reasonable ways of measuring the distance from the data points to the line mx + b. For example, it is perfectly reasonable to consider instead the expression 1 + (m x_i + b - y_i)^2. If (x_i, y_i) is on the line, this value would be equal to 1, and its logarithm would then be zero. Thus, the function obtained by summing ln(1 + (m x_i + b - y_i)^2) over all the data points is nonnegative, and it vanishes exactly when the line passes through every data point. Another natural measure of the total error is the sum of the absolute values |m x_i + b - y_i|.
There is a priori no particular reason to prefer one error function over another, and for each such choice we could end up with a problem whose solution exists and produces a line that best fits the data. We could then compute the value of the error function for the best-fit line, a value that depends upon the choice of function made. A canonical way of choosing the best error function might be to pick the one for which this number is smallest, but finding such a function is a difficult, if not impossible, problem to solve. If we limit our attention to linear estimators, in a sense to be explained in the last section of this chapter, the solution obtained in §1 is optimal. Here we pursue further the problem of finding the best fit according to the two error functions introduced above.
Notice that if ε_i = m x_i + b - y_i, data with large errors ε_i exert a huge influence on the original error function of §1 (because we use ε_i^2). The two error functions mentioned above grow linearly (or sub-linearly) with |ε_i|, causing the tails to count much less heavily. We will see this when comparing the different results.
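To see the difference in growth concretely, here is a small Python comparison of the three penalties on a few sample residual values (the residuals are arbitrary illustrations, not the data used below):

```python
import math

# Arbitrary sample residuals, from small to large
residuals = [0.1, 1.0, 10.0, 100.0]

for e in residuals:
    sq = e**2                       # least-squares penalty of §1
    logp = math.log(1 + e**2 / 2)   # logarithmic penalty
    absp = abs(e)                   # absolute-value penalty
    print(f"eps = {e:6.1f}:  eps^2 = {sq:10.2f}   ln(1+eps^2/2) = {logp:6.3f}   |eps| = {absp:6.1f}")
```

For a residual of 100 the squared penalty is 10000 while the logarithmic one is under 9, which is exactly why outliers dominate the least-squares fit but not the robust ones.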
In order to solve the problem now at hand, we first generate some ``noisy'' data. We load the routines in lsq_data.txt:
> read(`lsq_data.txt`);
The data and its graphical visualization can then be obtained by:
> pts:=bad_line_pts():
> plot(pts,style=point);
> epsilon:= (pt,m,b) -> (m*pt[1]+b-pt[2]);
In order to compare results, let us first find the line given by least squares:
> H:=(m,b)->sum(epsilon(pts[i],m,b)^2,i=1..nops(pts));
> sol:=solve({diff(H(m,b),m)=0, diff(H(m,b),b)=0},{m,b});
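As a cross-check outside Maple, the same least-squares computation can be sketched in Python. The data here is synthetic, standing in for bad_line_pts; the "true" slope and intercept are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy data around an assumed line y = -3x - 14
x = np.linspace(-10, 10, 50)
y = -3 * x - 14 + rng.normal(scale=2.0, size=x.size)

# Setting dH/dm = dH/db = 0 gives the normal equations A^T A [m, b]^T = A^T y,
# which lstsq solves directly
A = np.column_stack([x, np.ones_like(x)])
m, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"least-squares fit: m = {m:.3f}, b = {b:.3f}")
```

The recovered slope and intercept land close to the assumed values, since squared-error fitting is optimal for this kind of moderate Gaussian noise.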
> p := plot(pts,style=point):
l := plot(subs(sol,m*x+b),x=-10..10):
plots[display](p,l,view=-40..20);
Now, let's try minimizing the sum of ln(1 + ε_i^2/2), which behaves like ε_i^2/2 for small errors (since ln(1 + t) ≈ t near t = 0), but grows very slowly for large ones.
> R:=(m,b)->sum(ln(1+epsilon(pts[i],m,b)^2/2),i=1..nops(pts));
Here's what the functional we want to minimize looks like:
> plot3d(R(m,b),m=-5..0,b=-20..0,style=patchcontour, axes=boxed);
> plot3d(R(m,b),m=-4..-2,b=-18..-12,view=30..80, style=patchcontour, axes=boxed);
The equations we want to solve are not linear. In fact, they're messy enough that Maple needs some help to find the minimum. From the pictures above, we can choose an appropriate region.
> rs:=fsolve( {diff(R(m,b),m)=0, diff(R(m,b),b)=0},
{m,b},m=-3.1..-2.8, b=-15..-14);
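The same robust minimization can be sketched in Python on hypothetical data with a few gross outliers, using scipy's derivative-free Nelder-Mead method in place of fsolve on the gradient equations:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical stand-in for bad_line_pts: a line y = -3x - 14 plus noise,
# with a few gross outliers that drag the least-squares fit
x = np.linspace(-10, 10, 50)
y = -3 * x - 14 + rng.normal(scale=1.0, size=x.size)
y[:5] += 40

def R(p):
    m, b = p
    eps = m * x + b - y
    return np.sum(np.log(1 + eps**2 / 2))

# Like fsolve above, the minimizer needs a reasonable starting point;
# the least-squares fit is a natural choice.
A = np.column_stack([x, np.ones_like(x)])
p0 = np.linalg.lstsq(A, y, rcond=None)[0]

res = minimize(R, x0=p0, method="Nelder-Mead")
m_rob, b_rob = res.x
print(f"least squares: m = {p0[0]:.2f}, b = {p0[1]:.2f}")
print(f"robust fit:    m = {m_rob:.2f}, b = {b_rob:.2f}")
```

The outliers visibly pull the least-squares line, while the logarithmic penalty largely ignores them and recovers something close to the assumed line.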
And here we can compare the two lines we have found.
> p := plot(pts,style=point):
l := plot(subs(sol,m*x+b),x=-10..10):
s := plot(subs(rs,m*x+b),x=-10..10,color=blue):
plots[display](p,l,s,view=-40..20);
> plots[display](p,l,s);
Let us now try to minimize the sum of |ε_i|. Since this function is not even differentiable near ε_i = 0, we have to work harder to get less:
> A:=(m,b)->sum(abs(epsilon(pts[i],m,b)),i=1..nops(pts));
Displaying the graph of this function gives us an idea about where its
minimum lies:
> plot3d(A(m,b),m=-10..5,b=-30..5,style=patchcontour, axes=boxed);
> plot3d(A(m,b),m=-4..-2,b=-15..-13,style=patchcontour, axes=boxed);
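This absolute-value minimization can also be sketched in Python; since |ε| is not differentiable wherever some ε_i = 0, a derivative-free method such as Nelder-Mead is appropriate. The data below is again a hypothetical stand-in for bad_line_pts:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Hypothetical noisy data around an assumed line y = -3x - 14,
# with some gross outliers
x = np.linspace(-10, 10, 50)
y = -3 * x - 14 + rng.normal(scale=1.0, size=x.size)
y[:5] -= 40

def A_err(p):
    m, b = p
    return np.sum(np.abs(m * x + b - y))

# The sum of |eps_i| is convex but has kinks, so we avoid
# gradient-based solvers and use Nelder-Mead instead
res = minimize(A_err, x0=[0.0, 0.0], method="Nelder-Mead")
m1, b1 = res.x
print(f"least absolute deviations: m = {m1:.2f}, b = {b1:.2f}")
```

Because the objective is convex, the simplex search converges from a crude starting point, and the linearly growing penalty keeps the outliers from dominating the fit.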