In this case, we use as error the square of the *horizontal*
distance, and it is somewhat perturbing that this similarly reasonable
approach leads to a best fit that differs from the one obtained when
employing vertical distances.
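The discrepancy is easy to observe numerically. The following is a small sketch in plain Python (the text's computations use Maple; the data below are made up for illustration): fitting by vertical distances and by horizontal distances (regress *x* on *y*, then invert) gives two different lines.

```python
def fit_vertical(xs, ys):
    """Least-squares line minimizing the sum of squared *vertical*
    distances to y = m*x + b."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    m = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return m, ybar - m * xbar

def fit_horizontal(xs, ys):
    """Minimize squared *horizontal* distances: regress x on y,
    then rewrite x = m2*y + b2 as y = (x - b2)/m2."""
    m2, b2 = fit_vertical(ys, xs)   # roles of x and y swapped
    return 1.0 / m2, -b2 / m2

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # made-up data
ys = [0.1, 1.4, 1.8, 3.4, 3.9]
mv, bv = fit_vertical(xs, ys)
mh, bh = fit_horizontal(xs, ys)
print(mv, mh)   # the two slopes differ
```

The two slopes agree only when the data lie exactly on a line; otherwise each error function favors a slightly different trade-off among the residuals.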

The situation can be made even worse if neither *x* nor *y* is a predictor
variable. In this case, we
want to minimize the shortest distance to a line *y* = *mx* + *b* rather than the
vertical (or horizontal) distance. The resulting equations will not be
linear, nor can they be made linear. However, Maple will be able to find
the critical points with no trouble. There will always be at least two
(there may be a third, with a huge slope and intercept): one is the minimum,
and the other is a saddle point. It is worth thinking for a few minutes
about the geometric interpretation of the saddle point in terms of the
problem at hand.
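In lieu of the Maple computation, here is a plain-Python sketch (with made-up data) of finding those critical points by hand. Eliminating *b* first makes the structure transparent: setting ∂E/∂b = 0 gives b = ȳ − m·x̄, and substituting this into ∂E/∂m = 0 reduces the problem to a single quadratic in *m*.

```python
# Critical points of the perpendicular-distance error
#   E(m, b) = sum_i (y_i - m*x_i - b)^2 / (1 + m^2).
# With b = ybar - m*xbar eliminated, dE/dm = 0 becomes the quadratic
#   Sxy*m^2 + (Sxx - Syy)*m - Sxy = 0   (assumes Sxy != 0).
import math

def orthogonal_critical_points(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    disc = math.sqrt((sxx - syy) ** 2 + 4 * sxy * sxy)
    slopes = [(syy - sxx + disc) / (2 * sxy),
              (syy - sxx - disc) / (2 * sxy)]
    # Both critical lines pass through the centroid (xbar, ybar).
    return [(m, ybar - m * xbar) for m in slopes]

cps = orthogonal_critical_points([0.0, 1.0, 2.0, 3.0, 4.0],
                                 [0.1, 1.4, 1.8, 3.4, 3.9])
print(cps)
```

A hint toward the geometric interpretation: the product of the roots of the quadratic is −Sxy/Sxy = −1, so the saddle point corresponds to the line through the centroid *perpendicular* to the minimizing one.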

In practice, the perturbing fact that different but equally reasonable
error functions lead to different best fits is resolved by knowing
which part of the data is the input and which part is predicted from
it; this is often, though not always, the case.
We thus can make a good choice for *E* and solve a
minimization problem. That settled,
we are still left with the problem of demonstrating why our choice for *E* is
a good one.

It is rather easy to write down the solution to the problem in §3: if

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i,$$

then

$$m = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}, \qquad b = \bar y - m\,\bar x.$$
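This closed-form solution can be sketched in a few lines of plain Python (the text itself works in Maple; the data here are made up so that the answer is obvious):

```python
def least_squares(xs, ys):
    """Closed-form best fit minimizing vertical distances:
    m = sum (x_i - xbar)(y_i - ybar) / sum (x_i - xbar)^2,
    b = ybar - m*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    m = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return m, ybar - m * xbar

m, b = least_squares([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(m, b)  # data lie exactly on y = 2x, so m = 2.0, b = 0.0
```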

If the *y*_{i} are assumed to be the values of random variables *Y*_{i} which
depend *linearly* upon the *x*_{i}, say $Y_i = m x_i + b + E_i$ with error
variables $E_i$ of mean zero, then the values of *m* and *b* above are
unbiased estimates of the true slope and intercept.

This conclusion follows by merely making assumptions about the inner
products of the data points
(*x*_{1},..., *x*_{n}) and
(*y*_{1},..., *y*_{n}). Statisticians often would like to answer questions
such as the degree of accuracy of the estimated values of *m* and *b*. For
that, one would have to assume more about the probability distribution of
the error variables $E_i$. A typical situation is to assume that the
$E_i$ above are normally distributed, with mean 0 and
variance $\sigma^2$. Under these assumptions, the values of *m* and
*b* given above are the so-called *maximum likelihood estimators* for
these two parameters, and there is yet another such estimator for the
variance $\sigma^2$. But, since we assumed more, we can also say more. The
estimators $\hat m$ and $\hat b$ are normally distributed and, for example,
the mean of $\hat m$ is *m* and its variance is
$\sigma^2 / \sum_i (x_i - \bar x)^2$. With this knowledge, one may embark on
determining the confidence we can have in the estimated values of the
parameters. We do not do so here, but we want to plant the idea in the
interested reader, whom we refer to books on the subject.
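As a closing illustration, here is a Monte Carlo sketch in plain Python (parameters made up) of the last claim: the slope estimator $\hat m$ averages to the true *m*, with variance close to $\sigma^2 / \sum_i (x_i - \bar x)^2$.

```python
import random

def slope_estimate(xs, ys):
    """The least-squares slope m-hat for one simulated data set."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

random.seed(0)
m_true, b_true, sigma = 2.0, 1.0, 0.5   # made-up true parameters
xs = [float(i) for i in range(10)]
xbar = sum(xs) / len(xs)
sxx = sum((x - xbar) ** 2 for x in xs)   # here sxx = 82.5

trials = 20000
slopes = []
for _ in range(trials):
    # Y_i = m*x_i + b + E_i with normal errors of mean 0, variance sigma^2.
    ys = [m_true * x + b_true + random.gauss(0.0, sigma) for x in xs]
    slopes.append(slope_estimate(xs, ys))

mean = sum(slopes) / trials
var = sum((s - mean) ** 2 for s in slopes) / trials
print(mean, var, sigma ** 2 / sxx)   # sample mean ~ 2, sample var ~ 0.25/82.5
```

The empirical mean and variance of the simulated slopes should match the theoretical values up to Monte Carlo noise, which shrinks as the number of trials grows.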

2002-08-29