When the data is approximate

Now we relax the restriction that the function we are searching for must pass
exactly through each of the data points. This is the typical situation in
science, where we make a measurement or do an experiment to gather our *y*
values from the input *x* values.

More specifically, let's assume that
*y* = *f*_{c1,..., ck}(*x*_{1},..., *x*_{m}) is a real-valued function of
(*x*_{1},..., *x*_{m}) which depends upon *k* parameters
*c*_{1},..., *c*_{k}.
These parameters are unknown to us. However, suppose that we can perform
repeated experiments that, for given values of
(*x*_{1},..., *x*_{m}), allow us
to measure output values for *y*.
How can we estimate the parameters
*c*_{1},..., *c*_{k} that best correspond
with this information?

Let us assume that the experiment measuring the value
*y* = *f*_{c}(*x*)
for given input values
*x* = (*x*_{1},..., *x*_{m}) is repeated *n* times, recording the inputs
used in each run. We will then obtain a system of equations

f_{c1,..., ck}(x_{11}, x_{12},..., x_{1m}) = y_{1}

f_{c1,..., ck}(x_{21}, x_{22},..., x_{2m}) = y_{2}

...

f_{c1,..., ck}(x_{n1}, x_{n2},..., x_{nm}) = y_{n}

Since we may perform the experiment as many times as we wish,
we may end up with more relations of the type above than unknowns
(that is, *n* larger than *k*). The larger *n* is, the more information
we have collected about the coefficients. However, even if the experiments are
carried out with great care, they will unavoidably contain some error.
The question remains: how may we judiciously estimate the coefficients
*c*_{1},..., *c*_{k} from the collected information about
*y* = *f*_{c}(*x*)? What is the
best fit?
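For concreteness, such a series of noisy experiments can be simulated. The sketch below (in Python) assumes a hypothetical linear model *f*_{c1,c2}(*x*) = *c*_{1} + *c*_{2}*x* with *m* = 1 and invented "true" parameter values; the names `run_experiment`, `C1_TRUE`, and `C2_TRUE` are illustrative, not from the text:

```python
import random

# Hypothetical linear model f_c(x) = c1 + c2 * x with k = 2 parameters.
# The "true" values below are invented for the simulation; in a real
# experiment they are exactly what we do not know.
C1_TRUE, C2_TRUE = 2.0, 0.5

def run_experiment(x, noise=0.1):
    """One measurement of y = f_c(x), with additive Gaussian error."""
    return C1_TRUE + C2_TRUE * x + random.gauss(0.0, noise)

random.seed(0)
# n = 20 experiments: far more equations than the k = 2 unknowns.
xs = [0.5 * i for i in range(20)]
ys = [run_experiment(x) for x in xs]
```

Each entry of `ys` plays the role of one measured *y*_{i}, and the corresponding `xs[i]` the role of the inputs (*x*_{i1},..., *x*_{im}).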

A very common method for answering this question is known as the *method
of least-squares*. The idea is simple: for each realization of the experiment,
we measure the fitting error by the distance between the real number
*f*_{c1,..., ck}(*x*_{1}, *x*_{2},..., *x*_{m}) and the observed value
of *y*. A best fit with respect to this distance will also be a best fit
with respect to its square, so to avoid absolute values we instead measure
the fitting error by
(*f*_{c1,..., ck}(*x*_{1}, *x*_{2},..., *x*_{m}) - *y*)^{2}. The error
function that gathers
all the information obtained from the *n* experiments is then

E(c_{1},..., c_{k}) = (f_{c1,..., ck}(x_{11},..., x_{1m}) - y_{1})^{2} + ... + (f_{c1,..., ck}(x_{n1},..., x_{nm}) - y_{n})^{2}

This turns out to be a function of
*c* = (*c*_{1},..., *c*_{k}).
Mathematically, our best-fit problem is now reduced to finding the value
of *c* which produces a minimum for this error function *E*. The details of
how this can be done depend intrinsically upon the assumed form of the
function *f* and its relation to the parameters
*c*_{1},..., *c*_{k}.
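To make the minimization concrete, assume (purely for illustration) the linear model *f*_{c1,c2}(*x*) = *c*_{1} + *c*_{2}*x*. Then *E* is a quadratic function of (*c*_{1}, *c*_{2}), and setting its partial derivatives to zero gives a closed-form minimizer. A minimal Python sketch, with hypothetical function names:

```python
def error(c1, c2, xs, ys):
    """E(c1, c2): the sum of squared fitting errors over all n experiments."""
    return sum((c1 + c2 * x - y) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Minimize E for the linear model f(x) = c1 + c2 * x.

    Setting dE/dc1 = 0 and dE/dc2 = 0 yields the usual closed-form
    solution (the normal equations for a line).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    c2 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
    c1 = y_mean - c2 * x_mean
    return c1, c2

# Noise-free data generated from y = 1 + 3x: the fit recovers c exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]
c1, c2 = least_squares_line(xs, ys)  # → (1.0, 3.0)
```

With noisy data, the recovered (*c*_{1}, *c*_{2}) would no longer match the true parameters exactly, but among all lines it still makes *E* as small as possible.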

We should remark that in some cases, we might want to use

2002-08-29