Latin Squares in Practice and in Theory I
3. The statistical analysis of a latin-square experiment
Ronald A. Fisher realized that latin squares could be abstracted
from the partition of growing plots and applied to the elimination
of systematic error in a much more general context.
In the 2-dimensional plot of land, the systematic error due to
variation in soil, etc. can be minimized by a suitable latin
square partition of the plot. More generally, whenever there are two
independent factors that may introduce systematic error into
an experiment, a latin square arrangement in ``experiment space''
can compensate for these errors. (Fisher also showed
how graeco-latin and ``hyper graeco-latin'' squares could be
applied to more complex experiments; see Fisher.)
The following example is adapted from what was D. H. Kim's Stat 470 website at the University of Michigan. The data set is taken from The Design and Analysis
of Experiments by Douglas C. Montgomery (Wiley).
The experiment is to study the burning rate of five different formulations of a rocket propellant.
The formulations are mixed from raw material that comes in batches whose composition may vary.
Furthermore, the formulations are prepared by several operators, and there may be differences in the skills
and experience of the operators. So in this experiment there are two presumably unrelated sources
of systematic error: different batches and different operators.
To compensate for these systematic errors by a latin square design, five operators are chosen
at random, and five batches of raw material are selected at random, each one large enough
for samples of all five formulations to be prepared. A sample from one of the five batches (labelled
at random I, II, III, IV, V) is assigned to one of the five operators (labelled at random 1, 2, 3, 4, 5)
for preparation of one of the five formulations (labelled at random A, B, C, D, E)
according to the following latin square arrangement; the table also contains the observed burning
rate for that formulation of that sample.
Batch | Operator 1 | Operator 2 | Operator 3 | Operator 4 | Operator 5
I     | A 24       | B 20       | C 19       | D 24       | E 24
II    | B 17       | C 24       | D 30       | E 27       | A 36
III   | C 18       | D 38       | E 26       | A 27       | B 21
IV    | D 26       | E 31       | A 26       | B 23       | C 22
V     | E 22       | A 30       | B 20       | C 29       | D 31
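To make the arrangement concrete, here is a minimal sketch in Python (NumPy) that encodes the latin square of formulations and the observed burning rates; the variable names are illustrative, not taken from the original analysis. Rows are batches I-V and columns are operators 1-5.

```python
import numpy as np

# Rows are batches I-V, columns are operators 1-5.
# formulation[i, j] is the formulation prepared by operator j+1
# from batch i+1; rates[i, j] is the observed burning rate.
formulation = np.array([list("ABCDE"),
                        list("BCDEA"),
                        list("CDEAB"),
                        list("DEABC"),
                        list("EABCD")])

rates = np.array([[24, 20, 19, 24, 24],
                  [17, 24, 30, 27, 36],
                  [18, 38, 26, 27, 21],
                  [26, 31, 26, 23, 22],
                  [22, 30, 20, 29, 31]], dtype=float)

# Latin square property: each formulation appears exactly once
# in every row and in every column.
for letter in "ABCDE":
    assert (formulation == letter).sum(axis=0).tolist() == [1] * 5
    assert (formulation == letter).sum(axis=1).tolist() == [1] * 5
```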
The calculational scheme used to analyze these data goes as follows.
- Normalization: The average of all the observations is 25.4.
We first normalize by
subtracting this average from each observation. This gives a new set
of data with average 0:
Batch | Operator 1 | Operator 2 | Operator 3 | Operator 4 | Operator 5
I     | -1.4 | -5.4 | -6.4 | -1.4 | -1.4
II    | -8.4 | -1.4 |  4.6 |  1.6 | 10.6
III   | -7.4 | 12.6 |  0.6 |  1.6 | -4.4
IV    |  0.6 |  5.6 |  0.6 | -2.4 | -3.4
V     | -3.4 |  4.6 | -5.4 |  3.6 |  5.6
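Continuing the hypothetical `rates` array from the sketch above, the normalization step amounts to computing the grand mean and subtracting it from every observation.

```python
grand_mean = rates.mean()        # 635 / 25 = 25.4
centered = rates - grand_mean    # normalized data; its average is 0

print(grand_mean)                # 25.4
print(centered[0])               # approximately [-1.4 -5.4 -6.4 -1.4 -1.4]
```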
- Separation of signals: We want to think of these data as representing the superposition of
four signals:
- Effect of batch: the matrix of row averages
- Effect of operator: the matrix of column averages
- Effect of formulation: the matrix of A, B, etc. averages
- Nonsystematic error: whatever is left
Accordingly we write the normalized data matrix as the sum of four matrices:
Batch effect (row averages):
-3.2 | -3.2 | -3.2 | -3.2 | -3.2
 1.4 |  1.4 |  1.4 |  1.4 |  1.4
 0.6 |  0.6 |  0.6 |  0.6 |  0.6
 0.2 |  0.2 |  0.2 |  0.2 |  0.2
 1.0 |  1.0 |  1.0 |  1.0 |  1.0

  +

Operator effect (column averages):
-4.0 | 3.2 | -1.2 | 0.6 | 1.4
-4.0 | 3.2 | -1.2 | 0.6 | 1.4
-4.0 | 3.2 | -1.2 | 0.6 | 1.4
-4.0 | 3.2 | -1.2 | 0.6 | 1.4
-4.0 | 3.2 | -1.2 | 0.6 | 1.4

  +

Formulation effect (A, B, C, D, E averages):
 3.2 | -5.2 | -3.0 |  4.4 |  0.6
-5.2 | -3.0 |  4.4 |  0.6 |  3.2
-3.0 |  4.4 |  0.6 |  3.2 | -5.2
 4.4 |  0.6 |  3.2 | -5.2 | -3.0
 0.6 |  3.2 | -5.2 | -3.0 |  4.4

  +

Nonsystematic error (residual):
 2.6 | -0.2 |  1.0 | -3.2 | -0.2
-0.6 | -3.0 |  0.0 | -1.0 |  4.6
-1.0 |  4.4 |  0.6 | -2.8 | -1.2
 0.0 |  1.6 | -1.6 |  2.0 | -2.0
-1.0 | -2.8 |  0.0 |  5.0 | -1.2
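As a sketch of this separation step (again continuing the hypothetical arrays above), the first three matrices are built by broadcasting row averages, column averages, and same-letter averages of the centered data; the error matrix is whatever remains.

```python
# Batch effect: each row filled with that row's average.
batch_effect = np.tile(centered.mean(axis=1, keepdims=True), (1, 5))

# Operator effect: each column filled with that column's average.
operator_effect = np.tile(centered.mean(axis=0, keepdims=True), (5, 1))

# Formulation effect: the average of the centered entries carrying a
# given letter, placed wherever that letter occurs in the square.
formulation_effect = np.zeros_like(centered)
for letter in "ABCDE":
    mask = (formulation == letter)
    formulation_effect[mask] = centered[mask].mean()

# Nonsystematic error: whatever is left after removing the three effects.
residual = centered - batch_effect - operator_effect - formulation_effect

# The four matrices really do add back up to the normalized data.
assert np.allclose(batch_effect + operator_effect
                   + formulation_effect + residual, centered)
```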
- The Null Hypothesis. The entries in the third matrix represent the signal we are looking
for: the differences between the five formulations being tested.
The entries in the fourth matrix represent errors that cannot be accounted
for by operator or batch effect. The significance of the experiment depends on the
relations between these two sets of numbers. More precisely, we suppose that
there is no formulation effect (``the null hypothesis'') and estimate the
probability of observing a set of numbers like those in the third matrix.
This calculation is made using what can be called the basic axiom of statistics:
Nonsystematic error is normally distributed.
- Analysis of variance. This is based on the following observation, adapting
Fisher's
words to this context: ``On the null hypothesis the mean squares for formulation and error
have the particularly simple interpretation that each may be regarded as an independent
estimate of the same single quantity, the variance due to error of a single observation.''
- The sample variance represented by the third (formulation) matrix is the sum
of the squares of the entries divided by the number of independent entries (the ``number of
degrees of freedom''), which in
this case is 4: same-letter (same-formulation) entries are identical, and each column
sums to zero. The sample variance is s_f² = 330/4 = 82.5.
- Similarly the fourth (error) matrix has 12 degrees of freedom:
it has 25 entries, but the entries must sum to zero; the top four
rows must each sum to zero (the fifth is then automatic); the first four columns must
each sum to zero; and the first four same-formulation (same-letter) sets
must each sum to zero, a total of 13 constraints. The sample variance is
s_e² = 10.66 (the sum of squares is 128).
- It is remarkable that the ratio of two such sample variances is a random variable with a known
distribution (here is where our ``basic axiom'' is used). Fisher and Yates call it the ``Variance Ratio''
distribution; it is now called F_{k,n}, where here k = 4, n = 12.
Interpolating from the table in Fisher and Yates
(Table V) or in a suitable text (for example
Box, Hunter and Hunter) shows that the probability of
that ratio being as large as or larger than the value in this example (82.5/10.66 = 7.74)
is 0.28%; a short code sketch reproducing this computation appears at the end of this section.
- This analysis shows that if the null hypothesis were
true, the experimental
data would be extremely unlikely. On this basis we reject the null hypothesis,
and report that the experiment has detected a difference
between formulations, statistically significant at the 0.0028 level.
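For readers who want to reproduce the arithmetic, the analysis of variance can be sketched as follows, continuing the hypothetical arrays from the earlier sketches and using SciPy's F distribution in place of the printed Variance Ratio tables; the numbers agree with those quoted above up to rounding and table interpolation.

```python
from scipy import stats

# Sums of squares of the formulation and error matrices.
ss_formulation = (formulation_effect ** 2).sum()   # 330, up to rounding
ss_error = (residual ** 2).sum()                   # 128, up to rounding

# Degrees of freedom: 4 for formulations, (5-1)*(5-2) = 12 for error.
df_f, df_e = 4, 12

s2_f = ss_formulation / df_f   # 82.5
s2_e = ss_error / df_e         # about 10.67

F = s2_f / s2_e                # about 7.73
p = stats.f.sf(F, df_f, df_e)  # roughly 0.0025; table interpolation above gave 0.28%

print(F, p)
```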