
THE COORDINATE-FREE APPROACH TO GAUSS-MARKOV ESTIMATION, AND ITS APPLICATION TO MISSING AND EXTRA OBSERVATIONS

WILLIAM KRUSKAL

UNIVERSITY OF CHICAGO

The work leading to this paper was supported in part by the Logistics and Mathematical Statistics Branch of the Office of Naval Research and in part by the National Science Foundation.

1. Introduction and summary

The purposes of this paper are (1) to describe the coordinate-free approach to Gauss-Markov (linear least squares) estimation in the context of Model I analysis of variance and (2) to discuss, in coordinate-free language, the topics of missing observations and extra observations.

It is curious that the coordinate-free approach to Gauss-Markov estimation, although known to many statisticians, has infrequently been discussed in the literature on least squares and analysis of variance. The major textbooks in these areas do not use the coordinate-free approach, and I know of only a few journal articles that deal with it ([2], plus some of the references in Dutch that it lists, and, to some extent, [1], [3] and [7]). The coordinate-free viewpoint is implicit in R. A. Fisher's geometrical approach to sampling problems.

The subject of missing observations in Model I analysis of variance is well understood and often discussed. This paper presents no new results here, but it does present a viewpoint different from that usually given. In contrast, the topic of extra observations, although it was briefly considered by Gauss [5], section 35 of Theoria Combinationis . . . , has elicited hardly any papers since. (I know only of papers by R. L. Plackett [9] and K. D. Tocher [10].) The problem of extra observations is important in its own right and also in connection with the treatment of so-called outliers. I shall discuss a method of treating extra observations that bears some resemblance to that for missing observations. In particular, it leads to possible methods for treating apparent outliers that I described briefly in [8].

There are two major motivations for emphasizing the coordinate-free approach to Gauss-Markov estimation. First, it permits a simpler, more general, more elegant, and more direct treatment of the general theory of linear estimation than do its notational competitors, the matrix and scalar approaches. Second, it is useful as an introduction to infinite-dimensional spaces, which are important, for example, in the consideration of stochastic processes. A related point is that more or less coordinate-free treatments of finite-dimensional vector spaces are now more common than they once were and are being taught to students at an earlier stage. With such mathematical background, a student can learn the theoretical side of Model I analysis of variance quickly and efficiently. The treatment in this paper will, however, be compact and without the motivational material and the many examples that would be pedagogically important.

Nonetheless, it may be useful to keep one concrete example before us. Accordingly, I shall illustrate the general theory in terms of a simple illustration, two-way analysis of variance with one observation per cell. The vector space viewpoint and notation will mostly be taken from P. R. Halmos's text [6].

My own introduction to the coordinate-free approach came from discussions with L. J. Savage and I acknowledge my great debt to him, a debt of which coordinate freedom forms but a part.

2. Gauss-Markov estimation from a coordinate-free viewpoint

We consider a sample point, Y, that ranges over an n-dimensional real vector space, V, on which an inner product, ( , ), is given. (It would also be possible to start without a given inner product and to define one in terms of the covariance structure of Y.) Perhaps more basically, Y is a function from an underlying probability space onto V such that all sets in the underlying space of form {e : (x, Y(e)) ≤ c}, where x ∈ V and c is a real number, are measurable. Of course, Y is the abstract entity usually corresponding, in a particular problem, to the coordinate vector comprising the set of scalar observations; by not writing in terms of coordinates, that is, by not requiring that a basis be specified, we are able to present the general theory succinctly.

All first and second moments discussed will be assumed to exist without further mention.

Clearly, E(x, Y) is a linear functional of x ∈ V, and hence there exists a unique member of V, say μ, such that E(x, Y) = (x, μ) for all x ∈ V. Call μ the (vector) expectation of Y, EY; this quantity is easily articulated with the vector expectation in coordinate form.

Similarly Cov [(x, Y), (z, Y)], where x, z ∈ V, is clearly a quasi-inner product (like an inner product but possibly only nonnegative definite). Hence there exists a unique linear transformation Σ on V such that

(2.1)  Cov [(x, Y), (z, Y)] = (x, Σz),  x, z ∈ V.

It is easily seen that Σ is nonnegative definite and symmetrical with respect to ( , ); that is, (x, Σx) ≥ 0 and (x, Σz) = (Σx, z) for all x, z ∈ V. Naturally, Var (x, Y) = (x, Σx). Let us say that Y is weakly spherical if Σ is a (nonnegative) multiple, say σ², of the identity transformation.


It is often thought desirable to add side conditions so that μ, the αᵢ, and the βⱼ are uniquely determined by the μᵢⱼ (estimable, identifiable). A popular set of side conditions is Σᵢ αᵢ = Σⱼ βⱼ = 0. Then it follows that μ = Σᵢ Σⱼ μᵢⱼ/(IJ), that αᵢ = (Σⱼ μᵢⱼ/J) − μ, and that βⱼ = (Σᵢ μᵢⱼ/I) − μ.

What we have described is the model for two-way Model I analysis of variance, with one observation per cell and no interaction. If, in addition, we require joint normality of the Yᵢⱼ, then the Yᵢⱼ are independent.

A basic fact is that, under conditions (i) and (ii), the orthogonal projections, P_Ω Y of Y on Ω and Y − P_Ω Y = Q_Ω Y on the orthogonal complement of Ω, are uncorrelated and have weakly spherical distributions in their own subspaces with (restricted) covariance transformations σ²I, where σ² is the same as that for Y itself. To say that AY and BY are uncorrelated is to say that Cov [(x, AY), (z, BY)] = 0 for all x, z ∈ V. This immediately extends to orthogonal decompositions of Y into more than two components. Since P_Ω and Q_Ω are orthogonal projections, they are idempotent (for example, P_Ω P_Ω = P_Ω) and symmetric [for example, P_Ω′ = P_Ω or, equivalently, (x, P_Ω z) = (P_Ω x, z) for all x, z ∈ V] with respect to ( , ).

If we require normality of Y, that is, if we require that (x, Y) be normal for all x ∈ V, then it is almost immediate that ‖P_Ω Y − μ‖²/σ² and ‖Q_Ω Y‖²/σ² have independent chi-square distributions with p = dim Ω and n − p = dim Ω⊥ degrees of freedom respectively. In any case, these quantities have expectations p and n − p.

The vector Gauss-Markov estimator of μ is P_Ω Y and the scalar Gauss-Markov estimator of a linear functional (x, μ) of μ is (x, P_Ω Y) = (P_Ω x, Y). (Commentary on the historical accuracy or inaccuracy of the designation "Gauss-Markov" appears in the discussion of [10].) These Gauss-Markov estimators are characterized by the following well-known properties. To avoid trivialities, we assume that σ² > 0.

(a) P_Ω Y is the unique linear transformation of Y that is an unbiased estimator of μ and leads to minimum variance for all derived estimators of linear functionals. In other words,

(a₁) E(P_Ω Y) = μ for all μ ∈ Ω,

(a₂) for all x, and for linear transformations D ≠ P_Ω satisfying E(DY) = μ for all μ ∈ Ω, it follows that Var (x, P_Ω Y) ≤ Var (x, DY).

(b) For all x, the unique minimum variance linear functional of Y that is unbiased for (x, μ) is (P_Ω x, Y) = (x, P_Ω Y).

(c) For all x, the unique minimum variance linear functional of Y that has bounded mean square error in μ is (P_Ω x, Y).

(d) The unique vector ν in Ω minimizing ‖Y − ν‖² is P_Ω Y. This is the least squares characterization.

(e) For all x, the unique linear functional of Y whose "coefficient vector," when the functional is expressed in the form (z, Y), lies in Ω and which estimates (x, μ) unbiasedly is (P_Ω x, Y); that is, for w ∈ Ω, (w, Y) is the Gauss-Markov estimator of its expectation (w, μ). This characterization often leads to an easy method of obtaining Gauss-Markov estimators when there is high symmetry or when Ω is similar to another manifold onto which we know how to project orthogonally. For if we guess (P_Ω x, Y) by choosing a vector w ∈ Ω, we need only compute (w, μ) to see if our guess is right and, if it is nearly right, we can often see immediately how it should be modified. A similar vector characterization for P_Ω Y is readily written down.

(f) When Y is normal, P_Ω Y and (x, P_Ω Y) are the maximum likelihood estimators of μ and (x, μ) respectively. Further, (x, P_Ω Y) is the minimum variance unbiased estimator of (x, μ).

Various other characterizations and properties of P_Ω Y can be stated, for example, in terms of invariance under relevant linear transformations. Note that Gauss-Markov estimation is linear, that is, that α(x, P_Ω Y) + β(z, P_Ω Y) = (αx + βz, P_Ω Y), the Gauss-Markov estimator of (αx + βz, μ). Further, in terms of a fixed basis, the coordinates of P_Ω Y are the Gauss-Markov estimators of the respective coordinates of μ. The conventional (unbiased) estimator of σ² is ‖Q_Ω Y‖²/(n − p).

Turn now to our example, with the description of Ω in the form μᵢⱼ = μ + αᵢ + βⱼ, where Σαᵢ = Σβⱼ = 0. Let a bar denote simple averaging with respect to dotted subscripts and consider linear functionals of Y as follows: Ȳ_{··}, Ȳ_{i·} − Ȳ_{··}, and Ȳ_{·j} − Ȳ_{··}. Note that there are I of the second kind and J of the third. The coefficient vectors of these functionals are easily seen to be in Ω. Hence the functionals are the Gauss-Markov estimators of their own expectations, μ, αᵢ, and βⱼ. Hence P_Ω Y is the coordinate vector with (i, j) component Ȳ_{··} + (Ȳ_{i·} − Ȳ_{··}) + (Ȳ_{·j} − Ȳ_{··}) = Ȳ_{i·} + Ȳ_{·j} − Ȳ_{··}. There is a standard orthogonal decomposition of P_Ω Y into three vectors corresponding to "over-all mean," "row effects," and "column effects," which I do not discuss here. The dimension of Ω is readily seen to be I + J − 1.

If ω is a q-dimensional linear manifold within Ω, then the standard F statistic for the null hypothesis μ ∈ ω against all alternatives is

(2.3)  [(p − q)⁻¹(‖Q_ω Y‖² − ‖Q_Ω Y‖²)] / [(n − p)⁻¹‖Q_Ω Y‖²] = [(p − q)⁻¹‖P_{Ω−ω} Y‖²] / [(n − p)⁻¹‖Q_Ω Y‖²],

with large values critical. Here Q_ω Y = Y − P_ω Y, and Ω − ω is the orthogonal complement of ω with respect to Ω. This test statistic has, under the null hypothesis, the central F distribution with p − q and n − p degrees of freedom. The geometrical interpretation of the F statistic is well known.
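As a concrete check of this machinery, the following short numerical sketch (Python with numpy; the 4 × 5 layout, the random data, and all variable names are illustrative choices and are not prescribed anywhere above) verifies the closed-form projection for the two-way layout, the dimension count I + J − 1, and the F statistic (2.3) for the hypothesis of no column effects.

import numpy as np

rng = np.random.default_rng(0)
I, J = 4, 5
Y = rng.normal(size=(I, J))

# Design matrix for mu + alpha_i + beta_j (over-parametrized; pinv handles the rank).
X = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
P_Omega = X @ np.linalg.pinv(X)              # orthogonal projection onto Omega
fit = (P_Omega @ Y.ravel()).reshape(I, J)

closed_form = Y.mean(axis=1, keepdims=True) + Y.mean(axis=0, keepdims=True) - Y.mean()
assert np.allclose(fit, closed_form)         # (i, j) component is Ybar_i. + Ybar_.j - Ybar_..
assert np.linalg.matrix_rank(X) == I + J - 1 # dim Omega

# F statistic (2.3) for the null hypothesis of no column effects (omega: mu + alpha_i).
X_omega = X[:, : 1 + I]
P_omega = X_omega @ np.linalg.pinv(X_omega)
rss_Omega = np.sum((Y.ravel() - P_Omega @ Y.ravel()) ** 2)   # ||Q_Omega Y||^2
rss_omega = np.sum((Y.ravel() - P_omega @ Y.ravel()) ** 2)   # ||Q_omega Y||^2
p, q, n = I + J - 1, I, I * J
F = ((rss_omega - rss_Omega) / (p - q)) / (rss_Omega / (n - p))
print(F)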

The purpose of this paper is to discuss estimation, not testing. Hence, except for a few scattered remarks, we leave testing with the above brief paragraph. A good bit of the standard literature on Gauss-Markov estimation and Model I analysis of variance may be interpreted in terms of forming and combining orthogonal projections, especially when Ω is regarded as the direct sum of several linear manifolds with interesting statistical meanings.


ple, the sickly piglet is probably more likely to die accidentally before the end of the experiment than is his robust brother. In such a case we are dealing with a complicated selection process.

I do not intend to discuss these important problems further here. Instead we shall assume that the experimental design is for Y¹ from the start and that, if observations are missing by chance, it is meaningful to carry out statistical analyses conditionally on the "observed" V₁.

Some notation is needed. Let Ωᵢ = P_{Vᵢ}Ω for i = 1, 2. Let

(3.1)  μⁱ = P_{Vᵢ}μ = EYⁱ,  i = 1, 2,

so that Ω₁ ⊥ Ω₂ and μ¹ ⊥ μ². In general, Ω ≠ Ω₁ + Ω₂, although Ω ⊂ Ω₁ + Ω₂.

We make the basic assumption that

(3.2)  dim Ω = dim Ω₁,

and we ask how to find P_{Ω₁}Y¹, or its equivalent, in terms of P_Ω, which we suppose known explicitly. (In the following, I permit myself the customary ambiguity of using "μ¹," say, both to denote the unknown true μ¹ = EY¹ and to serve as a running variable over Ω₁.) Since dim Ω = dim Ω₁, to each μ¹ ∈ Ω₁ there corresponds a unique μ² ∈ Ω₂, say Aμ¹, such that (I + A)μ¹ ∈ Ω, P_{V₁}(I + A)μ¹ = μ¹, and P_{V₂}(I + A)μ¹ = Aμ¹. In short, to each μ¹ there corresponds a unique μ ∈ Ω with P_{V₁}μ = μ¹; Aμ¹ is that μ minus μ¹. For completeness, take Ax = 0 when x ⊥ Ω₁, so that A is a well-defined transformation. It is readily shown to be linear.

Instead of seeking P_{Ω₁}Y¹ = μ̂¹ as such, it is equivalent and more convenient to seek its analogue (I + A)P_{Ω₁}Y¹ = μ̂ in Ω. Note that, if we know μ̂² = AP_{Ω₁}Y¹, we could easily obtain μ̂ since

(3.3)  μ̂ = P_Ω(Y¹ + μ̂²)

and since we know P_Ω explicitly. The proof is simple, for

(3.4)  P_Ω(Y¹ + μ̂²) = P_Ω(P_{Ω₁}Y¹ + μ̂²) + P_Ω(Y¹ − P_{Ω₁}Y¹) = P_Ω(μ̂¹ + μ̂²) = μ̂,

where the last expression on the right of the first line is zero since Y¹ − P_{Ω₁}Y¹ is obviously orthogonal to both Ω₁ and Ω₂ and hence to Ω.

Next, note that μ̂² can be obtained via the following consistency condition, which amounts, in any specific case, to a set of simultaneous linear equations in dim Ω₂ scalar unknowns:

(3.5)  μ̂² = P_{Ω₂}P_Ω(Y¹ + μ̂²).

This condition has great intuitive appeal, for it says that μ̂² is that element of Ω₂ which, when added to Y¹ and followed by orthogonal projection on Ω and then on Ω₂, gives us μ̂² back again. It is easy to see that (3.5) holds, for μ̂² = P_{Ω₂}μ̂, and we need only substitute for μ̂ from (3.3). That μ̂² is determined uniquely by (3.5) may be seen thus. Suppose that u and u* both are in Ω₂ and satisfy (3.5). Then u − u* = P_{Ω₂}P_Ω(u − u*). But ‖u − u*‖ > ‖P_{Ω₂}P_Ω(u − u*)‖ unless both u − u* ∈ Ω and P_Ω(u − u*) ∈ Ω₂. Hence P_Ω(u − u*) is in both Ω and Ω₂; it is therefore zero and hence u = u*.

The use of (3.5) in practice is simple for the kind of application envisaged, since P_Ω is known explicitly and P_{Ω₂} presents little difficulty if dim Ω₂ is small.
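Since (3.5) reduces, for a single missing cell, to one linear equation in one unknown, it is easy to verify numerically. The sketch below (Python/numpy; the 4 × 5 layout, the random data, and the variable names are arbitrary illustrations) solves (3.5), checks the result against the closed form (3.7) derived below, and confirms that it agrees with a direct least-squares fit to the incomplete data.

import numpy as np

rng = np.random.default_rng(1)
I, J = 4, 5
Y = rng.normal(size=(I, J))
miss = (I - 1, J - 1)                      # the (I, J) cell, 0-indexed
Y1 = Y.copy(); Y1[miss] = 0.0              # Y^1: missing coordinate set to zero

X = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
P = X @ np.linalg.pinv(X)                  # P_Omega for the complete layout
k = miss[0] * J + miss[1]                  # raveled index of the missing cell

# (3.5): gamma = [P_Omega(Y^1 + gamma * e_k)]_k  is a single linear equation in gamma.
gamma = (P[k] @ Y1.ravel()) / (1.0 - P[k, k])

# (3.7): gamma = (I*YI. + J*Y.J - Y..) / ((I-1)(J-1)), totals over nonmissing cells only.
row_tot = Y1[miss[0]].sum(); col_tot = Y1[:, miss[1]].sum(); grand = Y1.sum()
gamma_yates = (I * row_tot + J * col_tot - grand) / ((I - 1) * (J - 1))
assert np.allclose(gamma, gamma_yates)

# Direct least squares on the IJ - 1 observed cells gives the same estimate of
# mu + alpha_I + beta_J.
obs = np.ones(I * J, bool); obs[k] = False
beta = np.linalg.lstsq(X[obs], Y.ravel()[obs], rcond=None)[0]
assert np.allclose(X[k] @ beta, gamma)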

Let us turn to our example and suppose that Y_{IJ} is missing, that is, that Y¹ has components like Y except that the (I, J) component is zero. Further, μ¹ has (i, j) components μ + αᵢ + βⱼ, as before, except that the (I, J) component is zero; μ² has all components zero except the (I, J) one, which is μ + α_I + β_J. It is readily checked that dim Ω = dim Ω₁ = I + J − 1.

To find μ̂² is to find its (I, J) component, which I shall call γ for brevity. Then P_{Ω₂}P_Ω[Y¹ + μ̂²] has all components zero but the (I, J) one, which we know from the previous discussion is

(3.6)  J⁻¹(Y_{I⊙} + γ) + I⁻¹(Y_{⊙J} + γ) − (IJ)⁻¹(Y_{⊙⊙} + γ),

where a circled dot subscript means summation over that subscript for all possible nonmissing observations. The basic identity (3.5) here says that the above displayed quantity is γ. Hence, solving the resulting linear equation,

(3.7)  γ = [(I − 1)(J − 1)]⁻¹(I Y_{I⊙} + J Y_{⊙J} − Y_{⊙⊙}).

As a check, we may compute the expectation of this quantity, μ + α_I + β_J. Having found γ, and hence μ̂², by (3.3) we may use this value to "complete" Y¹ and apply P_Ω to the completion, thus getting μ̂, α̂ᵢ, and β̂ⱼ. It is straightforward to write down explicit descriptions of these quantities.

Other methods of treating missing observations are (i) minimization of the quadratic form ‖Q_Ω[Y¹ + μ²]‖² in μ², which leads to (3.5) again, and (ii) the use of traditional covariance analysis with dummy covariate vectors, each having a coordinate one for the missing observation to which it corresponds and zero coordinates elsewhere. (This last method requires caution and modification if dim Ω₂ < dim V₂, that is, if the expectations of the missing observations are linearly related.) The covariance method is based on the easily shown identity P_{Ω+V₂}Y¹ = P_{Ω₁}Y¹.
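The identity behind the covariance method can likewise be checked numerically. In the sketch below (illustrative Python/numpy; one missing cell, arbitrary data) the dummy covariate column plays the role of V₂, and projecting Y¹ onto Ω + V₂ is compared with projecting it onto Ω₁.

import numpy as np

rng = np.random.default_rng(2)
I, J = 4, 5
Y1 = rng.normal(size=I * J)
k = (I - 1) * J + (J - 1)          # index of the "missing" cell
Y1[k] = 0.0                        # Y^1 has a zero in the missing coordinate

X = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
dummy = np.zeros(I * J); dummy[k] = 1.0
X_cov = np.column_stack([X, dummy])                 # columns span Omega + V_2

P_cov = X_cov @ np.linalg.pinv(X_cov)
fit_cov = P_cov @ Y1                                # P_{Omega + V_2} Y^1

# P_{Omega_1} Y^1: least squares on the observed coordinates, zero at the missing one.
obs = np.arange(I * J) != k
beta = np.linalg.lstsq(X[obs], Y1[obs], rcond=None)[0]
fit_1 = np.zeros(I * J); fit_1[obs] = X[obs] @ beta

assert np.allclose(fit_cov, fit_1)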

The major discussion of this section, in which the normal equations (3.5) for μ̂² are obtained directly, may be regarded as the coordinate-free analogue of a paper by Wilkinson [11].

If we are concerned with F testing and want to use the approach of this section toward missing observations, then we require dim ω = dim P_{V₁}ω as well as (3.2). Let ω₁ = P_{V₁}ω. Equation (3.5) must be worked out separately for ‖Q_Ω Y‖² and ‖Q_ω Y‖² in (2.3). A suggestion first made, I believe, by Yates is to approximate the F statistic by solving (3.5) for μ̂² under Ω only and using this quantity


estimate P_U μ from Y¹, as it seems clear that Y² can tell us nothing about P_U μ, while the last term estimates P_{Ω₁−U}μ from the full Y. Then the two vector estimators of orthogonal quantities are added.

It is often easier to work in terms of estimating linear functionals (x, μ) of μ and, in many cases, x is already in Ω₁. Then the Gauss-Markov estimator of (x, μ) is

(4.4)  (x, P_{Ω₁}P_Ω Y) = (P_Ω x, Y) = (x, Y¹) − (P_{Ω₁−U}x, Y¹) + (P_{(I+A)(Ω₁−U)}x, Y),

for x ∈ Ω₁. If, in addition, dim Ω₂ = 1, then (4.4) may be simplified further, as follows. Suppose that Ω₁ − U is spanned by z (and Ω₂ is spanned by Az). Straightforward computation gives

(4.5)  (x, P_{Ω₁}P_Ω Y) = (x, Y¹) − (x, z)(z, Y¹)/‖z‖² + (x, z)[(z, Y¹) + (Az, Y²)]/(‖z‖² + ‖Az‖²)
       = (x, Y¹) − [(x, z)‖Az‖²/(‖z‖² + ‖Az‖²)][(z, Y¹)/‖z‖² − (Az, Y²)/‖Az‖²]

for x ∈ Ω₁ and z spanning Ω₁ − U. In applications, z is usually obtained easily as the coefficient vector of Y¹ for the Gauss-Markov estimator of the expectation of the single nonzero coordinate of μ², this for V₁ and Ω₁, that is, for no extra observation.

extra (^) observation. let us illustrate the use of (4.3) to (4.5) in our example, supposing that

there is an extra observation YIJ2 in the (I, J) cell. Now V1 is IJ-dimensional

and V is (IJ + l)-dimensional. Let Q1 have the same coordinates as before

except that the new (IJ + 1)st coordinate is zero. Let Q22 =^ V2 have all

coordinates zero except the (IJ + 1)st. Then Au for u E (^) Q1 is that vector in (^) %2 whose (IJ + I)st coordinate is the same as the (I, J) coordinate of u. Hence to (^) say that Au =^0 is to (^) say that (^) p(u) + ai(u) + ,3j(u) =^ 0. (^) (Here I have kept u in the expression as an argument to emphasize that ,i, ar,

and /3J are functionals of u.) Hence U is defined by (x, u) = 0 where x has

all coordinates zero except that the (I, J) coordinate is one. Project x on Q2,

orthogonally to obtain the vector in Q1 with (i, j) coordinate

(4.6) J-lai + I-'&1J-^ (IJ)-

where the 6's are Kronecker deltas. This vector z spans Q1-^ U, which is

one-dimensional. The transformation I + A, applied to z, takes it into

another vector, which is the same except that the (IJ + 1)st coordinate 0

of z becomes J-1 + I-' -^ (IJ)-1 = (^) X, say, equal to the common (I, J) coordinate. It^ is readily computed that (^) 11zI12 =^ X and 11(I + (^) A)zJ12 = X(1 +^ X). Note that z could also be obtained as follows: the Gauss-Markov esti-

GAUSS-MARKOV ESTIMATION 445

mator, with no extra observation, of p + al + flr is Y,.^ +^ F.J-^ Y...^ This

linear functional of Y' bas z as its coefficient vector. Suppose that we^ want to^ find^ the^ Gauss-Markov^ estimator^ of^ a,. We

know that F1.^ -^ Y.. is that estimator^ if^ there^ were no^ extra^ observation,

hence we may conveniently write^ a1^ as (x,^ IA), where^ x^ C 01 is the^ vector

with (i, j) coordinate J-'6il -^ (IJ)-1 and^ with^ (IJ^ +^ 1)st^ coordinate^ zero.

Compute directly that

(x, Y¹) = Ȳ_{1·} − Ȳ_{··},   (x, z) = −(IJ)⁻¹,
‖z‖² + ‖Az‖² = ‖(I + A)z‖² = λ(1 + λ),   ‖Az‖² = λ²,
(z, Y¹) = Ȳ_{I·} + Ȳ_{·J} − Ȳ_{··},   (Az, Y²) = λY_{IJ2}.

Hence the Gauss-Markov estimator of α₁ is

(4.8)  Ȳ_{1·} − Ȳ_{··} + [(1 + λ)IJ]⁻¹[(Ȳ_{I·} + Ȳ_{·J} − Ȳ_{··}) − Y_{IJ2}].

Note that the last term in square brackets is the difference between the Gauss-Markov estimators of μ + α_I + β_J from Y¹ and Y² alone, respectively. Call this quantity Δ and observe that EΔ = 0. Since (1 + λ)IJ = IJ + I + J − 1, we see that

(4.9)  α̂₁ = Ȳ_{1·} − Ȳ_{··} + Δ/(IJ + I + J − 1).

A simple interchange of indexes gives us α̂₂, · · · , α̂_{I−1}; β̂₁, · · · , β̂_{J−1}. Similar computations provide α̂_I, β̂_J, and μ̂. The final results may be summarized as follows:

(4.10)  α̂ᵢ = Ȳ_{i·} − Ȳ_{··} − Δ(Iδ_{iI} − 1)/(IJ + I + J − 1),
        β̂ⱼ = Ȳ_{·j} − Ȳ_{··} − Δ(Jδ_{jJ} − 1)/(IJ + I + J − 1),
        μ̂ = Ȳ_{··} − Δ/(IJ + I + J − 1),

thus permitting explicit expression of P_{Ω₁}P_Ω Y as P_{Ω₁}Y¹ plus a "correction" term.
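The correction formulas (4.8) to (4.10) can be confirmed against a direct least-squares fit to all IJ + 1 observations, as in the following sketch (illustrative Python/numpy; the layout, seed, and side conditions Σαᵢ = Σβⱼ = 0 are as above, everything else is an arbitrary choice).

import numpy as np

rng = np.random.default_rng(4)
I, J = 4, 5
Y1 = rng.normal(size=(I, J))              # balanced layout
Y2 = rng.normal()                         # extra observation Y_{IJ2}

X1 = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
k = (I - 1) * J + (J - 1)
X = np.vstack([X1, X1[k]])
beta = np.linalg.lstsq(X, np.append(Y1.ravel(), Y2), rcond=None)[0]
grid = (X1 @ beta).reshape(I, J)          # fitted cell expectations over the I x J grid

# Direct estimates under the side conditions sum(alpha) = sum(beta) = 0.
mu_hat = grid.mean()
alpha_hat = grid.mean(axis=1) - mu_hat
beta_hat = grid.mean(axis=0) - mu_hat

# Closed forms: Delta is the difference of the two estimators of mu + alpha_I + beta_J.
rbar, cbar, gbar = Y1.mean(axis=1), Y1.mean(axis=0), Y1.mean()
Delta = (rbar[-1] + cbar[-1] - gbar) - Y2
D = I * J + I + J - 1
dI = (np.arange(I) == I - 1).astype(float)
dJ = (np.arange(J) == J - 1).astype(float)
assert np.allclose(alpha_hat, rbar - gbar - Delta * (I * dI - 1) / D)
assert np.allclose(beta_hat, cbar - gbar - Delta * (J * dJ - 1) / D)
assert np.allclose(mu_hat, gbar - Delta / D)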

Note an apparent lack of symmetry between Y_{IJ} and Y_{IJ2} in the above expressions. That this is only apparent may be seen by computing the coefficient of Y_{IJ} in μ̂ + α̂_I + β̂_J; it is the same as that of Y_{IJ2}.

As with missing observations, the requisite manipulations should be carried out twice for F testing purposes, once for each of the two basic sums of squares.


Note that P_{Ω₁}Y¹ is easily obtained by the missing observation technique, since dim Ω₂₁ = 1. In fact, A₁P_{Ω₁}Y¹, the Gauss-Markov estimator from Y¹ alone of EY²¹, is (w, Y¹)s₁, since w ∈ Ω₁ and E[(w, Y¹)s₁] = (w, μ¹)s₁ = μ²¹. Further, A₁w = ‖w‖²s₁ and Aw = ‖w‖²s.

The relationship (w, x) = 0 for x ∈ Ω₁ determines U. Hence, w spans the one-dimensional Ω₁ − U. Also, (I + A)w = w + ‖w‖²s spans (I + A)(Ω₁ − U). Hence,

(5.3)  P_{Ω₁}Y¹ = P_{Ω₁}P_{(I+A₁)Ω₁}[Y¹ + (w, Y¹)s₁],
       P_{Ω₁−U}Y = [(w, Y¹)/‖w‖²]w,
       P_{(I+A)(Ω₁−U)}Y = {[(w, Y¹) + ‖w‖²(s, Y²)]/[‖w‖²(1 + ‖w‖²‖s‖²)]}[w + ‖w‖²s];

and from (4.3) or (4.2)

(5.4)  P_{(I+A₁)Ω₁}P_Ω Y = (I + A₁)P_{Ω₁}P_Ω Y
       = P_{(I+A₁)Ω₁}[Y¹ + (w, Y¹)s₁] − [(w, Y¹)/‖w‖²][w + ‖w‖²s₁]
         + {[(w, Y¹) + ‖w‖²(s, Y²)]/[‖w‖²(1 + ‖w‖²‖s‖²)]}[w + ‖w‖²s₁],

since s ⊥ Ω₁. Hence the desired y²¹ ∈ Ω₂₁, if it exists, must satisfy

(5.5)  P_{(I+A₁)Ω₁}y²¹ = (w, Y¹)P_{(I+A₁)Ω₁}s₁ + {[(s, Y²) − ‖s‖²(w, Y¹)]/(1 + ‖s‖²‖w‖²)}[w + ‖w‖²s₁].

Since y²¹ may be written bs₁ and since (I + A₁)Ω₁ may be regarded as the direct sum of U and the manifold spanned by w + ‖w‖²s₁, (5.5) may be written as the following scalar equation, omitting the common vector w + ‖w‖²s₁:

(5.6)  b‖s₁‖²/(1 + ‖w‖²‖s₁‖²) = (w, Y¹)‖s₁‖²/(1 + ‖w‖²‖s₁‖²) + [(s, Y²) − ‖s‖²(w, Y¹)]/(1 + ‖s‖²‖w‖²)

or, simplifying,

(5.7)  b = {−‖s₂‖²(w, Y¹) + ‖s‖²(1 + ‖w‖²‖s₁‖²)[(s, Y²)/‖s‖²]} / [‖s₁‖²(1 + ‖w‖²‖s‖²)].

Observe that this is a weighted average of (w, Y¹), the s₁ coordinate of the missing observation vector estimator from Y¹, and (s, Y²)/‖s‖², the estimator of the same quantity from Y². The weights are proportional to −‖s₂‖² and ‖s‖²(1 + ‖w‖²‖s₁‖²) respectively.

In the usual applications, with coordinates for an orthonormal basis, ‖s‖² = m and ‖s₁‖² = 1, so that the weights become, respectively,

(5.8)  1 − m,  m(1 + ‖w‖²).


Further, ‖w‖² is 1/σ² times the variance of the replicated observation's expectation, as estimated from Y¹. So we may summarize the result as follows, for the usual applications.

Let ŷ be the Gauss-Markov estimator of the replicated observation's expectation from Y¹ alone. Let Var ŷ = θ²σ². Let ȳ be the Gauss-Markov estimator of the same quantity from Y² alone. (It will be the arithmetic average of the replicated observations.)

Let ỹ be the weighted average of ŷ and ȳ with weights proportional to (1 − m) and m(1 + θ²) respectively. For Gauss-Markov estimation based on Y, treat ỹ as if it were a single observation replacing all the replicated ones. Note that this is essentially a prescription for finding Gauss-Markov estimators when weak sphericity does not hold but rather the following conditions hold:

(i) the observations are uncorrelated;

(ii) all the observations have variance σ², except that one (ỹ) has variance σ²/m where m is known.

In the special case m = 1 (original design), the weights are 0 and 1, as they should be. In the special case m = 0, which, strictly speaking, is not covered by the above analysis but can be handled by similar methods, the weights are 1 and 0, again as they should be.
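The prescription just given is easy to check numerically: replace the m replicates by ỹ, run the ordinary balanced analysis, and compare with a direct least-squares fit to all IJ − 1 + m observations. The sketch below does this for an arbitrary 4 × 5 layout with m = 3 (Python/numpy; all names, sizes, and data are illustrative choices).

import numpy as np

rng = np.random.default_rng(5)
I, J, m = 4, 5, 3
Y1 = rng.normal(size=(I, J)); Y1[-1, -1] = 0.0      # cell (I, J) is not in Y^1
reps = rng.normal(size=m)                           # the m replicates of cell (I, J)

X1 = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
k = (I - 1) * J + (J - 1)

# y_hat: Gauss-Markov estimator of mu + alpha_I + beta_J from Y^1 alone (the missing-value
# formula), with Var y_hat = theta^2 sigma^2 and theta^2 = (I + J - 1)/((I - 1)(J - 1)).
row, col, tot = Y1[-1].sum(), Y1[:, -1].sum(), Y1.sum()
y_hat = (I * row + J * col - tot) / ((I - 1) * (J - 1))
theta2 = (I + J - 1) / ((I - 1) * (J - 1))
y_bar = reps.mean()
y_tilde = ((1 - m) * y_hat + m * (1 + theta2) * y_bar) / ((1 - m) + m * (1 + theta2))

# Balanced analysis with y_tilde standing in for the replicated cell.
Yc = Y1.copy().ravel(); Yc[k] = y_tilde
fit_tilde = (X1 @ np.linalg.lstsq(X1, Yc, rcond=None)[0]).reshape(I, J)

# Direct least squares on all IJ - 1 + m observations.
X_full = np.vstack([np.delete(X1, k, axis=0)] + [X1[k]] * m)
Y_full = np.append(np.delete(Y1.ravel(), k), reps)
fit_full = (X1 @ np.linalg.lstsq(X_full, Y_full, rcond=None)[0]).reshape(I, J)

assert np.allclose(fit_tilde, fit_full)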

Let us apply this to our example. Here, V may be expressed as the (IJ + m − 1)-dimensional space {Yᵢⱼ for i = 1, · · · , I; j = 1, · · · , J; (i, j) ≠ (I, J); Y_{IJ1}; Y_{IJ2}, · · · , Y_{IJm}}. The three groups of coordinates corresponding to V₁, V₂₁, and V₂₂ are separated by semicolons. The Ω's are given via EYᵢⱼ = μ + αᵢ + βⱼ and EY_{IJl} = μ + α_I + β_J for l = 1, · · · , m. We may take s₁ as having all zero coordinates, except that the (I, J, 1) coordinate is one; similarly, s₂ has all zero coordinates, except that the last m − 1 coordinates are one.

Now w is given by the coefficient vector for γ toward the end of section 3, that is, w is the vector in Ω₁ with (i, j) coordinate, when (i, j) ≠ (I, J),

[I/(I − 1)][J/(J − 1)][J⁻¹δ_{iI} + I⁻¹δ_{jJ} − (IJ)⁻¹];

of course, w has zero (I, J, l) coordinates. Further, (w, Y¹), the estimator of μ + α_I + β_J from Y¹ alone, is [I/(I − 1)][J/(J − 1)][J⁻¹Y_{I⊙} + I⁻¹Y_{⊙J} − (IJ)⁻¹Y_{⊙⊙}]. Next, observe that (s, Y²)/‖s‖² = Σ_{l=1}^{m} Y_{IJl}/m = Ȳ_{IJ·}. We compute that ‖w‖² = [I/(I − 1)]²[J/(J − 1)]²λ(1 − λ) = (I + J − 1)/[(I − 1)(J − 1)], so that the weights are 1 − m and mIJ/[(I − 1)(J − 1)] respectively. Hence

(5.9)  b = {(1 − m)IJ[J⁻¹Y_{I⊙} + I⁻¹Y_{⊙J} − (IJ)⁻¹Y_{⊙⊙}] + IJ Σ_{l=1}^{m} Y_{IJl}} / [(I − 1)(J − 1) + m(I + J − 1)],

and we need only use this in place of all the Y_{IJl} and apply the symmetrical explicit formulas of ordinary Gauss-Markov estimation to the resulting IJ-fold balanced array.

It can be checked that, in the special case m = 2, this leads to the same results as in the last section.


spanned by a single vector that is set up to correspond with s₁, that is, EY²¹ = (w, μ¹)s₁ and EY²² = (w, μ¹)s₂.

If we could observe Y¹ + Y²¹ + Y²², orthogonal projection on its Ω would be easy, for we would have a k-fold replication of a design for which we have the Gauss-Markov estimators. Although we do not have Y²², we can ask whether there is a y² ∈ Ω₂ such that

(6.2)  P_{(I+A₁)Ω₁}P_Ω(Y¹ + y²) = P_{(I+A₁)Ω₁}(Y¹ + Y²¹),

where A₁μ¹ = (w, μ¹)s₁. In this case, with the orthonormal coordinates we have in mind, ‖s₁‖² = m and ‖s‖² = k.

The estimator of EY² from Y¹ alone is (w, Y¹)s and from Y²¹ alone it is [(s₁, Y²¹)/‖s₁‖²]s. We can write Ω as the direct sum of orthogonal subspaces, U + (I + A)(Ω₁ − U), just as before, and express y² in the form bs. Carrying out the operations indicated by (6.2), we obtain

(6.3)  b = [‖s₂‖²(w, Y¹) + (1 + ‖w‖²‖s‖²)(s₁, Y²¹)] / [‖s‖²(1 + ‖w‖²‖s₁‖²)].

Then, putting ‖s₁‖² = m along with ‖s₂‖² = k − m and ‖s‖² = k, a legitimate operation for orthonormal coordinates, we finally obtain exactly (6.1) again. The final paragraph of the discussion of the case m ≥ k holds verbatim.

6.3. Summary of the preceding work of this section. We may summarize as follows. Suppose that P_Ω[Y¹ + Y²] is known explicitly, but that Y = Y¹ + Y² is not weakly spherical. Suppose further (i) that Y¹ ranges over an (n − 1)-dimensional space and in that space has the covariance transformation σ²I, and (ii) that Y² ranges over a one-dimensional space orthogonal to that of Y¹ and in that space has the covariance transformation τ²σ²I, where τ² is known [and rational].

Let ŷ be the Gauss-Markov estimator, based on Y¹ alone, of the coordinate of EY² with respect to a unit vector t in the space of Y². Let Var ŷ = θ²σ².

Let y be the coordinate of Y² with respect to the same unit vector.

Let ỹ be the weighted average of ŷ and y with weights proportional to 1 − τ⁻² and τ⁻²(1 + θ²) respectively.

Then P_Ω[Y¹ + ỹt] is the Gauss-Markov estimator of EY.

The special cases τ² = 1 (original design) and τ² = ∞ (Y² is missing) work out as they should. The special case τ² = 0 (corresponding to m = ∞ or EY² known a priori) may be useful in some circumstances. The bracketed words, "and rational," above may be omitted by continuity of Gauss-Markov estimation in Σ.

A related discussion is given by Gauss [5], section 36 of Theoria Combinationis . . . .
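The summary statement can be illustrated with any design in which one coordinate has variance τ²σ² and the rest have variance σ²: the substitution of ỹ reproduces weighted (generalized) least squares. The sketch below checks this for an arbitrary small regression design (Python/numpy; n, p, τ², and the data are illustrative choices).

import numpy as np

rng = np.random.default_rng(6)
n, p, tau2 = 12, 3, 2.5
X = rng.normal(size=(n, p))               # any full-rank design; the last row is the Y^2 slot
Y = rng.normal(size=n)

X1, x2 = X[:-1], X[-1]                    # Y^1 part and the coordinate direction for Y^2
Y1, y = Y[:-1], Y[-1]

# y_hat: GM estimator of the last coordinate's expectation from Y^1 alone;
# Var y_hat = theta^2 sigma^2.
b1 = np.linalg.lstsq(X1, Y1, rcond=None)[0]
y_hat = x2 @ b1
theta2 = x2 @ np.linalg.pinv(X1.T @ X1) @ x2

# y_tilde: weighted average with weights proportional to 1 - 1/tau2 and (1 + theta2)/tau2.
w1, w2 = 1 - 1 / tau2, (1 + theta2) / tau2
y_tilde = (w1 * y_hat + w2 * y) / (w1 + w2)

# Substitution estimator: ordinary least squares with y replaced by y_tilde ...
b_sub = np.linalg.lstsq(X, np.append(Y1, y_tilde), rcond=None)[0]

# ... against weighted least squares with weights 1, ..., 1, 1/tau2.
W = np.ones(n); W[-1] = 1 / tau2
b_wls = np.linalg.lstsq(X * np.sqrt(W)[:, None], Y * np.sqrt(W), rcond=None)[0]
assert np.allclose(X @ b_sub, X @ b_wls)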

7. Application to apparent outliers

One approach to apparently outlying observations is to apply some criterion and then either decide that the suspect observation is not an outlier and handle it in the usual way for analysis, or else decide that the observation is an outlier and omit it completely from analysis.

One might, however, consider intermediate positions in which a suspect observation is treated with a lower weight than the rest, that is, has an imputed variance higher than that of the other observations. To completely omit the observation is, in effect, to give it an infinite variance, but why go that far? If there is only one such suspect observation, the method described above permits the relatively simple incorporation of the suspect observation into Gauss-Markov estimation, provided that the ratio of its imputed variance to the variance of the other observations is given.

REFERENCES

[1] M. S. BARTLETT, "The vector representation of a sample," Proc. Cambridge Philos. Soc., Vol. 30 (1933-34), pp. 327-340.
[2] L. C. A. CORSTEN, "Vectors, a tool in statistical regression theory," Meded. Landbouwhogeschool Wageningen, Vol. 58 (1958), pp. 1-92.
[3] J. DURBIN and M. G. KENDALL, "The geometry of estimation," Biometrika, Vol. 38 (1951), pp. 150-158.
[4] D. A. S. FRASER, "On the combining of interblock and intrablock estimates," Ann. Math. Statist., Vol. 28 (1957), pp. 814-816.
[5] C. F. GAUSS, Méthode des Moindres Carrés, Paris, Mallet-Bachelier, 1855; translation into French by J. Bertrand of Gauss's works on least squares. (A translation into English by H. F. Trotter is Technical Report No. 5, Statistical Techniques Research Group, Princeton, 1957.)
[6] P. R. HALMOS, Finite-Dimensional Vector Spaces, Princeton, Van Nostrand, 1958 (2nd ed.).
[7] A. N. KOLMOGOROV, "On the proof of the method of least squares," Uspehi Mat. Nauk, Vol. 1 (1946), pp. 57-70. (In Russian.)
[8] W. KRUSKAL, "Discussion of the papers of Messrs. Anscombe and Daniel," Technometrics, Vol. 2 (1960), pp. 157-166. (Pages indicated also include discussion by T. S. Ferguson, J. W. Tukey, and E. J. Gumbel.)
[9] R. L. PLACKETT, "Some theorems in least squares," Biometrika, Vol. 37 (1950), pp. 149-157.
[10] K. D. TOCHER, "The design and analysis of block experiments," J. Roy. Statist. Soc., Ser. B, Vol. 14 (1952), pp. 45-91. Discussion, pp. 91-100.
[11] G. N. WILKINSON, "Estimation of missing values for the analysis of incomplete data," Biometrics, Vol. 14 (1958), pp. 257-286.
[12] M. A. WOODBURY, "Modified matrix functions and applications," unpublished paper presented at the IBM Seminar for Directors of Computing Centers, June 26, 1959.