
THE COORDINATE-FREE APPROACH TO GAUSS-MARKOV ESTIMATION, AND ITS APPLICATION TO MISSING AND EXTRA OBSERVATIONS

WILLIAM KRUSKAL

UNIVERSITY OF CHICAGO

The work leading to this paper was supported in part by the Logistics and Mathematical Statistics Branch of the Office of Naval Research and in part by the National Science Foundation.

1. Introduction and summary

The purposes of this paper are (1) to describe the coordinate-free approach to Gauss-Markov (linear least squares) estimation in the context of Model I analysis of variance and (2) to discuss, in coordinate-free language, the topics of missing observations and extra observations.

It is curious that the coordinate-free approach to Gauss-Markov estimation, although known to many statisticians, has infrequently been discussed in the literature on least squares and analysis of variance. The major textbooks in these areas do not use the coordinate-free approach, and I know of only a few journal articles that deal with it ([2], plus some of the references in Dutch that it lists, and, to some extent, [1], [3] and [7]). The coordinate-free viewpoint is implicit in R. A. Fisher's geometrical approach to sampling problems.

The subject of missing observations in Model I analysis of variance is well understood and often discussed. This paper presents no new results here, but it does present a viewpoint different from that usually given. In contrast, the topic of extra observations, although it was briefly considered by Gauss [5], section 35 of Theoria Combinationis . . . , has elicited hardly any papers since. (I know only of papers by R. L. Plackett [9] and K. D. Tocher [10].) The problem of extra observations is important in its own right and also in connection with the treatment of so-called outliers. I shall discuss a method of treating extra observations that bears some resemblance to that for missing observations. In particular, it leads to possible methods for treating apparent outliers that I described briefly in [8].

There are two major motivations for emphasizing the coordinate-free approach to Gauss-Markov estimation. First, it permits a simpler, more general, more elegant, and more direct treatment of the general theory of linear estimation than do its notational competitors, the matrix and scalar approaches. Second, it is useful as an introduction to infinite-dimensional spaces, which are important, for example, in the consideration of stochastic processes. A related point is that more or less coordinate-free treatments of finite-dimensional vector spaces are now more common than they once were and are being taught to students at an earlier stage. With such mathematical background, a student can learn the theoretical side of Model I analysis of variance quickly and efficiently. The treatment in this paper will, however, be compact and without the motivational material and the many examples that would be pedagogically important.

Nonetheless, it may be useful to keep one concrete example before us. Accordingly, I shall illustrate the general theory in terms of a simple illustration, two-way analysis of variance with one observation per cell. The vector space viewpoint and notation will mostly be taken from P. R. Halmos's text [6].

My own introduction to the coordinate-free approach came from discussions with L. J. Savage and I acknowledge my great debt to him, a debt of which coordinate freedom forms but a part.

2. Gauss-Markov estimation from a coordinate-free viewpoint

We consider a sample point, Y, that ranges over an n-dimensional real vector space, V, on which an inner product, ( , ), is given. (It would also be possible to start without a given inner product and to define one in terms of the covariance structure of Y.) Perhaps more basically, Y is a function from an underlying probability space onto V such that all sets in the underlying space of form {e : (x, Y(e)) ≤ c}, where x ∈ V and c is a real number, are measurable. Of course, Y is the abstract entity usually corresponding, in a particular problem, to the coordinate vector comprising the set of scalar observations; by not writing in terms of coordinates, that is, by not requiring that a basis be specified, we are able to present the general theory succinctly.

All first and second moments discussed will be assumed to exist without further mention.

Clearly, E(x, Y) is a linear functional of x ∈ V, and hence there exists a unique member of V, say μ, such that E(x, Y) = (x, μ) for all x ∈ V. Call μ the (vector) expectation of Y, EY; this quantity is easily articulated with the vector expectation in coordinate form.

Similarly Cov [(x, Y), (z, Y)], where x, z ∈ V, is clearly a quasi-inner product (like an inner product but possibly only nonnegative definite). Hence there exists a unique linear transformation Σ on V such that

(2.1)  Cov [(x, Y), (z, Y)] = (x, Σz),  x, z ∈ V.

It is easily seen that Σ is nonnegative definite and symmetrical with respect to ( , ); that is, (x, Σx) ≥ 0 and (x, Σz) = (Σx, z) for all x, z ∈ V. Naturally, Var (x, Y) = (x, Σx). Let us say that Y is weakly spherical if Σ is a (nonnegative) multiple, say σ², of the identity transformation.


It is often thought desirable to add side conditions so that μ, the αᵢ, and the βⱼ are uniquely determined by the μᵢⱼ (estimable, identifiable). A popular set of side conditions is Σᵢ αᵢ = Σⱼ βⱼ = 0. Then it follows that μ = Σᵢ Σⱼ μᵢⱼ/(IJ), that αᵢ = (Σⱼ μᵢⱼ/J) − μ, and that βⱼ = (Σᵢ μᵢⱼ/I) − μ.

What we have described is the model for two-way Model I analysis of variance, with one observation per cell and no interaction. If, in addition, we require joint normality of the Yᵢⱼ, then the Yᵢⱼ are independent.

A basic fact is that, under conditions (i) and (ii), the orthogonal projections, P_Ω Y of Y on Ω and Y − P_Ω Y = Q_Ω Y on the orthogonal complement of Ω, are uncorrelated and have weakly spherical distributions in their own subspaces with (restricted) covariance transformations σ²I, where σ² is the same as that for Y itself. To say that AY and BY are uncorrelated is to say that Cov [(x, AY), (z, BY)] = 0 for all x, z ∈ V. This immediately extends to orthogonal decompositions of Y into more than two components. Since P_Ω and Q_Ω are orthogonal projections, they are idempotent (for example, P_Ω P_Ω = P_Ω) and symmetric [for example, P_Ω′ = P_Ω or, equivalently, (x, P_Ω z) = (P_Ω x, z) for all x, z ∈ V] with respect to ( , ).

If we require normality of Y, that is, if we require that (x, Y) be normal for all x ∈ V, then it is almost immediate that ‖P_Ω Y − μ‖²/σ² and ‖Q_Ω Y‖²/σ² have independent chi-square distributions with p = dim Ω and n − p = dim Ω⊥ degrees of freedom respectively. In any case, these quantities have expectations p and n − p.

The vector Gauss-Markov estimator of μ is P_Ω Y and the scalar Gauss-Markov estimator of a linear functional (x, μ) of μ is (x, P_Ω Y) = (P_Ω x, Y). (Commentary on the historical accuracy or inaccuracy of the designation "Gauss-Markov" appears in the discussion of [10].) These Gauss-Markov estimators are characterized by the following well-known properties. To avoid trivialities, we assume that σ² > 0.

(a) P_Ω Y is the unique linear transformation of Y that is an unbiased estimator of μ and leads to minimum variance for all derived estimators of linear functionals. In other words,

(a₁) E(P_Ω Y) = μ for all μ ∈ Ω,

(a₂) for all x, and for linear transformations D ≠ P_Ω satisfying E(DY) = μ for all μ ∈ Ω, it follows that Var (x, P_Ω Y) ≤ Var (x, DY).

(b) For all x, the unique minimum variance linear functional of Y that is unbiased for (x, μ) is (P_Ω x, Y) = (x, P_Ω Y).

(c) For all x, the unique minimum variance linear functional of Y that has bounded mean square error in μ is (P_Ω x, Y).

(d) The unique vector ν in Ω minimizing ‖Y − ν‖² is P_Ω Y. This is the least squares characterization.

(e) For all x, the unique linear functional of Y whose "coefficient vector," when the functional is expressed in the form (z, Y), lies in Ω and which estimates (x, μ) unbiasedly is (P_Ω x, Y); that is, for w ∈ Ω, (w, Y) is the Gauss-Markov estimator of its expectation (w, μ). This characterization often leads to an easy method of obtaining Gauss-Markov estimators when there is high symmetry or when Ω is similar to another manifold onto which we know how to project orthogonally. For if we guess (P_Ω x, Y) by choosing a vector w ∈ Ω, we need only compute (w, μ) to see if our guess is right and, if it is nearly right, we can often see immediately how it should be modified. A similar vector characterization for P_Ω Y is readily written down.

(f) When Y is normal, P_Ω Y and (x, P_Ω Y) are the maximum likelihood estimators of μ and (x, μ) respectively. Further, (x, P_Ω Y) is the minimum variance unbiased estimator of (x, μ).

Various other characterizations and properties of P_Ω Y can be stated, for example, in terms of invariance under relevant linear transformations. Note that Gauss-Markov estimation is linear, that is, that α(x, P_Ω Y) + β(z, P_Ω Y) = (αx + βz, P_Ω Y), the Gauss-Markov estimator of (αx + βz, μ). Further, in terms of a fixed basis, the coordinates of P_Ω Y are the Gauss-Markov estimators of the respective coordinates of μ. The conventional (unbiased) estimator of σ² is ‖Q_Ω Y‖²/(n − p).

Turn now to our example, with the description of Ω in the form μᵢⱼ = μ + αᵢ + βⱼ, where Σαᵢ = Σβⱼ = 0. Let a bar denote simple averaging with respect to dotted subscripts and consider linear functionals of Y as follows: Ȳ_{··}, Ȳ_{i·} − Ȳ_{··}, and Ȳ_{·j} − Ȳ_{··}. Note that there are I of the second kind and J of the third. The coefficient vectors of these functionals are easily seen to be in Ω. Hence the functionals are the Gauss-Markov estimators of their own expectations, μ, αᵢ, and βⱼ. Hence P_Ω Y is the coordinate vector with (i, j) component Ȳ_{··} + (Ȳ_{i·} − Ȳ_{··}) + (Ȳ_{·j} − Ȳ_{··}) = Ȳ_{i·} + Ȳ_{·j} − Ȳ_{··}. There is a standard orthogonal decomposition of P_Ω Y into three vectors corresponding to "over-all mean," "row effects," and "column effects," which I do not discuss here. The dimension of Ω is readily seen to be I + J − 1.

If ω is a q-dimensional linear manifold within Ω, then the standard F statistic for the null hypothesis μ ∈ ω against all alternatives is

(2.3)  [(p − q)⁻¹(‖Q_ω Y‖² − ‖Q_Ω Y‖²)] / [(n − p)⁻¹‖Q_Ω Y‖²] = [(p − q)⁻¹‖P_{Ω−ω} Y‖²] / [(n − p)⁻¹‖Q_Ω Y‖²],

with large values critical. Here Q_ω Y = Y − P_ω Y, and Ω − ω is the orthogonal complement of ω with respect to Ω. This test statistic has, under the null hypothesis, the central F distribution with p − q and n − p degrees of freedom. The geometrical interpretation of the F statistic is well known.
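As a concrete check of this machinery, the following short numerical sketch (Python with numpy; the 4 × 5 layout, the random data, and all variable names are illustrative choices and are not prescribed anywhere above) verifies the closed-form projection for the two-way layout, the dimension count I + J − 1, and the F statistic (2.3) for the hypothesis of no column effects.

import numpy as np

rng = np.random.default_rng(0)
I, J = 4, 5
Y = rng.normal(size=(I, J))

# Design matrix for mu + alpha_i + beta_j (over-parametrized; pinv handles the rank).
X = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
P_Omega = X @ np.linalg.pinv(X)              # orthogonal projection onto Omega
fit = (P_Omega @ Y.ravel()).reshape(I, J)

closed_form = Y.mean(axis=1, keepdims=True) + Y.mean(axis=0, keepdims=True) - Y.mean()
assert np.allclose(fit, closed_form)         # (i, j) component is Ybar_i. + Ybar_.j - Ybar_..
assert np.linalg.matrix_rank(X) == I + J - 1 # dim Omega

# F statistic (2.3) for the null hypothesis of no column effects (omega: mu + alpha_i).
X_omega = X[:, : 1 + I]
P_omega = X_omega @ np.linalg.pinv(X_omega)
rss_Omega = np.sum((Y.ravel() - P_Omega @ Y.ravel()) ** 2)   # ||Q_Omega Y||^2
rss_omega = np.sum((Y.ravel() - P_omega @ Y.ravel()) ** 2)   # ||Q_omega Y||^2
p, q, n = I + J - 1, I, I * J
F = ((rss_omega - rss_Omega) / (p - q)) / (rss_Omega / (n - p))
print(F)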

The purpose of this paper is to discuss estimation, not testing. Hence, except for a few scattered remarks, we leave testing with the above brief paragraph. A good bit of the standard literature on Gauss-Markov estimation and Model I analysis of variance may be interpreted in terms of forming and combining orthogonal projections, especially when Ω is regarded as the direct sum of several linear manifolds with interesting statistical meanings.


ple, the sickly piglet is probably more likely to die accidentally before the end of the experiment than is his robust brother. In such a case we are dealing with a complicated selection process.

I do not intend to discuss these important problems further here. Instead we shall assume that the experimental design is for Y¹ from the start and that, if observations are missing by chance, it is meaningful to carry out statistical analyses conditionally on the "observed" V₁.

Some notation is needed. Let Ωᵢ = P_{Vᵢ}Ω for i = 1, 2. Let

(3.1)  μⁱ = P_{Vᵢ}μ = EYⁱ,  i = 1, 2,

so that Ω₁ ⊥ Ω₂ and μ¹ ⊥ μ². In general, Ω ≠ Ω₁ + Ω₂, although Ω ⊂ Ω₁ + Ω₂.

We make the basic assumption that

(3.2)  dim Ω = dim Ω₁,

and we ask how to find P_{Ω₁}Y¹, or its equivalent, in terms of P_Ω, which we suppose known explicitly. (In the following, I permit myself the customary ambiguity of using "μ¹," say, both to denote the unknown true μ¹ = EY¹ and to serve as a running variable over Ω₁.) Since dim Ω = dim Ω₁, to each μ¹ ∈ Ω₁ there corresponds a unique μ² ∈ Ω₂, say Aμ¹, such that (I + A)μ¹ ∈ Ω, P_{V₁}(I + A)μ¹ = μ¹, and P_{V₂}(I + A)μ¹ = Aμ¹. In short, to each μ¹ there corresponds a unique μ ∈ Ω with P_{V₁}μ = μ¹; Aμ¹ is that μ minus μ¹. For completeness, take Ax = 0 when x ⊥ Ω₁, so that A is a well-defined transformation. It is readily shown to be linear.

Instead of seeking P_{Ω₁}Y¹ = μ̂¹ as such, it is equivalent and more convenient to seek its analogue (I + A)P_{Ω₁}Y¹ = μ̂ in Ω. Note that, if we know μ̂² = AP_{Ω₁}Y¹, we could easily obtain μ̂ since

(3.3)  μ̂ = P_Ω(Y¹ + μ̂²)

and since we know P_Ω explicitly. The proof is simple, for

(3.4)  P_Ω(Y¹ + μ̂²) = P_Ω(P_{Ω₁}Y¹ + μ̂²) + P_Ω(Y¹ − P_{Ω₁}Y¹) = P_Ω(μ̂¹ + μ̂²) = μ̂,

where the last expression on the right of the first line is zero since Y¹ − P_{Ω₁}Y¹ is obviously orthogonal to both Ω₁ and Ω₂ and hence to Ω.

Next, note that μ̂² can be obtained via the following consistency condition, which amounts, in any specific case, to a set of simultaneous linear equations in dim Ω₂ scalar unknowns:

(3.5)  μ̂² = P_{Ω₂}P_Ω(Y¹ + μ̂²).

This condition has great intuitive appeal, for it says that μ̂² is that element of Ω₂ which, when added to Y¹ and followed by orthogonal projection on Ω and then on Ω₂, gives us μ̂² back again. It is easy to see that (3.5) holds, for μ̂² = P_{Ω₂}μ̂, and we need only substitute for μ̂ from (3.3). That μ̂² is determined uniquely by (3.5) may be seen thus. Suppose that u and u* both are in Ω₂ and satisfy (3.5). Then u − u* = P_{Ω₂}P_Ω(u − u*). But ‖u − u*‖ > ‖P_{Ω₂}P_Ω(u − u*)‖ unless both u − u* ∈ Ω and P_Ω(u − u*) ∈ Ω₂. Hence P_Ω(u − u*) is in both Ω and Ω₂; it is therefore zero and hence u = u*.

The use of (3.5) in practice is simple for the kind of application envisaged, since P_Ω is known explicitly and P_{Ω₂} presents little difficulty if dim Ω₂ is small.
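Since (3.5) reduces, for a single missing cell, to one linear equation in one unknown, it is easy to verify numerically. The sketch below (Python/numpy; the 4 × 5 layout, the random data, and the variable names are arbitrary illustrations) solves (3.5), checks the result against the closed form (3.7) derived below, and confirms that it agrees with a direct least-squares fit to the incomplete data.

import numpy as np

rng = np.random.default_rng(1)
I, J = 4, 5
Y = rng.normal(size=(I, J))
miss = (I - 1, J - 1)                      # the (I, J) cell, 0-indexed
Y1 = Y.copy(); Y1[miss] = 0.0              # Y^1: missing coordinate set to zero

X = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
P = X @ np.linalg.pinv(X)                  # P_Omega for the complete layout
k = miss[0] * J + miss[1]                  # raveled index of the missing cell

# (3.5): gamma = [P_Omega(Y^1 + gamma * e_k)]_k  is a single linear equation in gamma.
gamma = (P[k] @ Y1.ravel()) / (1.0 - P[k, k])

# (3.7): gamma = (I*YI. + J*Y.J - Y..) / ((I-1)(J-1)), totals over nonmissing cells only.
row_tot = Y1[miss[0]].sum(); col_tot = Y1[:, miss[1]].sum(); grand = Y1.sum()
gamma_yates = (I * row_tot + J * col_tot - grand) / ((I - 1) * (J - 1))
assert np.allclose(gamma, gamma_yates)

# Direct least squares on the IJ - 1 observed cells gives the same estimate of
# mu + alpha_I + beta_J.
obs = np.ones(I * J, bool); obs[k] = False
beta = np.linalg.lstsq(X[obs], Y.ravel()[obs], rcond=None)[0]
assert np.allclose(X[k] @ beta, gamma)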

Let us turn to our example and suppose that Y_{IJ} is missing, that is, that Y¹ has components like Y except that the (I, J) component is zero. Further, μ¹ has (i, j) components μ + αᵢ + βⱼ, as before, except that the (I, J) component is zero; μ² has all components zero except the (I, J) one, which is μ + α_I + β_J. It is readily checked that dim Ω = dim Ω₁ = I + J − 1.

To find μ̂² is to find its (I, J) component, which I shall call γ for brevity. Then P_{Ω₂}P_Ω[Y¹ + μ̂²] has all components zero but the (I, J) one, which we know from the previous discussion is

(3.6)  J⁻¹(Y_{I⊙} + γ) + I⁻¹(Y_{⊙J} + γ) − (IJ)⁻¹(Y_{⊙⊙} + γ),

where a circled dot subscript means summation over that subscript for all possible nonmissing observations. The basic identity (3.5) here says that the above displayed quantity is γ. Hence, solving the resulting linear equation,

(3.7)  γ = [(I − 1)(J − 1)]⁻¹(I Y_{I⊙} + J Y_{⊙J} − Y_{⊙⊙}).

As a check, we may compute the expectation of this quantity, μ + α_I + β_J. Having found γ, and hence μ̂², by (3.3) we may use this value to "complete" Y¹ and apply P_Ω to the completion, thus getting μ̂, α̂ᵢ, and β̂ⱼ. It is straightforward to write down explicit descriptions of these quantities.

Other methods of treating missing observations are (i) minimization of the quadratic form ‖Q_Ω[Y¹ + μ²]‖² in μ², which leads to (3.5) again, and (ii) the use of traditional covariance analysis with dummy covariate vectors, each having a coordinate one for the missing observation to which it corresponds and zero coordinates elsewhere. (This last method requires caution and modification if dim Ω₂ < dim V₂, that is, if the expectations of the missing observations are linearly related.) The covariance method is based on the easily shown identity P_{Ω+V₂}Y¹ = P_{Ω₁}Y¹.
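The identity behind the covariance method can likewise be checked numerically. In the sketch below (illustrative Python/numpy; one missing cell, arbitrary data) the dummy covariate column plays the role of V₂, and projecting Y¹ onto Ω + V₂ is compared with projecting it onto Ω₁.

import numpy as np

rng = np.random.default_rng(2)
I, J = 4, 5
Y1 = rng.normal(size=I * J)
k = (I - 1) * J + (J - 1)          # index of the "missing" cell
Y1[k] = 0.0                        # Y^1 has a zero in the missing coordinate

X = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
dummy = np.zeros(I * J); dummy[k] = 1.0
X_cov = np.column_stack([X, dummy])                 # columns span Omega + V_2

P_cov = X_cov @ np.linalg.pinv(X_cov)
fit_cov = P_cov @ Y1                                # P_{Omega + V_2} Y^1

# P_{Omega_1} Y^1: least squares on the observed coordinates, zero at the missing one.
obs = np.arange(I * J) != k
beta = np.linalg.lstsq(X[obs], Y1[obs], rcond=None)[0]
fit_1 = np.zeros(I * J); fit_1[obs] = X[obs] @ beta

assert np.allclose(fit_cov, fit_1)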

The major discussion of this section, in which the normal equations (3.5) for μ̂² are obtained directly, may be regarded as the coordinate-free analogue of a paper by Wilkinson [11].

If we are concerned with F testing and want to use the approach of this section toward missing observations, then we require dim ω = dim P_{V₁}ω as well as (3.2). Let ω₁ = P_{V₁}ω. Equation (3.5) must be worked out separately for ‖Q_Ω Y‖² and ‖Q_ω Y‖² in (2.3). A suggestion first made, I believe, by Yates is to approximate the F statistic by solving (3.5) for μ̂² under Ω only and using this quantity


estimate P_U μ from Y¹, as it seems clear that Y² can tell us nothing about P_U μ, while the last term estimates P_{Ω₁−U}μ from the full Y. Then the two vector estimators of orthogonal quantities are added.

It is often easier to work in terms of estimating linear functionals (x, μ) of μ and, in many cases, x is already in Ω₁. Then the Gauss-Markov estimator of (x, μ) is

(4.4)  (x, P_{Ω₁}P_Ω Y) = (P_Ω x, Y) = (x, Y¹) − (P_{Ω₁−U}x, Y¹) + (P_{(I+A)(Ω₁−U)}x, Y),

for x ∈ Ω₁. If, in addition, dim Ω₂ = 1, then (4.4) may be simplified further, as follows. Suppose that Ω₁ − U is spanned by z (and Ω₂ is spanned by Az). Straightforward computation gives

(4.5)  (x, P_{Ω₁}P_Ω Y) = (x, Y¹) − (x, z)(z, Y¹)/‖z‖² + (x, z)[(z, Y¹) + (Az, Y²)]/(‖z‖² + ‖Az‖²)
       = (x, Y¹) − [(x, z)‖Az‖²/(‖z‖² + ‖Az‖²)][(z, Y¹)/‖z‖² − (Az, Y²)/‖Az‖²]

for x ∈ Ω₁ and z spanning Ω₁ − U. In applications, z is usually obtained easily as the coefficient vector of Y¹ for the Gauss-Markov estimator of the expectation of the single nonzero coordinate of μ², this for V₁ and Ω₁, that is, for no extra observation.

extra (^) observation. let us illustrate the use of (4.3) to (4.5) in our example, supposing that

there is an extra observation YIJ2 in the (I, J) cell. Now V1 is IJ-dimensional

and V is (IJ + l)-dimensional. Let Q1 have the same coordinates as before

except that the new (IJ + 1)st coordinate is zero. Let Q22 =^ V2 have all

coordinates zero except the (IJ + 1)st. Then Au for u E (^) Q1 is that vector in (^) %2 whose (IJ + I)st coordinate is the same as the (I, J) coordinate of u. Hence to (^) say that Au =^0 is to (^) say that (^) p(u) + ai(u) + ,3j(u) =^ 0. (^) (Here I have kept u in the expression as an argument to emphasize that ,i, ar,

and /3J are functionals of u.) Hence U is defined by (x, u) = 0 where x has

all coordinates zero except that the (I, J) coordinate is one. Project x on Q2,

orthogonally to obtain the vector in Q1 with (i, j) coordinate

(4.6) J-lai + I-'&1J-^ (IJ)-

where the 6's are Kronecker deltas. This vector z spans Q1-^ U, which is

one-dimensional. The transformation I + A, applied to z, takes it into

another vector, which is the same except that the (IJ + 1)st coordinate 0

of z becomes J-1 + I-' -^ (IJ)-1 = (^) X, say, equal to the common (I, J) coordinate. It^ is readily computed that (^) 11zI12 =^ X and 11(I + (^) A)zJ12 = X(1 +^ X). Note that z could also be obtained as follows: the Gauss-Markov esti-

GAUSS-MARKOV ESTIMATION 445

mator, with no extra observation, of p + al + flr is Y,.^ +^ F.J-^ Y...^ This

linear functional of Y' bas z as its coefficient vector. Suppose that we^ want to^ find^ the^ Gauss-Markov^ estimator^ of^ a,. We

know that F1.^ -^ Y.. is that estimator^ if^ there^ were no^ extra^ observation,

hence we may conveniently write^ a1^ as (x,^ IA), where^ x^ C 01 is the^ vector

with (i, j) coordinate J-'6il -^ (IJ)-1 and^ with^ (IJ^ +^ 1)st^ coordinate^ zero.

Compute directly that

(x, Y¹) = Ȳ_{1·} − Ȳ_{··},   (x, z) = −(IJ)⁻¹,
‖z‖² + ‖Az‖² = ‖(I + A)z‖² = λ(1 + λ),   ‖Az‖² = λ²,
(z, Y¹) = Ȳ_{I·} + Ȳ_{·J} − Ȳ_{··},   (Az, Y²) = λY_{IJ2}.

Hence the Gauss-Markov estimator of α₁ is

(4.8)  Ȳ_{1·} − Ȳ_{··} + [(1 + λ)IJ]⁻¹[(Ȳ_{I·} + Ȳ_{·J} − Ȳ_{··}) − Y_{IJ2}].

Note that the last term in square brackets is the difference between the Gauss-Markov estimators of μ + α_I + β_J from Y¹ and Y² alone, respectively. Call this quantity Δ and observe that EΔ = 0. Since (1 + λ)IJ = IJ + I + J − 1, we see that

(4.9)  α̂₁ = Ȳ_{1·} − Ȳ_{··} + Δ/(IJ + I + J − 1).

A simple interchange of indexes gives us α̂₂, · · · , α̂_{I−1}; β̂₁, · · · , β̂_{J−1}. Similar computations provide α̂_I, β̂_J, and μ̂. The final results may be summarized as follows:

(4.10)  α̂ᵢ = Ȳ_{i·} − Ȳ_{··} − Δ(Iδ_{iI} − 1)/(IJ + I + J − 1),
        β̂ⱼ = Ȳ_{·j} − Ȳ_{··} − Δ(Jδ_{jJ} − 1)/(IJ + I + J − 1),
        μ̂ = Ȳ_{··} − Δ/(IJ + I + J − 1),

thus permitting explicit expression of P_{Ω₁}P_Ω Y as P_{Ω₁}Y¹ plus a "correction" term.
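The correction formulas (4.8) to (4.10) can be confirmed against a direct least-squares fit to all IJ + 1 observations, as in the following sketch (illustrative Python/numpy; the layout, seed, and side conditions Σαᵢ = Σβⱼ = 0 are as above, everything else is an arbitrary choice).

import numpy as np

rng = np.random.default_rng(4)
I, J = 4, 5
Y1 = rng.normal(size=(I, J))              # balanced layout
Y2 = rng.normal()                         # extra observation Y_{IJ2}

X1 = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
k = (I - 1) * J + (J - 1)
X = np.vstack([X1, X1[k]])
beta = np.linalg.lstsq(X, np.append(Y1.ravel(), Y2), rcond=None)[0]
grid = (X1 @ beta).reshape(I, J)          # fitted cell expectations over the I x J grid

# Direct estimates under the side conditions sum(alpha) = sum(beta) = 0.
mu_hat = grid.mean()
alpha_hat = grid.mean(axis=1) - mu_hat
beta_hat = grid.mean(axis=0) - mu_hat

# Closed forms: Delta is the difference of the two estimators of mu + alpha_I + beta_J.
rbar, cbar, gbar = Y1.mean(axis=1), Y1.mean(axis=0), Y1.mean()
Delta = (rbar[-1] + cbar[-1] - gbar) - Y2
D = I * J + I + J - 1
dI = (np.arange(I) == I - 1).astype(float)
dJ = (np.arange(J) == J - 1).astype(float)
assert np.allclose(alpha_hat, rbar - gbar - Delta * (I * dI - 1) / D)
assert np.allclose(beta_hat, cbar - gbar - Delta * (J * dJ - 1) / D)
assert np.allclose(mu_hat, gbar - Delta / D)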

Note an apparent lack of symmetry between Y_{IJ} and Y_{IJ2} in the above expressions. That this is only apparent may be seen by computing the coefficient of Y_{IJ} in μ̂ + α̂_I + β̂_J; it is the same as that of Y_{IJ2}.

As with missing observations, the requisite manipulations should be carried out twice for F testing purposes, once for each of the two basic sums of squares.


Note that P_{Ω₁}Y¹ is easily obtained by the missing observation technique, since dim Ω₂₁ = 1. In fact, A₁P_{Ω₁}Y¹, the Gauss-Markov estimator from Y¹ alone of EY²¹, is (w, Y¹)s₁, since w ∈ Ω₁ and E[(w, Y¹)s₁] = (w, μ¹)s₁ = μ²¹. Further, A₁w = ‖w‖²s₁ and Aw = ‖w‖²s.

The relationship (w, x) = 0 for x ∈ Ω₁ determines U. Hence, w spans the one-dimensional Ω₁ − U. Also, (I + A)w = w + ‖w‖²s spans (I + A)(Ω₁ − U). Hence,

(5.3)  P_{Ω₁}Y¹ = P_{Ω₁}P_{(I+A₁)Ω₁}[Y¹ + (w, Y¹)s₁],
       P_{Ω₁−U}Y = [(w, Y¹)/‖w‖²]w,
       P_{(I+A)(Ω₁−U)}Y = {[(w, Y¹) + ‖w‖²(s, Y²)]/[‖w‖²(1 + ‖w‖²‖s‖²)]}[w + ‖w‖²s];

and from (4.3) or (4.2)

(5.4)  P_{(I+A₁)Ω₁}P_Ω Y = (I + A₁)P_{Ω₁}P_Ω Y
       = P_{(I+A₁)Ω₁}[Y¹ + (w, Y¹)s₁] − [(w, Y¹)/‖w‖²][w + ‖w‖²s₁]
         + {[(w, Y¹) + ‖w‖²(s, Y²)]/[‖w‖²(1 + ‖w‖²‖s‖²)]}[w + ‖w‖²s₁],

since s ⊥ Ω₁. Hence the desired y²¹ ∈ Ω₂₁, if it exists, must satisfy

(5.5)  P_{(I+A₁)Ω₁}y²¹ = (w, Y¹)P_{(I+A₁)Ω₁}s₁ + {[(s, Y²) − ‖s‖²(w, Y¹)]/(1 + ‖s‖²‖w‖²)}[w + ‖w‖²s₁].

Since y²¹ may be written bs₁ and since (I + A₁)Ω₁ may be regarded as the direct sum of U and the manifold spanned by w + ‖w‖²s₁, (5.5) may be written as the following scalar equation, omitting the common vector w + ‖w‖²s₁:

(5.6)  b‖s₁‖²/(1 + ‖w‖²‖s₁‖²) = (w, Y¹)‖s₁‖²/(1 + ‖w‖²‖s₁‖²) + [(s, Y²) − ‖s‖²(w, Y¹)]/(1 + ‖s‖²‖w‖²)

or, simplifying,

(5.7)  b = {−‖s₂‖²(w, Y¹) + ‖s‖²(1 + ‖w‖²‖s₁‖²)[(s, Y²)/‖s‖²]} / [‖s₁‖²(1 + ‖w‖²‖s‖²)].

Observe that this is a weighted average of (w, Y¹), the s₁ coordinate of the missing observation vector estimator from Y¹, and (s, Y²)/‖s‖², the estimator of the same quantity from Y². The weights are proportional to −‖s₂‖² and ‖s‖²(1 + ‖w‖²‖s₁‖²) respectively.

In the usual applications, with coordinates for an orthonormal basis, ‖s‖² = m and ‖s₁‖² = 1, so that the weights become, respectively,

(5.8)  1 − m,  m(1 + ‖w‖²).


Further, ‖w‖² is 1/σ² times the variance of the replicated observation's expectation, as estimated from Y¹. So we may summarize the result as follows, for the usual applications.

Let ŷ be the Gauss-Markov estimator of the replicated observation's expectation from Y¹ alone. Let Var ŷ = θ²σ². Let ȳ be the Gauss-Markov estimator of the same quantity from Y² alone. (It will be the arithmetic average of the replicated observations.)

Let ỹ be the weighted average of ŷ and ȳ with weights proportional to (1 − m) and m(1 + θ²) respectively. For Gauss-Markov estimation based on Y, treat ỹ as if it were a single observation replacing all the replicated ones. Note that this is essentially a prescription for finding Gauss-Markov estimators when weak sphericity does not hold but rather the following conditions hold:

(i) the observations are uncorrelated;

(ii) all the observations have variance σ², except that one (ỹ) has variance σ²/m where m is known.

In the special case m = 1 (original design), the weights are 0 and 1, as they should be. In the special case m = 0, which, strictly speaking, is not covered by the above analysis but can be handled by similar methods, the weights are 1 and 0, again as they should be.
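The prescription just given is easy to check numerically: replace the m replicates by ỹ, run the ordinary balanced analysis, and compare with a direct least-squares fit to all IJ − 1 + m observations. The sketch below does this for an arbitrary 4 × 5 layout with m = 3 (Python/numpy; all names, sizes, and data are illustrative choices).

import numpy as np

rng = np.random.default_rng(5)
I, J, m = 4, 5, 3
Y1 = rng.normal(size=(I, J)); Y1[-1, -1] = 0.0      # cell (I, J) is not in Y^1
reps = rng.normal(size=m)                           # the m replicates of cell (I, J)

X1 = np.column_stack(
    [np.ones(I * J)]
    + [np.repeat(np.eye(I)[:, i], J) for i in range(I)]
    + [np.tile(np.eye(J)[:, j], I) for j in range(J)]
)
k = (I - 1) * J + (J - 1)

# y_hat: Gauss-Markov estimator of mu + alpha_I + beta_J from Y^1 alone (the missing-value
# formula), with Var y_hat = theta^2 sigma^2 and theta^2 = (I + J - 1)/((I - 1)(J - 1)).
row, col, tot = Y1[-1].sum(), Y1[:, -1].sum(), Y1.sum()
y_hat = (I * row + J * col - tot) / ((I - 1) * (J - 1))
theta2 = (I + J - 1) / ((I - 1) * (J - 1))
y_bar = reps.mean()
y_tilde = ((1 - m) * y_hat + m * (1 + theta2) * y_bar) / ((1 - m) + m * (1 + theta2))

# Balanced analysis with y_tilde standing in for the replicated cell.
Yc = Y1.copy().ravel(); Yc[k] = y_tilde
fit_tilde = (X1 @ np.linalg.lstsq(X1, Yc, rcond=None)[0]).reshape(I, J)

# Direct least squares on all IJ - 1 + m observations.
X_full = np.vstack([np.delete(X1, k, axis=0)] + [X1[k]] * m)
Y_full = np.append(np.delete(Y1.ravel(), k), reps)
fit_full = (X1 @ np.linalg.lstsq(X_full, Y_full, rcond=None)[0]).reshape(I, J)

assert np.allclose(fit_tilde, fit_full)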

Let us apply this to our example. Here, V may be expressed as the (IJ + m − 1)-dimensional space {Yᵢⱼ for i = 1, · · · , I; j = 1, · · · , J; (i, j) ≠ (I, J); Y_{IJ1}; Y_{IJ2}, · · · , Y_{IJm}}. The three groups of coordinates corresponding to V₁, V₂₁, and V₂₂ are separated by semicolons. The Ω's are given via EYᵢⱼ = μ + αᵢ + βⱼ and EY_{IJl} = μ + α_I + β_J for l = 1, · · · , m. We may take s₁ as having all zero coordinates, except that the (I, J, 1) coordinate is one; similarly, s₂ has all zero coordinates, except that the last m − 1 coordinates are one.

Now w is given by the coefficient vector for γ toward the end of section 3, that is, w is the vector in Ω₁ with (i, j) coordinate, when (i, j) ≠ (I, J),

[I/(I − 1)][J/(J − 1)][J⁻¹δ_{iI} + I⁻¹δ_{jJ} − (IJ)⁻¹];

of course, w has zero (I, J, l) coordinates. Further, (w, Y¹), the estimator of μ + α_I + β_J from Y¹ alone, is [I/(I − 1)][J/(J − 1)][J⁻¹Y_{I⊙} + I⁻¹Y_{⊙J} − (IJ)⁻¹Y_{⊙⊙}]. Next, observe that (s, Y²)/‖s‖² = Σ_{l=1}^{m} Y_{IJl}/m = Ȳ_{IJ·}. We compute that ‖w‖² = [I/(I − 1)]²[J/(J − 1)]²λ(1 − λ) = (I + J − 1)/[(I − 1)(J − 1)], so that the weights are 1 − m and mIJ/[(I − 1)(J − 1)] respectively. Hence

(5.9)  b = {(1 − m)IJ[J⁻¹Y_{I⊙} + I⁻¹Y_{⊙J} − (IJ)⁻¹Y_{⊙⊙}] + IJ Σ_{l=1}^{m} Y_{IJl}} / [(I − 1)(J − 1) + m(I + J − 1)],

and we need only use this in place of all the Y_{IJl} and apply the symmetrical explicit formulas of ordinary Gauss-Markov estimation to the resulting IJ-fold balanced array.

It can be checked that, in the special case m = 2, this leads to the same results as in the last section.


spanned by a single vector that is set up to correspond with s₁, that is, EY²¹ = (w, μ¹)s₁ and EY²² = (w, μ¹)s₂.

If we could observe Y¹ + Y²¹ + Y²², orthogonal projection on its Ω would be easy, for we would have a k-fold replication of a design for which we have the Gauss-Markov estimators. Although we do not have Y²², we can ask whether there is a y² ∈ Ω₂ such that

(6.2)  P_{(I+A₁)Ω₁}P_Ω(Y¹ + y²) = P_{(I+A₁)Ω₁}(Y¹ + Y²¹),

where A₁μ¹ = (w, μ¹)s₁. In this case, with the orthonormal coordinates we have in mind, ‖s₁‖² = m and ‖s‖² = k.

The estimator of EY² from Y¹ alone is (w, Y¹)s and from Y²¹ alone it is [(s₁, Y²¹)/‖s₁‖²]s. We can write Ω as the direct sum of orthogonal subspaces, U + (I + A)(Ω₁ − U), just as before, and express y² in the form bs. Carrying out the operations indicated by (6.2), we obtain

(6.3)  b = [‖s₂‖²(w, Y¹) + (1 + ‖w‖²‖s‖²)(s₁, Y²¹)] / [‖s‖²(1 + ‖w‖²‖s₁‖²)].

Then, putting ‖s₁‖² = m along with ‖s₂‖² = k − m and ‖s‖² = k, a legitimate operation for orthonormal coordinates, we finally obtain exactly (6.1) again. The final paragraph of the discussion of the case m ≥ k holds verbatim.

6.3. Summary of the preceding work of this section. We may summarize as follows. Suppose that P_Ω[Y¹ + Y²] is known explicitly, but that Y = Y¹ + Y² is not weakly spherical. Suppose further (i) that Y¹ ranges over an (n − 1)-dimensional space and in that space has the covariance transformation σ²I, and (ii) that Y² ranges over a one-dimensional space orthogonal to that of Y¹ and in that space has the covariance transformation τ²σ²I, where τ² is known [and rational].

Let ŷ be the Gauss-Markov estimator, based on Y¹ alone, of the coordinate of EY² with respect to a unit vector t in the space of Y². Let Var ŷ = θ²σ².

Let y be the coordinate of Y² with respect to the same unit vector.

Let ỹ be the weighted average of ŷ and y with weights proportional to 1 − τ⁻² and τ⁻²(1 + θ²) respectively.

Then P_Ω[Y¹ + ỹt] is the Gauss-Markov estimator of EY.

The special cases τ² = 1 (original design) and τ² = ∞ (Y² is missing) work out as they should. The special case τ² = 0 (corresponding to m = ∞ or EY² known a priori) may be useful in some circumstances. The bracketed words, "and rational," above may be omitted by continuity of Gauss-Markov estimation in Σ.

A related discussion is given by Gauss [5], section 36 of Theoria Combinationis . . . .
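The summary statement can be illustrated with any design in which one coordinate has variance τ²σ² and the rest have variance σ²: the substitution of ỹ reproduces weighted (generalized) least squares. The sketch below checks this for an arbitrary small regression design (Python/numpy; n, p, τ², and the data are illustrative choices).

import numpy as np

rng = np.random.default_rng(6)
n, p, tau2 = 12, 3, 2.5
X = rng.normal(size=(n, p))               # any full-rank design; the last row is the Y^2 slot
Y = rng.normal(size=n)

X1, x2 = X[:-1], X[-1]                    # Y^1 part and the coordinate direction for Y^2
Y1, y = Y[:-1], Y[-1]

# y_hat: GM estimator of the last coordinate's expectation from Y^1 alone;
# Var y_hat = theta^2 sigma^2.
b1 = np.linalg.lstsq(X1, Y1, rcond=None)[0]
y_hat = x2 @ b1
theta2 = x2 @ np.linalg.pinv(X1.T @ X1) @ x2

# y_tilde: weighted average with weights proportional to 1 - 1/tau2 and (1 + theta2)/tau2.
w1, w2 = 1 - 1 / tau2, (1 + theta2) / tau2
y_tilde = (w1 * y_hat + w2 * y) / (w1 + w2)

# Substitution estimator: ordinary least squares with y replaced by y_tilde ...
b_sub = np.linalg.lstsq(X, np.append(Y1, y_tilde), rcond=None)[0]

# ... against weighted least squares with weights 1, ..., 1, 1/tau2.
W = np.ones(n); W[-1] = 1 / tau2
b_wls = np.linalg.lstsq(X * np.sqrt(W)[:, None], Y * np.sqrt(W), rcond=None)[0]
assert np.allclose(X @ b_sub, X @ b_wls)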

7. Application to apparent outliers

One approach to apparently outlying observations is to apply some criterion and then either decide that the suspect observation is not an outlier and handle it in the usual way for analysis, or else decide that the observation is an outlier and omit it completely from analysis.

One might, however, consider intermediate positions in which a suspect observation is treated with a lower weight than the rest, that is, has an imputed variance higher than that of the other observations. To completely omit the observation is, in effect, to give it an infinite variance, but why go that far? If there is only one such suspect observation, the method described above permits the relatively simple incorporation of the suspect observation into Gauss-Markov estimation, provided that the ratio of its imputed variance to the variance of the other observations is given.

REFERENCES

[1] M. S. BARTLETT, "The vector representation of a sample," Proc. Cambridge Philos. Soc., Vol. 30 (1933-34), pp. 327-340.
[2] L. C. A. CORSTEN, "Vectors, a tool in statistical regression theory," Meded. Landbouwhogeschool Wageningen, Vol. 58 (1958), pp. 1-92.
[3] J. DURBIN and M. G. KENDALL, "The geometry of estimation," Biometrika, Vol. 38 (1951), pp. 150-158.
[4] D. A. S. FRASER, "On the combining of interblock and intrablock estimates," Ann. Math. Statist., Vol. 28 (1957), pp. 814-816.
[5] C. F. GAUSS, Méthode des Moindres Carrés, Paris, Mallet-Bachelier, 1855; translation into French by J. Bertrand of Gauss's works on least squares. (A translation into English by H. F. Trotter is Technical Report No. 5, Statistical Techniques Research Group, Princeton, 1957.)
[6] P. R. HALMOS, Finite-Dimensional Vector Spaces, Princeton, Van Nostrand, 1958 (2nd ed.).
[7] A. N. KOLMOGOROV, "On the proof of the method of least squares," Uspehi Mat. Nauk, Vol. 1 (1946), pp. 57-70. (In Russian.)
[8] W. KRUSKAL, "Discussion of the papers of Messrs. Anscombe and Daniel," Technometrics, Vol. 2 (1960), pp. 157-166. (Pages indicated also include discussion by T. S. Ferguson, J. W. Tukey, and E. J. Gumbel.)
[9] R. L. PLACKETT, "Some theorems in least squares," Biometrika, Vol. 37 (1950), pp. 149-157.
[10] K. D. TOCHER, "The design and analysis of block experiments," J. Roy. Statist. Soc., Ser. B, Vol. 14 (1952), pp. 45-91. Discussion, pp. 91-100.
[11] G. N. WILKINSON, "Estimation of missing values for the analysis of incomplete data," Biometrics, Vol. 14 (1958), pp. 257-286.
[12] M. A. WOODBURY, "Modified matrix functions and applications," unpublished paper presented at the IBM Seminar for Directors of Computing Centers, June 26, 1959.