









WILLIAM KRUSKAL
UNIVERSITY OF CHICAGO
The purposes of this paper are (1) to describe the coordinate-free approach to Gauss-Markov (linear least squares) estimation in the context of Model I analysis of variance and (2) to discuss, in coordinate-free language, the topics of missing observations and extra observations. It is curious that the coordinate-free approach to Gauss-Markov estimation, although known to many statisticians, has infrequently been discussed in the literature on least squares and analysis of variance. The major textbooks in these areas do not use the coordinate-free approach, and I know of only a few journal
in R. A. Fisher's geometrical approach to sampling problems. The subject of missing observations in Model I analysis of variance is well understood and often discussed. This paper presents no new results here, but it
section 35 of Theoria Combinationis... , has elicited hardly any papers since.
The problem of extra observations is important in its own right and also in connection
The work leading to this paper was supported in part by the Logistics and Mathematical Statistics Branch of the Office of Naval Research and in part by the National Science Foundation.
than do its notational competitors, the matrix and scalar approaches. Second, it is useful as an introduction to infinite-dimensional spaces, which are important, for example, in the consideration of stochastic processes. A related point is that more or less coordinate-free treatments of finite-dimensional vector spaces are now more common than they once were and are being taught to students at an earlier stage. With such mathematical background, a student can learn the theoretical side of Model I analysis of variance quickly and efficiently. The treatment in this paper will, however, be compact and without the motivational material and the many examples that would be appropriate in a fuller exposition.
Nonetheless, it may be useful to keep one concrete example before us. Accordingly, I shall illustrate the general theory in terms of a simple illustration, two-way analysis of variance with one observation per cell. The vector space viewpoint and notation will mostly be taken from P. R. Halmos's text [6]. My own introduction to the coordinate-free approach came from discussions with L. J. Savage and I acknowledge my great debt to him, a debt of which coordinate freedom forms but a part.
We consider a sample point, Y, that ranges over an n-dimensional real vector space, V; more precisely, Y is a measurable transformation from an underlying probability space onto V such that all sets in the underlying space of the form $\{e \mid (x, Y(e)) \le c\}$, where $x \in V$ and $c$ is a real number, are measurable. Of
in terms of coordinates, that is, by not requiring that a basis be specified, we are
All first and second moments discussed will be assumed to exist without further mention.
expectation in coordinate form.
(like an inner product but possibly nonnegative definite). Hence there exists a
set of side conditions is $\sum_i \alpha_i = \sum_j \beta_j = 0$. Then it follows that $\mu = \sum_i \sum_j \mu_{ij}/(IJ)$, that $\alpha_i = (\sum_j \mu_{ij}/J) - \mu$, and that $\beta_j = (\sum_i \mu_{ij}/I) - \mu$.
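As a concrete check of these identities, here is a minimal numpy sketch (the table and all names are mine, not the paper's); it assumes the table is exactly additive, as $\mu_{ij} \in \Omega$ requires.

```python
# A minimal sketch, assuming an additive I x J table of expectations mu_ij,
# of the identities above: recover mu, alpha_i, beta_j under the side
# conditions sum_i alpha_i = sum_j beta_j = 0.
import numpy as np

mu_table = np.array([[2.0, 3.0, 4.0],
                     [1.0, 2.0, 3.0]])      # hypothetical additive table of mu_ij

mu = mu_table.mean()                        # mu = sum_i sum_j mu_ij / (IJ)
alpha = mu_table.mean(axis=1) - mu          # alpha_i = (sum_j mu_ij / J) - mu
beta = mu_table.mean(axis=0) - mu           # beta_j = (sum_i mu_ij / I) - mu

assert abs(alpha.sum()) < 1e-12 and abs(beta.sum()) < 1e-12   # side conditions
assert np.allclose(mu + alpha[:, None] + beta[None, :], mu_table)
```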
variance, with one observation per cell and no interaction. If, in addition, we require joint normality of the $Y_{ij}$, then the $Y_{ij}$ are independent. A basic fact is that, under conditions (i) and (ii), the orthogonal projections, $P_\Omega Y$ of $Y$ on $\Omega$ and $Y - P_\Omega Y = Q_\Omega Y$ on the orthogonal complement of $\Omega$, are uncorrelated.
itself. To say that $AY$ and $BY$ are uncorrelated is to say that $\operatorname{Cov}[(x, AY), (z, BY)] = 0$ for all $x$ and $z$ in $V$.
projections, they are idempotent (for example, $P_\Omega P_\Omega = P_\Omega$) and symmetric [for example, $(x, P_\Omega z) = (P_\Omega x, z)$].
all $x \in V$, then it is almost immediate that $\|P_\Omega Y - \mu\|^2/\sigma^2$ and $\|Q_\Omega Y\|^2/\sigma^2$ have chi-square distributions with $p$ and $n - p$ degrees of freedom respectively. In any case, these quantities have expectations $p$ and $n - p$.
The vector Gauss-Markov estimator of $\mu$ is $P_\Omega Y$, and the scalar Gauss-Markov estimator of $(x, \mu)$ is $(x, P_\Omega Y)$.
assume that $\sigma^2 > 0$.
of $\mu$ and leads to minimum variance for all derived estimators of linear functionals.
all $x \in \Omega$, it follows that $\operatorname{Var}(x, P_\Omega Y) \le \operatorname{Var}(x, DY)$.
biased for $(x, \mu)$ is $(P_\Omega x, Y) = (x, P_\Omega Y)$.
(c) For all $x$, the unique minimum variance linear functional of $Y$ that has bounded mean square error in $\mu$ is $(P_\Omega x, Y)$.
(d) The unique vector $\nu$ in $\Omega$ minimizing $\|Y - \nu\|^2$ is $P_\Omega Y$. This is the least squares characterization.
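Characterizations (b) and (d) are easy to see numerically. The following sketch is my own construction, with a generic full-rank basis matrix X standing in for $\Omega$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))            # columns span a 3-dimensional manifold Omega
Y = rng.normal(size=8)                 # hypothetical observation vector in V = R^8

P = X @ np.linalg.solve(X.T @ X, X.T)  # orthogonal projection on Omega (X full rank)
PY = P @ Y

# (d): P Y minimizes ||Y - nu||^2 over nu in Omega.
for _ in range(100):
    nu = X @ rng.normal(size=3)
    assert np.linalg.norm(Y - PY) <= np.linalg.norm(Y - nu) + 1e-12

# (b): symmetry of P gives (P x, Y) = (x, P Y) for every x in V.
x = rng.normal(size=8)
assert np.isclose((P @ x) @ Y, x @ PY)
```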
Markov estimator of its expectation $(w, \mu)$. This characterization often leads to an easy method of obtaining Gauss-Markov estimators when there is high symmetry or when $\Omega$ is similar to another manifold onto which we know how to project.
need only compute $(w, \mu)$ to see if our guess is right and, if it is nearly right, we can often see immediately how it should be modified. A similar vector characterization for $P_\Omega Y$ is readily written down.
ple, in terms of invariance under relevant linear transformations. Note that $\alpha(x, P_\Omega Y) + \beta(z, P_\Omega Y) = (\alpha x + \beta z, P_\Omega Y)$, the Gauss-Markov estimator of $(\alpha x + \beta z, \mu)$. Further, in terms of a fixed basis, the coordinates of $P_\Omega Y$ are the Gauss-Markov estimators of the respective coordinates of $\mu$. The conventional (unbiased) estimator of $\sigma^2$ is $\|Q_\Omega Y\|^2/(n - p)$.
with respect to dotted subscripts and consider linear functionals of $Y$ as follows: $Y_{\cdot\cdot}$, $Y_{i\cdot} - Y_{\cdot\cdot}$, and $Y_{\cdot j} - Y_{\cdot\cdot}$. Note that there are $I$ of the second kind and $J$ of the third. The coefficient vectors of these functionals are easily seen to be in $\Omega$. Hence the functionals are the Gauss-Markov estimators of their expectations.
vector with $(i, j)$ component $Y_{\cdot\cdot} + (Y_{i\cdot} - Y_{\cdot\cdot}) + (Y_{\cdot j} - Y_{\cdot\cdot}) = Y_{i\cdot} + Y_{\cdot j} - Y_{\cdot\cdot}$. There is a standard orthogonal decomposition of $P_\Omega Y$ into three vectors corresponding to "over-all mean," "row effects," and "column effects," which I do not discuss here. The dimension of $\Omega$ is readily seen to be $I + J - 1$.
If $\omega$ is a $q$-dimensional linear manifold within $\Omega$, then the standard $F$ statistic for the null hypothesis $\mu \in \omega$ against all alternatives is
(2.3) $\quad \dfrac{(p - q)^{-1}\left(\|Q_\omega Y\|^2 - \|Q_\Omega Y\|^2\right)}{(n - p)^{-1}\|Q_\Omega Y\|^2} = \dfrac{(p - q)^{-1}\|P_{\Omega - \omega}Y\|^2}{(n - p)^{-1}\|Q_\Omega Y\|^2}$
with large values critical. Here $Q_\omega Y = Y - P_\omega Y$, and $\Omega - \omega$ is the orthogonal complement of $\omega$ within $\Omega$.
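As a hedged numerical sketch of (2.3) in the two-way example (the design-matrix construction is mine), take $\omega$ to be the manifold with no row effects:

```python
import numpy as np

I, J = 4, 5
rng = np.random.default_rng(1)
Y = rng.normal(size=I * J)                             # one observation per cell

rows = np.repeat(np.eye(I), J, axis=0)                 # row indicators
cols = np.tile(np.eye(J), (I, 1))                      # column indicators
X_Omega = np.hstack([np.ones((I * J, 1)), rows, cols]) # spans Omega
X_omega = np.hstack([np.ones((I * J, 1)), cols])       # spans omega: no row effects

assert np.linalg.matrix_rank(X_Omega) == I + J - 1     # dim Omega = I + J - 1

def rss(X, Y):
    """||Q Y||^2: squared residual length after projecting on col(X)."""
    fit = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return np.sum((Y - fit) ** 2)

n, p, q = I * J, I + J - 1, J                          # dims of V, Omega, omega
F = ((rss(X_omega, Y) - rss(X_Omega, Y)) / (p - q)) / (rss(X_Omega, Y) / (n - p))
print(F)        # refer large values to the F distribution with p-q and n-p df
```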
The purpose of this paper is to discuss estimation, not testing. Hence, except for a few scattered remarks, we leave testing with the above brief paragraph. A good bit of the standard literature on Gauss-Markov estimation and Model I
linear manifolds with interesting statistical meanings.
ple, the sickly piglet is probably more likely to die accidentally before the end of the experiment.
I do not intend to discuss these important problems further here. Instead we
if observations are missing by chance, it is meaningful to carry out statistical analyses conditionally on the "observed" $V_1$.
(3.1) $\quad \mu^i = P_{V_i}\mu = EY^i, \qquad i = 1, 2,$

so that $\mu^1 \in \Omega_1$ and $\mu^2 \in \Omega_2$. In general, $\Omega \ne \Omega_1 + \Omega_2$, although $\Omega \subset \Omega_1 + \Omega_2$.
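A small numerical illustration of (3.1) — the construction and names are mine — with $V_1$ the observed cells and $V_2$ the single missing cell:

```python
import numpy as np

I, J = 3, 4
rows = np.repeat(np.eye(I), J, axis=0)
cols = np.tile(np.eye(J), (I, 1))
X = np.hstack([np.ones((I * J, 1)), rows, cols])        # columns span Omega

observed = np.ones(I * J, dtype=bool); observed[-1] = False  # V1: all but cell (I, J)

X1 = np.where(observed[:, None], X, 0.0)   # P_V1 of each basis vector: spans Omega1
X2 = np.where(observed[:, None], 0.0, X)   # P_V2 of each basis vector: spans Omega2

rank = np.linalg.matrix_rank
print(rank(X), rank(X1), rank(X2))         # I+J-1, I+J-1, 1

# Omega lies inside Omega1 + Omega2, but the containment is strict here:
assert rank(np.hstack([X1, X2, X])) == rank(np.hstack([X1, X2])) == rank(X) + 1
```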
and we ask how to find $P_{\Omega_1}Y^1$, or its equivalent, in terms of $P_\Omega$, which we suppose known.
ambiguity of using "$\mu^1$," say, both to denote the unknown true $\mu^1 = EY^1$ and to
there corresponds a unique $\mu^2 \in \Omega_2$, say $A\mu^1$, such that $(I + A)\mu^1 \in \Omega$, $P_{V_1}(I + A)\mu^1 = \mu^1$, and $P_{V_2}(I + A)\mu^1 = A\mu^1$. In short, to each $\mu^1$ there corresponds a unique $\mu \in \Omega$ with $P_{V_1}\mu = \mu^1$; $A\mu^1$ is that $\mu$ minus $\mu^1$. For completeness,
readily shown to be linear.
seek its analogue $(I + A)P_{\Omega_1}Y^1 = \hat\mu$ in $\Omega$. Note that, if we knew $\hat\mu^2 = AP_{\Omega_1}Y^1$, we could easily obtain $\hat\mu$, since

(3.3) $\quad \hat\mu = P_\Omega(Y^1 + \hat\mu^2).$
(3.4) $\quad P_\Omega(Y^1 + \hat\mu^2) = P_\Omega(P_{\Omega_1}Y^1 + \hat\mu^2) + P_\Omega(Y^1 - P_{\Omega_1}Y^1) = P_\Omega(\hat\mu^1 + \hat\mu^2) = \hat\mu,$

where $P_\Omega(Y^1 - P_{\Omega_1}Y^1)$ is zero since $Y^1 - P_{\Omega_1}Y^1$ is obviously orthogonal to both $\Omega_1$ and $\Omega_2$, and hence to $\Omega$.
in $\dim \Omega_2$ scalar unknowns:

(3.5) $\quad \hat\mu^2 = P_{\Omega_2}P_\Omega(Y^1 + \hat\mu^2).$

In words, (3.5) characterizes $\hat\mu^2$ as that vector of $\Omega_2$
which, when added to $Y^1$, followed by orthogonal projection on $\Omega$ and then on $\Omega_2$, gives us $\hat\mu^2$ back again. It is easy to see that (3.5) holds, for $\hat\mu^2 = P_{\Omega_2}\hat\mu = P_{\Omega_2}P_\Omega(Y^1 + \hat\mu^2)$ by (3.3).
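Read as a fixed-point property, (3.5) suggests the familiar iterative scheme: impute, project, re-impute. A sketch of that iteration (construction mine, assuming the two-way example with one missing cell):

```python
import numpy as np

I, J = 4, 5
rng = np.random.default_rng(2)
Y1 = rng.normal(size=I * J); Y1[-1] = 0.0   # observation at cell (I, J) missing

rows = np.repeat(np.eye(I), J, axis=0)
cols = np.tile(np.eye(J), (I, 1))
X = np.hstack([np.ones((I * J, 1)), rows, cols])
P = X @ np.linalg.pinv(X)                   # orthogonal projector on Omega

mu2 = np.zeros(I * J)                       # current guess for mu^2 (in V2)
for _ in range(500):
    new = np.zeros(I * J)
    new[-1] = (P @ (Y1 + mu2))[-1]          # P_Omega2 P_Omega (Y^1 + mu^2)
    if np.allclose(new, mu2, atol=1e-13):
        break
    mu2 = new

print(mu2[-1])                              # the Gauss-Markov completion of Y^1
```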
Then $u - \hat u = P_{\Omega_2}P_\Omega(u - \hat u)$. But $\|u - \hat u\| > \|P_{\Omega_2}P_\Omega(u - \hat u)\|$ unless both
The use of (3.5) in practice is simple for the kind of application envisaged here.
$Y^1$ has components like $Y$ except that the $(I, J)$ component is zero. Further, $\mu^1$ has $(i, j)$ components $\mu + \alpha_i + \beta_j$, as before, except that the $(I, J)$ component is zero; $\mu^2$ has all components zero except the $(I, J)$ one, which is $\mu + \alpha_I + \beta_J$. It is readily checked that $\dim \Omega = \dim \Omega_1 = I + J - 1$.
$P_{\Omega_2}P_\Omega[Y^1 + \hat\mu^2]$ has all components zero but the $(I, J)$ one, which we know from the previous discussion is
(3.6) $\quad J^{-1}Y_{I0} + J^{-1}\hat\mu^2_{IJ} + I^{-1}Y_{0J} + I^{-1}\hat\mu^2_{IJ} - (IJ)^{-1}Y_{00} - (IJ)^{-1}\hat\mu^2_{IJ},$

where $Y_{I0}$, $Y_{0J}$, and $Y_{00}$ are the row-$I$, column-$J$, and grand totals of $Y^1$, and $\hat\mu^2_{IJ}$ is the $(I, J)$ component of $\hat\mu^2$. Setting (3.6) equal to $\hat\mu^2_{IJ}$, as (3.5) requires, and solving,

(*) $\quad \hat\mu^2_{IJ} = \dfrac{J^{-1}Y_{I0} + I^{-1}Y_{0J} - (IJ)^{-1}Y_{00}}{1 - (J^{-1} + I^{-1} - (IJ)^{-1})}.$
As a check, we may compute the expectation of (*); it is $\mu + \alpha_I + \beta_J$. Having found $\hat\mu^2_{IJ}$, by (3.3) we may use this value to "complete" $Y^1$ and apply $P_\Omega$ to the completion, thus getting $\hat\mu$, $\hat\alpha_i$, and $\hat\beta_j$. It is straightforward to write down explicit descriptions of these quantities.
Other methods of treating missing observations are (i) minimization of the quadratic form $\|Q_\Omega[Y^1 + \mu^2]\|^2$ in $\mu^2$, which leads to (3.5) again, and (ii) the use of traditional covariance analysis with dummy covariate vectors, each
zero coordinates elsewhere. (This last method requires caution and modification if $\dim \Omega_2 < \dim V_2$, that is, if the expectations of the missing observations are linearly related.) The covariance method is based on the easily shown identity
The major discussion of this section, in which the normal equations (3.5) for $\hat\mu^2$ are obtained directly, may be regarded as the coordinate-free analogue of a
tion toward missing observations, then we require $\dim \omega = \dim P_{V_1}\omega$ as well as (3.2). Let $\omega_1 = P_{V_1}\omega$. Equation (3.5) must be worked out separately for $\|Q_{\Omega_1}Y^1\|^2$ and $\|Q_{\omega_1}Y^1\|^2$ in (2.3). A suggestion first made, I believe, by Yates is to approximate the $F$ statistic by solving (3.5) for $\hat\mu^2$ under $\Omega$ only and using this quantity throughout.
estimate $P_{V_1}\mu$ from $Y^1$, as it seems clear that $Y^2$ can tell us nothing about
estimators of orthogonal quantities are added. It is often easier to work in terms of estimating linear functionals $(x, \mu)$ of $\mu$
$(x, \mu)$ is

(4.4) $\quad (x, P_{\Omega_1}P_\Omega Y) = (P_\Omega x, Y) = (x, Y) - (P_{\Omega_1 - U}x, Y^1) + (P_{(I+A)(\Omega_1 - U)}x, Y),$
for $x \in \Omega_1$. If, in addition, $\dim \Omega_2 = 1$, then (4.4) may be simplified further, as follows.
computation gives
(4.5) $\quad (x, P_{\Omega_1}P_\Omega Y) = (x, Y^1) - \dfrac{(x, z)\left[\|z\|^2(z, Y^1) - (Az, Y^2)\right]}{\|z\|^2 + \|Az\|^2},$
with $z$ the coefficient vector of $Y^1$ for the Gauss-Markov estimator of the expectation of the extra observation. Let us illustrate the use of (4.3) to (4.5) in our example, supposing that
coordinates zero except the $(IJ + 1)$st. Then $Au$ for $u \in \Omega_1$ is that vector in $V_2$ whose $(IJ + 1)$st coordinate is the same as the $(I, J)$ coordinate of $u$. Hence to say that $Au = 0$ is to say that $\mu(u) + \alpha_I(u) + \beta_J(u) = 0$. (Here I have kept $u$ in the expression as an argument to emphasize that $\mu$, $\alpha_I$,
of $z$ becomes $J^{-1} + I^{-1} - (IJ)^{-1} = \lambda$, say, equal to the common $(I, J)$ coordinate. It is readily computed that $\|z\|^2 = \lambda$ and $\|(I + A)z\|^2 = \lambda(1 + \lambda)$. Note that $z$ could also be obtained as follows: the Gauss-Markov estimator of $\mu + \alpha_I + \beta_J$ from $Y^1$ alone is a linear functional of $Y^1$ that has $z$ as its coefficient vector.
Suppose that we want to find the Gauss-Markov estimator of $\alpha_1$. We
hence we may conveniently write $\alpha_1$ as $(x, \mu)$, where $x \in \Omega_1$ is the vector
Compute directly that $(x, Y^1) = Y_{1\cdot} - Y_{\cdot\cdot}$, $(x, z) = -(IJ)^{-1}$,
$\|z\|^2 + \|Az\|^2 = \|(I + A)z\|^2 = \lambda(1 + \lambda),$
$\|Az\|^2 = \lambda^2$, and $(z, Y^1) = Y_{I\cdot} + Y_{\cdot J} - Y_{\cdot\cdot},$
(4.8) $\quad Y_{1\cdot} - Y_{\cdot\cdot} + [(1 + \lambda)IJ]^{-1}[(Y_{I\cdot} + Y_{\cdot J} - Y_{\cdot\cdot}) - Y_{IJ2}].$

Note that the last term in square brackets is the difference between the Gauss-Markov estimators of $\mu + \alpha_I + \beta_J$ from $Y^1$ and $Y^2$ alone, respectively. Since $(1 + \lambda)IJ = IJ + I + J - 1$, we see that

(4.9) $\quad \hat\alpha_1 = Y_{1\cdot} - Y_{\cdot\cdot} + \dfrac{(Y_{I\cdot} + Y_{\cdot J} - Y_{\cdot\cdot}) - Y_{IJ2}}{IJ + I + J - 1}.$
Similar computations provide $\hat\alpha_I$, $\hat\beta_J$, and $\hat\mu$. The final results may be summarized as follows: with $\Delta = Y_{IJ2} - (Y_{I\cdot} + Y_{\cdot J} - Y_{\cdot\cdot})$,

(4.10) $\quad \hat\mu = Y_{\cdot\cdot} + \dfrac{\Delta}{IJ + I + J - 1}, \qquad \hat\alpha_i = Y_{i\cdot} - Y_{\cdot\cdot} + \dfrac{\Delta(I\delta_{iI} - 1)}{IJ + I + J - 1}, \qquad \hat\beta_j = Y_{\cdot j} - Y_{\cdot\cdot} + \dfrac{\Delta(J\delta_{jJ} - 1)}{IJ + I + J - 1},$

where the term involving $\Delta$ may in each case be regarded as the "correction" term.
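The closed forms (4.9)-(4.10) can be checked against direct least squares on the augmented design; the following sketch (construction mine) does so:

```python
import numpy as np

I, J = 4, 5
rng = np.random.default_rng(4)
Y = rng.normal(size=(I, J))                 # Y^1: one observation per cell
Y_extra = rng.normal()                      # Y_IJ2: the extra observation at (I, J)

Delta = Y_extra - (Y[-1].mean() + Y[:, -1].mean() - Y.mean())
D = I * J + I + J - 1

mu_hat = Y.mean() + Delta / D
alpha_hat = Y.mean(axis=1) - Y.mean() + Delta * (I * (np.arange(I) == I - 1) - 1) / D
beta_hat = Y.mean(axis=0) - Y.mean() + Delta * (J * (np.arange(J) == J - 1) - 1) / D

# Direct least squares on all IJ + 1 observations for comparison.
rows = np.repeat(np.eye(I), J, axis=0)
cols = np.tile(np.eye(J), (I, 1))
X = np.hstack([np.ones((I * J, 1)), rows, cols])
X_aug = np.vstack([X, X[-1]])               # the design row of cell (I, J), repeated
b = np.linalg.lstsq(X_aug, np.append(Y.ravel(), Y_extra), rcond=None)[0]

cells = mu_hat + alpha_hat[:, None] + beta_hat[None, :]
assert np.allclose(X @ b, cells.ravel())
```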
of $EY^{21}$, is $(w, Y^1)s_1$, since $w \in \Omega_1$ and $E(w, Y^1)s_1 = (w, \mu^1)s_1 = \mu^{21}$. Further,
The relationship $(w, x) = 0$, for $x \in \Omega_1$, determines $U$. Hence $w$ spans the one-dimensional $\Omega_1 - U$. Also, $(I + A)w = w + \|w\|^2 s_1$ spans $(I + A)(\Omega_1 - U)$. Hence

$P_{\Omega_1}Y^1 = P_{\Omega_1}P_{(I+A)\Omega_1}[Y^1 + (w, Y^1)s_1],$
and from (4.3) or (4.2)
since $s \perp \Omega_1$. Hence the desired $y^{21} \in \Omega_{21}$, if it exists, must satisfy
direct sum of $U$ and the manifold spanned by $w + \|w\|^2 s_1$, (5.5) may be written
or, simplifying,
(5.7) $\quad b = \dfrac{(1 - \|s\|^2)(w, Y^1) + \|s\|^2(1 + \|w\|^2)[(s, Y^2)/\|s\|^2]}{1 + \|w\|^2\|s\|^2}$
Observe that this is a weighted average of $(w, Y^1)$, the $s_1$ coordinate of the
Further, $\|w\|^2$ is $\sigma^{-2}$ times the variance of the replicated observation's expectation, as estimated from $Y^1$. So we may summarize the result as follows, for general $m$.
Let $\hat y$ be the Gauss-Markov estimator of the replicated observation's expectation from $Y^1$ alone. Let $\operatorname{Var} \hat y = \theta^2\sigma^2$. Let $\bar y$ be the Gauss-Markov estimator of the same quantity from $Y^2$ alone. (It will be the arithmetic average of the replicated observations.) Then $b = [(1 - m)\hat y + m(1 + \theta^2)\bar y]/(1 + m\theta^2)$.
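In code, the summary amounts to inverse-variance pooling followed by the formation of $b$; this small sketch (the function and numbers are mine) makes the two steps explicit:

```python
# A sketch of the summary above: pool the two independent estimators by
# inverse variance to get the Gauss-Markov estimate of the cell expectation,
# then form the completion value b of (5.7).
def combine(y_hat, theta2, y_bar, m):
    pooled = (y_hat / theta2 + m * y_bar) / (1 / theta2 + m)  # inverse-variance weights
    b = m * y_bar + (1 - m) * pooled                          # value to impute
    return pooled, b

# Hypothetical numbers: y_hat = 2.8 with Var = 1.5 sigma^2 from Y^1, and
# y_bar = 3.1 from m = 3 replicates (Var = sigma^2 / 3).
print(combine(2.8, 1.5, 3.1, 3))
```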
In the special case $m = 1$ (the original design), the weights are 0 and 1, as they should be.
the above analysis but can be handled by similar methods; the weights are 1 and 0.
$(IJ + m - 1)$-dimensional space $\{Y_{ij}$, for $i = 1, \dots, I$; $j = 1, \dots, J$; $(i, j) \ne (I, J)$; $Y_{IJ1}$; $Y_{IJ2}, \dots, Y_{IJm}\}$. The three groups of coordinates corresponding to $V_1$, $V_{21}$, and $V_{22}$ are separated by semicolons. The $\mu$'s are given via $EY_{ij} = \mu + \alpha_i + \beta_j$ and $EY_{IJl} = \mu + \alpha_I + \beta_J$ for $l = 1, \dots, m$. We
coordinates are one.
$[I/(I - 1)][J/(J - 1)][J^{-1}\delta_{iI} + I^{-1}\delta_{jJ} - (IJ)^{-1}]$; of course, $w$ has zero $(I, J, l)$ coordinates. Further, $(w, Y^1)$, the estimator of $\mu + \alpha_I + \beta_J$ from $Y^1$ alone, is $[I/(I - 1)][J/(J - 1)][J^{-1}Y_{I0} + I^{-1}Y_{0J} - (IJ)^{-1}Y_{00}]$. Next, observe that $(s, Y^2)/\|s\|^2 = \sum_{l=1}^m Y_{IJl}/m = \bar Y_{IJ\cdot}$. We compute that $\|w\|^2 = [I/(I - 1)][J/(J - 1)]\lambda = (I + J - 1)/[(I - 1)(J - 1)]$, so that the weights are $(1 - m)(I - 1)(J - 1)/[(I - 1)(J - 1) + m(I + J - 1)]$ and $mIJ/[(I - 1)(J - 1) + m(I + J - 1)]$ respectively. Hence
(5.9) $\quad b = \dfrac{(1 - m)IJ[J^{-1}Y_{I0} + I^{-1}Y_{0J} - (IJ)^{-1}Y_{00}] + IJ\sum_{l=1}^m Y_{IJl}}{(I - 1)(J - 1) + m(I + J - 1)},$
and we need only use this in place of all the $Y_{IJl}$ and apply the symmetrical explicit formulas of ordinary Gauss-Markov estimation to the resulting $IJ$-fold balanced array.
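A numerical check (construction mine) of this recipe: compute $b$ from (5.9), impute it at cell $(I, J)$, and verify that the balanced fit agrees with direct least squares on all $IJ - 1 + m$ observations:

```python
import numpy as np

I, J, m = 4, 5, 3
rng = np.random.default_rng(5)
Y = rng.normal(size=(I, J)); Y[-1, -1] = 0.0    # Y^1: cell (I, J) not observed
Y2 = rng.normal(size=m)                         # the m replicates Y_IJ1, ..., Y_IJm

row_t, col_t, tot = Y[-1].sum(), Y[:, -1].sum(), Y.sum()    # totals over Y^1
b = ((1 - m) * I * J * (row_t / J + col_t / I - tot / (I * J))
     + I * J * Y2.sum()) / ((I - 1) * (J - 1) + m * (I + J - 1))

rows = np.repeat(np.eye(I), J, axis=0)
cols = np.tile(np.eye(J), (I, 1))
X = np.hstack([np.ones((I * J, 1)), rows, cols])

# Fit 1: all IJ - 1 + m observations directly.
X_all = np.vstack([X[:-1], np.tile(X[-1], (m, 1))])
b_all = np.linalg.lstsq(X_all, np.concatenate([Y.ravel()[:-1], Y2]), rcond=None)[0]

# Fit 2: balanced IJ array with b imputed at cell (I, J).
Yb = Y.copy(); Yb[-1, -1] = b
b_bal = np.linalg.lstsq(X, Yb.ravel(), rcond=None)[0]

assert np.allclose(X @ b_all, X @ b_bal)        # identical fitted cell means
```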
results as in the last section.
spanned by a single vector that is set up to correspond with $s_1$, that is, $EY^{21} = (w, \mu^1)s_1$ and $EY^{22} = (w, \mu^1)s_2$.
be easy, for we would have a $k$-fold replication of a design for which we have the Gauss-Markov estimators. Although we do not have $Y^{22}$, we can ask whether there is a $y^2 \in \Omega_2$ such that

(6.2) $\quad P_{(I+A)\Omega_1}P_\Omega(Y^1 + y^2) = P_{(I+A)\Omega_1}(Y^1 + Y^{21}),$

where $A\mu^1 = (w, \mu^1)s_1$. In this case, with the orthonormal coordinates we have in mind, $\|s_1\|^2 = m$ and $\|s\|^2 = k$.
$[(s_1, Y^{21})/\|s_1\|^2]s$. We can write $\Omega$ as the direct sum of orthogonal subspaces,
out the operations indicated by (6.2), we obtain
(6.3) $\quad a = \dfrac{g\|s_2\|^2}{\|s\|^2(1 + \|w\|^2\|s_1\|^2)}.$

Then, putting $\|s_1\|^2 = m$ along with $\|s_2\|^2 = k - m$ and $\|s\|^2 = k$, a legitimate
final paragraph of the discussion of the case $m \le k$ holds verbatim.
follows. Suppose that $P_\Omega[Y^1 + Y^2]$ is known explicitly, but that $Y = Y^1 + Y^2$ is not weakly spherical. Suppose further (i) that $Y^1$ ranges over an $(n - 1)$-
(ii) that $Y^2$ ranges over a one-dimensional space orthogonal to that of $Y^1$ and in
rational]. Let $\hat y$ be the Gauss-Markov estimator, based on $Y^1$ alone, of the coordinate
Let $y$ be the coordinate of $Y^2$ with respect to the same unit vector.
words, "and rational," above may be omitted by continuity of Gauss-Markov
tionis....
and then either decide that the suspect observation is not an outlier and handle
it in the usual way for analysis, or else decide that the observation is an outlier and omit it completely from analysis.
One might, however, consider intermediate positions in which a suspect observation is treated with a lower weight than the rest, that is, has an imputed variance higher than that of the other observations. To omit the observation completely is, in effect, to give it an infinite variance, but why go that far? If there is only one such suspect observation, the method described above permits the relatively simple incorporation of the suspect observation into Gauss-Markov estimation, provided that the ratio of its imputed variance to the variance of the other observations is given.
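To make the suggestion concrete, here is a small weighted least squares sketch (entirely my own construction; the design and the variance ratio r are hypothetical):

```python
import numpy as np

def downweighted_fit(X, Y, suspect, r):
    """Weighted least squares giving one observation imputed variance r*sigma^2."""
    w = np.ones(len(Y)); w[suspect] = 1.0 / r   # weights = inverse variance ratios
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ Y)  # normal equations X'WX b = X'WY

rng = np.random.default_rng(6)
X = rng.normal(size=(12, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=12)
Y[7] += 10.0                                    # one suspect, possibly outlying value

b_full = downweighted_fit(X, Y, 7, 1.0)         # suspect treated like the rest
b_mid = downweighted_fit(X, Y, 7, 25.0)         # intermediate imputed variance
b_omit = np.linalg.lstsq(np.delete(X, 7, 0), np.delete(Y, 7), rcond=None)[0]
print(b_full, b_mid, b_omit)                    # b_mid lies between the extremes
```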
REFERENCES
[1] M. S. BARTLETT, "The vector representation of a sample," Proc. Cambridge Philos. Soc., Vol. 30 (1933-34), pp. 327-340.
[2] L. C. A. CORSTEN, "Vectors, a tool in statistical regression theory," Meded. Landbouwhogeschool Wageningen, Vol. 58 (1958), pp. 1-92.
[3] J. DURBIN and M. G. KENDALL, "The geometry of estimation," Biometrika, Vol. 38 (1951), pp. 150-158.
[4] D. A. S. FRASER, "On the combining of interblock and intrablock estimates," Ann. Math. Statist., Vol. 28 (1957), pp. 814-816.
[5] C.-F. GAUSS, Méthode des Moindres Carrés, Paris, Mallet-Bachelier, 1855; translation into French by J. Bertrand of Gauss's works on least squares. (A translation into English by H. F. Trotter is Technical Report No. 5, Statistical Techniques Research Group, Princeton, 1957.)
[6] P. R. HALMOS, Finite-Dimensional Vector Spaces, Princeton, Van Nostrand, 1958 (2nd ed.).
[7] A. N. KOLMOGOROV, "On the proof of the method of least squares," Uspehi Mat. Nauk, Vol. 1 (1946), pp. 57-70. (In Russian.)
[8] W. KRUSKAL, "Discussion of the papers of Messrs. Anscombe and Daniel," Technometrics, Vol. 2 (1960), pp. 157-166. (Pages indicated also include discussion by T. S. Ferguson, J. W. Tukey, and E. J. Gumbel.)
[9] R. L. PLACKETT, "Some theorems in least squares," Biometrika, Vol. 37 (1950), pp. 149-157.
[10] K. D. TOCHER, "The design and analysis of block experiments," J. Roy. Statist. Soc., Ser. B, Vol. 14 (1952), pp. 45-91. Discussion, pp. 91-100.
[11] G. N. WILKINSON, "Estimation of missing values for the analysis of incomplete data," Biometrics, Vol. 14 (1958), pp. 257-286.
[12] M. A. WOODBURY, "Modified matrix functions and applications," unpublished paper presented at the IBM Seminar for Directors of Computing Centers, June 26, 1959.