
STAT 9220

Lecture 14

Asymptotically Efficient Estimation

Greg Rempala

Department of Biostatistics

Medical College of Georgia

Apr 21, 2009

14.1 Asymptotic comparison

Let $\{\hat\theta_n\}$ be a sequence of estimators of $\theta$ based on a sequence of samples $\{X = (X_1, \dots, X_n) : n = 1, 2, \dots\}$. Suppose that as $n \to \infty$, $\hat\theta_n$ is asymptotically normal (AN) in the sense that

$$[V_n(\theta)]^{-1/2}(\hat\theta_n - \theta) \to_d N_k(0, I_k),$$

where, for each $n$, $V_n(\theta)$ is a $k \times k$ positive definite matrix depending on $\theta$.

If $\theta$ is one-dimensional ($k = 1$), then $V_n(\theta)$ is the asymptotic variance as well as the amse of $\hat\theta_n$ (text §2.5.2).

When $k > 1$, $V_n(\theta)$ is called the asymptotic covariance matrix of $\hat\theta_n$ and can be used as a measure of the asymptotic performance of estimators.

If $\hat\theta_{jn}$ is AN with asymptotic covariance matrix $V_{jn}(\theta)$, $j = 1, 2$, and

$$V_{1n}(\theta) \le V_{2n}(\theta)$$

(in the sense that $V_{2n}(\theta) - V_{1n}(\theta)$ is nonnegative definite) for all $\theta \in \Theta$, then $\hat\theta_{1n}$ is said to be asymptotically more efficient than $\hat\theta_{2n}$.
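
As a concrete illustration (a minimal simulation sketch, assuming i.i.d. $N(\theta, 1)$ data): the sample mean $\bar X$ is AN with $V_{1n}(\theta) = 1/n$, while the sample median is AN with $V_{2n}(\theta) = \pi/(2n)$, so the mean is asymptotically more efficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, theta = 400, 20000, 0.0

x = rng.normal(theta, 1.0, size=(reps, n))
means = x.mean(axis=1)          # AN with V_1n = 1/n
medians = np.median(x, axis=1)  # AN with V_2n = pi/(2n)

print("n * Var(mean):  ", n * means.var())    # ~ 1.0
print("n * Var(median):", n * medians.var())  # ~ pi/2 ~ 1.57
```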

14.2 Information Inequality

If $\hat\theta_n$ is AN, then it is asymptotically unbiased. If $V_n(\theta) = \operatorname{Var}(\hat\theta_n)$, then, under some regularity conditions (Theorem 3.3 in text), the following information inequality holds:

$$V_n(\theta) \ge [I_n(\theta)]^{-1},$$

where, for every $n$, $I_n(\theta)$ is the Fisher information matrix for $X$ of size $n$. The information inequality may lead to an optimal estimator.
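
For instance, if $X_1, \dots, X_n$ are i.i.d. $N(\theta, \sigma^2)$ with $\sigma^2$ known, then $I_n(\theta) = n/\sigma^2$, and $\bar X$ attains the bound, since $\operatorname{Var}(\bar X) = \sigma^2/n = [I_n(\theta)]^{-1}$.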

Unfortunately, when $V_n(\theta)$ is an asymptotic covariance matrix, the information inequality may not hold (even in the limiting sense), even if the regularity conditions are satisfied.

Example 14.2.1 (Hodges). Let $X_1, \dots, X_n$ be i.i.d. from $N(\theta, 1)$, $\theta \in \mathbb{R}$. Then $I_n(\theta) = n$. For a fixed constant $t$, define

$$\hat\theta_n = \begin{cases} \bar X, & |\bar X| \ge n^{-1/4}, \\ t\bar X, & |\bar X| < n^{-1/4}. \end{cases}$$

By Proposition 3.2, all conditions in Theorem 3.3 are satisfied. It can be shown (exercise) that $\hat\theta_n$ is AN with $V_n(\theta) = V(\theta)/n$, where $V(\theta) = 1$ if $\theta \ne 0$ and $V(\theta) = t^2$ if $\theta = 0$. If $t^2 < 1$, the information inequality does not hold when $\theta = 0$.
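
A minimal Python sketch (taking $t = 1/2$ for illustration) of this superefficiency: the scaled variance $n \operatorname{Var}(\hat\theta_n)$ is close to $1$ for $\theta \ne 0$ but close to $t^2 = 1/4$ at $\theta = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, t = 10000, 5000, 0.5

def hodges(xbar, n, t):
    # t * xbar when |xbar| < n^(-1/4), else xbar
    return np.where(np.abs(xbar) < n**-0.25, t * xbar, xbar)

for theta in (0.0, 1.0):
    # simulate the sampling distribution of the sample mean directly
    xbar = rng.normal(theta, 1.0 / np.sqrt(n), size=reps)
    est = hodges(xbar, n, t)
    print(f"theta={theta}: n*Var = {n * est.var():.3f}")  # ~0.25 at theta=0, ~1.0 at theta=1
```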

However, the following result, due to Le Cam (1953), shows that, for i.i.d. $X_i$'s, the information inequality holds except for $\theta$ in a set of Lebesgue measure 0.

Theorem 14.2.1. Let $X_1, \dots, X_n$ be i.i.d. from a p.d.f. $f_\theta$ w.r.t. a $\sigma$-finite measure $\nu$ on $(\mathbb{R}, \mathcal{B})$, where $\theta \in \Theta$ and $\Theta$ is an open set in $\mathbb{R}^k$. Suppose that for every $x$ in the range of $X_1$, $f_\theta(x)$ is twice continuously differentiable in $\theta$ and satisfies

$$\frac{\partial}{\partial\theta} \int \psi_\theta(x)\,d\nu = \int \frac{\partial}{\partial\theta}\, \psi_\theta(x)\,d\nu$$

for $\psi_\theta(x) = f_\theta(x)$ and $= \partial f_\theta(x)/\partial\theta$; the Fisher information matrix

$$I_1(\theta) = E\left\{ \frac{\partial}{\partial\theta} \log f_\theta(X_1) \left[ \frac{\partial}{\partial\theta} \log f_\theta(X_1) \right]^\top \right\}$$

is positive definite; and for any given $\theta \in \Theta$, there exist a positive number $c_\theta$ and a positive function $h_\theta$ such that $E[h_\theta(X_1)] < \infty$ and

$$\sup_{\gamma:\, \|\gamma-\theta\| < c_\theta} \left\| \frac{\partial^2 \log f_\gamma(x)}{\partial\gamma\,\partial\gamma^\top} \right\| \le h_\theta(x)$$

for all $x$ in the range of $X_1$, where $\|A\| = \sqrt{\operatorname{tr}(A^\top A)}$ for any matrix $A$. If $\hat\theta_n$ is an estimator of $\theta$ (based on $X_1, \dots, X_n$) and is AN with $V_n(\theta) = V(\theta)/n$, then there is a $\Theta_0 \subset \Theta$ with Lebesgue measure 0 such that the information inequality holds if $\theta \notin \Theta_0$.

14.3 Asymptotic efficiency

An AN estimator $\hat\theta_n$ is said to be asymptotically efficient if $V_n(\theta) = [I_n(\theta)]^{-1}$, i.e., if its asymptotic covariance matrix attains the information bound. Suppose now that the parameter of interest is $\vartheta = g(\theta)$, where $g$ is a differentiable function from $\Theta$ to $\mathbb{R}^p$, $p \le k$, and $\hat\vartheta_n = g(\hat\theta_n)$; if $\hat\theta_n$ is AN with $V_n(\theta)$, then $\hat\vartheta_n$ is AN with asymptotic covariance matrix $[\nabla g(\theta)]^\top V_n(\theta) \nabla g(\theta)$. Thus, the information inequality becomes

$$[\nabla g(\theta)]^\top V_n(\theta) \nabla g(\theta) \ge [\tilde I_n(\vartheta)]^{-1},$$

where $\tilde I_n(\vartheta)$ is the Fisher information matrix about $\vartheta$ contained in $X$. If $p = k$ and $g$ is one-to-one, then

$$[\tilde I_n(\vartheta)]^{-1} = [\nabla g(\theta)]^\top [I_n(\theta)]^{-1} \nabla g(\theta)$$

and, therefore, $\hat\vartheta_n$ is asymptotically efficient if and only if $\hat\theta_n$ is asymptotically efficient. For this reason, in the case of $p < k$, $\hat\vartheta_n$ is considered to be asymptotically efficient if and only if $\hat\theta_n$ is asymptotically efficient, and we can focus on the estimation of $\theta$ only.
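
For example (a minimal sketch; the map $g(\theta) = \theta^2$ and the $N(\theta, 1)$ model are chosen here for illustration): if $\hat\theta_n = \bar X$, then $\hat\vartheta_n = \bar X^2$ is AN with asymptotic variance $[g'(\theta)]^2 V_n(\theta) = 4\theta^2/n$ for $\theta \ne 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, theta = 1000, 20000, 2.0

xbar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
vartheta_hat = xbar**2  # estimator of g(theta) = theta^2

# delta-method asymptotic variance: [g'(theta)]^2 / n = 4*theta^2 / n
print("empirical n*Var:", n * vartheta_hat.var())  # ~ 4*theta^2 = 16
```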

14.4 Asymptotic efficiency of MLE's and RLE's in the i.i.d. case

It turns out that under some regularity conditions, a root of the likelihood equation (RLE), which is a candidate for an MLE, is asymptotically efficient.

Theorem 14.4.1. Assume the conditions of Le Cam's theorem (Theorem 14.2.1).
(i) There is a sequence of estimators $\{\hat\theta_n\}$ such that

$$P\big(s_n(\hat\theta_n) = 0\big) \to 1 \quad \text{and} \quad \hat\theta_n \to_p \theta,$$

where $s_n(\gamma) = \partial \log \ell(\gamma)/\partial\gamma$ is the score function.
(ii) Any consistent sequence $\{\tilde\theta_n\}$ of RLE's is asymptotically efficient.

Remark 14.4.1.

  • Part (i) asserts asymptotic existence and consistency.
  • If the RLE is unique, then it is consistent and asymptotically efficient, whether or not it is an MLE.
  • If there is more than one sequence of RLE's, the theorem does not tell us which one is consistent and asymptotically efficient.
  • An MLE sequence is often consistent, but this needs to be verified.
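
A minimal Python sketch of the theorem (the standard Cauchy location model is chosen here only for illustration; for it, $I_1(\theta) = 1/2$, so an asymptotically efficient estimator has $n \operatorname{Var} \to 2$, while the sample median has $n \operatorname{Var} \to \pi^2/4 \approx 2.47$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, theta = 500, 2000, 0.0

def rle_cauchy(x, steps=5):
    # Newton's method on the Cauchy location score, started from the
    # sample median (a sqrt(n)-consistent estimator); a few steps suffice.
    g = np.median(x)
    for _ in range(steps):
        r = x - g
        score = np.sum(2 * r / (1 + r**2))               # s_n(g)
        dscore = np.sum(2 * (r**2 - 1) / (1 + r**2)**2)  # grad s_n(g)
        g -= score / dscore
    return g

samples = theta + rng.standard_cauchy((reps, n))
medians = np.median(samples, axis=1)
rles = np.array([rle_cauchy(row) for row in samples])

print("n * Var(median):", n * medians.var())  # ~ pi^2/4 ~ 2.47
print("n * Var(RLE):   ", n * rles.var())     # ~ 2 = 1/I_1(theta), the efficient variance
```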

Note that

$$E\,\frac{\|\nabla s_n(\gamma^*) - \nabla s_n(\theta)\|}{n} \le E \max_{\gamma \in B_n(c)} \frac{\|\nabla s_n(\gamma) - \nabla s_n(\theta)\|}{n} \le E \max_{\gamma \in B_n(c)} \left\| \frac{\partial^2 \log f_\gamma(X_1)}{\partial\gamma\,\partial\gamma^\top} - \frac{\partial^2 \log f_\theta(X_1)}{\partial\theta\,\partial\theta^\top} \right\| \to 0,$$

which follows from (a) $\partial^2 \log f_\gamma(x)/\partial\gamma\,\partial\gamma^\top$ is continuous in a neighborhood of $\theta$ for any fixed $x$; (b) $B_n(c)$ shrinks to $\{\theta\}$; and (c) for sufficiently large $n$,

$$\max_{\gamma \in B_n(c)} \left\| \frac{\partial^2 \log f_\gamma(X_1)}{\partial\gamma\,\partial\gamma^\top} - \frac{\partial^2 \log f_\theta(X_1)}{\partial\theta\,\partial\theta^\top} \right\| \le 2 h_\theta(X_1)$$

under the regularity condition, so that the dominated convergence theorem applies. By the SLLN (text Theorem 1.13) and Proposition 3.1, $n^{-1} \nabla s_n(\theta) \to_{a.s.} -I_1(\theta)$ (i.e., $\|n^{-1} \nabla s_n(\theta) + I_1(\theta)\| \to_{a.s.} 0$).

These results, together with (14.2), imply that

$$\log \ell(\gamma) - \log \ell(\theta) = c\lambda^\top [I_n(\theta)]^{-1/2} s_n(\theta) - [1 + o_p(1)]\,c^2/2. \tag{14.4}$$

Note that $\max_\lambda \{\lambda^\top [I_n(\theta)]^{-1/2} s_n(\theta)\} = \|[I_n(\theta)]^{-1/2} s_n(\theta)\|$. Hence, (14.1) follows from (14.4) and

$$P\big( \|[I_n(\theta)]^{-1/2} s_n(\theta)\| < c/4 \big) \ge 1 - (4/c)^2\, E\|[I_n(\theta)]^{-1/2} s_n(\theta)\|^2 = 1 - k(4/c)^2 = 1 - \epsilon,$$

where the first inequality is Chebyshev's and $\epsilon = k(4/c)^2$ can be made arbitrarily small by taking $c$ large.

This completes the proof of (i).

(ii) Let $A_\epsilon = \{\gamma : \|\gamma - \theta\| \le \epsilon\}$ for $\epsilon > 0$. Since $\Theta$ is open, $A_\epsilon \subset \Theta$ for sufficiently small $\epsilon$. Let $\{\tilde\theta_n\}$ be a sequence of consistent RLE's, i.e., $P(s_n(\tilde\theta_n) = 0 \text{ and } \tilde\theta_n \in A_\epsilon) \to 1$ for any $\epsilon > 0$. Hence, we can focus on the set on which $s_n(\tilde\theta_n) = 0$ and $\tilde\theta_n \in A_\epsilon$. Using the mean-value theorem for vector-valued functions, we obtain

$$-s_n(\theta) = \left[ \int_0^1 \nabla s_n\big(\theta + t(\tilde\theta_n - \theta)\big)\,dt \right] (\tilde\theta_n - \theta).$$
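
As a quick numerical check of this identity in the scalar case $k = 1$ (a minimal sketch; the standard Cauchy location score is used here purely for illustration), the integral of $\nabla s_n$ along the segment from $\theta$ to $\tilde\theta_n$ reproduces the score increment $s_n(\tilde\theta_n) - s_n(\theta)$ up to quadrature error; when $s_n(\tilde\theta_n) = 0$, the left side reduces to $-s_n(\theta)$:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_cauchy(200)  # i.i.d. standard Cauchy, location theta = 0

def s_n(g):
    r = x - g
    return np.sum(2 * r / (1 + r**2))              # score at g

def grad_s_n(g):
    r = x - g
    return np.sum(2 * (r**2 - 1) / (1 + r**2)**2)  # derivative of the score at g

theta, theta_tilde = 0.0, 0.3
t = np.linspace(0.0, 1.0, 10001)
vals = np.array([grad_s_n(theta + ti * (theta_tilde - theta)) for ti in t])
integral = ((vals[:-1] + vals[1:]) / 2 * np.diff(t)).sum()  # trapezoidal rule

print(s_n(theta_tilde) - s_n(theta))     # left-hand side of the mean-value identity
print(integral * (theta_tilde - theta))  # right-hand side; agrees to quadrature error
```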