Unbiased Estimation and the Cramér-Rao Inequality

An introduction to unbiased estimation and the Cramér-Rao inequality. It covers concepts such as bias, mean square error, consistent estimators, and the Fisher information. The document also includes examples for estimating the mean and variance of a distribution, as well as the efficiency of estimators.


Topic 14

Unbiased Estimation

14.1 Introduction

In creating a parameter estimator, a fundamental question is whether or not the estimator differs from the parameter in a systematic manner. Let's examine this by looking at the computation of the mean and the variance of 16 flips of a fair coin. Give this task to 10 individuals and ask them to report the number of heads. We can simulate this in R as follows:

> (x<-rbinom(10,16,0.5))
[1]  8  5  9  7  7  9  7  8  8 10

Our estimate is obtained by taking these 10 answers and averaging them. Intuitively we anticipate an answer around 8. For these 10 observations, we find, in this case, that

> sum(x)/10
[1] 7.8

The result is a bit below 8. Is this systematic? To assess this, we appeal to the ideas behind Monte Carlo to perform 1000 simulations of the example above.

> meanx<-rep(0,1000)
> for (i in 1:1000){meanx[i]<-mean(rbinom(10,16,0.5))}
> mean(meanx)
[1] 8.0049

From this, we surmise that the estimate of the sample mean x̄ neither systematically overestimates nor underestimates the distributional mean. From our knowledge of the binomial distribution, we know that the mean μ = np = 16 · 0.5 = 8. In addition, the sample mean X̄ also has mean

E\bar{X} = \frac{1}{10}(8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8) = \frac{80}{10} = 8,

verifying that we have no systematic error. The phrase that we use is that the sample mean X̄ is an unbiased estimator of the distributional mean μ. Here is the precise definition.

Definition 14.1. For observations X = (X_1, X_2, \ldots, X_n) based on a distribution having parameter value θ, and for d(X) an estimator for h(θ), the bias is the mean of the difference d(X) − h(θ), i.e.,

b_d(\theta) = E_\theta d(X) - h(\theta).   (14.1)

If b_d(θ) = 0 for all values of the parameter, then d(X) is called an unbiased estimator. Any estimator that is not unbiased is called biased.

Example 14.2. Let X_1, X_2, \ldots, X_n be Bernoulli trials with success parameter p and set the estimator for p to be d(X) = X̄, the sample mean. Then,

E_p\bar{X} = \frac{1}{n}(EX_1 + EX_2 + \cdots + EX_n) = \frac{1}{n}(p + p + \cdots + p) = p.

Thus, X̄ is an unbiased estimator for p. In this circumstance, we generally write p̂ instead of X̄. In addition, we can use the fact that, for independent random variables, the variance of the sum is the sum of the variances to see that

\mathrm{Var}(\hat p) = \frac{1}{n^2}(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)) = \frac{1}{n^2}(p(1-p) + p(1-p) + \cdots + p(1-p)) = \frac{1}{n}p(1-p).
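As a quick numerical check of these two formulas, we can simulate many samples of Bernoulli trials and compare the average and the variance of p̂ with p and p(1−p)/n. The values p = 0.3 and n = 50 in the sketch below are arbitrary choices made only for illustration.

p <- 0.3; n <- 50
phat <- replicate(10000, mean(rbinom(n, 1, p)))   # phat for 10000 samples of size n
mean(phat)                                        # should be close to p = 0.3
var(phat)                                         # should be close to p*(1-p)/n
p*(1-p)/n                                         # = 0.0042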

Example 14.3. If X_1, \ldots, X_n form a simple random sample with unknown finite mean μ, then X̄ is an unbiased estimator of μ. If the X_i have variance σ², then

\mathrm{Var}(\bar X) = \frac{\sigma^2}{n}.   (14.2)

We can assess the quality of an estimator by computing its mean square error, defined by

E_\theta[(d(X) - h(\theta))^2].   (14.3)

Estimators with smaller mean square error are generally preferred to those with larger. Next we derive a simple relationship between mean square error and variance. We begin by substituting (14.1) into (14.3), rearranging terms, and expanding the square.

E_\theta[(d(X) - h(\theta))^2] = E_\theta[(d(X) - (E_\theta d(X) - b_d(\theta)))^2]
  = E_\theta[((d(X) - E_\theta d(X)) + b_d(\theta))^2]
  = E_\theta[(d(X) - E_\theta d(X))^2] + 2b_d(\theta)E_\theta[d(X) - E_\theta d(X)] + b_d(\theta)^2
  = \mathrm{Var}_\theta(d(X)) + b_d(\theta)^2.

Thus, the mean square error is equal to the variance of the estimator plus the square of the bias; this representation is called the bias-variance decomposition (a short simulation after the list illustrates it). In particular:

  • The mean square error for an unbiased estimator is its variance.
  • Bias always increases the mean square error.
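To illustrate the decomposition, we might estimate the mean μ = 8 of the 16-flip binomial with both the sample mean X̄ and a deliberately biased alternative d(X) = 0.9X̄ (the shrinkage factor 0.9 is an arbitrary choice made only for this sketch), and check that in each case the simulated mean square error is close to the variance plus the squared bias.

mu <- 8                                   # true mean of Binomial(16, 0.5)
xbar <- replicate(10000, mean(rbinom(10, 16, 0.5)))
d <- 0.9*xbar                             # a deliberately biased estimator
mean((xbar - mu)^2); var(xbar)            # MSE of xbar is (about) its variance
mean((d - mu)^2)                          # MSE of the biased estimator ...
var(d) + (mean(d) - mu)^2                 # ... matches variance + bias^2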

14.2 Computing Bias

For the variance σ², we have been presented with two choices:

\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2 \quad\text{and}\quad \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2.   (14.4)

Using bias as our criterion, we can now resolve between the two choices of estimator for the variance σ². Again, we use simulations to make a conjecture; we then follow up with a computation to verify our guess. For 16 tosses of a fair coin, we know that the variance is np(1−p) = 16 · (1/2) · (1/2) = 4. For the example above, we begin by simulating the coin tosses and computing the sum of squares

\sum_{i=1}^{10}(x_i - \bar x)^2.

> ssx<-rep(0,1000)
> for (i in 1:1000){x<-rbinom(10,16,0.5);ssx[i]<-sum((x-mean(x))^2)}
> mean(ssx)
[1] 35.

Write S² = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2, the first of the two choices in (14.4). Using the identity
\sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar X - \mu)^2
and the linearity property of expectation, we find that

ES^2 = E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\right] = E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar X - \mu)^2\right]
  = \frac{1}{n}\sum_{i=1}^n E[(X_i - \mu)^2] - E[(\bar X - \mu)^2]
  = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(X_i) - \mathrm{Var}(\bar X)
  = \frac{1}{n}n\sigma^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2.
The last line uses (14.2). This shows that S² is a biased estimator for σ². Using the definition in (14.1), we can see that it is biased downwards:

b(\sigma^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2.
Note that the bias is equal to −Var(X̄). In addition, because

E\left[\frac{n}{n-1}S^2\right] = \frac{n}{n-1}E[S^2] = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2 = \sigma^2,

the estimator

S_u^2 = \frac{n}{n-1}S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2

is an unbiased estimator for σ². As we shall learn in the next section, because the square root is concave downward, S_u = \sqrt{S_u^2} as an estimator for σ is downwardly biased.
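Returning to the coin-flip simulation, we can compare the two choices in (14.4) directly: dividing the sum of squares by n should average near (n−1)σ²/n = 3.6, while dividing by n−1 should average near σ² = 4. One way to carry out this check in R:

ssx <- replicate(10000, {x <- rbinom(10, 16, 0.5); sum((x - mean(x))^2)})
mean(ssx/10)    # biased estimator S^2: close to (9/10)*4 = 3.6
mean(ssx/9)     # unbiased estimator S_u^2: close to 4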

Example 14.6. We have seen, in the case of n Bernoulli trials having x successes, that p̂ = x/n is an unbiased estimator for the parameter p. This is the case, for example, in taking a simple random sample of genetic markers at a particular biallelic locus. Let one allele denote the wildtype and the second a variant. If the variant is recessive, then an individual expresses the variant phenotype only in the case that both chromosomes contain this marker. In the case of independent alleles from each parent, the probability of the variant phenotype is p². Naïvely, we could use the estimator p̂². (Later, we will see that this is the maximum likelihood estimator.) To determine the bias of this estimator, note that

E\hat p^2 = (E\hat p)^2 + \mathrm{Var}(\hat p) = p^2 + \frac{1}{n}p(1-p).   (14.5)

Thus, the bias b(p) = p(1−p)/n and the estimator p̂² is biased upward.

Exercise 14.7. For Bernoulli trials X_1, \ldots, X_n,

\frac{1}{n}\sum_{i=1}^n (X_i - \hat p)^2 = \hat p(1 - \hat p).

Based on this exercise, and the computation above yielding an unbiased estimator, S_u², for the variance,

E\left[\frac{1}{n-1}\hat p(1-\hat p)\right] = \frac{1}{n}E\left[\frac{1}{n-1}\sum_{i=1}^n (X_i - \hat p)^2\right] = \frac{1}{n}E[S_u^2] = \frac{1}{n}\mathrm{Var}(X_1) = \frac{1}{n}p(1-p).

In other words,
\frac{1}{n-1}\hat p(1-\hat p)
is an unbiased estimator of p(1−p)/n. Returning to (14.5),

E\left[\hat p^2 - \frac{1}{n-1}\hat p(1-\hat p)\right] = p^2 + \frac{1}{n}p(1-p) - \frac{1}{n}p(1-p) = p^2.

Thus,
\widehat{p^2}_u = \hat p^2 - \frac{1}{n-1}\hat p(1-\hat p)

is an unbiased estimator of p². To compare the two estimators for p², assume that we find 13 variant alleles in a sample of 30. Then p̂ = 13/30 = 0.4333, p̂² = 0.1878, and \widehat{p^2}_u = 0.1878 − 0.0085 = 0.1793.

The bias for the estimate p̂², in this case 0.0085, is subtracted to give the unbiased estimate \widehat{p^2}_u.

The heterozygosity of a biallelic locus is h = 2p(1−p). From the discussion above, we see that h has the unbiased estimator
\hat h = \frac{2n}{n-1}\hat p(1-\hat p) = \frac{2n}{n-1}\left(\frac{x}{n}\right)\left(\frac{n-x}{n}\right) = \frac{2x(n-x)}{n(n-1)}.
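These quantities are easy to compute directly; a brief R sketch for the numbers in this example (x = 13 variant alleles out of n = 30):

x <- 13; n <- 30
phat <- x/n                        # 0.4333
phat^2                             # naive (upward biased) estimate of p^2
phat^2 - phat*(1-phat)/(n-1)       # unbiased estimate of p^2
2*x*(n-x)/(n*(n-1))                # unbiased estimate of the heterozygosity h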

14.3 Compensating for Bias

In the method of moments estimation, we have used g(X̄) as an estimator for g(μ). If g is a convex function, we can say something about the bias of this estimator. In Figure 14.2, we see the method of moments estimator g(X̄) for the parameter β in the Pareto distribution. The choice of β = 3 corresponds to a mean of μ = 3/2 for the Pareto random variables. The central limit theorem states that the sample mean X̄ is nearly normally distributed with mean 3/2. Thus, the distribution of X̄ is nearly symmetric around 3/2. From the figure, we can see that the interval from 1.4 to 1.5 under the function g maps into a longer interval above β = 3 than the interval from 1.5 to 1.6 maps below β = 3. Thus, the function g spreads the values of X̄ above β = 3 more than below. Consequently, we anticipate that the estimator β̂ will be upwardly biased.

To address this phenomenon in more general terms, we use the characterization of a convex function as a differentiable function whose graph lies above any tangent line. If we look at the value μ for the convex function g, then this statement becomes
g(x) - g(\mu) \ge g'(\mu)(x - \mu).

Now replace x with the random variable X̄ and take expectations.

E_\mu[g(\bar X) - g(\mu)] \ge E_\mu[g'(\mu)(\bar X - \mu)] = g'(\mu)E_\mu[\bar X - \mu] = 0.

Consequently,
E_\mu g(\bar X) \ge g(\mu)   (14.6)

and g(X̄) is biased upwards. The expression in (14.6) is known as Jensen's inequality.

Exercise 14.8. Show that the estimator S_u is a downwardly biased estimator for σ.

To estimate the size of the bias, we look at a quadratic approximation for g centered at the value μ:
g(x) - g(\mu) \approx g'(\mu)(x-\mu) + \frac{1}{2}g''(\mu)(x-\mu)^2.

Thus, the bias
b_g(\beta) \approx \frac{1}{2}g''(\mu)\frac{\sigma^2}{n} = \frac{1}{2}g''(\mu)\cdot\frac{\beta}{n(\beta-1)^2(\beta-2)} = \frac{\beta(\beta-1)}{n(\beta-2)},
where, for the Pareto case, g(μ) = μ/(μ−1) and so g''(μ) = 2/(μ−1)³ = 2(β−1)³.

So, for β = 3 and n = 100, the bias is approximately 0.06. Compare this to the estimated value of 0.053 from the simulation in the previous section.
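One way to reproduce a simulation of this kind: since the Pareto density here is β/x^{β+1} for x > 1, the inverse transform X = U^{−1/β} with U uniform on (0,1) generates samples, and since μ = β/(β−1), the method of moments estimate is β̂ = x̄/(x̄ − 1). The sketch below uses β = 3 and n = 100 as above; the use of runif for the inverse transform is simply one convenient implementation.

beta <- 3; n <- 100
betahat <- replicate(10000, {
  u <- runif(n)
  x <- u^(-1/beta)            # Pareto(beta) sample via the inverse transform
  mean(x)/(mean(x) - 1)       # method of moments estimate g(xbar) = xbar/(xbar - 1)
})
mean(betahat) - beta          # estimated bias; should come out roughly 0.05 to 0.06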

Example 14.11. For estimating the population in mark and recapture, we used the estimate
N = g(\mu) = \frac{kt}{\mu}

for the total population. Here μ is the mean number recaptured, k is the number captured in the second capture event and t is the number tagged. The second derivative

g''(\mu) = \frac{2kt}{\mu^3}

and hence the method of moments estimate is biased upwards. In this situation, n = 1 and the number recaptured is a hypergeometric random variable. Hence its variance is

\sigma^2 = \frac{kt}{N}\cdot\frac{(N-t)(N-k)}{N(N-1)}.

Thus, the bias
b_g(N) = \frac{1}{2}\cdot\frac{2kt}{\mu^3}\cdot\frac{kt}{N}\cdot\frac{(N-t)(N-k)}{N(N-1)} = \frac{(N-t)(N-k)}{\mu(N-1)} = \frac{(kt/\mu - t)(kt/\mu - k)}{\mu(kt/\mu - 1)} = \frac{kt(k-\mu)(t-\mu)}{\mu^2(kt-\mu)}.

In the simulation example, N = 2000, t = 200, k = 400 and μ = 40. This gives an estimate for the bias of 36.02. We can compare this to the bias of 2031.03 − 2000 = 31.03 based on the simulation in Example 13.2. This suggests a new estimator, obtained by taking the method of moments estimator and subtracting the approximation of the bias:

\hat N = \frac{kt}{r} - \frac{kt(k-r)(t-r)}{r^2(kt-r)} = \frac{kt}{r}\left(1 - \frac{(k-r)(t-r)}{r(kt-r)}\right),
where r denotes the observed number recaptured.
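A simulation along the lines of Example 13.2 can be used to compare the raw estimator kt/r with the bias-corrected version. The sketch below draws the number recaptured from the hypergeometric distribution with N = 2000, t = 200, and k = 400; rhyper(nn, m, n, k) is R's standard hypergeometric sampler.

N <- 2000; t <- 200; k <- 400
r <- rhyper(10000, t, N - t, k)          # number recaptured in each of 10000 surveys
Nhat <- k*t/r                            # method of moments estimator
Nhat.bc <- (k*t/r)*(1 - (k - r)*(t - r)/(r*(k*t - r)))   # bias-corrected estimator
mean(Nhat) - N                           # should be near the bias of roughly 31-36 seen above
mean(Nhat.bc) - N                        # should be noticeably smaller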

The delta method gives us that the standard deviation of the estimator is |g'(\mu)|\sigma/\sqrt{n}. Thus the ratio of the bias of an estimator to its standard deviation as determined by the delta method is approximately
\frac{g''(\mu)\sigma^2/(2n)}{|g'(\mu)|\sigma/\sqrt{n}} = \frac{\sigma}{2}\cdot\frac{g''(\mu)}{|g'(\mu)|}\cdot\frac{1}{\sqrt{n}}.

If this ratio is ≪ 1, then the bias correction is not very important. In the case of the example above, this ratio is approximately 36.02/268.40 ≈ 0.13, and its usefulness in correcting bias is small.

14.4 Consistency

Despite the desirability of using an unbiased estimator, sometimes such an estimator is hard to find and at other times impossible. However, note that in the examples above both the size of the bias and the variance in the estimator decrease inversely proportional to n, the number of observations. Thus, these estimators improve, under both of these criteria, with more observations. A concept that describes properties such as these is called consistency.

Definition 14.12. Given data X_1, X_2, \ldots and a real valued function h of the parameter space, a sequence of estimators d_n, based on the first n observations, is called consistent if for every choice of θ
\lim_{n\to\infty} d_n(X_1, X_2, \ldots, X_n) = h(\theta)
whenever θ is the true state of nature.

Thus, the bias of the estimator disappears in the limit of a large number of observations. In addition, the distribution of the estimators d_n(X_1, X_2, \ldots, X_n) becomes more and more concentrated near h(θ).

For the next example, we need to recall the sequence definition of continuity: A function g is continuous at a real number x provided that for every sequence {x_n; n ≥ 1} with
x_n → x, we have that g(x_n) → g(x).
A function is called continuous if it is continuous at every value of x in the domain of g. Thus, we can write the expression above more succinctly by saying that for every convergent sequence {x_n; n ≥ 1},
\lim_{n\to\infty} g(x_n) = g\left(\lim_{n\to\infty} x_n\right).

Example 14.13. For a method of moments estimator, let's focus on the case of a single parameter (d = 1). For independent observations, X_1, X_2, \ldots, having mean μ = k(θ), we have that
E\bar X_n = \mu,

i.e., X̄_n, the sample mean for the first n observations, is an unbiased estimator for μ = k(θ). Also, by the law of large numbers, we have that
\lim_{n\to\infty}\bar X_n = \mu.

Assume that k has a continuous inverse g = k^{-1}. In particular, because μ = k(θ), we have that g(μ) = θ. Next, using the method of moments procedure, define, for n observations, the estimators
\hat\theta_n(X_1, X_2, \ldots, X_n) = g\left(\frac{1}{n}(X_1 + \cdots + X_n)\right) = g(\bar X_n)

for the parameter θ. Using the continuity of g, we find that
\lim_{n\to\infty}\hat\theta_n(X_1, X_2, \ldots, X_n) = \lim_{n\to\infty} g(\bar X_n) = g\left(\lim_{n\to\infty}\bar X_n\right) = g(\mu) = \theta,

and so we have that g(X̄_n) is a consistent sequence of estimators for θ.
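As an illustration, consider exponential observations with rate θ, so that μ = k(θ) = 1/θ, g(μ) = 1/μ, and the method of moments estimator is θ̂_n = 1/X̄_n. A short sketch (the choice θ = 2 is arbitrary) shows the estimates settling down toward the true value as n grows.

theta <- 2                       # true rate; mu = k(theta) = 1/theta
for (n in c(10, 100, 1000, 10000)) {
  x <- rexp(n, rate = theta)
  cat(n, 1/mean(x), "\n")        # thetahat_n = g(xbar_n) = 1/xbar_n, approaching 2
}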

14.5 Cramér-Rao Bound

This topic is somewhat more advanced and can be skipped on a first reading. This section gives us an introduction to the log-likelihood and its derivative, the score function. We shall encounter these functions again when we introduce maximum likelihood estimation. In addition, the Cramér-Rao bound, which is based on the variance of the score function, known as the Fisher information, gives a lower bound for the variance of an unbiased estimator. These concepts will be necessary to describe the variance for maximum likelihood estimators.

Among unbiased estimators, one important goal is to find an estimator that has as small a variance as possible. A more precise goal would be to find an unbiased estimator d that has uniform minimum variance. In other words, d(X) has a smaller variance than any other unbiased estimator d̃ for every value θ of the parameter.

Now, return to the review on correlation with Y = d(X), the unbiased estimator for h(θ), and the score function Z = ∂ ln f(X|θ)/∂θ. From equations (14.14) and then (14.9), we find that

h'(\theta)^2 = E_\theta\left[d(X)\frac{\partial\ln f(X|\theta)}{\partial\theta}\right]^2 = \mathrm{Cov}_\theta\left(d(X), \frac{\partial\ln f(X|\theta)}{\partial\theta}\right)^2 \le \mathrm{Var}_\theta(d(X))\,\mathrm{Var}_\theta\left(\frac{\partial\ln f(X|\theta)}{\partial\theta}\right),

or,
\mathrm{Var}_\theta(d(X)) \ge \frac{h'(\theta)^2}{I(\theta)},   (14.15)

where
I(\theta) = \mathrm{Var}_\theta\left(\frac{\partial\ln f(X|\theta)}{\partial\theta}\right) = E_\theta\left[\left(\frac{\partial\ln f(X|\theta)}{\partial\theta}\right)^2\right]

is called the Fisher information. For the equality, recall that the variance Var(Z) = EZ² − (EZ)² and recall from equation (14.13) that the random variable Z = ∂ ln f(X|θ)/∂θ has mean EZ = 0. Equation (14.15), called the Cramér-Rao lower bound or the information inequality, states that the lower bound for the variance of an unbiased estimator is the reciprocal of the Fisher information. In other words, the higher the information, the lower is the possible value of the variance of an unbiased estimator.

If we return to the case of a simple random sample, then take the logarithm of both sides of equation (14.10)

\ln f(x|\theta) = \ln f(x_1|\theta) + \cdots + \ln f(x_n|\theta)

and then differentiate with respect to the parameter θ,
\frac{\partial\ln f(x|\theta)}{\partial\theta} = \frac{\partial\ln f(x_1|\theta)}{\partial\theta} + \cdots + \frac{\partial\ln f(x_n|\theta)}{\partial\theta}.

The random variables {∂ ln f(X_k|θ)/∂θ; 1 ≤ k ≤ n} are independent and have the same distribution. Using the fact that the variance of the sum is the sum of the variances for independent random variables, we see that I_n, the Fisher information for n observations, is n times the Fisher information of a single observation:

I_n(\theta) = \mathrm{Var}\left(\frac{\partial\ln f(X_1|\theta)}{\partial\theta} + \cdots + \frac{\partial\ln f(X_n|\theta)}{\partial\theta}\right) = n\,\mathrm{Var}\left(\frac{\partial\ln f(X_1|\theta)}{\partial\theta}\right) = nE\left[\left(\frac{\partial\ln f(X_1|\theta)}{\partial\theta}\right)^2\right].

Notice the correspondence. Information is linearly proportional to the number of observations. If our estimator is a sample mean or a function of the sample mean, then the variance is inversely proportional to the number of observations.

Example 14.15. For independent Bernoulli random variables with unknown success probability θ, the density is
f(x|\theta) = \theta^x(1-\theta)^{1-x}.

The mean is θ and the variance is θ(1−θ). Taking logarithms, we find that

\ln f(x|\theta) = x\ln\theta + (1-x)\ln(1-\theta),

\frac{\partial}{\partial\theta}\ln f(x|\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x-\theta}{\theta(1-\theta)}.

The Fisher information associated to a single observation

I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\ln f(X|\theta)\right)^2\right] = \frac{1}{\theta^2(1-\theta)^2}E[(X-\theta)^2] = \frac{1}{\theta^2(1-\theta)^2}\mathrm{Var}(X) = \frac{1}{\theta(1-\theta)}.

Thus, the information for n observations is I_n(θ) = n/(θ(1−θ)). Thus, by the Cramér-Rao lower bound, any unbiased estimator of θ based on n observations must have variance at least θ(1−θ)/n. Now, notice that if we take d(x) = x̄, then

E_\theta\bar X = \theta, \quad\text{and}\quad \mathrm{Var}_\theta\, d(X) = \mathrm{Var}(\bar X) = \frac{\theta(1-\theta)}{n}.

These two equations show that X̄ is an unbiased estimator having uniformly minimum variance.
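A small simulation can make this concrete: the sample variance of the score (X − θ)/(θ(1−θ)) over many single observations should be close to I(θ) = 1/(θ(1−θ)), and the variance of p̂ over many samples of size n should sit at the bound θ(1−θ)/n. The values θ = 0.3 and n = 25 below are arbitrary.

theta <- 0.3; n <- 25
score <- (rbinom(100000, 1, theta) - theta)/(theta*(1 - theta))
var(score); 1/(theta*(1 - theta))            # Fisher information of one observation
phat <- replicate(10000, mean(rbinom(n, 1, theta)))
var(phat); theta*(1 - theta)/n               # variance of phat attains the bound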

Exercise 14.16. For independent normal random variables with known variance σ₀² and unknown mean μ, X̄ is a uniformly minimum variance unbiased estimator.

Exercise 14.17. Take two derivatives of ln f(x|θ) to show that
I(\theta) = E_\theta\left[\left(\frac{\partial\ln f(X|\theta)}{\partial\theta}\right)^2\right] = -E_\theta\left[\frac{\partial^2\ln f(X|\theta)}{\partial\theta^2}\right].   (14.16)

This identity is often a useful alternative to compute the Fisher Information.

Example 14.18. For an exponential random variable,
\ln f(x|\lambda) = \ln\lambda - \lambda x, \qquad \frac{\partial^2\ln f(x|\lambda)}{\partial\lambda^2} = -\frac{1}{\lambda^2}.

Thus, by (14.16),
I(\lambda) = \frac{1}{\lambda^2}.

Now, X̄ is an unbiased estimator for h(λ) = 1/λ with variance
\frac{1}{n\lambda^2}.

By the Cramér-Rao lower bound, we have that
\frac{h'(\lambda)^2}{nI(\lambda)} = \frac{1/\lambda^4}{n/\lambda^2} = \frac{1}{n\lambda^2}.

Because X̄ has this variance, it is a uniformly minimum variance unbiased estimator.

Example 14.19. To give an estimator that does not achieve the Cramér-Rao bound, let X_1, X_2, \ldots, X_n be a simple random sample of Pareto random variables with density
f_X(x|\beta) = \frac{\beta}{x^{\beta+1}}, \qquad x > 1.

The mean and the variance are
\mu = \frac{\beta}{\beta-1}, \qquad \sigma^2 = \frac{\beta}{(\beta-1)^2(\beta-2)}.
Thus, X̄ is an unbiased estimator of μ = β/(β−1) with
\mathrm{Var}(\bar X) = \frac{\beta}{n(\beta-1)^2(\beta-2)}.

To compute the Fisher information, note that
\ln f(x|\beta) = \ln\beta - (\beta+1)\ln x \quad\text{and thus}\quad \frac{\partial^2\ln f(x|\beta)}{\partial\beta^2} = -\frac{1}{\beta^2}.

Thus, Poisson random variables are an exponential family with c(λ) = exp(−λ), h(x) = 1/x!, and natural parameter π(λ) = ln λ. Because λ = E_λ X̄, X̄ is an unbiased estimator of the parameter λ. The score function is
\frac{\partial}{\partial\lambda}\ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(x\ln\lambda - \ln x! - \lambda) = \frac{x}{\lambda} - 1.

The Fisher information for one observation is
I(\lambda) = E_\lambda\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \frac{1}{\lambda^2}E_\lambda[(X-\lambda)^2] = \frac{1}{\lambda}.

Thus, I_n(λ) = n/λ is the Fisher information for n observations. In addition,
\mathrm{Var}_\lambda(\bar X) = \frac{\lambda}{n},
and d(x) = x̄ has efficiency
\frac{\mathrm{Var}(\bar X)^{-1}}{I_n(\lambda)} = 1.

This could have been predicted. The density of n independent observations is

f(x|\lambda) = \frac{e^{-\lambda}\lambda^{x_1}}{x_1!}\cdots\frac{e^{-\lambda}\lambda^{x_n}}{x_n!} = \frac{e^{-n\lambda}\lambda^{x_1+\cdots+x_n}}{x_1!\cdots x_n!} = \frac{e^{-n\lambda}\lambda^{n\bar x}}{x_1!\cdots x_n!}

and so the score function is
\frac{\partial}{\partial\lambda}\ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(-n\lambda + n\bar x\ln\lambda - \ln(x_1!\cdots x_n!)) = -n + \frac{n\bar x}{\lambda},

showing that the estimate x¯ and the score function are linearly related.
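As a quick numerical confirmation that X̄ attains the information bound in the Poisson model, the simulated variance of X̄ should be close to λ/n = 1/I_n(λ). The values λ = 4 and n = 20 below are arbitrary choices.

lambda <- 4; n <- 20
xbar <- replicate(10000, mean(rpois(n, lambda)))
var(xbar)              # simulated variance of the sample mean
lambda/n               # Cramer-Rao bound 1/I_n(lambda) = 0.2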

Exercise 14.21. Show that a Bernoulli random variable with parameter p is an exponential family.

Exercise 14.22. Show that a normal random variable with known variance σ₀² and unknown mean μ is an exponential family.

14.7 Answers to Selected Exercises

14.4. Repeat the simulation, replacing mean(x) by 8.

> ssx<-rep(0,1000)
> for (i in 1:1000){x<-rbinom(10,16,0.5);ssx[i]<-sum((x-8)^2)}
> mean(ssx)/10;mean(ssx)/9
[1] 3.
[1] 4.

Note that division by 10 gives an answer very close to the correct value of 4. To verify that the estimator is unbiased, we write

E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2\right] = \frac{1}{n}\sum_{i=1}^n E[(X_i - \mu)^2] = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(X_i) = \frac{1}{n}n\sigma^2 = \sigma^2.

14.7. For a Bernoulli trial, note that X_i² = X_i. Expand the square to obtain

\sum_{i=1}^n (X_i - \hat p)^2 = \sum_{i=1}^n X_i^2 - 2\hat p\sum_{i=1}^n X_i + n\hat p^2 = n\hat p - 2n\hat p^2 + n\hat p^2 = n(\hat p - \hat p^2) = n\hat p(1-\hat p).

Divide by n to obtain the result.
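This identity is easy to check numerically for any vector of 0s and 1s; the sample below is arbitrary.

x <- rbinom(30, 1, 0.4)          # any vector of 0s and 1s will do
phat <- mean(x)
mean((x - phat)^2)               # left-hand side of Exercise 14.7
phat*(1 - phat)                  # right-hand side; the two agree exactly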

14.8. Recall that ES_u² = σ². Check the second derivative to see that g(t) = √t is concave down for all t. For concave down functions, the direction of the inequality in Jensen's inequality is reversed. Setting t = S_u², we have that
ES_u = Eg(S_u^2) \le g(ES_u^2) = g(\sigma^2) = \sigma,
and S_u is a downwardly biased estimator of σ.

14.9. Set g(p) = p². Then, g''(p) = 2. Recall that the variance of a Bernoulli random variable is σ² = p(1−p), and the bias
b_g(p) \approx \frac{1}{2}g''(p)\frac{\sigma^2}{n} = \frac{1}{2}\cdot 2\cdot\frac{p(1-p)}{n} = \frac{p(1-p)}{n}.

14.14. Cov(Y, Z) = EYZ − EY · EZ = EYZ whenever EZ = 0.

14.16. For independent normal random variables with known variance σ₀² and unknown mean μ, the density is
f(x|\mu) = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma_0^2}\right),
so
\ln f(x|\mu) = -\ln(\sigma_0\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma_0^2}.
Thus, the score function is
\frac{\partial}{\partial\mu}\ln f(x|\mu) = \frac{1}{\sigma_0^2}(x-\mu),

and the Fisher information associated to a single observation is
I(\mu) = E\left[\left(\frac{\partial}{\partial\mu}\ln f(X|\mu)\right)^2\right] = \frac{1}{\sigma_0^4}E[(X-\mu)^2] = \frac{1}{\sigma_0^4}\mathrm{Var}(X) = \frac{1}{\sigma_0^2}.

Again, the information is the reciprocal of the variance. Thus, by the Cramér-Rao lower bound, any unbiased estimator based on n observations must have variance at least σ₀²/n. However, if we take d(x) = x̄, then
\mathrm{Var}_\mu\, d(X) = \frac{\sigma_0^2}{n},
and x̄ is a uniformly minimum variance unbiased estimator.

14.17. First, we take two derivatives of ln f(x|θ):
\frac{\partial\ln f(x|\theta)}{\partial\theta} = \frac{\partial f(x|\theta)/\partial\theta}{f(x|\theta)}
and
\frac{\partial^2\ln f(x|\theta)}{\partial\theta^2} = \frac{\partial^2 f(x|\theta)/\partial\theta^2}{f(x|\theta)} - \frac{(\partial f(x|\theta)/\partial\theta)^2}{f(x|\theta)^2} = \frac{\partial^2 f(x|\theta)/\partial\theta^2}{f(x|\theta)} - \left(\frac{\partial f(x|\theta)/\partial\theta}{f(x|\theta)}\right)^2 = \frac{\partial^2 f(x|\theta)/\partial\theta^2}{f(x|\theta)} - \left(\frac{\partial\ln f(x|\theta)}{\partial\theta}\right)^2