





Advanced Statistical Inference — study notes (Prof. Rempala, Medical College of Georgia, Spring 2009)
Let X be a sample from a population P ∈ P. A statistical decision is an action we take after observing X concerning (i.e., a conclusion about) P. Let A denote the set of allowable actions and let F_A be a σ-field on A. Then the measurable space (A, F_A) is called the action space. Let 𝒳 be the range of X and F_𝒳 be a σ-field on 𝒳. A decision rule is a measurable function (a statistic) T from (𝒳, F_𝒳) to (A, F_A). Typically a decision rule is evaluated through a loss function L, where L is a function from P × A to [0, ∞) such that L(P, ·) is Borel measurable for each fixed P. The value of the loss is L(P, T(x)) if X = x. The average loss for the decision rule T,

R_T(P) = E[L(P, T(X))],

is called the risk. If P is a parametric family indexed by θ ∈ Θ, the loss and risk are denoted by L(θ, a) and R_T(θ). A rule T_1 is as good as T_2 if and only if R_{T_1}(P) ≤ R_{T_2}(P) for any P ∈ P, and is better than T_2 if, in addition, R_{T_1}(P) < R_{T_2}(P) for at least one P ∈ P. Two decision rules T_1 and T_2 are equivalent if and only if R_{T_1}(P) = R_{T_2}(P) for all P ∈ P. It is also possible to consider randomized decision rules, i.e., a function δ on 𝒳 × F_A such that, for every A ∈ F_A, δ(·, A) is a Borel function and, for every x ∈ 𝒳, δ(x, ·) is a probability measure on (A, F_A). If one wants to select an action in A, one needs to simulate a random element according to δ(x, ·).
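For concreteness, here is a minimal Python sketch of selecting an action from a randomized rule; the binary action space and the particular δ are assumptions made for illustration, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical randomized rule on the action space A = {0, 1}:
    # delta(x, .) puts mass p(x) on action 1; this particular p is an
    # illustrative choice, not from the notes.
    def delta(x):
        p1 = 1.0 / (1.0 + np.exp(-x))
        return np.array([1.0 - p1, p1])   # probability vector of delta(x, .)

    def select_action(x):
        # Selecting an action means simulating from delta(x, .).
        return rng.choice([0, 1], p=delta(x))

    print(select_action(0.3))   # a random action, 0 or 1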
In similar estimation problems we can consider different loss functions as well, e.g., L(P, a) = |θ − a| (absolute error loss).
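A small Monte Carlo sketch comparing the risks of the sample mean under squared error and absolute error loss; the N(θ, 1) population and the sample size are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    # Monte Carlo risk of the sample mean for a N(theta, 1) population
    # under the two losses above (all numerical choices illustrative).
    theta, n, reps = 1.0, 25, 100_000
    est = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)

    print(np.mean((est - theta) ** 2))    # risk under squared error loss (~ 1/n)
    print(np.mean(np.abs(est - theta)))   # risk under absolute error loss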
Example 6.1.2 (Hypothesis testing). Let P be a family of distributions, P = P_0 ∪ P_1 with P_0 ∩ P_1 = ∅. A hypothesis testing problem can be formulated as that of deciding which of the following two statements is true: H_0: P ∈ P_0 versus H_1: P ∈ P_1. Here, H_0 is called the null hypothesis and H_1 is called the alternative hypothesis. The action space is A = {0, 1}, where 0 is the action of accepting H_0 and 1 is the action of rejecting H_0. A decision rule is called a test T: 𝒳 → {0, 1}, so T(X) = 1_C(X), where C ∈ F_𝒳 is called the rejection region or critical region for testing H_0 versus H_1.
A loss function in this problem is the 0–1 loss:

L(P, a) = 0 if a correct decision is made, and 1 otherwise.

Under this loss, the risk is

R_T(P) = P(T(X) = 1) = P(X ∈ C) if P ∈ P_0, and R_T(P) = P(T(X) = 0) = P(X ∉ C) if P ∈ P_1,

i.e., the probability of rejecting a true H_0 and the probability of accepting a false H_0, respectively.
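Here is a minimal Monte Carlo sketch of this risk for a concrete test; the normal model, the one-sided hypotheses about its mean, and the cutoff 1.645 are illustrative assumptions, not from the notes.

    import numpy as np

    rng = np.random.default_rng(2)

    # Test H0: mu <= 0 vs H1: mu > 0 from one N(mu, 1) observation,
    # with rejection region C = {x : x > 1.645} (illustrative choices).
    def test(x, c=1.645):
        return (x > c).astype(int)        # T(x) = 1_C(x)

    def risk_0_1(mu, n_mc=200_000):
        # Risk under 0-1 loss: P(T = 1) if H0 is true, P(T = 0) if H1 is true.
        t = test(rng.normal(mu, 1.0, size=n_mc))
        return t.mean() if mu <= 0 else 1.0 - t.mean()

    print(risk_0_1(0.0))   # ~ 0.05, type I error at the boundary of H0
    print(risk_0_1(1.0))   # type II error at mu = 1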
Example 6.1.3. Let X_1, ..., X_n be i.i.d. random variables from a population P ∈ P that is the family of populations having finite mean μ and variance σ². Consider the estimation of μ (A = R) under the squared error loss function L(μ, a) = (μ − a)². Let T be the class of all linear functions in X = (X_1, ..., X_n), i.e., T(X) = ∑_{i=1}^n c_i X_i with known c_i ∈ R, i = 1, ..., n.
Then

R_T(P) = E(μ − T(X))²
       = E(∑_{i=1}^n c_i X_i − μ)²
       = E[∑_{i=1}^n c_i (X_i − μ) + (∑_{i=1}^n c_i − 1)μ]²
       = σ² ∑_{i=1}^n c_i² + μ² (∑_{i=1}^n c_i − 1)²,

where the cross term vanishes because E(X_i − μ) = 0 and the X_i are independent.
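As a sanity check, the following Monte Carlo sketch reproduces this risk formula; the normal population and the particular weight vector are illustrative assumptions (any P with the given mean and variance works).

    import numpy as np

    rng = np.random.default_rng(3)

    # Monte Carlo check of R_T(P) = sigma^2 * sum(c_i^2) + mu^2 * (sum(c_i) - 1)^2.
    mu, sigma, n = 2.0, 1.5, 5
    c = np.array([0.1, 0.2, 0.3, 0.1, 0.1])        # arbitrary known weights

    x = rng.normal(mu, sigma, size=(400_000, n))   # any P with this mean/variance works
    mc_risk = np.mean((x @ c - mu) ** 2)           # simulated E(T(X) - mu)^2
    formula = sigma**2 * np.sum(c**2) + mu**2 * (np.sum(c) - 1) ** 2

    print(mc_risk, formula)   # agree up to Monte Carlo error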
(a) We show that there is no T(X) that minimizes R_T(P) uniformly in P, i.e., in (μ, σ²). The minimum of R_T(P) as a function of c = (c_1, ..., c_n) is attained at c_1 = ⋯ = c_n = μ²/(σ² + nμ²), which depends on P. Hence there are no c_i's that minimize R_T(P) uniformly over P ∈ P.
(b) Consider now the subclass T_0 ⊂ T of rules with c_i's satisfying ∑_{i=1}^n c_i = 1. Then R_T(P) = σ² ∑_{i=1}^n c_i² if T ∈ T_0. Minimizing ∑_{i=1}^n c_i² subject to ∑_{i=1}^n c_i = 1 leads to the optimal solution c_i = 1/n for all i. Thus, the sample mean X̄ is T_0-optimal.
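The following numeric sketch illustrates both points; the values n = 10, the two (μ, σ) pairs, and the alternative weight vector are assumptions chosen for illustration.

    import numpy as np

    # (a) The unrestricted minimizer c* = mu^2 / (sigma^2 + n*mu^2) depends on P:
    n = 10
    for mu, sigma in [(1.0, 1.0), (5.0, 1.0)]:
        print(mu, sigma, mu**2 / (sigma**2 + n * mu**2))    # c* changes with (mu, sigma)

    # (b) Within T_0 (weights summing to 1) the risk is sigma^2 * sum(c_i^2),
    # minimized by c_i = 1/n, i.e. by the sample mean:
    c_mean = np.full(n, 1.0 / n)
    c_other = np.array([0.5] + [0.5 / (n - 1)] * (n - 1))   # also sums to 1
    print(np.sum(c_mean**2), np.sum(c_other**2))            # 0.1 < ~0.278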
Example 6.1.4. Assume that the sample X has the binomial distribution b(n, θ) with an unknown θ ∈ (0, 1) and a fixed integer n > 1. Consider the hypothesis testing problem described in Example 6.1.2 with H_0: θ ∈ (0, θ_0] versus H_1: θ ∈ (θ_0, 1), where θ_0 ∈ (0, 1) is a fixed value. Suppose that we are only interested in the following class of nonrandomized decision rules: T = {T_j : j = 0, 1, ..., n − 1}, where T_j(X) = 1_{{j+1,...,n}}(X). The risk function for T_j under the 0–1 loss is

R_{T_j}(θ) = P(X > j) 1_{(0,θ_0]}(θ) + P(X ≤ j) 1_{(θ_0,1)}(θ).
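This risk function is easy to evaluate numerically; below is a minimal sketch using SciPy's binomial distribution, where n = 10, θ_0 = 0.4, and j = 4 are illustrative assumptions.

    import numpy as np
    from scipy.stats import binom

    # Risk of T_j under 0-1 loss: P(X > j) on (0, theta0], P(X <= j) on (theta0, 1).
    n, theta0, j = 10, 0.4, 4                 # illustrative values
    theta = np.linspace(0.01, 0.99, 99)

    risk = np.where(theta <= theta0,
                    binom.sf(j, n, theta),    # P(X > j): rejecting a true H0
                    binom.cdf(j, n, theta))   # P(X <= j): accepting a false H0

    print(risk.max())   # worst-case risk of T_4, attained near theta = theta0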
One of the most important aspects of statistical decision theory is that it may be used to formalize the notion of optimality of statistical estimators via minimizing the risk; under squared error loss, the risk of an estimator is its mean squared error (mse), the criterion used in the next example.
Example 6.2.1. Let X_1, ..., X_n be i.i.d. from an unknown c.d.f. F. Suppose that the parameter of interest is ϑ = 1 − F(t) for a fixed t > 0.

a) If F is not in a parametric family, then a nonparametric estimator of F(t) is the empirical c.d.f.

F_n(t) = (1/n) ∑_{i=1}^n 1_{(−∞,t]}(X_i), t ∈ R.
Since the indicators 1_{(−∞,t]}(X_i) ∈ {0, 1} are i.i.d. with ∑_{i=1}^n 1_{(−∞,t]}(X_i) ∼ Binomial(n, 1 − ϑ), F_n(t) is their sample mean, E[F_n(t)] = F(t), and Var(F_n(t)) = mse_{F_n(t)}(P) = F(t)[1 − F(t)]/n = ϑ(1 − ϑ)/n. Consequently, F_n(t) is an unbiased nonparametric estimator of (1 − ϑ). By linearity of expectation, an unbiased estimator of ϑ is U(X) = 1 − F_n(t), which has the same variance and mse as F_n(t).
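A quick simulation check of the unbiasedness and the variance formula ϑ(1 − ϑ)/n; the exponential population with scale 2, t = 1, and n = 20 are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(4)

    # Check E[U] = vartheta and Var(U) = vartheta*(1 - vartheta)/n for
    # U = 1 - F_n(t), with an exponential F and t = 1 (illustrative choices).
    scale, t, n, reps = 2.0, 1.0, 20, 100_000
    x = rng.exponential(scale, size=(reps, n))

    u = np.mean(x > t, axis=1)            # U(X) = 1 - F_n(t)
    vartheta = np.exp(-t / scale)         # true 1 - F(t)

    print(u.mean(), vartheta)                         # unbiasedness
    print(u.var(), vartheta * (1 - vartheta) / n)     # variance formula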
b) The estimator U(X) can be improved in terms of the mse if there is further information about F. Suppose that F is the c.d.f. of the exponential distribution E(0, θ) with an unknown θ > 0. Then F(t) = 1 − e^{−t/θ} and ϑ = e^{−t/θ}. The sample mean X̄ is sufficient for θ > 0. Now take Ũ(X) = h(X̄) = E[U(X)|X̄]. Then mse(Ũ) ≤ mse(U), with strict inequality unless U(X) is already a function of X̄:

E[U(X) − ϑ]² = E{U(X) − E[U(X)|X̄] + E[U(X)|X̄] − ϑ}²
             = E{U(X) − E[U(X)|X̄]}² + E{E[U(X)|X̄] − ϑ}² − 2E{[U(X) − E[U(X)|X̄]][ϑ − E[U(X)|X̄]]}.

Conditioning on X̄ and noting that ϑ − E[U(X)|X̄] is a function of X̄, the cross term is

E{[U(X) − E[U(X)|X̄]][ϑ − E[U(X)|X̄]]} = E([ϑ − E[U(X)|X̄]] E{U(X) − E[U(X)|X̄] | X̄}) = 0,

so that

E[U(X) − ϑ]² = E{U(X) − E[U(X)|X̄]}² + E{E[U(X)|X̄] − ϑ}² ≥ E{E[U(X)|X̄] − ϑ}² = mse(Ũ).
This method of improving estimators is sometimes called blackwellization (or Rao–Blackwellization), after one of its inventors, David Blackwell.
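A simulation sketch of the improvement in this exponential setting. It relies on the standard fact that, given the sum S = nX̄ of n i.i.d. exponentials, X_1/S ∼ Beta(1, n − 1), so E[U(X)|X̄] = (1 − t/S)^{n−1} when S > t and 0 otherwise; this closed form and all numerical values are assumptions added here for illustration, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(5)

    scale, t, n, reps = 2.0, 1.0, 20, 200_000
    x = rng.exponential(scale, size=(reps, n))
    vartheta = np.exp(-t / scale)                  # target: P(X_1 > t)

    u = np.mean(x > t, axis=1)                     # U(X) = 1 - F_n(t)

    # Assumed closed form: given S = n*Xbar, X_1/S ~ Beta(1, n-1), so
    # E[U | Xbar] = (1 - t/S)^(n-1) when S > t, and 0 otherwise.
    s = x.sum(axis=1)
    u_tilde = np.where(s > t, (1.0 - t / s) ** (n - 1), 0.0)

    print(np.mean((u - vartheta) ** 2))        # mse(U)
    print(np.mean((u_tilde - vartheta) ** 2))  # mse of the blackwellized estimator: smaller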