
STAT 9220

Lecture 6

Statistical Decision Theory

Greg Rempala

Department of Biostatistics

Medical College of Georgia

Feb 17, 2009

6.1 Basics

Let X be a sample from a population P ∈ P. A statistical decision is an action
we take after observing X concerning (i.e., a conclusion about) P.

Let A denote the set of allowable actions and let F_A be a σ-field on A. Then
the measurable space (A, F_A) is called the action space. Let X be the range of
X and F_X be a σ-field on X. A decision rule is a measurable function (a
statistic) T from (X, F_X) to (A, F_A). Typically a decision rule is assessed
through a loss function L, where

L : P × A → R_+.

If X = x is observed, the loss is L(P, T(x)). The average loss of the decision
rule T,

R_T(P) = E[L(P, T(X))],

is called the risk. If P is a parametric family indexed by θ ∈ Θ, the loss and
risk are denoted by L(θ, a) and R_T(θ). A rule T_1 is as good as T_2 if and
only if

R_{T_1}(P) ≤ R_{T_2}(P) for every P ∈ P,

and is better than T_2 if, in addition, R_{T_1}(P) < R_{T_2}(P) for at least
one P ∈ P. Two decision rules T_1 and T_2 are equivalent if and only if
R_{T_1}(P) = R_{T_2}(P) for all P ∈ P.

It is also possible to consider randomized decision rules, i.e., functions δ on
X × F_A such that, for every A ∈ F_A, δ(·, A) is a Borel function and, for
every x ∈ X, δ(x, ·) is a probability measure on (A, F_A). To select an action
in A, one simulates a random element according to δ(x, ·).

In estimation problems we can also consider other loss functions, e.g.,
L(θ, a) = |θ − a| (absolute error loss).
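Since the risk R_T(P) is just an expectation, it can be approximated by
simulation when no closed form is available. Below is a minimal Python sketch,
not from the notes: the normal model, the sample size, and the choice of the
sample mean as the rule T are illustrative assumptions.

import numpy as np

def monte_carlo_risk(decision_rule, loss, draw_sample, theta, reps=100_000, seed=0):
    """Approximate R_T(theta) = E[L(theta, T(X))] by averaging losses over
    simulated samples X ~ P_theta."""
    rng = np.random.default_rng(seed)
    losses = [loss(theta, decision_rule(draw_sample(rng))) for _ in range(reps)]
    return float(np.mean(losses))

# Illustrative setup: X_1, ..., X_n i.i.d. N(theta, 1), T(X) = sample mean.
theta, n = 2.0, 25
draw = lambda rng: rng.normal(theta, 1.0, size=n)

sq_risk = monte_carlo_risk(np.mean, lambda t, a: (t - a) ** 2, draw, theta)
abs_risk = monte_carlo_risk(np.mean, lambda t, a: abs(t - a), draw, theta)
print(sq_risk)   # close to 1/n = 0.04, the exact risk under squared error loss
print(abs_risk)  # close to sqrt(2/(pi*n)), the exact risk under absolute loss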

Example 6.1.2 (Hypothesis testing). Let P be a family of distributions with
P = P_0 ∪ P_1 and P_0 ∩ P_1 = ∅. A hypothesis testing problem can be
formulated as that of deciding which of the following two statements is true:

H_0 : P ∈ P_0 versus H_1 : P ∈ P_1.

Here, H_0 is called the null hypothesis and H_1 is called the alternative
hypothesis. The action space is A = {0, 1}, where 0 is the action of accepting
H_0 and 1 is the action of rejecting H_0. A decision rule is called a test,
T : X → {0, 1}, so T(X) = 1_C(X), where C ∈ F_X is called the rejection region
or critical region for testing H_0 versus H_1. A loss function in this problem
is the 0–1 loss:

L(P, a) = 0 if a correct decision is made, 1 otherwise.

Under this loss, the risk is

R_T(P) = P(T(X) = 1) = P(X ∈ C)  if P ∈ P_0,
R_T(P) = P(T(X) = 0) = P(X ∉ C)  if P ∈ P_1.
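Under the 0–1 loss, the risk is thus the type I error probability on P_0 and
the type II error probability on P_1. A minimal sketch (not from the notes;
the normal model, n, and cutoff c are illustrative, and scipy is assumed
available) evaluating both branches exactly for a one-sided test of a normal
mean:

import numpy as np
from scipy.stats import norm

# Test H_0: mu <= 0 vs H_1: mu > 0 from X_1,...,X_n i.i.d. N(mu, 1),
# using T(X) = 1{mean(X) > c}.  Under 0-1 loss,
#   R_T(mu) = P(Xbar > c) for mu <= 0  (reject H_0 when it is true),
#   R_T(mu) = P(Xbar <= c) for mu > 0  (accept H_0 when it is false).
n, c = 25, 0.33  # c chosen so the level at mu = 0 is about 0.05

def risk(mu):
    # Xbar ~ N(mu, 1/n), so both branches are normal tail probabilities.
    p_reject = norm.sf(c, loc=mu, scale=1 / np.sqrt(n))
    return p_reject if mu <= 0 else 1 - p_reject

for mu in (-0.2, 0.0, 0.2, 0.5):
    print(f"mu = {mu:+.1f}: R_T(mu) = {risk(mu):.3f}")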

Example 6.1.3. Let X_1, ..., X_n be i.i.d. random variables from a population
P ∈ P, where P is the family of populations having finite mean μ and variance
σ². Consider the estimation of μ (A = R) under the squared error loss
L(μ, a) = (μ − a)². Let T be the class of all linear functions of
X = (X_1, ..., X_n), i.e., T(X) = ∑_{i=1}^n c_i X_i with known c_i ∈ R,
i = 1, ..., n. Then

R_T(P) = E[μ − T(X)]²
       = E[∑_{i=1}^n c_i X_i − μ]²
       = E[∑_{i=1}^n c_i (X_i − μ) + (∑_{i=1}^n c_i − 1)μ]²
       = σ² ∑_{i=1}^n c_i² + μ² (∑_{i=1}^n c_i − 1)².

(a) We show that there is no T(X) that minimizes R_T(P) uniformly in
P = (μ, σ²). The minimum of R_T(P) as a function of (c_1, ..., c_n) is attained
at c_1 = ··· = c_n = μ²/(σ² + nμ²), which depends on P. Hence there are no
c_i's that minimize R_T(P) uniformly over P.

(b) Consider now the subclass T_0 ⊂ T of rules with c_i's satisfying
∑_{i=1}^n c_i = 1. Then R_T(P) = σ² ∑_{i=1}^n c_i² if T ∈ T_0. Minimizing
σ² ∑_{i=1}^n c_i² subject to ∑_{i=1}^n c_i = 1 leads to the optimal solution
c_i = 1/n for all i. Thus, the sample mean X̄ is T_0-optimal.

Example 6.1.4. Assume that the sample X has the binomial distribution b(n, θ)
with an unknown θ ∈ (0, 1) and a fixed integer n > 1. Consider the hypothesis
testing problem described in Example 6.1.2 with H_0 : θ ∈ (0, θ_0] versus
H_1 : θ ∈ (θ_0, 1), where θ_0 ∈ (0, 1) is a fixed value. Suppose that we are
only interested in the following class of nonrandomized decision rules:
T = {T_j : j = 0, 1, ..., n − 1}, where T_j(X) = 1_{{j+1,...,n}}(X). The risk
function for T_j under the 0–1 loss is

R_{T_j}(θ) = P(X > j) 1_{(0,θ_0]}(θ) + P(X ≤ j) 1_{(θ_0,1)}(θ).

6.2 Example: Minimizing MSE

One of the most important aspects of statistical decision theory is that it
formalizes the optimality of statistical estimators: under squared error loss
the risk of an estimator is its mean squared error (MSE), so an optimal
estimator is one that minimizes the MSE.

Example 6.2.1. Let X_1, ..., X_n be i.i.d. from an unknown c.d.f. F. Suppose
that the parameter of interest is ϑ = 1 − F(t) for a fixed t > 0.

a) If F is not in a parametric family, then a nonparametric estimator of F(t)
is the empirical c.d.f.

F_n(t) = (1/n) ∑_{i=1}^n 1_{(−∞,t]}(X_i), t ∈ R.

Since 1_{(−∞,t]}(X_i) = Y_i ∈ {0, 1} and ∑_{i=1}^n Y_i ∼ Binomial(n, 1 − ϑ),
we have F_n(t) = Ȳ and

Var(F_n(t)) = mse_{F_n(t)}(P) = F(t)[1 − F(t)]/n = ϑ(1 − ϑ)/n.

Consequently, F_n(t) is an unbiased nonparametric estimator of 1 − ϑ. By
linearity of expectations, an unbiased estimator of ϑ is U(X) = 1 − F_n(t),
which has the same variance and mse as F_n(t).

b) The estimator U(X) can be improved in terms of the mse if there is further
information about F. Suppose that F is the c.d.f. of the exponential
distribution E(0, θ) with an unknown θ > 0. Then F(t) = 1 − e^{−t/θ} and
ϑ = e^{−t/θ}. The sample mean X̄ is sufficient for θ. Now take
Ũ(X) = E[U(X) | X̄] = h(X̄). Then mse(Ũ) ≤ mse(U), with strict inequality
unless U is already a function of X̄. Indeed,

E[U(X) − ϑ]²
  = E{U(X) − E[U(X)|X̄] + E[U(X)|X̄] − ϑ}²
  = E{U(X) − E[U(X)|X̄]}² + E{ϑ − E[U(X)|X̄]}²
    + 2 E{[U(X) − E[U(X)|X̄]][E[U(X)|X̄] − ϑ]}.

Conditioning the cross term on X̄, and using the facts that E[U(X)|X̄] − ϑ is a
function of X̄ and that E{U(X) − E[U(X)|X̄] | X̄} = 0, gives

E{[U(X) − E[U(X)|X̄]][E[U(X)|X̄] − ϑ]}
  = E{ E{U(X) − E[U(X)|X̄] | X̄} · [E[U(X)|X̄] − ϑ] } = 0,

so that

E[U(X) − ϑ]² = E{U(X) − E[U(X)|X̄]}² + E{ϑ − E[U(X)|X̄]}²
             ≥ E{ϑ − E[U(X)|X̄]}² = mse(Ũ).

This method of improving estimators is sometimes called Blackwellization,
after one of its inventors, David Blackwell.
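A simulation comparing mse(U) and mse(Ũ) for this exponential setup. The
closed form used for h below is not derived in the notes; it follows from the
standard fact that, given S = nX̄, the ratio X_1/S has a Beta(1, n−1)
distribution, so h(X̄) = P(X_1 > t | S) = (1 − t/(nX̄))^{n−1} when nX̄ > t and
0 otherwise. The values of n, t, and θ are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, t, theta = 10, 1.0, 2.0
vartheta = np.exp(-t / theta)            # target parameter e^{-t/theta}

reps = 200_000
x = rng.exponential(theta, size=(reps, n))
U = (x > t).mean(axis=1)                 # U(X) = 1 - F_n(t)
s = x.sum(axis=1)                        # S = n * Xbar, the sufficient statistic
# Blackwellized estimator h(Xbar) = E[U | Xbar] = (1 - t/S)^{n-1} for S > t
U_tilde = np.where(s > t, np.clip(1 - t / s, 0, None) ** (n - 1), 0.0)

print(((U - vartheta) ** 2).mean())        # mse(U), about vartheta(1-vartheta)/n
print(((U_tilde - vartheta) ** 2).mean())  # mse(U~), strictly smaller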