These notes in English will closely follow Mathematische Statistik, by H.R. Künsch (2005), but are as yet incomplete. Mathematische Statistik can be used as supplementary reading material in German.
Mathematical rigor and clarity often bite each other. In some places, not all subtleties are fully presented. A snake will indicate this.
Statistics is about the mathematical modeling of observable phenomena, using stochastic models, and about analyzing data: estimating parameters of the model and testing hypotheses. In these notes, we study various estimation and testing procedures. We consider their theoretical properties and we investigate various notions of optimality.
1.1 Some notation and model assumptions
The data consist of measurements (observations) $x_1, \ldots, x_n$, which are regarded as realizations of random variables $X_1, \ldots, X_n$. In most of the notes, the $X_i$ are real-valued: $X_i \in \mathbb{R}$ (for $i = 1, \ldots, n$), although we will also consider some extensions to vector-valued observations.
Example 1.1.1 Fizeau and Foucault developed methods for estimating the speed of light (1849, 1850), which were later improved by Newcomb and Michelson. The main idea is to pass light from a rapidly rotating mirror to a fixed mirror and back to the rotating mirror. An estimate of the velocity of light is obtained, taking into account the speed of the rotating mirror, the distance travelled, and the displacement of the light as it returns to the rotating mirror.
[Fig. 1: sketch of the rotating-mirror setup for measuring the speed of light.]
The data are Newcomb’s measurements of the passage time it took light to travel from his lab, to a mirror on the Washington Monument, and back to his lab.
distance: 7.44373 km.
66 measurements on 3 consecutive days
first measurement: 0.000024828 seconds = 24828 nanoseconds
The dataset has the deviations from 24800 nanoseconds.
The measurements on 3 different days:
[Figure: the deviations X plotted against observation number t, in three panels: day 1, day 2, day 3.]
All measurements in one plot:
[Figure: all 66 measurements X plotted against observation number t in a single plot.]
The class $\mathcal{F}_0$ is for example modeled as the class of all symmetric distributions, that is,
$$\mathcal{F}_0 := \{F_0 : F_0(x) = 1 - F_0(-x)\ \forall\, x\}. \qquad (1.2)$$
This is an infinite-dimensional collection: it is not parametrized by a finite-dimensional parameter. We then call $F_0$ an infinite-dimensional parameter.
A finite-dimensional model is for example
$$\mathcal{F}_0 := \{\Phi(\cdot/\sigma) : \sigma > 0\}, \qquad (1.3)$$
where $\Phi$ is the standard normal distribution function.
Thus, the location model is
$$X_i = \mu + \epsilon_i, \quad i = 1, \ldots, n,$$
with $\epsilon_1, \ldots, \epsilon_n$ i.i.d. and, under model (1.2), symmetrically but otherwise unknown distributed and, under model (1.3), $N(0, \sigma^2)$-distributed with unknown variance $\sigma^2$.
1.2 Estimation
A parameter is an aspect of the unknown distribution. An estimator T is some given function T (X) of the observations X. The estimator is constructed to estimate some unknown parameter, γ say.
In Example 1.1.2, one may consider the following estimators $\hat\mu$ of $\mu$:
$$\hat\mu_1 := \frac{1}{n} \sum_{i=1}^n X_i.$$
Note that $\hat\mu_1$ minimizes the squared loss
$$\sum_{i=1}^n (X_i - \mu)^2.$$
It can be shown that $\hat\mu_1$ is a “good” estimator if the model (1.3) holds. When (1.3) is not true, in particular when there are outliers (large, “wrong” observations) (Ausreisser), then one has to apply a more robust estimator.
$$\hat\mu_2 := \begin{cases} X_{((n+1)/2)} & \text{when } n \text{ is odd}, \\ \{X_{(n/2)} + X_{(n/2+1)}\}/2 & \text{when } n \text{ is even}, \end{cases}$$
where $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics. Note that $\hat\mu_2$ is a minimizer of the absolute loss
$$\sum_{i=1}^n |X_i - \mu|.$$
$$\hat\mu_3 := \arg\min_\mu \sum_{i=1}^n \rho(X_i - \mu), \qquad (1.4)$$
where
$$\rho(x) = \begin{cases} x^2 & \text{if } |x| \le k, \\ k(2|x| - k) & \text{if } |x| > k, \end{cases}$$
with $k > 0$ some given threshold (this $\rho$ is the Huber loss).
$$\hat\mu_4 := \frac{1}{n - 2[n\alpha]} \sum_{i=[n\alpha]+1}^{n - [n\alpha]} X_{(i)},$$
the $\alpha$-trimmed mean.
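As a concrete sketch (our own illustration, assuming numpy and scipy are available; the threshold $k = 1.345$ and trimming fraction $\alpha = 0.1$ are illustrative defaults, not values from the text), the four estimators can be computed as follows:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mu_hat_1(x):
    """Sample mean: the minimizer of the squared loss."""
    return np.mean(x)

def mu_hat_2(x):
    """Sample median: a minimizer of the absolute loss."""
    return np.median(x)

def mu_hat_3(x, k=1.345):
    """M-estimator (1.4): rho is quadratic near 0, linear in the tails."""
    x = np.asarray(x, float)
    def rho(u):
        return np.where(np.abs(u) <= k, u ** 2, k * (2 * np.abs(u) - k))
    # One-dimensional minimization of sum_i rho(X_i - mu) over mu.
    return minimize_scalar(lambda m: np.sum(rho(x - m))).x

def mu_hat_4(x, alpha=0.1):
    """alpha-trimmed mean: average X_([n*alpha]+1), ..., X_(n-[n*alpha])."""
    xs, n = np.sort(x), len(x)
    m = int(n * alpha)
    return np.mean(xs[m:n - m])
```

On a sample containing outliers, $\hat\mu_1$ is dragged along with them, whereas $\hat\mu_2$, $\hat\mu_3$ and $\hat\mu_4$ remain stable.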
Note. To avoid misunderstanding, we note that e.g. in (1.4), $\mu$ is used as the variable over which one minimizes, whereas in (1.1), $\mu$ is a parameter. These are actually distinct concepts, but it is a general convention to abuse notation and employ the same symbol $\mu$. When further developing the theory (see Chapter 6) we shall often introduce a new symbol for the variable; e.g., (1.4) is then written as
$$\hat\mu_3 := \arg\min_c \sum_{i=1}^n \rho(X_i - c).$$
An example of a nonparametric estimator is the empirical distribution function
$$\hat F_n(\cdot) := \frac{1}{n} \#\{X_i \le \cdot,\ 1 \le i \le n\}.$$
This is an estimator of the theoretical distribution function
$$F(\cdot) := P(X \le \cdot).$$
Any reasonable estimator is constructed according to the so-called plug-in principle (Einsetzprinzip). That is, the parameter of interest $\gamma$ is written as $\gamma = Q(F)$, with $Q$ some given map. The empirical distribution $\hat F_n$ is then “plugged in”, to obtain the estimator $T := Q(\hat F_n)$. (We note however that problems can arise: e.g., $Q(\hat F_n)$ may not be well-defined ....)
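A minimal sketch of the plug-in principle (our own illustration; the helper names are ours): compute $\hat F_n$ and plug it into the median map $Q_2$.

```python
import numpy as np

def ecdf(x):
    """Return the empirical distribution function F_n-hat of the sample x."""
    xs, n = np.sort(np.asarray(x, float)), len(x)
    # F_n-hat(t) = (number of X_i <= t) / n
    return lambda t: np.searchsorted(xs, t, side="right") / n

def plug_in_median(x):
    """Q2 plugged in: the smallest x with F_n-hat(x) >= 1/2."""
    xs, n = np.sort(np.asarray(x, float)), len(x)
    return xs[int(np.ceil(n / 2)) - 1]
```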
Examples are the above estimators $\hat\mu_1, \ldots, \hat\mu_4$ of the location parameter $\mu$. We define the maps
$$Q_1(F) := \int x \, dF(x)$$
(the mean, or center of gravity, of $F$),
$$Q_2(F) := F^{-1}(1/2)$$
(the median of $F$), and
$$Q_3(F) := \arg\min_\mu \int \rho(x - \mu) \, dF(x).$$
Breakdown point. Let, for $m \le n$,
$$\epsilon(m) := \sup_{x_1^*, \ldots, x_m^*} |\hat\mu(x_1^*, \ldots, x_m^*, X_{m+1}, \ldots, X_n)|.$$
If $\epsilon(m) = \infty$, we say that with $m$ outliers the estimator can break down. The breakdown point is defined as
$$\epsilon^* := \min\{m : \epsilon(m) = \infty\}/n.$$
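A quick numerical illustration (not a proof; the sample below is simulated): replace $m$ observations by gross outliers and watch whether the estimate stays bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
for m in (1, 5, 10):
    x_bad = x.copy()
    x_bad[:m] = 1e12                     # m gross outliers
    print(m, np.mean(x_bad), np.median(x_bad))
# The mean explodes already for m = 1 (breakdown point 1/n), while the
# median stays bounded until about half the sample is corrupted.
```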
1.5 Confidence intervals
Consider the location model (Example 1.1.2).
Definition. A subset $I = I(X) \subset \mathbb{R}$, depending (only) on the data $X = (X_1, \ldots, X_n)$, is called a confidence set (Vertrauensbereich) for $\mu$, at level $1 - \alpha$, if
$$\mathbb{P}_{\mu, F_0}(\mu \in I) \ge 1 - \alpha, \quad \forall\ \mu \in \mathbb{R},\ F_0 \in \mathcal{F}_0.$$
A confidence interval is of the form
$$I := [\underline{\mu}, \bar{\mu}],$$
where the boundaries $\underline{\mu} = \underline{\mu}(X)$ and $\bar{\mu} = \bar{\mu}(X)$ depend (only) on the data $X$.
Let, for each $\mu_0 \in \mathbb{R}$, $\phi(X, \mu_0) \in \{0, 1\}$ be a test at level $\alpha$ for the hypothesis
$$H_{\mu_0}: \mu = \mu_0.$$
Thus, we reject $H_{\mu_0}$ if and only if $\phi(X, \mu_0) = 1$, and
$$\mathbb{P}_{\mu_0, F_0}(\phi(X, \mu_0) = 1) \le \alpha.$$
Then
$$I(X) := \{\mu : \phi(X, \mu) = 0\}$$
is a $(1 - \alpha)$-confidence set for $\mu$.
Conversely, if $I(X)$ is a $(1 - \alpha)$-confidence set for $\mu$, then, for all $\mu_0$, the test $\phi(X, \mu_0)$ defined as
$$\phi(X, \mu_0) = \begin{cases} 1 & \text{if } \mu_0 \notin I(X), \\ 0 & \text{else}, \end{cases}$$
is a test at level $\alpha$ of $H_{\mu_0}$.
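In the normal location model (1.3) this duality can be made concrete: inverting the two-sided one-sample t-test gives the usual t-interval. A sketch (scipy assumed; the function names are ours):

```python
import numpy as np
from scipy import stats

def phi(x, mu0, alpha=0.05):
    """phi(X, mu0) = 1 iff H_{mu0}: mu = mu0 is rejected at level alpha."""
    n = len(x)
    t = (np.mean(x) - mu0) / (np.std(x, ddof=1) / np.sqrt(n))
    return int(abs(t) > stats.t.ppf(1 - alpha / 2, df=n - 1))

def confidence_interval(x, alpha=0.05):
    """I(X) = {mu : phi(X, mu) = 0}, available here in closed form."""
    n = len(x)
    half = stats.t.ppf(1 - alpha / 2, df=n - 1) * np.std(x, ddof=1) / np.sqrt(n)
    return np.mean(x) - half, np.mean(x) + half
```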
1.6 Intermezzo: quantile functions
Let $F$ be a distribution function. Then $F$ is càdlàg (continue à droite, limite à gauche). Define the quantile functions
$$q_F^+(u) := \sup\{x : F(x) \le u\}$$
and
$$q_F^-(u) := \inf\{x : F(x) \ge u\} =: F^{-1}(u).$$
It holds that $F(q_F^-(u)) \ge u$ and, for all $h > 0$, $F(q_F^+(u) - h) \le u$. Hence
$$F(q_F^+(u)-) := \lim_{h \downarrow 0} F(q_F^+(u) - h) \le u.$$
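For a step function $F$, for instance the ECDF of a sorted sample, the two quantile functions can be written down directly; a sketch for $u \in (0, 1)$ (our own illustration):

```python
import numpy as np

def q_minus(xs, u):
    """q^-_F(u) = inf{x : F(x) >= u} = F^{-1}(u), F the ECDF of sorted xs."""
    return xs[int(np.ceil(len(xs) * u)) - 1]

def q_plus(xs, u):
    """q^+_F(u) = sup{x : F(x) <= u}, F the ECDF of sorted xs."""
    # F jumps by 1/n at each order statistic, so F(x) <= u holds for all
    # x < X_(floor(n*u)+1); the supremum is that order statistic.
    return xs[min(int(np.floor(len(xs) * u)), len(xs) - 1)]

xs = np.array([1.0, 2.0, 3.0, 4.0])
assert q_minus(xs, 0.5) == 2.0 and q_plus(xs, 0.5) == 3.0
```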
1.7 How to construct tests and confidence sets
Consider a model class $\mathcal{P} := \{P_\theta : \theta \in \Theta\}$. Moreover, consider a space $\Gamma$ and a map
$$g: \Theta \to \Gamma, \quad \gamma := g(\theta).$$
We think of $\gamma$ as the parameter of interest (as in the plug-in principle, with $\gamma = Q(P_\theta) = g(\theta)$).
For instance, in Example 1.1.2, the parameter space is $\Theta := \{\theta = (\mu, F_0) : \mu \in \mathbb{R},\ F_0 \in \mathcal{F}_0\}$, and, when $\mu$ is the parameter of interest, $g(\mu, F_0) = \mu$.
To test
$$H_{\gamma_0}: \gamma = \gamma_0,$$
we look for a pivot (Tür-Angel). This is a function $Z(X, \gamma)$ depending on the data $X$ and on the parameter $\gamma$, such that for all $\theta \in \Theta$, the distribution
$$\mathbb{P}_\theta(Z(X, g(\theta)) \le \cdot) =: G(\cdot)$$
does not depend on $\theta$. We note that to find a pivot is unfortunately not always possible. However, if we do have a pivot $Z(X, \gamma)$ with distribution $G$, we can compute its quantile functions
$$q_L := q_G^+\!\left(\frac{\alpha}{2}\right), \quad q_R := q_G^-\!\left(1 - \frac{\alpha}{2}\right),$$
and the test
$$\phi(X, \gamma_0) := \begin{cases} 1 & \text{if } Z(X, \gamma_0) \notin [q_L, q_R], \\ 0 & \text{else}, \end{cases}$$
has level $\alpha$. When an exact pivot is not available, one may instead use an asymptotic pivot: a function $Z_n(X, \gamma)$ whose distribution under $\mathbb{P}_\theta$ converges, for every $\theta$, to a fixed limit $G$. For example, in the location model with finite variance, the studentized mean $Z_n(X, \mu) := \sqrt{n}(\hat\mu_1 - \mu)/S_n$, with $S_n^2$ the sample variance, is an asymptotic pivot, with limiting distribution $G = \Phi$.
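A small simulation (our own illustration) of the asymptotic pivot property of the studentized mean: its distribution is close to $N(0, 1)$ even for non-normal errors, whatever $\mu$ is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, B = 50, 3.7, 2000
z = np.empty(B)
for b in range(B):
    x = mu + rng.exponential(size=n) - 1.0   # non-normal, centered errors
    z[b] = np.sqrt(n) * (np.mean(x) - mu) / np.std(x, ddof=1)
print(stats.kstest(z, "norm"))   # closeness to N(0,1) improves with n
```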
Comparison of confidence intervals and tests. When comparing confidence intervals, the aim is usually to take the one with smallest length on average (keeping the level at $1 - \alpha$). In the case of tests, we look for the one with maximal power. In the location model, this leads to studying the expected length
$$\mathbb{E}_{\mu, F_0} |\bar\mu(X) - \underline{\mu}(X)|$$
for $(1 - \alpha)$-confidence sets $[\underline{\mu}, \bar\mu]$, or to studying the power of the test $\phi(X, \mu_0)$ at level $\alpha$. Recall that the power is $\mathbb{P}_{\mu, F_0}(\phi(X, \mu_0) = 1)$ for values $\mu \ne \mu_0$.
1.8 An illustration: the two-sample problem
Consider the following data, concerning weight gain/loss. The control group x had their usual diet, and the treatment group y obtained a special diet, designed for preventing weight gain. The study was carried out to test whether the diet works.
[Table 2: the observations $x$ (control group) and $y$ (treatment group), together with their ranks rank($x$) and rank($y$) in the pooled sample.]
Let $n$ ($m$) be the sample size of the control group $x$ (treatment group $y$). The mean in group $x$ ($y$) is denoted by $\bar x$ ($\bar y$). The sums of squares are
$$SS_x := \sum_{i=1}^n (x_i - \bar x)^2, \quad SS_y := \sum_{j=1}^m (y_j - \bar y)^2.$$
So in this study, one has $n = m = 5$ and the values $\bar x = 6.4$, $\bar y = 0$, $SS_x = 161.2$ and $SS_y = 114$. The ranks, rank($x$) and rank($y$), are the rank-numbers when putting all $n + m$ data together (e.g., $y_3 = -6$ is the smallest observation and hence rank($y_3$) = 1).
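The summary statistics and pooled ranks can be computed as follows (a sketch; the entries of Table 2 are not reproduced here, so plug in the actual $x$ and $y$):

```python
import numpy as np
from scipy.stats import rankdata

def two_sample_summary(x, y):
    """Group means, sums of squares, and ranks in the pooled sample."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    ss_x = np.sum((x - x.mean()) ** 2)
    ss_y = np.sum((y - y.mean()) ** 2)
    ranks = rankdata(np.concatenate([x, y]))   # ranks among all n + m values
    return x.mean(), y.mean(), ss_x, ss_y, ranks[:len(x)], ranks[len(x):]
```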
We assume that the data are realizations of two independent samples, say $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_m)$, where $X_1, \ldots, X_n$ are i.i.d. with distribution function $F_X$, and $Y_1, \ldots, Y_m$ are i.i.d. with distribution function $F_Y$. The distribution functions $F_X$ and $F_Y$ may be in whole or in part unknown. The testing problem is: $H_0: F_X = F_Y$ against a one- or two-sided alternative.
The classical two-sample Student test is based on the assumption that the data come from a normal distribution. Moreover, it is assumed that the variances of $F_X$ and $F_Y$ are equal. Thus,
$$(F_X, F_Y) \in \left\{ \left( \Phi\!\left(\frac{\cdot - \mu}{\sigma}\right),\ \Phi\!\left(\frac{\cdot - (\mu + \gamma)}{\sigma}\right) \right) : \mu \in \mathbb{R},\ \sigma > 0,\ \gamma \in \Gamma \right\}.$$
Here, $\Gamma \supset \{0\}$ is the range of shifts in mean one considers, e.g. $\Gamma = \mathbb{R}$ for two-sided situations, and $\Gamma = (-\infty, 0]$ for a one-sided situation. The testing problem reduces to $H_0: \gamma = 0$.
We now look for a pivot $Z(X, Y, \gamma)$. Define the sample means
$$\bar X := \frac{1}{n} \sum_{i=1}^n X_i, \quad \bar Y := \frac{1}{m} \sum_{j=1}^m Y_j,$$
and the pooled sample variance
$$S^2 := \frac{1}{m + n - 2} \left\{ \sum_{i=1}^n (X_i - \bar X)^2 + \sum_{j=1}^m (Y_j - \bar Y)^2 \right\}.$$
Note that $\bar X$ has expectation $\mu$ and variance $\sigma^2/n$, and $\bar Y$ has expectation $\mu + \gamma$ and variance $\sigma^2/m$. So $\bar Y - \bar X$ has expectation $\gamma$ and variance
$$\frac{\sigma^2}{n} + \frac{\sigma^2}{m} = \sigma^2 \left( \frac{n + m}{nm} \right).$$
The normality assumption implies that
$$\bar Y - \bar X \ \text{ is } \ N\!\left(\gamma,\ \sigma^2 \left( \frac{n + m}{nm} \right)\right)\text{-distributed}.$$
Hence
$$\sqrt{\frac{nm}{n + m}} \left( \frac{\bar Y - \bar X - \gamma}{\sigma} \right) \ \text{ is } \ N(0, 1)\text{-distributed}.$$
To arrive at a pivot, we now plug in the estimate $S$ for the unknown $\sigma$:
$$Z(X, Y, \gamma) := \sqrt{\frac{nm}{n + m}} \left( \frac{\bar Y - \bar X - \gamma}{S} \right).$$
Indeed, $Z(X, Y, \gamma)$ has a distribution $G$ which does not depend on unknown parameters. The distribution $G$ is Student($n + m - 2$) (the Student distribution with $n + m - 2$ degrees of freedom). As test statistic for $H_0: \gamma = 0$, we therefore take $T = T^{\rm Student} := Z(X, Y, 0)$.
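A direct transcription of the pivot and the resulting level-$\alpha$ test (two-sided case; the function names are ours):

```python
import numpy as np
from scipy import stats

def z_pivot(x, y, gamma=0.0):
    """Z(X, Y, gamma) = sqrt(nm/(n+m)) (Ybar - Xbar - gamma) / S."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    s2 = (np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)) / (m + n - 2)
    return np.sqrt(n * m / (n + m)) * (y.mean() - x.mean() - gamma) / np.sqrt(s2)

def student_test(x, y, alpha=0.05):
    """Reject H0: gamma = 0 iff |Z(X, Y, 0)| exceeds the Student(n+m-2) quantile."""
    q = stats.t.ppf(1 - alpha / 2, df=len(x) + len(y) - 2)
    return int(abs(z_pivot(x, y, 0.0)) > q)
```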
An alternative is Wilcoxon's test, which is based only on the ranks: let $R_i$ denote the rank of $X_i$ in the pooled sample of size $N := n + m$, and take as test statistic the rank sum $T = T^{\rm Wilcoxon} := \sum_{i=1}^n R_i$. Large values of $T$ mean that the $X_i$ are generally larger than the $Y_j$, and hence indicate evidence against $H_0$.
To check whether or not the observed value of the test statistic is compatible with the null-hypothesis, we need to know its null-distribution, that is, the distribution under $H_0$. Under $H_0: F_X = F_Y$, the vector of ranks $(R_1, \ldots, R_n)$ has the same distribution as $n$ random draws without replacement from the numbers $\{1, \ldots, N\}$. That is, if we let
$$r := (r_1, \ldots, r_n, r_{n+1}, \ldots, r_N)$$
denote a permutation of $\{1, \ldots, N\}$, then
$$\mathbb{P}\big((R_1, \ldots, R_n, R_{n+1}, \ldots, R_N) = r\big) = \frac{1}{N!}$$
(see Theorem 1.8.1), and hence
$$\mathbb{P}_{H_0}(T = t) = \frac{\#\{r : \sum_{i=1}^n r_i = t\}}{N!}.$$
This can also be written as
$$\mathbb{P}_{H_0}(T = t) = \frac{1}{\binom{N}{n}}\ \#\Big\{ r_1 < \cdots < r_n,\ r_{n+1} < \cdots < r_N : \sum_{i=1}^n r_i = t \Big\}.$$
So clearly, the null-distribution of T does not depend on FX or FY. It does however depend on the sample sizes n and m. It is tabulated for n and m small or moderately large. For large n and m, a normal approximation of the null-distribution can be used.
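The exact null-distribution can be enumerated directly from the last display, since each $n$-subset of $\{1, \ldots, N\}$ is equally likely to be the set of $x$-ranks. A sketch for small $n$ and $m$:

```python
from itertools import combinations

def wilcoxon_null(n, m):
    """P_{H0}(T = t) for the rank sum T, enumerating all C(N, n) rank sets."""
    N = n + m
    counts = {}
    for ranks in combinations(range(1, N + 1), n):
        t = sum(ranks)
        counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())            # equals binom(N, n)
    return {t: c / total for t, c in sorted(counts.items())}

# Example: n = m = 5 as in the diet study.
null = wilcoxon_null(5, 5)
```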
Theorem 1.8.1 formally derives the null-distribution of the test, and actually proves that the order statistics and the ranks are independent. The latter result will be of interest in Example 2.10.4.
For two random variables $X$ and $Y$, we use the notation $X \overset{\mathcal{D}}{=} Y$ when $X$ and $Y$ have the same distribution.
Theorem 1.8.1 Let $Z_1, \ldots, Z_N$ be i.i.d. with continuous distribution $F$ on $\mathbb{R}$. Then $(Z_{(1)}, \ldots, Z_{(N)})$ and $R := (R_1, \ldots, R_N)$ are independent, and for all permutations $r := (r_1, \ldots, r_N)$,
$$\mathbb{P}(R = r) = \frac{1}{N!}.$$
Proof. Let $Z_{Q_i} := Z_{(i)}$, and $Q := (Q_1, \ldots, Q_N)$. Then
$$R = r \iff Q = r^{-1} := q,$$
where $r^{-1}$ is the inverse permutation of $r$.$^1$ For all permutations $q$ and all measurable maps $f$,
$$f(Z_1, \ldots, Z_N) \overset{\mathcal{D}}{=} f(Z_{q_1}, \ldots, Z_{q_N}).$$
Therefore, for all measurable sets $A \subset \mathbb{R}^N$ and all permutations $q$,
$$\mathbb{P}\big((Z_1, \ldots, Z_N) \in A,\ Z_1 < \cdots < Z_N\big) = \mathbb{P}\big((Z_{q_1}, \ldots, Z_{q_N}) \in A,\ Z_{q_1} < \cdots < Z_{q_N}\big).$$
Because there are $N!$ permutations, we see that for any $q$,
$$\mathbb{P}\big((Z_{(1)}, \ldots, Z_{(N)}) \in A\big) = N!\ \mathbb{P}\big((Z_{q_1}, \ldots, Z_{q_N}) \in A,\ Z_{q_1} < \cdots < Z_{q_N}\big) = N!\ \mathbb{P}\big((Z_{(1)}, \ldots, Z_{(N)}) \in A,\ R = r\big),$$
where $r = q^{-1}$. Thus we have shown that for all measurable $A$, and for all $r$,
$$\mathbb{P}\big((Z_{(1)}, \ldots, Z_{(N)}) \in A,\ R = r\big) = \frac{1}{N!}\ \mathbb{P}\big((Z_{(1)}, \ldots, Z_{(N)}) \in A\big). \qquad (1.5)$$
Take $A = \mathbb{R}^N$ to find that (1.5) implies
$$\mathbb{P}(R = r) = \frac{1}{N!}.$$
Plug this back into (1.5) to see that we have the product structure
$$\mathbb{P}\big((Z_{(1)}, \ldots, Z_{(N)}) \in A,\ R = r\big) = \mathbb{P}\big((Z_{(1)}, \ldots, Z_{(N)}) \in A\big)\ \mathbb{P}(R = r),$$
which holds for all measurable $A$. In other words, $(Z_{(1)}, \ldots, Z_{(N)})$ and $R$ are independent. $\sqcup\!\sqcap$
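A quick Monte Carlo check of the theorem, as a sketch: for $N = 3$, each of the $3! = 6$ rank vectors should occur with frequency close to $1/6$.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
N, B = 3, 60000
counts = {p: 0 for p in permutations(range(1, N + 1))}
for _ in range(B):
    z = rng.normal(size=N)
    r = tuple(int(k) for k in z.argsort().argsort() + 1)   # rank vector of z
    counts[r] += 1
print({p: round(c / B, 3) for p, c in counts.items()})     # all near 1/6
```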
Because Wilcoxon's test is only based on the ranks, and does not rely on the assumption of normality, one may expect that, when the data are in fact normally distributed, Wilcoxon's test has less power than Student's test. The loss

$^1$ Here is an example, with $N = 3$: $(z_1, z_2, z_3) = (5, 6, 4)$, so that $(r_1, r_2, r_3) = (2, 3, 1)$ and $(q_1, q_2, q_3) = (3, 1, 2)$.