Lecture Notes on Probability Theory - Advanced Statistical Inference | STAT 9220, Study notes of Statistics

Material Type: Notes; Professor: Rempala; Class: Advanced Statistical Inference; Subject: Statistics; University: Medical College of Georgia; Term: Spring 2009;


STAT 9220

Lecture 1

Probability Theory - Overview Part I

Greg Rempala

Department of Biostatistics

Medical College of Georgia

Jan 13, 2009

1.1 Measure Spaces

Definition 1.1.1. Let F be a collection of subsets of a set Ω. F is a σ-field if and only if it has the following properties:

(i) ∅ ∈ F;

(ii) if A ∈ F, then Aᶜ ∈ F;

(iii) if Ai ∈ F, i = 1, 2, ..., then ⋃_{i=1}^∞ Ai ∈ F.

A pair (Ω, F) is called a measurable space.
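For a finite Ω the three axioms can be verified directly; a minimal sketch, where the choice Ω = {0, 1, 2} and the power set (the largest σ-field on Ω) are arbitrary illustrations:

```python
from itertools import combinations

# Ω = {0, 1, 2}; F = power set of Ω, the largest σ-field on Ω.
omega = frozenset({0, 1, 2})
F = set()
for r in range(len(omega) + 1):
    for subset in combinations(omega, r):
        F.add(frozenset(subset))

# (i) the empty set belongs to F
assert frozenset() in F
# (ii) F is closed under complements
assert all(omega - A in F for A in F)
# (iii) F is closed under unions (finite unions suffice on a finite Ω)
assert all(A | B in F for A in F for B in F)
print("power set of", set(omega), "is a sigma-field with", len(F), "sets")
```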

Definition 1.1.2. Let (Ω, F) be a measurable space. A set function ν defined on F is called a measure if and only if it has the following properties:

(i) 0 ≤ ν(A) ≤ ∞ for any A ∈ F;

(ii) ν(∅) = 0;

(iii) if Ai ∈ F, i = 1, 2, ..., and Ai ∩ Aj = ∅ for i ≠ j, then ν(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ ν(Ai).

The triple (Ω, F, ν) is called a measure space. If ν(Ω) = 1, then ν is called a probability measure which is usually denoted by P instead of ν, and (Ω, F, P ) is called a probability space.
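A quick sketch of a probability measure on a finite Ω — the uniform measure P(A) = |A|/|Ω| on a six-point space (an arbitrary choice, e.g. faces of a die) — checking the three axioms with exact arithmetic:

```python
from fractions import Fraction

# Uniform probability measure on a finite Ω: P(A) = |A| / |Ω|.
omega = frozenset(range(6))          # e.g. faces of a die (arbitrary choice)

def P(A):
    assert A <= omega                # P is defined on subsets of Ω
    return Fraction(len(A), len(omega))

# (i) 0 <= P(A) <= 1 for all A; (ii) P(∅) = 0; and P(Ω) = 1
assert P(frozenset()) == 0 and P(omega) == 1
# (iii) additivity on pairwise-disjoint sets
A1, A2, A3 = frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})
assert P(A1 | A2 | A3) == P(A1) + P(A2) + P(A3) == 1
print("P is a probability measure on", set(omega))
```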

1.3 Distribution functions on R

Let P be a probability measure on R. The cumulative distribution function (c.d.f.) of P is defined by F(x) = P((−∞, x]) for every x ∈ R.

Proposition 1.3.1. Let F be a c.d.f. on R. Then (a) F(+∞) = 1 and F(−∞) = 0; (b) F is nondecreasing; (c) F is right-continuous.

Proof. Part (c). First note that if A1 ⊂ A2 ⊂ ··· and A = ⋃_i Ai, then P(A) = lim P(Ai) (so-called continuity from below; more in discussion). By taking Ãi = Aiᶜ it follows that also for a decreasing sequence A1 ⊃ A2 ⊃ A3 ⊃ ··· with A = ⋂_i Ai we have P(A) = lim_i P(Ai) (continuity from above). Now take ti ↓ t and Ai = (−∞, ti]; then ⋂_i Ai = (−∞, t], so F(ti) = P(Ai) → P((−∞, t]) = F(t), which is right-continuity.
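The phenomenon is easy to see numerically for a measure with atoms: F is right-continuous everywhere but fails to be left-continuous at an atom. A minimal sketch, where the atom locations and weights are arbitrary choices:

```python
# c.d.f. of a discrete probability measure with two atoms:
# P({0}) = P({1}) = 1/2 (an arbitrary illustrative choice).
atoms = {0.0: 0.5, 1.0: 0.5}

def F(x):
    # F(x) = P((-inf, x])
    return sum(p for a, p in atoms.items() if a <= x)

t = 0.0
approx_from_right = [F(t + 10 ** (-k)) for k in range(1, 8)]   # t_i ↓ t
assert all(abs(v - F(t)) < 1e-12 for v in approx_from_right)   # right-continuous
# by contrast, F is NOT left-continuous at the atom t = 0:
assert F(t - 1e-9) == 0.0 and F(t) == 0.5
print("F is right-continuous at 0, with a jump of size P({0}) = 0.5")
```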

1.4 Product measures

A measure ν on (Ω, F) is called σ-finite if and only if there exists a sequence {A1, A2, ...} ⊂ F such that ⋃Ai = Ω and ν(Ai) < ∞ for all i.

Example 1.4.1. Lebesgue measure on R is σ-finite, since R = ⋃_{n=1}^∞ (−n, n) and each (−n, n) has finite Lebesgue measure 2n.

The Cartesian product of sets A1, A2, ..., Ak is defined as the set of all k-tuples (a1, ..., ak) such that ai ∈ Ai, and is denoted by A1 × A2 × ··· × Ak. Product measures are measures on the product space Ω1 × Ω2 × ··· × Ωk. Note that F1 × F2 × ··· × Fk (the collection of product sets A1 × ··· × Ak with Ai ∈ Fi) does not have to be a σ-field, so we equip the space Ω1 × Ω2 × ··· × Ωk with its own σ-field, called the product σ-field σ(F1 × F2 × ··· × Fk) (the smallest σ-field containing F1 × F2 × ··· × Fk).

Proposition 1.4.1. Let (Ωi, Fi, νi), i = 1, ..., k, be measure spaces with σ-finite measures νi. Then there exists a unique σ-finite measure on σ(F1 × F2 × ··· × Fk), called the product measure and denoted by ν1 × ν2 × ··· × νk, such that

ν1 × ν2 × ··· × νk(A1 × A2 × ··· × Ak) = ν1(A1) ν2(A2) ··· νk(Ak)

for all Ai ∈ Fi, i = 1, ..., k.

The distribution function on Rᵏ is defined as F(x1, ..., xk) = P((−∞, x1] × ··· × (−∞, xk]), where P is any probability measure on Rᵏ (not necessarily a product measure).
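A minimal sketch of Proposition 1.4.1 for Lebesgue measure on R²: the product measure of a rectangle is the product of the side lengths (the particular rectangles are arbitrary choices):

```python
# Product (Lebesgue) measure of a rectangle in R^2: the measure of
# A1 × A2 is the product of the one-dimensional measures of its sides.
def lebesgue_1d(a, b):
    # ν((a, b]) = b - a for an interval
    return b - a

def product_measure(rect):
    # rect = ((a1, b1), (a2, b2)) represents (a1, b1] × (a2, b2]
    m = 1.0
    for (a, b) in rect:
        m *= lebesgue_1d(a, b)
    return m

# ν1 × ν2 ((0, 3] × (1, 2]) = 3 · 1 = 3
assert product_measure(((0.0, 3.0), (1.0, 2.0))) == 3.0
print("area of (0,3] x (1,2] =", product_measure(((0.0, 3.0), (1.0, 2.0))))
```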

1.6 Integration

Definition 1.6.1. (a) Let ϕ(ω) be a nonnegative simple function, ϕ(ω) = ∑_{i=1}^k ai I_{Ai}(ω) (ai ≥ 0), where

I_{Ai}(ω) = 0 if ω ∉ Ai, 1 if ω ∈ Ai.

Then

∫ ϕ dν = ∑_{i=1}^k ai ν(Ai).

(b) Let f ≥ 0 be a Borel function. Then

∫ f dν = sup ∫ ϕ dν,

where the supremum is over simple functions ϕ such that 0 ≤ ϕ ≤ f.

(c)

∫ f dν = ∫ f⁺ dν − ∫ f⁻ dν,

where f⁺ = max(f, 0) and f⁻ = max(−f, 0), provided at least one of the two integrals on the right is finite.

Remark 1.6.1. The notation 1_{Ai}(·) or I(Ai) is often used for I_{Ai}.
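Definition 1.6.1(a) can be sketched directly: the integral of a simple function w.r.t. a measure ν is ∑ ai ν(Ai). Here ν is counting measure on a finite Ω, and the particular ϕ is an arbitrary choice:

```python
# Integral of a nonnegative simple function φ = Σ a_i I_{A_i} w.r.t. a
# measure ν, per Definition 1.6.1(a). ν is counting measure on a finite Ω.
omega = set(range(10))

def counting(A):
    # counting measure: ν(A) = |A|
    return len(A)

def integral_simple(terms, nu):
    # terms: list of (a_i, A_i) with a_i >= 0; returns Σ a_i ν(A_i)
    return sum(a * nu(A) for a, A in terms)

# φ = 2·I_{{0,...,4}} + 5·I_{{5,...,9}}
phi = [(2.0, {0, 1, 2, 3, 4}), (5.0, {5, 6, 7, 8, 9})]
assert integral_simple(phi, counting) == 2.0 * 5 + 5.0 * 5   # = 35
print("∫ φ dν =", integral_simple(phi, counting))
```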

Theorem 1.6.3 (Fubini). Let νi be a σ-finite measure on (Ωi, Fi), i = 1, 2, and let f be a Borel function on Ω1 × Ω2 whose integral w.r.t. ν1 × ν2 exists. Then

g2(ω2) = ∫_{Ω1} f(ω1, ω2) dν1

exists a.e. ν2 and defines a Borel function on Ω2 whose integral w.r.t. ν2 exists, and

∫_{Ω1×Ω2} f(ω1, ω2) d(ν1 × ν2) = ∫_{Ω2} g2(ω2) dν2,

and the same holds with the roles of ν1 and ν2 exchanged, i.e., for g1(ω1) = ∫_{Ω2} f(ω1, ω2) dν2.

Example 1.6.1. Let Ω1 = Ω2 = {0, 1, 2, 3, ...} and let ν1 = ν2 be the counting measure. A function f on Ω1 × Ω2 defines a double sequence. If ∫ f d(ν1 × ν2) exists, then

∑_{i=0}^∞ ∑_{j=0}^∞ f(i, j) = ∑_{j=0}^∞ ∑_{i=0}^∞ f(i, j).
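The counting-measure case can be checked numerically: for an absolutely summable double sequence, the two iterated sums agree. The sequence f(i, j) = 2^−(i+j+2) is an arbitrary summable choice, and the infinite sums are truncated at N terms:

```python
# Fubini for counting measure (Example 1.6.1): the two iterated sums of
# an absolutely summable double sequence agree.
N = 60
f = lambda i, j: 2.0 ** -(i + j + 2)

row_first = sum(sum(f(i, j) for j in range(N)) for i in range(N))
col_first = sum(sum(f(i, j) for i in range(N)) for j in range(N))

assert abs(row_first - col_first) < 1e-12
# the full double series factors: (Σ_{i≥0} 2^-(i+1))^2 = 1
assert abs(row_first - 1.0) < 1e-9
print("both iterated sums ≈", row_first)
```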

1.7 Radon-Nikodym derivative

Let (Ω, F, ν) be a measure space and f be a nonnegative Borel function. Define

λ(A) = ∫_A f dν, A ∈ F. (1.2)

Then λ is a measure on F and

ν(A) = 0 implies λ(A) = 0. (1.3)

Definition 1.7.1. For two measures ν, λ for which (1.3) holds true, we write λ ≪ ν and say that λ is absolutely continuous w.r.t. ν.

Theorem 1.7.1 (Radon-Nikodym). Let ν and λ be two measures on (Ω, F) and ν be σ-finite. If λ  ν, then there exists a nonnegative Borel function f on Ω such that (1.2) holds. Furthermore, f is unique a.e. ν.

The function f is called the Radon-Nikodym derivative and is denoted by f = dλ/dν.

Example 1.7.1. If F′ = f, then F(x) = ∫_{−∞}^x f(y) dy, x ∈ R, is a distribution function and f is the corresponding Radon-Nikodym derivative (w.r.t. Lebesgue measure).
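Example 1.7.1 can be sketched numerically: for a c.d.f. with a density, a central difference quotient recovers f = dλ/dν. The exponential c.d.f. F(x) = 1 − e^{−x} is an arbitrary illustrative choice:

```python
import math

# For F(x) = 1 - e^{-x} (exponential c.d.f.), the Radon-Nikodym derivative
# w.r.t. Lebesgue measure is f(x) = e^{-x}; recover it as F' numerically.
F = lambda x: 1.0 - math.exp(-x)
f = lambda x: math.exp(-x)

h = 1e-6
for x in (0.5, 1.0, 2.0):
    numeric = (F(x + h) - F(x - h)) / (2 * h)   # central difference ≈ F'(x)
    assert abs(numeric - f(x)) < 1e-6
print("F' matches the density f = dλ/dν at the test points")
```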

1.8 Transformations of random variables

Example 1.8.1. Let X be a random variable with c.d.f. FX and Lebesgue p.d.f. fX, and let Y = X². Since Y⁻¹((−∞, x]) is empty if x < 0 and equals Y⁻¹([0, x]) = X⁻¹([−√x, √x]) if x ≥ 0, the c.d.f. of Y is

FY(x) = P ∘ Y⁻¹((−∞, x]) = P ∘ X⁻¹([−√x, √x]) = FX(√x) − FX(−√x)

if x ≥ 0 and FY(x) = 0 if x < 0. Clearly (via differentiation), the Lebesgue p.d.f. of FY is

fY(x) = (1/(2√x)) [fX(√x) + fX(−√x)] I_{(0,∞)}(x).

In particular, if

fX(x) = (1/√(2π)) e^{−x²/2},

which is the Lebesgue p.d.f. of the standard normal distribution N(0, 1), then

fY(x) = (1/√(2πx)) e^{−x/2} I_{(0,∞)}(x),

which is the Lebesgue p.d.f. of the chi-square distribution χ²_1 (see book Table 1.2). This is actually an important result in statistics.
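The identity can be confirmed pointwise: fY above coincides with the χ²_1 density x^{−1/2} e^{−x/2}/(2^{1/2} Γ(1/2)), since Γ(1/2) = √π. A minimal sketch (the test points are arbitrary):

```python
import math

# Transformed density of Y = X², X ~ N(0,1), vs. the chi-square(1) density.
def f_Y(x):
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

def chi2_pdf(x, k=1):
    # generic chi-square density: x^{k/2-1} e^{-x/2} / (2^{k/2} Γ(k/2))
    return x ** (k / 2.0 - 1.0) * math.exp(-x / 2.0) / (2 ** (k / 2.0) * math.gamma(k / 2.0))

for x in (0.1, 0.5, 1.0, 3.0, 7.0):
    assert abs(f_Y(x) - chi2_pdf(x)) < 1e-12
print("f_Y equals the chi-square(1) density")
```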

Proposition 1.8.1. Let X be a random k-vector with a Lebesgue p.d.f. fX and let Y = g(X), where g is a Borel function from (Rᵏ, Bᵏ) to (Rᵏ, Bᵏ). Let A1, ..., Am be disjoint sets in Bᵏ such that Rᵏ − (A1 ∪ ··· ∪ Am) has Lebesgue measure 0 and g on Aj is one-to-one with a nonvanishing Jacobian, i.e., the determinant Det(∂g(x)/∂x) ≠ 0 on Aj, j = 1, ..., m. Then Y has the following Lebesgue p.d.f.:

fY(x) = ∑_{j=1}^m |Det(∂hj(x)/∂x)| fX(hj(x)),

where hj is the inverse function of g on Aj, j = 1, ..., m.

Note: in the previous example A1 = (−∞, 0), A2 = (0, ∞), g(x) = x², h1(x) = −√x, h2(x) = √x, and |dhj(x)/dx| = 1/(2√x). (Other examples in discussion.)

Example 1.8.3 (t-distribution and F-distribution). Let X1 and X2 be independent random variables having the chi-square distributions χ²_{n1} and χ²_{n2} (book Table 1.2), respectively. The p.d.f. of Z = X1/X2 is

fZ(z) = [z^{n1/2 − 1} I_{(0,∞)}(z) / (2^{(n1+n2)/2} Γ(n1/2) Γ(n2/2))] ∫_0^∞ x^{(n1+n2)/2 − 1} e^{−(1+z)x/2} dx

= [Γ((n1+n2)/2) / (Γ(n1/2) Γ(n2/2))] · z^{n1/2 − 1} (1 + z)^{−(n1+n2)/2} I_{(0,∞)}(z).

Using Proposition 1.8.1, one can show that the p.d.f. of Y = (X1/n1)/(X2/n2) = (n2/n1)Z is the p.d.f. of the F-distribution F_{n1,n2} given in Table 1.2 of the book.
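As a sanity check, the closed form for fZ should integrate to 1 over (0, ∞). A sketch with the arbitrary choice n1 = 3, n2 = 5, using the substitution z = u/(1−u) (so dz = du/(1−u)²) to map the half-line onto (0, 1):

```python
import math

# Verify numerically that the derived density f_Z integrates to 1.
n1, n2 = 3, 5
const = math.gamma((n1 + n2) / 2.0) / (math.gamma(n1 / 2.0) * math.gamma(n2 / 2.0))

def f_Z(z):
    return const * z ** (n1 / 2.0 - 1.0) / (1.0 + z) ** ((n1 + n2) / 2.0)

# midpoint rule on (0, 1) after the substitution z = u/(1-u)
M = 200000
total = 0.0
for i in range(M):
    u = (i + 0.5) / M
    z = u / (1.0 - u)
    total += f_Z(z) / (1.0 - u) ** 2 / M

assert abs(total - 1.0) < 1e-3
print("∫ f_Z dz ≈", round(total, 4))
```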

Remark 1.8.1. Let U1 be a random variable having the standard normal distribution N(0, 1) and U2 a random variable having the chi-square distribution χ²_n. Using the same argument, one can show that if U1 and U2 are independent, then the distribution of T = U1/√(U2/n) is the t-distribution t_n given in Table 1.2 of the text.

1.9 Noncentral chi-square distribution

Let X1, ..., Xn be independent random variables with Xi ~ N(μi, σ²), i = 1, ..., n. The distribution of Y = (X1² + ··· + Xn²)/σ² is called the noncentral chi-square distribution and denoted by χ²_n(δ), where δ = (μ1² + ··· + μn²)/σ² is the noncentrality parameter. χ²_n(δ) with δ = 0 is called a central chi-square distribution. It can be shown (exercise) that Y has the following Lebesgue p.d.f.:

e^{−δ/2} ∑_{j=0}^∞ [(δ/2)^j / j!] f_{2j+n}(x),

where f_k(x) is the Lebesgue p.d.f. of the chi-square distribution χ²_k. If Y1, ..., Yk are independent random variables and Yi has the noncentral chi-square distribution χ²_{ni}(δi), i = 1, ..., k, then Y = Y1 + ··· + Yk has the noncentral chi-square distribution χ²_{n1+···+nk}(δ1 + ··· + δk). In a similar manner one may define the noncentral t-distribution and F-distribution (in discussion).
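A Monte Carlo sketch of the construction: simulating Y = ∑ Xi²/σ² and checking its sample mean against the known moment E[Y] = n + δ. The means μ, the scale σ, the sample size, and the seed are all arbitrary choices:

```python
import random

# Monte Carlo check: with X_i ~ N(μ_i, σ²), Y = Σ X_i²/σ² ~ χ²_n(δ)
# with δ = Σ μ_i²/σ², and E[Y] = n + δ.
random.seed(0)
mu, sigma = [1.0, -2.0, 0.5], 2.0
n = len(mu)
delta = sum(m * m for m in mu) / sigma ** 2

N = 200000
acc = 0.0
for _ in range(N):
    acc += sum(random.gauss(m, sigma) ** 2 for m in mu) / sigma ** 2
mean = acc / N

assert abs(mean - (n + delta)) < 0.05     # E[Y] = n + δ
print("sample mean", round(mean, 3), "vs n + δ =", n + delta)
```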

Theorem 1.9.1 (Cochran). Suppose that X ~ N_n(μ, I_n) and

XᵀX = XᵀA1X + ··· + XᵀAkX,

where I_n is the n × n identity matrix and Ai is an n × n symmetric matrix with rank ni, i = 1, ..., k. A necessary and sufficient condition that XᵀAiX has the noncentral chi-square distribution χ²_{ni}(δi), i = 1, ..., k, and the XᵀAiX are independent is n = n1 + ··· + nk, in which case δi = μᵀAiμ and δ1 + ··· + δk = μᵀμ.
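The algebraic side of the theorem can be sketched for a simple decomposition of I_n into projection blocks: the ranks add to n, and the noncentrality parameters δi = μᵀAiμ add to μᵀμ. The choice n = 3, the diagonal projections, and μ are arbitrary illustrations:

```python
# Rank and noncentrality bookkeeping in Cochran's theorem for
# I_3 = A1 + A2 with diagonal projection matrices.
n = 3
A1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]    # rank-2 projection
A2 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]    # rank-1 projection
mu = [2.0, -1.0, 3.0]

def quad(A, v):
    # vᵀ A v
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

rank = lambda A: sum(A[i][i] for i in range(n))   # trace = rank for projections
assert rank(A1) + rank(A2) == n                   # n1 + n2 = n
d1, d2 = quad(A1, mu), quad(A2, mu)
assert abs((d1 + d2) - sum(m * m for m in mu)) < 1e-12   # δ1 + δ2 = μᵀμ
print("δ1 =", d1, " δ2 =", d2, " δ1 + δ2 = μᵀμ =", d1 + d2)
```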