



Taylor Approximation and the Delta Method
Suppose we observe $X_1, \ldots, X_n$ independent Bernoulli($p$) random variables. Typically, we are interested in $p$, but there is also interest in the parameter $\frac{p}{1-p}$, which is known as the odds. For example, if the outcomes of a medical treatment occur with $p = 2/3$, then the odds of getting better are $2:1$. Furthermore, if there is another treatment with success probability $r$, we might also be interested in the odds ratio $\frac{p}{1-p} \big/ \frac{r}{1-r}$, which gives the relative odds of one treatment over another.
If we wished to estimate $p$, we would typically estimate this quantity with the observed success proportion $\hat{p} = \sum_i X_i / n$. To estimate the odds, it then seems perfectly natural to use $\frac{\hat{p}}{1-\hat{p}}$ as an estimate for $\frac{p}{1-p}$. But whereas we know the variance of our estimator $\hat{p}$ is $p(1-p)/n$ (check this by computing $\mathrm{Var}(\hat{p})$), what is the variance of $\frac{\hat{p}}{1-\hat{p}}$? Or, how can we approximate its sampling distribution?
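For concreteness, here is a minimal simulation sketch of the estimators just described (the sample size $n = 100$ and the seed are arbitrary choices of ours, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2 / 3
x = rng.binomial(1, p, size=n)     # X_1, ..., X_n ~ Bernoulli(p)

p_hat = x.mean()                   # observed success proportion
odds_hat = p_hat / (1 - p_hat)     # natural estimate of the odds p/(1 - p)
print(p_hat, odds_hat)             # true odds are 2
```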
The Delta Method gives a technique for doing this and is based on using a Taylor series approximation.
Definition: If a function $g(x)$ has derivatives of order $r$, that is, $g^{(r)}(x) = \frac{d^r}{dx^r} g(x)$ exists, then for any constant $a$, the Taylor polynomial of order $r$ about $a$ is
$$T_r(x) = \sum_{k=0}^{r} \frac{g^{(k)}(a)}{k!} (x - a)^k.$$
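As a quick sanity check of the definition, here is a minimal Python sketch (the function name `taylor_poly` and the choice $g(x) = e^x$ are our own, purely for illustration):

```python
import math

def taylor_poly(derivs_at_a, a, x):
    """Evaluate T_r(x) = sum_{k=0}^r g^(k)(a)/k! * (x - a)^k,
    given the derivative values g^(0)(a), ..., g^(r)(a)."""
    return sum(d / math.factorial(k) * (x - a) ** k
               for k, d in enumerate(derivs_at_a))

# Example: g(x) = e^x about a = 0, where g^(k)(0) = 1 for every k.
r, x = 4, 0.5
approx = taylor_poly([1.0] * (r + 1), a=0.0, x=x)
print(approx, math.exp(x))         # 1.6484375 vs. 1.6487212707
```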
While the Taylor polynomial was introduced as far back as beginning calculus, the major theorem from Taylor is that the remainder from the approximation, namely $g(x) - T_r(x)$, tends to 0 faster than the highest-order term in $T_r(x)$.
Theorem: If $g^{(r)}(a) = \frac{d^r}{dx^r} g(x) \big|_{x=a}$ exists, then
$$\lim_{x \to a} \frac{g(x) - T_r(x)}{(x - a)^r} = 0.$$
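A quick numerical check of this limit, sketched for $g(x) = e^x$, $a = 0$, and $r = 2$ (the specific function and evaluation points are our own choices):

```python
import math

# g(x) = e^x, a = 0, r = 2: T_2(x) = 1 + x + x^2/2.
g = math.exp
T2 = lambda x: 1 + x + x * x / 2

for x in (0.1, 0.01, 0.001):
    print(x, (g(x) - T2(x)) / x ** 2)   # the ratio shrinks toward 0 as x -> a
```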
∗The material here is almost word for word from pp. 240–245 of Statistical Inference by George Casella and Roger L. Berger, and credit is really to them.
For the purposes of the Delta Method, we will only be considering $r = 1$. Furthermore, we will not be concerned with the remainder term since (1) we are interested in approximations, and (2) we will have a nice convergence result that says, from a probabilistic point of view, the remainder will vanish.
Let's now put the first-order Taylor polynomial to use from a statistical point of view. Let $T_1, \ldots, T_k$ be random variables with means $\theta_1, \ldots, \theta_k$, and define $T = (T_1, \ldots, T_k)$ and $\theta = (\theta_1, \ldots, \theta_k)$. Suppose there is a differentiable function $g(T)$ (say, an estimator of some parameter; in our motivating example, $T = \hat{p}$ and $g(p) = \frac{p}{1-p}$) for which we want an estimate of variance. Define the partial derivatives as
$$g_i'(\theta) = \frac{\partial}{\partial t_i} g(t) \Big|_{t_1 = \theta_1, \ldots, t_k = \theta_k},$$
where $t = (t_1, \ldots, t_k)$ is just an arbitrary point in $k$-dimensional space. The first-order Taylor series expansion (this actually comes from the multivariate version of the Taylor series, which will be addressed later) of $g$ about $\theta$ is
$$g(t) = g(\theta) + \sum_{i=1}^{k} g_i'(\theta)(t_i - \theta_i) + \text{Remainder}.$$
So far, we have done nothing special. Now, let’s turn this into a statistical approximation by bringing in T and dropping the remainder. This gives
$$g(T) \approx g(\theta) + \sum_{i=1}^{k} g_i'(\theta)(T_i - \theta_i). \qquad (1)$$
Continuing, let's take expectations on both sides (noticing that everything but the $T_i$ terms on the right-hand side is non-random) to get
$$\mathrm{E}\,g(T) \approx g(\theta) + \sum_{i=1}^{k} g_i'(\theta)\,\mathrm{E}(T_i - \theta_i) = g(\theta). \qquad (2)$$
We can also approximate the variance of $g(T)$ by
$$\begin{aligned}
\mathrm{Var}\,g(T) &\approx \mathrm{E}\big[(g(T) - g(\theta))^2\big] && \text{from Eq. (2)} \\
&\approx \mathrm{E}\Big[\Big(\sum_{i=1}^{k} g_i'(\theta)(T_i - \theta_i)\Big)^{2}\Big] && \text{from Eq. (1)} \\
&= \sum_{i=1}^{k} g_i'(\theta)^2\,\mathrm{Var}\,T_i + 2\sum_{i>j} g_i'(\theta)\,g_j'(\theta)\,\mathrm{Cov}(T_i, T_j), && (3)
\end{aligned}$$
where the last equality comes from expanding the square and using the definitions of variance and covariance. Notice that we have approximated the variance of our estimator $g(T)$ using only the variances and covariances of the $T_i$, which, if the problem is set up well, are not terribly difficult to compute or estimate. Let's now put this to work.
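Note that Eq. (3) is just the quadratic form $\nabla g(\theta)^\top \Sigma\, \nabla g(\theta)$, where $\Sigma$ is the covariance matrix of $T$. First, a minimal sketch of this computation for the odds example (the function name `delta_var` and the sample size $n = 100$ are our own, hypothetical choices):

```python
import numpy as np

def delta_var(grad, cov):
    """Eq. (3) as a quadratic form: sum_i g_i'(theta)^2 Var T_i
    + 2 sum_{i>j} g_i'(theta) g_j'(theta) Cov(T_i, T_j) = grad' Sigma grad."""
    grad = np.asarray(grad)
    return grad @ np.asarray(cov) @ grad

# Odds example: T = p_hat, g(p) = p/(1 - p), so g'(p) = 1/(1 - p)^2.
n, p = 100, 2 / 3
grad = [1 / (1 - p) ** 2]          # single partial derivative, k = 1
cov = [[p * (1 - p) / n]]          # Var(p_hat) = p(1 - p)/n
print(delta_var(grad, cov))        # approx 0.18
```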
Theorem: Let $Y_n$ be a sequence of random variables that satisfies $\sqrt{n}\,(Y_n - \theta) \to N(0, \sigma^2)$ in distribution. For a given function $g$ and a specific value of $\theta$, suppose that $g'(\theta)$ exists and is not 0. Then
$$\sqrt{n}\,(g(Y_n) - g(\theta)) \to N(0, \sigma^2 g'(\theta)^2)$$
in distribution.

Proof: The Taylor expansion of $g(Y_n)$ around $Y_n = \theta$ is
$$g(Y_n) = g(\theta) + g'(\theta)(Y_n - \theta) + \text{Remainder},$$
where the remainder $\to 0$ as $Y_n \to \theta$. From the assumption that $Y_n$ satisfies the standard CLT, we have $Y_n \to \theta$ in probability, so it follows that the remainder $\to 0$ in probability as well. Rearranging terms, we have
$$\sqrt{n}\,(g(Y_n) - g(\theta)) = g'(\theta)\sqrt{n}\,(Y_n - \theta) + \text{Remainder}.$$
Applying Slutsky's Theorem with $W_n = g'(\theta)\sqrt{n}\,(Y_n - \theta)$ and $Z_n$ as the remainder, we have the right-hand side converging to $N(0, \sigma^2 g'(\theta)^2)$, and thus the desired result follows.
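To see the theorem at work, here is a short Monte Carlo sketch for the odds function $g(p) = p/(1 - p)$ with $Y_n = \hat{p}$ (the sample size, seed, and replication count are arbitrary assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 500, 2 / 3, 20_000

g = lambda x: x / (1 - x)            # the odds function
g_prime = 1 / (1 - p) ** 2           # g'(p)
sigma2 = p * (1 - p)                 # Var X_1, so sqrt(n)(p_hat - p) -> N(0, sigma2)

p_hat = rng.binomial(n, p, size=reps) / n
z = np.sqrt(n) * (g(p_hat) - g(p))   # should be approx N(0, sigma2 * g'(p)^2)
print(z.var(), sigma2 * g_prime**2)  # both close to 18
```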
Before, we considered the case of just estimating $g(\mu)$ with $g(\bar{X})$. Suppose now we take an i.i.d. random sample $X_1, \ldots, X_n$ from a population to get a sample mean $\bar{X}_n$.¹ For $\mu \neq 0$, from the Delta Method we have
$$\sqrt{n}\left(\frac{1}{\bar{X}_n} - \frac{1}{\mu}\right) \to N\!\left(0, \frac{\mathrm{Var}\,X_1}{\mu^4}\right)$$
in distribution.
This is pretty good! But what if we don't know the variance of $X_1$? Furthermore, we're trying to estimate $1/\mu$, and the variance on the right-hand side requires knowledge of $\mu$. This actually poses no major problem, since we shall just estimate everything to get the approximate variance
$$\widehat{\mathrm{Var}}\left(\frac{1}{\bar{X}}\right) \approx \left(\frac{1}{\bar{X}}\right)^{4} \hat{\sigma}^2,$$
where $\hat{\sigma}^2$ is an estimate of the variance of $X_1$, say the sample variance. Now, we know that both $\bar{X}$ and $\hat{\sigma}^2$ are consistent estimators, in that $\bar{X} \to \mu$ and $\hat{\sigma}^2 \to \sigma^2$ in probability. Thus,
$$\left(\frac{1}{\bar{X}}\right)^{2} \hat{\sigma} \to \left(\frac{1}{\mu}\right)^{2} \sigma$$
in probability. This allows us to apply Slutsky's Theorem to get
$$\sqrt{n}\,\frac{\dfrac{1}{\bar{X}} - \dfrac{1}{\mu}}{\left(\dfrac{1}{\bar{X}}\right)^{2} \hat{\sigma}}
= \frac{\left(\dfrac{1}{\mu}\right)^{2} \sigma}{\left(\dfrac{1}{\bar{X}}\right)^{2} \hat{\sigma}}
\cdot \sqrt{n}\,\frac{\dfrac{1}{\bar{X}} - \dfrac{1}{\mu}}{\left(\dfrac{1}{\mu}\right)^{2} \sigma}
\to N(0, 1)$$
in distribution. It bears pointing out that the written form of the convergence has changed, since the parameters that were once in the limiting distribution are now estimates dependent on $n$. It would not make much sense to have convergence of $\sqrt{n}\left(\frac{1}{\bar{X}} - \frac{1}{\mu}\right)$ to a distribution with variance dependent on $n$.

¹ The statement of the Delta Method allows for great generality in the sequences $Y_n$ satisfying the CLT. This is because there are multiple forms of the CLT. Typically, the sample mean is used in these types of approximations, and from elementary probability we know the sample mean is one such sequence of random variables that satisfies the CLT.
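In practice, this plug-in recipe yields an approximate confidence interval for $1/\mu$. A minimal sketch (the exponential data, sample size, and seed are assumptions purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=400)   # assumed data; true mu = 2, so 1/mu = 0.5

xbar = x.mean()
s2 = x.var(ddof=1)                         # sample variance, consistent for Var X_1
est = 1 / xbar                             # point estimate of 1/mu
se = np.sqrt(s2 / xbar ** 4 / len(x))      # plug-in delta-method standard error
print(est, (est - 1.96 * se, est + 1.96 * se))   # approximate 95% CI for 1/mu
```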
3 Second-Order Delta Method
A natural question to ask is: in all the above work, what happens if $g'(\theta) = 0$? To answer this, we go back to the Taylor expansion. Using the notation from the Delta Method theorem, we add the second-order term to get
$$g(Y_n) = g(\theta) + g'(\theta)(Y_n - \theta) + \frac{g''(\theta)}{2}(Y_n - \theta)^2 + \text{Remainder}.$$
Since $g'(\theta) = 0$, this gives
$$g(Y_n) - g(\theta) = \frac{g''(\theta)}{2}(Y_n - \theta)^2 + \text{Remainder}.$$
Now, just as $\sqrt{n}\,(Y_n - \theta)/\sigma \to N(0, 1)$ in distribution, we also have
$$\frac{n(Y_n - \theta)^2}{\sigma^2} \to \chi_1^2$$
in distribution, where $\chi_1^2$ is a chi-squared random variable with 1 degree of freedom. This new convergence is all very natural because we are now dealing with a second-order term. The first-order approximation converged to a Gaussian random variable, so we could reasonably guess that the second-order term would converge to the square of a Gaussian, which just so happens to be a chi-squared random variable. In precise terms, we give the Second-Order Delta Method:
Theorem (Second-Order Delta Method): Let $Y_n$ be a sequence of random variables that satisfies $\sqrt{n}\,(Y_n - \theta) \to N(0, \sigma^2)$ in distribution. For a given function $g$ and a specific value of $\theta$, suppose that $g'(\theta) = 0$ and $g''(\theta)$ exists and is not 0. Then
$$n\,(g(Y_n) - g(\theta)) \to \sigma^2\,\frac{g''(\theta)}{2}\,\chi_1^2$$
in distribution.
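A quick simulation sketch of the chi-squared limit (our own illustrative choices: $g(x) = x^2$, $\theta = 0$, and standard normal data, so that $\sigma^2 g''(\theta)/2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20_000
g = lambda x: x ** 2                 # g'(0) = 0, g''(0) = 2
theta, sigma2 = 0.0, 1.0             # standard normal data: theta = 0, sigma^2 = 1

ybar = rng.standard_normal((reps, n)).mean(axis=1)   # reps copies of Y_n
w = n * (g(ybar) - g(theta))         # limit: sigma2 * g''(0)/2 * chi2_1 = chi2_1
print(w.mean(), w.var())             # chi2_1 has mean 1 and variance 2
```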
4 Multivariate Delta Method
We have actually already seen, in the approximation (3), the precursor to the multivariate extension of the Delta Method. We use an example to illustrate its usage.
Suppose $X$ and $Y$ are random variables with nonzero means $\mu_X$ and $\mu_Y$, respectively. The parametric function to be estimated is $g(\mu_X, \mu_Y) = \mu_X / \mu_Y$. It is straightforward to calculate
$$\frac{\partial}{\partial \mu_X} g(\mu_X, \mu_Y) = \frac{1}{\mu_Y}
\qquad \text{and} \qquad
\frac{\partial}{\partial \mu_Y} g(\mu_X, \mu_Y) = -\frac{\mu_X}{\mu_Y^2}.$$
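Plugging these partials into the quadratic form of Eq. (3) gives the approximate variance of the ratio estimator $\bar{X}/\bar{Y}$. A minimal sketch (the means, the numbers in `cov`, and $n$ are purely illustrative assumptions):

```python
import numpy as np

# g(mu_X, mu_Y) = mu_X / mu_Y, evaluated at assumed means.
mu_x, mu_y, n = 1.0, 2.0, 100
grad = np.array([1 / mu_y, -mu_x / mu_y ** 2])   # the two partials computed above
cov = np.array([[1.0, 0.3],                      # covariance matrix of (X, Y);
                [0.3, 2.0]])                     # numbers chosen for illustration

var_ratio = grad @ cov @ grad / n                # approximate Var(Xbar / Ybar)
print(var_ratio)                                 # 0.003 with these numbers
```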