Sharp large deviation results for sums of independent random variables
Xiequan Fan a,b,∗, Ion Grama c, Quansheng Liu c,d
a Regularity Team, INRIA Saclay, Palaiseau, 91120, France
b MAS Laboratory, Ecole Centrale Paris, 92295 Châtenay-Malabry, France
c LMBA, UMR 6205, Univ. Bretagne-Sud, Vannes, 56000, France
d College of Mathematics and Computing Science, Changsha University of Science and Technology, Changsha, 410004, China
Abstract
We show sharp bounds for probabilities of large deviations for sums of independent random variables satisfying Bernstein's condition. One such bound is very close to the tail of the standard Gaussian law in certain cases; other bounds improve the inequalities of Bennett and Hoeffding by adding missing factors in the spirit of Talagrand (1995). We also complete Talagrand's inequality by giving a lower bound of the same form, leading to an equality. As a consequence, we obtain large deviation expansions similar to those of Cramér (1938), Bahadur-Rao (1960) and Sakhanenko (1991). We also show that our bound can be used to improve a recent inequality of Pinelis (2014).

Keywords: Bernstein's inequality, sharp large deviations, Cramér large deviations, expansion of Bahadur-Rao, sums of independent random variables, Bennett's inequality, Hoeffding's inequality

2000 MSC: primary 60G50; 60F10; secondary 60E15, 60F05
1. Introduction
Let ξ_1, ..., ξ_n be a finite sequence of independent centered random variables (r.v.'s). Denote by
\[
S_n = \sum_{i=1}^n \xi_i \quad \text{and} \quad \sigma^2 = \sum_{i=1}^n E[\xi_i^2]. \tag{1}
\]
Starting from the seminal work of Cramér [13] and Bernstein [10], the estimation of the tail probabilities P(S_n > x), for large x > 0, has attracted much attention. Various precise inequalities and asymptotic results have been established by Hoeffding [25], Nagaev [32], Saulis and Statulevicius [41], Chaganty and Sethuraman [12] and Petrov [35] in various settings.
Assume that (ξ_i)_{i=1,...,n} satisfies Bernstein's condition
\[
\big|E[\xi_i^k]\big| \le \frac{1}{2}\, k!\, \varepsilon^{k-2}\, E[\xi_i^2], \quad \text{for } k \ge 3 \text{ and } i = 1, ..., n, \tag{2}
\]
for some constant ε > 0. By employing the exponential Markov inequality and an upper bound for the moment generating function E[e^{λξ_i}], Bernstein [10] (see also Bennett [3]) obtained the following inequalities: for all x ≥ 0,
\[
P(S_n > x\sigma) \le \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sigma)}] \tag{3}
\]
Corresponding author.
E-mail: fanxiequan@hotmail.com (X. Fan), ion.grama@univ-ubs.fr (I. Grama),
quansheng.liu@univ-ubs.fr (Q. Liu).
Preprint submitted to Elsevier August 28, 2015

\[
\le B\Big(x, \frac{\varepsilon}{\sigma}\Big) := \exp\Big\{-\frac{\hat{x}^2}{2}\Big\} \le \exp\Big\{-\frac{x^2}{2(1 + x\varepsilon/\sigma)}\Big\}, \tag{4}
\]
where
\[
\hat{x} = \frac{2x}{1 + \sqrt{1 + 2x\varepsilon/\sigma}}\,;
\]
see also van de Geer and Lederer [47] for a new method based on the Bernstein-Orlicz norm, and Rio [40]. Some extensions of the inequalities of Bernstein and Bennett can be found in van de Geer [46] and de la Peña [14] for martingales; see also Rio [38, 39] and Bousquet [11] for empirical processes with r.v.'s bounded from above.
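As a quick numerical cross-check of the two bounds above, the sketch below (our own illustration, not from the paper; the helper names `B` and `bernstein` are ours) evaluates B(x, ε/σ) = exp{−x̂²/2} and the weaker classical form exp{−x²/(2(1+xε/σ))} and confirms the stated ordering:

```python
import math

def B(x, r):
    """Bennett-type bound B(x, r) with r = eps/sigma: exp(-xhat^2 / 2),
    where xhat = 2x / (1 + sqrt(1 + 2*x*r))."""
    xhat = 2 * x / (1 + math.sqrt(1 + 2 * x * r))
    return math.exp(-xhat ** 2 / 2)

def bernstein(x, r):
    """The weaker classical form exp(-x^2 / (2*(1 + x*r)))."""
    return math.exp(-x ** 2 / (2 * (1 + x * r)))

# B(x, r) never exceeds the classical form, and both lie in (0, 1]
for r in (0.01, 0.1, 1.0):
    for x in (0.5, 1.0, 5.0, 20.0):
        assert 0.0 < B(x, r) <= bernstein(x, r) <= 1.0
```

The ordering follows from (1 + √(1+2xε/σ))² ≤ 4(1 + xε/σ), so x̂²/2 ≥ x²/(2(1+xε/σ)).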

Since lim_{ε/σ→0} P(S_n > xσ) = 1 − Φ(x) and lim_{ε/σ→0} B(x, ε/σ) = e^{−x²/2}, where
\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt
\]
is the standard normal distribution function, the central limit theorem (CLT) suggests that Bennett's inequality (4) can be substantially refined by adding the factor
\[
M(x) = \big(1 - \Phi(x)\big) \exp\Big\{\frac{x^2}{2}\Big\},
\]
where √(2π) M(x) is known as Mills' ratio. It is known that M(x) is of order 1/x as x → ∞, and much effort has gone into recovering a factor of this order. Certain factors of order 1/x have been recovered by using the following inequality: for some α > 1,
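The missing factor M(x) is easy to evaluate via the complementary error function; the following sketch (ours, with the hypothetical helper `M`) checks that M(0) = 1/2 and that M(x) is of order 1/x for large x:

```python
import math

def M(x):
    """M(x) = (1 - Phi(x)) * exp(x^2 / 2); sqrt(2*pi) * M(x) is Mills' ratio."""
    return 0.5 * math.erfc(x / math.sqrt(2)) * math.exp(x * x / 2)

assert abs(M(0.0) - 0.5) < 1e-12                 # 1 - Phi(0) = 1/2
# M(x) is of order 1/x: sqrt(2*pi) * x * M(x) -> 1 as x -> infinity
assert abs(math.sqrt(2 * math.pi) * 20 * M(20.0) - 1.0) < 0.01
```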

\[
P(S_n \ge x\sigma) \le \inf_{t < x\sigma} E\bigg[\frac{\big((S_n - t)^+\big)^\alpha}{\big((x\sigma - t)^+\big)^\alpha}\bigg],
\]
where x^+ = max{x, 0}; see Eaton [17], Bentkus [4], Pinelis [36] and Bentkus et al. [7]. Some bounds on tail probabilities of the type

\[
P(S_n \ge x\sigma) \le C\, \big(1 - \Phi(x)\big), \tag{7}
\]
where C > 1 is an absolute constant, have been obtained for sums of weighted Rademacher r.v.'s; see Bentkus [4]. In particular, Bentkus and Dzindzalieta [6] proved that
\[
C = \frac{1}{4\big(1 - \Phi(\sqrt{2})\big)} \approx 3.18
\]
is sharp in (7). When the summands ξ_i are bounded from above, results of this type have been obtained by Talagrand [45], Bentkus [5] and Pinelis [37]. Using the conjugate measure technique, Talagrand (cf. Theorems 1.1 and 3.3 of [45]) proved that if the r.v.'s satisfy ξ_i ≤ 1 and |ξ_i| ≤ b for a constant b > 0 and all i = 1, ..., n, then there exists a universal constant K such that, for all 0 ≤ x ≤ σ/(Kb),

\[
P(S_n > x\sigma) \le \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sigma)}] \Big(M(x) + K\,\frac{b}{\sigma}\Big) \tag{8}
\]
\[
\le H_n(x, \sigma) \Big(M(x) + K\,\frac{b}{\sigma}\Big), \tag{9}
\]

where
\[
H_n(x, \sigma) = \bigg\{ \Big(\frac{\sigma}{x + \sigma}\Big)^{x\sigma + \sigma^2} \Big(\frac{n}{n - x\sigma}\Big)^{n - x\sigma} \bigg\}^{\frac{n}{n + \sigma^2}}.
\]

The interesting feature of the bound (10) is that it decays exponentially to 0 and also closely recovers the shape of the standard normal tail 1 − Φ(x) when r = ε/σ becomes small, which is not the case for Bennett's bound B(x, ε/σ) or the Berry-Esseen bound
\[
P(S_n > x\sigma) \le 1 - \Phi(x) + C\,\frac{\varepsilon}{\sigma}.
\]
Our result can be compared with Cramér's large deviation result in the i.i.d. case (cf. (34)). With respect to Cramér's result, the advantage of (10) is that it is valid for all x ≥ 0. Notice that Theorem 2.1 improves Bennett's bound only for moderate x. A further significant improvement of Bennett's inequality (4) for all x ≥ 0 is given by the following theorem. We replace Bennett's bound B(x, ε/σ) by the following smaller one:

\[
B_n\Big(x, \frac{\varepsilon}{\sigma}\Big) = B\Big(x, \frac{\varepsilon}{\sigma}\Big) \exp\bigg\{-n\,\psi\Big(\frac{\hat{x}^2}{2n}\sqrt{1 + 2x\varepsilon/\sigma}\Big)\bigg\},
\]
where ψ(t) = t − log(1 + t) is a nonnegative convex function in t ≥ 0.
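The claimed properties of ψ (nonnegativity and convexity on t ≥ 0) can be spot-checked numerically; a small sketch of ours, assuming only the displayed formula ψ(t) = t − log(1+t):

```python
import math

def psi(t):
    """psi(t) = t - log(1 + t)."""
    return t - math.log1p(t)

assert psi(0.0) == 0.0
assert all(psi(i / 100) >= 0 for i in range(500))    # nonnegative on t >= 0
for a, b in [(0.0, 1.0), (0.5, 3.0), (1.0, 4.0)]:    # midpoint convexity
    assert psi((a + b) / 2) <= (psi(a) + psi(b)) / 2
```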

Theorem 2.2. For all x ≥ 0,
\[
P(S_n > x\sigma) \le B_n\Big(x, \frac{\varepsilon}{\sigma}\Big)\, F_2\Big(x, \frac{\varepsilon}{\sigma}\Big) \le B_n\Big(x, \frac{\varepsilon}{\sigma}\Big), \tag{13}
\]
where
\[
F_2\Big(x, \frac{\varepsilon}{\sigma}\Big) = M(x) + 27.99\, R(x\varepsilon/\sigma)\, \frac{\varepsilon}{\sigma}
\]
and
\[
R(t) = \begin{cases} \dfrac{(1 - t + 6t^2)^3}{(1 - 3t)^{3/2}\,(1 - t)^7}, & \text{if } 0 \le t < \tfrac{1}{3}, \\[1ex] \infty, & \text{if } t \ge \tfrac{1}{3}, \end{cases} \tag{16}
\]
is an increasing function. Moreover, for all 0 ≤ x ≤ ασ/ε with 0 ≤ α < 1/3, it holds that R(xε/σ) ≤ R(α). If α = 0.1, then 27.99 R(α) ≤ 88.41.
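The function R and the theorem's numerical claims (monotonicity on [0, 1/3) and 27.99 R(0.1) ≤ 88.41) can be verified directly; a sketch of ours:

```python
import math

def R(t):
    """R(t) = (1 - t + 6 t^2)^3 / ((1 - 3 t)^(3/2) * (1 - t)^7) for 0 <= t < 1/3."""
    if t >= 1 / 3:
        return math.inf
    return (1 - t + 6 * t * t) ** 3 / ((1 - 3 * t) ** 1.5 * (1 - t) ** 7)

ts = [i / 1000 for i in range(333)]
assert all(R(a) <= R(b) for a, b in zip(ts, ts[1:]))   # increasing on [0, 1/3)
assert 27.99 * R(0.1) <= 88.41                         # the theorem's numerical claim
assert R(0.4) == math.inf
```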

To highlight the improvement of Theorem 2.2 over Bennett's bound, we note that B_n(x, ε/σ) ≤ B(x, ε/σ) and that, in the i.i.d. case (or, more generally, when ε/σ = c_0/√n for some constant c_0 > 0),
\[
B_n\Big(\sqrt{n}\,x, \frac{\varepsilon}{\sigma}\Big) = B\Big(\sqrt{n}\,x, \frac{\varepsilon}{\sigma}\Big) \exp\{-c_x\, n\}, \tag{17}
\]
where c_x > 0, x > 0, does not depend on n. Thus Bennett's bound is strengthened by an added factor exp{−c_x n}, n → ∞, which is similar to Hoeffding's improvement of Bennett's bound for sums of bounded r.v.'s [25]. The second improvement in the right-hand side of (13) comes from the missing factor F_2(x, ε/σ), which is of order M(x)[1 + o(1)] for moderate values of x satisfying 0 ≤ x = o(σ/ε), ε/σ → 0. This improvement is similar to Talagrand's refinement of Hoeffding's upper bound H_n(x, σ) by the factor F_1(x, b/σ); see (9). The numerical values of the missing factor F_2(x, ε/σ) are displayed in Figure 1. Our numerical results confirm that the bound B_n(x, ε/σ)F_2(x, ε/σ) in (13) is better than Bennett's bound B(x, ε/σ) for all x ≥ 0. For the convenience of the reader, we display the ratios of B_n(x, r)F_2(x, r) to B(x, r) in Figure 2 for various r = 1/√n.

The following corollary improves inequality (10) of Theorem 2.1 in the range 0 ≤ x ≤ ασ/ε with 0 ≤ α < 1/3. It corresponds to taking δ = 0 in the definition (11) of x̃.

Figure 1: The missing factor F_2(x, r) is displayed as a function of x for various values of r = ε/σ.

Figure 2: Ratio of B_n(x, r)F_2(x, r) to B(x, r) as a function of x for various values of r = ε/σ = 1/√n.

|θ| ≤ 1 and R(t) is defined by (16). Moreover, inf_{λ≥0} E[e^{λ(S_n−xσ)}] ≤ B(x, ε/σ). In particular, in the i.i.d. case, we have the following non-uniform Berry-Esseen type bound: for all 0 ≤ x = o(√n),
\[
\Big| P(S_n > x\sigma) - M(x) \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sigma)}] \Big| \le \frac{C}{\sqrt{n}}\, B\Big(x, \frac{\varepsilon}{\sigma}\Big). \tag{24}
\]

Theorem 2.4 holds also for ξ_i's bounded from above. In this case the term 27.99 θR(4xε/σ) can be significantly refined; see [18]. In particular, if |ξ_i| ≤ ε, then 27.99 θR(4xε/σ) can be improved to 3.08. However, under the stated condition of Theorem 2.4, the term 27.99 θR(4xε/σ) cannot be improved significantly. When Bernstein's condition fails, we refer to Theorem 3.1 of Saulis and Statulevicius [41], where explicit and asymptotic expansions have been established via the Cramér series (cf. Petrov [34] for details). When the Bernstein condition holds, their result reduces to the result of Cramér [13]; however, they gave no explicit information on the term corresponding to our term 27.99 θR(4xε/σ). Equality (22) shows that inf_{λ≥0} E[e^{λ(S_n−xσ)}] is the best possible exponentially decreasing rate for tail probabilities. It reveals the missing factor F_3 in Bernstein's bound (3) (and thus in many other classical bounds, such as those of Hoeffding, Bennett and Bernstein). Since θ ≥ −1, equality (22) completes Talagrand's upper bound (8) by giving a sharp lower bound. If the ξ_i are bounded from above, ξ_i ≤ 1, it holds that inf_{λ≥0} E[e^{λ(S_n−xσ)}] ≤ H_n(x, σ) (cf. [25]); therefore (22) implies Talagrand's inequality (9). A precise large deviation expansion, as sharp as (22), can be found in Sakhanenko [43] (see also Györfi, Harremoës and Tusnády [24]). In his paper, Sakhanenko proved an equality similar to (22) in the narrower range 0 ≤ x ≤ σ/(200ε):

\[
\Big| P(S_n > x\sigma) - \big(1 - \Phi(t_x)\big) \Big| \le C\,\frac{\varepsilon}{\sigma}\, e^{-t_x^2/2}, \tag{25}
\]
where
\[
t_x = \sqrt{-2 \ln \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sigma)}]}
\]
is a value depending on the distribution of S_n and satisfying |t_x − x| = O(x²ε/σ), ε/σ → 0, for moderate x's. It is worth noting that from Sakhanenko's result, we find that the inequalities (24) and (27) hold also if M(x) is replaced by M(t_x). Using the two-sided bound
\[
\frac{1}{\sqrt{2\pi}\,(1 + t)} \le M(t) \le \sqrt{\frac{2}{\pi}}\,\frac{1}{1 + t}, \quad t \ge 0, \tag{26}
\]

and
\[
M(t) \sim \frac{1}{\sqrt{2\pi}\,(1 + t)}, \quad t \to \infty
\]
(see p. 17 in Itô and McKean [22] or Talagrand [45]), equality (22) implies that the relative error between P(S_n > xσ) and M(x) inf_{λ≥0} E[e^{λ(S_n−xσ)}] converges to 0 uniformly in the range 0 ≤ x = o(σ/ε) as ε/σ → 0, i.e.
\[
P(S_n > x\sigma) = M(x) \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sigma)}]\, \big(1 + o(1)\big). \tag{27}
\]

Expansion (27) extends the following Cramér large deviation expansion: for 0 ≤ x = o((σ/ε)^{1/3}) as σ/ε → ∞,
\[
P(S_n > x\sigma) = \big(1 - \Phi(x)\big)\big[1 + o(1)\big]. \tag{28}
\]

To give an idea of the precision of expansion (27), we plot the ratio
\[
\mathrm{Ratio}(x, n) = \frac{P(S_n \ge x\sqrt{n})}{M(x) \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sqrt{n})}]}
\]
in Figure 3 for sums of Rademacher r.v.'s, P(ξ_i = −1) = P(ξ_i = 1) = 1/2. From these plots we see that the error in (27) becomes smaller as n increases.
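For Rademacher summands, Ratio(x, n) can also be computed exactly rather than read off a plot; the sketch below (our illustration; `ratio` and `M` are our hypothetical helpers) uses the binomial tail for P(S_n ≥ x√n) and the closed-form minimiser λ = atanh(x/√n) of cosh(λ)^n e^{−λx√n}:

```python
import math

def M(x):
    """M(x) = (1 - Phi(x)) * exp(x^2 / 2)."""
    return 0.5 * math.erfc(x / math.sqrt(2)) * math.exp(x * x / 2)

def ratio(x, n):
    """Ratio(x, n) = P(S_n >= x sqrt(n)) / (M(x) inf_l E[exp(l (S_n - x sqrt(n)))]),
    computed exactly for S_n = 2K - n with K ~ Binomial(n, 1/2)."""
    t = x * math.sqrt(n)
    tail = sum(math.comb(n, k) for k in range(math.ceil((n + t) / 2), n + 1)) / 2 ** n
    l = math.atanh(x / math.sqrt(n))          # minimiser of cosh(l)^n * e^{-l t}
    inf_mgf = math.exp(n * math.log(math.cosh(l)) - l * t)
    return tail / (M(x) * inf_mgf)

# the ratio is already fairly close to 1 for moderate x and n (cf. Figure 3)
assert 0.5 < ratio(1.0, 100) < 2.0
```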

Figure 3: The ratio Ratio(x, n) = P(S_n ≥ x√n) / (M(x) inf_{λ≥0} E[e^{λ(S_n−x√n)}]) is displayed as a function of x for n = 49, 100 and 1000, for sums of Rademacher r.v.'s.

3. Some comparisons

3.1. Comparison with a recent inequality of Pinelis

In this subsection, we show that Theorem 2.4 can be used to improve a recent upper bound on tail probabilities due to Pinelis [37]. For simplicity of notation, we assume that ξ_i ≤ 1 and only consider the i.i.d. case; for other cases the argument is similar. Let us recall the notation of Pinelis. Denote by Γ_{a²} the normal r.v. with mean 0 and variance a² > 0, and by Π_θ the Poisson r.v. with parameter θ > 0. Let also Π̃_θ ~ Π_θ − θ. Denote by
\[
\delta = \frac{\sum_{i=1}^n E[(\xi_i^+)^3]}{\sigma^2}.
\]
Then it is obvious that δ ∈ (0, 1). Pinelis (cf. Corollary 2.2 of [37]) proved that, for all y ≥ 0,
\[
P(S_n > y) \le \frac{2e^3}{9}\, P^{LC}\big(\Gamma_{(1-\delta)\sigma^2} + \tilde{\Pi}_{\delta\sigma^2} > y\big), \tag{30}
\]
where, for any r.v. ζ, the function P^{LC}(ζ > y) denotes the least log-concave majorant of the tail function P(ζ > y), so that P^{LC}(ζ > y) ≥ P(ζ > y). By the remark of Pinelis, inequality (30) refines the Bennett-Hoeffding inequality by adding a factor of order 1/x in a certain range. By Theorem 2.4 and some simple calculations, we find that, for all 0 ≤ y = o(n),
\[
P^{LC}\big(\Gamma_{(1-\delta)\sigma^2} + \tilde{\Pi}_{\delta\sigma^2} > y\big)
\]

Cramér [13] (see also Theorem 3.1 of Saulis and Statulevicius [41] for more general results) proved that, for all 0 ≤ x = o(√n),
\[
\frac{P(S_n > x\sigma)}{1 - \Phi(x)} = \exp\bigg\{\frac{x^3}{\sqrt{n}}\, \lambda\Big(\frac{x}{\sqrt{n}}\Big)\bigg\} \bigg[1 + O\Big(\frac{1 + x}{\sqrt{n}}\Big)\bigg], \quad n \to \infty, \tag{34}
\]
where λ(·) is the Cramér series. So the good rate function and the Cramér series are related by
\[
n\, \Lambda^*\Big(\frac{x}{\sqrt{n}}\Big) = \frac{x^2}{2} - \frac{x^3}{\sqrt{n}}\, \lambda\Big(\frac{x}{\sqrt{n}}\Big).
\]
Second, consider the large deviation probabilities P(S_n/n > y). Since S_n/n → 0 a.s. as n → ∞, we only place emphasis on the case where y is a small positive constant. Bahadur and Rao proved that, for a given positive constant y,

\[
P\Big(\frac{S_n}{n} \ge y\Big) = \frac{e^{-n \Lambda^*(y)}}{\sigma_{1y}\, t_y\, \sqrt{2\pi n}} \Big[1 + O\Big(\frac{c_y}{n}\Big)\Big], \quad n \to \infty, \tag{35}
\]
where c_y, σ_{1y} and t_y depend on y and the distribution of ξ_1; see also Bercu [8, 9], Rozovsky [31] and Györfi, Harremoës and Tusnády [24] for more general results. Our bound (22) implies that, for y ≥ 0 small enough,
\[
P\Big(\frac{S_n}{n} \ge y\Big) = e^{-n \Lambda^*(y)}\, M(y\sqrt{n}) \Big[1 + O\Big(y + \frac{1}{\sqrt{n}}\Big)\Big]. \tag{36}
\]

In particular, when 0 < y = y(n) → 0 and y√n → ∞ as n → ∞, we have
\[
P\Big(\frac{S_n}{n} \ge y\Big) = \frac{e^{-n \Lambda^*(y)}}{y\sqrt{2\pi n}}\, \big[1 + o(1)\big], \quad n \to \infty. \tag{37}
\]
Expansion (36) or (37) is less precise than (35). However, the advantage of the expansions (36) and (37) over the Bahadur-Rao expansion (35) is that they are uniform in y (where y may depend on n), in addition to having simpler expressions (without the factors t_y and σ_{1y}).
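For Rademacher r.v.'s the rate function is Λ*(y) = ((1+y)/2)log(1+y) + ((1−y)/2)log(1−y), so expansion (37) can be compared against the exact binomial tail; a rough numerical sketch of ours (loose tolerance, since (37) is only asymptotic):

```python
import math

def rate(y):
    """Rademacher rate function Lambda*(y)."""
    return (1 + y) / 2 * math.log1p(y) + (1 - y) / 2 * math.log1p(-y)

n, y = 1000, 0.1
m = round(y * n)   # S_n >= m  <=>  K >= ceil((n + m)/2) for S_n = 2K - n, K ~ Bin(n, 1/2)
exact = sum(math.comb(n, k) for k in range((n + m + 1) // 2, n + 1)) / 2 ** n
approx = math.exp(-n * rate(y)) / (y * math.sqrt(2 * math.pi * n))   # expansion (37)
assert 0.5 < exact / approx < 2.0
```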

4. Auxiliary results

We consider the positive r.v.
\[
Z_n(\lambda) = \prod_{i=1}^n \frac{e^{\lambda \xi_i}}{E[e^{\lambda \xi_i}]}, \quad |\lambda| < \varepsilon^{-1}
\]
(the Esscher transformation), so that E[Z_n(λ)] = 1. We introduce the conjugate probability measure P_λ defined by
\[
dP_\lambda = Z_n(\lambda)\, dP. \tag{38}
\]
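The defining properties of the conjugate (Esscher) measure — E[Z_n(λ)] = 1 and b_i(λ) = E_λ[ξ_i] — can be illustrated on a single Rademacher variable, where b(λ) = tanh(λ); a small sketch of ours:

```python
import math

values, probs = [-1.0, 1.0], [0.5, 0.5]     # a single Rademacher xi
l = 0.3

mgf = sum(p * math.exp(l * v) for v, p in zip(values, probs))        # E[e^{l xi}]
z = [math.exp(l * v) / mgf for v in values]                          # dP_l / dP on each atom
assert abs(sum(p * zi for p, zi in zip(probs, z)) - 1.0) < 1e-12     # E[Z(l)] = 1

b = sum(p * v * math.exp(l * v) for v, p in zip(values, probs)) / mgf
assert abs(b - math.tanh(l)) < 1e-12                                 # b(l) = E_l[xi] = tanh(l)
```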

Denote by E_λ the expectation with respect to P_λ. Setting
\[
b_i(\lambda) = E_\lambda[\xi_i] = \frac{E[\xi_i e^{\lambda \xi_i}]}{E[e^{\lambda \xi_i}]}, \quad i = 1, ..., n,
\]
and
\[
\eta_i(\lambda) = \xi_i - b_i(\lambda), \quad i = 1, ..., n,
\]
we obtain the following decomposition:
\[
S_k = T_k(\lambda) + Y_k(\lambda), \quad k = 1, ..., n, \tag{39}
\]
where
\[
T_k(\lambda) = \sum_{i=1}^k b_i(\lambda) \quad \text{and} \quad Y_k(\lambda) = \sum_{i=1}^k \eta_i(\lambda).
\]
In the following, we give some lower and upper bounds for T_n(λ), which will be used in the proofs of the theorems.

Lemma 4.1. For all 0 ≤ λ < ε^{−1},
\[
(1 - 2.4\lambda\varepsilon)\,\lambda\sigma^2 \le \frac{(1 - 1.5\lambda\varepsilon)(1 - \lambda\varepsilon)}{1 - \lambda\varepsilon + 6\lambda^2\varepsilon^2}\, \lambda\sigma^2 \le T_n(\lambda) \le \frac{1 - 0.5\lambda\varepsilon}{(1 - \lambda\varepsilon)^2}\, \lambda\sigma^2.
\]

Proof. Since E[ξ_i] = 0, by Jensen's inequality, we have E[e^{λξ_i}] ≥ 1. Noting that
\[
E[\xi_i e^{\lambda \xi_i}] = E[\xi_i (e^{\lambda \xi_i} - 1)] \ge 0, \quad \lambda \ge 0,
\]
by Taylor's expansion of e^x, we get
\[
T_n(\lambda) \le \sum_{i=1}^n E[\xi_i e^{\lambda \xi_i}] = \lambda\sigma^2 + \sum_{i=1}^n \sum_{k=2}^{\infty} \frac{\lambda^k}{k!}\, E[\xi_i^{k+1}]. \tag{40}
\]

Using Bernstein's condition (2), we obtain, for all 0 ≤ λ < ε^{−1},
\[
\sum_{i=1}^n \sum_{k=2}^{\infty} \frac{\lambda^k}{k!}\, \big|E[\xi_i^{k+1}]\big| \le \frac{\lambda^2 \sigma^2 \varepsilon}{2} \sum_{k=2}^{\infty} (k+1)(\lambda\varepsilon)^{k-2} = \frac{3 - 2\lambda\varepsilon}{2(1 - \lambda\varepsilon)^2}\, \lambda^2 \sigma^2 \varepsilon. \tag{41}
\]

Combining (40) and (41), we get the desired upper bound of T_n(λ). By Jensen's inequality and Bernstein's condition (2),
\[
(E[\xi_i^2])^2 \le E[\xi_i^4] \le 12\varepsilon^2 E[\xi_i^2],
\]
from which we get E[ξ_i²] ≤ 12ε². Using again Bernstein's condition (2), we have, for all 0 ≤ λ < ε^{−1},
\[
E[e^{\lambda \xi_i}] \le 1 + \sum_{k=2}^{\infty} \frac{\lambda^k}{k!}\, \big|E[\xi_i^k]\big| \le 1 + \frac{\lambda^2 E[\xi_i^2]}{2(1 - \lambda\varepsilon)} \le 1 + \frac{6\lambda^2\varepsilon^2}{1 - \lambda\varepsilon} = \frac{1 - \lambda\varepsilon + 6\lambda^2\varepsilon^2}{1 - \lambda\varepsilon}. \tag{42}
\]

Notice that g(t) = e^t − (1 + t + (1/2)t²) satisfies g(t) > 0 if t > 0 and g(t) < 0 if t < 0, which leads to t g(t) ≥ 0 for all t ∈ R. That is, t e^t ≥ t(1 + t + (1/2)t²) for all t ∈ R. Therefore, for all 0 ≤ λ < ε^{−1},
\[
\xi_i e^{\lambda \xi_i} \ge \xi_i \Big(1 + \lambda \xi_i + \frac{\lambda^2 \xi_i^2}{2}\Big).
\]
Taking expectations, we get
\[
E[\xi_i e^{\lambda \xi_i}] \ge \lambda E[\xi_i^2] + \frac{\lambda^2}{2}\, E[\xi_i^3] \ge \lambda E[\xi_i^2] - \frac{\lambda^2}{2} \cdot \frac{1}{2}\, 3!\, \varepsilon\, E[\xi_i^2] = (1 - 1.5\lambda\varepsilon)\,\lambda E[\xi_i^2].
\]
Dividing by the bound (42) on E[e^{λξ_i}] and summing over i gives the lower bound, which completes the proof of the second assertion of the lemma. □

Denote σ²(λ) = E_λ[Y_n²(λ)]. By the relation between E and E_λ, we have
\[
\sigma^2(\lambda) = \sum_{i=1}^n \bigg( \frac{E[\xi_i^2 e^{\lambda \xi_i}]}{E[e^{\lambda \xi_i}]} - \frac{(E[\xi_i e^{\lambda \xi_i}])^2}{(E[e^{\lambda \xi_i}])^2} \bigg), \quad 0 \le \lambda < \varepsilon^{-1}.
\]

Lemma 4.3. For all 0 ≤ λ < ε^{−1},
\[
\frac{(1 - \lambda\varepsilon)^2 (1 - 3\lambda\varepsilon)}{(1 - \lambda\varepsilon + 6\lambda^2\varepsilon^2)^2}\, \sigma^2 \le \sigma^2(\lambda) \le \frac{\sigma^2}{(1 - \lambda\varepsilon)^3}.
\]

Proof. Denote f(λ) = E[ξ_i² e^{λξ_i}] E[e^{λξ_i}] − (E[ξ_i e^{λξ_i}])². Then
\[
f'(0) = E[\xi_i^3] \quad \text{and} \quad f''(\lambda) = E[\xi_i^4 e^{\lambda \xi_i}]\, E[e^{\lambda \xi_i}] - (E[\xi_i^2 e^{\lambda \xi_i}])^2 \ge 0.
\]
Thus,
\[
f(\lambda) \ge f(0) + f'(0)\,\lambda = E[\xi_i^2] + \lambda E[\xi_i^3]. \tag{48}
\]
Using (48), (42) and Bernstein's condition (2), we have, for all 0 ≤ λ < ε^{−1},
\[
E_\lambda[\eta_i^2] = \frac{E[\xi_i^2 e^{\lambda \xi_i}]\, E[e^{\lambda \xi_i}] - (E[\xi_i e^{\lambda \xi_i}])^2}{(E[e^{\lambda \xi_i}])^2} \ge \frac{E[\xi_i^2] + \lambda E[\xi_i^3]}{(E[e^{\lambda \xi_i}])^2} \ge \Big(\frac{1 - \lambda\varepsilon}{1 - \lambda\varepsilon + 6\lambda^2\varepsilon^2}\Big)^2 \big(E[\xi_i^2] + \lambda E[\xi_i^3]\big) \ge \frac{(1 - \lambda\varepsilon)^2 (1 - 3\lambda\varepsilon)}{(1 - \lambda\varepsilon + 6\lambda^2\varepsilon^2)^2}\, E[\xi_i^2].
\]
Therefore
\[
\sigma^2(\lambda) \ge \frac{(1 - \lambda\varepsilon)^2 (1 - 3\lambda\varepsilon)}{(1 - \lambda\varepsilon + 6\lambda^2\varepsilon^2)^2}\, \sigma^2.
\]

Using Taylor's expansion of e^x and Bernstein's condition (2) again, we obtain
\[
\sigma^2(\lambda) \le \sum_{i=1}^n E[\xi_i^2 e^{\lambda \xi_i}] \le \frac{\sigma^2}{(1 - \lambda\varepsilon)^3}.
\]
This completes the proof of Lemma 4.3. □

For the r.v. Y_n(λ) with 0 ≤ λ < ε^{−1}, we have the following result on the rate of convergence to the standard normal law.

Lemma 4.4. For all 0 ≤ λ < ε^{−1},
\[
\sup_{y \in \mathbf{R}} \bigg| P_\lambda\Big(\frac{Y_n(\lambda)}{\sigma(\lambda)} \le y\Big) - \Phi(y) \bigg| \le 13.44\, \frac{\sigma^2 \varepsilon}{\sigma^3(\lambda)(1 - \lambda\varepsilon)^4}.
\]

Proof. Since Y_n(λ) = Σ_{i=1}^n η_i(λ) is a sum of independent r.v.'s η_i(λ) that are centered with respect to P_λ, using standard results on the rate of convergence in the central limit theorem (cf. e.g. Petrov [34], p. 115) we get, for 0 ≤ λ < ε^{−1},
\[
\sup_{y \in \mathbf{R}} \bigg| P_\lambda\Big(\frac{Y_n(\lambda)}{\sigma(\lambda)} \le y\Big) - \Phi(y) \bigg| \le \frac{C_1}{\sigma^3(\lambda)} \sum_{i=1}^n E_\lambda[|\eta_i|^3],
\]
where C_1 > 0 is an absolute constant. For 0 ≤ λ < ε^{−1}, using Bernstein's condition, we have
\[
\sum_{i=1}^n E_\lambda[|\eta_i|^3] \le 4 \sum_{i=1}^n E_\lambda\big[|\xi_i|^3 + (E_\lambda[|\xi_i|])^3\big] \le 8 \sum_{i=1}^n E_\lambda[|\xi_i|^3] \le 8 \sum_{i=1}^n E\big[|\xi_i|^3 \exp\{|\lambda \xi_i|\}\big]
\]
\[
\le 8 \sum_{i=1}^n E\bigg[\sum_{j=0}^{\infty} \frac{\lambda^j}{j!}\, |\xi_i|^{3+j}\bigg] \le 4\sigma^2 \varepsilon \sum_{j=0}^{\infty} (j+3)(j+2)(j+1)(\lambda\varepsilon)^j.
\]

As
\[
\sum_{j=0}^{\infty} (j+3)(j+2)(j+1)\, x^j = \frac{d^3}{dx^3} \sum_{j=0}^{\infty} x^{j+3} = \frac{6}{(1 - x)^4}, \quad |x| < 1,
\]
we obtain, for 0 ≤ λ < ε^{−1},
\[
\sum_{i=1}^n E_\lambda[|\eta_i|^3] \le 24\, \frac{\sigma^2 \varepsilon}{(1 - \lambda\varepsilon)^4}.
\]
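The third-derivative identity used just above, Σ_{j≥0}(j+3)(j+2)(j+1)x^j = 6/(1−x)⁴, is easy to confirm numerically; a check of ours:

```python
# sum_{j>=0} (j+3)(j+2)(j+1) x^j = d^3/dx^3 sum_j x^{j+3} = 6 / (1 - x)^4, |x| < 1
for x in (0.0, 0.2, 0.5, 0.8):
    series = sum((j + 3) * (j + 2) * (j + 1) * x ** j for j in range(2000))
    assert abs(series - 6 / (1 - x) ** 4) < 1e-6
```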

Therefore, we have, for 0 ≤ λ < ε^{−1},
\[
\sup_{y \in \mathbf{R}} \bigg| P_\lambda\Big(\frac{Y_n(\lambda)}{\sigma(\lambda)} \le y\Big) - \Phi(y) \bigg| \le \frac{24\, C_1\, \sigma^2 \varepsilon}{\sigma^3(\lambda)(1 - \lambda\varepsilon)^4} \le 13.44\, \frac{\sigma^2 \varepsilon}{\sigma^3(\lambda)(1 - \lambda\varepsilon)^4},
\]
where the last step holds as C_1 ≤ 0.56 (cf. Shevtsova [42]). □

Using Lemma 4.4, we easily obtain the following lemma.

Lemma 4.5. For all 0 ≤ λ ≤ 0.1 ε^{−1},
\[
\sup_{y \in \mathbf{R}} \bigg| P_\lambda\Big(Y_n(\lambda) \le \frac{y\sigma}{1 - \lambda\varepsilon}\Big) - \Phi(y) \bigg| \le 1.07\,\lambda\varepsilon + 42.45\, \frac{\varepsilon}{\sigma}.
\]

Proof. Using Lemma 4.3, we have, for all 0 ≤ λ < (1/3)ε^{−1},
\[
\sqrt{1 - \lambda\varepsilon} \le \frac{\sigma}{\sigma(\lambda)(1 - \lambda\varepsilon)} \le \frac{1 - \lambda\varepsilon + 6\lambda^2\varepsilon^2}{(1 - \lambda\varepsilon)^2 \sqrt{1 - 3\lambda\varepsilon}}.
\]
It is easy to see that
\[
\bigg| P_\lambda\Big(Y_n(\lambda) \le \frac{y\sigma}{1 - \lambda\varepsilon}\Big) - \Phi(y) \bigg| \le \bigg| P_\lambda\Big(\frac{Y_n(\lambda)}{\sigma(\lambda)} \le \frac{y\sigma}{\sigma(\lambda)(1 - \lambda\varepsilon)}\Big) - \Phi\Big(\frac{y\sigma}{\sigma(\lambda)(1 - \lambda\varepsilon)}\Big) \bigg| + \bigg| \Phi\Big(\frac{y\sigma}{\sigma(\lambda)(1 - \lambda\varepsilon)}\Big) - \Phi(y) \bigg| =: I_1 + I_2.
\]

This definition and Lemma 4.1 imply that
\[
\bar{\lambda} = \frac{2x/\sigma}{1 + 2x\varepsilon/\sigma + \sqrt{1 + 4(1 - \beta)x\varepsilon/\sigma}} \quad \text{and} \quad T_n(\bar{\lambda}) \le x\sigma. \tag{53}
\]
Using (52) with λ = λ̄, we get
\[
P(S_n > x\sigma) \le e^{-\frac{1}{2}(1 + (1 - 2\beta)\bar{\lambda}\varepsilon)\tilde{x}^2} \int_0^{\infty} e^{-t}\, P_{\bar{\lambda}}\big(0 < U_n(\bar{\lambda}) \le t\big)\, dt, \tag{54}
\]
where
\[
\tilde{x} = \frac{\bar{\lambda}\sigma}{1 - \bar{\lambda}\varepsilon}.
\]

By (53) and Lemma 4.5, we have, for 0 ≤ λ̄ ≤ 0.1 ε^{−1},
\[
\int_0^{\infty} e^{-t}\, P_{\bar{\lambda}}\big(0 < U_n(\bar{\lambda}) \le t\big)\, dt = \int_0^{\infty} e^{-y\tilde{x}}\, P_{\bar{\lambda}}\big(0 < U_n(\bar{\lambda}) \le y\tilde{x}\big)\, \tilde{x}\, dy
\]
\[
\le \int_0^{\infty} e^{-y\tilde{x}}\, P\big(0 < \mathcal{N}(0,1) \le y\big)\, \tilde{x}\, dy + 2\Big(1.07\,\bar{\lambda}\varepsilon + 42.45\,\frac{\varepsilon}{\sigma}\Big) = M(\tilde{x}) + 2.14\,\bar{\lambda}\varepsilon + 84.9\,\frac{\varepsilon}{\sigma}. \tag{55}
\]

Since ∫₀^∞ e^{−t} P_λ̄(0 < U_n(λ̄) ≤ t) dt ≤ 1 and M^{−1}(t) ≤ √(2π)(1 + t) for t ≥ 0 (cf. (26)), combining (54) and (55), we deduce, for all x ≥ 0,
\[
P(S_n > x\sigma) \le e^{-\frac{1}{2}(1 - 2\beta)\bar{\lambda}\varepsilon \tilde{x}^2 - \frac{1}{2}\tilde{x}^2}\, \mathbf{1}_{\{\bar{\lambda}\varepsilon > 0.1\}} + e^{-\frac{1}{2}(1 - 2\beta)\bar{\lambda}\varepsilon \tilde{x}^2} \Big[1 - \Phi(\tilde{x}) + e^{-\frac{1}{2}\tilde{x}^2}\Big(2.14\,\bar{\lambda}\varepsilon + 84.9\,\frac{\varepsilon}{\sigma}\Big)\Big] \mathbf{1}_{\{\bar{\lambda}\varepsilon \le 0.1\}} \le \big(1 - \Phi(\tilde{x})\big)\,(I_{11} + I_{12}), \tag{56}
\]

with
\[
I_{11} = \exp\Big\{-\frac{1}{2}(1 - 2\beta)\bar{\lambda}\varepsilon \tilde{x}^2\Big\} \Big[\sqrt{2\pi}\,(1 + \tilde{x})\Big] \mathbf{1}_{\{\bar{\lambda}\varepsilon > 0.1\}} \tag{57}
\]
and
\[
I_{12} = e^{-\frac{1}{2}(1 - 2\beta)\bar{\lambda}\varepsilon \tilde{x}^2} \Big[1 + \sqrt{2\pi}\,(1 + \tilde{x})\Big(2.14\,\bar{\lambda}\varepsilon + 84.9\,\frac{\varepsilon}{\sigma}\Big)\Big] \mathbf{1}_{\{\bar{\lambda}\varepsilon \le 0.1\}}.
\]

Now we shall give estimates for I_{11} and I_{12}. If λ̄ε > 0.1, then I_{12} = 0 and
\[
I_{11} \le \exp\Big\{-0.1(1 - 2\beta)\,\frac{\tilde{x}^2}{2}\Big\} \Big[\sqrt{2\pi}\,(1 + \tilde{x})\Big].
\]
By a simple calculation, I_{11} ≤ 1 provided that x̃ ≥ 8/(1 − 2β) (note that β ∈ [0, 0.5)). For 0 ≤ x̃ < 8/(1 − 2β), we get λ̄σ = x̃(1 − λ̄ε) < (8/(1 − 2β))(1 − 0.1) = 7.2/(1 − 2β). Then, using 10 λ̄ε > 1, we obtain
\[
I_{11} \le 1 + \sqrt{2\pi}\,(1 + \tilde{x}) \le 1 + 10\sqrt{2\pi}\,(1 + \tilde{x})\, \bar{\lambda}\sigma\, \frac{\varepsilon}{\sigma} \le 1 + \frac{72\sqrt{2\pi}}{1 - 2\beta}\,(1 + \tilde{x})\, \frac{\varepsilon}{\sigma}.
\]

If 0 ≤ λ̄ε ≤ 0.1, we have I_{11} = 0. Since
\[
1 + \sqrt{2\pi}\,(1 + \tilde{x})\Big(2.14\,\bar{\lambda}\varepsilon + 84.9\,\frac{\varepsilon}{\sigma}\Big) \le \Big(1 + 2.14\sqrt{2\pi}\,(1 + \tilde{x})\,\bar{\lambda}\varepsilon\Big)\Big(1 + 84.9\sqrt{2\pi}\,(1 + \tilde{x})\,\frac{\varepsilon}{\sigma}\Big) =: J_1 J_2,
\]
it follows that I_{12} ≤ exp{−(1/2)(1 − 2β)λ̄εx̃²} J_1 J_2. Using the inequality 1 + x ≤ e^x, we deduce
\[
I_{12} \le \exp\Big\{-\bar{\lambda}\varepsilon\Big((1 - 2\beta)\,\frac{\tilde{x}^2}{2} - 2.14\sqrt{2\pi}\,(1 + \tilde{x})\Big)\Big\}\, J_2.
\]
If x̃ ≥ 11.65/(1 − 2β), we see that (1/2)(1 − 2β)x̃² − 2.14√(2π)(1 + x̃) ≥ 0, so I_{12} ≤ J_2. For 0 ≤ x̃ < 11.65/(1 − 2β), we get λ̄σ = x̃(1 − λ̄ε) < 11.65/(1 − 2β). Then
\[
I_{12} \le 1 + \sqrt{2\pi}\,(1 + \tilde{x})\Big(2.14\,\bar{\lambda}\varepsilon + 84.9\,\frac{\varepsilon}{\sigma}\Big) \le 1 + \frac{109.84\sqrt{2\pi}}{1 - 2\beta}\,(1 + \tilde{x})\,\frac{\varepsilon}{\sigma}.
\]

Hence, whenever 0 ≤ λ̄ε < 1, we have
\[
I_{11} + I_{12} \le 1 + \Big(\frac{72\sqrt{2\pi}}{1 - 2\beta} + \frac{109.84\sqrt{2\pi}}{1 - 2\beta}\Big)(1 + \tilde{x})\,\frac{\varepsilon}{\sigma}. \tag{59}
\]
Therefore, substituting λ̄ from (53) into the expression x̃ = λ̄σ/(1 − λ̄ε) and replacing 1 − 2β by δ, we obtain inequality (10) of Theorem 2.1 from (56) and (59). □

5.2. Proof of Theorem 2.2. For any x ≥ 0, let λ̄ = λ̄(x) ∈ [0, ε^{−1}) be the unique solution of the equation
\[
\frac{\lambda - 0.5\lambda^2\varepsilon}{(1 - \lambda\varepsilon)^2} = \frac{x}{\sigma}. \tag{60}
\]
By Lemma 4.1, it follows that
\[
\bar{\lambda} = \frac{2x/\sigma}{1 + 2x\varepsilon/\sigma + \sqrt{1 + 2x\varepsilon/\sigma}} \quad \text{and} \quad T_n(\bar{\lambda}) \le x\sigma. \tag{61}
\]
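The closed form for λ̄ in (61) can be verified to solve equation (60) exactly; a sketch of ours (the helper name `lam_bar` is hypothetical):

```python
import math

def lam_bar(x, sigma, eps):
    """Claimed root of (60): (l - 0.5 l^2 eps) / (1 - l eps)^2 = x / sigma."""
    r = x * eps / sigma
    return (2 * x / sigma) / (1 + 2 * r + math.sqrt(1 + 2 * r))

for x, sigma, eps in [(0.5, 1.0, 0.2), (2.0, 3.0, 0.1), (1.0, 1.0, 0.05)]:
    l = lam_bar(x, sigma, eps)
    assert 0 <= l * eps < 1
    assert abs((l - 0.5 * l * l * eps) / (1 - l * eps) ** 2 - x / sigma) < 1e-10
```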

Using Lemma 4.4 and T_n(λ̄) ≤ xσ, we have, for all 0 ≤ λ̄ < ε^{−1},
\[
\int_0^{\infty} e^{-t}\, P_{\bar{\lambda}}\big(0 < U_n(\bar{\lambda}) \le t\big)\, dt = \int_0^{\infty} e^{-y\bar{\lambda}\sigma(\bar{\lambda})}\, P_{\bar{\lambda}}\big(0 < U_n(\bar{\lambda}) \le y\bar{\lambda}\sigma(\bar{\lambda})\big)\, \bar{\lambda}\sigma(\bar{\lambda})\, dy
\]
\[
\le \int_0^{\infty} e^{-y\bar{\lambda}\sigma(\bar{\lambda})}\, P\big(0 < \mathcal{N}(0,1) \le y\big)\, \bar{\lambda}\sigma(\bar{\lambda})\, dy + 26.88\, \frac{\sigma^2\varepsilon}{\sigma^3(\bar{\lambda})(1 - \bar{\lambda}\varepsilon)^4}
\]
\[
= \int_0^{\infty} e^{-y\bar{\lambda}\sigma(\bar{\lambda})}\, d\Phi(y) + 26.88\, \frac{\sigma^2\varepsilon}{\sigma^3(\bar{\lambda})(1 - \bar{\lambda}\varepsilon)^4} = F := M\big(\bar{\lambda}\sigma(\bar{\lambda})\big) + 26.88\, \frac{\sigma^2\varepsilon}{\sigma^3(\bar{\lambda})(1 - \bar{\lambda}\varepsilon)^4}. \tag{62}
\]

This definition and Lemma 4.1 imply that, for all 0 ≤ x ≤ σ/(9.6ε),
\[
\bar{\lambda} = \frac{2x/\sigma}{1 + \sqrt{1 - 9.6\,x\varepsilon/\sigma}} \quad \text{and} \quad x\sigma \le T_n(\bar{\lambda}). \tag{68}
\]
Therefore,
\[
P(S_n > x\sigma) \ge \exp\Big\{-\frac{\bar{\lambda}^2\sigma^2}{2(1 - \bar{\lambda}\varepsilon)^6}\Big\}\, E_{\bar{\lambda}}\big[e^{-\bar{\lambda} Y_n(\bar{\lambda})}\, \mathbf{1}_{\{Y_n(\bar{\lambda}) > 0\}}\big].
\]
Setting V_n(λ̄) = λ̄ Y_n(λ̄), we get
\[
P(S_n > x\sigma) \ge \exp\Big\{-\frac{\check{x}^2}{2}\Big\} \int_0^{\infty} e^{-t}\, P_{\bar{\lambda}}\big(0 < V_n(\bar{\lambda}) \le t\big)\, dt, \tag{69}
\]
where x̌ = λ̄σ/(1 − λ̄ε)³. By Lemma 4.4 and an argument similar to that used to prove (62), it is easy to see that

\[
\int_0^{\infty} e^{-t}\, P_{\bar{\lambda}}\big(0 < V_n(\bar{\lambda}) \le t\big)\, dt \ge M\big(\bar{\lambda}\sigma(\bar{\lambda})\big) - G,
\]
where G = 26.88 σ²ε/(σ³(λ̄)(1 − λ̄ε)⁴). Since M(t) is decreasing in t ≥ 0 and σ(λ̄) ≤ σ/(1 − λ̄ε)^{3/2} (cf. Lemma 4.3), so that λ̄σ(λ̄) ≤ x̌, it follows that
\[
\int_0^{\infty} e^{-t}\, P_{\bar{\lambda}}\big(0 < V_n(\bar{\lambda}) \le t\big)\, dt \ge M(\check{x}) - G.
\]

Returning to (69), we obtain
\[
P(S_n > x\sigma) \ge 1 - \Phi(\check{x}) - G\, \exp\Big\{-\frac{\check{x}^2}{2}\Big\}.
\]
Using Lemma 4.3, for all 0 ≤ x ≤ σ/(9.6ε), we have 0 ≤ λ̄ε ≤ 1/4.8 and
\[
G \le 26.88\, R(\bar{\lambda}\varepsilon)\, \frac{\varepsilon}{\sigma}.
\]
Therefore, for all 0 ≤ x ≤ σ/(9.6ε),
\[
P(S_n > x\sigma) \ge 1 - \Phi(\check{x}) - 26.88\, R(\bar{\lambda}\varepsilon)\, \frac{\varepsilon}{\sigma}\, \exp\Big\{-\frac{\check{x}^2}{2}\Big\}.
\]

Using the inequality M^{−1}(t) ≤ √(2π)(1 + t) for t ≥ 0, we get, for all 0 ≤ x ≤ σ/(9.6ε),
\[
P(S_n > x\sigma) \ge \big(1 - \Phi(\check{x})\big)\Big[1 - 67.38\, R(\bar{\lambda}\varepsilon)\,(1 + \check{x})\,\frac{\varepsilon}{\sigma}\Big].
\]

In particular, for all 0 ≤ x ≤ ασ/ε with 0 ≤ α ≤ 1/9.6, a simple calculation shows that
\[
0 \le \bar{\lambda}\varepsilon \le \frac{2\alpha}{1 + \sqrt{1 - 9.6\alpha}}
\]
and
\[
67.38\, R(\bar{\lambda}\varepsilon) \le 67.38\, R\Big(\frac{2\alpha}{1 + \sqrt{1 - 9.6\alpha}}\Big).
\]
This completes the proof of Theorem 2.3. □

5.4. Proof of Theorem 2.4. Notice that Ψ'_n(λ) = T_n(λ) ∈ [0, ∞) is nonnegative for λ ≥ 0. Let λ̄ = λ̄(x) ≥ 0 be the unique solution of the equation xσ = Ψ'_n(λ). This definition implies that T_n(λ̄) = xσ, U_n(λ̄) = λ̄ Y_n(λ̄) and
\[
e^{-\bar{\lambda} x\sigma + \Psi_n(\bar{\lambda})} = \inf_{\lambda \ge 0} e^{-\lambda x\sigma + \Psi_n(\lambda)} = \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sigma)}]. \tag{70}
\]

From (51), using Lemma 4.4 with λ = λ̄ and an argument similar to (62), we obtain
\[
P(S_n > x\sigma) = \bigg( M\big(\bar{\lambda}\sigma(\bar{\lambda})\big) + 26.88\,\theta_1\, \frac{\sigma^2\varepsilon}{\sigma^3(\bar{\lambda})(1 - \bar{\lambda}\varepsilon)^4} \bigg) \inf_{\lambda \ge 0} E[e^{\lambda(S_n - x\sigma)}], \tag{71}
\]
where |θ_1| ≤ 1.

where |θ 1 | ≤ 1. Since M (t) is decreasing in t ≥ 0 and |M ′(t)| ≤ √π t^12 in t > 0, it follows that

M

λσ(λ)

− M (x) ≤

π

x − λσ(λ) λ

2 σ^2 (λ) ∧ x^2

By Lemma 4.1, we have the following two-sided bound for x:
\[
\frac{(1 - 1.5\bar{\lambda}\varepsilon)(1 - \bar{\lambda}\varepsilon)}{1 - \bar{\lambda}\varepsilon + 6\bar{\lambda}^2\varepsilon^2}\, \bar{\lambda}\sigma \le \frac{T_n(\bar{\lambda})}{\sigma} = x \le \frac{1 - 0.5\bar{\lambda}\varepsilon}{(1 - \bar{\lambda}\varepsilon)^2}\, \bar{\lambda}\sigma. \tag{73}
\]
Using the two-sided bound in Lemma 4.3 and (73), by a simple calculation, we deduce
\[
\bar{\lambda}^2\sigma^2(\bar{\lambda}) \wedge x^2 \ge \frac{(1 - \bar{\lambda}\varepsilon)^2(1 - 3\bar{\lambda}\varepsilon)}{(1 - \bar{\lambda}\varepsilon + 6\bar{\lambda}^2\varepsilon^2)^2}\, \bar{\lambda}^2\sigma^2 \tag{74}
\]
and
\[
x - \bar{\lambda}\sigma(\bar{\lambda}) \le \bar{\lambda}\sigma \bigg( \frac{1 - 0.5\bar{\lambda}\varepsilon}{(1 - \bar{\lambda}\varepsilon)^2} - \frac{(1 - \bar{\lambda}\varepsilon)\sqrt{1 - 3\bar{\lambda}\varepsilon}}{1 - \bar{\lambda}\varepsilon + 6\bar{\lambda}^2\varepsilon^2} \bigg). \tag{75}
\]

From (72), (74), (75) and Lemma 4.3, we easily obtain
\[
M\big(\bar{\lambda}\sigma(\bar{\lambda})\big) - M(x) \le 1.11\, R(\bar{\lambda}\varepsilon)\, \frac{\varepsilon}{\sigma}. \tag{76}
\]
By Lemma 4.3, it is easy to see that
\[
26.88\, \frac{\sigma^2\varepsilon}{\sigma^3(\bar{\lambda})(1 - \bar{\lambda}\varepsilon)^4} \le 26.88\, R(\bar{\lambda}\varepsilon)\, \frac{\varepsilon}{\sigma}. \tag{77}
\]
Combining (76) and (77), we get, for all 0 ≤ λ̄ < (1/3)ε^{−1},
\[
M\big(\bar{\lambda}\sigma(\bar{\lambda})\big) + 26.88\,\theta_1\, \frac{\sigma^2\varepsilon}{\sigma^3(\bar{\lambda})(1 - \bar{\lambda}\varepsilon)^4} = M(x) + 27.99\,\theta_2\, R(\bar{\lambda}\varepsilon)\, \frac{\varepsilon}{\sigma}, \tag{78}
\]
where |θ_2| ≤ 1.

where |θ 2 | ≤ 1. By (73), it follows that, for all 0 ≤ λ < 13 ε−^1 ,

λε ≤

1 − λε + 6λ

2 ε^2 (1 − 1. 5 λε)(1 − λε)

x

ε σ ≤ 4 x

ε σ

Implementing (78) into (71) and using (79), we obtain equality (22) of Theorem 2.4. Notice that R < ∞ restricts 0 ≤ 4 x (^) σε < 13 , which implies that 0 ≤ x < (^121) σ^ ε.