Confidence Intervals for a Proportion: Lecture Notes, Lecture notes of Statistics

Lecture notes on calculating confidence intervals for a proportion using R. It covers the formulas for 95%, 90%, and 85% confidence intervals, as well as ways to write and interpret the intervals. The document also includes examples and critical values for various confidence levels.

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022


150 Chapter 4. Statistics (LECTURE NOTES 8)

4.5 Confidence Intervals for a Proportion

Let $Z$ be $N(0, 1)$ and let $p$ be a number between 0 and 1; the critical z-value $z_p$ satisfies

$$P(Z > z_p) = 1 - \Phi(z_p) = p.$$

Let $0 < \alpha < 1$ and let $x$ be the number of successes in $n$ observed trials of a Bernoulli experiment with unknown probability of success $p$. For $\hat{p} = x/n$, the $100(1 - \alpha)\%$ confidence interval for the proportion $p$ is

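$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right],$$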

where

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \quad\text{and}\quad \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

are the margin of error and the standard deviation of the proportion, respectively, and $\alpha$ is the level of significance. We assume a large random sample is chosen, both $np \ge 5$ and $np(1 - p) \ge 5$, and the conditions of a binomial distribution are satisfied. Also, one-sided confidence interval estimates for $p$ include a lower bound and an upper bound, respectively:

$$\left[\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ 1\right], \qquad \left[0,\ \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right].$$
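The one-sided bounds differ from the two-sided interval only in using $z_{\alpha}$ in place of $z_{\alpha/2}$. A minimal R sketch of that difference follows; the helper name `prop1.onesided` and the counts 54 of 180 are illustrative assumptions, not part of the notes' own functions.

```r
# Hypothetical helper (an assumption for illustration): one-sided bounds for p.
prop1.onesided <- function(x, n, conf.level) {
  p.hat <- x / n
  z.crit <- qnorm(conf.level)               # z_alpha, not z_{alpha/2}
  se <- sqrt(p.hat * (1 - p.hat) / n)       # standard deviation of the proportion
  c(lower.bound = p.hat - z.crit * se,      # gives the interval [lower.bound, 1]
    upper.bound = p.hat + z.crit * se)      # gives the interval [0, upper.bound]
}

b <- prop1.onesided(54, 180, 0.95)          # illustrative counts
round(b, 3)
```

Because $z_{0.05} < z_{0.025}$, each one-sided 95% bound sits closer to $\hat{p}$ than the corresponding endpoint of the two-sided 95% interval.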

Exercise 4.5 (Confidence Intervals for a Proportion)

  1. Confidence interval (CI) for proportion, p, of purchase slips made with Visa. It is found that 54 of 180 (or $\hat{p} = 54/180 = 0.3$) randomly selected from all credit card purchase slips are made with Visa, where conditions of a binomial distribution are satisfied. Calculate a 95% confidence interval (CI) of the proportion p of purchase slips made with Visa.

(a) Point estimate. The point estimate of the population (actual, true) proportion of all credit card purchase slips made with Visa, $p$, is $\hat{p} =$ (i) 0.3 (ii) 54 (iii) 180. The statistic $\hat{p} = 0.3$ probably does not exactly equal the unknown parameter $p$.

(b) Check assumptions. Since a random sample is chosen, the conditions of a binomial distribution are satisfied, $np(1 - p) \approx n\hat{p}(1 - \hat{p}) = 180(0.3)(0.7) = 37.8 \ge 5$, and $np \approx n\hat{p} = 180(0.3) = 54 \ge 5$, the assumptions (i) have (ii) have not been satisfied, and so it is appropriate to use $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$ to estimate the parameter $p$.

Section 5. Confidence Intervals for a Proportion (LECTURE NOTES 8) 151

(c) 95% Confidence Interval (CI) using R. The 95% CI for the proportion of all credit card purchase slips made with Visa, p, is (i) (0.251, 0.349) (ii) (0.273, 0.367) (iii) (0.233, 0.367).

    prop1.interval <- function(x, n, conf.level) # function of 1-proportion CI for p
    {
      p <- x/n
      z.crit <- -1 * qnorm((1 - conf.level)/2)
      margin.error <- z.crit * sqrt(p * (1 - p)/n)
      ci.lower <- p - margin.error
      ci.upper <- p + margin.error
      dat <- c(p, z.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    prop1.interval(54, 180, 0.95) # 1-proportion 95% CI for p

          Mean  Critical Value  Margin of Error    CI lower    CI upper
    0.30000000      1.95996398       0.06694551  0.23305449  0.36694551

where this interval includes not only the smallest possible proportion of 0.233 and the largest possible proportion of 0.367, but also other proportions in between these two extremes, such as the point estimate, $\hat{p} = 0.3$. The length of this CI is L ≈ 0.367 − 0.233 = 0.134. So, we are 95% confident the population parameter p is in (0.233, 0.367).

(d) 90% CI using R. The 90% CI for the proportion of all credit card purchase slips made with Visa, p, is (i) (0.251, 0.349) (ii) (0.244, 0.356) (iii) (0.233, 0.367). The length of this CI is L ≈ 0.356 − 0.244 = 0.112.

    prop1.interval(54, 180, 0.90) # 1-proportion 90% CI for p

          Mean  Critical Value  Margin of Error    CI lower    CI upper
    0.30000000      1.64485363       0.05618245  0.24381755  0.35618245

(e) 85% CI using R. The 85% CI for the proportion of all credit card purchase slips made with Visa, p, is (i) (0.251, 0.349) (ii) (0.273, 0.367) (iii) (0.233, 0.367). The length of this CI is L ≈ 0.349 − 0.251 = 0.098.

    prop1.interval(54, 180, 0.85) # 1-proportion 85% CI for p

          Mean  Critical Value  Margin of Error    CI lower    CI upper
    0.30000000      1.43953147       0.04916936  0.25083064  0.34916936

(f) Comparing CI lengths. The length of the 95% CI for p, L = 0.134, is (i) longer than (ii) same length as (iii) shorter than the length of the 90% CI for p, L = 0.112, which is (i) longer than (ii) same length as (iii) shorter than the length of the 85% CI for p, L = 0.098. Increasing confidence increases CI length.

(g) Margin of error. Half of the length, L, is the margin of error, $E = L/2$. Consequently, for 95% CI for p,


[Figure 4.5: Critical values. Panel (a), z critical value: 95% in the middle of the standard normal density f(z), with 0.025 in each tail; the −z and z critical values are the 2.5th and 97.5th percentiles. Panel (b), z_{0.05} critical value: 90% in the middle of the normal, with 5% in each tail; the −z and z critical values are the 5th and 95th percentiles.]

Critical value for 90% = (1 − α) · 100% = (1 − 0.10) · 100% CI is $z_{\alpha/2} = z_{0.10/2} = z_{0.05} =$ (i) 1.96 (ii) 1.645 (iii) 1.44.

    qnorm(0.95) # critical value z_0.10/2
    [1] 1.644854

Critical value for 85% = (1 − α) · 100% = (1 − 0.15) · 100% CI is $z_{\alpha/2} = z_{0.15/2} = z_{0.075} =$ (i) 1.96 (ii) 1.645 (iii) 1.44.

    qnorm(0.925) # critical value z_0.15/2
    [1] 1.439531
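The critical values above all follow the same rule, $z_{\alpha/2}$ = `qnorm(1 - alpha/2)`. As a small sketch (the variable names are illustrative), the three confidence levels used in this exercise can be handled in one vectorized call, which also shows why `-1 * qnorm((1 - conf.level)/2)` in `prop1.interval` gives the same number:

```r
conf.level <- c(0.85, 0.90, 0.95)
alpha <- 1 - conf.level

z.crit <- qnorm(1 - alpha / 2)        # upper-tail critical values z_{alpha/2}
z.crit.alt <- -1 * qnorm(alpha / 2)   # same values, by symmetry of N(0, 1)

round(z.crit, 3)
```

The two expressions agree because the standard normal density is symmetric about zero.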

(k) CI using formula. A 95% CI for the proportion of Visa credit card purchase slips, p, is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} =$

  i. $0.3 \pm 1.96 \times \sqrt{0.3(1 - 0.3)/180}$
  ii. $0.3 \pm 1.645 \times \sqrt{0.3(1 - 0.3)/180}$
  iii. $0.3 \pm 1.44 \times \sqrt{0.3(1 - 0.3)/180}$

and a 90% CI for the proportion of Visa credit card purchase slips, p, is

  i. $0.3 \pm 1.96 \times \sqrt{0.3(1 - 0.3)/180}$
  ii. $0.3 \pm 1.645 \times \sqrt{0.3(1 - 0.3)/180}$
  iii. $0.3 \pm 1.44 \times \sqrt{0.3(1 - 0.3)/180}$

and an 85% CI for the proportion of Visa credit card purchase slips, p, is

  i. $0.3 \pm 1.96 \times \sqrt{0.3(1 - 0.3)/180}$
  ii. $0.3 \pm 1.645 \times \sqrt{0.3(1 - 0.3)/180}$
  iii. $0.3 \pm 1.44 \times \sqrt{0.3(1 - 0.3)/180}$

(l) Population, Sample, Statistic and Parameter. Match columns.

    terms            credit card example
    (a) population   (a) Visa or not, all purchase slips
    (b) sample       (b) proportion of all slips made with Visa, p
    (c) statistic    (c) Visa or not, 180 purchase slips
    (d) parameter    (d) proportion of 180 slips made with Visa, p̂

    terms                 (a)   (b)   (c)   (d)
    credit card example

  1. 95% CI, proportion of student heights over 6 feet tall. 37 of 102 students, chosen at random from PNW, over 6 feet tall.

(a) Point estimate. The point estimate of the proportion, p, of student heights over 6 feet tall is $\hat{p} = 37/102 \approx$ (i) 0.363 (ii) 0.378 (iii) 0.391.

(b) Check assumptions. Since $np \approx n\hat{p} = 102\left(\frac{37}{102}\right) = 37 \ge 5$ and $np(1 - p) \approx n\hat{p}(1 - \hat{p}) = 102\left(\frac{37}{102}\right)\left(1 - \frac{37}{102}\right) \approx 23.6 \ge 5$, the assumptions (i) have (ii) have not been satisfied, and so it is appropriate to use $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$ to estimate the parameter $p$.

(c) Using R. The 95% CI for p is (i) (0.269, 0.456) (ii) (0.273, 0.367) (iii) (0.233, 0.367).

    prop1.interval(37, 102, 0.95) # 1-proportion 95% CI for p

         Mean  Critical Value  Margin of Error   CI lower   CI upper
    0.3627451       1.9599640        0.0933051  0.2694400  0.4560502

(d) Using formula: critical value using R. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI for p is $z_{\alpha/2} = z_{0.05/2} = z_{0.025} =$ (i) 1.28 (ii) 1.96 (iii) 2.58.

    qnorm(0.975) # critical value z_0.05/2 for 95% CI
    [1] 1.959964

(e) Using formula: critical value using Table C.1. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI for p is $z_{\alpha/2} = z_{0.05/2} = z_{0.025} =$ (i) 1.28 (ii) 1.96 (iii) 2.58.

(f) Using formula. Since $\hat{p} = 37/102$ and $n = 102$, the 95% CI for p is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} =$


μ is called a z-interval:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

The $(1 - \alpha) \cdot 100\%$ confidence interval for $\mu$ with unknown $\sigma$ is called a t-interval:

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}},$$

where $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a Student-t distribution and where

$$E = t_{\alpha/2}\frac{s}{\sqrt{n}} \quad\text{and}\quad \frac{s}{\sqrt{n}}$$

are the margin of error and the standard error of the mean, respectively, and $\alpha$ is the level of significance. We assume a random sample where either the underlying distribution is normal with no outliers or the sample size is large ($n > 30$). Also, one-sided confidence interval estimates for $\mu$ include a lower bound and an upper bound, respectively:

$$\left(\bar{x} - t_{\alpha}\frac{s}{\sqrt{n}},\ \infty\right), \qquad \left(-\infty,\ \bar{x} + t_{\alpha}\frac{s}{\sqrt{n}}\right).$$
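To see how the t critical value in the t-interval relates to the z critical value in the z-interval, the following R sketch compares `qt` with `qnorm` at the 95% level. The degrees of freedom are arbitrary illustrative choices, not values taken from the notes.

```r
conf.level <- 0.95
alpha <- 1 - conf.level

df <- c(10, 30, 100)                 # arbitrary illustrative degrees of freedom
t.crit <- qt(1 - alpha / 2, df)      # t_{alpha/2} for each df
z.crit <- qnorm(1 - alpha / 2)       # z_{alpha/2}

round(rbind(df, t.crit, z.crit), 3)  # z.crit is recycled across the columns
```

The t critical values are larger than the z critical value and shrink toward it as the degrees of freedom grow, so a t-interval is wider than a z-interval built from the same numbers.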

Exercise 4.6 (Confidence Intervals for a Mean)

  1. Estimates for population average weight of PNW students. Average weight of simple random sample of 11 PNW students is ¯x = 167 pounds with sample SD s = 20.1 pounds. Weights normally distributed, no outliers.

(a) Point estimate. The point estimate of the population average weight of all students, μ, is $\bar{x} =$ (i) 11 (ii) 20.1 (iii) 167. Also notice σ is unknown and estimated by s = 20.1.

(b) 95% CI

i. Using R. The 95% CI for μ is (i) (143.5, 182.5) (ii) (151.5, 180.5) (iii) (153.5, 180.5).

    mean1.t.interval <- function(m, s, n, conf.level) {
      t.crit <- -1 * qt((1 - conf.level)/2, n - 1)
      margin.error <- t.crit * s/sqrt(n)
      ci.lower <- m - margin.error
      ci.upper <- m + margin.error
      dat <- c(m, t.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    mean1.t.interval(167, 20.1, 11, 0.95) # m: mean, s: SD, n: sample size, 95% t-interval

Section 6. Confidence Intervals for a Mean (LECTURE NOTES 8) 157

          Mean  Critical Value  Margin of Error    CI lower    CI upper
    167.000000        2.228139        13.503364  153.496636  180.503364

So, we are 95% confident the population parameter μ is in (153.5, 180.5).

ii. Using formula: degrees of freedom (df). df = n − 1 = 11 − 1 = (i) 10 (ii) 11.

iii. Using formula: critical value using R. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, 10 df, is $t_{\alpha/2} = t_{0.05/2} = t_{0.025} \approx$ (i) 1.28 (ii) 2.23 (iii) 2.58.

    qt(0.975, 10) # critical value t, 10 df, for 95% CI
    [1] 2.228139

iv. Using formula: critical value using Table C.3. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, 10 df, is $t_{\alpha/2} = t_{0.05/2} = t_{0.025} \approx$ (i) 1.28 (ii) 2.23 (iii) 2.58.

v. Using formula. The 95% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} =$ (i) $20.1 \pm 167 \times \frac{2.23}{\sqrt{11}}$ (ii) $2.23 \pm 167 \times \frac{20.1}{\sqrt{11}}$ (iii) $167 \pm 2.23 \times \frac{20.1}{\sqrt{11}}$, which equals (i) 20.1 ± 12.51 (ii) 2.23 ± 13.51 (iii) 167 ± 13.51 ≈ (153.5, 180.5).

(c) 99% CI

i. Using R. The 99% CI for μ is (i) (147.8, 186.2) (ii) (151.5, 180.5) (iii) (153.5, 180.5).

    mean1.t.interval(167, 20.1, 11, 0.99) # m: mean, s: SD, n: sample size, 99% t-interval

          Mean  Critical Value  Margin of Error    CI lower    CI upper
    167.000000        3.169273        19.206990  147.793010  186.206990

So, we are 99% confident the population parameter μ is in (147.8, 186.2).

ii. Using formula: degrees of freedom. df = n − 1 = 11 − 1 = (i) 10 (ii) 11.

iii. Using formula: critical value. Critical value for 99% = (1 − α) · 100% = (1 − 0.01) · 100% CI, 10 df, is $t_{\alpha/2} = t_{0.01/2} = t_{0.005} \approx$ (i) 1.28 (ii) 2.23 (iii) 3.17.

    qt(0.995, 10) # critical value t, 10 df, for 99% CI
    [1] 3.169273

iv. Using formula. The 99% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} =$ (i) $20.1 \pm 20.1 \times \frac{3.17}{\sqrt{11}}$ (ii) $3.17 \pm 167 \times \frac{20.1}{\sqrt{11}}$ (iii) $167 \pm 3.17 \times \frac{20.1}{\sqrt{11}}$, which equals (i) 20.1 ± 19.21 (ii) 3.17 ± 19.21 (iii) 167 ± 19.21 ≈ (147.8, 186.2).


iv. Using formula. The 95% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} =$ (i) $21.6 \pm 2.15 \times \frac{2.97}{\sqrt{15}}$ (ii) $21.6 \pm 2.15 \times \frac{3.97}{\sqrt{15}}$ (iii) $21.6 \pm 3.15 \times \frac{2.97}{\sqrt{15}}$.

(c) 99% CI

i. Using R. The 99% CI for μ is (i) (19.23, 23.45) (ii) (19.96, 23.24) (iii) (19.32, 23.88).

    mean1.t.interval(m, s, n, 0.99) # m: mean, s: SD, n: sample size, 99% t-interval

         Mean  Critical Value  Margin of Error   CI lower   CI upper
    21.600000        2.976843         2.283786  19.316214  23.883786

ii. Using formula: degrees of freedom (df). The df, here, for the 99% CI is (i) same as (ii) different from the degrees of freedom calculated for the 95% CI above because the same sample size is used in both cases.

iii. Using formula: critical value. Critical value for 99% = (1 − α) · 100% = (1 − 0.01) · 100% CI, 14 df, is $t_{\alpha/2} = t_{0.01/2} = t_{0.005} \approx$ (i) 1.76 (ii) 2.98.

    qt(0.995, 14) # critical value t, 14 df, for 99% CI
    [1] 2.976843

iv. Using formula. Thus, the 99% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} =$ (i) $21.6 \pm 2.15 \times \frac{2.97}{\sqrt{15}}$ (ii) $21.6 \pm 2.15 \times \frac{3.97}{\sqrt{15}}$ (iii) $21.6 \pm 2.98 \times \frac{2.97}{\sqrt{15}}$, which equals (i) 21.6 ± 1.29 (ii) 21.6 ± 2.29 (iii) 21.6 ± 3.29 ≈ (19.32, 23.88).

(d) Some comments

i. (i) True (ii) False. A longer 99% CI is better than a shorter 95% CI in the sense that we are more confident the 99% CI contains or “captures” the unknown parameter μ. However, a 95% CI is better than a longer 99% CI in the sense that, if the unknown parameter μ is in the 95% interval estimate, we are more certain of the location of this unknown parameter.

ii. Since the sample size is small, we (i) can (ii) cannot use the central limit theorem.

iii. Match columns.

    terms            corn example
    (a) population   (a) average length of 15 plants, X̄
    (b) sample       (b) average length of all plants, μ
    (c) statistic    (c) lengths of all plants
    (d) parameter    (d) observed lengths of 15 plants

    terms          (a)   (b)   (c)   (d)
    corn example


  1. Population, sample, statistic and parameter: CI for average corn cob length. Simple random sample of 15 corn cobs is taken. Assume sample SD in length is s = 2.97 and, although we typically don’t know it, population (not sample) length is μ = 22 inches. Assume normality.

(a) Population μ = 22 length. Population μ = 22 is a (i) statistic (ii) parameter. Population μ (i) changes (ii) remains same for every random sample. Population μ is (usually) (i) known (ii) unknown to us (although we are pretending for this question we do know it).

(b) Sample $\bar{x}$ length. Sample $\bar{x}$ is a (i) statistic (ii) parameter. Sample $\bar{x}$ (i) changes (ii) remains same for every random sample. Sample $\bar{x}$ is (usually) (i) known (ii) unknown to us: it may be $\bar{x} = 21.6$ for one sample, but $\bar{x} = 29.8$ for another sample.

(c) A 95% CI for μ, if $\bar{x} = 21.6$, is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 21.6 \pm 1.96 \times \frac{2.97}{\sqrt{15}} =$ (i) (19.95, 23.24) (ii) (23.45, 27.80) (iii) (28.16, 31.44).

    mean1.t.interval(21.6, 2.97, 14, 0.95) # m: mean, s: SD, n: sample size, 95% t-interval

         Mean  Critical Value  Margin of Error   CI lower   CI upper
    21.600000        2.160369         1.714827  19.885173  23.314827

This 95% CI (i) contains (ii) does not contain μ = 22.

(d) A 95% CI for μ, if $\bar{x} = 29.8$, is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 29.8 \pm 1.96 \times \frac{2.97}{\sqrt{15}} =$ (i) (19.60, 23.60) (ii) (23.45, 27.80) (iii) (28.16, 31.44).

    mean1.t.interval(29.8, 2.97, 14, 0.95) # m: mean, s: SD, n: sample size, 95% t-interval

         Mean  Critical Value  Margin of Error   CI lower   CI upper
    29.800000        2.160369         1.714827  28.085173  31.514827

This 95% CI (i) contains (ii) does not contain μ = 22.

(e) If the sample average length, $\bar{x}$, changes, the corresponding 95% CI, $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, (i) changes (ii) remains the same. More than this,

i. all possible 95% CIs contain μ = 22.
ii. none of all possible 95% CIs contain μ = 22.
iii. ninety-nine percent of all possible 95% CIs contain μ = 22, and so one percent of all possible 95% CIs do not contain μ = 22.
iv. ninety-five percent of all possible 95% CIs contain μ = 22, and so five percent of all possible 95% CIs do not contain μ = 22.

This is demonstrated in the figure below.

(f) Choose true or false.
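The long-run interpretation in part (e) can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the notes: it treats the population as normal with μ = 22 and, hypothetically, a population SD equal to 2.97 (the notes give 2.97 only as a sample SD), draws many samples of size 15, and counts how often the 95% t-interval contains μ.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
mu <- 22                           # population mean from the exercise
sigma <- 2.97                      # assumed population SD (hypothetical)
n <- 15                            # sample size from the exercise
reps <- 10000                      # number of simulated samples (arbitrary)

covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  margin.error <- qt(0.975, n - 1) * sd(x) / sqrt(n)
  (mean(x) - margin.error <= mu) && (mu <= mean(x) + margin.error)
})

coverage <- mean(covered)          # proportion of intervals containing mu
coverage
```

Each simulated interval is different because $\bar{x}$ and $s$ change from sample to sample, while μ stays fixed; the proportion of intervals that contain μ should be close to 0.95.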


(a) Using R. The 95% CI for $\sigma^2$ is (i) (0.39, 1.22) (ii) (0.41, 1.25) (iii) (0.44, 1.30).

    var1.chi2.interval = function(v, n, conf.level) {
      df = n - 1
      chilower = qchisq((1 - conf.level)/2, df)
      chiupper = qchisq((1 - conf.level)/2, df, lower.tail = FALSE)
      ci.lower <- df * v/chiupper
      ci.upper <- df * v/chilower
      margin.error <- (ci.upper - ci.lower)/2
      dat <- c(v, chilower, chiupper, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Variance", "Lower Crit Val", "Upper Crit Val", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    var1.chi2.interval(0.7, 28, 0.95) # 95% CI for variance, n = 28

     Variance  Lower Crit Val  Upper Crit Val  Margin of Error   CI lower   CI upper
    0.7000000      14.5733827      43.1945110        0.4296647  0.4375556  1.2968849

(b) Upper critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI is $\chi^2_{\alpha/2} = \chi^2_{0.05/2} = \chi^2_{0.025} =$ (i) 8.7 (ii) 40.1 (iii) 43.2

    qchisq(0.975, 27) # 95% upper critical chi-square value
    [1] 43.19451

(c) Lower critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI is $\chi^2_{1-\alpha/2} = \chi^2_{1-0.05/2} = \chi^2_{0.975} =$ (i) 14.6 (ii) 40.1 (iii) 43.2

    qchisq(0.025, 27) # 95% lower critical chi-square value
    [1] 14.57338

(d) Using Table C.4, the lower critical value for 95% CI is $\chi^2_{1-\alpha/2} = \chi^2_{1-0.05/2} = \chi^2_{0.975} =$ (i) between 13.12 and 16.79 (ii) 40.1 (iii) 43.2

(e) So, the 95% CI for variance $\sigma^2$ is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right) \approx$$

(i) (0.61, 1.65) (ii) (0.59, 1.29) (iii) (0.43, 1.29).

(f) Since the 95% CI (0.43, 1.29) does not include 0.40, this indicates variance in distance between door and jamb (i) is (ii) is not 0.4 mm^2.

(g) Population, parameter, sample and statistic. Match columns.

    terms            jamb example
    (a) population   (a) variance in jamb–door distance, of 28 cars, s^2
    (b) sample       (b) variance in jamb–door distance, of all cars, σ^2
    (c) statistic    (c) jamb–door distances, of all cars
    (d) parameter    (d) jamb–door distances, of 28 cars

    terms          (a)   (b)   (c)   (d)
    jamb example

Section 8. Confidence Intervals for a Differences (LECTURE NOTES 8) 163

  1. Estimation for variance: machine parts. In a simple random sample of 18 machine parts, variance in lengths is s^2 = 12^2. Calculate 90% CI. Assume normality with no outliers.

(a) Using R. The 90% CI for $\sigma^2$ is (i) (88.1, 281.3) (ii) (88.7, 282.3) (iii) (88.2, 282.3).

    var1.chi2.interval(12^2, 18, 0.90) # 90% CI for variance, n = 18

     Variance  Lower Crit Val  Upper Crit Val  Margin of Error  CI lower   CI upper
    144.00000         8.67176        27.58711         96.77927  88.73709  282.29563

(b) Upper critical value for 90% = (1 − α) · 100% = (1 − 0.10) · 100% CI is $\chi^2_{\alpha/2} = \chi^2_{0.10/2} = \chi^2_{0.05} =$ (i) 8.7 (ii) 27.6 (iii) 43.2

    qchisq(0.95, 17) # 90% upper critical chi-square value
    [1] 27.58711

(c) Lower critical value for 90% = (1 − α) · 100% = (1 − 0.10) · 100% CI is $\chi^2_{1-\alpha/2} = \chi^2_{1-0.10/2} = \chi^2_{0.95} =$ (i) 8.7 (ii) 40.1 (iii) 43.2

    qchisq(0.05, 17) # 90% lower critical chi-square value
    [1] 8.67176

(d) So, the 90% CI for variance $\sigma^2$ is (there may be round-off error)

$$\left(\frac{(n-1)s^2}{\chi^2_U},\ \frac{(n-1)s^2}{\chi^2_L}\right) = \left(\frac{(18-1)12^2}{27.6},\ \frac{(18-1)12^2}{8.7}\right) \approx$$

(i) (80.5, 101.4) (ii) (100.5, 104.2) (iii) (88.7, 281.4).

(e) Since the 90% CI (88.7, 281.4) includes the value 13^2 = 169, this indicates variance in lengths (i) is (ii) is not $\sigma^2 = 13^2$ mm^2.

(f) Also, the 90% CI for standard deviation σ is

$$\left(\sqrt{\frac{(n-1)s^2}{\chi^2_U}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_L}}\right) = \left(\sqrt{\frac{(18-1)12^2}{27.6}},\ \sqrt{\frac{(18-1)12^2}{8.7}}\right) \approx$$

(i) (9.4, 16.8) (ii) (10.5, 14.2) (iii) (88.7, 281.4).
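Part (f) obtains the interval for σ by taking square roots of the endpoints of the interval for σ². A short R sketch of that step for the machine parts data follows; the variable names are illustrative, and it uses unrounded critical values, so the endpoints differ slightly from the hand calculation with 27.6 and 8.7.

```r
n <- 18; s2 <- 12^2; conf.level <- 0.90
df <- n - 1
alpha <- 1 - conf.level

chi.upper <- qchisq(1 - alpha / 2, df)   # upper critical value, about 27.6
chi.lower <- qchisq(alpha / 2, df)       # lower critical value, about 8.7

var.ci <- c(df * s2 / chi.upper, df * s2 / chi.lower)  # CI for sigma^2
sd.ci <- sqrt(var.ci)                                  # CI for sigma

round(var.ci, 1)
round(sd.ci, 1)
```

Taking square roots preserves the confidence level because the square root is an increasing function on positive numbers.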

4.8 Confidence Intervals for Differences

Let $x_1$ and $x_2$ be the numbers of successes in two independent samples of sizes $n_1$ and $n_2$ (with $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$) taken from two populations with proportions $p_1$ and $p_2$. The $(1 - \alpha) \cdot 100\%$ 2-proportion z-interval for $p_1 - p_2$ is

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},$$

where we assume the samples are random and there are at least 5 successes and 5 failures in each sample.


                     military (1)   civilian (2)
    male doctors     358            6786
    total doctors    407            7363

From above, $\hat{p}_1 = 358/407$, $\hat{p}_2 = 6786/7363$; also the critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI is $z_{\alpha/2} = z_{0.05/2} = z_{0.025} \approx$ (i) 1.65 (ii) 1.96 (iii) 2.09,

    qnorm(0.975) # critical value z, for 95% CI
    [1] 1.959964

and so the 95% CI for $p_1 - p_2$ is

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \left(\frac{358}{407} - \frac{6786}{7363}\right) \pm 1.96\sqrt{\frac{\frac{358}{407}\left(1 - \frac{358}{407}\right)}{407} + \frac{\frac{6786}{7363}\left(1 - \frac{6786}{7363}\right)}{7363}} \approx$$

(i) (−0.054, −0.008) (ii) (−0.064, −0.009) (iii) (−0.074, −0.010)

    prop2.interval <- function(x, n, conf.level) {
      x1 <- x[1]; x2 <- x[2]; n1 <- n[1]; n2 <- n[2]
      p.hat1 <- x1/n1; p.hat2 <- x2/n2
      z.crit <- -1 * qnorm((1 - conf.level)/2)
      margin.error <- z.crit * sqrt(p.hat1 * (1 - p.hat1)/n1 + p.hat2 * (1 - p.hat2)/n2)
      ci.lower <- p.hat1 - p.hat2 - margin.error
      ci.upper <- p.hat1 - p.hat2 + margin.error
      dat <- c(p.hat1, p.hat2, z.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("p.hat1", "p.hat2", "z crit", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    prop2.interval(c(358, 6786), c(407, 7363), 0.95) # approx 2-proportion z-interval for p1 - p2, two-sided

         p.hat1       p.hat2       z crit  Margin of Error      CI lower      CI upper
    0.879606880  0.921635203  1.959963985      0.032205624  -0.074233948  -0.009822699

Since confidence interval does not include (is, in fact, smaller than) zero, this indicates population proportion of male military doctors (i) is less than (ii) equals (iii) is greater than (iv) is different from the population proportion of male civilian doctors.
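As a cross-check of the 2-proportion z-interval above, the sketch below recomputes it directly from the formula and compares it with base R's `prop.test`. This assumes, as an understanding of `prop.test` rather than something stated in the notes, that with `correct = FALSE` it reports the same unpooled interval for the difference of two proportions.

```r
x <- c(358, 6786)                  # male doctors: military, civilian
n <- c(407, 7363)                  # total doctors: military, civilian
p.hat <- x / n

margin.error <- qnorm(0.975) * sqrt(sum(p.hat * (1 - p.hat) / n))
ci.formula <- (p.hat[1] - p.hat[2]) + c(-1, 1) * margin.error

# Built-in interval without continuity correction (assumed to match the formula)
ci.builtin <- as.numeric(prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int)

round(rbind(ci.formula, ci.builtin), 6)
```

Both rows should agree with the `prop2.interval` output, with both endpoints below zero.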

  1. CI for $\mu_1 - \mu_2$, independent samples, unknown $\sigma_1^2 = \sigma_2^2$: progesterone. A study is conducted to determine cellular response to progesterone in females. Blood cells from four females are injected with progesterone; blood cells from four different females are, for comparison purposes, left untreated. Calculate 95% CI. Assume normality with no outliers.


    female   progesterone (1)   female   control (2)
    1        5.85               5        5.23
    2        2.28               6        1.21
    3        1.51               7        1.40
    4        2.12               8        1.38

    progesterone <- c(5.85, 2.28, 1.51, 2.12)
    control <- c(5.23, 1.21, 1.40, 1.38)

From R, $\bar{x}_1 \approx 2.94$, $s_1 \approx 1.97$, $\bar{x}_2 \approx 2.305$, $s_2 \approx 1.95$,

    m1 <- mean(progesterone); m1; s1 <- sqrt(var(progesterone)); s1
    [1] 2.94
    [1] 1.968163
    m2 <- mean(control); m2; s2 <- sqrt(var(control)); s2
    [1] 2.305
    [1] 1.951862

so the pooled standard deviation is

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \approx$$

(i) 1.95 (ii) 1.96 (iii) 1.97 (which is not surprising since $s_1 \approx 1.97$, $s_2 \approx 1.95$)

    n1 <- length(progesterone); n2 <- length(control)
    s12 <- var(progesterone); s22 <- var(control)
    sp <- sqrt(((n1 - 1) * s12 + (n2 - 1) * s22)/(n1 + n2 - 2)); sp
    [1] 1.96003

and the critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, with degrees of freedom $n_1 + n_2 - 2 = 4 + 4 - 2 =$ (i) 4 (ii) 6 (iii) 8, so $t_{\alpha/2} = t_{0.05/2} = t_{0.025} \approx$ (i) 2.31 (ii) 2.45 (iii) 3.09,

    qt(0.975, 6) # critical t value, 95% CI, 6 df
    [1] 2.446912

and so the 95% CI for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm s_p \cdot t_{\alpha/2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx$$

(i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.76, 4.03)
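The pooled interval can be assembled step by step from the formula and cross-checked against base R's `t.test` with `var.equal = TRUE`. This is a sketch for checking the arithmetic, not the notes' own helper function; the variable names are illustrative.

```r
progesterone <- c(5.85, 2.28, 1.51, 2.12)
control <- c(5.23, 1.21, 1.40, 1.38)
n1 <- length(progesterone); n2 <- length(control)

# Pooled SD, critical value with n1 + n2 - 2 df, and margin of error
sp <- sqrt(((n1 - 1) * var(progesterone) + (n2 - 1) * var(control)) / (n1 + n2 - 2))
t.crit <- qt(0.975, n1 + n2 - 2)
margin.error <- t.crit * sp * sqrt(1 / n1 + 1 / n2)

ci.formula <- (mean(progesterone) - mean(control)) + c(-1, 1) * margin.error
ci.builtin <- as.numeric(t.test(progesterone, control, var.equal = TRUE)$conf.int)

round(rbind(ci.formula, ci.builtin), 2)
```

The two rows should match, and the interval contains zero.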


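and the critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, with degrees of freedom

$$r = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = \frac{\left(\frac{1.97^2}{4} + \frac{1.95^2}{4}\right)^2}{\frac{1}{4 - 1}\left(\frac{1.97^2}{4}\right)^2 + \frac{1}{4 - 1}\left(\frac{1.95^2}{4}\right)^2} \approx$$

(i) 4 (ii) 6 (iii) 8 (same as when $\sigma_1^2 = \sigma_2^2$), df = 5.999585, so $t_{\alpha/2} = t_{0.05/2} = t_{0.025} \approx$ (i) 2.31 (ii) 2.45 (iii) 3.09,

    qt(0.975, 6) # critical t value, 95% CI, n1 + n2 - 2 = 6 df
    [1] 2.446912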

and so the 95% CI for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \approx$$

(i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.76, 4.03)

    mean2.t.interval(m1, m2, s1, s2, n1, n2, 0.95, "diff.var")

    Mean Difference        df  Critical Value  Margin of Error   CI lower  CI upper
           0.635000  5.999585        2.446953         3.391355  -2.756355  4.026355

Since the confidence interval does include zero, this indicates the progesterone population mean cellular response (i) is less than (ii) equals (iii) is greater than (iv) is different from the control population mean cellular response.
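The unequal-variance degrees of freedom and interval can be cross-checked with base R's `t.test`, whose default (`var.equal = FALSE`) is the Welch procedure with the same degrees-of-freedom formula. The sketch below is for checking only; the variable names are illustrative.

```r
progesterone <- c(5.85, 2.28, 1.51, 2.12)
control <- c(5.23, 1.21, 1.40, 1.38)

w <- t.test(progesterone, control)    # var.equal = FALSE by default
df.welch <- as.numeric(w$parameter)   # approximate degrees of freedom r
ci.welch <- as.numeric(w$conf.int)    # 95% CI for mu1 - mu2

df.welch
round(ci.welch, 2)
```

Because $n_1 = n_2$ and $s_1 \approx s_2$ here, the degrees of freedom come out just below 6 and the interval is nearly the same as the pooled one.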

  1. Inference for difference in dependent means, μd: milk yield. A study is conducted to determine effect of “gentech” animal feed on milk yield of 9 cows. Cow 1 is fed a control feed for three months and then gentech feed for next three months for comparison purposes. Other cows are treated in same way. Calculate 95% CI of mean paired differences in milk yield. Fill in blanks.

    cow   gentech (1)   control (2)   differences, d_i
    1     62            54
    2     45            43
    3     53            55
    4     35            39
    5     71            65
    6     64            62
    7     63            56
    8     57            50
    9     43            52


    gentech <- c(62, 45, 53, 35, 71, 64, 63, 57, 43)
    control <- c(54, 43, 55, 39, 65, 62, 56, 50, 52)
    diff <- gentech - control; diff
    [1]  8  2 -2 -4  6  2  7  7 -9

$\bar{d} \approx$ (i) 1.41 (ii) 1.89 (iii) 2.52, $s_d \approx$ (i) 5.47 (ii) 5.86 (iii) 6.52,

    mean(diff); sqrt(var(diff))
    [1] 1.888889
    [1] 5.861835

with n − 1 = 9 − 1 = (i) 6 (ii) 7 (iii) 8 degrees of freedom, and the critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, so $t_{\alpha/2} = t_{0.05/2} = t_{0.025} \approx$ (i) 2.31 (ii) 2.53 (iii) 3.09,

    qt(0.975, 8) # critical t value, 95% CI, nd - 1 = 9 - 1 = 8 df
    [1] 2.306004

and so the 95% CI for $\mu_d$ is

$$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}} = 1.89 \pm 2.31 \times \frac{5.86}{\sqrt{9}} \approx$$

(i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.72, 6.29)

    mean1.t.interval <- function(m, s, n, conf.level) {
      t.crit <- -1 * qt((1 - conf.level)/2, n - 1)
      margin.error <- t.crit * s/sqrt(n)
      ci.lower <- m - margin.error
      ci.upper <- m + margin.error
      dat <- c(m, t.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    mean1.t.interval(1.889, 5.8618, 9, 0.95) # m: mean, s: SD, n: sample size, 95% t-interval

        Mean  Critical Value  Margin of Error   CI lower  CI upper
    1.889000        2.306004         4.505778  -2.616778  6.394778

Since confidence interval does include zero, this indicates gentech population mean milk yield (i) is less than (ii) equals (iii) is greater than (iv) is different from control population mean milk yield.
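A paired interval is just a one-sample t-interval applied to the differences, so it can be cross-checked with base R's `t.test` on the difference vector. The sketch below works from the raw data rather than the rounded summaries 1.889 and 5.8618, so its endpoints may differ from the output above in the later decimal places; the variable names are illustrative.

```r
gentech <- c(62, 45, 53, 35, 71, 64, 63, 57, 43)
control <- c(54, 43, 55, 39, 65, 62, 56, 50, 52)
d <- gentech - control                        # paired differences, one per cow

margin.error <- qt(0.975, length(d) - 1) * sd(d) / sqrt(length(d))
ci.formula <- mean(d) + c(-1, 1) * margin.error
ci.builtin <- as.numeric(t.test(d)$conf.int)  # one-sample t-interval on d

round(rbind(ci.formula, ci.builtin), 2)
```

Working with the differences removes cow-to-cow variation, which is why the two columns are not treated as independent samples here.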