















Lecture notes on calculating confidence intervals for a proportion using R. It covers the formulas for 95%, 90%, and 85% confidence intervals, as well as ways to write and interpret the intervals. The document also includes examples and critical values for various confidence levels.
150 Chapter 4. Statistics (LECTURE NOTES 8)
Let Z be N(0, 1) and let p be a number between 0 and 1; the critical z-value $z_p$ satisfies
$$P(Z > z_p) = 1 - \Phi(z_p) = p.$$
Let 0 < α < 1 and let x be the number of successes in n observed trials of a Bernoulli experiment with unknown probability of success p. For $\hat{p} = x/n$, the 100(1 − α)% confidence interval for proportion p is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
where
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{and}\quad \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
are the margin of error and standard deviation of the proportion respectively and α is the level of significance. We assume a large random sample is chosen, both np ≥ 5 and np(1 − p) ≥ 5, and the conditions of a binomial distribution are satisfied. Also, one-sided confidence interval estimates for p include lower and upper bounds respectively:
$$\left[\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; 1\right], \qquad \left[0,\; \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$
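The one-sided bounds can be sketched in R as follows. This is a minimal illustration only: the helper name `prop1.onesided` is hypothetical (it does not appear in the notes), and the counts x = 54, n = 180 are borrowed from the credit card exercise that follows. Note that a one-sided bound uses $z_{\alpha}$, not $z_{\alpha/2}$.

```r
# Hypothetical helper (not from the notes): one-sided bounds for a proportion p
prop1.onesided <- function(x, n, conf.level) {
  p.hat <- x / n                         # point estimate
  z.crit <- qnorm(conf.level)            # z_alpha: all of alpha in one tail
  se <- sqrt(p.hat * (1 - p.hat) / n)    # estimated standard deviation of p.hat
  c(lower.bound = p.hat - z.crit * se,   # interval [lower.bound, 1]
    upper.bound = p.hat + z.crit * se)   # interval [0, upper.bound]
}

b <- prop1.onesided(54, 180, 0.95)       # 95% one-sided bounds
b
```

Because each bound puts all of α in one tail, a 95% one-sided bound sits at the same distance from $\hat{p}$ as the endpoints of a 90% two-sided interval.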
Exercise 4.5 (Confidence Intervals for a Proportion)
(a) Point estimate. Point estimate of population (actual, true) proportion of all credit card purchase slips made with Visa, p, is $\hat{p}$ = (i) 0.3 (ii) 54 (iii) 180. Statistic $\hat{p}$ = 0.3 probably does not exactly equal unknown parameter p.
(b) Check assumptions. Since a random sample is chosen, the conditions of a binomial distribution are satisfied, and $np(1-p) \approx n\hat{p}(1-\hat{p}) = 180(0.3)(0.7) = 37.8 \ge 5$, and $np \approx n\hat{p} = 180(0.3) = 54 \ge 5$, assumptions (i) have (ii) have not been satisfied and so it is appropriate to use $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ to estimate parameter p.
Section 5. Confidence Intervals for a Proportion (LECTURE NOTES 8) 151
(c) 95% Confidence Interval (CI) using R. The 95% CI for proportion of all credit card purchase slips made with Visa, p, is (i) (0.251, 0.349) (ii) (0.273, 0.367) (iii) (0.233, 0.367).

    prop1.interval <- function(x, n, conf.level) {   # 1-proportion CI for p
      p <- x/n                                       # point estimate
      z.crit <- -1*qnorm((1-conf.level)/2)           # critical value z_(alpha/2)
      margin.error <- z.crit*sqrt(p*(1-p)/n)
      ci.lower <- p - margin.error
      ci.upper <- p + margin.error
      dat <- c(p, z.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    prop1.interval(54,180,0.95) # 1-proportion 95% CI for p

          Mean  Critical Value Margin of Error    CI lower    CI upper
    0.30000000      1.95996398      0.06694551  0.23305449  0.36694551

where this interval includes not only smallest possible proportion of 0.233 and largest possible proportion of 0.367, but also other proportions in between these two extremes such as point estimate, $\hat{p}$ = 0.3. Length of this CI is L ≈ 0.367 − 0.233 = 0.134. So, we are 95% confident population parameter p is in (0.233, 0.367).
(d) 90% CI using R. The 90% CI for proportion of all credit card purchase slips made with Visa, p, is (i) (0.251, 0.349) (ii) (0.244, 0.356) (iii) (0.233, 0.367). Length of this CI is L ≈ 0.356 − 0.244 = 0.112.

    prop1.interval(54,180,0.90) # 1-proportion 90% CI for p

          Mean  Critical Value Margin of Error    CI lower    CI upper
    0.30000000      1.64485363      0.05618245  0.24381755  0.35618245

(e) 85% CI using R. The 85% CI for proportion of all credit card purchase slips made with Visa, p, is (i) (0.251, 0.349) (ii) (0.273, 0.367) (iii) (0.233, 0.367). Length of this CI is L ≈ 0.349 − 0.251 = 0.098.

    prop1.interval(54,180,0.85) # 1-proportion 85% CI for p

          Mean  Critical Value Margin of Error    CI lower    CI upper
    0.30000000      1.43953147      0.04916936  0.25083064  0.34916936
(f) Comparing CI lengths. Length of 95% CI for p, L = 0.134, is (i) longer than (ii) same length as (iii) shorter than length of 90% CI for p, L = 0.112, which is (i) longer than (ii) same length as (iii) shorter than length of 85% CI for p, L = 0.098. Increasing confidence increases CI length.
(g) Margin of error. Half of length, L, is margin of error, E = L/2. Consequently, for 95% CI for p,
[Figure 4.5: Critical values. Panel (a): $z_{0.025}$ critical value; 95% in middle of normal density f(z), with 2.5% in each tail, the 2.5th percentile at −z and the 97.5th percentile at z. Panel (b): $z_{0.05}$ critical value; 90% in middle of normal density f(z), with 5% in each tail, the 5th percentile at −z and the 95th percentile at z.]
Critical value for 90% = (1 − α) · 100% = (1 − 0.10) · 100% CI is $z_{\alpha/2} = z_{0.10/2} = z_{0.05}$ = (i) 1.96 (ii) 1.645 (iii) 1.44.

    qnorm(0.95) # critical value z_0.1/2
    [1] 1.644854

Critical value for 85% = (1 − α) · 100% = (1 − 0.15) · 100% CI is $z_{\alpha/2} = z_{0.15/2} = z_{0.075}$ = (i) 1.96 (ii) 1.645 (iii) 1.44.

    qnorm(0.925) # critical value z_0.15/2
    [1] 1.439531
(k) CI using formula. A 95% CI for proportion of Visa credit card purchase slips, p, is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ =
  i. $0.3 \pm 1.96 \times \sqrt{0.3(1-0.3)/180}$
  ii. $0.3 \pm 1.645 \times \sqrt{0.3(1-0.3)/180}$
  iii. $0.3 \pm 1.44 \times \sqrt{0.3(1-0.3)/180}$
and a 90% CI for proportion of Visa credit card purchase slips, p, is
  i. $0.3 \pm 1.96 \times \sqrt{0.3(1-0.3)/180}$
  ii. $0.3 \pm 1.645 \times \sqrt{0.3(1-0.3)/180}$
  iii. $0.3 \pm 1.44 \times \sqrt{0.3(1-0.3)/180}$
and an 85% CI for proportion of Visa credit card purchase slips, p, is
  i. $0.3 \pm 1.96 \times \sqrt{0.3(1-0.3)/180}$
  ii. $0.3 \pm 1.645 \times \sqrt{0.3(1-0.3)/180}$
  iii. $0.3 \pm 1.44 \times \sqrt{0.3(1-0.3)/180}$
(l) Population, Sample, Statistic and Parameter. Match columns.

    terms             credit card example
    (a) population    (a) Visa or not, all purchase slips
    (b) sample        (b) proportion of all slips made with Visa, p
    (c) statistic     (c) Visa or not, 180 purchase slips
    (d) parameter     (d) proportion of 180 slips made with Visa, $\hat{p}$

    terms                  (a)  (b)  (c)  (d)
    credit card example
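The formula-based intervals of Exercise 4.5 can be checked side by side with a short vectorized sketch. It assumes only the exercise's own values (x = 54, n = 180); the variable names are illustrative.

```r
# Summary values from Exercise 4.5: x = 54 Visa slips out of n = 180
x <- 54; n <- 180
p.hat <- x / n

conf.levels <- c(0.95, 0.90, 0.85)
z.crit <- qnorm(1 - (1 - conf.levels) / 2)               # z_(alpha/2) for each level
margin.error <- z.crit * sqrt(p.hat * (1 - p.hat) / n)   # E for each level
ci.length <- 2 * margin.error                            # L = 2E

ci.table <- data.frame(conf.level = conf.levels, z.crit, margin.error, ci.length)
print(round(ci.table, 3))
```

The lengths shrink as the confidence level drops, which is the comparison asked for in part (f).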
(a) Point estimate. Point estimate of proportion, p, of student heights over 6 feet tall is $\hat{p} = 37/102$ ≈ (i) 0.363 (ii) 0.378 (iii) 0.391.
(b) Check assumptions. Since $np \approx n\hat{p} = 102(37/102) = 37 \ge 5$ and $np(1-p) \approx n\hat{p}(1-\hat{p}) = 102(37/102)(65/102) \approx 23.6 \ge 5$, assumptions (i) have (ii) have not been satisfied and so it is appropriate to use $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ to estimate parameter p.
(c) Using R. The 95% CI for p is (i) (0.269, 0.456) (ii) (0.273, 0.367) (iii) (0.233, 0.367).

    prop1.interval(37,102,0.95) # 1-proportion 95% CI for p

         Mean  Critical Value Margin of Error   CI lower   CI upper
    0.3627451       1.9599640       0.0933051  0.2694400  0.4560502

(d) Using formula: critical value using R. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI for p is $z_{\alpha/2} = z_{0.05/2} = z_{0.025}$ = (i) 1.28 (ii) 1.96 (iii) 2.58.

    qnorm(0.975) # critical value z_0.05/2 for 95% CI
    [1] 1.959964

(e) Using formula: critical value using Table C.1. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI for p is $z_{\alpha/2} = z_{0.05/2} = z_{0.025}$ = (i) 1.28 (ii) 1.96 (iii) 2.58.
(f) Using formula. Since $\hat{p} = 37/102$ and n = 102, the 95% CI for p is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ =
μ is called a z-interval:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The (1 − α) · 100% confidence interval for μ with unknown σ is called a t-interval:
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a Student-t distribution with n − 1 degrees of freedom and where
$$E = t_{\alpha/2}\frac{s}{\sqrt{n}} \quad\text{and}\quad \frac{s}{\sqrt{n}}$$
are the margin of error and standard error of the mean respectively and α is the level of significance. We assume a random sample where either the underlying distribution is normal with no outliers or the sample size is large (n > 30). Also, one-sided confidence interval estimates for μ include lower and upper bounds respectively:
$$\left(\bar{x} - t_{\alpha}\frac{s}{\sqrt{n}},\; \infty\right), \qquad \left(-\infty,\; \bar{x} + t_{\alpha}\frac{s}{\sqrt{n}}\right)$$
Exercise 4.6 (Confidence Intervals for a Mean)
(a) Point estimate. Point estimate of population mean weight of all students, μ, is $\bar{x}$ = (i) 11 (ii) 20.1 (iii) 167. Also notice σ is unknown and estimated by s = 20.1.
(b) 95% CI
i. Using R. The 95% CI for μ is (i) (143.5, 182.5) (ii) (151.5, 180.5) (iii) (153.5, 180.5).

    mean1.t.interval <- function(m, s, n, conf.level) {  # m: mean, s: SD, n: sample size
      t.crit <- -1*qt((1-conf.level)/2, n-1)             # critical value t_(alpha/2), n-1 df
      margin.error <- t.crit*s/sqrt(n)
      ci.lower <- m - margin.error
      ci.upper <- m + margin.error
      dat <- c(m, t.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    mean1.t.interval(167,20.1,11,0.95) # m: mean, s: SD, n: sample size, 95% t-interval
Section 6. Confidence Intervals for a Mean (LECTURE NOTES 8) 157
          Mean  Critical Value Margin of Error    CI lower    CI upper
    167.000000        2.228139       13.503364  153.496636  180.503364

So, we are 95% confident population parameter μ is in (153.5, 180.5).
ii. Using formula: degrees of freedom (df). df = n − 1 = 11 − 1 = (i) 10 (ii) 11.
iii. Using formula: critical value using R. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, 10 df, is $t_{\alpha/2} = t_{0.05/2} = t_{0.025}$ ≈ (i) 1.28 (ii) 2.23 (iii) 2.58.

    qt(0.975,10) # critical value t, 10 df, for 95% CI
    [1] 2.228139

iv. Using formula: critical value using Table C.3. Critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, 10 df, is $t_{\alpha/2} = t_{0.05/2} = t_{0.025}$ ≈ (i) 1.28 (ii) 2.23 (iii) 2.58.
v. Using formula. The 95% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ = (i) $20.1 \pm 167 \times \frac{2.23}{\sqrt{11}}$ (ii) $2.23 \pm 167 \times \frac{20.1}{\sqrt{11}}$ (iii) $167 \pm 2.23 \times \frac{20.1}{\sqrt{11}}$ which equals (i) 20.1 ± 12.51 (ii) 2.23 ± 13.51 (iii) 167 ± 13.51 ≈ (153.5, 180.5).
(c) 99% CI
i. Using R. The 99% CI for μ is (i) (147.8, 186.2) (ii) (151.5, 180.5) (iii) (153.5, 180.5).

    mean1.t.interval(167,20.1,11,0.99) # m: mean, s: SD, n: sample size, 99% t-interval

          Mean  Critical Value Margin of Error    CI lower    CI upper
    167.000000        3.169273       19.206990  147.793010  186.206990

So, we are 99% confident population parameter μ is in (147.8, 186.2).
ii. Using formula: degrees of freedom. df = n − 1 = 11 − 1 = (i) 10 (ii) 11.
iii. Using formula: critical value. Critical value for 99% = (1 − α) · 100% = (1 − 0.01) · 100% CI, 10 df, is $t_{\alpha/2} = t_{0.01/2} = t_{0.005}$ ≈ (i) 1.28 (ii) 2.23 (iii) 3.17.

    qt(0.995,10) # critical value t, 10 df, for 99% CI
    [1] 3.169273

iv. Using formula. The 99% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ = (i) $20.1 \pm 20.1 \times \frac{3.17}{\sqrt{11}}$ (ii) $3.17 \pm 167 \times \frac{20.1}{\sqrt{11}}$ (iii) $167 \pm 3.17 \times \frac{20.1}{\sqrt{11}}$ which equals (i) 20.1 ± 19.21 (ii) 3.17 ± 19.21 (iii) 167 ± 19.21 ≈ (147.8, 186.2).
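To see why the t-interval is wider than a z-interval would be for the same data, the following sketch compares $t_{0.025}$ at several degrees of freedom with $z_{0.025}$. The particular df values are chosen for illustration only; just df = 10 comes from the exercise above.

```r
# Compare t critical values (95% CI) with the normal critical value
df <- c(10, 14, 30, 100, 1000)     # illustrative degrees of freedom
t.crit <- qt(0.975, df)            # t_(0.025) for each df
z.crit <- qnorm(0.975)             # z_(0.025), about 1.96

comparison <- data.frame(df, t.crit, excess.over.z = t.crit - z.crit)
print(round(comparison, 4))
```

The t critical value always exceeds the z critical value and approaches it as the degrees of freedom grow, which is why the distinction matters most for small samples such as n = 11.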
iv. Using formula. The 95% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ = (i) $21.6 \pm 2.15 \times \frac{2.97}{\sqrt{15}}$ (ii) $21.6 \pm 2.15 \times \frac{3.97}{\sqrt{15}}$ (iii) $21.6 \pm 3.15 \times \frac{2.97}{\sqrt{15}}$.
(c) 99% CI
i. Using R. The 99% CI for μ is (i) (19.23, 23.45) (ii) (19.96, 23.24) (iii) (19.32, 23.88).

    mean1.t.interval(m,s,n,0.99) # m: mean, s: SD, n: sample size, 99% t-interval

         Mean  Critical Value Margin of Error   CI lower   CI upper
    21.600000        2.976843        2.283786  19.316214  23.883786

ii. Using formula: degrees of freedom (df). The df, here, for 99% CI is (i) same as (ii) different from degrees of freedom calculated for 95% CI above because same sample size is used in both cases.
iii. Using formula: critical value. Critical value for 99% = (1 − α) · 100% = (1 − 0.01) · 100% CI, 14 df, is $t_{\alpha/2} = t_{0.01/2} = t_{0.005}$ ≈ (i) 1.76 (ii) 2.98.

    qt(0.995,14) # critical value t, 14 df, for 99% CI
    [1] 2.976843

iv. Using formula. Thus, the 99% CI for μ is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ = (i) $21.6 \pm 2.15 \times \frac{2.97}{\sqrt{15}}$ (ii) $21.6 \pm 2.15 \times \frac{3.97}{\sqrt{15}}$ (iii) $21.6 \pm 2.98 \times \frac{2.97}{\sqrt{15}}$ which equals (i) 21.6 ± 1.29 (ii) 21.6 ± 2.29 (iii) 21.6 ± 3.29 ≈ (19.32, 23.88).
(d) Some comments
i. (i) True (ii) False. Longer 99% CI is better than shorter 95% CI in the sense we are more confident the 99% CI contains or "captures" unknown parameter μ. However, 95% CI is better than longer 99% CI in the sense that, if unknown parameter μ is in the 95% interval estimate, we are more certain of the location of this unknown parameter.
ii. Since sample size is small, we (i) can (ii) cannot use the central limit theorem.
iii. Match columns.

    terms             corn example
    (a) population    (a) average length of 15 plants, $\bar{X}$
    (b) sample        (b) average length of all plants, μ
    (c) statistic     (c) lengths of all plants
    (d) parameter     (d) observed lengths of 15 plants

    terms           (a)  (b)  (c)  (d)
    corn example
(a) Population μ = 22 length. Population μ = 22 is a (i) statistic (ii) parameter. Population μ (i) changes (ii) remains same for every random sample. Population μ is (usually) (i) known (ii) unknown to us (although we are pretending for this question we do know it).
(b) Sample $\bar{x}$ length. Sample $\bar{x}$ is a (i) statistic (ii) parameter. Sample $\bar{x}$ (i) changes (ii) remains same for every random sample. Sample $\bar{x}$ is (usually) (i) known (ii) unknown to us: it may be $\bar{x}$ = 21.6 for one sample, but $\bar{x}$ = 29.8 for another sample.
(c) A 95% CI for μ, if $\bar{x}$ = 21.6, is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 21.6 \pm 1.96 \times \frac{2.97}{\sqrt{15}}$ = (i) (19.95, 23.24) (ii) (23.45, 27.80) (iii) (28.16, 31.44).

    mean1.t.interval(21.6,2.97,14,0.95) # m: mean, s: SD, n: sample size, 95% t-interval

         Mean  Critical Value Margin of Error   CI lower   CI upper
    21.600000        2.160369        1.714827  19.885173  23.314827

This 95% CI (i) contains (ii) does not contain μ = 22.
(d) A 95% CI for μ, if $\bar{x}$ = 29.8, is $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 29.8 \pm 1.96 \times \frac{2.97}{\sqrt{15}}$ = (i) (19.60, 23.60) (ii) (23.45, 27.80) (iii) (28.16, 31.44).

    mean1.t.interval(29.8,2.97,14,0.95) # m: mean, s: SD, n: sample size, 95% t-interval

         Mean  Critical Value Margin of Error   CI lower   CI upper
    29.800000        2.160369        1.714827  28.085173  31.514827

This 95% CI (i) contains (ii) does not contain μ = 22.
(e) If sample average length, $\bar{x}$, changes, corresponding 95% CI, $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, (i) changes (ii) remains the same. More than this,
  i. all possible 95% CIs contain μ = 22.
  ii. none of all possible 95% CIs contain μ = 22.
  iii. ninety-nine percent of all possible 95% CIs contain μ = 22, and so one percent of all possible 95% CIs do not contain μ = 22.
  iv. ninety-five percent of all possible 95% CIs contain μ = 22, and so five percent of all possible 95% CIs do not contain μ = 22.
This is demonstrated in figure below.
(f) Choose true or false.
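The coverage idea in part (e) can be explored with a small simulation. This is a sketch under stated assumptions, not part of the notes: the population is taken to be normal with μ = 22 and a population SD of 2.97 (the notes give 2.97 only as a sample SD), the sample size is 15, and the seed and number of replications are arbitrary.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
mu <- 22; sigma <- 2.97; n <- 15   # sigma is an assumed (hypothetical) population SD
reps <- 10000                      # number of simulated random samples

covers <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)            # one random sample
  margin <- qt(0.975, n - 1) * sd(x) / sqrt(n)    # margin of error of 95% t-interval
  (mean(x) - margin <= mu) && (mu <= mean(x) + margin)
})

coverage <- mean(covers)           # proportion of intervals that contain mu
coverage
```

Each simulated sample gives a different interval; over many samples the proportion containing μ = 22 should be close to 0.95.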
(a) Using R. The 95% CI for $\sigma^2$ is (i) (0.39, 1.22) (ii) (0.41, 1.25) (iii) (0.44, 1.30).

    var1.chi2.interval = function(v, n, conf.level) {   # v: sample variance, n: sample size
      df = n - 1
      chilower = qchisq((1 - conf.level)/2, df)                      # lower critical value
      chiupper = qchisq((1 - conf.level)/2, df, lower.tail = FALSE)  # upper critical value
      ci.lower <- df * v/chiupper
      ci.upper <- df * v/chilower
      margin.error <- (ci.upper - ci.lower)/2
      dat <- c(v, chilower, chiupper, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Variance", "Lower Crit Val", "Upper Crit Val", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    var1.chi2.interval(0.7,28,0.95) # 95% CI for variance, n = 28

     Variance  Lower Crit Val  Upper Crit Val Margin of Error   CI lower   CI upper
    0.7000000      14.5733827      43.1945110       0.4296647  0.4375556  1.2968849

(b) Upper critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI is $\chi^2_{\alpha/2} = \chi^2_{0.05/2} = \chi^2_{0.025}$ = (i) 8.7 (ii) 40.1 (iii) 43.2

    qchisq(0.975, 27) # 95% upper critical chi-square value
    [1] 43.19451

(c) Lower critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI is $\chi^2_{1-\alpha/2} = \chi^2_{1-0.05/2} = \chi^2_{0.975}$ = (i) 14.6 (ii) 40.1 (iii) 43.2

    qchisq(0.025, 27) # 95% lower critical chi-square value
    [1] 14.57338

(d) Using Table C.4, lower critical value for 95% CI is $\chi^2_{1-\alpha/2} = \chi^2_{1-0.05/2} = \chi^2_{0.975}$ = (i) between 13.12 and 16.79 (ii) 40.1 (iii) 43.2
(e) So, 95% CI for variance $\sigma^2$ is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right) \approx$$
(i) (0.61, 1.65) (ii) (0.59, 1.29) (iii) (0.43, 1.29).
(f) Since 95% CI (0.43, 1.29) does not include 0.40, this indicates variance in distance between door and jamb (i) is (ii) is not 0.4 mm^2.
(g) Population, parameter, sample and statistic. Match columns.

    terms             jamb example
    (a) population    (a) variance in jamb-door distance, of 28 cars, $s^2$
    (b) sample        (b) variance in jamb-door distance, of all cars, $\sigma^2$
    (c) statistic     (c) jamb-door distances, of all cars
    (d) parameter     (d) jamb-door distances, of 28 cars

    terms           (a)  (b)  (c)  (d)
    jamb example
Section 8. Confidence Intervals for a Differences (LECTURE NOTES 8) 163
(a) Using R. The 90% CI for $\sigma^2$ is (i) (88.1, 281.3) (ii) (88.7, 282.3) (iii) (88.2, 282.3).

    var1.chi2.interval(12^2,18,0.90) # 90% CI for variance, n = 18

     Variance  Lower Crit Val  Upper Crit Val Margin of Error   CI lower   CI upper
    144.00000         8.67176        27.58711        96.77927   88.73709   282.2956

(b) Upper critical value for 90% = (1 − α) · 100% = (1 − 0.10) · 100% CI is $\chi^2_{\alpha/2} = \chi^2_{0.10/2} = \chi^2_{0.05}$ = (i) 8.7 (ii) 27.6 (iii) 43.2

    qchisq(0.95, 17) # 90% upper critical chi-square value
    [1] 27.58711

(c) Lower critical value for 90% = (1 − α) · 100% = (1 − 0.10) · 100% CI is $\chi^2_{1-\alpha/2} = \chi^2_{1-0.10/2} = \chi^2_{0.95}$ = (i) 8.7 (ii) 40.1 (iii) 43.2

    qchisq(0.05, 17) # 90% lower critical chi-square value
    [1] 8.67176

(d) So, 90% CI for variance $\sigma^2$ is (there may be round-off error)
$$\left(\frac{(n-1)s^2}{\chi^2_U},\; \frac{(n-1)s^2}{\chi^2_L}\right) = \left(\frac{(18-1)12^2}{27.6},\; \frac{(18-1)12^2}{8.7}\right) \approx$$
(i) (80.5, 101.4) (ii) (100.5, 104.2) (iii) (88.7, 281.4).
(e) Since 90% CI (88.7, 281.4) includes test statistic 13^2 = 169, this indicates variance in lengths (i) is (ii) is not $\sigma^2$ = 13^2 mm^2.
(f) Also, 90% CI for standard deviation σ is
$$\left(\sqrt{\frac{(n-1)s^2}{\chi^2_U}},\; \sqrt{\frac{(n-1)s^2}{\chi^2_L}}\right) = \left(\sqrt{\frac{(18-1)12^2}{27.6}},\; \sqrt{\frac{(18-1)12^2}{8.7}}\right) \approx$$
(i) (9.4, 16.8) (ii) (10.5, 14.2) (iii) (88.7, 281.4).
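As a check on parts (d) and (f), the interval for σ is just the square root of each endpoint of the interval for $\sigma^2$. A minimal sketch using the exercise's values (n = 18, s = 12, 90% confidence); the variable names are illustrative:

```r
n <- 18; s <- 12; conf.level <- 0.90   # summary values from the exercise above
df <- n - 1

chi.upper <- qchisq(1 - (1 - conf.level) / 2, df)   # upper critical value, about 27.6
chi.lower <- qchisq((1 - conf.level) / 2, df)       # lower critical value, about 8.7

# Dividing by the larger critical value gives the smaller endpoint
var.ci <- c(lower = df * s^2 / chi.upper, upper = df * s^2 / chi.lower)
sd.ci <- sqrt(var.ci)                               # CI for sigma

round(var.ci, 1)
round(sd.ci, 1)
```

Because the square root is increasing, the endpoints keep their order, and the chi-square interval is not symmetric about $s^2$ = 144 (nor about s = 12).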
Let $x_1$ and $x_2$ be the numbers of successes in two independent samples of sizes $n_1$ and $n_2$ (with $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$) taken from two populations with proportions $p_1$ and $p_2$. The (1 − α) · 100% 2-proportion z-interval for $p_1 - p_2$ is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
where we assume the samples are random and there are at least 5 successes and 5 failures in each sample.
                     military (1)   civilian (2)
    male doctors          358           6786
    total doctors         407           7363

From above, $\hat{p}_1 = 358/407$, $\hat{p}_2 = 6786/7363$; also critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI is $z_{\alpha/2} = z_{0.05/2} = z_{0.025}$ ≈ (i) 1.65 (ii) 1.96 (iii) 2.09,

    qnorm(0.975) # critical value z, for 95% CI
    [1] 1.959964

and so 95% CI for $p_1 - p_2$ is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \left(\frac{358}{407} - \frac{6786}{7363}\right) \pm 1.96\sqrt{\frac{\frac{358}{407}\left(1-\frac{358}{407}\right)}{407} + \frac{\frac{6786}{7363}\left(1-\frac{6786}{7363}\right)}{7363}} \approx$$
(i) (−0.054, −0.008) (ii) (−0.064, −0.009) (iii) (−0.074, −0.010)

    prop2.interval <- function(x, n, conf.level) {   # x, n: vectors of length 2
      x1 <- x[1]; x2 <- x[2]; n1 <- n[1]; n2 <- n[2]
      p.hat1 <- x1/n1; p.hat2 <- x2/n2
      z.crit <- -1*qnorm((1-conf.level)/2)           # critical value z_(alpha/2)
      margin.error <- z.crit*sqrt(p.hat1*(1-p.hat1)/n1 + p.hat2*(1-p.hat2)/n2)
      ci.lower <- p.hat1-p.hat2 - margin.error
      ci.upper <- p.hat1-p.hat2 + margin.error
      dat <- c(p.hat1, p.hat2, z.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("p.hat1", "p.hat2", "z crit", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    prop2.interval(c(358,6786), c(407,7363), 0.95) # approx 2-proportion z-interval for p1 - p2

         p.hat1       p.hat2       z crit  Margin of Error      CI lower      CI upper
    0.879606880  0.921635203  1.959963985      0.032205624  -0.074233948  -0.009822699
Since confidence interval does not include (is, in fact, smaller than) zero, this indicates population proportion of male military doctors (i) is less than (ii) equals (iii) is greater than (iv) is different from the population proportion of male civilian doctors.
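The interval above can be cross-checked with base R's `prop.test`. This sketch assumes the doctors' counts from the table above; with `correct = FALSE` (no continuity correction) the two-sample confidence interval reported by `prop.test` uses the same unpooled standard error as the z-interval formula in these notes.

```r
x <- c(358, 6786)     # male doctors: military, civilian
n <- c(407, 7363)     # total doctors: military, civilian

# Assumption check: at least 5 successes and 5 failures in each sample
assumptions.ok <- all(x >= 5) && all(n - x >= 5)

# Two-sample interval for p1 - p2 without continuity correction
fit <- prop.test(x, n, conf.level = 0.95, correct = FALSE)
ci <- as.numeric(fit$conf.int)
round(ci, 3)
```

Both endpoints are negative, which agrees with the conclusion that the interval lies entirely below zero.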
    female   progesterone (1)   female   control (2)
    1        5.85               5        5.23
    2        2.28               6        1.21
    3        1.51               7        1.40
    4        2.12               8        1.38

    progesterone <- c(5.85, 2.28, 1.51, 2.12)
    control <- c(5.23, 1.21, 1.40, 1.38)

From R, $\bar{x}_1$ ≈ 2.94, $s_1$ ≈ 1.97, $\bar{x}_2$ ≈ 2.305, $s_2$ ≈ 1.95,

    m1 <- mean(progesterone); m1; s1 <- sqrt(var(progesterone)); s1
    m2 <- mean(control); m2; s2 <- sqrt(var(control)); s2

    mean(progesterone); sqrt(var(progesterone))
    [1] 2.94
    [1] 1.968163
    mean(control); sqrt(var(control))
    [1] 2.305
    [1] 1.951862

so pooled standard deviation is
$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \approx$$
(i) 1.95 (ii) 1.96 (iii) 1.97 (which is not surprising since $s_1$ ≈ 1.97, $s_2$ ≈ 1.95)

    n1 <- length(progesterone); n2 <- length(control)
    s12 <- var(progesterone); s22 <- var(control)
    sp <- sqrt(((n1-1)*s12 + (n2-1)*s22)/(n1+n2-2)); sp

    [1] 1.96003
and critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, with degrees of freedom = $n_1 + n_2 - 2$ = 4 + 4 − 2 = (i) 4 (ii) 6 (iii) 8, so $t_{\alpha/2} = t_{0.05/2} = t_{0.025}$ ≈ (i) 2.31 (ii) 2.45 (iii) 3.09,

    qt(0.975,6) # critical t value, 95% CI, using r df

    [1] 2.446912

and so 95% CI for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm s_p \cdot t_{\alpha/2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx$$
(i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.76, 4.03)
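The pooled interval can also be obtained directly from the raw data with base R's `t.test`. This is a sketch for checking the hand calculation: `var.equal = TRUE` requests the pooled-variance interval with $n_1 + n_2 - 2$ degrees of freedom.

```r
progesterone <- c(5.85, 2.28, 1.51, 2.12)
control <- c(5.23, 1.21, 1.40, 1.38)

# Pooled two-sample t-interval for mu1 - mu2 (equal variances assumed)
pooled <- t.test(progesterone, control, var.equal = TRUE, conf.level = 0.95)

pooled.df <- unname(pooled$parameter)   # degrees of freedom, n1 + n2 - 2
pooled.ci <- as.numeric(pooled$conf.int)
pooled.df
round(pooled.ci, 2)
```

With only four observations per group the interval is wide and includes zero.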
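and critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, with degrees of freedom
$$r = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2-1}\left(\dfrac{s_2^2}{n_2}\right)^2} = \frac{\left(\dfrac{1.97^2}{4} + \dfrac{1.95^2}{4}\right)^2}{\dfrac{1}{4-1}\left(\dfrac{1.97^2}{4}\right)^2 + \dfrac{1}{4-1}\left(\dfrac{1.95^2}{4}\right)^2} \approx$$
(i) 4 (ii) 6 (iii) 8 (same as when $\sigma_1^2 = \sigma_2^2$; more precisely, df = 5.999585), so $t_{\alpha/2} = t_{0.05/2} = t_{0.025}$ ≈ (i) 2.31 (ii) 2.45 (iii) 3.09,

    qt(0.975,6) # critical t value, 95% CI, n1 + n2 - 2 = 6 df

    [1] 2.446912

and so 95% CI for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \approx$$
(i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.76, 4.03)

    mean2.t.interval(m1,m2,s1,s2,n1,n2, 0.95,"diff.var")

    Mean Difference        df  Critical Value Margin of Error   CI lower   CI upper
           0.635000  5.999585        2.446953        3.391355  -2.756355   4.026355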
Since confidence interval does include zero, this indicates progesterone population mean cellular response (i) is less than (ii) equals (iii) is greater than (iv) is different from control population mean cellular response.
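The unequal-variance degrees of freedom r can be computed from the formula and compared with base R's `t.test`, whose default (`var.equal = FALSE`) is the Welch interval. The helper name `welch.df` is hypothetical and used only for this sketch.

```r
# Hypothetical helper (not from the notes): degrees of freedom r when variances differ
welch.df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1                     # s1^2 / n1
  v2 <- s2^2 / n2                     # s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

progesterone <- c(5.85, 2.28, 1.51, 2.12)
control <- c(5.23, 1.21, 1.40, 1.38)

r <- welch.df(sd(progesterone), sd(control), length(progesterone), length(control))
welch <- t.test(progesterone, control, conf.level = 0.95)   # Welch interval by default
welch.ci <- as.numeric(welch$conf.int)

r
round(welch.ci, 2)
```

Here r is only slightly below 6 because the two sample standard deviations are nearly equal, so the interval is almost the same as the pooled one.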
    cow   gentech (1)   control (2)   differences, d_i
    1     62            54
    2     45            43
    3     53            55
    4     35            39
    5     71            65
    6     64            62
    7     63            56
    8     57            50
    9     43            52

    gentech <- c(62, 45, 53, 35, 71, 64, 63, 57, 43)
    control <- c(54, 43, 55, 39, 65, 62, 56, 50, 52)
    diff <- gentech - control; diff

    [1]  8  2 -2 -4  6  2  7  7 -9

$\bar{d}$ ≈ (i) 1.41 (ii) 1.89 (iii) 2.52, $s_d$ ≈ (i) 5.47 (ii) 5.86 (iii) 6.52,

    mean(diff); sqrt(var(diff))

    [1] 1.888889
    [1] 5.861835

with n − 1 = 9 − 1 = (i) 6 (ii) 7 (iii) 8 degrees of freedom, and critical value for 95% = (1 − α) · 100% = (1 − 0.05) · 100% CI, so $t_{\alpha/2} = t_{0.05/2} = t_{0.025}$ ≈ (i) 2.31 (ii) 2.53 (iii) 3.09,

    qt(0.975,8) # critical t value, 95% CI, nd - 1 = 9 - 1 = 8 df

    [1] 2.306004

and so 95% CI for $\mu_d$ is
$$\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}} \approx$$
(i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.72, 6.29)

    mean1.t.interval <- function(m, s, n, conf.level) {  # m: mean, s: SD, n: sample size
      t.crit <- -1*qt((1-conf.level)/2, n-1)             # critical value t_(alpha/2), n-1 df
      margin.error <- t.crit*s/sqrt(n)
      ci.lower <- m - margin.error
      ci.upper <- m + margin.error
      dat <- c(m, t.crit, margin.error, ci.lower, ci.upper)
      names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
      return(dat)
    }
    mean1.t.interval(1.889,5.8618,9,0.95) # m: mean, s: SD, n: sample size, 95% t-interval

        Mean  Critical Value Margin of Error   CI lower   CI upper
    1.889000        2.306004        4.505778  -2.616778   6.394778
Since confidence interval does include zero, this indicates gentech population mean milk yield (i) is less than (ii) equals (iii) is greater than (iv) is different from control population mean milk yield.
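The paired interval can be cross-checked against base R's `t.test` on the raw milk yields. This sketch assumes the nine cow pairs from the table above; `paired = TRUE` computes the one-sample t-interval on the differences.

```r
gentech <- c(62, 45, 53, 35, 71, 64, 63, 57, 43)
control <- c(54, 43, 55, 39, 65, 62, 56, 50, 52)

# Paired t-interval: equivalent to a one-sample t-interval on gentech - control
paired <- t.test(gentech, control, paired = TRUE, conf.level = 0.95)
paired.ci <- as.numeric(paired$conf.int)

unname(paired$parameter)   # degrees of freedom, n - 1
round(paired.ci, 2)
```

The result agrees with the hand calculation up to the rounding of $\bar{d}$ and $s_d$ used in the call to `mean1.t.interval`.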