

An overview of inferential statistics for comparing means and proportions between two populations. It covers estimating and testing differences in means and in proportions, as well as the special case of matched samples, and gives formulas and procedures for calculating confidence intervals and performing hypothesis tests.
Inference (hypothesis tests and estimation): differences between means and proportions in two populations

The idea: we have two populations and a variable of interest. We may be interested in the difference in mean values (on this variable) in the two populations [difference of means], or in the difference in the proportions of the populations that have a particular value on the variable [difference of proportions].
In dealing with means, we want to either estimate the difference μ₁ − μ₂ or test a hypothesis about it. In dealing with proportions, we want to either estimate the difference p₁ − p₂ or test a hypothesis about it.
The methods parallel the methods for estimation and for tests on the mean of one population, but the calculations are different because we have different (and more complicated) distributions. There is a special situation [the “matched samples” case] which is usually discussed with (and often confused with) inference on two populations but is really a special case of inference on one population [of differences].
Difference of means (independent samples)
The basic important fact is that our best estimator of μ₁ − μ₂ (the difference between the population means — order of subtraction matters) is the difference between sample means x̄₁ − x̄₂. The mean of the difference in sample means (as long as we keep the sample sizes the same) is exactly the difference in the population means (in the same order):

μ_{x̄₁ − x̄₂} = μ₁ − μ₂

and the variance of x̄₁ − x̄₂ is the sum of the variances of x̄₁ and x̄₂, so

σ_{x̄₁ − x̄₂} = √( σ₁²/n₁ + σ₂²/n₂ )

In addition, if X₁ and X₂ are approximately normally distributed, or if the sample sizes are large enough, then the distribution of x̄₁ − x̄₂ is approximately normal, which means

( x̄₁ − x̄₂ − (μ₁ − μ₂) ) / √( σ₁²/n₁ + σ₂²/n₂ )  is a Z.
Thus, if we happen to know σ₁ and σ₂, and n₁, n₂ are large enough or X₁, X₂ are approximately normal, our 1 − α confidence interval for μ₁ − μ₂ is given by

x̄₁ − x̄₂ ± E  with  E = Z_{α/2} √( σ₁²/n₁ + σ₂²/n₂ )

In the usual situation we don’t know σ₁, σ₂; the standardized difference of sample means, computed using s₁, s₂, involves four quantities that vary from sample to sample, and is not even really a t. It is closely approximated by a t (if X₁, X₂ are normal or n₁, n₂ are large), but to make the approximation work we have to use an unusual value for the degrees of freedom. So our interval for confidence 1 − α is given by

x̄₁ − x̄₂ ± E  with  E = t_{α/2} √( s₁²/n₁ + s₂²/n₂ )

df = ( s₁²/n₁ + s₂²/n₂ )² / [ (1/(n₁−1)) (s₁²/n₁)² + (1/(n₂−1)) (s₂²/n₂)² ]
[This is the fractional degrees-of-freedom value that will be reported by your calculator or by Minitab if you use either of these for the calculation.] Testing follows the same six-step procedure as testing on one mean, but slightly different numbers appear. There are the same three forms for the alternative — the order in which the two populations are identified will matter for one-sided tests.
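For concreteness, the interval and the fractional degrees of freedom can be computed as in this minimal Python sketch (the function name and argument names are mine, not from the notes); the t_{α/2} value must still be looked up, from a table or software, at the df the function reports:

```python
import math

def welch_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """1 - alpha confidence interval for mu1 - mu2 (sigma1, sigma2 unknown).

    t_crit is t_{alpha/2}, looked up at the Welch degrees of freedom,
    which is returned alongside the interval."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # s1^2/n1 and s2^2/n2
    # fractional (Welch) degrees of freedom
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    E = t_crit * math.sqrt(v1 + v2)          # margin of error
    diff = xbar1 - xbar2
    return (diff - E, diff + E), df
```

With equal sample sizes and equal sample standard deviations the formula collapses to n₁ + n₂ − 2; for example s₁ = s₂ = 2 and n₁ = n₂ = 10 gives df = 18.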
“Greater”: H₀: μ₁ = μ₂ vs Hₐ: μ₁ > μ₂ [or H₀: μ₁ − μ₂ = 0 vs Hₐ: μ₁ − μ₂ > 0]; reject H₀ if sample t > t_α
“Less”: H₀: μ₁ = μ₂ vs Hₐ: μ₁ < μ₂ [or H₀: μ₁ − μ₂ = 0 vs Hₐ: μ₁ − μ₂ < 0]; reject H₀ if sample t < −t_α
“Not equal”: H₀: μ₁ = μ₂ vs Hₐ: μ₁ ≠ μ₂ [or H₀: μ₁ − μ₂ = 0 vs Hₐ: μ₁ − μ₂ ≠ 0]; reject H₀ if sample t < −t_{α/2} or sample t > t_{α/2}
sample t = ( x̄₁ − x̄₂ − (μ₁ − μ₂) ) / √( s₁²/n₁ + s₂²/n₂ )

with df = ( s₁²/n₁ + s₂²/n₂ )² / [ (1/(n₁−1)) (s₁²/n₁)² + (1/(n₂−1)) (s₂²/n₂)² ]
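A matching Python sketch for the test statistic (names again mine); under H₀ the hypothesized difference μ₁ − μ₂ is 0, and the df is the same fractional value as for the interval:

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, diff0=0.0):
    """Sample t for H0: mu1 - mu2 = diff0 (0 in the tests above),
    returned together with the Welch degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2 - diff0) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df
```

The returned t is then compared with t_α (or ±t_{α/2}) at that df, per the rejection rules above.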
Difference of proportions
The basic important fact is that our best estimator of p₁ − p₂ (the difference between the population proportions — order of subtraction matters) is the difference between the sample proportions p̄₁ − p̄₂. The mean of the difference in sample proportions (as long as we don’t change the sample sizes) is exactly the difference in the population proportions (in the same order):

μ_{p̄₁ − p̄₂} = p₁ − p₂

and the variance of p̄₁ − p̄₂ is the sum of the variances, so

σ_{p̄₁ − p̄₂} = √( p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂ )

If the sample sizes are large enough for the proportions (that is, if n₁p₁, n₁ − n₁p₁, n₂p₂, n₂ − n₂p₂ are all at least 5), then p̄₁ − p̄₂ will be approximately normally distributed, which means

( p̄₁ − p̄₂ − (p₁ − p₂) ) / √( p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂ )  is a Z.

In working with proportions we don’t have an independent calculation of the standard deviation — it depends on the proportion — so we don’t get involved with t. For estimation, we have the problem that we don’t know p₁, p₂ to put into the formula, so we make do with p̄₁, p̄₂. If our sample sizes are large enough for our proportions, our 1 − α confidence interval for p₁ − p₂ is given by

p̄₁ − p̄₂ ± E  with  E = Z_{α/2} √( p̄₁(1−p̄₁)/n₁ + p̄₂(1−p̄₂)/n₂ )
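A Python sketch of this interval (function name is mine); the standard library’s NormalDist supplies Z_{α/2}, so no table lookup is needed here:

```python
import math
from statistics import NormalDist

def two_prop_interval(x1, n1, x2, n2, conf=0.95):
    """1 - alpha confidence interval for p1 - p2,
    from success counts x1, x2 out of n1, n2 trials."""
    pb1, pb2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{alpha/2}
    E = z * math.sqrt(pb1 * (1 - pb1) / n1 + pb2 * (1 - pb2) / n2)
    return pb1 - pb2 - E, pb1 - pb2 + E
```

For example, 40 successes in 100 trials versus 30 in 100 gives an interval centered at 0.1.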
Testing follows the same six-step procedure as testing on one mean, but slightly different numbers appear. There are the same three forms for the alternative — the order in which the two populations are identified will matter for one-sided tests. Since our null hypothesis is always “the difference between p₁ and p₂ is 0” (p₁ = p₂), we calculate the standard error of the difference using p̄ = (n₁p̄₁ + n₂p̄₂)/(n₁ + n₂) (= total number of successes / total number of trials) in place of both p₁ and p₂. [This is referred to as the “pooled estimate of the proportion”.]
“Greater”: H₀: p₁ = p₂ vs Hₐ: p₁ > p₂ [or H₀: p₁ − p₂ = 0 vs Hₐ: p₁ − p₂ > 0]; reject H₀ if sample Z > Z_α
“Less”: H₀: p₁ = p₂ vs Hₐ: p₁ < p₂ [or H₀: p₁ − p₂ = 0 vs Hₐ: p₁ − p₂ < 0]; reject H₀ if sample Z < −Z_α
“Not equal”: H₀: p₁ = p₂ vs Hₐ: p₁ ≠ p₂ [or H₀: p₁ − p₂ = 0 vs Hₐ: p₁ − p₂ ≠ 0]; reject H₀ if sample Z < −Z_{α/2} or sample Z > Z_{α/2}
sample Z = ( p̄₁ − p̄₂ − (p₁ − p₂) ) / √( p̄(1−p̄)/n₁ + p̄(1−p̄)/n₂ ) = ( p̄₁ − p̄₂ − (p₁ − p₂) ) / √( p̄(1−p̄)(1/n₁ + 1/n₂) )
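The pooled test statistic, in the same Python sketch style (names mine); under H₀ the hypothesized p₁ − p₂ is 0:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Sample Z for H0: p1 = p2, using the pooled estimate pbar."""
    pb1, pb2 = x1 / n1, x2 / n2
    pbar = (x1 + x2) / (n1 + n2)   # total successes / total trials
    se = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
    return (pb1 - pb2) / se        # hypothesized p1 - p2 is 0
```

The result is compared with Z_α (or ±Z_{α/2}) per the rejection rules above.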
Matched samples (Paired data) - means
This is the situation in which we have two sets of values (for the same variable) but each value in one set is related to a corresponding value in the other, so it makes sense to talk about the differences [individual differences, not just the difference of the means]. We are interested in the mean of the differences. Examples:
Measurements of resting heart rates of people before and after an exercise program (pair is before and after on the same person)
Selling price of a selection of standard items at Wal-Mart and at Target (pair is prices at the two stores on the same item)
In this situation we work directly with the differences: d = x₁ − x₂. [It is usually necessary to keep track of the order of subtraction: “before” minus “after” gives a different sign from “after” minus “before” — and this will matter, especially for one-sided tests.] If the variable we observe is normally distributed, or if the sample size (note n = number of pairs) is large enough, then the sample mean of the differences will be approximately normally distributed (that is, (d̄ − μ_d)/σ_{d̄} will be a Z and (d̄ − μ_d)/(s_d/√n) will be a t). To estimate the mean difference we use:

d̄ ± E  with  E = t_{α/2} · s_d/√n
Our tests follow the same six steps and we have the same three cases:

“Greater”: H₀: μ_d = 0 vs Hₐ: μ_d > 0; reject H₀ if sample t > t_α
“Less”: H₀: μ_d = 0 vs Hₐ: μ_d < 0; reject H₀ if sample t < −t_α
“Not equal”: H₀: μ_d = 0 vs Hₐ: μ_d ≠ 0; reject H₀ if sample t < −t_{α/2} or sample t > t_{α/2}

sample t = (d̄ − 0) / (s_d/√n),  df = n − 1
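As a Python sketch (function and variable names are mine), with the standard library’s mean and stdev supplying d̄ and s_d:

```python
import math
from statistics import mean, stdev

def paired_t(x1, x2):
    """Sample t for H0: mu_d = 0 with d = x1 - x2; also returns df = n - 1."""
    d = [a - b for a, b in zip(x1, x2)]   # individual differences, in order
    n = len(d)                            # n = number of pairs
    t = (mean(d) - 0) / (stdev(d) / math.sqrt(n))
    return t, n - 1
```

For example, hypothetical heart rates before = [80, 75, 90, 70] and after = [78, 72, 85, 69] give d = [2, 3, 5, 1], d̄ = 2.75, and df = 3.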