



This summary introduces population parameters and sample statistics in statistical inference. It discusses how to estimate population parameters using sample statistics, the concept of null distributions, and confidence intervals. It also covers hypothesis testing and significance levels.
Summary of Statistical Inference
What is Statistical Inference?
The population parameter and the sample statistic summarize the same variable. The population parameter summarizes the variable for the population, which is what we want to know, e.g. π or μ1 − μ2. We usually can’t observe the whole population, so we don’t know what the parameter value really is. However, we can measure the variable on a sample or on randomized groups and compute a sample statistic, e.g. p̂ or x̄1 − x̄2. The question is: what can we infer about the parameter based on this statistic? Because these statistics follow a regular pattern (a null distribution) due to the randomness in the study design, we can estimate or calculate the probabilities of different values of the statistic occurring. Different statistics follow different distributions, but once we know which distribution to use, we can draw conclusions about the value of the parameter, e.g. that it lies in some interval, or that we have evidence it is not a particular value.
Null Distributions
If we specify a value for the population parameter, we can take (or simulate) many samples from this population, or many random assignments, and calculate a statistic for each one. This lets us examine the behavior of the statistic and describe the shape, center, and spread (i.e. variability) of this “null” distribution. For example: what values do we expect the statistic to take, and how far might it stray from the hypothesized value of the parameter?
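As a minimal sketch of this idea, the Python snippet below simulates a null distribution for a sample proportion and summarizes its center and spread. The hypothesized value π = 1/3, the sample size of 60, the number of repetitions, and the function name simulate_null_proportions are all illustrative assumptions, not values taken from this summary.

```python
import random
import statistics

def simulate_null_proportions(pi_0, n, reps, seed=1):
    """Simulate `reps` sample proportions, each from a sample of size n, when pi = pi_0."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        # Each observational unit is a "success" with probability pi_0.
        successes = sum(1 for _ in range(n) if rng.random() < pi_0)
        props.append(successes / n)
    return props

# Hypothetical setting: H0 says pi = 1/3 and each sample has 60 observational units.
null_props = simulate_null_proportions(pi_0=1/3, n=60, reps=5000)

center = statistics.mean(null_props)    # expected to sit near the hypothesized 1/3
spread = statistics.stdev(null_props)   # expected to sit near sqrt(pi(1 - pi)/n)
print(f"center = {center:.3f}, spread = {spread:.3f}")
```

A histogram of null_props would show the shape; with these assumed settings it should look roughly symmetric and bell-shaped.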
Simulation vs. “Approximate” (Theory-Based) Procedures
In almost every case we have seen two ways to calculate the p-value: simulation, or an approximation based on a mathematical model. We treat the simulation approaches as always valid. The mathematical models are appropriate only if the “validity conditions” of the theory-based, approximate approach are met. The advantage of the mathematical models is that they also give a confidence interval easily. So consider the mathematical model first, and if the validity (aka “technical”) conditions aren’t met, use the simulation approach instead.
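To illustrate the two routes side by side, here is a small Python sketch that computes a one-sided p-value for a single proportion both by simulation and by a normal approximation. The data (28 successes in 60 trials), the null value π = 1/3, and the validity condition used (at least 10 successes and 10 failures) are assumptions chosen for illustration; check your own course materials for the exact conditions.

```python
import math
import random
from statistics import NormalDist

# Hypothetical data and null value (assumptions for illustration).
n, observed_successes, pi_0 = 60, 28, 1/3
p_hat = observed_successes / n

# Simulation-based p-value: how often does a sample drawn under H0
# give at least as many successes as we observed?
rng = random.Random(2)
reps = 10000
count_extreme = 0
for _ in range(reps):
    successes = sum(1 for _ in range(n) if rng.random() < pi_0)
    if successes >= observed_successes:
        count_extreme += 1
p_sim = count_extreme / reps

# Theory-based p-value from the normal approximation to the null distribution.
se_null = math.sqrt(pi_0 * (1 - pi_0) / n)
z = (p_hat - pi_0) / se_null
p_theory = 1 - NormalDist().cdf(z)

# Assumed validity condition: at least 10 successes and 10 failures in the sample.
conditions_met = observed_successes >= 10 and (n - observed_successes) >= 10
print(f"simulation: {p_sim:.4f}, theory: {p_theory:.4f}, conditions met: {conditions_met}")
```

When the conditions hold, the two p-values should be close but not identical, since the normal model is only an approximation to the simulated distribution.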
Confidence Intervals (estimate a population parameter)
The goal of a confidence interval is to get a range of plausible values for the population parameter. To do this, we use the sample statistic and a measure of its sampling variability. This lets us form an interval around the sample statistic that should contain the population parameter. Note that we are trying to capture the population parameter in the interval, not the data and not the sample statistic. In fact, the sample statistic is often the midpoint (center) of the interval.
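A minimal sketch of one common theory-based interval, statistic ± multiplier × standard error, for a single proportion is shown below. The function name proportion_ci and the data (28 successes in 60 trials) are hypothetical, and this particular formula is only one of several interval methods.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Theory-based interval: p_hat +/- z* x sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    # Multiplier z* that leaves (1 - confidence)/2 in each tail of the normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 28 successes out of 60.
low, high = proportion_ci(successes=28, n=60)
midpoint = (low + high) / 2
print(f"95% CI for pi: ({low:.3f}, {high:.3f}); midpoint = {midpoint:.3f}")
```

For this type of interval the midpoint equals the sample proportion 28/60, which matches the remark that the statistic is often the center of the interval.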
Tests of Significance (test a claim about a population parameter)
The goal of a test of significance is to make a decision about the population parameter. Here are the steps we use:
1. Define the parameter(s) of interest. (You should also be able to define the observational units and the variable.)
2. Specify the hypotheses (e.g. H0: π = 1/3, μ = 50, etc.).
   - Always in terms of the population (parameters), because that’s what is unknown and what we are trying to make statements about (take off the hats!).
   - The null hypothesis is the “dull hypothesis” or the “ho-hum hypothesis”.
   - The alternative hypothesis specifies something interesting (“a-ha!”).
   - One- or two-sided (decide based on the wording of the research question).
3. Check the validity conditions, and identify the appropriate test procedure by name. If the technical conditions are not met, use a simulation method instead.
4. Compare the data observed in the sample to what’s “expected” from H0.
5. Compute the test statistic and do one of the following:
   a) Find the p-value = probability of observing a value of the statistic as extreme or more extreme when H0 is true, using an applet or Minitab. Reject H0 if p-value ≤ level of significance; otherwise fail to reject it.
   b) Determine the rejection region for the given level of significance using an applet: plug in numbers for “as extreme as” until the probability is equal to or just below the desired level of significance. If your test statistic is inside the rejection region, reject H0; otherwise fail to reject it.
6. Be careful about inferring a cause-and-effect relationship if you’re comparing two variables: consider your study design (was this an observational or an experimental study?).
If we repeatedly took different samples or random shuffles and calculated the value of the statistic for each one, the p-value indicates how often we would expect to see the statistic value that we actually observed, or one more extreme, when H0 is true. If the observed statistic value is very unlikely (a small p-value), we stop believing H0. We can use the significance level as a benchmark to decide whether the p-value is “too small.”
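The random-shuffle reasoning above can be sketched in Python as a two-sided randomization test for a difference in two means (H0: μ1 − μ2 = 0). The two groups of responses, the significance level of 0.05, and the number of shuffles are made-up values for illustration only.

```python
import random
from statistics import fmean

# Hypothetical responses from a randomized experiment with two groups.
group_1 = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 16.0, 14.4]
group_2 = [11.0, 12.5, 12.9, 11.8, 13.1, 12.2, 11.5, 12.7]
alpha = 0.05  # level of significance, chosen before looking at the data

observed_diff = fmean(group_1) - fmean(group_2)

# Under H0 the group labels are interchangeable, so we reshuffle the pooled
# responses and recompute the difference in means for each shuffle.
rng = random.Random(3)
pooled = group_1 + group_2
n_1 = len(group_1)
reps = 10000
count_extreme = 0
for _ in range(reps):
    rng.shuffle(pooled)
    shuffled_diff = fmean(pooled[:n_1]) - fmean(pooled[n_1:])
    # Two-sided: "as extreme or more extreme" in either direction.
    if abs(shuffled_diff) >= abs(observed_diff):
        count_extreme += 1

p_value = count_extreme / reps
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"observed difference = {observed_diff:.4f}, p-value = {p_value:.4f}: {decision}")
```

Because the groups here come from an assumed randomized experiment, a small p-value would also support a cause-and-effect conclusion; with observational data it would not.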
Question Translations
If the question asks you to do the item on the left, respond with the item on the right:
describe/compare the distribution(s) of a categorical variable: Look at (conditional) proportions
describe/compare the distribution(s) of a quantitative variable: Shape, center, spread. Use mean, median, SD, IQR if available
estimate “how large the difference is” or “plausible values for the parameter”: Consider the confidence interval
test a parameter: Conduct a test of significance (use your “methods” table to find the appropriate test)
comment on generalizability: Consider the data collection methods and specify a reasonable population
comment on causation: Consider whether you have a randomized experiment and statistical significance
describe a confounding variable: Specify a variable and argue how it might differ between the explanatory variable groups and relate to the response variable
describe a parameter: Specify the number (e.g., mean or proportion or slope), the variable (e.g., how you are defining success), and the population (don’t worry too much at this point about whether it’s a reasonable population)
interpret a p-value: Begin the sentence “the probability that…” or “the proportion of …” and put your answer in context of the problem (e.g., what source of randomness are we modelling? what statistic are you talking about? what value was observed? what did the null hypothesis specify in context? what do you mean by “or more extreme”?)
interpret a confidence interval: Begin the sentence “I am 95% confident that <
p-value interpretation template
X% of (randomness) would produce (statistic) at least as extreme as (higher than, lower than, or beyond) the observed result (cite the value) if the null hypothesis is true (context)
CI interpretation template
I’m 95% confident that the (parameter – mean(s) or proportion(s), define population or process) is in the interval, context (variable, measurement units)
With comparison – make sure the “direction” is clear
Test statistic interpretation template (t or z statistic)
The (observed mean or proportion or difference in mean or proportion) is (insert absolute value of test statistic) standard deviations (below if -, above if +) the mean of (your variable).
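As a hedged illustration of this template, the Python sketch below computes a one-sample t statistic, t = (x̄ − μ0) / (s / √n), and fills in the sentence. The sample values and the hypothesized mean of 50 are invented for the example.

```python
import math
import statistics

# Hypothetical sample and hypothesized mean under H0 (assumptions for illustration).
sample = [52.1, 48.7, 55.3, 51.9, 53.4, 49.8, 54.6, 52.8, 50.9, 53.5]
mu_0 = 50

x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)                 # sample standard deviation
standard_error = s / math.sqrt(len(sample))  # SD of the sample mean
t_stat = (x_bar - mu_0) / standard_error

direction = "above" if t_stat > 0 else "below"
print(f"The observed mean {x_bar:.2f} is {abs(t_stat):.2f} standard deviations "
      f"{direction} the hypothesized mean of {mu_0}.")
```

The "standard deviations" counted here are standard deviations of the statistic (standard errors), not of the individual observations.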
R^2 interpretation template
X% of the variability in the (response variable y) is explained by a least squares regression model with (the explanatory variable x)
Regression slope interpretation template
A one (unit) increase in (the explanatory variable x) results in an X (unit) (increase if +, decrease if -) in the predicted value of (the response variable y).
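Both regression templates can be filled in from the same least squares fit. The sketch below computes the slope, intercept, and R^2 directly from their definitions; the function name least_squares and the small hours/score data set are hypothetical.

```python
def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation between x and y
    return slope, intercept, r_squared

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

slope, intercept, r_squared = least_squares(hours, score)
print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
print(f"{100 * r_squared:.1f}% of the variability in score is explained by the model with hours")
```

With these made-up numbers, the slope template would read: a one-hour increase in study time results in a 4.1-point increase in the predicted score.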
CI level interpretation
This means 95% of intervals constructed this way would succeed in capturing the (parameter)
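One way to see what "95% of intervals constructed this way" means is to simulate many samples from a population whose parameter we know and count how often the interval captures it. In the sketch below the true proportion 0.4, the sample size 200, the multiplier 1.96, and the function name covers are assumptions for illustration.

```python
import math
import random

def covers(pi_true, n, rng, z_star=1.96):
    """Draw one sample, build p_hat +/- z* x SE, and report whether it captures pi_true."""
    successes = sum(1 for _ in range(n) if rng.random() < pi_true)
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= pi_true <= p_hat + margin

rng = random.Random(4)
reps = 2000
pi_true, n = 0.4, 200   # hypothetical population proportion and sample size

coverage = sum(covers(pi_true, n, rng) for _ in range(reps)) / reps
print(f"{100 * coverage:.1f}% of the {reps} intervals captured the parameter")
```

The simulated capture rate should land close to 95%, though not exactly on it, because the interval formula is approximate and the simulation itself is random.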
Power Interpretation
Probability of rejecting the null hypothesis when the alternative hypothesis is true. Calculated for a particular instance of the alternative hypothesis.
Power = 1 – P(type 2 error)
Misc.
P(type 1 error) = level of significance
p-value = “observed” level of significance
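These relationships can be checked by simulation. The sketch below estimates how often a one-sided theory-based test of H0: π = 1/3 rejects, first when H0 is true (the type 1 error rate) and then for one particular alternative, π = 0.5 (the power). The sample size, significance level, alternative value, and the function name rejection_rate are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(pi_true, pi_0, n, alpha, reps, rng):
    """Fraction of simulated samples for which a one-sided z-test rejects H0: pi = pi_0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)        # start of the rejection region
    se_null = math.sqrt(pi_0 * (1 - pi_0) / n)
    rejections = 0
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < pi_true)
        z = (successes / n - pi_0) / se_null
        if z >= z_crit:
            rejections += 1
    return rejections / reps

rng = random.Random(5)
pi_0, n, alpha, reps = 1/3, 60, 0.05, 5000   # hypothetical test settings

type_1_rate = rejection_rate(pi_0, pi_0, n, alpha, reps, rng)   # H0 true
power = rejection_rate(0.5, pi_0, n, alpha, reps, rng)          # alternative pi = 0.5
type_2_rate = 1 - power
print(f"P(type 1 error) ~ {type_1_rate:.3f}, power ~ {power:.3f}, "
      f"P(type 2 error) ~ {type_2_rate:.3f}")
```

Because the number of successes is discrete, the simulated type 1 error rate is only approximately equal to the level of significance. Choosing a different alternative value would give a different power.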