Statistical Inference: Population Parameters and Sample Statistics, Summaries of Statistics

This summary explains the concept of population parameters and sample statistics in statistical inference. It discusses how to estimate population parameters using sample statistics, the concept of null distributions, and confidence intervals. It also covers hypothesis testing and significance levels.

Typology: Summaries

2021/2022

Uploaded on 03/31/2022

shanthi_48

Summary of Statistical Inference

What is Statistical Inference?

The population parameter and the sample statistic summarize the same variable. The population parameter summarizes the variable for the population, which is what we want to know, e.g., π or μ₁ − μ₂. We usually can’t observe the whole population, so we don’t know what the parameter value really is. However, we can measure the variable on a sample or on randomized groups and compute a sample statistic, e.g., p̂ or x̄₁ − x̄₂. The question is: what can we infer about the parameter based on this statistic? Because these statistics follow a regular pattern (such as a null distribution) due to the randomness in the study design, we can estimate or calculate the probabilities of different values of the statistic occurring. Different statistics follow different distributions, but once we know which distribution to use, we can draw conclusions about the value of the parameter, e.g., that it lies in some interval or that we have evidence it is not a particular value.

Null Distributions

If we specify a value for the population parameter, we can take (or simulate) many samples from this population, or many random assignments, and calculate a statistic for each one. This lets us examine the behavior of the statistic so we can discuss the shape, center, and spread (i.e., variability) of this “null” distribution. For example, what values do we expect the statistic to take, and how far might the statistic stray from the hypothesized value of the parameter?
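As a minimal sketch of this idea, the Python below simulates a null distribution for a sample proportion. The hypothesized value π = 1/3, the sample size of 60, the number of repetitions, and the function name `simulate_null_proportions` are all illustrative assumptions, not values from the course materials.

```python
import random
import statistics

def simulate_null_proportions(pi_0, n, reps=5000, seed=1):
    """Simulate sample proportions for samples of size n when the parameter equals pi_0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    props = []
    for _ in range(reps):
        # Each observation is a "success" with probability pi_0.
        successes = sum(1 for _ in range(n) if rng.random() < pi_0)
        props.append(successes / n)
    return props

# Hypothetical setting: H0 says pi = 1/3 and each sample has 60 observations.
null_props = simulate_null_proportions(pi_0=1/3, n=60)

center = statistics.mean(null_props)   # should sit near the hypothesized value
spread = statistics.stdev(null_props)  # variability of the statistic under H0
print(f"center = {center:.3f}, spread (SD) = {spread:.3f}")
```

The center of the simulated values should land close to the hypothesized 1/3, and the spread shows how far a sample proportion typically strays from it by chance alone.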

Simulation vs. “Approximate” (Theory-Based) Procedures

In almost every case we have seen two different ways to calculate the p-value: simulation, or an approximation based on a mathematical model. We treat the simulation approaches as always valid. The mathematical models are appropriate only if the “validity conditions” of the theory-based, approximate approach are met. The advantage of the mathematical models is that we can easily get a confidence interval as well. So you may want to consider the mathematical model first, and if the validity (aka “technical”) conditions aren’t met, use the simulation approach instead.
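The two routes to a p-value can be compared side by side. The sketch below assumes a one-proportion, one-sided test with hypothetical data (28 successes in 60 trials, H₀: π = 1/3). The validity check of at least 10 successes and 10 failures is a common textbook rule used here as an assumption; the actual condition should come from your “methods” table.

```python
import random
from statistics import NormalDist

def simulated_p_value(pi_0, n, observed_successes, reps=10000, seed=2):
    """Proportion of simulated samples with at least as many successes as observed."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < pi_0)
        if successes >= observed_successes:
            extreme += 1
    return extreme / reps

def theory_p_value(pi_0, n, observed_successes):
    """One-sided p-value from the normal approximation to the null distribution."""
    p_hat = observed_successes / n
    se_null = (pi_0 * (1 - pi_0) / n) ** 0.5
    z = (p_hat - pi_0) / se_null
    return 1 - NormalDist().cdf(z)

# Hypothetical data: 28 successes out of 60, testing H0: pi = 1/3 vs Ha: pi > 1/3.
n, observed = 60, 28
conditions_met = observed >= 10 and (n - observed) >= 10  # assumed validity rule

p_sim = simulated_p_value(1/3, n, observed)
p_theory = theory_p_value(1/3, n, observed)
print(f"simulation: {p_sim:.4f}, theory-based: {p_theory:.4f}, conditions met: {conditions_met}")
```

When the conditions hold, the two p-values should be in the same ballpark; a large gap is a hint that the approximation is not trustworthy for that sample.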

Confidence Intervals: Estimate the Population Parameter

The goal of a confidence interval is to get a range of plausible values for the population parameter. To do this, we use the sample statistic and a measure of the sampling variability of that statistic. This lets us form an interval around the sample statistic that should contain the population parameter. Note that we are trying to capture the population parameter in the interval, not the data and not the sample statistic. In fact, the sample statistic is often the midpoint (center) of the interval.
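One way to see "statistic plus or minus a measure of sampling variability" in action is the sketch below. It assumes the standard theory-based (Wald) interval for one proportion, p̂ ± z*·sqrt(p̂(1 − p̂)/n), which may or may not be the exact formula your applet uses; the data (28 of 60) and the function name `one_proportion_ci` are hypothetical.

```python
from statistics import NormalDist

def one_proportion_ci(successes, n, confidence=0.95):
    """Wald interval: sample proportion plus or minus z* times its standard error."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 28 successes out of 60.
low, high = one_proportion_ci(28, 60)
midpoint = (low + high) / 2  # equals the sample proportion for this interval
print(f"95% CI: ({low:.3f}, {high:.3f}), midpoint = {midpoint:.3f}")
```

The midpoint of this interval is the sample proportion itself, which matches the remark that the statistic is often the center of the interval.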

Tests of Significance: Test a Claim About the Population Parameter

The goal of a test of significance is to make a decision about the population parameter. Here are the steps we use:

  1. Define the parameter(s) of interest. (You should also be able to define the observational units and the variable.)

  2. Specify the hypotheses (e.g., H₀: π = 1/3, H₀: μ = 50).
     - State them in terms of the population parameters, because those are unknown and are what we are trying to make statements about (take off the hats!).
     - The null hypothesis is the “dull hypothesis” or the “ho-hum hypothesis.”
     - The alternative hypothesis specifies something interesting (“a-ha!”).
     - Choose a one-sided or two-sided alternative based on the wording of the research question.

  3. Check the validity conditions and identify the appropriate test procedure by name. If the technical conditions are not met, use a simulation method instead.

  4. Compare the data observed in the sample to what is “expected” under H₀. Compute the test statistic and do one of the following:
     - Find the p-value, the probability of observing a value of the statistic as extreme or more extreme when H₀ is true, using an applet or Minitab. Reject H₀ if the p-value ≤ the level of significance; otherwise fail to reject it.
     - Determine the rejection region for the given level of significance using an applet: plug in numbers for “as extreme as” until the probability is equal to or just below the desired level of significance. If your test statistic is inside the rejection region, reject H₀; otherwise fail to reject it.

  5. Draw a conclusion in context.
     - Make a conclusion about the research question of interest (back to English).
     - Clearly indicate the population you are generalizing to, taking into account your sampling method (convenience, simple random, etc.).
     - Be careful about inferring a cause-and-effect relationship when comparing two variables; base this on your study design (was this an observational or experimental study?).

If we repeatedly took different samples or random shuffles and calculated the value of the statistic for each one, the p-value indicates how often we would expect to see the statistic value that we actually observed, or one more extreme, when H₀ is true. If the observed statistic would be very unlikely under H₀ (a small p-value), we stop believing H₀. We use the significance level as a benchmark to decide whether the p-value is “too small.”
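The "random shuffles" reading of the p-value can be sketched as a small randomization test. Everything specific here is an assumption for illustration: the two made-up groups, the two-sided alternative, the 0.05 significance level, and the function name `shuffle_p_value`.

```python
import random
from statistics import mean

def shuffle_p_value(group_a, group_b, reps=5000, seed=3):
    """Two-sided p-value for a difference in means via random reassignment."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-deal the responses to the two groups at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):  # at least as extreme as what we saw
            extreme += 1
    return observed, extreme / reps

# Hypothetical responses from a randomized experiment.
treatment = [12, 15, 14, 16, 18, 17, 15, 19]
control = [10, 11, 13, 12, 14, 9, 12, 11]

observed, p_value = shuffle_p_value(treatment, control)
alpha = 0.05                 # assumed level of significance
reject = p_value <= alpha    # decision rule from step 4
print(f"observed difference = {observed}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Because the groups were formed by random assignment in this hypothetical, a small p-value here would also support a cause-and-effect conclusion, in line with step 5.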

Question Translations

If the question asks you to (task): what to do

describe/compare the distribution(s) of a categorical variable: Look at (conditional) proportions.

describe/compare the distribution(s) of a quantitative variable: Discuss shape, center, and spread. Use mean, median, SD, and IQR if available.

estimate “how large the difference is” or “plausible values for the parameter”: Consider the confidence interval.

test a parameter: Conduct a test of significance (use your “methods” table to find the appropriate test).

comment on generalizability: Consider the data collection methods and specify a reasonable population.

comment on causation: Consider whether you have a randomized experiment and statistical significance.

describe a confounding variable: Specify a variable and argue how it might differ between the explanatory variable groups and relate to the response variable.

describe a parameter: Specify the number (e.g., mean, proportion, or slope), the variable (e.g., how you are defining success), and the population (don’t worry too much at this point about whether it’s a reasonable population).

interpret a p-value: Begin the sentence “the probability that…” or “the proportion of…” and put your answer in the context of the problem (e.g., what source of randomness are we modelling, what statistic are you talking about, what value was observed, what the null hypothesis specified in context, and what you mean by “or more extreme”).

interpret a confidence interval: Begin the sentence “I am 95% confident that <> is in (XX, XX)”. Clarify the parameter, population, and context. If the interval is about a difference, clarify which population parameter has the higher value.

interpret the confidence level: Talk about the reliability of the method: if you repeated the process for different samples, what percentage of the resulting intervals would succeed in capturing the parameter?

draw a conclusion from a p-value: Comment on whether the p-value should be considered small, reject or fail to reject the null hypothesis, and restate the conclusion you are going with in the study context.

calculate a confidence interval: Theory-based inference applet.

calculate a p-value: Theory-based inference applet, or simulation in Minitab.

state hypotheses: You probably want both null and alternative. You could be asked to do this in words and/or in symbols. Make sure you are clearly talking about the population parameter and in context.

identify the procedure: Name the test you would use (e.g., one proportion). Also be prepared to describe a simulation process you could use to estimate a p-value (e.g., flip a coin X times, shuffle X blue and X green cards X times).

comment on validity conditions: Consider the sample size condition for the relevant procedure as stated on your “methods” table.

p-value interpretation template

X% of (randomness) would produce a (statistic) at least as extreme as (higher than, lower than, or beyond) the observed result (cite the value) if the null is true (context).

CI interpretation template

I’m 95% confident that the (parameter – mean(s) or proportion(s), define population or process) is in the interval, context (variable, measurement units)

With comparison – make sure the “direction” is clear

Test statistic interpretation template (t or z statistic)

The observed (mean, proportion, or difference in means or proportions) is (absolute value of the test statistic) standard deviations (below if −, above if +) the value hypothesized under the null, which is the mean of the null distribution.
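For reference, the standardized statistics this template describes usually take the following forms for a single proportion and a single mean. The symbols n (sample size), s (sample standard deviation), π₀ and μ₀ (hypothesized values) are standard notation assumed here rather than taken from the summary, and the numbers in the worked line (28 successes out of 60, H₀: π = 1/3) are hypothetical.

```latex
z = \frac{\hat{p} - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}},
\qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

% Worked example with hypothetical data: 28 successes in n = 60, \pi_0 = 1/3
z = \frac{28/60 - 1/3}{\sqrt{(1/3)(2/3)/60}}
  \approx \frac{0.1333}{0.0609}
  \approx 2.19
```

Read with the template, this says the observed proportion is about 2.19 standard deviations above the hypothesized value.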

R^2 interpretation template

X% of the variability in the (response variable y) is explained by a least squares regression model with (the explanatory variable x)

Regression slope interpretation template

A one (unit) increase in (the explanatory variable x) is associated with an X (unit) (increase if +, decrease if −) in the predicted value of (the response variable y).
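To connect the R² and slope templates to numbers, here is a small sketch that fits a least squares line by hand. The data (hours studied and exam score) and the function name `least_squares` are made up for illustration.

```python
from statistics import mean

def least_squares(x, y):
    """Return slope, intercept, and R^2 of the least squares line for y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (unexplained variation) / (total variation in y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    return slope, intercept, r_squared

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

slope, intercept, r_squared = least_squares(hours, score)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.4f}")
```

With these numbers the templates would read: each additional hour is associated with a predicted score about 4.1 points higher, and about 98.9% of the variability in score is explained by the least squares model with hours.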

CI level interpretation

This means 95% of intervals constructed this way would succeed in capturing the (parameter)
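The "95% of intervals constructed this way" statement can be checked by simulation. The sketch below assumes a true proportion of 0.4, samples of size 200, and the Wald interval for one proportion; all three are illustrative choices, as is the function name `coverage_rate`.

```python
import random
from statistics import NormalDist

def coverage_rate(pi, n, confidence=0.95, reps=2000, seed=4):
    """Fraction of simulated intervals that capture the true parameter pi."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    captured = 0
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < pi)
        p_hat = successes / n
        margin = z_star * (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - margin <= pi <= p_hat + margin:
            captured += 1
    return captured / reps

# Hypothetical population with true proportion 0.4, samples of size 200.
rate = coverage_rate(pi=0.4, n=200)
print(f"share of intervals capturing the parameter: {rate:.3f}")
```

The share should come out close to 0.95, which is a statement about the method over many samples rather than about any single interval.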

Power Interpretation

Probability of rejecting the null hypothesis when the alternative hypothesis is true. Calculated for a particular instance of the alternative hypothesis

Misc.

Power = 1 – P(type 2 error)

P(type 1 error) = level of significance

p-value = “observed” level of significance
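The power and error-rate statements above can be illustrated with a simulation. This sketch assumes a one-sided, one-proportion z-test of H₀: π = 0.5 with n = 100 and a 0.05 level of significance, and it evaluates power at one particular alternative, π = 0.6; these values and the function name `rejection_rate` are hypothetical.

```python
import random
from statistics import NormalDist

def rejection_rate(pi_true, pi_0, n, alpha=0.05, reps=4000, seed=5):
    """Share of simulated samples in which a one-sided z-test rejects H0: pi = pi_0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se_null = (pi_0 * (1 - pi_0) / n) ** 0.5
    rejections = 0
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < pi_true)
        z = (successes / n - pi_0) / se_null
        p_value = 1 - std_normal.cdf(z)
        if p_value <= alpha:
            rejections += 1
    return rejections / reps

# When H0 is true, the rejection rate estimates P(type 1 error).
type_1_rate = rejection_rate(pi_true=0.5, pi_0=0.5, n=100)
# When a specific alternative is true, the rejection rate estimates power.
power = rejection_rate(pi_true=0.6, pi_0=0.5, n=100)
type_2_rate = 1 - power
print(f"type 1 rate = {type_1_rate:.3f}, power = {power:.3f}, type 2 rate = {type_2_rate:.3f}")
```

The estimated type 1 error rate should sit near (here slightly below) the level of significance because the count of successes is discrete, and changing the alternative value or the sample size changes the power.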