Statistical Inference: Population Parameters and Sample Statistics | Summaries Statistics

Summary of Statistical Inference

What is Statistical Inference?

The population parameter and the sample statistic summarize the same variable.

The population parameter summarizes the variable for the population, which is what we

want to know, e.g. π or µ1 − µ2. However, we usually can’t observe the whole population,

so we don’t know what the parameter value really is. What we can do is measure the

variable on a sample or on randomized groups and compute a sample statistic, e.g. p̂

or x̄1 − x̄2. The question is: what can we infer about the parameter based on this

statistic? Because these statistics follow a predictable distribution due to the

randomness in the study design, we can estimate or calculate probabilities of different

values of the statistic occurring. Different statistics follow different distributions, but

once we know which distribution we should use, we can make conclusions about the

value of the parameter, e.g., it’s in some interval or we have evidence that it is not a

particular value.
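As a small illustration of the two statistics named above, the sketch below computes a sample proportion and a difference in sample means. The data values and the variable names (successes, group1, group2) are invented for this example and do not come from any real study.

```python
from statistics import mean

# Hypothetical data, made up purely for illustration.
successes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # 1 = success, 0 = failure
group1 = [5.1, 6.3, 5.8, 7.0]                # responses in randomized group 1
group2 = [4.2, 5.0, 4.9, 5.5]                # responses in randomized group 2

# Sample proportion: the statistic we use to learn about the parameter π.
p_hat = mean(successes)

# Difference in sample means: the statistic we use to learn about µ1 - µ2.
diff_means = mean(group1) - mean(group2)

print(p_hat, diff_means)
```

Neither number is the parameter itself; each is one observed value of a statistic that would change from sample to sample.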

Null Distributions

If we specify a value for the population parameter, we can take (or simulate) lots of

samples from this population or lots of random assignments and calculate a statistic for

each sample/random assignment. This allows us to examine the behavior of the

statistic so we can discuss the shape, center, and spread (i.e. variability) of this “null”

distribution. For example, what types of values do we expect the statistic to have, and how

far away might the statistic stray from the hypothesized value of the parameter?
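The idea of simulating many samples under a specified parameter value can be sketched as follows for a single proportion. This is a minimal sketch, not a definitive procedure: the function name simulate_null_proportions, the hypothesized value 0.5, the sample size 50, and the number of repetitions are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def simulate_null_proportions(pi0, n, reps=5000, seed=0):
    """Simulate `reps` samples of size `n` from a population whose
    proportion of successes is `pi0`, and return the sample proportion
    from each simulated sample."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    stats = []
    for _ in range(reps):
        # Each observation is a success with probability pi0.
        successes = sum(rng.random() < pi0 for _ in range(n))
        stats.append(successes / n)
    return stats

# Hypothesized parameter value and sample size (illustrative assumptions).
null_stats = simulate_null_proportions(pi0=0.5, n=50)

center = mean(null_stats)    # should sit close to the hypothesized 0.5
spread = stdev(null_stats)   # typical distance a statistic strays from 0.5
print(center, spread)
```

The list null_stats plays the role of the “null” distribution: its center shows what values to expect, and its spread shows how far the statistic tends to stray from the hypothesized value.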

Simulation vs. “Approximate” (Theory-Based) procedures

In almost every case we have seen two different ways to calculate the p-value:

simulation or an approximation based on a mathematical model. We treat

the simulation approaches as always valid. The mathematical models are only

appropriate if the “validity conditions” of the theory-based, approximate approach are

met. The advantage of the mathematical models is that we can easily get a confidence

interval as well. So you may want to try the theory-based approach first and,

if the validity (aka “technical”) conditions aren’t met, use the simulation approach instead.
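The two routes to a p-value can be compared side by side for a single proportion. The sketch below is illustrative only. It assumes a two-sided test, a normal model for the theory-based approach, and a validity condition of at least 10 successes and 10 failures; that threshold is a common rule of thumb and is an assumption here, since these notes do not state the conditions. The function names and the example numbers (60 successes out of 100, hypothesized π = 0.5) are also hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulated_p_value(observed, pi0, n, reps=5000, seed=0):
    """Two-sided simulation p-value: the share of simulated sample
    proportions at least as far from pi0 as the observed one."""
    rng = random.Random(seed)
    observed_distance = abs(observed - pi0)
    extreme = 0
    for _ in range(reps):
        p_sim = sum(rng.random() < pi0 for _ in range(n)) / n
        # Small tolerance guards against floating-point ties.
        if abs(p_sim - pi0) >= observed_distance - 1e-12:
            extreme += 1
    return extreme / reps

def theory_p_value(observed, pi0, n):
    """Two-sided p-value from a normal approximation to the null distribution."""
    se = sqrt(pi0 * (1 - pi0) / n)   # standard deviation of the statistic under the null
    z = (observed - pi0) / se        # standardized statistic
    return 2 * (1 - NormalDist().cdf(abs(z)))

def validity_ok(successes, failures, minimum=10):
    """Assumed rule of thumb: enough successes and failures for the normal model."""
    return successes >= minimum and failures >= minimum

n, successes = 100, 60
p_hat = successes / n
sim_p = simulated_p_value(p_hat, pi0=0.5, n=n)
theory_p = theory_p_value(p_hat, pi0=0.5, n=n)   # about 0.0455 (z = 2)

# Prefer the theory-based value only when the validity check passes.
chosen = theory_p if validity_ok(successes, n - successes) else sim_p
print(sim_p, theory_p, chosen)
```

When the validity check passes, the two p-values should be in the same neighborhood; when it fails, only the simulated value should be trusted.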

Confidence Intervals: Estimating the Population Parameter

The goal of a confidence interval is to get a range of plausible values that we think the

population parameter could be equal to. To do this, we use the sample statistic and a

measure of the sampling variability of the sample statistic. This lets us form an interval

around the sample statistic that should contain the population parameter. Note, we are

trying to contain the population parameter in the interval, not the data and not the

sample statistic. In fact, the sample statistic is often the midpoint (center) of the

interval.
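One common way to build such an interval is statistic ± multiplier × standard error. The sketch below applies that form to a single proportion. It assumes the standard theory-based interval with a normal multiplier, which these notes do not spell out; the function name and the example values (p̂ = 0.6, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, level=0.95):
    """Theory-based interval for a population proportion:
    p_hat +/- multiplier * standard error (an assumed, standard form)."""
    # Multiplier from the standard normal distribution, about 1.96 for 95%.
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Estimated sampling variability of the sample proportion.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * standard_error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(p_hat=0.6, n=100)
print(low, high)   # roughly (0.504, 0.696)

# The sample statistic sits at the midpoint of the interval.
midpoint = (low + high) / 2
```

The interval is a statement about the parameter π, not about the data or about p̂; here p̂ = 0.6 is simply the center from which the interval is built.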
