









Formulas and notes on linear transformations, data description, histograms, probability, random variables, and interpretations from inference.
Typology: Cheat Sheet
Part I:
IQR = Q3 – Q1
Test for an outlier: more than 1.5(IQR) above Q3 or below Q1
The calculator will run the test for you as long as you choose the boxplot with the outlier on it in STATPLOT
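A minimal Python sketch of the 1.5(IQR) outlier test. It assumes quartiles are found by the median-of-halves rule (the median of the values below and above the middle position); other software may compute quartiles slightly differently. The data set and function names are made up for illustration.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the middle position
    upper = xs[half + n % 2:]    # values above the middle position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers(data):
    """Return the values more than 1.5(IQR) below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data
print(five_number_summary(scores))
print(outliers(scores))
```

Here Q1 = 3 and Q3 = 8, so IQR = 5 and the fences sit at -4.5 and 15.5; only 30 falls outside them.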
Linear transformation:
Addition: affects center NOT spread. Adds to x̄, M, Q1, Q3; does not change IQR or σ
Multiplication: affects both center and spread. Multiplies x̄, M, Q1, Q3, IQR, σ
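A quick check of these rules in Python, as a sketch with made-up data: adding a constant moves the center but leaves the spread alone, while multiplying changes both.

```python
from statistics import mean, median, pstdev

def summary(values):
    """Center (mean, median) and spread (standard deviation, range) of a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": pstdev(values),
        "range": max(values) - min(values),
    }

data = [2, 4, 6, 8]                    # hypothetical data
shifted = [x + 10 for x in data]       # addition
scaled = [3 * x for x in data]         # multiplication

for name, values in (("original", data), ("x + 10", shifted), ("3x", scaled)):
    print(name, summary(values))
```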
When describing data: describe center, spread, and shape.
Give a 5 number summary or mean and standard deviation when necessary.
Graphs: Histogram (fairly symmetrical and unimodal, skewed right, skewed left); Ogive (cumulative frequency); Boxplot (with an outlier); Stem and leaf; Normal Probability Plot
The 80th percentile means that 80% of the data are below that observation.
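68-95-99.7 Rule for Normality: about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean
N(μ, σ): Normal distribution with mean μ and standard deviation σ
N(0, 1): Standard Normal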
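The 68-95-99.7 percentages can be checked against the Standard Normal curve. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `within` is made up here.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```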
r: correlation coefficient. The strength of the linear relationship of data. Close to 1 or -1 is very close to linear
r^2 : coefficient of determination. How well the model fits the data. Close to 1 is a good fit. “Percent of variation in y described by the LSRL on x”
residual = y – ŷ
residual = observed – predicted
ŷ = a + bx
Slope of LSRL (b): rate of change in y for every unit of x
y-intercept of LSRL (a): y when x = 0
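A sketch of how the LSRL, r, r^2, and the residuals can be computed by hand in Python. The data are hypothetical and the function name `lsrl` is an assumption for illustration; the formulas are the usual least-squares ones (slope = Sxy/Sxx, intercept = ȳ – b·x̄).

```python
from math import sqrt

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = y_bar - b * x_bar       # y-intercept
    r = sxy / sqrt(sxx * syy)   # correlation coefficient
    return a, b, r

xs = [1, 2, 3, 4]   # hypothetical explanatory values
ys = [2, 4, 5, 8]   # hypothetical response values
a, b, r = lsrl(xs, ys)

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted

print("slope:", b, "intercept:", a)
print("r:", round(r, 4), "r^2:", round(r ** 2, 4))
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals of a least-squares line always sum to zero (up to rounding), which is a handy check on the arithmetic.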
Exponential Model: y = ab^x; take log of y
Power Model: y = ax^b; take log of x and y
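Why the logs work: taking logs turns each model into a straight line, so an ordinary least-squares fit recovers a and b. A sketch with made-up data that follow each model exactly; the helper `fit_line` is an assumption, not a calculator command.

```python
from math import log10

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys against xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Exponential model y = a * b^x: log y = log a + (log b) x, so regress log y on x.
xs_exp = [0, 1, 2, 3, 4]
ys_exp = [3 * 2 ** x for x in xs_exp]          # hypothetical data with a = 3, b = 2
intercept, slope = fit_line(xs_exp, [log10(y) for y in ys_exp])
a_exp, b_exp = 10 ** intercept, 10 ** slope

# Power model y = a * x^b: log y = log a + b log x, so regress log y on log x.
xs_pow = [1, 2, 3, 4, 5]
ys_pow = [2 * x ** 3 for x in xs_pow]          # hypothetical data with a = 2, b = 3
intercept, slope = fit_line([log10(x) for x in xs_pow], [log10(y) for y in ys_pow])
a_pow, b_pow = 10 ** intercept, slope

print("exponential:", round(a_exp, 4), round(b_exp, 4))
print("power:", round(a_pow, 4), round(b_pow, 4))
```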
Explanatory variables explain changes in response variables. EV: x, independent. RV: y, dependent.
Lurking Variable: A variable that may influence the relationship between two variables. An LV is not among the EVs.
Confounding: two variables are confounded when their effects on an RV cannot be distinguished from each other.
Part II: Designing Experiments and Collecting Data:
Sampling Methods:
The Bad:
Voluntary sample. A voluntary sample is made up of people who decide for themselves to be in the survey. Example: an online poll.
Convenience sample. A convenience sample is made up of people who are easy to reach. Example: interview people at the mall, or in the cafeteria, because it is an easy place to reach people.
The Good:
Simple random sampling. Simple random sampling refers to a method in which all possible samples of n objects are equally likely to occur. Example: assign a number 1-100 to all members of a population of size 100. One number is selected at a time from a list of random digits or using a random number generator. The first 10 selected without repeats are the sample.
Stratified sampling. With stratified sampling, the population is divided into groups, based on some characteristic. Then, within each group, an SRS is taken. In stratified sampling, the groups are called strata. Example: For a national survey we divide the population into groups or strata, based on geography: north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.
Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen using an SRS. Only individuals within sampled clusters are surveyed. Example: Randomly choose high schools in the country and only survey people in those schools.
Difference between cluster sampling and stratified sampling. With stratified sampling, the sample includes subjects from each stratum. With cluster sampling, the sample includes subjects only from sampled clusters.
Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods. Example: In Stage 1, use cluster sampling to choose clusters from a population. Then, in Stage 2, use simple random sampling to select a subset of subjects from each chosen cluster for the final sample.
Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k subjects on the population list. Thereafter, we select every kth subject on the list. Example: Select every 5th person on a list of the population.
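The random sampling methods above can be sketched in Python with the standard `random` module. Everything specific here is an assumption for illustration: the population of 100 numbered members, the four equal-sized strata, the ten clusters of ten, the sample sizes, and the seed (used only so the run is repeatable).

```python
import random

rng = random.Random(42)                 # seeded so the sketch is repeatable
population = list(range(1, 101))        # members numbered 1-100

# Simple random sample: every set of 10 members is equally likely.
srs = rng.sample(population, 10)

# Stratified: split into strata, then take an SRS inside each stratum.
strata = {
    "north": population[:25],
    "east": population[25:50],
    "south": population[50:75],
    "west": population[75:],
}
stratified = [m for members in strata.values() for m in rng.sample(members, 3)]

# Cluster: choose whole clusters with an SRS, then survey everyone in them.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [m for cluster in rng.sample(clusters, 2) for m in cluster]

# Multistage: SRS of clusters (stage 1), then an SRS inside each one (stage 2).
multistage = [m for cluster in rng.sample(clusters, 4) for m in rng.sample(cluster, 3)]

# Systematic: random start among the first k members, then every kth member.
k = 5
start = rng.randrange(k)
systematic = population[start::k]

print(sorted(srs))
print(systematic)
```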
Experimental Design: A well-designed experiment includes design features that allow researchers to eliminate extraneous variables as an explanation for the observed relationship between the independent variable(s) and the dependent variable.
Experimental Unit or Subject: The individuals on which the experiment is done. If they are people, we call them subjects.
Factor: The explanatory variables in the study.
Level: The degree or value of each factor.
Treatment: The condition applied to the subjects. When there is one factor, the treatments and the levels are the same.
Control: Control refers to steps taken to reduce the effects of other variables (i.e., variables other than the independent variable and the dependent variable). These variables are called lurking variables. Control involves making the experiment as similar as possible for subjects in each treatment condition. Three control strategies are control groups, placebos, and blinding.
Control group: A control group is a group that receives no treatment.
Placebo: A fake or dummy treatment.
Blinding: Not telling subjects whether they receive the placebo or the treatment.
Double blinding: Neither the researchers nor the subjects know who gets the treatment or placebo.
Randomization: Randomization refers to the practice of using chance methods (random number tables, flipping a coin, etc.) to assign subjects to treatments.
Replication: Replication refers to the practice of assigning each treatment to many experimental subjects.
Bias: When a method systematically favors one outcome over another.
Types of design:
Completely randomized design: With this design, subjects are randomly assigned to treatments.
Randomized block design: The experimenter divides subjects into subgroups called blocks. Then, subjects within each block are randomly assigned to treatment conditions. Because this design reduces variability and potential confounding, it produces a better estimate of treatment effects.
Matched pairs design: A special case of the randomized block design. It is used when the experiment has only two treatment conditions and subjects can be grouped into pairs, based on some blocking variable. Then, within each pair, subjects are randomly assigned to different treatments. In some cases you give two treatments to the same experimental unit. That unit is its own matched pair!
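A sketch of random assignment for the completely randomized and randomized block designs. The subject labels, the two treatments "A" and "B", the two blocks, and the seed are all hypothetical; shuffling stands in for a random number table or coin flips.

```python
import random

rng = random.Random(7)                          # seeded so the sketch is repeatable
subjects = [f"S{i}" for i in range(1, 13)]      # 12 hypothetical subjects
treatments = ["A", "B"]

def completely_randomized(units, treatments, rng):
    """Shuffle the units, then deal them out to treatments in equal-sized groups."""
    shuffled = units[:]
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Randomize separately inside each block."""
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments, rng))
    return assignment

crd = completely_randomized(subjects, treatments, rng)

# Blocks formed on some blocking variable (here just the first and second six subjects).
blocks = {"block 1": subjects[:6], "block 2": subjects[6:]}
rbd = randomized_block(blocks, treatments, rng)

print(crd)
print(rbd)
```

With a matched pairs design, each block would hold exactly two subjects (or one subject measured twice), so the same function applies with blocks of size two.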
Sampling distribution: The distribution of all values of the statistic in all possible samples of the same size from the population.
Central Limit Theorem: As n becomes very large, the sampling distribution of x̄ is approximately NORMAL
Use (n ≥ 30) for CLT
Low Bias: Predicts the center well
High Bias: Does not predict center well
Low Variability: Not spread out
High Variability: Is very spread out
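A small simulation of a sampling distribution, as a sketch. It assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1), samples of size n = 30, and 5000 repetitions; all of these choices are made up for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(1)      # seeded so the sketch is repeatable
n, reps = 30, 5000

# Each entry is the mean of one random sample of size n from the skewed population.
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

center = mean(sample_means)      # should be close to the population mean, 1
spread = pstdev(sample_means)    # should be close to sigma / sqrt(n)

print("mean of sample means:", round(center, 3))
print("sd of sample means:", round(spread, 3), "vs sigma/sqrt(n) =", round(1 / sqrt(n), 3))
```

Low bias shows up as the center of the sample means landing near the population mean; low variability shows up as a small standard deviation of the sample means.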
For the expected value (mean, μX) and the σX or σX^2 of a probability distribution use the formula sheet
Binomial setting:
Fixed number of trials
Probability of success is the same for all trials
Trials are independent
If X is B(n,p) then (ON FORMULA SHEET)
Mean: μX = np
Standard Deviation: σX = √(np(1 – p))
For Binomial probability use P(X = k) = (n choose k) p^k (1 – p)^(n – k), or use:
Exactly: P(X = x) = binompdf(n , p, x)
At Most: P(X ≤ x) = binomcdf(n , p, x)
At least: P(X ≥ x) = 1 - binomcdf(n , p, x-1)
More than: P(X > x) = 1 - binomcdf(n , p, x)
Less Than: P(X < x) = binomcdf(n , p, x-1)
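For checking calculator answers, here is a sketch of Python stand-ins for the two calculator commands. The function names mirror the calculator but are defined here; they are not a library API. The values n = 10, p = 0.5, x = 3 are made up.

```python
from math import comb

def binompdf(n, p, x):
    """P(X = x) for X binomial with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x): add up the probabilities for 0, 1, ..., x successes."""
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

n, p, x = 10, 0.5, 3
exactly = binompdf(n, p, x)              # P(X = 3)
at_most = binomcdf(n, p, x)              # P(X <= 3)
at_least = 1 - binomcdf(n, p, x - 1)     # P(X >= 3)
more_than = 1 - binomcdf(n, p, x)        # P(X > 3)
less_than = binomcdf(n, p, x - 1)        # P(X < 3)

print(exactly, at_most, at_least, more_than, less_than)
```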
You may use the normal approximation of the binomial distribution when np ≥ 10 and n(1-p) ≥ 10. Use the mean and standard deviation of the binomial situation to find the z-score.
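A sketch of the normal approximation with made-up numbers (n = 100, p = 0.5, and the question P(X ≤ 55)). No continuity correction is applied here, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.5
conditions_met = n * p >= 10 and n * (1 - p) >= 10

mu = n * p                       # binomial mean
sigma = sqrt(n * p * (1 - p))    # binomial standard deviation

x = 55
z = (x - mu) / sigma             # z-score for 55 successes
approx = NormalDist().cdf(z)     # approximate P(X <= 55)

print(conditions_met, z, round(approx, 4))
```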
Finding the probability of multiple simple events.
Addition Rule: P(A or B) = P(A) + P(B) – P(A and B)
Multiplication Rule: P(A and B) = P(A)P(B|A)
Mutually Exclusive events CANNOT be independent.
A and B are independent if the outcome of one does not affect the other.
A and B are disjoint or mutually exclusive if they have no outcomes in common.
Roll two dice: DISJOINT: rolling a sum of 9, rolling doubles
Roll two dice: NOT disjoint: rolling a sum of 4, rolling doubles
Independent: P(B) = P(B|A)
Mutually Exclusive: P(A and B) = 0
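The two-dice examples can be checked by listing all 36 equally likely outcomes. A sketch in Python using exact fractions; it reads "rolling a 9" and "rolling a 4" as the sum of the two dice, which is an assumption about the original examples.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 rolls of two dice

def prob(event):
    """Probability of an event, given as a true/false test on one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def sum_is_9(o):
    return o[0] + o[1] == 9

def sum_is_4(o):
    return o[0] + o[1] == 4

def doubles(o):
    return o[0] == o[1]

# Disjoint: no outcome is both a sum of 9 and doubles.
p_9_and_doubles = prob(lambda o: sum_is_9(o) and doubles(o))
p_9_or_doubles = prob(lambda o: sum_is_9(o) or doubles(o))

# Not disjoint: (2, 2) is both a sum of 4 and doubles.
p_4_and_doubles = prob(lambda o: sum_is_4(o) and doubles(o))
p_doubles_given_4 = p_4_and_doubles / prob(sum_is_4)

print(p_9_and_doubles, p_9_or_doubles)
print(p_4_and_doubles, p_doubles_given_4, prob(doubles))
```

Since P(doubles | sum of 4) = 1/3 differs from P(doubles) = 1/6, those two events are not independent either.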
You are interested in the number of trials it takes UNTIL you achieve a success.
Probability of success is the same for each trial
Trials are independent
Use simple probability rules for Geometric Probabilities.
P(X = n) = p(1 – p)^(n – 1)
P(X > n) = (1 – p)^n = 1 – P(X ≤ n)
μX is the expected number of trials until the first success, or 1/p
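A sketch of the geometric formulas in Python, with a made-up success probability p = 0.2. The function names are hypothetical.

```python
def geometric_pmf(p, n):
    """P(X = n): n - 1 failures followed by the first success on trial n."""
    return p * (1 - p) ** (n - 1)

def geometric_more_than(p, n):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

p = 0.2
first_success_on_3 = geometric_pmf(p, 3)
more_than_3 = geometric_more_than(p, 3)
at_most_3 = sum(geometric_pmf(p, k) for k in range(1, 4))
expected_trials = 1 / p

print(first_success_on_3, more_than_3, at_most_3, expected_trials)
```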
Finding the probability of an event given that another event has already occurred.
Conditional Probability: P(B|A) = P(A and B) / P(A)
Use a two-way table or a tree diagram for conditional problems.
Events are independent if P(B|A) = P(B)
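A sketch of a conditional probability taken from a two-way table. The counts are invented, and the code assumes the usual definition P(B|A) = P(A and B) / P(A).

```python
from fractions import Fraction

# Hypothetical two-way table of counts: rows are A / not A, columns are B / not B.
table = {
    "A": {"B": 30, "not B": 20},
    "not A": {"B": 20, "not B": 30},
}

total = sum(sum(row.values()) for row in table.values())

p_a = Fraction(sum(table["A"].values()), total)
p_b = Fraction(table["A"]["B"] + table["not A"]["B"], total)
p_a_and_b = Fraction(table["A"]["B"], total)

p_b_given_a = p_a_and_b / p_a        # same as 30 out of the 50 in row A
independent = p_b_given_a == p_b

print(p_b, p_b_given_a, independent)
```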
For a single observation from a Normal population:
z = (x – μ) / σ
Probabilities: (upper – lower)
Use the table to find the probability or use normalcdf(min,max,0,1) after finding the z-score
For the mean of a random sample of size n from a population:
x̄ is approximately Normal with:
μx̄ = μ
σx̄ = σ / √n
If n < 30 then the population should be Normally distributed to begin with to use the z-distribution.
Probabilities: (upper – lower)
Use the table to find the probability or use normalcdf(min,max,0,1) after finding the z-score
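A sketch of both cases in Python. The `normalcdf` function below is a stand-in written for this example (built on `statistics.NormalDist`), not the calculator command itself, and the population values μ = 100, σ = 15, n = 25 are made up. It assumes the standard results z = (x – μ)/σ for one observation and a standard deviation of σ/√n for the sample mean.

```python
from math import sqrt
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal curve between lower and upper (upper minus lower)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

mu, sigma = 100, 15

# Single observation: P(85 < x < 115), after converting to z-scores.
z_low = (85 - mu) / sigma
z_high = (115 - mu) / sigma
p_single = normalcdf(z_low, z_high)

# Mean of a sample of size n: the spread shrinks to sigma / sqrt(n).
n = 25
sigma_xbar = sigma / sqrt(n)
z = (106 - mu) / sigma_xbar
p_mean = 1 - NormalDist().cdf(z)     # P(x-bar > 106)

print(round(p_single, 4), sigma_xbar, z, round(p_mean, 4))
```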
Mutually Exclusive vs. Independence
You just heard that Dan and Annie, who have been a couple for three years, broke up. This presents a problem, because you're having a big party at your house this Friday night and you have invited them both. Now you're afraid there might be an ugly scene if they both show up. When you see Annie, you talk to her about the issue, asking her if she remembers about your party. She assures you she's coming. You say that Dan is invited, too, and you wait for her reaction. If she says, "That jerk! If he shows up I'm not coming. I want nothing to do with him!", they're mutually exclusive. If she says, "Whatever. Let him come, or not. He's nothing to me now.", they're independent.
Mutually Exclusive and Independence are two very different ideas
Mutually Exclusive (disjoint): P(A and B) = 0 Events A and B are mutually exclusive if they have no outcomes in common. That is A and B cannot happen at the same time.
Example of mutually exclusive (disjoint) : A: roll an odd on a die B: roll an even on a die
Odd and even share no outcomes P(odd and even) = 0 Therefore, they are mutually exclusive.
Example of not mutually exclusive (joint): A: draw a king B: draw a face card
King and face card do share outcomes. All of the kings are face cards. P(king and face card) = 4/52. Therefore, they are not mutually exclusive.
Independence: P(B) = P(B|A) Events A and B are independent if knowing one outcome does not change the probability of the other. That is knowing A does not change the probability of B.
Examples of independent events: A: draw an ace B: draw a spade
P(Spade) = 13/52 = 1/4 P(Spade | Ace) = 1/4 Knowing that the drawn card is an ace does not change the probability of drawing a spade.
Examples that are dependent (not independent): A: roll a number greater than 3 B: roll an even
P(even) = 3/6 = 1/2 P(even | greater than 3) = 2/3 Knowing the number is greater than three changes the probability of rolling an even number.
Mutually Exclusive events cannot be independent
Mutually exclusive and dependent
A: Roll an even B: Roll an odd
They share no outcomes and knowing that it is odd changes the probability of it being even.
Independent events cannot be Mutually Exclusive
Independent and not mutually exclusive
A: draw a black card B: draw a king
Knowing it is a black card does not change the probability of it being a king and they do share outcomes.
Dependent Events may or may not be mutually exclusive
Dependent and mutually exclusive
A: draw a queen B: draw a king
Knowing it is a queen changes the probability of it being a king and they do not share outcomes.
Dependent and not mutually exclusive
A: Face Card B: King
Knowing it is a face card changes the probability of it being a king and they do share outcomes.
If events are mutually exclusive then: P(A or B) = P(A) + P(B) If events are not mutually exclusive use the general rule: P(A or B) = P(A) + P(B) – P(A and B)
If events are independent then: P(A and B) = P(A)P(B) If events are not independent then use the general rule: P(A and B) = P(A)P(B|A)
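The card examples can be verified by enumerating a standard 52-card deck. In this sketch the helper names (`prob`, `classify`) are made up; an event counts as mutually exclusive when P(A and B) = 0 and as independent when P(A and B) = P(A)P(B).

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

def prob(event, given=lambda card: True):
    """P(event | given), counted over the equally likely cards in the deck."""
    pool = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))

def classify(a, b):
    """Return (mutually_exclusive, independent) for events a and b."""
    both = prob(lambda card: a(card) and b(card))
    return both == 0, both == prob(a) * prob(b)

def ace(card): return card[0] == "A"
def king(card): return card[0] == "K"
def queen(card): return card[0] == "Q"
def face(card): return card[0] in ("J", "Q", "K")
def spade(card): return card[1] == "spades"
def black(card): return card[1] in ("spades", "clubs")

print("ace, spade:", classify(ace, spade))
print("black, king:", classify(black, king))
print("queen, king:", classify(queen, king))
print("face, king:", classify(face, king))
print("P(spade | ace) =", prob(spade, given=ace))
```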
Interpretations from Inference
Interpretation for a Confidence Interval:
I am C% confident that the true parameter (mean μ or proportion p) lies between # and #. INTERPRET IN CONTEXT!!
Interpretation of C% Confident: Using my method, if I sampled over and over again, C% of my intervals would contain the true parameter (mean μ or proportion p).
NOT: The parameter lies in my interval C% of the time. It either does or does not!!
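What "sampling over and over again" means can be seen in a simulation. This sketch assumes a Normal population with known σ (so the interval is x̄ ± z*·σ/√n with z* = 1.96 for 95%); the population values, sample size, number of repetitions, and seed are all made up.

```python
import random
from math import sqrt

rng = random.Random(2024)        # seeded so the sketch is repeatable
mu, sigma, n = 50, 10, 25        # hypothetical population and sample size
z_star = 1.96                    # critical value for 95% confidence
reps = 2000

margin = z_star * sigma / sqrt(n)
hits = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Each interval either contains the true mean or it does not.
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / reps
print("proportion of intervals containing the true mean:", coverage)
```

The proportion should come out near 0.95: the confidence level describes the method's long-run success rate, not any single interval.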
If p < α I reject the null hypothesis H0 and I have sufficient/strong evidence to support the alternative hypothesis Ha. INTERPRET IN CONTEXT in terms of the alternative.
If p > α I fail to reject the null hypothesis H0 and I have insufficient/poor evidence to support the alternative hypothesis Ha. INTERPRET IN CONTEXT in terms of the alternative.
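Interpretation of a p-value: The probability, assuming H0 is true, of observing a result as large or larger than that observed. The smaller the p-value, the stronger the evidence against H0.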
Duality: Confidence intervals and significance tests. If the hypothesized parameter lies outside the C% confidence interval for the parameter I can REJECT H 0
If the hypothesized parameter lies inside the C% confidence interval for the parameter I FAIL TO REJECT H 0
Power of test:
The probability that a fixed level α test will reject the null hypothesis when an alternative value of the parameter is true.
One Sample Z Test
Use when testing a mean from a single sample and σ is known
Ho: μ = #
Ha: μ ≠ # -or- μ < # -or- μ > #
Conditions:
Test statistic: z = (x̄ – μ0) / (σ / √n)
CALCULATOR: 1: Z-Test
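A sketch of the one-sample z test in Python, assuming the standard statistic z = (x̄ – μ0)/(σ/√n). The numbers are invented and the function name is hypothetical; the calculator's Z-Test should give matching output.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar, mu_0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for Ho: mu = mu_0 when sigma is known."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    upper_tail = 1 - NormalDist().cdf(z)
    if alternative == "greater":          # Ha: mu > mu_0
        p_value = upper_tail
    elif alternative == "less":           # Ha: mu < mu_0
        p_value = 1 - upper_tail
    else:                                 # Ha: mu != mu_0
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    return z, p_value

# Hypothetical data: sample mean 52 from n = 25, testing mu = 50 with sigma = 10.
z, p_value = one_sample_z(x_bar=52, mu_0=50, sigma=10, n=25)
print(z, round(p_value, 4))
```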
One Sample t Test
Use when testing a mean from a single sample and σ is NOT known
[Also used in a matched pairs design for the mean of the difference: PAIRED t PROCEDURE]
Ho: μ = #
Ha: μ ≠ # -or- μ < # -or- μ > #
Conditions:
Test statistic: t = (x̄ – μ0) / (s / √n), df = n – 1
CALCULATOR: 2: T-Test
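A sketch of the t statistic in Python, assuming the standard form t = (x̄ – μ0)/(s/√n) with df = n – 1. Python's standard library has no t distribution, so the p-value is left to the t table or the calculator's T-Test. The data are made up; for a paired design, pass the list of differences and test μ0 = 0.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu_0):
    """Return (t, df) for Ho: mu = mu_0 using the sample standard deviation."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                       # sample standard deviation (divides by n - 1)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1

sample = [4, 6, 8, 10, 12]                # hypothetical measurements
t, df = one_sample_t(sample, mu_0=5)
print(round(t, 4), df)

# Paired t procedure: work with the differences within each pair.
before = [10, 12, 9, 14]                  # hypothetical paired data
after = [12, 15, 9, 17]
differences = [a - b for a, b in zip(after, before)]
t_paired, df_paired = one_sample_t(differences, mu_0=0)
print(round(t_paired, 4), df_paired)
```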
One Proportion Z test
Use when testing a proportion from a single sample
Ho: p = #
Ha: p ≠ # -or- p < # -or- p > #
Conditions:
CALCULATOR: 5: 1-PropZTest
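The test statistic is not written out in these notes; this sketch assumes the usual one-proportion form z = (p̂ – p0)/√(p0(1 – p0)/n). The counts are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(successes, n, p_0):
    """Return (z, two_sided_p_value) for Ho: p = p_0."""
    p_hat = successes / n
    standard_error = sqrt(p_0 * (1 - p_0) / n)    # uses p_0, the value assumed under Ho
    z = (p_hat - p_0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing p = 0.5.
z, p_value = one_prop_z(successes=60, n=100, p_0=0.5)
print(round(z, 4), round(p_value, 4))
```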
Two Sample Z test σ known (RARELY USED)
Use when testing two means from two samples and σ is known
Ho: μ1 = μ2
Ha: μ1 ≠ μ2 -or- μ1 < μ2 -or- μ1 > μ2
Conditions:
CALCULATOR: 3: 2-SampZTest
Two Sample t Test σ unknown
Use when testing two means from two samples and σ is NOT known
Ho: μ1 = μ2
Ha: μ1 ≠ μ2 -or- μ1 < μ2 -or- μ1 > μ2
Conditions:
CALCULATOR: 4: 2-SampTTest
Two Proportion Z test
Use when testing two proportions from two samples
Ho: p1 = p2
Ha: p1 ≠ p2 -or- p1 < p2 -or- p1 > p2
Conditions:
Independence and N ≥ 10n for both populations.
Test statistic:
CALCULATOR: 6: 2-PropZTest
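A sketch of the two-proportion z test, assuming the usual pooled form: combine both samples to estimate the common proportion under Ho, then z = (p̂1 – p̂2)/√(p̂(1 – p̂)(1/n1 + 1/n2)). The counts are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """Return (z, two_sided_p_value) for Ho: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 of 100 successes in sample 1, 45 of 100 in sample 2.
z, p_value = two_prop_z(60, 100, 45, 100)
print(round(z, 3), round(p_value, 4))
```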
Χ^2 – Goodness of Fit L1 and L Use for categorical variables to see if one distribution is similar to another
Ho: The distributions of two populations are not different
Ha: The distributions of two populations are different
Conditions:
CALCULATOR: D: Χ^2 GOF-Test (on TI-84)
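A sketch of the goodness of fit statistic, assuming the standard form: the sum of (observed – expected)^2 / expected over all categories, with df = number of categories – 1. The die-roll counts are invented. The standard library has no chi-square distribution, so the p-value still comes from the table or the calculator.

```python
def chi_square_gof(observed, expected):
    """Return (chi_square, df) comparing observed counts to expected counts."""
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, len(observed) - 1

# Hypothetical data: 60 rolls of a die, expecting 10 of each face if it is fair.
observed = [8, 12, 10, 9, 11, 10]
expected = [10] * 6
chi_square, df = chi_square_gof(observed, expected)
print(chi_square, df)
```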
Χ^2 – Homogeneity of Populations r x c table
Use for categorical variables to see if multiple distributions are similar to one another
Ho: The distributions of multiple populations are not different
Ha: The distributions of multiple populations are different
Conditions:
CALCULATOR: C: Χ^2 - Test
Χ^2 – Independence/Association r x c table
Use for categorical variables to see if two variables are related
Ho: Two variables have no association (independent)
Ha: Two variables have an association (dependent)
Conditions:
CALCULATOR: C: Χ^2 - Test
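For an r x c table, the same arithmetic serves both the homogeneity and the independence test. This sketch assumes the standard formulas: expected count = (row total)(column total)/(grand total) and df = (r – 1)(c – 1). The table is made up and the function name is hypothetical.

```python
def chi_square_two_way(observed):
    """Return (chi_square, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected

# Hypothetical 2 x 2 table of counts.
observed = [[30, 20],
            [20, 30]]
chi_square, df, expected = chi_square_two_way(observed)
print(chi_square, df, expected)
```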
Significance Test for Regression Slope:
Ho: β = 0
Ha: β ≠ 0 -or- β < 0 -or- β > 0
Conditions: 1. Observations are independent 2. Linear Relationship (look at residual plot)
CALCULATOR: LinRegTTest
Use technology readout or the calculator for this significance test
Test statistic: t = b / SEb, df = n – 2
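A sketch of where the numbers in a regression readout come from. It assumes the standard formulas: s = √(SSE/(n – 2)) for the residual standard deviation and SEb = s/√Sxx for the standard error of the slope. The data are invented; the p-value again comes from the t table or LinRegTTest.

```python
from math import sqrt

def slope_t_test(xs, ys):
    """Return (b, se_b, t, df) for Ho: beta = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                                   # slope of the LSRL
    a = y_bar - b * x_bar                           # y-intercept of the LSRL
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))                         # standard deviation of the residuals
    se_b = s / sqrt(sxx)                            # standard error of the slope
    return b, se_b, b / se_b, n - 2

xs = [1, 2, 3, 4]   # hypothetical data
ys = [2, 4, 5, 8]
b, se_b, t, df = slope_t_test(xs, ys)
print(round(b, 4), round(se_b, 4), round(t, 4), df)
```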
IQR: Interquartile Range
x̄: Mean of a sample
μ: Mean of a population
s: Standard deviation of a sample
σ: Standard deviation of a population
p̂: Sample proportion
p: Population proportion
s^2: Variance of a sample
σ^2: Variance of a population
M: Median
Σ: Summation
Q1: First Quartile
Q3: Third Quartile
Z: Standardized value – z test statistic
z*: Critical value for the standard normal distribution
t: Test statistic for a t test
t*: Critical value for the t-distribution
N(μ, σ): Notation for the normal distribution with mean μ and standard deviation σ
r: Correlation coefficient – strength of linear relationship
r^2: Coefficient of determination – measure of fit of the model to the data
ŷ = a + bx: Equation for the Least Squares Regression Line
a: y-intercept of the LSRL
b: Slope of the LSRL
(x̄, ȳ): Point the LSRL passes through
y = ax^b: Power model
y = ab^x: Exponential model
SRS: Simple Random Sample
S: Sample Space
P(A): The probability of event A
A^c: A complement
P(B|A): Probability of B given A
∩: Intersection (And)
∪: Union (Or)
X: Random Variable
μX: Mean of a random variable
σX: Standard deviation of a random variable
σX^2: Variance of a random variable
B(n,p): Binomial Distribution with n observations and probability p of success
nCk: Combination n taking k
pdf: Probability distribution function
cdf: Cumulative distribution function
n: Sample size
N: Population size
CLT: Central Limit Theorem
μx̄: Mean of a sampling distribution
σx̄: Standard deviation of a sampling distribution
df: Degrees of freedom
SE: Standard error
H0: Null hypothesis – statement of no change
Ha: Alternative hypothesis – statement of change
p-value: Probability (assuming H0 is true) of observing a result as large or larger than that observed
α: Significance level of a test. P(Type I) or the y-intercept of the true LSRL
β: P(Type II) or the true slope of the LSRL
χ^2: Chi-square test statistic