STATS NOTES
Linear Functions of Random Variables
Normal Approximation to the Binomial Distribution
Normal Approximation to the Poisson Distribution
Multinomial Probability Distribution
Bivariate Normal Distribution
Zα/2 = critical Z-table value (ex. Z0.005 for α/2 = 0.005)
How to Perform a Hypothesis Test:
- State your Claim and Opposition (ex. μ = T vs. μ ≠ T for some target value T)
- Match H0 to whatever is “=”, and H1 to “≠, >, <” (≠ is a two-tailed, > is a right-tailed, and < is a left-tailed test)
- State level of significance (α)
- Compute the test statistic
- Compare the Z-score (t-score) to the critical value, or the p-value to α
  a. If the test Z-score (t-score) falls in the rejection region (beyond the critical value), or if the p-value ≤ α, reject H0
  b. If the test Z-score (t-score) does not reach the critical value, or if the p-value > α, F.T.R. H0
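A minimal sketch of these steps in Python, assuming a right-tailed one-sample Z-test with known σ; the numbers (x̄ = 3.2, μ0 = 3, σ = 0.5, n = 40, α = 0.05) are hypothetical and only illustrate the compare-and-decide step.

```python
# Minimal sketch of the hypothesis-test steps above, assuming a right-tailed
# one-sample Z-test with known sigma. All numbers are hypothetical.
import math
from scipy import stats

x_bar = 3.2    # sample mean
mu0 = 3.0      # claim: H0: mu <= 3 vs H1: mu > 3 (right-tailed)
sigma = 0.5    # known population standard deviation (assumed)
n = 40
alpha = 0.05   # level of significance

# Test statistic: z = (x-bar - mu0) / (sigma / sqrt(n))
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Critical value and p-value for a right-tailed test
z_crit = stats.norm.ppf(1 - alpha)   # z_alpha
p_value = 1 - stats.norm.cdf(z)      # P(Z > z)

if p_value <= alpha:                 # equivalently: z falls beyond z_crit
    print(f"z = {z:.3f} (crit {z_crit:.3f}), p = {p_value:.4f} -> Reject H0")
else:
    print(f"z = {z:.3f} (crit {z_crit:.3f}), p = {p_value:.4f} -> F.T.R. H0")
```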
Population Proportion Statistic: z = (p̂ − p0) / √(p0(1 − p0)/n), where p̂ = x/n
Confidence Interval (proportion): p̂ ± Zα/2 · √(p̂(1 − p̂)/n)
Calculating Type II Error:
- Perform the normal hypothesis test
- Select a specific value for the alternative hypothesis Ha (ex. a particular μa)
  a. Ex. If H0: μ ≤ 3 and Ha: μ > 3, then select any value greater than 3
- Find the Zcrit score for Ha (use the Z-table value for the selected α to locate the cutoff x0; ex. for α = 0.01, z0 is the value with P(Z > z0) = 0.01)
  a. x0 = μ0 + z0 · (σ/√n) (for a right-tailed test; subtract for a left-tailed test)
  b. Zcrit = (x0 − μa) / (σ/√n)
- β = P(Z < Zcrit) = probability of committing a Type II error
Type I Error: the sample falls in the rejection region of the initial hypothesis test even though H0 is actually true, so we reject something that is “true”; P(Type I Error) = α
Type II Error: the sample falls in the F.T.R. region of the initial hypothesis test even though the alternative value is the truth, so we fail to reject something that is “false”; P(Type II Error) = β
Power of the Test (1 − β): probability of correctly rejecting a false H0
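A short numeric sketch of the Type II error steps, assuming the right-tailed example above (H0: μ ≤ 3, Ha: μ > 3) with known σ; σ = 0.5, n = 40, α = 0.01, and μa = 3.3 are hypothetical values chosen only for illustration.

```python
# Minimal sketch of the Type II error calculation above, assuming the
# right-tailed example H0: mu <= 3 vs Ha: mu > 3 with known sigma.
# sigma, n, alpha, and the alternative value mu_a are hypothetical.
import math
from scipy import stats

mu0 = 3.0      # hypothesized mean under H0
mu_a = 3.3     # selected alternative value (any value greater than 3)
sigma = 0.5    # known population standard deviation (assumed)
n = 40
alpha = 0.01

# Cutoff of the rejection region under H0: x0 = mu0 + z0 * sigma/sqrt(n)
z0 = stats.norm.ppf(1 - alpha)
x0 = mu0 + z0 * sigma / math.sqrt(n)

# Re-standardize the cutoff under the alternative mean mu_a
z_crit = (x0 - mu_a) / (sigma / math.sqrt(n))

beta = stats.norm.cdf(z_crit)   # P(Type II error) = P(F.T.R. H0 | mu = mu_a)
power = 1 - beta                # P(correctly rejecting a false H0)
print(f"x0 = {x0:.3f}, beta = {beta:.4f}, power = {power:.4f}")
```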
Multiple Regression Steps:
- Organize data into a statistical program for analysis (e.g. Excel)
  a. Categorical data should be organized into indicator variables
    i. Ex. Yes and No = 1 and 0
    ii. Number of indicator variables = number of categories − 1
- Compare independent variables to the dependent variable using scatter plots to determine correlation
  a. Independent variables must be able to contribute to the model and explain the dependent variable; variables that show no correlation with the DV should be excluded
- Compare independent variables to one another using scatterplots to determine collinearity
  a. Linear relationships between IVs end up confounding the data and overfitting the model, giving it the appearance of capturing more variability than it does
  b. Calculate correlations and associated p-values among the IVs and between the IVs and the DV to determine if variables should be kept in the model
    i. IV/IV comparisons with high correlation and low p-values indicate collinearity; the opposite is better
    ii. IV/DV comparisons with high correlation and low p-values indicate a useful relationship; the opposite indicates little explanatory value
- Perform regression analysis (in Excel: Data tab, Data Analysis, select Regression) and interpret the output (a Python sketch follows these steps)
  a. Multiple R: multiple correlation coefficient
  b. R Square: percentage of the variation in the DV explained by the IVs [(Multiple R)^2]
  c. Adjusted R Square: R Square adjusted for the number of IVs in the model
  d. Standard Error: the average distance of the data points from the regression model (in units of the DV)
  e. F vs. Significance F (p-value): test of the overall significance of the model; if Significance F < α, the model is significant
  f. Coefficients: model coefficients
  g. Estimated regression: ŷ = (b0 + b1x1 + b2x2 + …) ± tα/2 · Standard Error
    i. tα/2 must be derived from the tables (or calculated) based on the degrees of freedom and the confidence level (95%) [T.INV.2T(probability, deg_freedom)]
- Conduct simple linear regressions between each pair of IVs [or use the Excel function RSQ(known_ys, known_xs)] (a VIF sketch follows these steps)
  a. Compute Variance Inflation Factors for the IVs: VIF = 1/(1 − R Square), where R Square comes from regressing one IV on the other IV(s)
    i. VIF < 5: the IVs are not strongly correlated
    ii. 5 < VIF < 10: the IVs are highly correlated and may be problematic
    iii. VIF > 10: coefficients are poorly estimated due to multicollinearity
- Use the non-redundant independent variables in the analysis to find the best-fitting model (a selection sketch follows these steps)
  a. Can be performed through forward selection, backward selection, or stepwise selection
- Use the best-fitting model to make predictions about the dependent variable
  a. The ideal model is the simplest model with a high R Square, low VIFs, and a low Standard Error
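The regression steps above can be traced outside Excel as well. Below is a minimal sketch in Python using statsmodels, with an entirely hypothetical dataset (price explained by area, age, and a Yes/No garage category); it only illustrates the mechanics of building indicator variables, fitting the model, and reading R Square, Adjusted R Square, Significance F, coefficients, and predictions.

```python
# Hypothetical data: price (DV) explained by area, age, and a Yes/No category.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "price":  [200, 230, 180, 260, 210, 300, 190, 275],
    "area":   [1500, 1700, 1300, 1900, 1550, 2200, 1400, 2000],
    "age":    [20, 15, 30, 10, 18, 5, 25, 8],
    "garage": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
})

# Indicator variables: number of indicators = number of categories - 1
df["garage_yes"] = (df["garage"] == "Yes").astype(int)

X = sm.add_constant(df[["area", "age", "garage_yes"]])   # IVs plus intercept b0
y = df["price"]                                          # DV

model = sm.OLS(y, X).fit()
print(model.summary())   # shows R-squared, Adj. R-squared, F-statistic and
                         # Prob (F-statistic) (= Significance F), coefficients,
                         # and their standard errors

# Prediction from the estimated regression y-hat = b0 + b1*x1 + b2*x2 + ...
new_point = pd.DataFrame({"const": [1.0], "area": [1600],
                          "age": [12], "garage_yes": [1]})
print(model.predict(new_point))
```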
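The VIF check can also be computed directly rather than one pairwise R Square at a time. A minimal sketch below, reusing the hypothetical df and IV columns from the previous sketch (it assumes that block has already run); statsmodels' variance_inflation_factor regresses each IV on the remaining IVs and returns 1/(1 − R Square).

```python
# VIF for each IV, reusing the hypothetical df from the regression sketch.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

ivs = sm.add_constant(df[["area", "age", "garage_yes"]])   # IVs plus constant

for i, name in enumerate(ivs.columns):
    if name == "const":
        continue                          # the intercept is not an IV
    vif = variance_inflation_factor(ivs.values, i)
    print(f"{name}: VIF = {vif:.2f}")     # <5 fine, 5-10 problematic, >10 severe
```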
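For the model-selection step, scikit-learn's SequentialFeatureSelector is one stand-in for forward or backward selection; note it picks variables by cross-validated score rather than the p-value rules used in some courses. A minimal sketch, again reusing the hypothetical df from the regression sketch, with an arbitrary target of two selected IVs.

```python
# Greedy forward selection over the hypothetical IVs from the earlier sketch.
# Assumes `df` from that block already exists; n_features_to_select and cv
# are arbitrary illustration choices.
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

X_iv = df[["area", "age", "garage_yes"]]   # candidate IVs
y = df["price"]                            # DV

selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=3)
selector.fit(X_iv, y)

print("Selected IVs:", list(X_iv.columns[selector.get_support()]))
```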