ISYE 6414 Exam Guide: Linear Regression, ANOVA, and Statistical Inference, Exams of Mathematics

This comprehensive exam guide covers key concepts in isye 6414, including simple and multiple linear regression, anova, and statistical inference methods. it provides verified questions and answers in various formats (true/false, multiple-choice, open-ended), making it a valuable resource for exam preparation. the guide delves into model assumptions, variable selection, regularization techniques, and interpretation of regression results, ensuring a thorough understanding of the subject matter.

Typology: Exams

2024/2025

Available from 05/13/2025

ISYE 6414 Georgia Institute of Technology All Units Exam Guide (2025 Edition)
Actual Exam Questions with 100% Verified Answers
This comprehensive exam document covers all units of the ISYE 6414 course (Georgia
Tech), providing fully verified questions and answers for the 2025 edition. Topics
include simple and multiple linear regression, ANOVA, logistic regression, Poisson
regression, model assumptions, variable selection, regularization techniques (Ridge,
LASSO, Elastic Net), and statistical inference methods. It is structured for clarity,
with extensive true/false, multiple-choice, and open-answer questions, making it a
reliable resource for mastering exam material.
100% Verified Exam Study Guide 2025/2026.
response (dependent) variable - the particular variable that we are interested in understanding or modeling (y)
predicting or explanatory (independent) variables - a set of other variables that might be useful in predicting or modeling the response variable (x1, x2, ...)
What kind of variable is a response variable and why? - Random, because it varies with changes in the predictor(s) along with other random changes.
What kind of variable is a predicting variable and why? - Fixed, because it does not change with the response; it is fixed before the response is measured.
linear relationship - a simple deterministic relationship between two factors, x and y
What are three things that a regression analysis is used for? - 1. Prediction of the response variable; 2. Modeling the relationship between the response and explanatory variables; 3. Testing hypotheses of association relationships.
B0 = ? - intercept
B1 = ? - slope
For our linear model Y = B0 + B1*x + epsilon, what does the epsilon represent? - The deviation of the data from the linear model (the error term).
What are the 4 assumptions of linear regression? - Linearity/mean zero, constant variance, independence, normality.
Linearity/mean zero assumption - The expected value of the errors (deviances) is zero. If this assumption is violated, we have difficulties estimating B0, and the model is missing a necessary systematic component.
Constant variance assumption - It cannot be true that the model is more accurate for some parts of the population and less accurate for other parts. A violation results in less accurate parameter estimates and poorly calibrated prediction intervals.
Assumption of independence - The deviances, or in fact the response variables y, are independently drawn from the data-generating process. Violations occur most often in time series data and can produce very misleading assessments of the strength of the regression.
Normality assumption - Needed if we want to do any confidence or prediction intervals or hypothesis tests, which we usually do. If this assumption is violated, hypothesis tests and confidence and prediction intervals can be very misleading.
What are the 3 parameters we estimate in regression? - B0, B1, and sigma^2 (the variance of the one population).
What do we mean by model parameters in statistics? - Model parameters are unknown quantities, and they stay unknown regardless of how much data is observed. We estimate those parameters given the model assumptions and the data, but through estimation we are not identifying the true parameters; we are just estimating approximations of them.
What is the estimated sampling distribution of s^2? - Chi-square with n-1 DF.
Why do we lose 1 DF for s^2? - We replace mu with the sample mean.
What is the relationship between s^2 and sigma^2? - s^2 estimates sigma^2.
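The slope, intercept, and variance estimates discussed above can be sketched numerically. This is a minimal illustration with made-up data (not part of the original question bank), showing the least-squares formulas and the n - 2 degrees of freedom for the variance estimate:

```python
import numpy as np

def slr_fit(x, y):
    """Least-squares estimates for the SLR model y = b0 + b1*x + eps."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    sxy = np.sum((x - xbar) * (y - ybar))
    b1 = sxy / sxx                              # slope estimate
    b0 = ybar - b1 * xbar                       # intercept estimate
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (len(x) - 2)      # sigma^2 estimate: SSE / (n - 2)
    return b0, b1, s2

# illustrative data, made up for this sketch
b0, b1, s2 = slr_fit([0, 1, 2, 3], [1, 2, 4, 5])
```

Note that the divisor n - 2 mirrors the flashcard above: two parameters (B0 and B1) were replaced by estimates.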



What is the estimated sampling distribution of sigma^2-hat? - Chi-square with n-2 DF (equivalent to the MSE).
Why do we lose 2 DF for sigma^2-hat? - We replaced two parameters, B0 and B1.
In SLR, we are interested in the behavior of which parameter? - B1.
If we have a positive value for B1... - that is consistent with a direct relationship between the predicting variable x and the response variable y.
If we have a negative value for B1... - that is consistent with an inverse relationship between x and y.
When B1 is close to zero... - we interpret that there is not a significant association between the predicting variable x and the response variable y.
How do we interpret B1-hat? - It is the estimated expected change in the response variable associated with one unit of change in the predicting variable.
How do we interpret B0-hat? - It is the estimated expected value of the response variable when the predicting variable equals zero.
What is the sampling distribution of B1-hat? - t-distribution with n-2 DF.
What can we use to test for statistical significance? - A t-test.
What would we do if the t-value is large? - Reject the null hypothesis that B1 is equal to zero. If the null hypothesis is rejected, we interpret this as B1 being statistically significant.
What does 'statistical significance' mean? - B1 is statistically different from zero.
What is the distribution of B1-hat? - Normal.
The estimators for the regression coefficients are: A) Biased but with small variance. B) Unbiased under normality assumptions but biased otherwise. C) Unbiased regardless of the distribution of the data. - C
The assumption of normality: A) It is needed for deriving the estimators of the regression coefficients. B) It is not needed for linear regression modeling and inference. C) It is needed for the sampling distribution of the estimators of the regression coefficients and hence for inference. D) It is needed for deriving the expectation and variance of the estimators of the regression coefficients. - C
What is 'X'? - The predictor.
Where does uncertainty in estimation come from? - From estimation alone.
Where does uncertainty in prediction come from? - From the estimation of the regression parameters and from the newness of the observation itself.
What is the prediction interval used for? - To provide an interval estimate for a prediction of y for one member of the population with a particular value of x.
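The distinction above, uncertainty from estimation alone (confidence interval for the mean response) versus estimation plus the newness of the observation (prediction interval), can be illustrated with the SLR interval formulas. The data and the critical value tcrit = 4.303 (approximately t_{0.025, 2} for n = 4) are made up for this sketch:

```python
import numpy as np

def interval_halfwidths(x, y, xstar, tcrit):
    """Half-widths of the CI for the mean response and the PI for a new
    response at x = xstar, from a fitted SLR model. tcrit is the t critical
    value t_{alpha/2, n-2}, supplied by the caller."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * xbar
    s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    lever = 1.0 / n + (xstar - xbar) ** 2 / sxx
    ci_hw = tcrit * s * np.sqrt(lever)        # estimation uncertainty only
    pi_hw = tcrit * s * np.sqrt(1.0 + lever)  # plus the new observation's own error
    return ci_hw, pi_hw

ci_hw, pi_hw = interval_halfwidths([0, 1, 2, 3], [1, 2, 4, 5], xstar=2.0, tcrit=4.303)
```

The extra "1 +" under the square root is exactly the newness of the observation, so the prediction interval is always wider than the confidence interval at the same x.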


SST = ? - SSE + SSTr.
Sum of squared errors (SSE) - the sum of squared differences between the observations and the individual sample means.
Sum of squared treatments (SSTr) - the sum of squared differences between the sample means of the individual samples and the overall mean.
MSE measures... - within-group variability.
MSTr measures... - between-group variability.
ANOVA compares... - the variability between samples to the variability within a sample.
The F-test measures... - the ratio of between-group variability to within-group variability.
Which are all the model parameters in ANOVA? A) The means of the k populations. B) The sample means of the k populations. C) The sample means of the k samples. D) None of the above. - D
The pooled variance estimator is: A) The sample variance estimator assuming equal variances. B) The variance estimator assuming equal means and equal variances. C) The sample variance estimator assuming equal means. D) None of the above. - A
The total sum of squares divided by N-1 is: A) The mean sum of squared errors. B) The sample variance estimator assuming equal means and equal variances. C) The sample variance estimator assuming equal variances. D) None of the above. - B
The mean squared error (MSE) measures: A) The within-treatment variability. B) The between-treatment variability. C) The sum of the within-treatment and between-treatment variability. D) None of the above. - A
Which is correct? A) If we reject the test of equal means, we conclude that all treatment means are not equal. B) If we do not reject the test of equal means, we conclude that the means are definitely all equal. C) If we reject the test of equal means, we conclude that some treatment means are not equal. D) None of the above. - C
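The SST = SSTr + SSE decomposition and the F statistic (MSTr/MSE) above can be verified on a tiny made-up data set:

```python
import numpy as np

def oneway_anova(groups):
    """SSTr, SSE, and the F statistic for one-way ANOVA with k groups."""
    all_obs = np.concatenate([np.asarray(g, float) for g in groups])
    grand = all_obs.mean()
    k, n = len(groups), len(all_obs)
    # between-group variability: group sizes times squared deviations of
    # group means from the grand mean
    sstr = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    # within-group variability: squared deviations from each group's own mean
    sse = sum(np.sum((np.asarray(g, float) - np.mean(g)) ** 2) for g in groups)
    f = (sstr / (k - 1)) / (sse / (n - k))   # MSTr / MSE, compare to F_{k-1, n-k}
    return sstr, sse, f

# three illustrative treatment groups
sstr, sse, f = oneway_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
```

For these groups the total sum of squares is 48, which splits into SSTr = 42 (between) and SSE = 6 (within), so the decomposition on the flashcard holds exactly.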


When would we use the 'comparing pairs of means' method? - After we reject the null hypothesis of equal means.
Pairwise comparison - We estimate the difference in the means for a pair (mean_i and mean_j) as the difference between their corresponding sample means.
Using the Tukey method to find confidence intervals for the differences in means, what does a '0' in the CI mean? - For the confidence intervals that include zero, it is plausible that the difference between the means is zero.
In a pairwise comparison, if the confidence interval contains only positive values, then we conclude... - that the difference in means is statistically positive.
The 3 assumptions of ANOVA with respect to the error term - Constant variance, independence, normality.
How will we diagnose the assumptions for ANOVA? - We diagnose the assumptions on the residuals, because the error terms are unobserved (we do not know the means).
How will we assess the normality assumption for the ANOVA model? - With the quantile-quantile normal plot and the histogram of the residuals.
If the scatter plot of the residuals (epsilon_ij) for the ANOVA is NOT random (2 possibilities): - The sample responses are not independent, or the variances of the responses are not equal.
What does a cluster of residuals on a residual plot mean? - We have correlated errors.
The objective of the residual analysis is: A) To evaluate departures from the model assumptions. B) To evaluate whether the means are equal. C) To evaluate whether only the normality assumption holds. D) None of the above. - A
The objective of the pairwise comparison is: A) To find which means are equal. B) To identify the statistically significantly different means. C) To find the estimated means which are greater or lower than others. D) None of the above. - B
ANOVA is a linear regression model where... - the predicting factor is a (one) categorical variable.
If our ANOVA model has an intercept, how many dummy variables do we use, and why? - k-1, because of linear dependence between the X's.
If our ANOVA model does not have an intercept, how many dummy variables? - All k dummy variables.
T/F: For assessing the normality assumption of the ANOVA model, we can only use the quantile-quantile normal plot of the residuals. - False.
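The dummy-variable counting above (k - 1 dummies with an intercept, all k without) can be checked via the rank of the design matrix; the group labels below are made up for illustration:

```python
import numpy as np

# k = 3 treatment groups, 2 observations each, labeled 0, 1, 2
labels = np.array([0, 0, 1, 1, 2, 2])
n, k = len(labels), 3

intercept = np.ones((n, 1))
all_dummies = (labels[:, None] == np.arange(k)).astype(float)  # k indicator columns

# With an intercept, the k dummies are linearly dependent (they sum to the
# intercept column), so the design matrix is rank-deficient:
X_bad = np.hstack([intercept, all_dummies])      # 4 columns, rank only 3
# Dropping one dummy (k - 1 remain) restores full column rank:
X_ok = np.hstack([intercept, all_dummies[:, 1:]])  # 3 columns, rank 3

rank_bad = np.linalg.matrix_rank(X_bad)
rank_ok = np.linalg.matrix_rank(X_ok)
```

This is exactly the linear dependence the flashcard cites: the coefficients of a rank-deficient design matrix cannot be uniquely estimated.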


Design matrix - a matrix consisting of columns of predicting variables, including the column of ones corresponding to the intercept.
Simple linear regression - linear regression with one quantitative predicting variable.
ANOVA - linear regression with one or more qualitative predicting variables.
Multiple linear regression - linear regression with multiple quantitative and/or qualitative predicting variables.
In MLR, the sampling distribution of the estimator of sigma^2 (the MSE) is... - chi-square with n-p-1 DF.
Marginal model (SLR) - captures the association of one predicting variable to the response variable marginally, that is, without consideration of other factors.
Conditional model (MLR) - captures the association of a predicting variable to the response variable, conditional on the other predicting variables in the model.
We can make causality statements for... - experimental designs.
We can make association statements for... - observational studies.
3 ways predicting variables can be distinguished - Controlling, explanatory, predictive.
Controlling factors - control for bias selection in the sample. They are used as 'default' variables in order to capture more meaningful relationships.
Explanatory factors - explain variability in the response variable; they may be included in the model even if other "similar" variables are in the model.
Predictive factors - best predict variability in the response, regardless of their explanatory power.
The objective of multiple linear regression is: A) To predict future new responses. B) To model the association of explanatory variables to a response variable, accounting for controlling factors. C) To test hypotheses using statistical inference on the model. D) All of the above. - D
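A minimal sketch of MLR estimation with a design matrix: beta-hat = (X'X)^{-1} X'y, and the variance estimate with n - p - 1 degrees of freedom. The data are made up and follow y = 1 + 2*x1 + 3*x2 exactly, so the variance estimate comes out (numerically) zero:

```python
import numpy as np

def mlr_fit(X, y):
    """beta-hat = (X'X)^{-1} X'y and sigma^2-hat = SSE / (n - p - 1),
    where X is the design matrix with the intercept column of ones first."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
    resid = y - X @ beta
    n, p_plus_1 = X.shape
    s2 = resid @ resid / (n - p_plus_1)        # n - p - 1 degrees of freedom
    return beta, s2

# illustrative predictors (not collinear, so X'X is invertible)
x1 = np.array([0.0, 1.0, 2.0, 3.0, 1.0])
x2 = np.array([1.0, 0.0, 1.0, 2.0, 2.0])
X = np.column_stack([np.ones(5), x1, x2])      # design matrix: ones, x1, x2
y = 1 + 2 * x1 + 3 * x2                        # exact linear response
beta, s2 = mlr_fit(X, y)
```

Note the connection to the flashcards above: the solve step fails precisely when the predicting variables are linearly dependent, which is why estimability requires a full-rank design matrix.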

Which is correct? A) A multiple linear regression model with p predicting variables but no intercept has p model parameters. B) The interpretation of the regression coefficients is the same whether or not interaction terms are included in the model. C) Multiple linear regression is a general model encompassing both ANOVA and simple linear regression. D) None of the above. - C
Which is correct?


A) The regression coefficients can be estimated only if the predicting variables are not linearly dependent. B) The estimated regression coefficient beta-hat_i is interpreted as the change in the response variable associated with one unit of change in the i-th predicting variable. C) The estimated regression coefficients will be the same under the marginal and conditional models; only their interpretation is not. D) Causality is the same as association in interpreting the relationship between the response and predicting variables. - A
Which one correctly characterizes the sampling distribution of the estimated variance? A) The estimated variance of the error term has a chi-squared distribution regardless of the distribution assumption of the error terms. B) The number of degrees of freedom for the chi-squared distribution of the estimated variance is n-p-1 for a model without an intercept. C) The sampling distribution of the mean squared error is different from that of the estimated variance. D) None of the above. - D
What is B-hat in MLR? - A linear combination of the Y's, and it is normally distributed.
The distribution of sigma^2-hat is? - Chi-square with n-p-1 DF.
What is the sampling distribution for an individual beta-hat? - t-distribution with n-p-1 DF.
From what distribution can we derive the confidence interval? - The t-distribution.
What does it mean if 0 is included in the CI? - We conclude that Bj is NOT statistically significant.
What does it mean if 0 is NOT included in the CI? - We conclude that Bj IS statistically significant.
What are the null and alternative hypotheses for MLR? - H0: all the coefficients are equal to 0. HA: at least one of the coefficients is not equal to 0.
If the t-value is large... - we reject the null hypothesis and conclude that the coefficient is statistically significant.
How does the procedure change if we test whether the coefficient is equal to a constant? - We reject the null hypothesis if the absolute t-value is larger than the critical point t_{alpha/2, n-p-1}, or, equivalently, if the p-value is smaller than alpha.
What is the hypothesis testing procedure for overall regression, and what does it test? - Analysis of variance for multiple regression: we use ANOVA to test the hypothesis that the regression coefficients are all zero.
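The overall F-test for regression described above can be sketched as follows; the design matrix and the fixed "noise" vector are made up for illustration:

```python
import numpy as np

def overall_f(X, y):
    """ANOVA F statistic for H0: beta_1 = ... = beta_p = 0 (intercept
    excluded). X must contain the intercept column of ones."""
    n, p_plus_1 = X.shape
    p = p_plus_1 - 1
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    sse = np.sum((y - X @ beta) ** 2)          # error sum of squares
    sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
    ssr = sst - sse                            # regression sum of squares
    return (ssr / p) / (sse / (n - p - 1))     # compare to F_{p, n-p-1}

# made-up data with a strong linear signal plus a small fixed perturbation
x1 = np.array([0.0, 1.0, 2.0, 3.0, 1.0])
x2 = np.array([1.0, 0.0, 1.0, 2.0, 2.0])
X = np.column_stack([np.ones(5), x1, x2])
e = np.array([0.1, -0.1, 0.1, -0.1, 0.0])
y = 1 + 2 * x1 + 3 * x2 + e
f = overall_f(X, y)
```

Because the signal dominates the perturbation, the F statistic is very large here, so we would reject H0 and conclude that at least one predictor has explanatory power.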


A) Have the same variance. B) Have the same expectation. C) Have the same variance and expectation. D) None of the above. - B
Which one is correct? A) The prediction intervals need to be corrected for simultaneous inference when multiple predictions are made jointly. B) The prediction intervals are centered at the predicted value. C) The sampling distribution of the prediction of a new response is a t-distribution. D) All of the above. - D
T/F: In a multiple linear regression model with 6 predicting variables but without an intercept, there are 7 parameters to estimate. - True.
T/F: The only objective of multiple linear regression is prediction. - False.
T/F: We can make causal inference in observational studies. - False.
T/F: In order to make statistical inference on the regression coefficients, we need to estimate the variance of the error terms. - True.
T/F: We cannot estimate a multiple linear regression model if the predicting variables are linearly dependent. - True.
T/F: The estimated regression coefficients are unbiased estimators. - True.
T/F: Controlling variables used in multiple linear regression are used to control for bias in the sample. - True.
T/F: We interpret the coefficient corresponding to one predictor in a regression with multiple predictors as the estimated expected change in the response variable associated with one unit of change in the corresponding predicting variable. - False.
T/F: The error terms in multiple linear regression cannot be correlated. - True.
T/F: The hypothesis test for whether a subset of regression coefficients are all equal to zero is a partial F-test. - True.
T/F: The estimated regression coefficient corresponding to a predicting variable will likely be different in a model with only that predicting variable alone versus in a model with multiple predicting variables. - True.
T/F: Analysis of variance (ANOVA) is a multiple regression model. - True.
T/F: In multiple linear regression, we study the relationship between one response variable and both quantitative and qualitative predicting variables. - True.
T/F: We need to assume normality of the response variable for making inference on the regression coefficients. - True.
T/F: We can use the normal test to test whether a regression coefficient is equal to zero. - False.


T/F: If a predicting variable is categorical with 5 categories in a linear regression model with intercept, we will include 5 dummy variables in the model. - False.
T/F: Multiple linear regression captures the causation of a predicting variable on the response variable, conditional on the other predicting variables in the model. - False.
T/F: The error term variance estimator has a chi-squared distribution with degrees of freedom for a multiple regression model with 10 predictors. - True.
T/F: The sampling distribution for estimating confidence intervals for the regression coefficients is a normal distribution. - False.
T/F: The estimated variance of the error terms is the sum of squared residuals divided by the sample size minus the number of predictors minus one. - True.
T/F: Conducting t-tests on each beta parameter in a multiple regression model is the best way to test the overall significance of the model. - False.
T/F: In the case of a multiple linear regression model containing 6 quantitative predicting variables and an intercept, the number of parameters to estimate is 7. - False.
T/F: The regression coefficient corresponding to one predictor in multiple linear regression is interpreted in terms of the estimated expected change in the response variable when there is a change of one unit in the corresponding predicting variable, holding all other predictors fixed. - True.
T/F: The proportion of variability in the response variable that is explained by the predicting variables is called correlation. - False.
T/F: Predicting values of the response variable for values of the predictors that are within the data range is known as extrapolation. - False.
T/F: In multiple linear regression we study the relationship between a single response variable and several quantitative and/or qualitative predicting variables. - True.
T/F: The sampling distribution used for estimating confidence intervals for the regression coefficients is the normal distribution. - False.
T/F: A partial F-test can be used to test whether a subset of regression coefficients are all equal to zero. - True.
T/F: Prediction is the only objective of multiple linear regression. - False.
T/F: The equation to find the estimated variance of the error terms of a multiple linear regression model with intercept can be obtained by summing up the squared residuals and dividing by n - p, where n is the sample size and p is the number of predictors. - False.
T/F: For a given predicting variable, the estimated regression coefficient associated with it will likely be different in a model with other predicting variables versus in a model with only that predicting variable alone. - True.
T/F: Observational studies allow us to make causal inference. - False.
T/F: In the case of multiple linear regression, controlling variables are used to control for sample bias. - True.


What is the rule of thumb for Cook's distance? - If Di > 4/n, or Di > 1, or any "large" Di, the observation should be investigated.
What is the F-test for? - A test for the overall regression.
What are the null and alternative hypotheses of the F-test? - H0: all the regression coefficients except the intercept are 0. HA: at least one is not 0.
What does it mean when we reject the H0 of the F-test? - We conclude that at least one of the predicting variables has explanatory power for the variability in the response.
Will R^2 always increase when we add predicting variables? - Yes.
Adjusted R^2 - adjusted for the number of predicting variables, so it does not necessarily increase as we add more predicting variables.
Multicollinearity - generally occurs when there are high correlations between two or more predictor variables; in other words, one predictor variable can be used to predict another. This creates redundant information, skewing the results in a regression model.
How can we diagnose multicollinearity? - One approach is to compute the variance inflation factor (VIF) for EACH predicting variable.
VIF_j = ? - 1 / (1 - R_j^2).
Multicollinearity inflates...? - the standard errors of the estimated coefficients.
How do we interpret the VIF? - VIF measures the proportional increase in the variance of beta-hat_j compared to what it would have been if the predicting variables had been completely uncorrelated.
In evaluating a multiple linear model: A) The F-test is used to evaluate the overall regression. B) The coefficient of determination is interpreted as the percentage of variability in the response variable explained by the model. C) Residual analysis is used for goodness-of-fit assessment. D) All of the above. - D
In the presence of near multicollinearity: A) The coefficient of variation decreases. B) The regression coefficients will tend to be identified as statistically significant even if they are not. C) The prediction will not be impacted. D) None of the above. - D
When do we use transformations? A) If the linearity assumption with respect to one or more predictors does not hold, then we use transformations of the corresponding predictors to improve on this assumption.

B) If the normality assumption does not hold, we transform the response variable, commonly using the Box-Cox transformation. C) If the constant variance assumption does not hold, we transform the response variable. D) All of the above. - ans: D

Which one is correct? A) The residuals have constant variance for the multiple linear regression model. B) The residuals vs. fitted can be used to assess the assumption of independence. C) The residuals have a t-distribution if the error term is assumed to have a normal distribution. D) None of the above. - ans: D

T/F: In multiple linear regression, we need the linearity assumption to hold for at least one of the predicting variables. - ans: F

T/F: Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients. - ans: T

T/F: The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients. - ans: T

T/F: When making a prediction for predicting variables on the "edge" of the space of predicting variables, then its uncertainty level is high. - ans: T

T/F: The prediction of the response variable and the estimation of the mean response have the same interpretation. - ans: F

T/F: In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors. - ans: F

T/F: We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple linear regression model. - ans: T

T/F: Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients. - ans: T

T/F: The estimator of the mean response is unbiased. - ans: T

T/F: The sampling distribution of the prediction of the response variable is a χ2 (chi-squared) distribution. - ans: F

T/F: Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent. - ans: F

T/F: A linear regression model has high predictive power if the coefficient of determination is close to 1. - ans: F

T/F: In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases. - ans: F

T/F: Cook's distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed. - ans: T
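The two numeric rules of thumb in this section — investigate observations with Cook's distance Di > 4/n (or Di > 1), and VIFj = 1/(1 - Rj^2) — can be checked with a few lines of arithmetic. The function names below are illustrative, not part of the course material:

```python
def vif(r_squared):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    predictor j on the remaining predictors."""
    return 1.0 / (1.0 - r_squared)

def r_squared_from_vif(v):
    """Invert the VIF formula: R_j^2 = 1 - 1/VIF_j."""
    return 1.0 - 1.0 / v

def flag_influential(cooks_d, n):
    """Rule of thumb: investigate observations with D_i > 4/n or D_i > 1."""
    cutoff = 4.0 / n
    return [i for i, d in enumerate(cooks_d) if d > cutoff or d > 1.0]

# 80% of a predictor's variation explained by the others gives VIF = 5, not 6:
print(vif(0.80))                          # 5.0
print(round(r_squared_from_vif(6.0), 3))  # 0.833

# With n = 100, the Cook's distance cutoff is 4/100 = 0.04:
print(flag_influential([0.01, 0.05, 0.002, 1.2], n=100))  # [1, 3]
```

This also confirms the T/F item above: a VIF of 6 corresponds to about 83% (not 80%) of the predictor's variation being explained by the other predictors.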

C) The interpretation of the regression coefficients in logistic regression is the same as for standard linear regression assuming normality. D) None of the above. - ans: B

In logistic regression, A) The estimation of the regression coefficients is based on maximum likelihood estimation. B) We can derive exact (closed-form) estimates for the regression coefficients. C) The estimation of the regression coefficients is based on minimizing the sum of least squares. D) All of the above. - ans: A

Using the R statistical software to fit a logistic regression, A) We can use the lm() command. B) The input of the response variable is exactly the same whether the binary response data are with or without replications. C) We can obtain both the estimates and the standard deviations of the estimates for the regression coefficients. D) None of the above. - ans: C

Using MLE, can we derive estimated coefficients/parameters in exact form? - ans: No, they are approximate estimated parameters.

The sampling distribution of MLEs can be approximated by a... - ans: normal distribution.

What can we use to test if beta_j = 0? - ans: A z-test (Wald test).

When would we reject the null hypothesis for a z test? - ans: We reject the null hypothesis that the regression coefficient is 0 if the z-value is larger in absolute value than the z critical point, i.e. the 1 - alpha/2 normal quantile. We interpret this as the coefficient being statistically significant.

Does the statistical inference for logistic regression rely on a small or large sample size? - ans: Large; if the sample is small, the statistical inference is not reliable.

Deviance - ans: The test statistic is the difference of the log-likelihood under the reduced model and the log-likelihood under the full model, for testing the subset of coefficients.

Under testing a subset of coefficients, what is the distribution and degrees of freedom for the deviance? - ans: For large sample size data, the distribution of this test statistic, assuming the null hypothesis is true, is a chi-squared distribution with q degrees of freedom, where q is the number of regression coefficients discarded from the full model to get the reduced model, i.e. the number of z predicting variables.

What is the purpose of testing a subset of coefficients? - ans: It simply compares two models and decides whether the larger model is statistically significantly better than the reduced model.

Is testing a subset of coefficients a GOF test? - ans: No.

When we are testing for overall regression for a logistic model, what is the H0 and HA? - ans: H0: all regression coefficients except the intercept are 0.

HA: at least one is not 0.

If we reject the null hypothesis for overall regression, what does that mean? - ans: That the overall regression has statistically significant power in explaining the response variable.

Null deviance - ans: The test statistic for the overall regression; it shows how well the response variable is predicted by a model that includes only the intercept.

What is the distribution and DOF of the overall regression test statistic? - ans: Chi-squared with p degrees of freedom, where p is the number of predicting variables.

When do we reject the null hypothesis for the overall regression test in regards to the p-value? - ans: When the p-value is small, indicating that the overall regression has explanatory power.

Logistic regression is different from standard linear regression in that: A) The sampling distribution of the regression coefficient is approximate. B) A large sample of data is required for making accurate statistical inferences. C) A normal sampling distribution is used instead of a t-distribution for statistical inference. D) All of the above. - ans: D

In logistic regression, A) The hypothesis test for subsets of coefficients is a goodness of fit test. B) The hypothesis test for subsets of coefficients is approximate; it relies on large sample size. C) We can use the partial F test for testing whether a subset of coefficients are all zero. D) None of the above. - ans: B

In logistic regression, how do we define residuals for evaluating goodness of fit? - ans: For binary data with replications.

What is the distribution of binary data WITHOUT replications? - ans: A binomial distribution with one trial, i.e. ni = 1.

What is the distribution of binary data WITH replications? - ans: A binomial distribution with more than one trial, i.e. ni greater than 1.

Pearson residuals - ans: The standardized difference between the ith observed response and the estimated expected response, which is ni times the probability of success.

Deviance residuals - ans: The signed square root of the log-likelihood difference between the saturated model (where the estimated expected response is taken to be the observed response) and the fitted model.

Distribution of Pearson residuals? - ans: From the binomial approximation with a normal distribution, using the central limit theorem.

Distribution of deviance residuals? - ans: From the properties of the likelihood function, a standard normal distribution if the model assumptions hold, that is, if the model is a good fit.

To evaluate whether the model is a good fit or whether the assumptions hold, what should we use? - ans: Use the Pearson or deviance residuals to evaluate whether they are normally distributed, and conclude whether it is a good fit via hypothesis testing.
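The Pearson residual described in this section can be written out directly: for a group i with ni trials, yi observed successes, and fitted success probability p-hat i, it standardizes yi against the Binomial mean and standard deviation. A minimal sketch, with an illustrative function name:

```python
import math

def pearson_residual(y, n, p_hat):
    """Standardized difference between the observed count y and the
    estimated expected response n * p_hat for a Binomial(n, p_hat) group."""
    expected = n * p_hat
    return (y - expected) / math.sqrt(n * p_hat * (1.0 - p_hat))

# Group with 25 trials, 12 successes, fitted probability 0.4:
# expected = 10, sd = sqrt(25 * 0.4 * 0.6) = sqrt(6)
r = pearson_residual(y=12, n=25, p_hat=0.4)
print(round(r, 3))  # 0.816
```

If the model fits, these residuals should look approximately standard normal, which is what the goodness-of-fit assessment above exploits.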

Classification - ans: The prediction of binary responses. Classification is nothing more than a prediction of the class of your response, y* (y star), given the predictor variable, x* (x star). If the predicted probability is large, then classify y* as a success.

Classification error rate - ans: The probability that the new response is not equal to the classifier.

How do we compute classification error? (2 ways) - ans: 1. Training error 2. Cross-validation

Training error - ans: Simply use the data to fit the model, then compute the classifier for each response in the data and take the proportion of responses we misclassified.

Why can we not use the training error rate as an estimate of the true classification error rate? - ans: Because it is biased downward. The bias comes from the fact that we use the data twice: first to fit the model, and second to estimate the classification error rate.

How else can we estimate the classification error without the need of observing new data? - ans: Cross-validation.

Cross-validation - ans: Split the data into two parts, the training data and the testing/validation data. The training data will be used to fit the model and thus get the estimated regression coefficients. The testing or validation data will be used to predict or classify the responses for that portion of the data, which are then compared to the observed responses to estimate the classification error; one can repeat the process several times.

3 options for splitting the data - ans:
1. Random sampling
2. K-fold cross-validation
3. Leave-one-out cross-validation

In logistic regression: A) We can perform residual analysis for response data with or without replications. B) Residuals are derived as the fitted values minus the observed responses. C) The sampling distribution of the residual is approximately a normal distribution if the model is a good fit. D) All of the above. - ans: C

Which one is correct? A) We can evaluate the goodness of fit of a model using the testing procedure of the overall regression. B) In applying the deviance test for goodness of fit in logistic regression, we seek large p-values, that is, not rejecting the null hypothesis. C) There is no error term in logistic regression and thus we cannot perform a goodness of fit assessment. D) None of the above. - ans: B

Which is correct? A) Prediction translates into classification of a future binary response in logistic regression. B) In order to perform classification in logistic regression, we need to first define a classifier for the classification error rate.

C) One common approach to estimate the classification error is cross-validation. D) All of the above. - ans: D

Comparing cross-validation methods, A) The random sampling approach is more computationally efficient than leave-one-out cross-validation. B) In K-fold cross-validation, the larger K is, the higher the variability in the estimation of the classification error is. C) Leave-one-out cross-validation is a particular case of the random sampling cross-validation. D) None of the above. - ans: B

Generalized Linear Model - ans: Generalizes the standard regression model to response data that do not have a normal distribution, i.e. to response data coming from other distributions.

In GLM or generalized linear models, the response Y is assumed to have what kind of distribution? - ans: A distribution from the exponential family of distributions.

Poisson regression - ans: Commonly used for modeling count or rate data.

What is the difference between using Poisson regression versus the standard regression with, say, the log transformation of the response variable? - ans: In standard regression, the variance is assumed constant. In Poisson regression, the variance of the response is assumed to be equal to the expectation, since for the Poisson distribution the variance equals the expectation; thus the variance is not constant.

Rate parameter - ans: The expectation of the response Yi given the predicting variables, which is modeled as the exponential of the linear combination of the predicting variables, since the link function between the expectation and the predicting variables is the log function.

Log rate - ans: The log function of the expected value of the response.

What is the coefficient interpretation of a GLM (Poisson)? - ans: The log ratio of the rate with an increase of one unit in the predicting variable.

We do not interpret beta with respect to the response variable for a Poisson model but with.... - ans: respect to the ratio of the rate.

We estimate the Poisson model parameters using... - ans: MLE

Poisson regression can be used: A) To model count data. B) To model rate response data. C) To model response data with a Poisson distribution. D) All of the above. - ans: D

Which one is correct? A) The standard normal regression, the logistic regression and the Poisson regression are all falling under the generalized linear model framework.
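The Poisson coefficient interpretation above can be made concrete: since the log link makes log E[Y | x] linear in x, a one-unit increase in a predictor multiplies the rate by exp(beta), whatever the starting value of x. A minimal numeric sketch (the coefficient values b0 and b1 are made up for illustration):

```python
import math

# Assumed fitted Poisson model: log(rate) = b0 + b1 * x, with illustrative values
b0, b1 = 0.5, 0.3

def rate(x):
    """Expected count E[Y | x] = exp(b0 + b1 * x) under the log link."""
    return math.exp(b0 + b1 * x)

# The ratio of rates for a one-unit increase in x is exp(b1), regardless of x:
print(round(rate(4.0) / rate(3.0), 6))  # equals exp(0.3)
print(round(math.exp(b1), 6))
```

This is why beta is interpreted with respect to the ratio of the rate rather than with respect to the response itself.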