---
title: "Chapter 6: Linear Model Selection and Regularization"
author: "Solutions to Exercises"
date: "January 23, 2016"
output:
  html_document:
    keep_md: no
---
***
## CONCEPTUAL
***
<a id="ex01"></a>
>EXERCISE 1:
__Part a)__
Best subset will have the smallest training RSS because each procedure picks the model that minimizes training RSS at each size, and best subset tries every model that forward and backward selection try (and many more).
__Part b)__
The model with the best test RSS could be any of the three. Best subset could easily overfit if the number of predictors $p$ is large relative to the number of observations $n$. Forward and backward selection might not converge on the same model, but they consider the same number of models, so it is hard to say which selection process would be better.
__Part c)__


* i. TRUE
* ii. TRUE
* iii. FALSE
* iv. FALSE
* v. FALSE
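As a quick numerical illustration of Parts a) and b) (a sketch, not part of the original solution; the simulated data and seed are arbitrary choices), the chunk below fits best subset and forward selection with `leaps::regsubsets` and compares training RSS at each model size.

```{r}
# Illustration only: best subset training RSS never exceeds forward selection's
# at any model size, because best subset searches all models of that size.
library(leaps)
set.seed(1)
n <- 100; p <- 8
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.5 * x[, 2] - x[, 3] + rnorm(n)
dat <- data.frame(y, x)
rss.best <- summary(regsubsets(y ~ ., data = dat, nvmax = p, method = "exhaustive"))$rss
rss.fwd  <- summary(regsubsets(y ~ ., data = dat, nvmax = p, method = "forward"))$rss
cbind(size = 1:p, rss.best, rss.fwd)  # rss.best <= rss.fwd in every row
```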

***
<a id="ex02"></a>
>EXERCISE 2:
__Part a)__
iii. is TRUE - lasso puts a budget constraint on least squares (less flexible)
__Part b)__
iii. is TRUE - ridge also puts a budget constraint on least squares (less flexible)
__Part c)__
ii. is TRUE - a non-linear model would be more flexible and have higher variance, less bias
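The budget-constraint intuition can be seen in the coefficient paths (a sketch, not part of the original solution; the simulated data are an arbitrary choice): as the penalty grows, the lasso sets coefficients exactly to zero while ridge only shrinks them.

```{r}
# Sketch: lasso (alpha=1) vs ridge (alpha=0) coefficient paths on simulated data
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
par(mfrow = c(1, 2))
plot(glmnet(x, y, alpha = 1), xvar = "lambda")  # lasso: paths hit exactly zero
plot(glmnet(x, y, alpha = 0), xvar = "lambda")  # ridge: paths shrink but stay nonzero
```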


***
<a id="ex04"></a>
>EXERCISE 4:
__Part a)__
iii. is TRUE - training error increases steadily
__Part b)__
ii. is TRUE - test error will decrease initially and then increase
__Part c)__
iv. is TRUE - variance always decreases with more constraints
__Part d)__
iii. is TRUE - bias always increases with less model flexibility
__Part e)__
v. is TRUE - the irreducible error is a constant value, not related to model selection
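These trends can be checked numerically (a sketch, not part of the original solution; the simulated data, grid, and split are arbitrary): for ridge regression, training MSE rises steadily with $\lambda$ while test MSE typically traces the U shape.

```{r}
# Sketch: train/test MSE of ridge regression across a lambda grid
library(glmnet)
set.seed(1)
p <- 20
x <- matrix(rnorm(150 * p), 150, p)
y <- drop(x %*% rnorm(p, sd = 0.5)) + rnorm(150, sd = 3)
train <- 1:50
grid <- 10^seq(3, -3, length = 50)
fit <- glmnet(x[train, ], y[train], alpha = 0, lambda = grid)
train.mse <- colMeans((predict(fit, x[train, ]) - y[train])^2)
test.mse  <- colMeans((predict(fit, x[-train, ]) - y[-train])^2)
matplot(log(grid), cbind(train.mse, test.mse), type = "l", lty = 1,
        col = c("blue", "red"), xlab = "log(lambda)", ylab = "MSE")  # blue = train, red = test
```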


***
<a id="ex05"></a>
>EXERCISE 5:
__Part a)__
Ridge: minimize $(y_1 - \hat\beta_1x_{11} - \hat\beta_2x_{12})^2 + (y_2 - \hat\beta_1x_{21} - \hat\beta_2x_{22})^2 + \lambda (\hat\beta_1^2 + \hat\beta_2^2)$
__Part b)__
Step 1: Expanding the equation from Part a:

$$(y_1 - \hat\beta_1 x_{11} - \hat\beta_2 x_{12})^2 + (y_2 - \hat\beta_1 x_{21} - \hat\beta_2 x_{22})^2 + \lambda (\hat\beta_1^2 + \hat\beta_2^2) \\
= (y_1^2 + \hat\beta_1^2 x_{11}^2 + \hat\beta_2^2 x_{12}^2 - 2 \hat\beta_1 x_{11} y_1 - 2 \hat\beta_2 x_{12} y_1 + 2 \hat\beta_1 \hat\beta_2 x_{11} x_{12}) \\
+ (y_2^2 + \hat\beta_1^2 x_{21}^2 + \hat\beta_2^2 x_{22}^2 - 2 \hat\beta_1 x_{21} y_2 - 2 \hat\beta_2 x_{22} y_2 + 2 \hat\beta_1 \hat\beta_2 x_{21} x_{22}) \\
+ \lambda \hat\beta_1^2 + \lambda \hat\beta_2^2$$

Step 2: Taking the partial derivative with respect to $\hat\beta_1$ and setting the equation to 0 to minimize:

$$\frac{\partial }{\partial \hat\beta_1}: (2\hat\beta_1x_{11}^2-2x_{11}y_1+2\hat\beta_2x_{11}x_{12}) + (2\hat\beta_1x_{21}^2-2x_{21}y_2+2\hat\beta_2x_{21}x_{22}) + 2\lambda\hat\beta_1 = 0$$

Step 3: Setting $x_{11}=x_{12}=x_1$ and $x_{21}=x_{22}=x_2$ and dividing both sides of the equation by 2:

$$(\hat\beta_1x_1^2-x_1y_1+\hat\beta_2x_1^2) + (\hat\beta_1x_2^2-x_2y_2+\hat\beta_2x_2^2) + \lambda\hat\beta_1 = 0$$

$$\hat\beta_1 (x_1^2+x_2^2) + \hat\beta_2 (x_1^2+x_2^2) + \lambda\hat\beta_1 = x_1y_1 + x_2y_2$$

Step 4: Add $2\hat\beta_1x_1x_2$ and $2\hat\beta_2x_1x_2$ to both sides of the equation:

$$\hat\beta_1 (x_1^2 + x_2^2 + 2x_1x_2) + \hat\beta_2 (x_1^2 + x_2^2 + 2x_1x_2) + \lambda\hat\beta_1 = x_1y_1 + x_2y_2 + 2\hat\beta_1x_1x_2 + 2\hat\beta_2x_1x_2$$

Since $x_1 + x_2 = 0$, the terms $(x_1^2 + x_2^2 + 2x_1x_2) = (x_1+x_2)^2$ drop out, and the corresponding equation for $\hat\beta_2$ is identical with $\lambda\hat\beta_2$ in place of $\lambda\hat\beta_1$, so $\hat\beta_1 = \hat\beta_2$.

__Part d)__

$$\lambda\frac{|\beta_1|}{\beta_1} = \lambda\frac{|\beta_2|}{\beta_2}$$

So it seems that the lasso just requires that $\beta_1$ and $\beta_2$ are both positive or both negative (ignoring the possibility of 0).
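A quick numerical check of these conclusions (a sketch, not part of the original solution; the small simulated data set is an arbitrary choice): with two identical predictors, `glmnet` ridge splits the coefficient essentially equally, while the lasso split is not unique.

```{r}
# Sketch: identical columns -> ridge gives (approximately) equal coefficients,
# lasso only pins down their sum, so one column may carry all the weight.
library(glmnet)
set.seed(1)
x1 <- rnorm(50)
x  <- cbind(x1, x1)              # x_{i1} = x_{i2}
y  <- 2 * x1 + rnorm(50)
coef(glmnet(x, y, alpha = 0, lambda = 0.1))  # ridge
coef(glmnet(x, y, alpha = 1, lambda = 0.1))  # lasso
```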


***
<a id="ex06"></a>
>EXERCISE 6:
__Part a)__

```{r}
betas <- seq(-10,10,0.1)
eq.ridge <- function(beta, y=7, lambda=10) (y-beta)^2 + lambda*beta^2
plot(betas, eq.ridge(betas), xlab="beta", main="Ridge Regression Optimization", pch=1)
points(7/(1+10), eq.ridge(7/(1+10)), pch=16, col="red", cex=2)
```

For $y=7$ and $\lambda=10$, $\hat\beta=\frac{7}{1+10}$ minimizes the ridge regression equation
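As a quick sanity check (not part of the original solution), `optimize()` agrees with the closed-form minimizer $y/(1+\lambda)$:

```{r}
# numerical minimizer of the one-parameter ridge criterion
eq.ridge <- function(beta, y=7, lambda=10) (y-beta)^2 + lambda*beta^2
optimize(eq.ridge, interval = c(-10, 10))$minimum  # approximately 7/11 = 0.636
```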

__Part b)__
```{r}
betas <- seq(-10,10,0.1)
eq.lasso <- function(beta, y=7, lambda=10) (y-beta)^2 + lambda*abs(beta)
plot(betas, eq.lasso(betas), xlab="beta", main="Lasso Regression Optimization", pch=1)
points(7-10/2, eq.lasso(7-10/2), pch=16, col="red", cex=2)
```

For $y=7$ and $\lambda=10$, $\hat\beta=7-\frac{10}{2}$ minimizes the lasso regression equation

***
<a id="ex07"></a>
>EXERCISE 7:
__Part a)__
[*... will come back to this. maybe.*]
__Part b)__
[*... will come back to this. maybe.*]
__Part c)__
[*... will come back to this. maybe.*]
__Part d)__
[*... will come back to this. maybe.*]
__Part e)__
[*... will come back to this. maybe.*]

***
<a id="ex08"></a>
>EXERCISE 8:
__Part c)__
```{r}
min.cp <- which.min(reg.summary$cp)
plot(reg.summary$cp, xlab="Number of Poly(X)", ylab="Best Subset Cp", type="l")
points(min.cp, reg.summary$cp[min.cp], col="red", pch=4, lwd=5)
min.bic <- which.min(reg.summary$bic)
plot(reg.summary$bic, xlab="Number of Poly(X)", ylab="Best Subset BIC", type="l")
points(min.bic, reg.summary$bic[min.bic], col="red", pch=4, lwd=5)
min.adjr2 <- which.max(reg.summary$adjr2)
plot(reg.summary$adjr2, xlab="Number of Poly(X)", ylab="Best Subset Adjusted R^2", type="l")
points(min.adjr2, reg.summary$adjr2[min.adjr2], col="red", pch=4, lwd=5)
coef(regfit.full, min.cp)
coef(regfit.full, min.bic)
coef(regfit.full, min.adjr2)
```

__Part d)__

```{r}
# forward selection
regfit.fwd <- regsubsets(Y~poly(X,10,raw=T), data=data.frame(Y,X), nvmax=10, method="forward")
(fwd.summary <- summary(regfit.fwd))
# backward selection
regfit.bwd <- regsubsets(Y~poly(X,10,raw=T), data=data.frame(Y,X), nvmax=10, method="backward")
(bwd.summary <- summary(regfit.bwd))
par(mfrow=c(3,2))
min.cp <- which.min(fwd.summary$cp)
plot(fwd.summary$cp, xlab="Number of Poly(X)", ylab="Forward Selection Cp", type="l")
points(min.cp, fwd.summary$cp[min.cp], col="red", pch=4, lwd=5)
min.cp <- which.min(bwd.summary$cp)
plot(bwd.summary$cp, xlab="Number of Poly(X)", ylab="Backward Selection Cp", type="l")
points(min.cp, bwd.summary$cp[min.cp], col="red", pch=4, lwd=5)
min.bic <- which.min(fwd.summary$bic)
plot(fwd.summary$bic, xlab="Number of Poly(X)", ylab="Forward Selection BIC", type="l")
points(min.bic, fwd.summary$bic[min.bic], col="red", pch=4, lwd=5)
min.bic <- which.min(bwd.summary$bic)
plot(bwd.summary$bic, xlab="Number of Poly(X)", ylab="Backward Selection BIC", type="l")
points(min.bic, bwd.summary$bic[min.bic], col="red", pch=4, lwd=5)
min.adjr2 <- which.max(fwd.summary$adjr2)
plot(fwd.summary$adjr2, xlab="Number of Poly(X)", ylab="Forward Selection Adjusted R^2", type="l")
points(min.adjr2, fwd.summary$adjr2[min.adjr2], col="red", pch=4, lwd=5)
min.adjr2 <- which.max(bwd.summary$adjr2)
plot(bwd.summary$adjr2, xlab="Number of Poly(X)", ylab="Backward Selection Adjusted R^2", type="l")
points(min.adjr2, bwd.summary$adjr2[min.adjr2], col="red", pch=4, lwd=5)
# coefficients of selected models
coef(regfit.fwd, which.min(fwd.summary$cp))
coef(regfit.bwd, which.min(bwd.summary$cp))
coef(regfit.fwd, which.min(fwd.summary$bic))
coef(regfit.bwd, which.min(bwd.summary$bic))
coef(regfit.fwd, which.max(fwd.summary$adjr2))
coef(regfit.bwd, which.max(bwd.summary$adjr2))
```

Best subset, forward selection and backward selection all resulted in the same best models

__Part e)__

```{r}
require(glmnet)
xmat <- model.matrix(Y~poly(X,10,raw=T))[,-1]
# lasso regression
xmat <- model.matrix(Y2~poly(X,10,raw=T))[,-1]
lasso.mod <- cv.glmnet(xmat, Y2, alpha=1)
(lambda <- lasso.mod$lambda.min)
par(mfrow=c(1,1))
plot(lasso.mod)
predict(lasso.mod, s=lambda, type="coefficients")
```

Lasso selects the correct model but best subset diagnostics indicate using 1 to 4 predictors


***
<a id="ex09"></a>
>EXERCISE 9:
__Part a)__

```{r}
require(ISLR)
data(College)
set.seed(1)
trainid <- sample(1:nrow(College), nrow(College)/2)
train <- College[trainid,]
test <- College[-trainid,]
str(College)
```

__Part b)__

```{r}
fit.lm <- lm(Apps~., data=train)
pred.lm <- predict(fit.lm, test)
(err.lm <- mean((test$Apps - pred.lm)^2))  # test error
```

__Part c)__

```{r}
require(glmnet)
xmat.train <- model.matrix(Apps~., data=train)[,-1]
xmat.test <- model.matrix(Apps~., data=test)[,-1]
fit.ridge <- cv.glmnet(xmat.train, train$Apps, alpha=0)
(lambda <- fit.ridge$lambda.min)  # optimal lambda
pred.ridge <- predict(fit.ridge, s=lambda, newx=xmat.test)
(err.ridge <- mean((test$Apps - pred.ridge)^2))  # test error
```

__Part d)__

```{r}
require(glmnet)
xmat.train <- model.matrix(Apps~., data=train)[,-1]
xmat.test <- model.matrix(Apps~., data=test)[,-1]
fit.lasso <- cv.glmnet(xmat.train, train$Apps, alpha=1)
(lambda <- fit.lasso$lambda.min)  # optimal lambda
pred.lasso <- predict(fit.lasso, s=lambda, newx=xmat.test)
(err.lasso <- mean((test$Apps - pred.lasso)^2))  # test error
```

__Part g)__

```{r}
err.all <- c(err.lm, err.ridge, err.lasso, err.pcr, err.pls)
names(err.all) <- c("lm", "ridge", "lasso", "pcr", "pls")
barplot(err.all)
```
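Parts e) and f) (PCR and PLS) are not included in this preview; the `err.pcr` and `err.pls` objects used in the Part g) chunk would come from fits along these lines, run before that chunk (a sketch based on the standard `pls` package workflow with `pcr()`, `plsr()`, and `validation="CV"`, not the author's code; choosing `ncomp` by adjusted-CV RMSEP is an assumption):

```{r}
require(pls)
# PCR with the number of components chosen by cross-validation (assumed approach)
fit.pcr <- pcr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
ncomp.pcr <- which.min(RMSEP(fit.pcr)$val["adjCV", 1, -1])  # drop the 0-component entry
pred.pcr <- predict(fit.pcr, test, ncomp = ncomp.pcr)
(err.pcr <- mean((test$Apps - pred.pcr)^2))  # test error
# PLS, same idea
fit.pls <- plsr(Apps ~ ., data = train, scale = TRUE, validation = "CV")
ncomp.pls <- which.min(RMSEP(fit.pls)$val["adjCV", 1, -1])
pred.pls <- predict(fit.pls, test, ncomp = ncomp.pls)
(err.pls <- mean((test$Apps - pred.pls)^2))  # test error
```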

The test errors aren't much different. Ridge and lasso seem to perform slightly better, while PCR/PLS don't show any improvement over the full linear regression model.

```{r}
plot(test$Apps, pred.lm)
```
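To make "aren't much different" concrete, test $R^2$ for each model can be computed from the errors above (a sketch, not part of the original solution):

```{r}
# test R^2 relative to predicting the mean number of applications
test.ss <- mean((test$Apps - mean(test$Apps))^2)
1 - err.lm / test.ss     # linear model
1 - err.ridge / test.ss  # ridge
1 - err.lasso / test.ss  # lasso
```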

***
<a id="ex10"></a>
>EXERCISE 10:
__Part a)__

```{r}
set.seed(1)
eps <- rnorm(1000)
xmat <- matrix(rnorm(1000*20), ncol=20)
betas <- sample(-5:5, 20, replace=TRUE)
betas[c(3,6,7,10,13,17)] <- 0
betas
y <- xmat %*% betas + eps
```

__Part b)__

```{r}
set.seed(1)
trainid <- sample(1:1000, 100, replace=FALSE)
xmat.train <- xmat[trainid,]
xmat.test <- xmat[-trainid,]
y.train <- y[trainid,]
y.test <- y[-trainid,]
train <- data.frame(y=y.train, xmat.train)
test <- data.frame(y=y.test, xmat.test)
```

__Part c)__

```{r}
require(leaps)
# predict function from chapter 6 labs
predict.regsubsets <- function(object, newdata, id, ...){
  form <- as.formula(object$call[[2]])
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id=id)
  xvars <- names(coefi)
  mat[,xvars] %*% coefi
}
regfit.full <- regsubsets(y~., data=train, nvmax=20)
(coef.best <- coef(regfit.full, id=which.min(err.full)))
betas[betas != 0]
names(betas) <- paste0("X", 1:20)
merge(data.frame(beta=names(betas), betas), data.frame(beta=names(coef.best), coef.best), all.x=TRUE, sort=FALSE)
```

The best subset model selected all the correct predictors

__Part g)__

```{r}
err.coef <- rep(NA, 20)
for(i in 1:20) {
  coef.i <- coef(regfit.full, id=i)
  df.err <- merge(data.frame(beta=names(betas), betas), data.frame(beta=names(coef.i), coef.i), all.x=TRUE)
  df.err[is.na(df.err[,3]),3] <- 0
  err.coef[i] <- sqrt(sum((df.err[,2] - df.err[,3])^2))
}
plot(1:20, err.coef, type="b", main="Coefficient Error", xlab="Number of Predictors")
points(which.min(err.coef), err.coef[which.min(err.coef)], col="red", pch=16)
```

The coefficient error plot looks very similar to the test error plot
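The test error plot referred to above, and the `err.full` vector used in the Part c) chunk, are not shown in this preview; they could be produced with the `predict.regsubsets()` helper along these lines (a sketch, not the author's code; it would need to run before the `coef.best` line in Part c)):

```{r}
# validation-set MSE for each best subset model size
err.full <- rep(NA, 20)
for (i in 1:20) {
  pred.i <- predict(regfit.full, test, id = i)  # dispatches to predict.regsubsets()
  err.full[i] <- mean((test$y - pred.i)^2)
}
plot(1:20, err.full, type = "b", xlab = "Number of Predictors", ylab = "Test MSE")
which.min(err.full)
```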


***
<a id="ex11"></a>
>EXERCISE 11:

__Part a)__

```{r}
require(leaps)   # forward and backward selection
require(glmnet)  # ridge and lasso
require(MASS)    # Boston data set
data(Boston)

# split data into training and test sets
set.seed(1)
trainid <- sample(1:nrow(Boston), nrow(Boston)/2)
train <- Boston[trainid,]
test <- Boston[-trainid,]
xmat.train <- model.matrix(crim~., data=train)[,-1]
xmat.test <- model.matrix(crim~., data=test)[,-1]
str(Boston)

# ridge regression model
fit.ridge <- cv.glmnet(xmat.train, train$crim, alpha=0)
(lambda <- fit.ridge$lambda.min)  # optimal lambda
pred.ridge <- predict(fit.ridge, s=lambda, newx=xmat.test)
(err.ridge <- mean((test$crim - pred.ridge)^2))  # test error
predict(fit.ridge, s=lambda, type="coefficients")

# lasso regression model
fit.lasso <- cv.glmnet(xmat.train, train$crim, alpha=1)
(lambda <- fit.lasso$lambda.min)  # optimal lambda
pred.lasso <- predict(fit.lasso, s=lambda, newx=xmat.test)
(err.lasso <- mean((test$crim - pred.lasso)^2))  # test error
predict(fit.lasso, s=lambda, type="coefficients")

# predict function from chapter 6 labs
```
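The preview cuts off in the middle of this chunk, right after the comment about the chapter 6 `predict` function. A minimal sketch of how the best subset comparison might continue, reusing the `predict.regsubsets()` helper from Exercise 10, is shown below (an assumption, not the author's code):

```{r}
# Sketch only: best subset selection for crim, evaluated on the held-out test set
predict.regsubsets <- function(object, newdata, id, ...){
  form <- as.formula(object$call[[2]])
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id=id)
  mat[, names(coefi)] %*% coefi
}
regfit.best <- regsubsets(crim ~ ., data = train, nvmax = ncol(Boston) - 1)
err.best <- rep(NA, ncol(Boston) - 1)
for (i in seq_along(err.best)) {
  err.best[i] <- mean((test$crim - predict(regfit.best, test, id = i))^2)
}
which.min(err.best)  # model size with the lowest test MSE
min(err.best)        # compare with err.ridge and err.lasso above
```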