












Moving Beyond Linearity: exercise R code solutions for ISLR (An Introduction to Statistical Learning) by James, Witten, Hastie, and Tibshirani.
title: "Chapter 7: Moving Beyond Linearity" author: "Solutions to Exercises" date: "February 4, 2016" output: html_document: keep_md: no
EXERCISE 1:

Part a)

Part b)

Part c)

Part d)

Part e)
EXERCISE 2:

Part a) When $\lambda=\infty$, the loss term no longer matters; the penalty forces $g^{(0)}=g=0$, so $\hat g$ must be $\hat g(x)=0$.

Part b) When $\lambda=\infty$, the loss term no longer matters; the penalty forces $g^{(1)}=g'=0$, so $\hat g$ must be a constant (a horizontal line).

Part c) When $\lambda=\infty$, the loss term no longer matters; the penalty forces $g^{(2)}=g''=0$, so $\hat g$ must be a straight line, e.g. $\hat g(x)=3x+2$.

Part d) When $\lambda=\infty$, the loss term no longer matters; the penalty forces $g^{(3)}=g'''=0$, so $\hat g$ must be a quadratic curve, e.g. $\hat g(x)=x^2$.
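For reference, the criterion being minimized (restated from the exercise statement in ISLR; $m$ denotes the order of the penalized derivative):

$$
\hat g = \arg\min_{g} \; \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^2 \;+\; \lambda \int \bigl[g^{(m)}(x)\bigr]^2 \, dx
$$

As $\lambda \to \infty$, any $g$ with $g^{(m)} \neq 0$ anywhere incurs an arbitrarily large penalty, so only the constraint $g^{(m)}=0$ matters, which is what each part above uses.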
EXERCISE 4:

```{r}
require(ggplot2)

# piecewise segments of the estimated curve over adjacent intervals of X
x.1 <- seq(-6, 0, 0.1)  # [-6,0)
x.2 <- seq(0, 1, 0.1)   # [0,1)
x.3 <- seq(1, 2, 0.1)   # [1,2]
x.4 <- seq(2, 3, 0.1)   # (2,3)
x.5 <- seq(3, 4, 0.1)   # [3,4]
x.6 <- seq(4, 5, 0.1)   # (4,5]
x.7 <- seq(5, 6, 0.1)   # (5,6)
y.1 <- rep(1, length(x.1))
y.2 <- rep(2, length(x.2))
y.3 <- 3 - x.3
y.4 <- rep(1, length(x.4))
y.5 <- 3*x.5 - 8
y.6 <- rep(4, length(x.6))
y.7 <- rep(1, length(x.7))
df <- data.frame(X = c(x.1,x.2,x.3,x.4,x.5,x.6,x.7),
                 Y = c(y.1,y.2,y.3,y.4,y.5,y.6,y.7))
p <- ggplot(df, aes(x=X, y=Y)) + geom_line(size=1.5)
# highlight the requested sketch range X in [-2, 2]
rect <- data.frame(xmin=-2, xmax=2, ymin=-Inf, ymax=Inf)
p + geom_rect(data=rect, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),
              fill="grey20", alpha=0.4, inherit.aes = FALSE)
```
EXERCISE 5:

Part a) Because penalizing $g^{(3)}$ imposes a more stringent smoothness requirement than penalizing $g^{(4)}$, we expect $\hat g_2$ to be more flexible, so it can fit the training data more closely and achieve a smaller training RSS.

Part b) Hard to say; it depends on the true form of $y$. If $\hat g_2$ overfits the data because of its increased flexibility, then $\hat g_1$ will likely have the smaller test RSS.

Part c) When $\lambda=0$, only the first (loss) term matters, and it is the same for both $\hat g_1$ and $\hat g_2$. The two optimization problems become identical, so the fits have the same training and test RSS.
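EXERCISE 6: Part a)

The `anova()` comparison in the next chunk refers to ten fitted models, `fit.01` through `fit.10`, whose construction is not shown in this excerpt. A minimal sketch of how they might be produced, together with 10-fold cross-validation over the polynomial degree (the object names and the use of `assign()` are assumptions, not the original author's code):

```{r, warning=FALSE, message=FALSE}
library(ISLR)
library(boot)
set.seed(1)

# 10-fold CV over polynomial degrees 1-10 for wage ~ age
cv.error <- rep(0, 10)
for (d in 1:10) {
  glm.fit <- glm(wage ~ poly(age, d), data = Wage)
  cv.error[d] <- cv.glm(Wage, glm.fit, K = 10)$delta[1]
}
plot(1:10, cv.error, type = "b", xlab = "polynomial degree", ylab = "CV error")

# build the named lm fits compared by anova() below; the degree is substituted
# into each formula so that predict() on, e.g., fit.04 works as expected later
for (d in 1:10) {
  form <- as.formula(sprintf("wage ~ poly(age, %d)", d))
  assign(sprintf("fit.%02d", d), lm(form, data = Wage))
}
```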
```{r}
# compare the nested polynomial fits of wage on age (degrees 1-10) with ANOVA
anova(fit.01,fit.02,fit.03,fit.04,fit.05,fit.06,fit.07,fit.08,fit.09,fit.10)
```
```{r}
agelims <- range(Wage$age)
age.grid <- seq(agelims[1], agelims[2])
preds <- predict(fit.04, newdata=list(age=age.grid), se=TRUE)
se.bands <- preds$fit + cbind(2*preds$se.fit, -2*preds$se.fit)
par(mfrow=c(1,1), mar=c(4.5,4.5,1,1), oma=c(0,0,4,0))
plot(Wage$age, Wage$wage, xlim=agelims, cex=0.5, col="darkgrey")
title("Degree 4 Polynomial Fit", outer=TRUE)
lines(age.grid, preds$fit, lwd=2, col="blue")
matlines(age.grid, se.bands, lwd=1, col="blue", lty=3)
```
__Part b)__

```{r, warning=FALSE, message=FALSE}
require(boot)   # for cv.glm()
set.seed(1)
# cross-validation over the number of age bands (cuts)
cv.error <- rep(0, 9)
for (i in 2:10) {
  Wage$age.cut <- cut(Wage$age, i)
  glm.fit <- glm(wage ~ age.cut, data=Wage)
  cv.error[i-1] <- cv.glm(Wage, glm.fit, K=10)$delta[1]  # [1]: standard, [2]: bias-corrected
}
cv.error
plot(2:10, cv.error, type="b")  # 7 or 8 cuts look optimal

# going with 8 cuts
cut.fit <- glm(wage ~ cut(age, 8), data=Wage)
preds <- predict(cut.fit, newdata=list(age=age.grid), se=TRUE)
se.bands <- preds$fit + cbind(2*preds$se.fit, -2*preds$se.fit)
plot(Wage$age, Wage$wage, xlim=agelims, cex=0.5, col="darkgrey")
title("Fit with 8 Age Bands")
lines(age.grid, preds$fit, lwd=2, col="blue")
matlines(age.grid, se.bands, lwd=1, col="blue", lty=3)
```
EXERCISE 7:
Both marital status and job class are categorical variables. On a univariate basis, wages for `jobclass=Information` appear higher than for `jobclass=Industrial`. For marital status, married workers seem to have the highest wages, though this is probably confounded with age.
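The fitting code for this exercise is not shown in this excerpt. Below is a minimal sketch of an analysis consistent with the discussion above, using the `gam` package to combine a smooth term in age with the two categorical predictors; the object name `gam.wage` and the smoothing degrees of freedom are assumptions.

```{r, warning=FALSE, message=FALSE}
library(ISLR)
library(gam)

# exploratory boxplots of wage against the two categorical predictors
par(mfrow = c(1, 2))
plot(Wage$maritl, Wage$wage, xlab = "maritl", ylab = "wage")
plot(Wage$jobclass, Wage$wage, xlab = "jobclass", ylab = "wage")

# GAM with a smoothing spline in age plus the categorical terms
gam.wage <- gam(wage ~ s(age, 4) + maritl + jobclass, data = Wage)
par(mfrow = c(1, 3))
plot(gam.wage, se = TRUE, col = "blue")
summary(gam.wage)
```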
EXERCISE 9: Part a)
Part b)
Part c)
```{r, warning=FALSE, message=FALSE}
require(boot)
require(MASS)   # Boston data
set.seed(1)
# 10-fold CV over polynomial degrees 1-10 for nox ~ dis
# (the loop body is incomplete in the excerpt; the cv.glm() call is the natural
#  completion, mirroring the pattern used elsewhere in these solutions)
cv.error <- rep(0, 10)
for (i in 1:10) {
  glm.fit <- glm(nox ~ poly(dis, i), data=Boston)
  cv.error[i] <- cv.glm(Boston, glm.fit, K=10)$delta[1]
}
cv.error
plot(1:10, cv.error, type="b")
```

Part e)

```{r}
require(splines)   # for bs()
# training RSS for regression splines with df = 4 to 10
rss.error <- rep(0, 7)
for (i in 4:10) {
  fit.sp <- lm(nox ~ bs(dis, df=i), data=Boston)
  rss.error[i-3] <- sum(fit.sp$residuals^2)
}
rss.error
plot(4:10, rss.error, type="b")  # training RSS decreases with a more flexible fit
```
Part f)
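The original code for this part is not included in the excerpt. A minimal sketch, assuming 10-fold cross-validation over the same range of spline degrees of freedom as in Part e) (object names are placeholders):

```{r, warning=FALSE, message=FALSE}
require(boot)
require(splines)
set.seed(1)

# 10-fold CV over spline degrees of freedom 4-10 for nox ~ dis
cv.sp.error <- rep(0, 7)
for (i in 4:10) {
  glm.sp <- glm(nox ~ bs(dis, df = i), data = Boston)
  cv.sp.error[i-3] <- cv.glm(Boston, glm.sp, K = 10)$delta[1]
}
plot(4:10, cv.sp.error, type = "b", xlab = "spline df", ylab = "CV error")
```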
EXERCISE 10: Part a)
Part d)
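The `gam.fit` object summarized in the next chunk is constructed in an earlier part that is not shown in this excerpt. A hedged sketch of a model consistent with the discussion below; the choice of smoothing degrees of freedom is an assumption, and the train/test split used in parts a) through c) of the exercise is omitted here.

```{r, warning=FALSE, message=FALSE}
library(ISLR)
library(gam)

# GAM for out-of-state tuition with smoothing splines on the quantitative predictors
gam.fit <- gam(Outstate ~ Private + s(Room.Board, 2) + s(Terminal, 2) +
                 s(perc.alumni, 2) + s(Expend, 5) + s(Grad.Rate, 2),
               data = College)
par(mfrow = c(2, 3))
plot(gam.fit, se = TRUE, col = "blue")
```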
```{r}
summary(gam.fit)
```
Strong evidence of non-linear effects for `Expend`, some evidence for `Room.Board`, `Terminal`, and `Grad.Rate`, and no evidence for `perc.alumni`.
EXERCISE 11: Part a)
```{r}
set.seed(1)
# generate a toy data set with two predictors
X1 <- rnorm(100)
X2 <- rnorm(100)
beta_0 <- -3.
beta_1 <- 0.
beta_2 <- 4.
eps <- rnorm(100, sd = 1)
Y <- beta_0 + beta_1*X1 + beta_2*X2 + eps
par(mfrow=c(1,1))
plot(Y)
```
Part b)
```{r}
# initialize beta hat 1
bhat_1 <- 1
```
Part c)
```{r}
# holding beta_1 fixed, regress the partial residual on X2
a <- Y - bhat_1*X1
(bhat_2 <- lm(a ~ X2)$coef[2])
```
Part d)
```{r}
# holding beta_2 fixed, regress the partial residual on X1
a <- Y - bhat_2*X2
(bhat_1 <- lm(a ~ X1)$coef[2])
```
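The loop for Part e), which repeats the two updates above and stores the estimates at each iteration, is not shown in this excerpt, yet the plotting code below relies on `bhat_0`, `bhat_1`, `bhat_2` vectors, on the multiple regression fit `fit.lm`, and later on a table `mydf`. A minimal sketch under those assumptions (1000 iterations, the y-axis limits, and the `mydf` layout are all guesses chosen to match the plotting and legend calls that follow):

```{r}
# Part e): repeat the backfitting updates, storing the estimates at each step
n.iter <- 1000
bhat_0 <- bhat_2 <- rep(0, n.iter)
bhat_1 <- c(1, rep(0, n.iter))   # keep the initial guess from Part b) in front
for (i in 1:n.iter) {
  a <- Y - bhat_1[i]*X1
  bhat_2[i] <- lm(a ~ X2)$coef[2]
  a <- Y - bhat_2[i]*X2
  fit <- lm(a ~ X1)
  bhat_1[i+1] <- fit$coef[2]
  bhat_0[i] <- fit$coef[1]
}

# Part f): multiple regression fit for comparison
fit.lm <- lm(Y ~ X1 + X2)

# assumed table of estimates per iteration, inspected in Part g)
mydf <- data.frame(iteration = 1:n.iter, bhat_0 = bhat_0,
                   bhat_1 = bhat_1[-1], bhat_2 = bhat_2)

# plot the bhat_0 trajectory; lines for bhat_1 and bhat_2 are added below
# (ylim leaves room for the legend placed at y = 9.7)
plot(bhat_0, type="l", col="red", lwd=2, ylim=c(-5, 10),
     xlab="iteration", ylab="coefficient estimate")
```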
```{r}
lines(bhat_1[-1], col="green", lwd=2)
lines(bhat_2, col="blue", lwd=2)
abline(h=coef(fit.lm)[1], lty="dashed", lwd=3, col="brown")
abline(h=coef(fit.lm)[2], lty="dashed", lwd=3, col="darkgreen")
abline(h=coef(fit.lm)[3], lty="dashed", lwd=3, col="darkblue")
legend(x=600, y=9.7, c("bhat_0", "bhat_1", "bhat_2", "multiple regression"),
       lty = c(1,1,1,2), col = c("red","green","blue","gray"))
```
__Part g)__

```{r}
head(mydf)
```
One iteration was enough to get a decent fit; by the second iteration the beta estimates had already converged.
EXERCISE 12:
```{r}
set.seed(1)
# create a toy example with 100 predictors
p <- 100    # number of true predictors
n <- 1000   # number of observations
betas <- rnorm(p+1)*5            # extra 1 for beta_0 (intercept)
X <- matrix(rnorm(n*p), ncol=p, nrow=n)
eps <- rnorm(n, sd=0.5)
Y <- betas[1] + (X %*% betas[-1]) + eps   # intercept recycled across the n observations
par(mfrow=c(1,1))
plot(Y)

# find coefficient estimates with multiple regression
fit.lm <- lm(Y ~ X)
bhats.lm <- coef(fit.lm)

# run backfitting with 100 iterations
bhats <- matrix(0, ncol=p, nrow=100)
mse.error <- rep(0, 100)
for (i in 1:100) {
  for (k in 1:p) {
    a <- Y - (X[,-k] %*% bhats[i,-k])
    bhats[i:100, k] <- lm(a ~ X[,k])$coef[2]   # carry the update forward to later iterations
  }
  mse.error[i] <- mean((Y - (X %*% bhats[i,]))^2)
}
plot(1:100, mse.error)
plot(1:5, mse.error[1:5], type="b")  # second iteration results were already very close to multiple regression
```
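As a quick sanity check (not part of the original code), one hedged way to quantify how close the second backfitting iteration gets to the multiple regression coefficients:

```{r}
# largest absolute gap between the iteration-2 backfitting estimates and the
# corresponding lm() slope coefficients (intercept excluded)
max(abs(bhats[2, ] - bhats.lm[-1]))
```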