Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

ISYE 6402 Homework 4 Solution Answered 100% Score; 2025/2026., Exams of Network Analysis

ISYE 6402 Homework 4 Solution Answered 100% Score; 2025/2026.

Typology: Exams

2024/2025

Available from 03/01/2025

TopScorer100
TopScorer100 🇺🇸

4.8

(6)

91 documents

1 / 15

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
ISYE 6402 Homework 4
Background
For this data analysis, you will analyze the daily and weekly domestic passenger count arriving in Hawaii airports. File DailyDomestic.csv contains
the daily number of passengers between May 2019 and February 2023 File WeeklyDomestic.csv contains the weekly number of passengers for
the same time period. Here we will use different ways of fitting the ARIMA model while dealing with trend and seasonality.
library(lubridate)
library(mgcv)
library(tseries)
library(car)
Instructions on reading the data
To read the data in R, save the file in your working directory (make sure you have changed the directory if different from the R working directory)
and read the data using the R function read.csv()
daily <- read.csv("DailyDomestic.csv", head = TRUE)
daily$date <- as.Date(daily$date)
weekly <- read.csv("WeeklyDomestic.csv", head = TRUE)
weekly$week <- as.Date(weekly$week)
Question 1. Trend and seasonality estimation
1a. Plot the daily and weekly domestic passenger count separately. Do you see a strong trend and seasonality?
plot(daily$date, daily$domestic, type = "l", xlab = "Time", ylab = "Daily Passenger Count")
plot(weekly$week, weekly$domestic, type = "l", xlab = "Time", ylab = "Weekly Passenger Count")
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download ISYE 6402 Homework 4 Solution Answered 100% Score; 2025/2026. and more Exams Network Analysis in PDF only on Docsity!

ISYE 6402 Homework 4

Background

For this data analysis, you will analyze the daily and weekly domestic passenger count arriving in Hawaii airports. File DailyDomestic.csv contains the daily number of passengers between May 2019 and February 2023 File WeeklyDomestic.csv contains the weekly number of passengers for the same time period. Here we will use different ways of fitting the ARIMA model while dealing with trend and seasonality.

library (lubridate) library (mgcv) library (tseries) library (car)

Instructions on reading the data

To read the data in R , save the file in your working directory (make sure you have changed the directory if different from the R working directory) and read the data using the R function read.csv()

daily <- read.csv("DailyDomestic.csv", head = TRUE) daily$date <- as.Date(daily$date) weekly <- read.csv("WeeklyDomestic.csv", head = TRUE) weekly$week <- as.Date(weekly$week)

Question 1. Trend and seasonality estimation

1a. Plot the daily and weekly domestic passenger count separately. Do you see a strong trend and seasonality?

plot(daily$date, daily$domestic, type = "l", xlab = "Time", ylab = "Daily Passenger Count")

plot(weekly$week, weekly$domestic, type = "l", xlab = "Time", ylab = "Weekly Passenger Count")

Response

From both plots there is a strong seasonality on the time of year. The daily passenger count also shows a strong seasonality on the day of week. There might be a trend that the number of passenger is increasing through the years but it is not obvious.

1b. (Trend and seasonality) Fit the weekly domestic passenger count with a non-parametric trend using splines and monthly seasonality using ANOVA. Is the seasonality significant? Plot the fitted values together with the original time series. Plot the residuals and the ACF of the residuals. Comment on how the model fits and on the appropriateness of the stationarity assumption of the residuals.

time.pts = c( 1 :length(weekly$week)) month = as.factor(format(weekly$week,"%b")) model1.weekly = gam(weekly$domestic ~ s(time.pts) + month) summary(model1.weekly)

Family: gaussian

Link function: identity

Formula:

weekly$domestic ~ s(time.pts) + month

Parametric coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 123879 3502 35.375 < 2e-16 ***

monthAug 22233 4515 4.924 1.88e-06 ***

monthDec -1947 4581 -0.425 0.

monthFeb 8991 4663 1.928 0..

monthJan -7270 4498 -1.616 0.

monthJul 2972 4572 0.650 0.

monthJun 1287 4605 0.280 0.

monthMar 1968 4795 0.411 0.

monthMay 7560 4477 1.689 0..

monthNov -12315 4577 -2.691 0.00779 **

monthOct 13686 4528 3.023 0.00286 **

monthSep 25287 4582 5.518 1.15e-07 ***

---

Signif. codes: 0 '' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:

edf Ref.df F p-value

s(time.pts) 3.899 4.737 6.721 2.16e-05 ***

---

Signif. codes: 0 '' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.468 Deviance explained = 50.8%

GCV = 1.5551e+08 Scale est. = 1.4315e+08 n = 200

plot(weekly$week, weekly$domestic, col = "black", type = "l", lwd = 2 , xlab = "Time", ylab = "Weekly Passenger Count") lines(weekly$week, fitted(model1.weekly),col = "blue",lwd = 2 ) legend('topleft', legend = c("Passenger Count", "Fitted Value"), lwd = 2 , col = c("black", "blue"))

1c. (Trend and seasonality) This time fit the daily domestic passenger count with a non-parametric trend using splines, monthly and day-of-the- week seasonality using ANOVA. Plot the fitted values together with the original time series. Are the seasonal effects significant? Plot the residuals and the ACF of the residuals. Comment on how the model fits and on the appropriateness of the stationarity assumption of the residuals.

time.pts = c( 1 :length(daily$date)) month = as.factor(format(daily$date,"%b")) weekday = as.factor(weekdays(daily$date)) model1.daily = gam(daily$domestic ~ s(time.pts) + month + weekday) summary(model1.daily)

Family: gaussian

Link function: identity

Formula:

daily$domestic ~ s(time.pts) + month + weekday

Parametric coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 19524.5 314.7 62.037 < 2e-16 ***

monthAug 3400.6 356.2 9.548 < 2e-16 ***

monthDec -714.5 357.5 -1.999 0.045832 *

monthFeb 1292.3 365.1 3.540 0.000414 ***

monthJan -1255.3 357.5 -3.511 0.000461 ***

monthJul 242.5 355.1 0.683 0.

monthJun -319.4 357.3 -0.894 0.

monthMar -488.1 370.9 -1.316 0.

monthMay 1141.4 356.2 3.204 0.001384 **

monthNov -1921.7 360.2 -5.336 1.11e-07 ***

monthOct 899.0 357.8 2.513 0.012091 *

monthSep 3269.9 359.5 9.095 < 2e-16 ***

weekdayMonday -2925.7 250.3 -11.687 < 2e-16 ***

weekdaySaturday -907.1 250.3 -3.624 0.000301 ***

weekdaySunday -1791.3 250.3 -7.156 1.35e-12 ***

weekdayThursday -1213.4 250.3 -4.847 1.39e-06 ***

weekdayTuesday -2182.4 250.4 -8.717 < 2e-16 ***

weekdayWednesday -1571.1 250.3 -6.276 4.64e-10 ***

---

Signif. codes: 0 '' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:

edf Ref.df F p-value

s(time.pts) 4.811 5.692 17.2 <2e-16 ***

---

Signif. codes: 0 '' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.362 Deviance explained = 37.2%

GCV = 6.368e+06 Scale est. = 6.2643e+06 n = 1400

plot(daily$date, daily$domestic, col = "black", type = "l", lwd = 2 , xlab = "Time", ylab = "Daily Passenger Count") lines(daily$date, fitted(model1.daily),col = "blue",lwd = 2 ) legend('topleft', legend = c("Passenger Count", "Fitted Value"), lwd = 2 , col = c("black", "blue"))

plot(daily$date, residuals(model1.daily), type = "l",lwd = 2 , xlab = "Time", ylab = "Residual")

acf(residuals(model1.daily), lag.max = 365 )

Response:

Similar to 1b, the fitted model confirms a strong seasonality on the time of year and on the day of week. And from the small p-values of the coefficients in the ANOVA model, both seasonalities are statistically significant. The residual process shows different variability with the time of year. The ACF has values outside of the confidence band so the residual process is not stationary. Note that the ACF of the residuals for daily data has a similar shape with that of weekly data. Again this may be due to that the monthly ANOVA model isn’t enough to capture the rapid change of the seasonality depending on time of year.

Question 2. ARMA fitting and residual analysis

2a. (ARMA fitting) Fit a ARMA model with both AR and MA orders of 6 without intercept using the residual processes from Question 1b and 1c for the daily and weekly domestic passenger count, respectively. What are the coefficients of the fitted models? Are the fitted ARMA models causal? (Hint: Set include.mean = FALSE if using arima(). Use polyroot() to find the roots of a polynomial.)

model1.arma.weekly <- arima(residuals(model1.weekly),order = c( 6 , 0 , 6 ), method = "ML", include.mean = FALSE) model1.arma.weekly

acf(residuals(model1.arma.weekly), lag.max = 52 )

pacf((residuals(model1.arma.weekly)), lag.max = 52 )

qqPlot((residuals(model1.arma.weekly)), ylab = "Weekly Residuals")

## [1] 1 93

plot(daily$date, residuals(model1.arma.daily), xlab = "Time", ylab = "Daily residuals", type = "l")

acf(residuals(model1.arma.daily), lag.max = 365 )

pacf((residuals(model1.arma.daily)), lag.max = 365 )

acf(diff.daily, lag.max = 100 )

pacf(diff.daily, lag.max = 100 )

Response:

The PACF does not cut off so it can’t be a pure AR process. It might be an MA(7) process as the ACF cuts off at lag equals 7 and the PACF gradually tails off.

3b. (ARMA fitting and order selection). Fit an ARMA model without intercept using the differenced daily data with AR and MA order up to 8. Select the best ARMA model using AICc. What is the order for the selected model and what is its AICc?

n <- length(diff.daily) p <- 0 : 8 q <- 0 : 8 aic <- matrix( 0 , length(p), length(q)) for (i in 1 :length(p)) { for (j in 1 :length(q)) { modij = arima(diff.daily, order = c(p[i], 0 , q[j]), method = 'ML', include.mean = FALSE) aic[i, j] = modij$aic + 2 * (p[i] + q[j] + 1 ) * (p[i] + q[i]) / (n - p[i] - q[j] - 1 ) } }

js <- ceiling(which.min(aic) / length(p)) is <- which.min(aic) - (js- 1 )*length(p) model2 <- arima(diff.daily, order = c(p[is], 0 ,q[js]), method = "ML", include.mean = FALSE) model

Call:

arima(x = diff.daily, order = c(p[is], 0, q[js]), include.mean = FALSE, method = "ML")

Coefficients:

ar1 ar2 ar3 ma1 ma2 ma3 ma4 ma5 ma

0.0553 -0.024 0.0797 0.0266 0.0160 0.0238 0.0187 0.0265 0.

s.e. 0.0321 0.032 0.0322 0.0094 0.0097 0.0101 0.0100 0.0096 0.

ma

-0.

s.e. 0.

sigma^2 estimated as 8372080: log likelihood = -9667.72, aic = 19357.

Response:

The selected model is ARMA(3, 7). The AICc is 19357.43.

3c. (Residual analysis) Plot the residual process of the selected model in Question 3b. Display the ACF, PACF and QQ plot of the residual process. How does this model compare to the one on daily passenger count data in Question 2b?

ts.plot(residuals(model2),ylab = "Residual Process")

acf(residuals(model2),lag.max = 365 )

3d. (Testing uncorrelated residuals) Use the Ljung-Box Test to decide whether the residuals of the selected ARMA model in Question 3b are correlated.

Box.test(model2$resid, lag = ( 3 + 7 + 1 ), type = "Ljung-Box", fitdf = ( 3 + 7 ))

Box-Ljung test

data: model2$resid

X-squared = 3.4379, df = 1, p-value = 0.

Response:

The Ljung-Box test gives a p-value of 0.06 > 0.05, so we failed to reject the null and concluded there residuals might be uncorrelated.

Question 4. Seasonal ARMA model and forecasting: Weekly

domestic passenger count

4a. (Seasonal ARMA) Use the first 185 data points of weekly domestic passenger count as training data. Fit a seasonal ARMA model with intercept, where the non-seasonal model is ARMA(1,1) and the seasonal model is AR(1) with a period of 52 weeks. Plot the residual process and the ACF of the residual process. Comment on the appropriateness of the fitted model.

st <- 185 ed <- nrow(weekly)

model3.weekly = arima(weekly$domestic[ 1 :st], order = c( 1 , 0 , 1 ), seasonal = list(order = c( 1 , 0 , 0 ),period= 52 ),method = "ML") model3.weekly

Call:

arima(x = weekly$domestic[1:st], order = c(1, 0, 1), seasonal = list(order = c(1,

0, 0), period = 52), method = "ML")

Coefficients:

ar1 ma1 sar1 intercept

0.7114 -0.1215 0.6398 129038.

s.e. 0.0907 0.1387 0.0643 3857.

sigma^2 estimated as 77824163: log likelihood = -1957.18, aic = 3924.

plot(weekly$week[ 1 :st],residuals(model3.weekly),type = "l", ylab = "Residuals")

acf(residuals(model3.weekly),lag.max = 52 )

Response:

The ACF of the residual generally falls inside the confidence band. This shows the model can potentially be a good fit.

4b. (Forecasting) Use the fitted model in Question 4a to predict the total passenger count of the remainder of the weeks. Plot the 99% confidence interval. Compare with the actual observation. Does the actual observation fell into the 99% confidence interval?

outpred <- predict(model3.weekly, n.ahead = ed-st) alpha99 <- 2. plot(weekly$week[(st- 10 ):ed], weekly$domestic[(st- 10 ):ed],type = "l", ylim = c( 100000 , 180000 ), xlab = "Time", ylab = "Weekly Passenger Count") points(weekly$week[(st+ 1 ):ed], outpred$pred, col = "red") lines(weekly$week[(st+ 1 ):ed], outpred$pred + alpha99outpred$se, col = "blue") lines(weekly$week[(st+ 1 ):ed], outpred$pred - alpha99outpred$se, col = "blue") legend('topleft', legend = c("Passenger Count", "Fitted Value","Confidence Band"), lwd = 2 , col = c("black", "red", "blue"))

upper_pred <- outpred$pred + alpha99 * outpred$se lower_pred <- outpred$pred - alpha99 * outpred$se act_data <- weekly$domestic[(st+ 1 ):ed] res_summary <- data.frame( Actual = act_data, Predictions = outpred$pred, "Upper Bound" = upper_pred, "Lower Bound" = lower_pred ) print(res_summary)