









ISYE 6402 Homework 4 Solution
For this data analysis, you will analyze the daily and weekly counts of domestic passengers arriving at Hawaii airports. The file DailyDomestic.csv contains the daily number of passengers between May 2019 and February 2023, and the file WeeklyDomestic.csv contains the weekly number of passengers for the same time period. Here we will use different ways of fitting an ARIMA model while dealing with trend and seasonality.
library(lubridate)
library(mgcv)
library(tseries)
library(car)
To read the data into R, save the files in your working directory (make sure you change the working directory in R if it is different) and read the data using the R function read.csv().
daily <- read.csv("DailyDomestic.csv", header = TRUE)
daily$date <- as.Date(daily$date)
weekly <- read.csv("WeeklyDomestic.csv", header = TRUE)
weekly$week <- as.Date(weekly$week)
1a. Plot the daily and weekly domestic passenger count separately. Do you see a strong trend and seasonality?
plot(daily$date, daily$domestic, type = "l", xlab = "Time", ylab = "Daily Passenger Count")
plot(weekly$week, weekly$domestic, type = "l", xlab = "Time", ylab = "Weekly Passenger Count")
Response:
Both plots show strong seasonality by time of year, and the daily passenger count also shows strong day-of-week seasonality. There may be a trend of the number of passengers increasing over the years, but it is not obvious.
1b. (Trend and seasonality) Fit the weekly domestic passenger count with a non-parametric trend using splines and monthly seasonality using ANOVA. Is the seasonality significant? Plot the fitted values together with the original time series. Plot the residuals and the ACF of the residuals. Comment on how the model fits and on the appropriateness of the stationarity assumption of the residuals.
time.pts = c(1:length(weekly$week))
month = as.factor(format(weekly$week, "%b"))
model1.weekly = gam(weekly$domestic ~ s(time.pts) + month)
summary(model1.weekly)
plot(weekly$week, weekly$domestic, col = "black", type = "l", lwd = 2,
     xlab = "Time", ylab = "Weekly Passenger Count")
lines(weekly$week, fitted(model1.weekly), col = "blue", lwd = 2)
legend('topleft', legend = c("Passenger Count", "Fitted Value"),
       lwd = 2, col = c("black", "blue"))
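Question 1b also asks for the residual plot and the ACF of the residuals, and that chunk is not shown above. A minimal sketch, following the same pattern used for the daily model in Question 1c:
# Residuals and their ACF for the weekly trend + monthly-seasonality model (sketch)
plot(weekly$week, residuals(model1.weekly), type = "l", lwd = 2,
     xlab = "Time", ylab = "Residual")
acf(residuals(model1.weekly), lag.max = 52)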
1c. (Trend and seasonality) This time fit the daily domestic passenger count with a non-parametric trend using splines, and monthly and day-of-the-week seasonality using ANOVA. Plot the fitted values together with the original time series. Are the seasonal effects significant? Plot the residuals and the ACF of the residuals. Comment on how the model fits and on the appropriateness of the stationarity assumption of the residuals.
time.pts = c(1:length(daily$date))
month = as.factor(format(daily$date, "%b"))
weekday = as.factor(weekdays(daily$date))
model1.daily = gam(daily$domestic ~ s(time.pts) + month + weekday)
summary(model1.daily)
plot(daily$date, daily$domestic, col = "black", type = "l", lwd = 2,
     xlab = "Time", ylab = "Daily Passenger Count")
lines(daily$date, fitted(model1.daily), col = "blue", lwd = 2)
legend('topleft', legend = c("Passenger Count", "Fitted Value"),
       lwd = 2, col = c("black", "blue"))
plot(daily$date, residuals(model1.daily), type = "l", lwd = 2,
     xlab = "Time", ylab = "Residual")
acf(residuals(model1.daily), lag.max = 365)
Response:
Similar to 1b, the fitted model confirms strong seasonality by time of year and by day of week, and the small p-values of the ANOVA coefficients show that both seasonal effects are statistically significant. The residual process shows variability that changes with the time of year, and the ACF has values outside the confidence band, so the residual process is not stationary. Note that the ACF of the daily residuals has a shape similar to that of the weekly residuals; again, this may be because the monthly ANOVA terms are not enough to capture the rapid changes in seasonality over the course of the year.
2a. (ARMA fitting) Fit an ARMA model with both AR and MA orders of 6 without intercept using the residual processes from Questions 1b and 1c, for the weekly and daily domestic passenger counts, respectively. What are the coefficients of the fitted models? Are the fitted ARMA models causal? (Hint: Set include.mean = FALSE if using arima(). Use polyroot() to find the roots of a polynomial.)
model1.arma.weekly <- arima(residuals(model1.weekly), order = c(6, 0, 6),
                            method = "ML", include.mean = FALSE)
model1.arma.weekly
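The causality check hinted at in the question is not shown in this excerpt. A minimal sketch for the weekly model, assuming the usual convention that an ARMA model is causal when all roots of the AR polynomial lie outside the unit circle:
# Moduli of the roots of the AR polynomial 1 - phi1*z - ... - phi6*z^6;
# the fitted ARMA(6,6) is causal if every modulus is greater than 1
abs(polyroot(c(1, -model1.arma.weekly$coef[1:6])))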
acf(residuals(model1.arma.weekly), lag.max = 52)
pacf(residuals(model1.arma.weekly), lag.max = 52)
qqPlot(residuals(model1.arma.weekly), ylab = "Weekly Residuals")
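The corresponding ARMA(6,6) fit on the daily residuals from Question 1c is not shown in this excerpt either, but the diagnostics below refer to it as model1.arma.daily. A sketch of the presumed fit, mirroring the weekly one:
# ARMA(6,6) fit on the daily residual process (assumed name model1.arma.daily,
# matching its use in the plots below)
model1.arma.daily <- arima(residuals(model1.daily), order = c(6, 0, 6),
                           method = "ML", include.mean = FALSE)
model1.arma.daily
abs(polyroot(c(1, -model1.arma.daily$coef[1:6])))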
plot(daily$date, residuals(model1.arma.daily), xlab = "Time",
     ylab = "Daily residuals", type = "l")
acf(residuals(model1.arma.daily), lag.max = 365)
pacf(residuals(model1.arma.daily), lag.max = 365)
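The series diff.daily used below is defined in Question 3a, which is not included in this excerpt. A plausible sketch consistent with Question 3b's reference to the differenced daily data, assuming lag-1 differencing:
# Assumed definition: lag-1 difference of the daily passenger counts
# (the exact definition from Question 3a is not shown here)
diff.daily <- diff(daily$domestic)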
acf(diff.daily, lag.max = 100)
pacf(diff.daily, lag.max = 100)
Response:
The PACF does not cut off, so it cannot be a pure AR process. It might be an MA(7) process, since the ACF cuts off after lag 7 while the PACF tails off gradually.
3b. (ARMA fitting and order selection). Fit an ARMA model without intercept using the differenced daily data with AR and MA order up to 8. Select the best ARMA model using AICc. What is the order for the selected model and what is its AICc?
n <- length(diff.daily)
p <- 0:8
q <- 0:8
aic <- matrix(0, length(p), length(q))
for (i in 1:length(p)) {
  for (j in 1:length(q)) {
    modij = arima(diff.daily, order = c(p[i], 0, q[j]), method = 'ML',
                  include.mean = FALSE)
    # AICc correction with k = p + q estimated ARMA coefficients
    aic[i, j] = modij$aic + 2 * (p[i] + q[j] + 1) * (p[i] + q[j]) / (n - p[i] - q[j] - 1)
  }
}
js <- ceiling(which.min(aic) / length(p))
is <- which.min(aic) - (js - 1) * length(p)
model2 <- arima(diff.daily, order = c(p[is], 0, q[js]), method = "ML",
                include.mean = FALSE)
model2
Response:
The selected model is ARMA(3, 7). The AICc is 19357.43.
3c. (Residual analysis) Plot the residual process of the selected model in Question 3b. Display the ACF, PACF and QQ plot of the residual process. How does this model compare to the one on daily passenger count data in Question 2b?
ts.plot(residuals(model2), ylab = "Residual Process")
acf(residuals(model2), lag.max = 365)
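Question 3c also asks for the PACF and QQ plot of the residual process; those chunks are not shown above. A sketch following the same pattern as the earlier diagnostics (qqPlot comes from the car package loaded at the top):
pacf(residuals(model2), lag.max = 365)
qqPlot(residuals(model2), ylab = "Residuals of Selected Model")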
3d. (Testing uncorrelated residuals) Use the Ljung-Box Test to decide whether the residuals of the selected ARMA model in Question 3b are correlated.
Box.test(model2$resid, lag = (3 + 7 + 1), type = "Ljung-Box", fitdf = (3 + 7))
Response:
The Ljung-Box test gives a p-value of 0.06 > 0.05, so we fail to reject the null hypothesis and conclude that the residuals may be uncorrelated.
4a. (Seasonal ARMA) Use the first 185 data points of weekly domestic passenger count as training data. Fit a seasonal ARMA model with intercept, where the non-seasonal model is ARMA(1,1) and the seasonal model is AR(1) with a period of 52 weeks. Plot the residual process and the ACF of the residual process. Comment on the appropriateness of the fitted model.
st <- 185
ed <- nrow(weekly)
model3.weekly = arima(weekly$domestic[1:st], order = c(1, 0, 1),
                      seasonal = list(order = c(1, 0, 0), period = 52),
                      method = "ML")
model3.weekly
plot(weekly$week[1:st], residuals(model3.weekly), type = "l", ylab = "Residuals")
acf(residuals(model3.weekly), lag.max = 52)
Response:
The ACF of the residuals generally falls inside the confidence band, which suggests the model is potentially a good fit.
4b. (Forecasting) Use the fitted model in Question 4a to predict the passenger count for the remainder of the weeks. Plot the 99% confidence interval and compare with the actual observations. Do the actual observations fall within the 99% confidence interval?
outpred <- predict(model3.weekly, n.ahead = ed - st)
alpha99 <- qnorm(0.995)  # ~2.576, normal quantile for a 99% interval
plot(weekly$week[(st - 10):ed], weekly$domestic[(st - 10):ed], type = "l",
     ylim = c(100000, 180000), xlab = "Time", ylab = "Weekly Passenger Count")
points(weekly$week[(st + 1):ed], outpred$pred, col = "red")
lines(weekly$week[(st + 1):ed], outpred$pred + alpha99 * outpred$se, col = "blue")
lines(weekly$week[(st + 1):ed], outpred$pred - alpha99 * outpred$se, col = "blue")
legend('topleft', legend = c("Passenger Count", "Fitted Value", "Confidence Band"),
       lwd = 2, col = c("black", "red", "blue"))
upper_pred <- outpred$pred + alpha99 * outpred$se
lower_pred <- outpred$pred - alpha99 * outpred$se
act_data <- weekly$domestic[(st + 1):ed]
res_summary <- data.frame(
  Actual = act_data,
  Predictions = outpred$pred,
  "Upper Bound" = upper_pred,
  "Lower Bound" = lower_pred
)
print(res_summary)
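To answer the question directly, one can also check the band coverage programmatically (a sketch; the printed table above supports the same comparison observation by observation):
# TRUE if every actual observation lies inside the 99% prediction band
all(act_data >= lower_pred & act_data <= upper_pred)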