





















An analysis of time series data, focusing on fluctuations in the NEXT FUNDS Gold Price Exchange Traded Fund and on quarterly average temperature data. It explores trend estimation techniques, including moving average, parametric quadratic polynomial, local polynomial, and splines, and evaluates how well they capture trends and achieve stationarity. The document also covers differencing techniques for time series analysis and analyzes the residuals from the trend models.
Typology: Exams
In this problem, we will study fluctuations in the NEXT FUNDS Gold Price Exchange Traded Fund, an investment fund that aims to track the performance of gold prices. By investing in this fund, investors gain exposure to gold price movements without physically owning the metal. The fund holds physical gold as its underlying asset, and its value is based on the market price of gold. You will use the file Fund Prices Data.csv, which contains monthly prices from January 2010 to December 2022.
To read the data in R, save the file in your working directory (make sure you have changed the directory if different from the R working directory) and read the data using the R function read.csv()
You will perform the analysis and modelling on the Close data column.
#Here are the libraries you will need:
library(mgcv)
library(TSA)
library(dynlm)
library(ggplot2)
library(reshape2)
library(greybox)
library(mlr)
library(lubridate)
library(dplyr)
library(data.table)
#Run the following code to prepare the data for analysis:
data<-read.csv("Fund Prices Data.csv")
Plot the time series and its ACF plot. Comment on the stationarity of the time series based on these plots. Which (if any) stationarity assumptions are violated?
fp <- ts(data$Close, start = 2010 , freq = 12 )
plot(fp,col="purple",lwd=1.5,ylab="Fund Price",main="FP Time Series Plot")
acf(fp,lag.max= 12 * 12 ,col="purple",lwd=1.5,ylab="ACF",main="FP ACF Plot")
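As a reference for reading the ACF plot, the following minimal sketch contrasts the sample ACF of a stationary series with that of a non-stationary one. It uses simulated data only (the names noise and walk are illustrative, not part of the assignment): white noise should show autocorrelations near zero, while a random walk should show large autocorrelations that decay slowly.

```r
# Simulated data only; these are not the fund prices.
set.seed(1)
n <- 156                                    # 13 years of monthly observations
noise <- ts(rnorm(n), start = 2010, frequency = 12)          # stationary
walk  <- ts(cumsum(rnorm(n)), start = 2010, frequency = 12)  # non-stationary

# Sample autocorrelations at lags 0..24, computed without plotting
acf.noise <- acf(noise, lag.max = 24, plot = FALSE)$acf
acf.walk  <- acf(walk,  lag.max = 24, plot = FALSE)$acf

# Approximate 95% band for the sample ACF of white noise
band <- 1.96 / sqrt(n)

# Number of lags 1..24 whose autocorrelation falls outside the band
sum(abs(acf.noise[-1]) > band)
sum(abs(acf.walk[-1]) > band)
```

A slowly decaying ACF like that of walk is the pattern to look for when judging whether fp violates the constant-mean assumption.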
Response: 1b
The plot of the fitted trend lines indicates that the local polynomial and splines models capture the trend better than the moving average and parametric quadratic models.
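The four trend estimators named in this response can be sketched as follows. This is an illustration on a simulated series, not the code used for the fund prices: the series x, the bandwidth of 0.1, and all object names are assumptions chosen for the example.

```r
library(mgcv)  # provides gam() and s()

# Simulated series: quadratic trend plus random-walk noise (illustrative only)
set.seed(2)
n <- 156
time.pts <- (1:n) / n                        # time rescaled to (0, 1]
y <- 100 + 50 * time.pts^2 + cumsum(rnorm(n))
x <- ts(y, start = 2010, frequency = 12)

# Helper: turn fitted values back into a monthly ts aligned with x
as.monthly <- function(v) ts(as.numeric(v), start = 2010, frequency = 12)

# 1. Moving average: box-kernel smoother evaluated at the observed time points
mav <- ksmooth(time.pts, y, kernel = "box", bandwidth = 0.1, x.points = time.pts)
fit.mav <- as.monthly(mav$y)

# 2. Parametric quadratic polynomial
x1 <- time.pts
x2 <- time.pts^2
fit.para <- as.monthly(fitted(lm(y ~ x1 + x2)))

# 3. Local polynomial (LOESS with default span)
fit.loc <- as.monthly(fitted(loess(y ~ time.pts)))

# 4. Splines (GAM with a smooth term in time)
fit.spl <- as.monthly(fitted(gam(y ~ s(time.pts))))

# Compare how closely each trend follows the series via residual standard deviation
fits <- list(MAV = fit.mav, PARA = fit.para, LOC = fit.loc, SPL = fit.spl)
sapply(fits, function(f) sd(x - f))
```

A smaller residual standard deviation means the trend follows the data more closely, which is not by itself evidence of a better trend estimate; a very flexible smoother can track noise.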
Evaluate the quality of each fit using the residual analysis.
## Residual Process: MAV
resid.1 = fp-tsmav.fp
## Residual Process: Local Polynomial
resid.2 = fp-fit.loc.fp
## Residual Process: Spline
resid.3 = fp-fit.spl
## Residual Process: Parametric
resid.4 = fp-para.fit
y.min = min(c(resid.1,resid.2, resid.3, resid.4))
y.max = max(c(resid.1,resid.2, resid.3, resid.4))
ts.plot(resid.1,lwd= 2 ,ylab="Residual Process",col="plum", ylim=c(- 1500 , 2000 ))
lines(resid.2,col="purple")
lines(resid.3,col="green")
lines(resid.4,col="blue")
legend(x= 2010 ,y= 2000 ,legend=c("Moving Average","LOESS", "Splines", "Parametric quadratic"),lty = 1 , col=c("plum","purple", "green", "blue"))
acf(resid.1,lag.max= 24 * 4 ,main="Moving Average model")
acf(resid.4,lag.max= 24 * 4 ,main="Parametric model")
Response: 1c
The residuals from all four trend models show clear non-stationarity, so trend removal alone, with any of the four models, is not sufficient to account for the non-stationary variation in the time series.
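A visual reading of the residual ACF can be complemented with a formal test. The sketch below applies the Ljung-Box test (Box.test in base R) to the residuals of a detrended simulated series; the data and the choice of 12 lags are assumptions for illustration, and the test was not part of the original analysis.

```r
# Simulated data only: linear trend plus random-walk noise
set.seed(3)
n <- 156
t.idx <- 1:n
y <- 0.05 * t.idx + cumsum(rnorm(n))

# Remove a fitted linear trend and keep the residual process
res <- residuals(lm(y ~ t.idx))

# Ljung-Box test; null hypothesis: no autocorrelation up to lag 12
lb <- Box.test(res, lag = 12, type = "Ljung-Box")
lb$p.value
```

A small p-value indicates that the residuals remain autocorrelated after detrending, which is consistent with the conclusion that fitting a trend alone does not yield a stationary residual process.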
Now plot the differenced time series and its ACF plot. Apply the four trend models in Question 1b to the differenced time series. What can you conclude about the differenced data in terms of stationarity? Which approach would you recommend (trend removal via fitting a trend vs. differencing) to obtain a stationary process?
ts.plot(diff(fp), col = "black", xlab = "", ylab = "Differenced FP", main = "Differenced FP Exchange Rate by Time")
grid()
acf(diff(fp), lag.max = 52 * 12 , xlab = "Lag", ylab = "ACF ", main = "Diff FP ACF Analysis")
# 2. Fit a parametric quadratic polynomial model
x1 <- time.pts[- 1 ]
x2 <- time.pts[- 1 ] ^ 2
para.model <- lm(diff(fp) ~ x1 + x2)
# The differenced series starts one month after fp, so align the fitted values with diff(fp)
para.fit <- ts(fitted(para.model), start = start(diff(fp)), frequency = 12 )
ts.plot(diff(fp), xlab = "", ylab = "Differenced FP", main = "Differenced Parametric Quadratic Polynomial Analysis")
grid()
lines(para.fit, lwd = 2 ,col = "orange")
# 3. Fit a local polynomial model
loc.model <- loess(diff(fp) ~ time.pts[- 1 ])
# Align the fitted values with diff(fp), as above
loc.fit <- ts(fitted(loc.model), start = start(diff(fp)), frequency = 12 )
ts.plot(diff(fp), xlab = "", ylab = "Differenced FP", main = "Differenced Local Polynomial Analysis")
grid()
lines(loc.fit, lwd = 2 ,col = "green")
# 5. Compare all estimated trends
vals <- c(mav.fit, para.fit, loc.fit, gam.fit)
ylim <- c(min(vals), max(vals))
ts.plot(mav.fit, lwd = 2 , col = "black", ylim = ylim, xlab = "", ylab = "FP", main = "Differenced Regression Model Comparison")
grid()
lines(mav.fit, lwd = 2 , col = "red")
lines(para.fit, lwd = 2 , col = "orange")
lines(loc.fit, lwd = 2 , col = "green")
lines(gam.fit, lwd = 2 , col = "blue")
legend("bottomright", legend = c("MAV", "PARA", "LOC", "GAM"), col = c("red", "orange", "green", "blue"), lwd = 2 )
Response: 1d
The time series plots show how well each model fits the differenced data and indicate that the differenced data are stationary.
The fitted splines trend has the least variability. The parametric quadratic trend also has little variability, though more than the splines trend, and the local polynomial trend shows the largest deviations of the three, as seen in the combined graph. The moving average trend, however, has many 'kinks' that capture minor movements that may not be useful in determining the trend.
From this analysis, the differenced data appear stationary; differencing is therefore a more effective approach than trend fitting for removing the trend and making the time series stationary.
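Why differencing removes a trend can be seen on deterministic examples. In the small sketch below (made-up numbers, not the fund data), one difference turns a linear trend into a constant, and two differences do the same for a quadratic trend.

```r
t.idx <- 1:10

linear    <- 3 + 2 * t.idx     # linear trend with slope 2
quadratic <- 1 + t.idx^2       # quadratic trend

# First difference of a linear trend: the slope, repeated
d1 <- diff(linear)
d1   # 2 2 2 2 2 2 2 2 2

# Second difference of a quadratic trend: twice the leading coefficient, repeated
d2 <- diff(quadratic, differences = 2)
d2   # 2 2 2 2 2 2 2 2
```

The same operation applied to a series with a stochastic trend, such as a random walk, leaves only the increments, which is why diff(fp) can look stationary even when detrended residuals do not.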
Part 2: Temperature Analysis
Background
In this problem, we will analyze quarterly average temperature data. The data file Temperature HW 2.csv contains average monthly temperatures from a southern region from January 1980 through December 2016. We will aggregate the data on a quarterly basis by taking the average temperature within each quarter. We will fit the models on the data through Quarter 4 of 2015 and evaluate the predictions for Q1 to Q4 of 2016.
To read the data in R, save the file in your working directory (make sure you have changed the directory if different from the R working directory) and read the data using the R function read.csv()
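The quarterly aggregation and the train/test split described in the background can be sketched with base R as follows. The monthly series here is simulated as a stand-in for the temperature column, and the names monthly, quarterly, temp.train, and temp.test are hypothetical.

```r
# Simulated monthly series standing in for the temperature data (illustrative only)
set.seed(4)
n.months <- 37 * 12                          # January 1980 through December 2016
monthly <- ts(60 + 15 * sin(2 * pi * (1:n.months) / 12) + rnorm(n.months),
              start = c(1980, 1), frequency = 12)

# Quarterly series: mean of the three monthly values in each quarter
quarterly <- aggregate(monthly, nfrequency = 4, FUN = mean)

# Training data through 2015 Q4; hold out 2016 Q1 to Q4 for evaluating predictions
temp.train <- window(quarterly, end = c(2015, 4))
temp.test  <- window(quarterly, start = c(2016, 1))

length(temp.train)   # 144 quarters
length(temp.test)    # 4 quarters
```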
plot(diff(temp),xlab="Time",ylab="Temperature",main="Quarterly Temperature: 1-Differenced")
acf(diff(temp),lag.max= 12 * 12 ,main="Quarterly Temperature ACF: 1-Differenced")
plot(diff(temp, 4 ),xlab="Time",ylab="Temperature",main="Quarterly Temperature: 4-Differenced")
The plot of the 1st-order differenced data shows that the trend has been removed. The seasonality effect, however, still seems to be present: the ACF exhibits a large first seasonal lag that decays slowly over multiples of that lag. First-order differencing alone is therefore not suitable for removing the seasonality in the data.
Since the 1st-order difference does not adequately address seasonality, we can apply a lag-4 (seasonal) difference, as provided above. The absence of a cyclical pattern in the ACF plot indicates that seasonality has been removed to a great extent; however, there is still evidence of a trend in the data, given the presence of slowly decaying lags.
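The difference between lag-1 and lag-4 differencing can be checked on a small deterministic example. The series below is made up for illustration: a fixed period-4 pattern plus a linear trend.

```r
# Made-up quarterly series: repeating seasonal pattern plus a linear trend
t.idx <- 1:16
seasonal <- rep(c(10, 20, 30, 20), times = 4)
x <- ts(seasonal + 0.5 * t.idx, start = c(1980, 1), frequency = 4)

# Lag-1 difference: the trend becomes a constant shift, but the seasonal swings remain
d.lag1 <- diff(x)

# Lag-4 difference: the period-4 pattern cancels exactly, leaving 4 * 0.5 = 2
d.lag4 <- diff(x, lag = 4)

range(d.lag1)
range(d.lag4)
```

In this idealized case the lag-4 difference is constant because the trend is exactly linear. With real data, a stochastic trend can survive seasonal differencing, which matches the slowly decaying ACF noted above.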
Separately fit a seasonality harmonic model and the ANOVA seasonality model to the temperature data. Evaluate the quality of each fit using the residual analysis. Does one model perform better than the other? Which model would you select to fit the seasonality in the data?
times<-ts(seq( 1 : 768 ))
Timereq<-times
Timereq2<-times^ 2
## Estimate seasonality using ANOVA approach
td_lm<- dynlm(temp ~ season(temp))
summary(td_lm)
plot(temp, type = "l")
lines(fitted(td_lm), col = "blue")
## Estimate seasonality using harmonic model
td_lm2 <- dynlm(temp ~ harmonic(temp))
summary(td_lm2)
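For comparison, both seasonality models can also be written with plain lm(), which makes the regressors explicit. This is a sketch on simulated quarterly data, assuming a single harmonic at the seasonal frequency; it is not the dynlm code above, and the names quarter, c1, and s1 are illustrative.

```r
# Simulated quarterly series with a seasonal cycle (illustrative only)
set.seed(5)
n <- 144                                     # 36 years of quarterly data
t.idx <- 1:n
x <- ts(60 + 15 * cos(2 * pi * t.idx / 4) + rnorm(n),
        start = c(1980, 1), frequency = 4)
y <- as.numeric(x)

# ANOVA seasonality model: one mean level per quarter (4 coefficients)
quarter <- factor(cycle(x))
fit.anova <- lm(y ~ quarter)

# Harmonic seasonality model: one cosine/sine pair with period 4 (3 coefficients)
c1 <- cos(2 * pi * t.idx / 4)
s1 <- sin(2 * pi * t.idx / 4)
fit.harm <- lm(y ~ c1 + s1)

# Compare fit quality, penalizing the extra parameter of the ANOVA model
c(anova = summary(fit.anova)$adj.r.squared,
  harmonic = summary(fit.harm)$adj.r.squared)
AIC(fit.anova, fit.harm)

# Residual autocorrelation at lags 0..16 for each model
acf.anova <- acf(residuals(fit.anova), lag.max = 16, plot = FALSE)$acf
acf.harm  <- acf(residuals(fit.harm),  lag.max = 16, plot = FALSE)$acf
```

When the two models fit about equally well, the harmonic model is the more parsimonious choice; if the residual ACF of the harmonic fit still shows seasonal spikes, the ANOVA model is the safer one.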