ARMA Modeling and Forecasting: A Case Study of IBM Stock Price

A comprehensive guide to ARMA (autoregressive moving average) modeling and forecasting, using the example of IBM stock price prediction. It covers key concepts such as time series analysis, stationarity, differencing, and residual analysis, and includes practical examples and R code snippets for implementing ARMA models and evaluating their performance. It is a valuable resource for students and professionals interested in time series analysis and forecasting.


Module 2: Auto-Regressive and Moving Average Model - Case Studies

2.4: Data Examples

2.4.1 IBM Stock Price: ARMA Modeling

In this lesson, I will illustrate ARIMA modeling and forecasting with a data example.

IBM Stock Price Prediction

The data example analyzes the stock price of a large company that has been around for many years. Specifically, we'll focus on IBM, which stands for International Business Machines. The company was founded in 1911 as the Computing-Tabulating-Recording Company, a merger of four companies. Later, in 1933, it became IBM. Since the company has been around for more than 100 years, it has contributed to many innovations and has experienced many disruptive events, as listed on the slide. For example, in 1993 it experienced the highest loss in its history to date, a loss of $8 billion, and in 2005 it sold its personal computing business, since that business was not sustainable.

IBM Stock Price

What is a stock price? In financial terms, it is viewed as a measure of the company's perceived worth, since the price multiplied by the number of shares gives the company's total worth. It is generally affected by a number of factors, including volatility in the market, current economic conditions, the popularity of the company, and events such as the ones I mentioned on the previous slide. The goal of the analysis in this study is to develop a model to predict IBM's stock price given that no major events are expected. The model presented in this lesson is general and applies to the stock price of other companies, although its performance will differ. The data consist of daily stock prices from January 29th, 1962 until August 26th, 2020. The daily stock price is available as the open, close, adjusted close, high, and low price. We'll consider here the adjusted close, which is the price most commonly analyzed when daily stock price predictions are sought.

Time Series Plots

The input is the file IBMstockprice.txt, accompanying this example. I used the as.Date() command in R to convert the dates provided with the data into date objects. For this command, the input is the vector of dates converted to character, together with a specification of how the dates are recorded. Furthermore, I defined one of the events, called Truven, to be the date of the acquisition of the Truven company in 2016. The other events are defined similarly.
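As a rough R sketch of this data-preparation step: the column names, the date format, and the event date below are assumptions for illustration, not the lecture's actual code.

    # Read the IBM stock price file; column names are assumed for illustration.
    ibm <- read.table("IBMstockprice.txt", header = TRUE)

    # Convert the character dates into R Date objects, specifying how the
    # dates are recorded in the file (format string assumed here).
    ibm$Date <- as.Date(as.character(ibm$Date), format = "%m/%d/%Y")

    # One of the events: the Truven acquisition in 2016 (placeholder date).
    truven <- as.Date("2016-01-01")

    # Plot the adjusted close price and mark the event with a vertical line.
    plot(ibm$Date, ibm$Adj.Close, type = "l",
         xlab = "Date", ylab = "Adjusted close price")
    abline(v = truven, col = "red", lty = 2)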

One way to account for the trend is to consider the differenced process. Here I redefine the stock price as a time series with a frequency specifying daily data starting with January 29th, 1962. Then I use the difference command to take the difference, where the default in this command is the first-order difference. I also plotted the ACF and PACF to see whether another difference of the data is needed and/or whether the resulting process is stationary. Here are the ACF and the PACF plots. The two plots look just like those for the white noise we simulated in a previous lesson. Do we still need to apply an ARMA model to the differenced process, or is it enough to take the difference, since the differenced process seems to look like white noise? We'll address this question in this lesson and the next.

ARIMA Modeling

Next, I applied ARIMA modeling. I used ARIMA to take the trend into account through differencing. I assume only a first-order difference and thus set the difference order equal to one. Here I selected the orders of the AR and MA polynomials using the AICc criterion. For this we need to fit the ARMA model for all combinations of AR and MA orders up to a maximum order for both polynomials. Here I set the maximum order to five, and consider orders from zero to five for both the AR and MA components. I loop over all combinations of the orders, fit the ARMA model for each combination, and save the AICc values in a matrix. Note that the R implementation does not provide the AICc value but the AIC, without the small-sample correction. The selected orders are p equal to 0 and q equal to 1. Last, we fit the model for the selected orders, called here the final model.
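The following is a sketch of the differencing and the AICc-based order search described above; the use of diff() and arima(), the variable names, and the hand-computed AICc correction are assumptions about how such code could be written, not the lecture's own code.

    # Daily adjusted close price as a time series, and its first difference.
    price  <- ts(ibm$Adj.Close)
    dprice <- diff(price)            # diff() defaults to the first-order difference

    acf(dprice)                      # check whether the differenced series
    pacf(dprice)                     # looks like white noise

    # Order selection: fit ARIMA(p, 1, q) for p, q = 0..5 and store the AICc.
    n      <- length(price)
    norder <- 6                      # orders 0 through 5
    aicc   <- matrix(NA, norder, norder)
    for (p in 0:(norder - 1)) {
      for (q in 0:(norder - 1)) {
        fit <- try(arima(price, order = c(p, 1, q)), silent = TRUE)
        if (!inherits(fit, "try-error")) {
          k <- p + q + 1             # number of estimated parameters
          # arima() reports the AIC; add the small-sample correction by hand.
          aicc[p + 1, q + 1] <- fit$aic + 2 * k * (k + 1) / (n - k - 1)
        }
      }
    }
    which(aicc == min(aicc, na.rm = TRUE), arr.ind = TRUE) - 1   # selected (p, q)

    # Final model with the selected orders (p = 0, q = 1 in this lesson).
    final_model <- arima(price, order = c(0, 1, 1))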

ARIMA Modeling: Residual Analysis

Here are the residual plots for the fitted ARMA model. The R code for obtaining this plot is provided in the complete R code for this data example, available with this lesson. The residual plot does not show a pattern. The variance of the residuals is also constant. In the ACF plot, only the value at lag zero equals one, while all other values of the sample ACF are small, within the confidence band. The same holds for the sample PACF: the values are all within the confidence band. The Q-Q normal plot shows both a left and a right tail, an indication that the residuals may follow more of a t-distribution than a normal distribution, although they are otherwise quite symmetric.

Testing for Uncorrelated Residuals

Last, we also apply the hypothesis testing procedures for independence or serial correlation using the Box-Pierce and Ljung-Box tests. The R output of this implementation is here. In this implementation, we need to input the lag for the test statistic and fitdf, the number of degrees of freedom of the fitted model. Here I set the lag equal to the sum of the AR and MA orders plus 1, and the degrees of freedom of the fitted model equal to the sum of the two orders. It is common practice to use a minimum lag for the test equal to the sum of the orders plus 1. The R output provides the test values along with the p-value of each test. Note that the null hypothesis in these tests is that the time series process (here, the residual process) consists of uncorrelated variables. Thus, this is one case where we want large p-values, so that we do not reject the null hypothesis. The p-values for both tests are large, indicating that it is plausible for the residuals to be uncorrelated.
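A minimal sketch of the two tests using R's Box.test(), with the lag and fitdf choices described above and the fitted model from the previous sketch:

    # Residuals of the final ARIMA(0, 1, 1) model.
    res <- resid(final_model)
    p <- 0; q <- 1                                 # selected orders

    # lag = sum of the orders plus one; fitdf = sum of the orders.
    Box.test(res, lag = p + q + 1, type = "Box-Pierce", fitdf = p + q)
    Box.test(res, lag = p + q + 1, type = "Ljung-Box",  fitdf = p + q)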

Summary: In this lesson, we fit an ARIMA model to the IBM stock price, selecting the AR and MA orders with the AICc criterion, and assessed the fit through residual analysis and tests for uncorrelated residuals.

The corresponding plot is here. The black line is the observed time series. The red points are the predicted values, along with the confidence bands in blue. All the observed values are within the confidence band. The predicted values are fairly close to the observed values, although they do not capture some of the variation in the stock price. The confidence band is quite wide, and it widens as the lag increases, indicating large volatility, or uncertainty, in the prediction.

Prediction Accuracy

But how good are these predictions? We can compare the predictions derived from applying the predict() command to the model fit on the training data with the observed responses in the test data. In the real world, we do not have the observed responses at the time of making the predictions, and thus we cannot evaluate the prediction accuracy of a model. But here, we first pretend we do not have the observed future time series values and predict based on the training data. Generally, the question "how good is the prediction?" comprises two separate aspects: first, measuring predictive accuracy per se, and second, comparing various forecasting models. The most commonly reported measures of prediction accuracy are:

  • Mean squared prediction error, abbreviated MSPE and computed as the mean of the squared differences between predicted and observed values;
  • Mean absolute prediction error, abbreviated MAE and computed as the mean of the absolute values of the differences between predicted and observed values;
  • Mean absolute percentage error, abbreviated MAPE and computed as the mean of the absolute values of the differences scaled by the observed responses;
  • Precision measure, abbreviated PM and computed as the ratio between the sum of squared prediction errors and the sum of squared differences between the responses and the mean of the responses;
  • Confidence interval measure, abbreviated CIM and computed as the number of observations falling outside of the prediction intervals divided by the number of predictions made.

Prediction Error Measure Insights

MSPE is appropriate for evaluating prediction accuracy for models estimated by minimizing squared prediction errors, but it depends on the scale of the time series data and is sensitive to outliers. MAE is appropriate for evaluating prediction accuracy for models estimated by minimizing absolute prediction errors; similar to MSPE, it depends on scale, but it is robust to outliers. MAPE is appropriate for evaluating prediction accuracy for models estimated by minimizing absolute prediction errors, but unlike MAE, it does not depend on scale. It is also robust to outliers. Last, the precision measure is appropriate for evaluating prediction accuracy for models estimated by minimizing squared prediction errors. It also does not depend on scale. The precision measure is reminiscent of the regression R squared: it can be interpreted as the ratio of the variability in the prediction errors to the variability in the new data.
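As an illustration, the sketch below computes these measures in R for the 10-day-ahead predictions, reusing the price series and selected orders from the earlier sketches; the split into training data and the last 10 days of test data, the 95% prediction intervals, and the use of sums of squared errors in the precision measure are assumptions based on the description in this lesson.

    # Hold out the last 10 days as test data and refit the selected model.
    n     <- length(price)
    train <- price[1:(n - 10)]
    test  <- price[(n - 9):n]

    fit  <- arima(train, order = c(0, 1, 1))
    pred <- predict(fit, n.ahead = 10)
    obs  <- as.numeric(test)
    fc   <- as.numeric(pred$pred)

    mspe <- mean((fc - obs)^2)              # mean squared prediction error
    mae  <- mean(abs(fc - obs))             # mean absolute prediction error
    mape <- mean(abs(fc - obs) / obs)       # mean absolute percentage error
    pm   <- sum((fc - obs)^2) / sum((obs - mean(obs))^2)   # precision measure

    # CIM: proportion of observations falling outside the prediction intervals.
    lower <- fc - 1.96 * as.numeric(pred$se)
    upper <- fc + 1.96 * as.numeric(pred$se)
    cim   <- mean(obs < lower | obs > upper)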

The precision measure is 8.84, which is the ratio between the variability in the prediction and the variability in the new data. The closer this value is to zero, the better the prediction. Here the precision measure is quite large, indicating poor performance in predicting the 10 days ahead. Last, I also note that all observed responses fall within the prediction intervals. However, as I pointed out before, the prediction bands are quite wide, indicating significant uncertainty in the predictions.

ARIMA Forecasting: 1 Rolling Day vs. 10 Days Ahead

Let's also consider prediction of the 10 days, but this time on a rolling basis. That is, for each day we fit the model with the entire time series up to that day and predict only one day ahead, and we apply this for each of the 10 days. To do this I used a loop over i from 1 to 10, where each i corresponds to a different day. We save not only the prediction but also the prediction intervals. Last, I plot the predictions based on this approach along with the observations of the time series. The complete R code is not provided on this slide, but it is available with this lecture.
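A sketch of the rolling scheme, under the same assumptions as above (selected orders p = 0, q = 1, and 95% prediction intervals):

    roll_pred  <- numeric(10)
    roll_lower <- numeric(10)
    roll_upper <- numeric(10)

    for (i in 1:10) {
      # Refit on all data up to the day before the one being predicted,
      # then predict a single day ahead.
      fit_i <- arima(price[1:(n - 11 + i)], order = c(0, 1, 1))
      pr    <- predict(fit_i, n.ahead = 1)
      roll_pred[i]  <- pr$pred
      roll_lower[i] <- pr$pred - 1.96 * pr$se
      roll_upper[i] <- pr$pred + 1.96 * pr$se
    }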

The plot here compares the predictions from this approach, where the predictions for each of the 10 days were made on a rolling basis, to the predictions made 10 days ahead. As expected, the predictions using the one-day-ahead approach, shown in green, are much closer to the observed time series values than the predictions from the 10-day-ahead approach, shown in red. Moreover, the confidence bands from the rolling-basis predictions, shown in purple, are much tighter than those for the 10-day-ahead predictions. This is again expected, since there is less uncertainty from one day to the next than from one day looking 10 days ahead.

Prediction Accuracy: 1-Day Ahead

Similar to the previous approach, we can also evaluate the accuracy of the predictions using the accuracy measures. The R code is the same as before, except that it is now applied to the predictions based on the one-day-ahead rolling approach. The R output is here. The precision measure is now reduced significantly, to 1.49 versus 8.84 for the 10-day-ahead predictions. Although the prediction bands are tight for the 1-day rolling predictions, the observed values all fall within the prediction bands. The predictions using the one-day-ahead approach on a rolling basis are preferable, since the predictions are significantly better. However, we cannot always use such an approach, since there are situations when we want to predict 10 days ahead in the future rather than one day ahead. Generally, different models may be used for prediction at different lags ahead.

Let's begin again by plotting the time series. Please note that over the course of the R data analysis lessons, I have been and will be using various approaches to converting data into time series and to plotting it. When performing your own data analysis, you will decide which one you like best! Here the command to convert the data into a time series is the xts() command. I also transformed the time series using the log transformation, as I will motivate on the next slides. The time series plots are on this slide. The top figure is for the original time series and the bottom figure is for the log-transformed time series. Some observations from these two plots are as follows. The variance of the original time series depends on time. There is also a clear non-linear trend, as well as a clear seasonality or cyclical pattern, indicating that the time series is non-stationary. The log transformation of the time series is needed to stabilize the time-varying variability before doing trend and seasonality analysis.
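A minimal sketch of this conversion, assuming the monthly data sit in a data frame with a date column and a consumption column (the object and column names below are placeholders):

    library(xts)

    # 'elec' is assumed to be a data frame with columns Date and Consumption.
    elec_ts  <- xts(elec$Consumption, order.by = elec$Date)
    log_elec <- log(elec_ts)          # log transform to stabilize the variance

    plot(elec_ts,  main = "Monthly electricity consumption")
    plot(log_elec, main = "Log-transformed monthly electricity consumption")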

Why Transform?

Here I am motivating why we need to transform the time series before evaluating the trend and seasonality. Let's skip ahead and fit a quarterly seasonality along with the trend, as shown in blue in the two time series plots. When comparing the two plots, we can see that the differences between the original time series and the fitted seasonality and trend gradually increase over time, with smaller residuals for earlier times and larger residuals for more recent times. In contrast, for the transformed time series, we can see that the variability in the residuals is relatively constant over time. In the following analysis, we will work with the log-transformed monthly electricity consumption.

Trend Estimation

Generally, when removing trend or seasonality, we don't just delete information. As discussed in Module 1, we take that information apart in order to analyze each part of the behavior separately. Here we are going back to analyzing one component, the trend. You can use your favorite method from Module 1 to capture the trend in the time series. Here we fit a spline regression, a parametric quadratic polynomial, and moving average smoothing to the log-transformed time series. Overall, we can observe a positive logarithmic trend in the log-transformed time series. Among the three trend-fitting methods, the spline seems to capture the trend best. The moving
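As an illustration of the three trend-fitting approaches, here is a rough sketch; the specific functions (lm() with a quadratic term, smooth.spline(), and a 12-month moving average via filter()) and their tuning are assumptions, since the lecture's own code is provided separately.

    y  <- as.numeric(log_elec)
    tm <- seq_along(y)

    # Parametric quadratic polynomial trend.
    quad_trend <- fitted(lm(y ~ tm + I(tm^2)))

    # Spline trend via a smoothing spline.
    spline_fit   <- smooth.spline(tm, y)
    spline_trend <- predict(spline_fit, tm)$y

    # Moving average smoothing with a 12-month window.
    ma_trend <- stats::filter(y, rep(1 / 12, 12), sides = 2)

    plot(tm, y, type = "l", xlab = "Month index", ylab = "Log consumption")
    lines(tm, quad_trend,   col = "red")
    lines(tm, spline_trend, col = "blue")
    lines(tm, ma_trend,     col = "darkgreen")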

Trend & Seasonality: Residual Analysis

This slide presents the ACF and PACF plots for the original and log-transformed time series in the column on the left. On the right, I added the plots for the residuals of the two models. We can see that the "trend + quarterly seasonality" residuals still contain seasonality in the ACF and PACF plots. The ACF plots for the "trend + monthly seasonality" residuals show that there is still some seasonality pattern, although most of the sample autocorrelation and partial autocorrelation values are within the confidence bands or close to them. It is possible that there are still some seasonal factors not captured by the monthly seasonality. This might be due to temperature changes and environmental factors that could impact people's electricity consumption in each year.

Summary:

This lesson provided the exploratory analysis of the trend and seasonality for the time series of electricity consumption in the US. By removing seasonality and trend, we can tell what the effect of the trend is, what the effect of the seasons is, and, through the residuals, what effect is not accounted for by either season or trend. We will continue this line of analysis in the next lesson.

2.4.4 Energy Consumption: Forecasting

This lesson continues the analysis of the time series of monthly electricity consumption. Here I will compare the predictions based on two forecasting approaches, one being the ARMA model applied to the residuals after fitting trend and monthly seasonality, and the second being the seasonal ARIMA.

Energy Consumption

Our focus again is on forecasting electricity consumption, as a proxy of energy consumption, over the period of one year, an analysis that could be useful in pricing, load allocation, and management of generation and transmission infrastructure. For example, we have seen from the previous lesson that there has been an increasing non-linear trend in electricity consumption since 1985, indicating a potential need for new infrastructure. We have also identified a monthly seasonality, which can be used in inferring load allocation.

Assessing Stationarity

I am returning now to the analysis of the log-transformed data, displaying here the ACF and PACF plots of the log-transformed time series on the left and of the lag-1 difference of the time series on the right. The slow decay in the ACF is a sign of non-stationarity; in fact, we did identify a trend and seasonality in the data, hence this result is not surprising. But we can see that even with first-order differencing, there is a pattern in the ACF plots, with large values following a seasonal pattern. This suggests that a seasonal difference needs to be applied.
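A short sketch of this check, comparing the ACF and PACF of the log-transformed series with those of its first difference (the maximum lag shown is an arbitrary choice):

    log_y  <- as.numeric(log_elec)
    dlog_y <- diff(log_y)             # lag-1 (first-order) difference

    acf(log_y,   lag.max = 48, main = "ACF: log consumption")
    pacf(log_y,  lag.max = 48, main = "PACF: log consumption")
    acf(dlog_y,  lag.max = 48, main = "ACF: first difference")
    pacf(dlog_y, lag.max = 48, main = "PACF: first difference")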

Now we are ready to model the residual process of the log monthly electricity consumption with ARMA models. This is the code we used before for order selection, applied this time to the 12-lag difference of the log-transformed monthly electricity time series. The selected orders are p = 8 and q = 12, assuming we also apply a first-order difference to account for any other potential trend in the data.

Difference Time Series: ARMA Model (cont'd)

If instead we take d = 0, that is, we do not apply yet another difference, then the selected orders are p = 4 and q = 12. A few observations are to be noted. I will also highlight that I divided the data into training and testing data, where the testing data consist of the monthly observations from 2019, that is, one year of data. I applied the ARIMA modeling, including model selection, on the training data. The testing data will be used to evaluate the prediction accuracy. The model with d = 1 does not perform better than the model with d = 0 when comparing the AICc values. Moreover, the model with d = 0 is less complex, since its AR order is 4, lower than that of the model on the previous slide. For the rest of the analysis, I will use this model.
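A sketch of the setup described here: the 12 months of 2019 are held out as test data, the training series is seasonally differenced at lag 12, and the selected model with d = 0 is fit directly. The variable names are placeholders, and the same AICc selection loop used in the IBM example can be applied to the differenced training series to reproduce the order selection.

    n_all <- length(log_y)
    train <- log_y[1:(n_all - 12)]         # training data: everything before 2019
    test  <- log_y[(n_all - 11):n_all]     # testing data: the 12 months of 2019

    d12_train <- diff(train, lag = 12)     # 12-lag (seasonal) difference

    # Selected model with d = 0: ARMA(4, 12) on the seasonally differenced data.
    fit_d0 <- arima(d12_train, order = c(4, 0, 12))
    fit_d0$aic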

ARMA Model: Residual Analysis

The residual plots for the model in the previous slide are here. The residuals seem to behave like white noise with a normal distribution. The normality is assessed using the quantile-quantile normal plot and the histogram on the bottom.

Time Series: SARIMA Model

Next, I applied a SARIMA, or seasonal ARIMA, model to the log-transformed data. The selected model is an ARMA with orders p = 3, d = 0, q = 12 and a seasonal component with orders sp = 1, sd = 1, sq = 1. Selecting the orders for the SARIMA is time consuming; for that reason, I considered only small orders for the seasonal component of the model. I also note that the 'sarima' command does not converge for all combinations of the ARMA orders. For example, the SARIMA does not converge for an AR order larger than 9, hence I restricted 'porder' here to values from 0 to 9. The code provided on the slide is for sp = 1, sd = 1, and sq = 0. While the code here includes all combinations of p and q, when you run it you will find that it stops for some combinations of (p, q). You will have to exclude those combinations and then run the code again.

SARIMA Model: Residual Analysis
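Assuming the 'sarima' command referenced above is the one from the astsa package, the sketch below fits the selected model on the log-transformed training data. This command also prints residual diagnostics (standardized residuals, their ACF, a normal Q-Q plot, and Ljung-Box p-values), which are the kinds of plots examined in the SARIMA residual analysis.

    library(astsa)

    # ARMA(3, 12) with seasonal orders (1, 1, 1) at period 12 (monthly data).
    sarima_fit <- sarima(train, p = 3, d = 0, q = 12,
                         P = 1, D = 1, Q = 1, S = 12)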