Classical Linear Regression Model:
Assumptions and Diagnostic Tests
Yan Zeng
Version 1.1, last updated on 10/05/2016
Abstract
Summary of statistical tests for the Classical Linear Regression Model (CLRM), based on Brooks [1],
Greene [5] [6], Pedace [8], and Zeileis [10].
Contents
  • 1 The Classical Linear Regression Model (CLRM)
  • 2 Hypothesis Testing: The t-test and The F-test
  • 3 Violation of Assumptions: Multicollinearity
    • 3.1 Detection of multicollinearity
    • 3.2 Consequence of ignoring near multicollinearity
    • 3.3 Dealing with multicollinearity
  • 4 Violation of Assumptions: Heteroscedasticity
    • 4.1 Detection of heteroscedasticity
      • 4.1.1 The Goldfeld-Quandt test
      • 4.1.2 White’s general test
      • 4.1.3 The Breusch-Pagan test
      • 4.1.4 The Park test
    • 4.2 Consequences of using OLS in the presence of heteroscedasticity
    • 4.3 Dealing with heteroscedasticity
      • 4.3.1 The generalised least squares method
      • 4.3.2 Transformation
      • 4.3.3 The White-corrected standard errors
  • 5 Violation of Assumptions: Autocorrelation
    • 5.1 Detection of autocorrelation
      • 5.1.1 Graphical test
      • 5.1.2 The run test (the Geary test)
      • 5.1.3 The Durbin-Watson test
      • 5.1.4 The Breusch-Godfrey test
    • 5.2 Consequences of ignoring autocorrelation if it is present
    • 5.3 Dealing with autocorrelation
      • 5.3.1 The Cochrane-Orcutt procedure
      • 5.3.2 The Newey-West standard errors
      • 5.3.3 Dynamic models
      • 5.3.4 First difference
    • 5.4 Miscellaneous issues
  • 6 Violation of Assumptions: Non-Stochastic Regressors
  • 7 Violation of Assumptions: Non-Normality of the Disturbances
  • 8 Issues of Model Specification
    • 8.1 Omission of an important variable
    • 8.2 Inclusion of an irrelevant variable
    • 8.3 Functional form: Ramsey’s RESET
    • 8.4 Parameter stability / structural stability tests
      • 8.4.1 The Chow test
      • 8.4.2 Predictive failure tests
      • 8.4.3 The Quandt likelihood ratio (QLR) test
      • 8.4.4 Recursive least squares (RLS): CUSUM and CUSUMQ
  • 9 The Generalized Linear Regression Model (GLRM)
    • 9.1 Properties of OLS in the GLRM
    • 9.2 Robust estimation of asymptotic covariance matrices for OLS
      • 9.2.1 HC estimator
      • 9.2.2 HAC estimator
      • 9.2.3 R package for robust covariance estimation of OLS

1 The Classical Linear Regression Model (CLRM)

In the CLRM, the residual sum of squares (RSS) is minimised so that the coefficient estimates will be given by the ordinary least squares (OLS) estimator

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_K \end{pmatrix} = (X'X)^{-1}X'y. \quad (1)$$

In order to calculate the standard errors of the coefficient estimates, the variance of the errors, $\sigma^2$, is estimated by the estimator

$$s^2 = \frac{RSS}{T-K} = \frac{\sum_{t=1}^{T}\hat{\varepsilon}_t^2}{T-K}, \quad (2)$$

where we recall that $K$ is the number of regressors including a constant. In this case, $K$ observations are "lost" as $K$ parameters are estimated, leaving $T-K$ degrees of freedom. Then the parameter variance-covariance matrix is given by

$$\mathrm{Var}(\hat{\beta}) = s^2 (X'X)^{-1}. \quad (3)$$

The coefficient standard errors are simply given by taking the square roots of each of the terms on the leading diagonal. In summary, we have (Brooks [1, pages 91-92])

$$\begin{cases} \hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon \\[4pt] s^2 = \dfrac{\sum_{t=1}^{T}\hat{\varepsilon}_t^2}{T-K} \\[4pt] \mathrm{Var}(\hat{\beta}) = s^2 (X'X)^{-1}. \end{cases}$$

The OLS estimator is the best linear unbiased estimator (BLUE), consistent and asymptotically normally distributed (CAN), and if the disturbances are normally distributed, asymptotically efficient among all CAN estimators.
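To make equations (1)-(3) concrete, here is a minimal NumPy sketch; the function name `ols`, the simulated data, and the true coefficients are illustrative only and are not part of the text above.

```python
import numpy as np

def ols(X, y):
    """OLS estimates, s^2, and coefficient standard errors per equations (1)-(3).

    X : (T, K) regressor matrix including a column of ones; y : (T,) response.
    """
    T, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y             # (1)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (T - K)             # (2): RSS / (T - K)
    cov_beta = s2 * XtX_inv                  # (3)
    se_beta = np.sqrt(np.diag(cov_beta))     # square roots of the leading diagonal
    return beta_hat, s2, se_beta

# Illustrative use with simulated data
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)
print(ols(X, y))
```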

2 Hypothesis Testing: The t-test and The F-test

The t-statistic for hypothesis testing is given by

$$\frac{\hat{\beta}_i - \text{hypothesized value}}{SE(\hat{\beta}_i)} \sim t(T-K),$$

where $SE(\hat{\beta}_i) = \sqrt{\mathrm{Var}(\hat{\beta})_{ii}}$, and is used to test single hypotheses. The F-test is used to test more than one coefficient simultaneously. Under the F-test framework, two regressions are required. The unrestricted regression is the one in which the coefficients are freely determined by the data, and the restricted regression is the one in which the coefficients are restricted, i.e. the restrictions are imposed on some βs. Thus the F-test approach to hypothesis testing is also termed restricted least squares. The F-test statistic for testing multiple hypotheses about the coefficient estimates is given by

$$\frac{RRSS - URSS}{URSS} \times \frac{T-K}{m} \sim F(m, T-K), \quad (5)$$

where URSS is the residual sum of squares from the unrestricted regression, RRSS is the residual sum of squares from the restricted regression, m is the number of restrictions¹, T is the number of observations, and K is the number of regressors in the unrestricted regression. To see why the test centres around a comparison of the residual sums of squares from the restricted and unrestricted regressions, recall that OLS estimation involved choosing the model that minimised the residual sum of squares, with no constraints imposed. Now if, after imposing constraints on the model, a residual sum of squares results that is not much higher than the unconstrained model's residual sum of squares, it would be concluded that the restrictions were supported by the data. On the other hand, if the residual sum of squares increased considerably after the restrictions were imposed, it would be concluded that the restrictions were not supported by the data and therefore that the hypothesis should be rejected. It can further be stated that RRSS ≥ URSS²; only under a particular set of very extreme circumstances will the residual sums of squares for the restricted and unrestricted models be exactly equal. This would be the case when the restriction was already present in the data, so that it is not really a restriction at all. Finally, we note that any hypothesis that could be tested with a t-test could also have been tested using an F-test, since $t^2(T-K) \sim F(1, T-K)$.

¹ Informally, the number of restrictions can be seen as "the number of equality signs under the null hypothesis".
² Recall URSS is the shortest distance from a vector to its projection plane.
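As an illustration of the restricted least squares idea, the sketch below (Python/NumPy, with made-up data and the hypothetical restriction β₂ = β₃ = 0) computes the F-statistic (5) from the residual sums of squares of the restricted and unrestricted regressions.

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(1)
T = 120
x2, x3 = rng.normal(size=(2, T))
y = 0.5 + 0.0 * x2 + 0.0 * x3 + rng.normal(size=T)   # data generated with the restrictions true

X_u = np.column_stack([np.ones(T), x2, x3])   # unrestricted: constant, x2, x3
X_r = np.ones((T, 1))                         # restricted: beta2 = beta3 = 0
K, m = X_u.shape[1], 2                        # m = number of restrictions

URSS, RRSS = rss(X_u, y), rss(X_r, y)
F = (RRSS - URSS) / URSS * (T - K) / m        # equation (5)
print(F, stats.f.sf(F, m, T - K))             # statistic and p-value
```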

3 Violation of Assumptions: Multicollinearity

If the explanatory variables were orthogonal to one another, adding or removing a variable from a regression equation would not cause the values of the coefficients on the other variables to change. Perfect multicollinearity will make it impossible to invert the (X′X) matrix since it would not be of full rank. Technically, the presence of high multicollinearity doesn’t violate any CLRM assumptions. Consequently, OLS estimates can be obtained and are BLUE with high multicollinearity. The larger variances (and standard errors) of the OLS estimators are the main reason to avoid high multicollinearity.

Causes of multicollinearity include:
• You use variables that are lagged values of one another.
• You use variables that share a common time trend component.
• You use variables that capture similar phenomena.

3.1 Detection of multicollinearity

Testing for multicollinearity is surprisingly difficult. A correlation matrix is one simple method, but if the collinear relationship involves more than two variables, multicollinearity can be very difficult to detect this way.

Rule of thumb for identifying multicollinearity. Because high multicollinearity doesn't violate a CLRM assumption and is a sample-specific issue, researchers typically don't use formal statistical tests to detect multicollinearity. Instead, they use two sample measurements as indicators of a potential multicollinearity problem.

• Pairwise correlation coefficients. The sample correlation of two independent variables, $x_k$ and $x_j$, is calculated as
$$r_{kj} = \frac{s_{kj}}{s_k s_j}.$$
As a rule of thumb, correlation coefficients around 0.8 or above may signal a multicollinearity problem. Other evidence you should also check includes insignificant t-statistics, sensitive coefficient estimates, and nonsensical coefficient signs and values. Note that pairwise correlation coefficients only identify a linear relationship between two variables; they do not check for a linear relationship among more than two variables.

• Auxiliary regression and the variance inflation factor (VIF). A VIF for any given independent variable is calculated by
$$VIF_k = \frac{1}{1 - R_k^2},$$
where $R_k^2$ is the R-squared value obtained by regressing the independent variable $x_k$ on all the other independent variables in the model.
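The auxiliary-regression idea translates directly into code. Below is a small NumPy sketch of the VIF computation; the helper name `vif`, the simulated collinear regressors, and the threshold of about 10 mentioned in the comment are illustrative additions rather than claims from the text above.

```python
import numpy as np

def vif(X, k):
    """VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing column k of X
    on the remaining columns (X is assumed to contain a constant column)."""
    xk = X[:, k]
    others = np.delete(X, k, axis=1)
    beta, *_ = np.linalg.lstsq(others, xk, rcond=None)
    resid = xk - others @ beta
    r2 = 1.0 - resid @ resid / np.sum((xk - xk.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Two deliberately near-collinear regressors
rng = np.random.default_rng(2)
T = 100
x2 = rng.normal(size=T)
x3 = x2 + 0.05 * rng.normal(size=T)           # nearly a copy of x2
X = np.column_stack([np.ones(T), x2, x3])
print([vif(X, k) for k in (1, 2)])            # large values (e.g. well above 10) flag a problem
```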

4 Violation of Assumptions: Heteroscedasticity

4.1 Detection of heteroscedasticity

This is the situation where $E[\varepsilon_i^2 \mid X]$ is not a finite constant.

4.1.1 The Goldfeld-Quandt test

The Goldfeld-Quandt test is based on splitting the total sample of length $T$ into two sub-samples of length $T_1$ and $T_2$. The regression model is estimated on each sub-sample and the two residual variances are calculated as
$$s_1^2 = \frac{\hat{\varepsilon}_1'\hat{\varepsilon}_1}{T_1 - K}, \qquad s_2^2 = \frac{\hat{\varepsilon}_2'\hat{\varepsilon}_2}{T_2 - K},$$
respectively. The null hypothesis is that the variances of the disturbances are equal, against a two-sided alternative. The test statistic, denoted GQ, is simply
$$GQ = \frac{s_1^2}{s_2^2}$$
with $s_1^2 > s_2^2$. The test statistic is distributed as $F(T_1 - K, T_2 - K)$ under the null hypothesis, and the null of a constant variance is rejected if the test statistic exceeds the critical value. The GQ test is simple to construct but its conclusions may be contingent upon a particular, and probably arbitrary, choice of where to split the sample. An alternative method that is sometimes used to sharpen the inferences from the test and to increase its power is to omit some of the observations from the centre of the sample so as to introduce a degree of separation between the two sub-samples.
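A possible NumPy sketch of the GQ procedure follows; the split point, the number of dropped central observations, and the simulated heteroscedastic data are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(X, y, split, drop=0):
    """GQ statistic from splitting the sample at `split`, optionally dropping
    `drop` central observations to separate the two sub-samples."""
    K = X.shape[1]
    subsamples = ((X[:split], y[:split]), (X[split + drop:], y[split + drop:]))
    s2, sizes = [], []
    for Xi, yi in subsamples:
        beta, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        e = yi - Xi @ beta
        s2.append(e @ e / (len(yi) - K))
        sizes.append(len(yi))
    # Put the larger residual variance in the numerator so that GQ >= 1
    (sa, Ta), (sb, Tb) = sorted(zip(s2, sizes), reverse=True)
    GQ = sa / sb
    return GQ, stats.f.sf(GQ, Ta - K, Tb - K)

rng = np.random.default_rng(3)
T = 200
x = np.linspace(1, 10, T)
y = 1 + 0.5 * x + x * rng.normal(size=T)      # error variance grows with x
X = np.column_stack([np.ones(T), x])
print(goldfeld_quandt(X, y, split=80, drop=40))
```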

4.1.2 White’s general test

White’s general test for heteroscedasticity is carried out as follows.

(1) Assume that the regression model estimated is of the standard linear form, e.g.
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \varepsilon_t.$$
To test $\mathrm{Var}(\varepsilon_t) = \sigma^2$, estimate the model above, obtaining the residuals $\hat{\varepsilon}_t$.

(2) Run the auxiliary regression
$$\hat{\varepsilon}_t^2 = \alpha_1 + \alpha_2 x_{2t} + \alpha_3 x_{3t} + \alpha_4 x_{2t}^2 + \alpha_5 x_{3t}^2 + \alpha_6 x_{2t}x_{3t} + \nu_t.$$
The squared residuals are the quantity of interest since $\mathrm{Var}(\varepsilon_t) = E[\varepsilon_t^2]$ under the assumption that $E[\varepsilon_t] = 0$. The reason that the auxiliary regression takes this form is that it is desirable to investigate whether the variance of the residuals varies systematically with any known variables relevant to the model. Note also that this regression should include a constant term, even if the original regression did not, as a result of the fact that $\hat{\varepsilon}_t^2$ will always have a non-zero mean.

(3) Given the auxiliary regression, the test can be conducted using two different approaches. (i) First, it is possible to use the F-test framework. This would involve estimating the auxiliary regression as the unrestricted regression and then running a restricted regression of $\hat{\varepsilon}_t^2$ on a constant only. The RSS from each specification would then be used as inputs to the standard F-test formula. (ii) An alternative approach, called the Lagrange Multiplier (LM) test, centres around the value of $R^2$ for the auxiliary regression and does not require the estimation of a second (restricted) regression. If one or more coefficients in the auxiliary regression is statistically significant, the value of $R^2$ for that equation will be relatively high, while if none of the variables is significant, $R^2$ will be relatively low. The LM test thus operates by obtaining $R^2$ from the auxiliary regression and multiplying it by the number of observations, $T$. It can be shown that
$$T R^2 \sim \chi^2(m),$$
where $m$ is the number of regressors in the auxiliary regression (excluding the constant term), equivalent to the number of restrictions that would have to be placed under the F-test approach.

(4) The test is one of the joint null hypothesis that $\alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$. For the LM test, if the $\chi^2$ test statistic from step (3) is greater than the corresponding value from the statistical table, then reject the null hypothesis that the errors are homoscedastic.
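The LM form of the test, $TR^2 \sim \chi^2(m)$, can be sketched in a few lines of NumPy for the two-regressor model above; the function name and any data passed to it are illustrative only.

```python
import numpy as np
from scipy import stats

def white_lm_test(y, x2, x3):
    """LM version of White's test for y = b1 + b2*x2 + b3*x3 + e: regress squared
    OLS residuals on levels, squares, and the cross product, then use T*R^2 ~ chi2(m)."""
    T = len(y)
    X = np.column_stack([np.ones(T), x2, x3])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    Z = np.column_stack([np.ones(T), x2, x3, x2**2, x3**2, x2 * x3])  # auxiliary regressors
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    u = e2 - Z @ gamma
    r2 = 1.0 - u @ u / np.sum((e2 - e2.mean()) ** 2)
    lm = T * r2
    m = Z.shape[1] - 1                          # regressors excluding the constant
    return lm, stats.chi2.sf(lm, m)
```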

4.1.3 The Breusch-Pagan test

The Breusch-Pagan test can be seen as a special case of White’s general test. See [11] for a summary.

4.1.4 The Park test

The Park test assumes that the heteroscedasticity may be proportional to some power of an independent variable $x_k$ in the model:
$$\sigma_{\varepsilon_t}^2 = \sigma_\varepsilon^2 x_{kt}^\alpha.$$
(1) Estimate the model $y_t = \beta_1 + \beta_2 x_{2t} + \cdots + \beta_K x_{Kt} + \varepsilon_t$ using OLS.
(2) Obtain the squared residuals, $\hat{\varepsilon}_t^2$, after estimating your model.
(3) Estimate the model $\ln\hat{\varepsilon}_t^2 = \gamma + \alpha\ln x_{kt} + \nu_t$ using OLS.
(4) Examine the statistical significance of $\alpha$ using the t-statistic $t = \hat{\alpha}/\hat{\sigma}_{\hat{\alpha}}$. If the estimate of $\alpha$ is statistically significant, then you have evidence of heteroscedasticity.

4.2 Consequences of using OLS in the presence of heteroscedasticity

When the errors are heteroscedastic, the OLS estimators will still give unbiased (and also consistent) coefficient estimates, but they are no longer BLUE. The reason is that the error variance, σ^2 , plays no part in the proof that the OLS estimator is consistent and unbiased, but σ^2 does appear in the formulae for the coefficient variances. If OLS is still used in the presence of heteroscedasticity, the standard errors could be wrong and hence any inferences made could be misleading. In general, the OLS standard errors will be too large for the intercept when the errors are heteroscedastic. The effect of heteroscedasticity on the slope standard errors will depend on its form.

4.3 Dealing with heteroscedasticity

4.3.1 The generalised least squares method

The generalised least squares (GLS) method supposes that the error variance is related to a variable $z_t$ by the expression
$$\mathrm{Var}(\varepsilon_t) = \sigma^2 z_t^2.$$
All that would be required to remove the heteroscedasticity would be to divide the regression equation through by $z_t$:
$$\frac{y_t}{z_t} = \beta_1\frac{1}{z_t} + \beta_2\frac{x_{2t}}{z_t} + \beta_3\frac{x_{3t}}{z_t} + \nu_t,$$
where $\nu_t = \varepsilon_t/z_t$ is an error term. GLS can be viewed as OLS applied to transformed data that satisfy the OLS assumptions. GLS is also known as weighted least squares (WLS), since under GLS a weighted sum of the squared residuals is minimised, whereas under OLS it is an unweighted sum. Researchers are typically unsure of the exact cause of the heteroscedasticity, and hence this technique is usually infeasible in practice.
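When a suitable $z_t$ is actually observable, the transformation is only a few lines of code. The sketch below (NumPy, hypothetical helper name) simply divides every variable, including the constant column, by $z_t$ and runs OLS on the scaled data.

```python
import numpy as np

def wls_by_scaling(X, y, z):
    """GLS/WLS for Var(eps_t) = sigma^2 * z_t^2: divide the regression equation
    through by z_t (every column of X and y), then apply OLS to the scaled data."""
    Xs, ys = X / z[:, None], y / z
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta
```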

4.3.2 Transformation

A second “solution” for heteroscedasticity is transforming the variables into logs or reducing by some other measure of “size”. This has the effect of re-scaling the data to “pull in” extreme observations.

5.1.2 The run test (the Geary test)

The run test (the Geary test). You want to use the run test if you’re uncertain about the nature of the autocorrelation. A run is defined as a sequence of positive or negative residuals. The hypothesis of no autocorrelation isn’t sustainable if the residuals have too many or too few runs. The most common version of the test assumes that runs are distributed normally. If the assumption of no autocorrelation is sustainable, with 95% confidence, the number of runs should be between

$$\mu_r \pm 1.96\,\sigma_r,$$

where $\mu_r$ is the expected number of runs and $\sigma_r$ is the standard deviation. These values are calculated by

$$\mu_r = \frac{2T_1T_2}{T_1 + T_2} + 1, \qquad \sigma_r = \sqrt{\frac{2T_1T_2\,(2T_1T_2 - T_1 - T_2)}{(T_1 + T_2)^2\,(T_1 + T_2 - 1)}},$$

where r is the number of observed runs, T 1 is the number of positive residuals, T 2 is the number of negative residuals, and T is the total number of observations. If the number of observed runs is below the expected interval, it’s evidence of positive autocorrelation; if the number of runs exceeds the upper bound of the expected interval, it provides evidence of negative autocorrelation.
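A compact NumPy sketch of the runs calculation is given below; the helper name is made up, and the residuals are assumed to come from a regression that includes a constant.

```python
import numpy as np

def runs_test(resid):
    """Geary runs test on the signs of the residuals. Returns the observed number
    of runs and the 95% interval mu_r +/- 1.96*sigma_r; too few runs suggests
    positive autocorrelation, too many suggests negative autocorrelation."""
    signs = np.sign(resid)
    signs = signs[signs != 0]                      # ignore exact zeros
    r = 1 + int(np.sum(signs[1:] != signs[:-1]))   # a new run starts at each sign change
    T1, T2 = int(np.sum(signs > 0)), int(np.sum(signs < 0))
    mu = 2 * T1 * T2 / (T1 + T2) + 1
    sigma = np.sqrt(2 * T1 * T2 * (2 * T1 * T2 - T1 - T2)
                    / ((T1 + T2) ** 2 * (T1 + T2 - 1)))
    return r, (mu - 1.96 * sigma, mu + 1.96 * sigma)
```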

5.1.3 The Durbin-Watson test

The Durbin-Watson ( DW ) test is a test for first order autocorrelation. One way to motivate the test and to interpret the test statistic would be in the context of a regression of the time-t error on its previous value

$$\varepsilon_t = \rho\varepsilon_{t-1} + \nu_t, \quad (6)$$

where $\nu_t \sim N(0, \sigma_\nu^2)$.³ The DW test statistic has as its null and alternative hypotheses

$$H_0: \rho = 0, \qquad H_1: \rho \neq 0.$$

It is not necessary to run the regression given by (6) since the test statistic can be calculated using quantities that are already available after the first regression has been run:

$$DW = \frac{\sum_{t=2}^{T}(\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=2}^{T}\hat{\varepsilon}_t^2} \approx 2(1 - \hat{\rho}), \quad (7)$$

where $\hat{\rho}$ is the estimated correlation coefficient that would have been obtained from an estimation of (6). The intuition of the DW statistic is that the numerator "compares" the values of the error at times $t-1$ and $t$. If there is positive autocorrelation in the errors, this difference in the numerator will be relatively small, while if there is negative autocorrelation, with the sign of the error changing very frequently, the numerator will be relatively large. No autocorrelation would result in a value for the numerator between small and large.

In order for the DW test to be valid for application, three conditions must be fulfilled: (i) there must be a constant term in the regression; (ii) the regressors must be non-stochastic; (iii) there must be no lags of the dependent variable in the regression.⁴

The DW test does not follow a standard statistical distribution. It has two critical values: an upper critical value $d_U$ and a lower critical value $d_L$. The rejection and non-rejection regions for the DW test are illustrated in Figure 1.

³ More generally, (6) is an AR(1) process in time series analysis.
⁴ If the test were used in the presence of lags of the dependent variable or otherwise stochastic regressors, the test statistic would be biased towards 2, suggesting that in some instances the null hypothesis of no autocorrelation would not be rejected when it should be.

Figure 1: Rejection and non-rejection regions for DW test
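Equation (7) is a one-liner on the estimated residuals; the sketch below is illustrative, and the critical values $d_L$ and $d_U$ still have to be looked up in DW tables.

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic (7): squared first differences of the residuals over the sum of
    squared residuals (summing from t = 2), roughly 2 * (1 - rho_hat)."""
    d = np.diff(resid)
    return d @ d / (resid[1:] @ resid[1:])
```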

5.1.4 The Breusch-Godfrey test

The Breusch-Godfrey test is a more general test for autocorrelation up to the rth order, whereas DW is a test only of whether consecutive errors are related to one another. The model for the errors under the Breusch-Godfrey test is

$$\varepsilon_t = \rho_1\varepsilon_{t-1} + \rho_2\varepsilon_{t-2} + \cdots + \rho_r\varepsilon_{t-r} + \nu_t, \qquad \nu_t \sim N(0, \sigma_\nu^2). \quad (8)$$

The null and alternative hypotheses are:

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_r = 0, \qquad H_1: \rho_1 \neq 0 \text{ or } \rho_2 \neq 0 \text{ or } \cdots \text{ or } \rho_r \neq 0.$$

The Breusch-Godfrey test is carried out as follows:
(1) Estimate the linear regression model using OLS and obtain the residuals, $\hat{\varepsilon}_t$.
(2) Obtain $R^2$ from the auxiliary regression

$$\hat{\varepsilon}_t = \gamma_1 + \sum_{i=2}^{K}\gamma_i x_{it} + \sum_{j=1}^{r}\rho_j\hat{\varepsilon}_{t-j} + \nu_t, \qquad \nu_t \sim N(0, \sigma_\nu^2).$$

(3) Letting $T$ denote the number of observations, the test statistic is given by

$$(T - r)R^2 \sim \chi^2(r).$$

Note that $(T - r)$ pre-multiplies $R^2$ in the test for autocorrelation rather than $T$. This arises because the first $r$ observations will effectively have been lost from the sample in order to obtain the $r$ lags used in the test regression, leaving $(T - r)$ observations from which to estimate the auxiliary regression. One potential difficulty with the Breusch-Godfrey test is in determining an appropriate value of $r$. There is no obvious answer to this, so it is typical to experiment with a range of values, and also to use the frequency of the data to decide. For example, if the data are monthly or quarterly, set $r$ equal to 12 or 4, respectively.
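The steps above can be sketched directly in NumPy; the function name is made up, X is assumed to include a constant column, and the zero-padded lag construction is one of several reasonable choices.

```python
import numpy as np
from scipy import stats

def breusch_godfrey(X, y, r):
    """BG test up to order r: fit the original model, regress the residuals on the
    original regressors plus r of their own lags, and use (T - r) * R^2 ~ chi2(r)."""
    T, K = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    lags = np.column_stack([np.r_[np.zeros(j), e[:-j]] for j in range(1, r + 1)])
    Z, er = np.column_stack([X, lags])[r:], e[r:]   # first r observations are lost
    gamma, *_ = np.linalg.lstsq(Z, er, rcond=None)
    u = er - Z @ gamma
    r2 = 1.0 - u @ u / np.sum((er - er.mean()) ** 2)
    stat = (T - r) * r2
    return stat, stats.chi2.sf(stat, r)
```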

5.2 Consequences of ignoring autocorrelation if it is present

The consequences of ignoring autocorrelation when it is present are similar to those of ignoring heteroscedasticity. The coefficient estimates derived using OLS are still unbiased, but they are inefficient, even at large sample sizes, so that the standard error estimates could be wrong. In the case of positive serial correlation in the residuals, the OLS standard error estimates will be biased downwards relative to the true standard errors. Furthermore, $R^2$ is likely to be inflated relative to its "correct" value if autocorrelation is present but ignored, since residual autocorrelation will lead to an underestimate of the true error variance (for positive autocorrelation). The following example illustrates the statement above. We assume the autocorrelation is represented by a first-order autoregressive process:

$$y_t = \beta_1 + \sum_{i=2}^{K}\beta_i x_{it} + \varepsilon_t, \qquad \varepsilon_t = \rho\varepsilon_{t-1} + \nu_t,$$

where $-1 < \rho < 1$ and $\nu_t$ is a random variable that satisfies the CLRM assumptions; namely $E[\nu_t \mid \varepsilon_{t-1}] = 0$, $\mathrm{Var}(\nu_t \mid \varepsilon_{t-1}) = \sigma_\nu^2$, and $\mathrm{Cov}(\nu_t, \nu_s) = 0$ for all $t \neq s$. By repeated substitution, we obtain

$$\varepsilon_t = \nu_t + \rho\nu_{t-1} + \rho^2\nu_{t-2} + \rho^3\nu_{t-3} + \cdots$$

5.3.2 The Newey-West standard errors

(1) Estimate your original model $y_t = \beta_1 + \sum_{i=2}^{K}\beta_i x_{it} + \varepsilon_t$ and obtain the residuals $\hat{\varepsilon}_t$.
(2) Estimate the auxiliary regression $x_{2t} = \alpha_1 + \sum_{i=3}^{K}\alpha_i x_{it} + r_t$ and retain the residuals $\hat{r}_t$.
(3) Find the intermediate adjustment factor, $\hat{a}_t = \hat{r}_t\hat{\varepsilon}_t$, and decide how much serial correlation (the number of lags, $g$) you're going to allow. A Breusch-Godfrey test can be useful in making this determination, while EViews uses INTEGER[4(T/100)^{2/9}].
(4) Obtain the error variance adjustment factor

$$\hat{v} = \sum_{t=1}^{T}\hat{a}_t^2 + 2\sum_{h=1}^{g}\left[1 - \frac{h}{g+1}\right]\left(\sum_{t=h+1}^{T}\hat{a}_t\hat{a}_{t-h}\right),$$

where $g$ represents the number of lags determined in Step (3).
(5) Calculate the serial correlation robust standard error. For variable $x_2$,

$$se(\hat{\beta}_2)_{HAC} = \left[\frac{se(\hat{\beta}_2)}{\hat{\sigma}_\varepsilon}\right]^2\sqrt{\hat{v}}.$$

(6) Repeat Steps (2) through (5) for independent variables $x_3$ through $x_K$.

5.3.3 Dynamic models

Dynamic models. In practice, assumptions like (9) are likely to be invalid and serial correlation in the errors may arise as a consequence of “misspecified dynamics”. Therefore a dynamic model that allows for the structure of y should be used rather than a residual correction on a static model that only allows for a contemporaneous relationship between the variables.

5.3.4 First difference

First differences. Another potential “remedy” for autocorrelated residuals would be to switch to a model in first differences rather than in levels.

5.4 Miscellaneous issues

Why might lags be required in a regression? Lagged values of the explanatory variables or of the dependent variable (or both) may capture important dynamic structure in the dependent variable. Two possibilities that are relevant in finance are as follows.
• Inertia of the dependent variable. Often a change in the value of one of the explanatory variables will not affect the dependent variable immediately during one time period, but rather with a lag over several time periods. Many variables in economics and finance will change only slowly as a result of pure psychological factors. Delays in response may also arise as a result of technological or institutional factors.
• Over-reactions.

Autocorrelation that would not be remedied by adding lagged variables to the model:
• Omission of relevant variables, which are themselves autocorrelated, will induce the residuals from the estimated model to be serially correlated.
• Autocorrelation owing to unparameterised seasonality.
• "Misspecification" error committed by using an inappropriate functional form.

Problems with adding lagged regressors to "cure" autocorrelation:
• Inclusion of lagged values of the dependent variable violates the assumption that the explanatory variables are non-stochastic. In small samples, inclusion of lags of the dependent variable can lead to biased coefficient estimates, although they are still consistent.
• A model with many lags may have solved a statistical problem (autocorrelated residuals) at the expense of creating an interpretational one. Note that if there is still autocorrelation in the residuals of a model including lags, then the OLS estimators will not even be consistent.

Autocorrelation in cross-sectional data. Autocorrelation in the context of a time series regression is quite intuitive. However, it is also plausible that autocorrelation could be present in certain types of cross-sectional data.

6 Violation of Assumptions: Non-Stochastic Regressors

The OLS estimator is consistent and unbiased in the presence of stochastic regressors, provided that the regressors are not correlated with the error term of the estimated equation. However, if one or more of the explanatory variables is contemporaneously correlated with the disturbance term, the OLS estimator will not even be consistent. This results from the estimator assigning explanatory power to the variables that in reality arises from the correlation between the error term and $y_t$.

7 Violation of Assumptions: Non-Normality of the Disturbances

The Bera-Jarque test statistic is given by

$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right],$$

where $T$ is the sample size, $b_1$ is the coefficient of skewness

$$b_1 = \frac{E[\varepsilon^3]}{(\sigma^2)^{3/2}},$$

and $b_2$ is the coefficient of kurtosis

$$b_2 = \frac{E[\varepsilon^4]}{(\sigma^2)^2}.$$

The test statistic $W$ asymptotically follows a $\chi^2(2)$ under the null hypothesis that the distribution of the series is symmetric and mesokurtic, properties that a standard normal distribution has. $b_1$ and $b_2$ can be estimated using the residuals from the OLS regression, $\hat{\varepsilon}$. The null hypothesis is of normality, and this would be rejected if the residuals from the model were either significantly skewed or leptokurtic/platykurtic (or both).
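A short NumPy sketch of the statistic, with skewness and kurtosis estimated from the OLS residuals, is given below; the helper name is illustrative.

```python
import numpy as np
from scipy import stats

def bera_jarque(resid):
    """Bera-Jarque statistic W = T*(b1^2/6 + (b2 - 3)^2/24) ~ chi2(2) under normality,
    with b1 and b2 estimated from the (demeaned) OLS residuals."""
    T = len(resid)
    e = resid - resid.mean()   # residuals from a regression with a constant already have mean close to zero
    sigma2 = e @ e / T
    b1 = np.mean(e ** 3) / sigma2 ** 1.5
    b2 = np.mean(e ** 4) / sigma2 ** 2
    W = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    return W, stats.chi2.sf(W, 2)
```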

8 Issues of Model Specification

8.1 Omission of an important variable

The consequence would be that the estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables. Even if this condition is satisfied, the estimate of the coefficient on the constant term will be biased, and the standard errors will also be biased (upwards). Further information is offered in Dougherty [3], Chapter 7.

8.2 Inclusion of an irrelevant variable

The consequence of including an irrelevant variable would be that the coefficient estimators would still be consistent and unbiased, but the estimators would be inefficient. This would imply that the standard errors for the coefficients are likely to be inflated. Variables which would otherwise have been marginally significant may no longer be so in the presence of irrelevant variables. In general, it can also be stated that the extent of the loss of efficiency will depend positively on the absolute value of the correlation between the included irrelevant variable and the other explanatory variables. When trying to determine whether to err on the side of including too many or too few variables in a regression model, there is an implicit trade-off between inconsistency and efficiency; many researchers would argue that while in an ideal world, the model will incorporate precisely the correct variables – no more and no less – the former problem is more serious than the latter and therefore in the real world, one should err on the side of incorporating marginally significant variables.

8.4.2 Predictive failure tests

A predictive failure test involves running the regression over one "long" sub-period and then using those coefficient estimates for predicting values of y for the other period. These predictions for y are then implicitly compared with the actual values. The null hypothesis for this test is that the prediction errors for all of the forecasted observations are zero. To calculate the test:

  1. Run the regression for the whole period (the restricted regression) and obtain the RSS.
  2. Run the regression for the "large" sub-period and obtain its RSS (called $RSS_1$). The number of observations for the long estimation sub-period will be denoted by $T_1$.

The test statistic is given by

$$\frac{RSS - RSS_1}{RSS_1} \times \frac{T_1 - K}{T_2},$$

where $T_2$ is the number of observations that the model is attempting to "predict". The test statistic will follow an $F(T_2, T_1 - K)$ distribution.

Forward predictive failure tests are where the last few observations are kept back for forecast testing. Backward predictive failure tests attempt to "back-cast" the first few observations. Both types of test offer further evidence on the stability of the regression relationship over the whole sample period.

8.4.3 The Quandt likelihood ratio (QLR) test

The Chow and predictive failure tests will work satisfactorily if the date of a structural break in a financial time series can be specified. But more often, a researcher will not know the break date in advance. In such circumstances, a modified version of the Chow test, known as the Quandt likelihood ratio (QLR) test , can be used instead. The test works by automatically computing the usual Chow F -test statistic repeatedly with different break dates, then the break date giving the largest F -statistic value is chosen. While the test statistic is of the F -variety, it will follow a non-standard distribution rather than an F - distribution since we are selecting the largest from a number of F -statistics rather than examining a single one. The test is well behaved only when the range of possible break dates is sufficiently far from the end points of the whole sample, so it is usual to “trim” the sample by (typically) 5% at each end.

8.4.4 Recursive least squares (RLS): CUSUM and CUSUMQ

An alternative to the QLR test for use in the situation where a researcher is unsure of the break date is to perform recursive least squares (RLS). The procedure is appropriate only for time-series data or cross-sectional data that have been ordered in some sensible way (e.g., a sample of annual stock returns, ordered by market capitalisation). Recursive estimation simply involves starting with a sub-sample of the data, estimating the regression, then sequentially adding one observation at a time and re-running the regression until the end of the sample is reached. It is to be expected that the parameter estimates produced near the start of the recursive procedure will appear rather unstable, but the key question is whether they then gradually settle down or whether the volatility continues through the whole sample. Seeing the latter would be an indication of parameter instability. It should be evident that RLS in itself is not a statistical test for parameter stability as such, but rather it provides qualitative information which can be plotted and thus gives a very visual impression of how stable the parameters appear to be.

The CUSUM test is based on a normalised (i.e. scaled) version of the cumulative sums of the residuals. Under the null hypothesis of perfect parameter stability, the CUSUM statistic is zero. A set of ±2 standard error bands is usually plotted around zero and any statistic lying outside the bands is taken as evidence of parameter instability.

The CUSUMSQ test is based on a normalised version of the cumulative sums of squared residuals. The scaling is such that under the null hypothesis of parameter stability, the CUSUMSQ statistic will start at zero and end the sample with a value of 1. Again, a set of ±2 standard error bands is usually plotted around zero and any statistic lying outside these is taken as evidence of parameter instability.

For full technical details of CUSUM and CUSUMQ, see Greene [5], Chapter 7.

9 The Generalized Linear Regression Model (GLRM)

The generalized linear regression model is

$$\begin{cases} y = X\beta + \varepsilon \\ E[\varepsilon \mid X] = 0 \\ E[\varepsilon\varepsilon' \mid X] = \sigma^2\Omega = \Sigma, \end{cases}$$

where $\Omega$ is a positive definite matrix. In this model, heteroscedasticity usually arises in volatile high-frequency time-series data and in cross-section data where the scale of the dependent variable and the explanatory power of the model tend to vary across observations. Autocorrelation is usually found in time-series data. Panel data sets, consisting of cross sections observed at several points in time, may exhibit both characteristics.

Convention on notation: Throughout this section, we shall use $n$ in place of $T$ to stand for the number of observations. This is to be consistent with popular textbooks like Greene [5].

9.1 Properties of OLS in the GLRM

Recall under the assumptions of CLRM, the OLS estimator

$$\hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon$$

is the best linear unbiased estimator (BLUE), consistent and asymptotically normally distributed (CAN), and if the disturbances are normally distributed, asymptotically efficient among all CAN estimators. In the GLRM, the OLS estimator remains unbiased, consistent, and asymptotically normally distributed. It will, however, no longer be efficient and the usual inference procedures based on the F and t distributions are no longer appropriate.

Theorem 1 (Finite Sample Properties of $\hat{\beta}$ in the GLRM). If the regressors and disturbances are uncorrelated, then the least squares estimator is unbiased in the generalized linear regression model. With non-stochastic regressors, or conditional on $X$, the sampling variance of the least squares estimator is

$$\mathrm{Var}[\hat{\beta} \mid X] = E[(\hat{\beta}-\beta)(\hat{\beta}-\beta)' \mid X] = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X] = (X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1}$$

$$= \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{\sigma^2}{n}X'\Omega X\right)\left(\frac{X'X}{n}\right)^{-1},$$

where

$$\Phi = \frac{\sigma^2}{n}X'\Omega X = \mathrm{Cov}\left[\frac{1}{\sqrt{n}}X'(y - X\beta)\right]$$

is essentially the covariance matrix of the scores or estimating functions. If the regressors are stochastic, then the unconditional variance is $E_X[\mathrm{Var}[\hat{\beta} \mid X]]$. $\hat{\beta}$ is a linear function of $\varepsilon$. Therefore, if $\varepsilon$ is normally distributed, then

$$\hat{\beta} \mid X \sim N\big(\beta,\ \sigma^2(X'X)^{-1}(X'\Omega X)(X'X)^{-1}\big).$$

If $\mathrm{Var}[\hat{\beta} \mid X]$ converges to zero, then $\hat{\beta}$ is mean square consistent. With well-behaved regressors, $(X'X/n)^{-1}$ will converge to a constant matrix, but $(\sigma^2/n)(X'\Omega X/n)$ need not converge at all.

Theorem 2 (Consistency of OLS in the GLRM). If $Q = \operatorname{plim}(X'X/n)$ and $\operatorname{plim}(X'\Omega X/n)$ are both finite positive definite matrices, then $\hat{\beta}$ is consistent for $\beta$. Under the assumed conditions,

$$\operatorname{plim}\hat{\beta} = \beta.$$

9.2.1 HC estimator

Consider the heteroscedasticity case first. White [9] has shown that under very general conditions, the estimator

$$S_0 = \frac{1}{n}\sum_{i=1}^{n}\hat{\varepsilon}_i^2\,\tilde{x}_i\tilde{x}_i'$$

has $\operatorname{plim} S_0 = \operatorname{plim} Q$. Therefore, the White heteroscedasticity consistent (HC) estimator

$$\mathrm{Est.Asy.Var}[\hat{\beta}] = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}\hat{\varepsilon}_i^2\,\tilde{x}_i\tilde{x}_i'\right)\left(\frac{X'X}{n}\right)^{-1} = n\,(X'X)^{-1}S_0(X'X)^{-1}$$

can be used to estimate the asymptotic covariance matrix of $\hat{\beta}$. This result is extremely important and useful. It implies that without actually specifying the type of heteroscedasticity, we can still make appropriate inferences based on the results of least squares. This implication is especially useful if we are unsure of the precise nature of the heteroscedasticity.
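A minimal NumPy sketch of the White (HC) estimator follows; the helper name is made up, and $\tilde{x}_i$ is taken to be the $i$-th row of the regressor matrix.

```python
import numpy as np

def white_hc_covariance(X, y):
    """White heteroscedasticity-consistent covariance of the OLS estimator:
    n * (X'X)^{-1} S0 (X'X)^{-1}, with S0 = (1/n) * sum_i e_i^2 x_i x_i'."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    S0 = (X * e[:, None] ** 2).T @ X / n        # (1/n) sum_i e_i^2 x_i x_i'
    cov = n * XtX_inv @ S0 @ XtX_inv
    return beta, cov, np.sqrt(np.diag(cov))     # robust standard errors on the diagonal
```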

9.2.2 HAC estimator

In the presence of both heteroscedasticity and autocorrelation, as a natural extension of White’s result, the natural counterpart for estimating

$$Q = \frac{1}{n}\sum_{i,j=1}^{n}\sigma_{ij}\,\tilde{x}_i\tilde{x}_j'$$

would be

$$\hat{Q} = \frac{1}{n}\sum_{i,j=1}^{n}\hat{\varepsilon}_i\hat{\varepsilon}_j\,\tilde{x}_i\tilde{x}_j'.$$

But there are two problems with this estimator. The first is that it is difficult to conclude that $\hat{Q}$ will converge to anything at all, since the matrix is $1/n$ times a sum of $n^2$ terms. We can achieve the convergence of $\hat{Q}$ by assuming that the rows of $X$ are well behaved and that the correlations diminish with increasing separation in time. The second problem is a practical one: $\hat{Q}$ need not be positive definite. Newey and West [7] have devised an estimator, the Newey-West autocorrelation consistent (AC) covariance estimator, that overcomes this difficulty:

$$\hat{Q} = S_0 + \frac{1}{n}\sum_{l=1}^{L}\sum_{t=l+1}^{n} w_l\,\hat{\varepsilon}_t\hat{\varepsilon}_{t-l}\,(\tilde{x}_t\tilde{x}_{t-l}' + \tilde{x}_{t-l}\tilde{x}_t'), \qquad w_l = 1 - \frac{l}{L+1}.$$

It must be determined in advance how large $L$ is to be. In general, there is little theoretical guidance. Current practice specifies $L \approx T^{1/4}$. Unfortunately, the result is not quite as crisp as that for the heteroscedasticity consistent estimator.
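The Newey-West estimator adds Bartlett-weighted autocovariance terms to $S_0$. The NumPy sketch below is illustrative (the helper name and the choice of $L$ are assumptions); in practice one would normally rely on a library implementation, such as the R packages surveyed in Zeileis [10], rather than hand-rolling it.

```python
import numpy as np

def newey_west_covariance(X, y, L):
    """HAC covariance n*(X'X)^{-1} Q_hat (X'X)^{-1}, where Q_hat = S0 plus
    Bartlett-weighted (w_l = 1 - l/(L+1)) lagged cross-product terms."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    Q = (X * e[:, None] ** 2).T @ X / n                       # S0 term
    for l in range(1, L + 1):
        w = 1.0 - l / (L + 1)
        G = (X[l:] * (e[l:] * e[:-l])[:, None]).T @ X[:-l] / n
        Q += w * (G + G.T)                                    # x_t x'_{t-l} + x_{t-l} x'_t
    cov = n * XtX_inv @ Q @ XtX_inv
    return beta, cov, np.sqrt(np.diag(cov))

# L is often chosen near T^(1/4), per the guidance above
```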

9.2.3 R package for robust covariance estimation of OLS

See Zeileis [10] for a survey.

References

[1] Chris Brooks. Introductory Econometrics for Finance, 2nd ed. New York: Cambridge University Press, 2008.

[2] Cochrane, D. and Orcutt, G. H. (1949). "Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms", Journal of the American Statistical Association 44, 32–61.

[3] Christopher Dougherty. Introduction to Econometrics, 3rd ed. Oxford University Press, 2007.

[4] Fabozzi, F. J. and Francis, J. C. (1980). "Heteroscedasticity in the Single Index Model", Journal of Economics and Business 32, 243–248.

[5] William H. Greene. Econometric Analysis, 5th ed. Prentice Hall, 2002.

[6] William H. Greene. Econometric Analysis, 7th ed. Prentice Hall, 2012.

[7] Newey, W. K. and West, K. D. (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix", Econometrica 55, 703–708.

[8] Roberto Pedace. Econometrics For Dummies. Hoboken: John Wiley & Sons, 2013.

[9] White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity", Econometrica 48, 817–838.

[10] Zeileis, A. (2004). "Econometric Computing with HC and HAC Covariance Matrix Estimators", Journal of Statistical Software 11(10).

[11] Zeng, Y. (2016). "Book Summary: Econometrics For Dummies", version 1.0.5. Unpublished manuscript.