Transcript Chapter 7

CHAPTER 7

REGRESSION DIAGNOSTIC IV: MODEL SPECIFICATION ERRORS

Damodar Gujarati

Econometrics by Example

MODEL SPECIFICATION ERRORS

 One of the assumptions of the classical linear regression model (CLRM) is that the model is specified correctly.

 By correct specification we mean one or more of the following:

 1. The model does not exclude any “core” variables.

 2. The model does not include superfluous variables.

 3. The functional form of the model is suitably chosen.

 4. There are no errors of measurement in the regressand and regressors.

 5. Outliers in the data, if any, are taken into account.

 6. The probability distribution of the error term is well specified.

 7. The regressors are nonstochastic.

OMISSION OF RELEVANT VARIABLES

 If we omit a relevant variable because we do not have the data, or because we have not studied the underlying economic theory carefully, or because we have not studied prior research in the area thoroughly, or just due to carelessness, we are underfitting a model.

CONSEQUENCES

     1. If the omitted variables are correlated with the variables included in the model, the coefficients of the estimated model are biased.

 This bias does not disappear as the sample size gets larger (i.e., the estimated coefficients of the misspecified model are also inconsistent).

2. Even if the incorrectly excluded variables are not correlated with the variables included in the model, the intercept of the estimated model is biased.

3. The disturbance variance is incorrectly estimated.

4. The variances of the estimated coefficients of the misspecified model are biased.

5. In consequence, the usual confidence intervals and hypothesis-testing procedures become suspect, leading to misleading conclusions about the statistical significance of the estimated parameters.

 6. Furthermore, forecasts based on the incorrect model and the forecast confidence intervals based on it will be unreliable.
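
A minimal worked sketch of consequences 1 and 2 for the two-regressor case. The symbols $B_1, B_2, B_3$, $A_1, A_2$, $X_2$, $X_3$ and $b_{32}$ are notation assumed here for illustration and do not come from the slides. Suppose the true model contains $X_3$ but the estimated model leaves it out:

```latex
\begin{aligned}
\text{True model:}\quad & Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i \\
\text{Estimated model:}\quad & Y_i = A_1 + A_2 X_{2i} + v_i \\
\text{Slope:}\quad & E(a_2) = B_2 + B_3\, b_{32} \\
\text{Intercept:}\quad & E(a_1) = B_1 + B_3\,(\bar{X}_3 - b_{32}\,\bar{X}_2)
\end{aligned}
```

Here $a_1$ and $a_2$ are the OLS estimators of the misspecified model and $b_{32}$ is the slope from regressing $X_3$ on $X_2$. If $X_2$ and $X_3$ are correlated ($b_{32} \neq 0$), the slope is biased by $B_3 b_{32}$. If they are uncorrelated ($b_{32} = 0$), the slope is unbiased, but the intercept is still off by $B_3 \bar{X}_3$ unless $\bar{X}_3 = 0$.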

F TEST TO COMPARE TWO MODELS

 If the original model is the “restricted” model, and the model with the added (previously omitted) variable, which could also be a squared term or an interaction term, is the “unrestricted” model, we can compare the two using an F test:

$$F = \frac{(R^2_{ur} - R^2_{r})/m}{(1 - R^2_{ur})/(n - k)}$$

where m = number of restrictions (or omitted variables), n = number of observations, and k = number of parameters in the unrestricted model.

 A rejection of the null suggests that the omitted variables belong in the model.
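
As a rough illustration of the formula, the sketch below computes the F statistic from the two $R^2$ values. The function name `f_statistic` and all the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def f_statistic(r2_ur, r2_r, m, n, k):
    """F statistic comparing a restricted and an unrestricted model.

    r2_ur: R-squared of the unrestricted model
    r2_r:  R-squared of the restricted model
    m:     number of restrictions (omitted variables)
    n:     number of observations
    k:     number of parameters in the unrestricted model
    """
    numerator = (r2_ur - r2_r) / m
    denominator = (1.0 - r2_ur) / (n - k)
    return numerator / denominator


# Hypothetical values: adding 2 variables raises R-squared from 0.40 to 0.50.
F = f_statistic(r2_ur=0.50, r2_r=0.40, m=2, n=104, k=4)
print(F)  # approximately 10: (0.10 / 2) / (0.50 / 100)
```

The computed value is then compared with the critical F value for m and n - k degrees of freedom.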

DETECTION OF OMISSION OF VARIABLES

 Ramsey’s Regression Specification Error (RESET) Test

 Lagrange Multiplier (LM) test

RAMSEY’S RESET TEST

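 1. From the (incorrectly) estimated model, we first obtain the estimated, or fitted, values of the dependent variable, $\hat{Y}_i$.

 2. We reestimate the original model including $\hat{Y}_i^2$ and $\hat{Y}_i^3$ (and possibly higher powers of the estimated dependent variable) as additional regressors.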

 3. The initial model is the restricted model and the model in Step 2 is the unrestricted model.

 4. Under the null hypothesis that the restricted (i.e., the original) model is correct, we can use the previously mentioned F test.

 5. If the F test in Step 4 is statistically significant, we can reject the null hypothesis. That is, the restricted model is not appropriate in the present situation. By the same token, if the F statistic is statistically insignificant, we do not reject the original model.

LAGRANGE MULTIPLIER TEST

 1. From the original model, we obtain the estimated residuals, $e_i$.

 2. If in fact the original model is the correct model, then the residuals $e_i$ obtained from this model should not be related to the regressors omitted from that model.

 3. We now regress $e_i$ on the regressors in the original model and the omitted variables from the original model. This is the auxiliary regression.

 4. If the sample size is large, it can be shown that n (the sample size) times the $R^2$ obtained from the auxiliary regression follows the chi-square distribution with df equal to the number of regressors omitted from the original regression.

 5. If the computed chi-square value exceeds the critical chi-square value at the chosen level of significance, or if its p value is sufficiently low, we reject the original (or restricted) regression. That is to say, the original model was misspecified.
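
A minimal sketch of the LM steps, using only the Python standard library. The helper names and the data are hypothetical; the data are built so that one relevant regressor (`x3`) is deliberately left out of the original model. The p value shortcut used here is valid only for 1 df, i.e. one omitted regressor.

```python
import math


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def fitted(X, b):
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - rss / tss


def lm_test(X_orig, omitted, y):
    """Return (n * R^2 of the auxiliary regression, degrees of freedom)."""
    n = len(y)
    # Step 1: residuals from the original model.
    e = [yi - fi for yi, fi in zip(y, fitted(X_orig, ols(X_orig, y)))]
    # Step 3: auxiliary regression of e on original plus omitted regressors.
    X_aux = [row + extra for row, extra in zip(X_orig, omitted)]
    r2_aux = r_squared(e, fitted(X_aux, ols(X_aux, e)))
    # Step 4: n * R^2, with df = number of omitted regressors.
    return n * r2_aux, len(omitted[0])


# Made-up data: y depends on x2 and x3, but the original model omits x3.
idx = range(1, 31)
x2 = [i / 10 for i in idx]
x3 = [float(i % 5) for i in idx]
y = [1 + 2 * a + 1.5 * b + 0.05 * math.sin(7 * i) for i, a, b in zip(idx, x2, x3)]

X_orig = [[1.0, a] for a in x2]
stat, df = lm_test(X_orig, [[b] for b in x3], y)
# Step 5: for 1 df, P(chi-square > s) equals erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, df, p_value)
```

With these data the statistic is well above 3.84, the 5% critical chi-square value for 1 df, so the original model would be rejected as misspecified.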

INCLUSION OF IRRELEVANT OR UNNECESSARY VARIABLES

 Sometimes researchers add variables in the hope that the $R^2$ value of their model will increase, in the mistaken belief that the higher the $R^2$ the better the model. This is called overfitting a model. But if the variables are not economically meaningful and relevant, such a strategy is not recommended.

CONSEQUENCES

 1. The OLS estimators of the “incorrect” or overfitted model are all unbiased and consistent.

 2. The error variance is correctly estimated.

 3. The usual confidence interval and hypothesis testing procedures remain valid.

 4. However, the estimated coefficients of such a model are generally inefficient (their variances will be larger than those of the true model).

MISSPECIFICATION OF THE FUNCTIONAL FORM OF A REGRESSION MODEL

 Sometimes researchers mistakenly do not account for the nonlinear nature of variables in a model. Moreover, some dependent variables (such as wage, which tends to be skewed to the right) are more appropriately entered in natural log form.

COMPARING ON BASIS OF $R^2$

 We can transform the models as follows, as in Chapter 2:

 1. Compute the geometric mean (GM) of the dependent variable, call it $Y^*$.

 2. Divide $Y_i$ by $Y^*$ to obtain: $\tilde{Y}_i = Y_i / Y^*$

 3. Estimate the equation with ln $Y_i$ as the dependent variable using ln $\tilde{Y}_i$ in lieu of ln $Y_i$ (i.e., use ln $\tilde{Y}_i$ as the dependent variable).

 4. Estimate the equation with $Y_i$ as the dependent variable using $\tilde{Y}_i$ as the dependent variable instead of $Y_i$.

 5. Compute the following, putting the larger RSS value in the numerator:

$$\frac{n}{2} \ln\left(\frac{RSS_1}{RSS_2}\right) \sim \chi^2_1$$

 If this is significant, the model with the lower RSS value is better.
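
Steps 1, 2 and 5 involve only simple arithmetic and can be sketched as below. The function names, the wage figures and the two RSS values are hypothetical; in practice the RSS values would come from the two regressions in Steps 3 and 4.

```python
import math


def geometric_mean(values):
    """Step 1: geometric mean of a positive-valued variable."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def scale_by_gm(values):
    """Step 2: divide each observation by the geometric mean."""
    gm = geometric_mean(values)
    return [v / gm for v in values]


def rss_comparison_stat(rss_a, rss_b, n):
    """Step 5: (n/2) * ln(larger RSS / smaller RSS), chi-square with 1 df."""
    larger, smaller = max(rss_a, rss_b), min(rss_a, rss_b)
    return (n / 2) * math.log(larger / smaller)


# Hypothetical wage data, used only to show the transformation.
wages = [2.0, 8.0, 4.0]
gm = geometric_mean(wages)
scaled = scale_by_gm(wages)
print(gm, scaled)

# Hypothetical RSS values from the two regressions on the scaled variable.
stat = rss_comparison_stat(rss_a=10.0, rss_b=20.0, n=100)
p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
print(stat, p_value)
```

After scaling, the geometric mean of the transformed variable is 1, which is what puts the RSS values of the linear and log models on a comparable footing.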

ERRORS OF MEASUREMENT

 One of the assumptions of CLRM is that the model used in the analysis is correctly specified.

 Although not explicitly spelled out, this presumes that the values of the regressand as well as regressors are accurate. That is, they are not guess estimates, extrapolated, interpolated or rounded off in any systematic manner or recorded with errors.

CONSEQUENCES

 Consequences for Errors of Measurement in the Regressand:

 1. The OLS estimators are still unbiased.

 2. The variances and standard errors of OLS estimators are still unbiased.

 3. But the estimated variances, and ipso facto the standard errors, are larger than in the absence of such errors.

In short, errors of measurement in the regressand do not pose a very serious threat to OLS estimation.

CONSEQUENCES

 Consequences for Errors of Measurement in the Regressor:

 1. OLS estimators are biased as well as inconsistent.

 2. Errors in a single regressor can lead to biased and inconsistent estimates of the coefficients of the other regressors in the model.

 It is not easy to establish the size and direction of bias in the estimated coefficients.

 It is often suggested that we use instrumental or proxy variables for variables suspected of having measurement errors.

 The proxy variables must satisfy two requirements: they are highly correlated with the variables for which they are a proxy, and they are uncorrelated with the usual equation error as well as the measurement error.

 But such proxies are not easy to find.

 We should thus be very careful in collecting the data and making sure that some obvious errors are eliminated.
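
For intuition on why measurement error in a regressor causes inconsistency, here is a minimal sketch of the simplest special case: a single regressor measured with purely random ("classical") error. The symbols $X_i^*$, $w_i$, $\sigma^2_{X^*}$ and $\sigma^2_w$ are notation assumed here, not taken from the slides, and the result relies on $w_i$ being uncorrelated with both $X_i^*$ and $u_i$.

```latex
\begin{aligned}
& Y_i = B_1 + B_2 X_i^{*} + u_i, \qquad X_i = X_i^{*} + w_i \\
& \operatorname{plim}\, b_2 = B_2 \cdot \frac{\sigma^2_{X^*}}{\sigma^2_{X^*} + \sigma^2_{w}}
\end{aligned}
```

Here $X_i^*$ is the true value, $X_i$ the observed value, and $b_2$ the OLS slope from regressing $Y_i$ on $X_i$. Since the ratio is less than 1 whenever $\sigma^2_w > 0$, the estimated slope is pulled toward zero even in large samples. With several regressors or non-random errors no such simple result holds, which is why the size and direction of the bias are hard to establish in general.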

OUTLIERS, LEVERAGE, AND INFLUENCE DATA

 OLS gives equal weight to every observation in the sample.

 This may create problems if we have observations that may not be “typical” of the rest of the sample.

 Such observations, or data points, are known as outliers, leverage or influence points.

OUTLIERS, LEVERAGE, AND INFLUENCE DATA

Outliers: In the context of regression analysis, an outlier is an observation with a large residual ($e_i$), large in comparison with the residuals of the rest of the observations.

Leverage: An observation is said to exert (high) leverage if it is disproportionately distant from the bulk of the sample observations. In this case such observation(s) can pull the regression line towards itself, which may distort the slope of the regression line.

Influential point: If a levered observation in fact pulls the regression line toward itself, it is called an influential point. The removal of such data point(s) from the sample can dramatically change the slope of the estimated regression line.
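
The distinction between leverage and influence can be illustrated with a small made-up data set. In the sketch below the function names and the data are hypothetical; leverage is measured by the standard hat value for a one-regressor model, $h_i = 1/n + (X_i - \bar{X})^2 / \sum (X_j - \bar{X})^2$, and influence is gauged simply by refitting without the suspect point.

```python
def simple_ols(x, y):
    """Intercept and slope of a one-regressor OLS fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def hat_values(x):
    """Leverage of each observation in a one-regressor model."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]


# Made-up data: the last point lies far from the bulk of the X values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.0]

h = hat_values(x)
_, slope_all = simple_ols(x, y)             # fit with every observation
_, slope_drop = simple_ols(x[:-1], y[:-1])  # fit without the distant point

print([round(v, 3) for v in h])
print(slope_all, slope_drop)
```

The last observation has by far the largest hat value, and dropping it changes the slope from roughly 0.15 to roughly 1.0, so in this example the high-leverage point is also an influential point.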

PROBABILITY DISTRIBUTION OF THE ERROR TERM

 The classical normal linear regression model (CNLRM), an extension of CLRM, assumes that the error term $u_i$ in the regression model is normally distributed.

 This assumption is critical if the sample size is relatively small, for the commonly used tests of significance, such as t and F, are based on the normality assumption.

JARQUE-BERA (JB) TEST OF NORMALITY

 This is a large sample test.

 The test statistic is as follows:

$$JB = n\left[\frac{S^2}{6} + \frac{(K - 3)^2}{24}\right]$$

where n is the sample size, S = skewness coefficient, K = kurtosis coefficient.

 For a normally distributed variable S = 0 and K = 3. When this is the case, the JB statistic is zero.

 Therefore, the closer the value of JB is to zero, the better the normality assumption holds. Since in practice we do not observe the true error term, we use its proxy, $e_i$. The null hypothesis is the joint hypothesis that S = 0 and K = 3. Jarque and Bera have shown that the statistic follows the chi-square distribution with 2 df (because we are imposing two restrictions, namely, that skewness is zero and kurtosis is 3). If the computed JB statistic exceeds the critical chi-square value, we reject the hypothesis that the error term is normally distributed.
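
A minimal sketch of the JB computation from a list of residuals. The function name and the sample residuals are hypothetical; skewness and kurtosis are computed from the sample moments about the mean, and the p value uses the fact that the chi-square survival function with 2 df is exp(-x/2).

```python
import math


def jarque_bera(residuals):
    """Return (JB, S, K, p_value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Second, third and fourth sample moments about the mean.
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    S = m3 / m2 ** 1.5  # skewness coefficient
    K = m4 / m2 ** 2    # kurtosis coefficient
    jb = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)
    p_value = math.exp(-jb / 2)  # P(chi-square with 2 df > jb)
    return jb, S, K, p_value


# Hypothetical residuals, symmetric around zero (so S = 0).
e = [-2.0, -1.0, 0.0, 1.0, 2.0]
jb, S, K, p_value = jarque_bera(e)
print(jb, S, K, p_value)
```

Five observations are far too few for a large sample test; the tiny sample is used only so that the arithmetic can be checked by hand.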

RANDOM OR STOCHASTIC REGRESSORS

 The CLRM assumes that the regressand is random but the regressors are non-stochastic or fixed; that is, we keep the values of the regressors fixed and draw several random samples of the dependent variable.

 Although the assumption of fixed regressors may be valid in several economic situations, it may not be tenable for all economic data. In such cases, we assume that both Y (the dependent variable) and the Xs (the regressors) are drawn randomly. This is the case of stochastic or random regressors.

THE SIMULTANEITY PROBLEM

 There are many situations where such a unidirectional relationship between Y and the Xs cannot be maintained, since some Xs affect Y but in turn Y also affects one or more Xs.

In other words, there may be a feedback relationship between the Y and X variables.

Simultaneous equation regression models are models that take into account feedback relationships among variables.

Endogenous variables are variables whose values are determined in the model.

Exogenous variables are variables whose values are not determined in the model.

 Sometimes, exogenous variables are called predetermined variables, for their values are determined independently or fixed, such as the tax rates fixed by the government.

The parameters can be estimated using the Method of Indirect Least Squares (ILS) or the Method of Two-Stage Least Squares (2SLS).
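
To see the feedback problem and the idea behind ILS, consider a minimal sketch based on a simple consumption model. The model and the symbols ($C_t$, $Y_t$, $I_t$, $B_1$, $B_2$, $\pi_1$, $\pi_2$) are assumed here for illustration: consumption $C_t$ and income $Y_t$ are endogenous, investment $I_t$ is exogenous and uncorrelated with $u_t$.

```latex
\begin{aligned}
\text{Structural equations:}\quad & C_t = B_1 + B_2 Y_t + u_t, \qquad Y_t = C_t + I_t \\
\text{Reduced form for } Y_t:\quad & Y_t = \frac{B_1}{1 - B_2} + \frac{1}{1 - B_2} I_t + \frac{u_t}{1 - B_2} \\
\text{Reduced form for } C_t:\quad & C_t = \pi_1 + \pi_2 I_t + \frac{u_t}{1 - B_2},
  \qquad \pi_1 = \frac{B_1}{1 - B_2},\quad \pi_2 = \frac{B_2}{1 - B_2} \\
\text{Recovering } B_2:\quad & B_2 = \frac{\pi_2}{1 + \pi_2}
\end{aligned}
```

The reduced form for $Y_t$ shows that income depends on $u_t$, so $Y_t$ is correlated with the error term in the consumption function and OLS applied directly to that equation is biased and inconsistent. ILS instead estimates the reduced-form coefficient $\pi_2$ by OLS (regressing $C_t$ on $I_t$) and then solves back for $B_2$.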
