Variable Misspecification III: Consequences for Diagnostics

Dougherty, Introduction to Econometrics, 5th edition
Chapter 6: Specification of Regression Variables
© Christopher Dougherty, 2016. All rights reserved.
VARIABLE MISSPECIFICATION III: CONSEQUENCES FOR DIAGNOSTICS
. reg LGEARN S EXP HEIGHT

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  3,   496) =   28.68
       Model |  22.5581024     3  7.51936748           Prob > F      =  0.0000
    Residual |  130.041117   496  .262179672           R-squared     =  0.1478
-------------+------------------------------           Adj R-squared =  0.1427
       Total |   152.59922   499   .30581006           Root MSE      =  .51203

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .0933581   .0103172     9.05   0.000     .0730873    .1136289
         EXP |   .0409265   .0096533     4.24   0.000     .0219602    .0598928
      HEIGHT |   .0128517   .0056685     2.27   0.024     .0017146    .0239889
       _cons |   .3008412   .4428508     0.68   0.497    -.5692536    1.170936
------------------------------------------------------------------------------
Here is a regression of the logarithm of hourly earnings on years of schooling, work
experience, and height in inches. The HEIGHT coefficient implies that an extra inch of height
leads to a 1.29% increase in earnings. Can you really believe this?
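Because the dependent variable is the logarithm of earnings, a coefficient b on HEIGHT corresponds to a proportional change in earnings of exp(b) - 1 per inch, which for small b is approximately b itself. The short Python sketch below (Python is used only for illustration; the slides themselves use Stata) checks the 1.29% figure from the reported coefficient.

```python
import math

b_height = 0.0128517  # HEIGHT coefficient from reg LGEARN S EXP HEIGHT

approx_pct = 100 * b_height                 # small-coefficient approximation: 100*b percent
exact_pct = 100 * (math.exp(b_height) - 1)  # exact proportional change for one more inch

print(f"approximate: {approx_pct:.2f}%")   # approximate: 1.29%
print(f"exact:       {exact_pct:.2f}%")    # exact:       1.29%
```

For a coefficient this small the two versions agree to two decimal places; the exact form matters only for larger coefficients.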
Perhaps not, but the t statistic, 2.27, is significant at the 5% level (p = 0.024). What is going on?
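The t statistic and the confidence interval in the table follow directly from the coefficient and its standard error. In this sketch the critical value 1.9647 is an assumption: an approximation to the 97.5th percentile of the t distribution with 496 degrees of freedom, which Stata uses internally but does not print.

```python
# Figures copied from the HEIGHT row of reg LGEARN S EXP HEIGHT.
coef = 0.0128517
se = 0.0056685
# Assumed: approximate 5% two-sided critical value of t with 496 degrees of freedom.
t_crit = 1.9647

t_stat = coef / se                      # t statistic for H0: beta_HEIGHT = 0
lower = coef - t_crit * se              # 95% confidence interval bounds
upper = coef + t_crit * se

print(f"t = {t_stat:.2f}")              # t = 2.27
print(f"95% CI: [{lower:.7f}, {upper:.7f}]")
print("reject H0 at 5%:", abs(t_stat) > t_crit)  # reject H0 at 5%: True
```

The interval matches the Stata output to within rounding of the critical value, and it excludes zero, which is why the test looks "significant".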
. reg LGEARN S EXP HEIGHT MALE

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  4,   495) =   25.04
       Model |  25.6831529     4  6.42078824           Prob > F      =  0.0000
    Residual |  126.916067   495  .256396094           R-squared     =  0.1683
-------------+------------------------------           Adj R-squared =  0.1616
       Total |   152.59922   499   .30581006           Root MSE      =  .50636

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .0974338   .0102693     9.49   0.000     .0772569    .1176107
         EXP |   .0414342   .0095473     4.34   0.000      .022676    .0601924
      HEIGHT |  -.0053599   .0076573    -0.70   0.484    -.0204047    .0096849
        MALE |    .218359   .0625458     3.49   0.001      .095471    .3412471
       _cons |   1.363205   .5332809     2.56   0.011     .3154322    2.410979
------------------------------------------------------------------------------
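The reason is that we have omitted an important variable, MALE. Because men are on average
taller than women, HEIGHT was acting as a proxy for MALE in the first regression. When MALE
is included, the height effect disappears: the HEIGHT coefficient becomes small and negative
(-0.0054), with a t statistic of only -0.70.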
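One way to confirm that MALE matters is an F test of the restriction that its coefficient is zero, computed from the residual sums of squares of the two regressions. When only one variable is added, this F statistic equals the square of that variable's t statistic. The Python sketch below uses only figures printed in the two tables.

```python
# Residual sums of squares copied from the two Stata tables.
rss_restricted = 130.041117    # reg LGEARN S EXP HEIGHT       (MALE omitted)
rss_unrestricted = 126.916067  # reg LGEARN S EXP HEIGHT MALE
df_unrestricted = 495          # 500 observations minus 5 estimated parameters
extra_restrictions = 1         # one added variable, MALE

# F = ((RSS_R - RSS_U) / p) / (RSS_U / (n - k))
f_stat = ((rss_restricted - rss_unrestricted) / extra_restrictions) / (
    rss_unrestricted / df_unrestricted
)

t_male = 0.218359 / 0.0625458  # coefficient / standard error for MALE

print(f"F(1, 495) = {f_stat:.2f}")      # F(1, 495) = 12.19
print(f"t_MALE^2  = {t_male**2:.2f}")   # t_MALE^2  = 12.19
```

An F statistic of about 12.19 is well above the 5% critical value of F(1, 495), roughly 3.86, so the restriction that MALE can be dropped is firmly rejected.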
The point of this example is that model misspecification, whether variable misspecification or
indeed any other kind, will in general invalidate the regression diagnostics (standard errors,
t tests, F tests), and as a consequence the diagnostics may lead you to the wrong conclusions.
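A small Monte Carlo experiment makes the point concrete. The data-generating process below is entirely hypothetical (a 0.2 log-point male premium, males 5 inches taller on average, invented noise levels) and is not estimated from Dougherty's data set; the helper names `ols` and `false_rejection_rate` are likewise illustrative. HEIGHT has no true effect, yet when MALE is omitted the nominal 5% t test rejects H0: beta_HEIGHT = 0 far more often than 5% of the time.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and conventional standard errors (X includes a constant column)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    se = np.sqrt(resid @ resid / (n - k) * np.diag(XtX_inv))
    return b, se

def false_rejection_rate(reps=1000, n=500, seed=0):
    """Share of samples in which the true H0: beta_HEIGHT = 0 is rejected at 5%
    when MALE is wrongly omitted.

    Hypothetical DGP: LGEARN = 2.0 + 0.2*MALE + u, so HEIGHT has no effect,
    but HEIGHT is correlated with MALE.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        male = rng.integers(0, 2, size=n)
        height = 64 + 5 * male + rng.normal(0, 3, size=n)
        lgearn = 2.0 + 0.2 * male + rng.normal(0, 0.5, size=n)
        X = np.column_stack([np.ones(n), height])      # MALE omitted
        b, se = ols(lgearn, X)
        rejections += int(abs(b[1] / se[1]) > 1.96)    # large-sample 5% critical value
    return rejections / reps

print(false_rejection_rate())  # far above the nominal 0.05
```

Under these assumptions the HEIGHT coefficient is biased upward and its t statistic is centred well away from zero, so "significant" results are the norm rather than the 5% exception.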
In the original model, we had two kinds of variable misspecification. We omitted MALE, and
we included the irrelevant variable HEIGHT.
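The direction of the distortion can be seen from the standard omitted variable bias formula. As a simplifying assumption, drop S and EXP so that the true model contains only HEIGHT and MALE, and let b_2 be the coefficient from regressing LGEARN on HEIGHT alone (treating the regressors as nonstochastic):

```latex
\mathrm{LGEARN} = \beta_1 + \beta_2\,\mathrm{HEIGHT} + \beta_3\,\mathrm{MALE} + u

E(b_2) = \beta_2 + \beta_3\,
  \frac{\sum_{i=1}^{n}\left(\mathrm{HEIGHT}_i - \overline{\mathrm{HEIGHT}}\right)\left(\mathrm{MALE}_i - \overline{\mathrm{MALE}}\right)}
       {\sum_{i=1}^{n}\left(\mathrm{HEIGHT}_i - \overline{\mathrm{HEIGHT}}\right)^2}
```

If HEIGHT is irrelevant, beta_2 = 0; with beta_3 > 0 and HEIGHT positively correlated with MALE, the second term is positive, so E(b_2) > 0. In the full regression the same logic applies to HEIGHT and MALE after S and EXP have been partialled out.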
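Including an irrelevant variable is one of the few types of misspecification that does not
invalidate the regression diagnostics: the coefficients remain unbiased and the standard errors
remain valid, although they are generally larger, so the estimates are less efficient. Omitting a
relevant variable, by contrast, certainly does invalidate them. This is why the t statistic in the
original specification misled us.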
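The contrast can be checked with a second Monte Carlo sketch. Again the data-generating process is hypothetical (LGEARN depends on MALE only; HEIGHT is irrelevant but correlated with MALE), and `ols` and `irrelevant_variable_check` are illustrative names. This time MALE is included along with the irrelevant HEIGHT, and the diagnostics behave as advertised: the 5% test on HEIGHT rejects about 5% of the time, and 95% intervals for the MALE coefficient cover the true value about 95% of the time.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and conventional standard errors (X includes a constant column)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    se = np.sqrt(resid @ resid / (n - k) * np.diag(XtX_inv))
    return b, se

def irrelevant_variable_check(reps=1000, n=500, seed=1):
    """Hypothetical DGP: LGEARN = 2.0 + 0.2*MALE + u; HEIGHT is irrelevant.
    Both HEIGHT and MALE are included in the fitted regression.

    Returns (share of rejections of H0: beta_HEIGHT = 0 at 5%,
             share of 95% intervals for beta_MALE containing the true 0.2).
    """
    rng = np.random.default_rng(seed)
    beta_male = 0.2
    rejections = covered = 0
    for _ in range(reps):
        male = rng.integers(0, 2, size=n)
        height = 64 + 5 * male + rng.normal(0, 3, size=n)
        lgearn = 2.0 + beta_male * male + rng.normal(0, 0.5, size=n)
        X = np.column_stack([np.ones(n), height, male])
        b, se = ols(lgearn, X)
        rejections += int(abs(b[1] / se[1]) > 1.96)              # test on irrelevant HEIGHT
        covered += int(abs(b[2] - beta_male) <= 1.96 * se[2])    # CI for MALE covers truth?
    return rejections / reps, covered / reps

print(irrelevant_variable_check())  # both close to the nominal 0.05 and 0.95
```

The price of the irrelevant variable shows up only as somewhat wider intervals for MALE, because HEIGHT and MALE are correlated; the tests themselves stay valid.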
Copyright Christopher Dougherty 2016.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 6.3 of C. Dougherty,
Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
www.oxfordtextbooks.co.uk/orc/dougherty5e/.
Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.
2016.05.04