Instrumental Variables I

Objective
• We are trying to learn the effect of education on income.
• We have Card (1993)’s data on years of schooling, wages, proximity to a four-year college, and various other controls.
• We will obtain OLS and IV estimates of the returns to education and discuss any problems, both in this particular context and in general.
OLS Results
. reg lwage educ exper expersq black smsa smsa66 south reg66*, robust
Linear regression                               Number of obs =       3010
                                                F( 15, 2994)  =      91.31
                                                Prob > F      =     0.0000
                                                R-squared     =     0.2998
                                                Root MSE      =     .37228

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0746933   .0036462    20.48   0.000     .0675439    .0818427
       exper |    .084832   .0067548    12.56   0.000     .0715875    .0980765
     expersq |   -.002287   .0003194    -7.16   0.000    -.0029133   -.0016608
       black |  -.1990123   .0181644   -10.96   0.000    -.2346282   -.1633964
        smsa |   .1363845   .0192172     7.10   0.000     .0987042    .1740648
      smsa66 |   .0262417   .0185908     1.41   0.158    -.0102102    .0626937
       south |   -.147955   .0280346    -5.28   0.000     -.202924    -.092986
      reg661 |  -.1405174   .0451252    -3.11   0.002     -.228997   -.0520378
      reg662 |  -.0441502   .0372945    -1.18   0.237    -.1172756    .0289751
         ...
------------------------------------------------------------------------------
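Because the dependent variable is the log wage, the educ coefficient is measured in log points. Below is a minimal Python sketch that converts it into a percentage wage change. The helper name pct_wage_effect is our own and is for illustration only.

```python
import math


def pct_wage_effect(log_coef: float) -> float:
    """Percentage change in wages implied by a coefficient in a log-wage regression."""
    return 100 * (math.exp(log_coef) - 1)


ols_educ = 0.0746933                 # OLS coefficient on educ, from the table above
approx = 100 * ols_educ              # usual "100 x coefficient" reading
exact = pct_wage_effect(ols_educ)    # exact percentage change
print(f"approx: {approx:.2f}%  exact: {exact:.2f}%")  # approx: 7.47%  exact: 7.76%
```

For a coefficient this small, the two readings are close: each extra year of schooling is associated with roughly 7.5% to 7.8% higher wages.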
Are you surprised? What is the OLS identification assumption? What sources
of bias are likely to be present? In which direction are these sources
likely to bias our estimates?
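One way to see the likely direction of omitted-ability bias is to simulate it. The sketch below uses a made-up data-generating process, not Card's data. In it, unobserved ability raises both schooling and log wages, so OLS absorbs part of the ability effect and overstates the return to schooling. All variable names and parameter values are hypothetical.

```python
import random


def ols_slope(x: list, y: list) -> float:
    """Slope coefficient from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)   # fixed seed so the sketch is reproducible
TRUE_RETURN = 0.07       # hypothetical true return to a year of schooling
n = 50_000

ability = [rng.gauss(0, 1) for _ in range(n)]            # unobserved by the econometrician
educ = [12 + 2 * a + rng.gauss(0, 2) for a in ability]   # more able people get more schooling
lwage = [1.0 + TRUE_RETURN * s + 0.1 * a + rng.gauss(0, 0.3)
         for s, a in zip(educ, ability)]                 # ability also raises wages directly

print(f"OLS slope: {ols_slope(educ, lwage):.3f}  (true return: {TRUE_RETURN})")
```

Under these assumed numbers, the bias is cov(educ, 0.1·ability)/var(educ) = 0.2/8 = 0.025. The OLS slope should therefore land near 0.095 rather than 0.07.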
What do we require for an instrument to be valid?
1. Relevance: cov(z, x) ≠ 0
– Important because if the instrument isn’t correlated with the endogenous variable, then knowing the value of the instrument tells us nothing about the endogenous variable.
– Do we care about the unconditional correlation or the correlation conditional on the other controls? Why?
– Can we test this? How?
2. Exogeneity: cov(z, e) = 0
– Important because we want the instrument to affect y (here, lwage) only through x
– Can we test this? If not, what do we do instead?
– How does this assumption relate to the key OLS identification assumption?
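Consider the simplest case: a single endogenous regressor x, a single instrument z, and no other controls. This case is assumed here only to keep the algebra short. The IV estimator is then a ratio of covariances, and its probability limit shows what each condition buys us:

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + e \\
\operatorname{cov}(z, y) &= \beta_1 \operatorname{cov}(z, x) + \operatorname{cov}(z, e) \\
\hat{\beta}_1^{IV} = \frac{\widehat{\operatorname{cov}}(z, y)}{\widehat{\operatorname{cov}}(z, x)}
  &\;\xrightarrow{p}\; \beta_1 + \frac{\operatorname{cov}(z, e)}{\operatorname{cov}(z, x)} \\
\hat{\beta}_1^{OLS} &\;\xrightarrow{p}\; \beta_1 + \frac{\operatorname{cov}(x, e)}{\operatorname{var}(x)}
\end{aligned}
```

Relevance keeps the denominator away from zero, and exogeneity sets the bias term to zero. The last line is the OLS counterpart: OLS needs cov(x, e) = 0, which is exactly the assumption a valid instrument lets us drop.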
Testing Relevance
How can we test the relevance of an instrument?
1. Calculate cor(x, z)
– Better than nothing, but not ideal. Why?
2. Run the ‘first stage’ regression
– What should we include?
– What do we look at?
– What if we have more than one instrument?
– What if we have more than one endogenous variable?
3. Use the post-estimation commands after estimating our main regression.
We’ll do (2) today.
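A small simulation shows why the correlation conditional on the controls is the one that matters. It assumes a hypothetical binary control w (think ‘urban’) that raises both the chance of living near a college and years of schooling. Once w is held fixed, the instrument has no effect on schooling. Residualizing both variables on w removes the spurious correlation; this is the Frisch–Waugh–Lovell logic behind including the controls in the first-stage regression. All names and numbers are made up.

```python
import random


def residualize(v: list, w: list) -> list:
    """Residuals from a simple regression of v on w (with an intercept)."""
    n = len(v)
    mv, mw = sum(v) / n, sum(w) / n
    slope = (sum((a - mw) * (b - mv) for a, b in zip(w, v))
             / sum((a - mw) ** 2 for a in w))
    return [b - mv - slope * (a - mw) for a, b in zip(w, v)]


def corr(a: list, b: list) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


rng = random.Random(1)
n = 20_000
# Hypothetical control w: urban residence (60% of the sample)
urban = [1.0 if rng.random() < 0.6 else 0.0 for _ in range(n)]
# Hypothetical instrument z: more common in urban areas
near = [1.0 if rng.random() < (0.8 if u else 0.3) else 0.0 for u in urban]
# Schooling x depends on urban status but NOT on near
educ = [12 + 1.5 * u + rng.gauss(0, 2) for u in urban]

raw = corr(near, educ)
partial = corr(residualize(near, urban), residualize(educ, urban))
print(f"corr(z, x):     {raw:.3f}")
print(f"corr(z, x | w): {partial:.3f}")
```

Under these assumed numbers, the unconditional correlation is about 0.17 and looks like evidence of relevance. The partial correlation, however, is essentially zero.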
1st Stage Results
. reg educ nearc4 exper expersq black smsa smsa66 south reg66*, robust
note: reg666 omitted because of collinearity

Linear regression                               Number of obs =       3010
                                                F( 15, 2994)  =     244.92
                                                Prob > F      =     0.0000
                                                R-squared     =     0.4771
                                                Root MSE      =     1.9405

------------------------------------------------------------------------------
             |               Robust
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3198989   .0850763     3.76   0.000      .153085    .4867128
       exper |  -.4125334   .0320751   -12.86   0.000    -.4754249   -.3496418
     expersq |   .0008686   .0017076     0.51   0.611    -.0024795    .0042167
         ...
Where do we look to test the Relevance condition? Is it satisfied?
First-Stage F
A ‘first-stage F-statistic’ in excess of 10 is often used as the threshold for satisfaction of the Relevance condition.
• What do we mean by a first-stage F-statistic?
• Can we see it on the previous slide?
– (We can, but not directly.) In general, you can use Stata’s ‘test’ command.
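With a single excluded instrument, the first-stage F-statistic for that instrument is the square of its t-statistic, so it can be read off the nearc4 row of the first-stage table. In Stata, running ‘test nearc4’ immediately after the first-stage regression reports this F using the robust variance estimate. With several instruments, listing them all in one ‘test’ command gives the joint F. After ‘ivregress’, ‘estat firststage’ reports first-stage statistics directly. The Python sketch below does the arithmetic; the helper name f_from_t is our own.

```python
def f_from_t(coef: float, se: float) -> float:
    """First-stage F for a single excluded instrument: the squared t-statistic."""
    return (coef / se) ** 2


# nearc4 coefficient and robust standard error from the first-stage table above
f_stat = f_from_t(0.3198989, 0.0850763)
print(f"first-stage F = {f_stat:.2f}")  # first-stage F = 14.14
```

An F of about 14.1 clears the rule-of-thumb threshold of 10, though not by a wide margin.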
How plausible is it that nearc4 is exogenous?
IV Results
. ivregress 2sls lwage (educ=nearc4) exper expersq black smsa smsa66 south reg66*, robust
note: reg669 omitted because of collinearity

Instrumental variables (2SLS) regression        Number of obs =       3010
                                                Wald chi2(15) =     840.83
                                                Prob > chi2   =     0.0000
                                                R-squared     =     0.2382
                                                Root MSE      =      .3873

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1315038   .0539995     2.44   0.015     .0256667     .237341
       exper |   .1082711   .0233466     4.64   0.000     .0625127    .1540295
     expersq |  -.0023349   .0003478    -6.71   0.000    -.0030167   -.0016532
       black |  -.1467757   .0523622    -2.80   0.005    -.2494038   -.0441477
        smsa |   .1118083   .0310619     3.60   0.000      .050928    .1726886
      smsa66 |   .0185311   .0205103     0.90   0.366    -.0216684    .0587306
       south |  -.1446715   .0290653    -4.98   0.000    -.2016385   -.0877045
      reg661 |  -.1078142   .0409668    -2.63   0.008    -.1881077   -.0275208
How have the results changed? Are they what you expect? What explanations
could there be for the differences?
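It can help to see what 2SLS does mechanically. The sketch below uses simulated data with a valid binary instrument; the parameters are hypothetical, there are no controls, and it is not Card's sample. It runs the two stages by hand and checks that the result equals the reduced-form coefficient divided by the first-stage coefficient. Standard errors from a hand-run second stage are wrong, because they use residuals built from the fitted x rather than from x itself; ‘ivregress’ computes them correctly.

```python
import random


def simple_reg(x: list, y: list) -> tuple:
    """Return (slope, intercept) from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return b, my - b * mx


rng = random.Random(2)
n = 50_000
TRUE_RETURN = 0.07  # hypothetical

ability = [rng.gauss(0, 1) for _ in range(n)]                # unobserved confounder
z = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]   # instrument, independent of ability
x = [12 + 0.5 * zi + 2 * a + rng.gauss(0, 2) for zi, a in zip(z, ability)]
y = [1.0 + TRUE_RETURN * xi + 0.1 * a + rng.gauss(0, 0.3) for xi, a in zip(x, ability)]

# Stage 1: regress x on z and keep the fitted values
pi1, pi0 = simple_reg(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress y on the fitted values (point estimate only; its SE would be wrong)
beta_2sls, _ = simple_reg(x_hat, y)

# Same number: reduced-form coefficient divided by first-stage coefficient
rf, _ = simple_reg(z, y)
beta_ratio = rf / pi1

beta_ols, _ = simple_reg(x, y)
print(f"OLS: {beta_ols:.3f}  2SLS: {beta_2sls:.3f}  ratio: {beta_ratio:.3f}")
```

The IV estimate is consistent here, but it is much noisier than OLS. In the tables above, the IV standard error on educ (.054) is roughly 15 times the OLS one (.0036).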
Does excluding IQ from the regression break the exogeneity condition?
. reg IQ nearc4
      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F( 1, 2059)   =   12.13
       Model |  2869.62905     1  2869.62905           Prob > F      =  0.0005
    Residual |  487188.423  2059  236.614096           R-squared     =  0.0059
-------------+------------------------------           Adj R-squared =  0.0054
       Total |  490058.052  2060  237.892258           Root MSE      =  15.382

------------------------------------------------------------------------------
          IQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |     2.5962   .7454966     3.48   0.001     1.134195    4.058206
       _cons |   100.6106   .6274557   160.35   0.000     99.38014    101.8412
------------------------------------------------------------------------------
How about now?
. reg IQ nearc4 smsa66 reg662-reg669
      Source |       SS       df       MS              Number of obs =    2061
-------------+------------------------------           F( 10, 2050)  =   13.70
       Model |  30699.1017    10  3069.91017           Prob > F      =  0.0000
    Residual |  459358.951  2050  224.077537           R-squared     =  0.0626
-------------+------------------------------           Adj R-squared =  0.0581
       Total |  490058.052  2060  237.892258           Root MSE      =  14.969

------------------------------------------------------------------------------
          IQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3478974   .8144087     0.43   0.669    -1.249257    1.945052
      smsa66 |   1.089165   .8086998     1.35   0.178    -.4967934    2.675124
      reg662 |   1.099282   1.649748     0.67   0.505    -2.136074    4.334639
      reg663 |  -1.559295   1.622997    -0.96   0.337    -4.742191      1.6236
      reg664 |  -.5425011   1.916258    -0.28   0.777    -4.300517    3.215515
      reg665 |   -8.47546   1.665513    -5.09   0.000    -11.74173   -5.209185
      reg666 |  -7.421172   1.973869    -3.76   0.000    -11.29217   -3.550175
      reg667 |   -8.39441   1.829768    -4.59   0.000    -11.98281   -4.806013
      reg668 |  -2.924975    2.34463    -1.25   0.212     -7.52308     1.67313
      reg669 |  -2.891917   1.797382    -1.61   0.108    -6.416801    .6329674
       _cons |   104.7735   1.624972    64.48   0.000     101.5867    107.9602
------------------------------------------------------------------------------
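The nearc4 coefficient drops once smsa66 and the regional dummies are added. That is what we would expect if both nearc4 and IQ are related to where people grew up. The simulation below assumes exactly that: three hypothetical regions with made-up shares of college proximity and mean IQ, and no link between the two within a region. It reproduces the pattern, giving a clearly positive raw slope that disappears once region dummies are included. Here the dummies are implemented by demeaning within region, which gives the same slope.

```python
import random
from collections import defaultdict


def slope(x: list, y: list) -> float:
    """Slope coefficient from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def demean_by_group(v: list, groups: list) -> list:
    """Subtract each group's mean; regressing demeaned on demeaned = adding group dummies."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, val in zip(groups, v):
        totals[g] += val
        counts[g] += 1
    return [val - totals[g] / counts[g] for g, val in zip(groups, v)]


rng = random.Random(3)
n = 20_000
# Hypothetical regions that differ in college proximity and in average IQ score
share_near = {0: 0.8, 1: 0.6, 2: 0.3}
mean_iq = {0: 104.0, 1: 101.0, 2: 97.0}

region = [rng.randrange(3) for _ in range(n)]
near = [1.0 if rng.random() < share_near[r] else 0.0 for r in region]
iq = [mean_iq[r] + rng.gauss(0, 15) for r in region]   # within a region, IQ ignores near

raw_slope = slope(near, iq)
fe_slope = slope(demean_by_group(near, region), demean_by_group(iq, region))
print(f"IQ on near, no controls:    {raw_slope:.2f}")
print(f"IQ on near, region dummies: {fe_slope:.2f}")
```

Under these assumptions, whether omitting IQ threatens the IV estimate depends on whether nearc4 is related to IQ once the location controls are held fixed, not on the raw correlation.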
Do we believe the IV results?