Lecture 12 Heteroskedasticity


Review
Gauss–Markov DGP
$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
$E(\varepsilon_i) = 0$
$\mathrm{Var}(\varepsilon_i) = \sigma^2$
$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
$X$'s fixed across samples.
Review of Standard Errors (cont.)
• Problem: we do not know $\sigma^2$
• Solution: estimate $\sigma^2$
• We do not observe the ACTUAL error terms, $\varepsilon_i$
• We DO observe the residual, $e_i$
$s^2 = \dfrac{\sum e_i^2}{n - k - 1}$
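For concreteness, here is a minimal sketch of this computation in Python (hypothetical data; the variable names are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical data: n observations, k = 1 explanator plus an intercept.
rng = np.random.default_rng(0)
n, k = 100, 1
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)  # true sigma^2 = 1

# OLS fit via least squares, then residuals e_i.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# Unbiased estimate of sigma^2: divide the sum of squared residuals by n - k - 1.
s2 = (e @ e) / (n - k - 1)
print(s2)  # should be near the true value of 1
```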
Review of Standard Errors (cont.)
• Our formula for Estimated Standard
Errors relied on ALL the Gauss–Markov
DGP assumptions.
• For this lecture, we will focus on the
assumption of homoskedasticity.
• What happens if we relax the assumption that $\mathrm{Var}(\varepsilon_i) = \sigma^2$?
Heteroskedasticity (Chapter 10.1)
• HETEROSKEDASTICITY
– The variance of $\varepsilon_i$ is NOT a constant $\sigma^2$.
– The variance of $\varepsilon_i$ is greater for some observations than for others.
$\mathrm{Var}(\varepsilon_i) = \sigma_i^2$
Heteroskedasticity (cont.)
• For example, consider a regression of housing expenditures on income.
$Rent_i = \beta_0 + \beta_1 Income_i + \varepsilon_i$
• Consumers with low values of income have little scope for varying their rent expenditures. $\mathrm{Var}(\varepsilon_i)$ is low.
• Wealthy consumers can choose to spend a lot of money on rent, or to spend less, depending on tastes. $\mathrm{Var}(\varepsilon_i)$ is high.
Figure 10.1 Rents and Incomes for a
Sample of New Yorkers
OLS and Heteroskedasticity
• What are the implications of
heteroskedasticity for OLS?
• Under the Gauss–Markov assumptions
(including homoskedasticity), OLS was
the Best Linear Unbiased Estimator.
• Under heteroskedasticity, is OLS
still Unbiased?
• Is OLS still Best?
OLS and Heteroskedasticity (cont.)
• A DGP with Heteroskedasticity
$Y_i = \beta_0 + \beta_1 X_{1i} + \ldots + \beta_k X_{ki} + \varepsilon_i$
$E(\varepsilon_i) = 0$
$\mathrm{Var}(\varepsilon_i) = \sigma_i^2$
$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
$X$'s fixed across samples
OLS and Heteroskedasticity (cont.)
• The unbiasedness conditions are the same
as under the Gauss–Markov DGP.
• OLS is still unbiased!
OLS and Heteroskedasticity (cont.)
• To determine whether OLS is “Best” (i.e.
the unbiased linear estimator with the
lowest variance), we need to calculate
the variance of a linear estimator under
heteroskedasticity.
OLS and Heteroskedasticity (cont.)
• The variance of a linear estimator $\sum w_i Y_i$ is $\sum w_i^2 \sigma_i^2$
• OLS chooses the weights that minimize $\sigma^2 \sum w_i^2$, the variance under homoskedasticity, not $\sum w_i^2 \sigma_i^2$
• OLS is no longer efficient!
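Spelling out the variance step as a math block (this uses only the zero-covariance assumption, so it holds whether or not the variances are equal):

```latex
\[
  \mathrm{Var}\Bigl(\sum_i w_i Y_i\Bigr)
    = \sum_i w_i^2\,\mathrm{Var}(\varepsilon_i)
    = \sum_i w_i^2 \sigma_i^2
  \qquad\text{(the cross terms vanish because }
  \mathrm{Cov}(\varepsilon_i,\varepsilon_j)=0 \text{ for } i \neq j\text{).}
\]
```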
OLS and Heteroskedasticity (cont.)
• Under heteroskedasticity, OLS is
unbiased but inefficient.
• OLS does not have the smallest
possible variance, but its variance may
be acceptable. And the estimates are
still unbiased.
• However, we do have one very serious
problem: our estimated standard error
formulas are wrong!
OLS and Heteroskedasticity (cont.)
• Implications of Heteroskedasticity:
– OLS is still unbiased.
– OLS is no longer efficient; some other
linear estimator will have a lower variance.
– Estimated Standard Errors will be incorrect; C.I.’s and hypothesis tests (both t- and F-tests) will be incorrect.
OLS and Heteroskedasticity (cont.)
• Implications of Heteroskedasticity
– OLS is no longer efficient; some other
linear estimator will have a lower variance.
• Can we use a better estimator?
– Estimated Standard Errors will be incorrect; C.I.’s and hypothesis tests (both t- and F-tests) will be incorrect.
• If we keep using OLS, can we calculate
correct e.s.e.’s?
Tests for Heteroskedasticity
• Before we turn to remedies for
heteroskedasticity, let us first consider tests
for the complication.
• There are two types of tests:
1. Tests for continuous changes in variance:
White and Breusch–Pagan tests
2. Tests for discrete (lumpy) changes in variance:
the Goldfeld–Quandt test
The White Test
• The White test for heteroskedasticity has
a basic premise: if disturbances are
homoskedastic, then squared errors are
on average roughly constant.
• Explanators should NOT be able to
predict squared errors, or their proxy,
squared residuals.
• The White test is the most general test for
heteroskedasticity.
The White Test (cont.)
• Five Steps of the White Test:
1. Regress Y against your various
explanators using OLS
2. Compute the OLS residuals, $e_1, \ldots, e_n$
3. Regress $e_i^2$ against a constant, all of the explanators, the squares of the explanators, and all possible interactions between the explanators ($p$ slopes total)
The White Test (cont.)
• Five Steps of the White Test (cont.)
4. Compute $R^2$ from the “auxiliary equation” in step 3
5. Compare $nR^2$ to the critical value from the Chi-squared distribution with $p$ degrees of freedom.
The White Test: Example
(1) Estimate $Wage_i = \beta_0 + \beta_1 ed_i + \beta_2 exp_i + \beta_3 IQ_i + \varepsilon_i$
(2) Calculate $e_i = Wage_i - \hat\beta_0 - \hat\beta_1 ed_i - \hat\beta_2 exp_i - \hat\beta_3 IQ_i$
(3) Regress $e_i^2 = \gamma_0 + \gamma_1 ed_i + \gamma_2 ed_i^2 + \gamma_3 exp_i + \gamma_4 exp_i^2 + \gamma_5 IQ_i + \gamma_6 IQ_i^2 + \gamma_7 ed_i exp_i + \gamma_8 ed_i IQ_i + \gamma_9 exp_i IQ_i + v_i$
(4) Compute $nR^2$ from (3)
(5) Reject homoskedasticity if $nR^2 >$ the Chi-squared critical value with 9 degrees of freedom (16.92, if the significance level is 0.05)
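A sketch of these five steps in code, using hypothetical wage data (statsmodels also ships het_white in statsmodels.stats.diagnostic, which performs the same computation):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

# Hypothetical data standing in for the wage example (not the lecture's dataset).
rng = np.random.default_rng(1)
n = 500
ed = rng.uniform(8, 20, n)
exper = rng.uniform(0, 30, n)
iq = rng.normal(100, 15, n)
wage = 1 + 0.5 * ed + 0.2 * exper + 0.1 * iq + rng.normal(0, 1, n) * ed

# Steps 1-2: OLS regression and residuals.
X = sm.add_constant(np.column_stack([ed, exper, iq]))
e = sm.OLS(wage, X).fit().resid

# Step 3: auxiliary regression on levels, squares, and interactions (p = 9 slopes).
Z = sm.add_constant(np.column_stack([
    ed, exper, iq,
    ed**2, exper**2, iq**2,
    ed * exper, ed * iq, exper * iq,
]))
aux = sm.OLS(e**2, Z).fit()

# Steps 4-5: compare nR^2 to the chi-squared critical value with p = 9 df.
nR2 = n * aux.rsquared
print(nR2, chi2.ppf(0.95, df=9))  # reject homoskedasticity if nR2 > 16.92
```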
The White Test
• The White test is very general, and
provides very explicit directions. The
econometrician has no judgment calls
to make.
• The White test also burns through
degrees of freedom very, very rapidly.
• The White test is appropriate only for
“large” sample sizes.
The Breusch–Pagan Test
• The Breusch–Pagan test is very similar to the White test.
• The White test specifies exactly which explanators to include in the auxiliary equation. Because the test includes cross-terms, the number of slopes (p) increases very quickly.
• In the Breusch–Pagan test, the econometrician selects which explanators to include. Otherwise, the tests are the same.
The Breusch–Pagan Test (cont.)
• In the Breusch–Pagan test, the econometrician selects m explanators to include in the auxiliary equation.
• Which explanators to include is a
judgment call.
• A good judgment call leads to a more
powerful test than the White test.
• A poor judgment call leads to a poor test.
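A minimal sketch using statsmodels' built-in het_breuschpagan; the data and the choice of x1 as the single auxiliary explanator are hypothetical:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical data: the disturbance variance grows with x1.
rng = np.random.default_rng(2)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(0, 1, n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1, n) * x1

X = sm.add_constant(np.column_stack([x1, x2]))
resid = sm.OLS(y, X).fit().resid

# Judgment call: include only x1 (plus a constant) in the auxiliary equation.
Z = sm.add_constant(x1)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, Z)
print(lm_stat, lm_pvalue)  # a small p-value rejects homoskedasticity
```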
The Goldfeld–Quandt Test
• Both the White test and the Breusch–Pagan
test focus on smoothly changing variances
for the disturbances.
• The Goldfeld–Quandt test compares
the variance of error terms across
discrete subgroups.
• Under homoskedasticity, all subgroups
should have the same estimated variances.
The Goldfeld–Quandt Test (cont.)
• The Goldfeld–Quandt test compares
the variance of error terms across
discrete subgroups.
• The econometrician must divide the
data into h discrete subgroups.
The Goldfeld–Quandt Test (cont.)
• If the Goldfeld–Quandt test is
appropriate, it will generally be clear
which subgroups to use.
The Goldfeld–Quandt Test (cont.)
• For example, the econometrician
might ask whether men and women’s
incomes vary similarly around their
predicted means, given education
and experience.
• To conduct a Goldfeld–Quandt test,
divide the data into h = 2 groups, one for
men and one for women.
The Goldfeld–Quandt Test (cont.)
(1) Divide the n observations into h groups, of sizes $n_1, \ldots, n_h$
(2) Choose two groups, say 1 and 2. Test $H_0: \sigma_1^2 = \sigma_2^2$ against $H_a: \sigma_1^2 \neq \sigma_2^2$
(3) Regress Y against the explanators for group 1.
(4) Regress Y against the explanators for group 2.
Goldfeld–Quandt Test (cont.)
(5) Relabel the groups as L and S, such that $\dfrac{SSR_L}{n_L - k} \geq \dfrac{SSR_S}{n_S - k}$
Compute $G = \dfrac{SSR_L / (n_L - k)}{SSR_S / (n_S - k)}$
(6) Compare G to the critical value for an F-statistic with $(n_L - k)$ and $(n_S - k)$ degrees of freedom.
Goldfeld–Quandt Test: An Example
• Do men and women’s incomes vary
similarly about their respective means,
given education and experience?
• That is, do the error terms for an income
equation have different variances for
men and women?
• We have a sample with 3,394 men and
3,146 women.
Goldfeld–Quandt Test:
An Example (cont.)
(1) Divide the n observations into men and women, of sizes $n_m$ and $n_w$.
(2) We have only two groups, so choose both of them. Test $H_0: \sigma_m^2 = \sigma_w^2$ against $H_a: \sigma_m^2 \neq \sigma_w^2$
(3) For the men, regress $\log(income)_i = \beta_0 + \beta_1 ed_i + \beta_2 exp_i + \beta_3 exp_i^2 + \varepsilon_i$
(4) For the women, regress $\log(income)_i = \delta_0 + \delta_1 ed_i + \delta_2 exp_i + \delta_3 exp_i^2 + v_i$
Goldfeld–Quandt Test:
An Example (cont.)
(5) $s_m^2 = \dfrac{SSR_m}{n_m - k} = \dfrac{1736.64}{3394 - 4} = 0.5123$
$s_w^2 = \dfrac{SSR_w}{n_w - k} = \dfrac{1851.52}{3146 - 4} = 0.5893$
Compute $G = \dfrac{0.5893}{0.5123} = 1.15$
(6) Compare G to the critical value for an F-statistic with 3142 and 3390 degrees of freedom, which is approximately 1.06 at the 5% significance level.
Since 1.15 exceeds the critical value, we reject the null hypothesis at the 5% level.
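A sketch of the Goldfeld–Quandt computation on hypothetical grouped data (group sizes, coefficients, and variances are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import f

def group_s2(y, X):
    """Return SSR/(n - k) for one group's OLS regression (k counts all coefficients)."""
    res = sm.OLS(y, X).fit()
    return res.ssr / res.df_resid  # df_resid equals n - k

# Hypothetical men's and women's samples; k = 4 coefficients in each regression.
rng = np.random.default_rng(3)
n_m, n_w = 300, 280
X_m = sm.add_constant(rng.normal(size=(n_m, 3)))
X_w = sm.add_constant(rng.normal(size=(n_w, 3)))
y_m = X_m @ np.array([1.0, 0.1, 0.05, -0.01]) + rng.normal(0, 0.7, n_m)
y_w = X_w @ np.array([1.0, 0.1, 0.05, -0.01]) + rng.normal(0, 0.8, n_w)

s2_m, s2_w = group_s2(y_m, X_m), group_s2(y_w, X_w)

# Label the larger-variance group L and the smaller one S, then form G.
(s2_L, df_L), (s2_S, df_S) = sorted(
    [(s2_m, n_m - 4), (s2_w, n_w - 4)], reverse=True)
G = s2_L / s2_S
print(G, f.ppf(0.95, df_L, df_S))  # reject H0 if G exceeds the F critical value
```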
WHAT TO DO?
1. Sometimes logging the variables can solve the problem. Sometimes not.
2. Use Generalized Least Squares to estimate the model with heteroskedasticity.
Generalized Least Squares
• OLS is unbiased, but not efficient.
• The OLS weights are not optimal.
• Suppose we are estimating a straight line
through the origin: Y   X  
• Under homoskedasticity, observations with
higher X values are relatively less distorted by
the error term.
• OLS places greater weight on observations
with high X values.
Figure 10.2 Homoskedastic Disturbances
More Misleading at Smaller X ’s
Generalized Least Squares
• Suppose observations with higher X values have error terms with much higher variances.
• Under this DGP, observations with high X’s (and high variances of $\varepsilon$) may be more misleading than observations with low X’s (and low variances of $\varepsilon$).
• In general, we want to put more weight on observations with smaller $\sigma_i^2$
Heteroskedasticity with Smaller Disturbances at Smaller X’s
Generalized Least Squares
• To construct the BLUE estimator for $\beta_S$, we follow the same steps as before, but with our new variance formula. The resulting estimator is “Generalized Least Squares.”
Start with a linear estimator, $\sum w_i Y_i$
Impose the unbiasedness conditions, $\sum w_i X_{ri} = 0$ for $r \neq S$, and $\sum w_i X_{Si} = 1$
Find $w_i$ to minimize $\sum w_i^2 \sigma_i^2$
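For the straight-line-through-the-origin case, $Y_i = \beta X_i + \varepsilon_i$, carrying out this minimization explicitly is a short Lagrangian exercise (not shown on the slides):

```latex
% Minimize variance subject to unbiasedness:
%   min_w  \sum_i w_i^2 \sigma_i^2   subject to   \sum_i w_i X_i = 1
% The first-order condition 2 w_i \sigma_i^2 = \lambda X_i makes
% w_i proportional to X_i / \sigma_i^2; the constraint pins down the scale:
\[
  w_i = \frac{X_i/\sigma_i^2}{\sum_j X_j^2/\sigma_j^2},
  \qquad
  \hat{\beta}_{GLS} = \sum_i w_i Y_i
    = \frac{\sum_i X_i Y_i/\sigma_i^2}{\sum_j X_j^2/\sigma_j^2}.
\]
```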
Generalized Least Squares (cont.)
• In practice, econometricians choose a
different method for implementing GLS.
• Historically, it was computationally
difficult to program a new estimator
(with its own weights) for every
different dataset.
• It was easier to re-weight the data first,
and THEN apply the OLS estimator.
Generalized Least Squares (cont.)
• We want to transform the data so that
it is homoskedastic. Then we can
apply OLS.
• It is convenient to rewrite the variance term of the heteroskedastic DGP as $\mathrm{Var}(\varepsilon_i) = \sigma^2 d_i^2$
Generalized Least Squares (cont.)
• If we know the di factor for each
observation, we can transform the data
by dividing through by di.
• Once we divide all variables by di, we
obtain a new dataset that meets the
Gauss–Markov conditions.
GLS: DGP for Transformed Data
$\dfrac{Y_i}{d_i} = \beta_0 \dfrac{1}{d_i} + \beta_1 \dfrac{X_i}{d_i} + \dfrac{\varepsilon_i}{d_i}$
$E\!\left(\dfrac{\varepsilon_i}{d_i}\right) = 0$
$\mathrm{Var}\!\left(\dfrac{\varepsilon_i}{d_i}\right) = \dfrac{1}{d_i^2}\mathrm{Var}(\varepsilon_i) = \dfrac{1}{d_i^2}\sigma^2 d_i^2 = \sigma^2$
$\mathrm{Cov}\!\left(\dfrac{\varepsilon_i}{d_i}, \dfrac{\varepsilon_j}{d_j}\right) = \dfrac{1}{d_i d_j}\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$
$\dfrac{X_i}{d_i}$ fixed across samples.
Generalized Least Squares
• This procedure, Generalized Least
Squares, has two steps:
1. Divide all variables by di
2. Apply OLS to the transformed variables
• This procedure optimally down-weights observations with high $d_i$’s
• GLS is unbiased and efficient
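A sketch of the two-step procedure on hypothetical data with a known $d_i$; statsmodels' WLS with weights $1/d_i^2$ is the equivalent one-call version:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical heteroskedastic data with known d_i: Var(eps_i) = sigma^2 * d_i^2.
rng = np.random.default_rng(4)
n = 400
x = rng.uniform(1, 10, n)
d = x                                   # assume the error s.d. scales with x
y = 2 + 3 * x + rng.normal(0, 1, n) * d

# Step 1: divide every variable, including the constant, by d_i.
X = np.column_stack([np.ones(n), x])
X_star, y_star = X / d[:, None], y / d

# Step 2: apply OLS to the transformed variables.
gls = sm.OLS(y_star, X_star).fit()
print(gls.params)

# Equivalent: weighted least squares with weights proportional to 1/d_i^2.
wls = sm.WLS(y, X, weights=1.0 / d**2).fit()
print(wls.params)  # same coefficient estimates as the two-step transformation
```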
Generalized Least Squares (cont.)
• Note: we derive the same BLUE
Estimator (Generalized Least Squares)
whether we:
1. Find the optimal weights for
heteroskedastic data, or
2. Transform the data to be
homoskedastic, then use OLS weights
GLS: An Example
• We can solve heteroskedasticity by
dividing our variables through by di.
• The DGP with the transformed data is
Gauss–Markov.
• The catch: we don’t observe di.
How can we implement this strategy
in practice?
GLS: An Example (cont.)
• We want to estimate the relationship
$rent_i = \beta_0 + \beta_1 income_i + \varepsilon_i$
• We are concerned that higher income individuals are less constrained in how much of their income they spend on rent. Lower income
individuals cram into what housing they can
afford; higher income individuals find housing
to suit their needs/tastes.
• That is, $\mathrm{Var}(\varepsilon_i)$ may vary with income.
GLS: An Example (cont.)
• An initial guess: $\mathrm{Var}(\varepsilon_i) = \sigma^2 \cdot income_i^2$
• $d_i = income_i$
• If we have modeled heteroskedasticity correctly, then the BLUE Estimator is:
$\dfrac{rent_i}{income_i} = \beta_0 \dfrac{1}{income_i} + \beta_1 + v_i$
TABLE 10.1
Rent and Income in New York
TABLE 10.5 Estimating a Transformed Rent–Income Relationship, $\mathrm{var}(\varepsilon_i) = \sigma^2 X_i^2$
Checking Understanding
• An initial guess: $\mathrm{Var}(\varepsilon_i) = \sigma^2 \cdot income_i^2$
• $d_i = income_i$
$\dfrac{rent_i}{income_i} = \beta_0 \dfrac{1}{income_i} + \beta_1 + v_i$
• How can we test to see if we have
correctly modeled the heteroskedasticity?
Checking Understanding
• If we have the correct model of
heteroskedasticity, then OLS with the
transformed data should be homoskedastic.
$\dfrac{rent_i}{income_i} = \beta_0 \dfrac{1}{income_i} + \beta_1 + v_i$
• We can apply either a White test or a
Breusch–Pagan test for heteroskedasticity to
the model with the transformed data.
Checking Understanding (cont.)
• To run the White test, we regress
$e_i^2 = \gamma_0 + \gamma_1 \dfrac{1}{income_i} + \gamma_2 \dfrac{1}{income_i^2} + u_i$
• $nR^2 = 7.17$
• The critical value at the 0.05 significance level
for a Chi-square statistic with 2 degrees of
freedom is 5.99
• We reject the null hypothesis.
GLS: An Example
• Our initial guess: $\mathrm{Var}(\varepsilon_i) = \sigma^2 \cdot income_i^2$
• This guess didn’t do very well. Can we
do better?
• Instead of blindly guessing, let’s try
looking at the data first.
Figure 10.4 The Rent–Income Ratio
Plotted Against the Inverse of Income
GLS: An Example
• We seem to have overcorrected for heteroskedasticity.
• Let’s try $\mathrm{Var}(\varepsilon_i) = \sigma^2 \cdot income_i$
$\dfrac{rent_i}{\sqrt{income_i}} = \beta_0 \dfrac{1}{\sqrt{income_i}} + \beta_1 \sqrt{income_i} + v_i$
TABLE 10.6 Estimating a Second Transformed Rent–Income Relationship, $\mathrm{var}(\varepsilon_i) = \sigma^2 X_i$
GLS: An Example
• Unthinking application of the White test procedures for the transformed data leads to
$e_i^2 = \gamma_0 + \gamma_1 \dfrac{1}{\sqrt{income_i}} + \gamma_2 \dfrac{1}{income_i} + \gamma_3 \sqrt{income_i} + \gamma_4 income_i + \gamma_5 \dfrac{1}{\sqrt{income_i}}\sqrt{income_i} + u_i$
• The interaction term $\frac{1}{\sqrt{income_i}}\sqrt{income_i}$ reduces to a constant, which we already have in the auxiliary equation, so we omit it and use only the first 4 explanators.
GLS: An Example (cont.)
• $nR^2 = 6.16$
• The critical value at the 0.05 significance level
for a Chi-squared statistic with 4 degrees of
freedom is 9.49
• We fail to reject the null hypothesis that the
transformed data are homoskedastic.
• Warning: failing to reject a null hypothesis
does NOT mean we can “accept” it.
GLS: An Example (cont.)
• Generalized Least Squares is not trivial
to apply in practice.
• Figuring out a reasonable di can be
quite difficult.
• Next time we will learn another
approach to constructing di , Feasible
Generalized Least Squares.
Review
• In this lecture, we began relaxing the Gauss–
Markov assumptions, starting with the
assumption of homoskedasticity.
• Under heteroskedasticity, $\mathrm{Var}(\varepsilon_i) = \sigma^2 d_i^2$
– OLS is still unbiased
– OLS is no longer efficient
– OLS e.s.e.’s are incorrect, so C.I., t-, and F-statistics are incorrect
Review (cont.)
• Under heteroskedasticity, $\mathrm{Var}(\hat\beta) = \sigma^2 \sum w_i^2 d_i^2$
• For a straight line through the origin,
$\mathrm{Var}(\hat\beta_{OLS}) = \dfrac{\sigma^2 \sum X_i^2 d_i^2}{\left(\sum X_i^2\right)^2}$
Review (cont.)
• We can use squared residuals to test for
heteroskedasticity.
• In the White test, we regress the squared residuals against all explanators, squares of explanators, and interactions of explanators. The $nR^2$ of the auxiliary equation is distributed Chi-squared.
Review (cont.)
• The Breusch–Pagan test is similar, but the econometrician chooses the explanators for the auxiliary equation.
Review (cont.)
• In the Goldfeld–Quandt test, we first
divide the data into distinct groups, and
conduct our OLS regression on each
group separately.
• We then estimate $s^2$ for each group.
• The ratio of two $s^2$ estimates is distributed as an F-statistic.
Review (cont.)
• Under heteroskedasticity, the BLUE Estimator
is Generalized Least Squares
• To implement GLS:
1. Divide all variables by di
2. Perform OLS on the transformed variables
• If we have used the correct di , the
transformed data are homoskedastic.
We can test this property.