
Econ 140
Heteroskedasticity
Lecture 17
Today’s plan
• How to test for heteroskedasticity: graphs, Park and Glejser tests
• What we can do if we find heteroskedasticity
• How to estimate in the presence of heteroskedasticity
Palm Beach County revisited
• How far is Palm Beach an outlier?
– Can the outlier be explained by heteroskedasticity?
– If so, what are the consequences?
• Heteroskedasticity will affect the variance of the regression
line
– It will consequently affect the variance of the estimated
coefficients and estimated 95 percent confidence
interval for the prediction (see Lecture 10).
• L17.xls provides an example of how to work through a
problem like this using Excel
Palm Beach County revisited (2)
• Palm Beach is a good example to use since there are scale
effects in the data
– The voting pattern shows that voting behavior and the
number of registered voters are related to the
population in each county
• As the county gets larger, voting patterns may diverge
from what would be assumed given the number of
registered voters
– Note from the graph: as we move away from the origin,
the difference between registered Reform voters and
Reform votes cast increases
– We’ll hypothesize that this will result in
heteroskedasticity
Notation
• Heteroskedasticity is observed as cross-section variability
in the data
– data across units at a point in time
• In our notation, heteroskedasticity is:
E(eᵢ²) ≠ σ²
• We can also write:
E(eᵢ²) = σᵢ²
– This means that we expect variable variance: the
variance changes with each unit of observation
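As a concrete illustration, here is a minimal simulation sketch (the linear model and the rule sd(eᵢ) = 0.5·Xᵢ are assumptions for illustration, not the lecture's data):

```python
import random

random.seed(0)

# Illustrative model (assumed): Y_i = a + b*X_i + e_i with sd(e_i) = 0.5*X_i,
# so E(e_i^2) = (0.5*X_i)^2 changes with each unit of observation.
n = 10_000
a, b = 2.0, 1.5
X = [random.uniform(1, 10) for _ in range(n)]
e = [random.gauss(0, 0.5 * x) for x in X]
Y = [a + b * x + err for x, err in zip(X, e)]

# Sample variance of the errors for small vs. large X:
small = [err for x, err in zip(X, e) if x < 3]
large = [err for x, err in zip(X, e) if x > 8]
var_small = sum(v * v for v in small) / len(small)
var_large = sum(v * v for v in large) / len(large)
print(var_small, var_large)
```

The error variance for large-X observations comes out many times the variance for small-X observations, which is exactly what E(eᵢ²) = σᵢ² describes.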
Consequences
When heteroskedasticity is present:
1) OLS estimator is still linear
2) OLS estimator is still unbiased
3) OLS estimator is not efficient - the minimum
variance property no longer holds
4) Estimates of the variances are biased
5) σ̂²YX = Σêᵢ² / (n − k) is not an unbiased estimator of σ²YX
6) We can’t trust the confidence intervals or
hypothesis tests (t-tests & F-tests): we may draw the
wrong conclusions
Consequences (2)
• When BLUE holds and there is homoskedasticity, the first-order condition gives:
V(b̂) = Σcᵢ²σ²  where  cᵢ = xᵢ / Σxᵢ²
• With heteroskedasticity, we have:
V(b̂) = Σcᵢ²σᵢ²
• If we substitute the equation for cᵢ into both equations, we
find:
V(b̂) = σ² / Σxᵢ²  and  V(b̂) = Σxᵢ²σᵢ² / (Σxᵢ²)²
Cases
• With homoskedasticity: the variance around the regression
line is constant at every point
• With heteroskedasticity: the variance around the regression
line varies with each value of the independent variable
(with each i)
Detecting heteroskedasticity
• There are three ways of detecting heteroskedasticity:
1) Graphically
2) Park Test
3) Glejser Test
Graphical detection
• Graph the errors (or error squared) against the independent
variable(s). Note: you can use either e or e2 on the y-axis.
• With homoskedasticity we have E(eᵢXᵢ) = 0:
• The errors are independent of the independent variables
• With heteroskedasticity we can get a variety of patterns
• The errors show a systematic relationship with the
independent variables
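Numerically, the graphical check amounts to fitting OLS and seeing whether the residual spread grows with the independent variable. A minimal sketch on simulated data (the data-generating process is an assumption for illustration):

```python
import random

random.seed(1)

def ols(x, y):
    """Bivariate OLS by hand: returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Simulated heteroskedastic data: sd of the error grows with X
n = 5_000
X = [random.uniform(1, 10) for _ in range(n)]
Y = [50 + 2.5 * x + random.gauss(0, x) for x in X]

_, _, e = ols(X, Y)

# In a scatter of e (or e^2) against X the cloud fans out; numerically,
# the mean |e| for large X dwarfs the mean |e| for small X.
lo = [abs(r) for x, r in zip(X, e) if x < 3]
hi = [abs(r) for x, r in zip(X, e) if x > 8]
mean_lo = sum(lo) / len(lo)
mean_hi = sum(hi) / len(hi)
print(mean_lo, mean_hi)
```

A systematic pattern like this (spread rising with X) is what the graph would show; no pattern suggests homoskedasticity.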
Graphical detection (2)
• Using the Palm Beach example (L17.xls), the estimated
regression equation was:
Ŷ = 50.28 + 2.45X
• The errors of this equation, ê = Y − Ŷ, can be graphed
against the number of registered Reform party voters (the
independent variable)
– The graph shows that the errors increase with the number
of registered Reform voters
• While the graphs may be convincing, we also want to use a
test to confirm this. We have two:
Park Test
• Procedure:
1) Run regression Yi = a + bXi + ei despite the
heteroskedasticity problem (it can also be multivariate)
2) Obtain residuals (ei), square them (ei2), and take their
logs (ln ei2)
3) Run the auxiliary regression: ln eᵢ² = g0 + g1 ln Xᵢ + vᵢ
4) Do a hypothesis test on ĝ1 with H0: g1 = 0
5) Look at the results of the hypothesis test:
• reject the null: you have heteroskedasticity
• fail to reject the null: homoskedasticity, or
ln eᵢ² = g0, which is a constant
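The steps above can be sketched in code (the data-generating process with sd(eᵢ) = Xᵢ is an assumption for illustration; both regressions are computed by hand):

```python
import math
import random

random.seed(2)

def ols(x, y):
    """Bivariate OLS: returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

def slope_t(x, y):
    """t-statistic for H0: slope = 0, using conventional OLS standard errors."""
    n = len(x)
    _, b, resid = ols(x, y)
    s2 = sum(r * r for r in resid) / (n - 2)
    xbar = sum(x) / n
    return b / math.sqrt(s2 / sum((xi - xbar) ** 2 for xi in x))

# Step 1: run the main regression despite the heteroskedasticity problem
n = 500
X = [random.uniform(1, 10) for _ in range(n)]
Y = [3 + 2 * x + random.gauss(0, x) for x in X]
_, _, e = ols(X, Y)

# Steps 2-3: square the residuals, take logs, regress ln(e^2) on ln(X)
ln_e2 = [math.log(r * r) for r in e]
ln_X = [math.log(x) for x in X]
t = slope_t(ln_X, ln_e2)

# Steps 4-5: |t| > 1.96 rejects H0: g1 = 0 at the 5% level
print(abs(t) > 1.96)
```

With sd(eᵢ) = Xᵢ the auxiliary slope is near 2 (since ln σᵢ² = 2 ln Xᵢ), so the test rejects decisively.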
Glejser Test
• When we use the Glejser Test, we’re looking for a scaling effect
• The procedure:
1) Run the regression (it can also be multivariate)
2) Collect ei terms
3) Take the absolute value of the errors
4) Regress |ei| against independent variable(s)
• you can run different kinds of regressions:
|eᵢ| = g0 + g1 Xᵢ + uᵢ
or |eᵢ| = g0 + g1 √Xᵢ + uᵢ
or |eᵢ| = g0 + g1 (1/Xᵢ) + uᵢ
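A sketch of the Glejser procedure on simulated data (the scale effect sd(eᵢ) proportional to Xᵢ is an assumption for illustration):

```python
import math
import random

random.seed(3)

def ols(x, y):
    """Bivariate OLS: returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

def slope_t(x, y):
    """t-statistic for H0: slope = 0, using conventional OLS standard errors."""
    n = len(x)
    _, b, resid = ols(x, y)
    s2 = sum(r * r for r in resid) / (n - 2)
    xbar = sum(x) / n
    return b / math.sqrt(s2 / sum((xi - xbar) ** 2 for xi in x))

# Steps 1-3: run the regression, collect residuals, take absolute values
n = 500
X = [random.uniform(1, 10) for _ in range(n)]
Y = [3 + 2 * x + random.gauss(0, x) for x in X]
_, _, e = ols(X, Y)
abs_e = [abs(r) for r in e]

# Step 4: regress |e| on each candidate transformation of X
t_linear = slope_t(X, abs_e)                        # |e| on X
t_sqrt = slope_t([math.sqrt(x) for x in X], abs_e)  # |e| on sqrt(X)
t_recip = slope_t([1 / x for x in X], abs_e)        # |e| on 1/X
print(t_linear, t_sqrt, t_recip)
```

A significant slope in one of these forms both flags heteroskedasticity and suggests which transformation of the model to use.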
Glejser Test (2)
4) [continued]
• If heteroskedasticity takes one of these forms, this
will suggest an appropriate transformation of the
model
• The null hypothesis is still H0: g1 = 0 since we’re
testing for a relationship between the errors and the
independent variables
• We reach the same conclusions as in the Park Test
A cautionary note
• The errors in the Park Test (vi) and the Glejser Test (ui)
might also be heteroskedastic.
– If this is the case, we cannot trust the hypothesis test
H0: g1 = 0 or the t-test
• If we find heteroskedastic disturbances in the data, what
can we do?
– Estimate the model Yi = a + bXi + ei using weighted
least squares
– We’ll look at two examples of weighted least squares:
one where we know the true variance, and one where
we don’t
Correction with known σᵢ²
• Given that the true variance is known and our model is:
Yi = a + bXi + ei
• Consider the following transformation of the model:
Yᵢ/σᵢ = a(1/σᵢ) + b(Xᵢ/σᵢ) + eᵢ/σᵢ
– In the transformed model, let uᵢ = eᵢ/σᵢ
– So the expected value of the error squared is:
E(uᵢ²) = E(eᵢ²)/σᵢ²
Correction with known σᵢ² (2)
• Given that there is heteroskedasticity, E(eᵢ²) = σᵢ²
– thus:
E(uᵢ²) = σᵢ² · (1/σᵢ²) = 1
• In this simplistic example, we re-weighted the model by the
known σᵢ
• What this example shows: when the variance is known, we
must transform our model to obtain a homoskedastic error
term.
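A sketch of this correction (the model and the rule σᵢ = Xᵢ are assumptions for illustration). With σᵢ = Xᵢ, the transformed regressor Xᵢ/σᵢ is the constant 1, so a regression of Yᵢ/σᵢ on 1/σᵢ with an intercept recovers b as the intercept and a as the slope:

```python
import random

random.seed(4)

def ols(x, y):
    """Bivariate OLS: returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Assumed model: Y = a + b*X + e with known sigma_i = X_i
n = 2_000
a_true, b_true = 3.0, 2.0
X = [random.uniform(1, 10) for _ in range(n)]
Y = [a_true + b_true * x + random.gauss(0, x) for x in X]

# Transformed model: Y_i/sigma_i = a*(1/sigma_i) + b*(X_i/sigma_i) + u_i.
# With sigma_i = X_i this becomes Y_i/X_i = b + a*(1/X_i) + u_i, Var(u_i) = 1.
Yt = [y / x for x, y in zip(X, Y)]
Z = [1 / x for x in X]
b_hat, a_hat, u = ols(Z, Yt)  # intercept estimates b, slope estimates a

print(a_hat, b_hat)
```

The transformed error uᵢ = eᵢ/σᵢ has constant variance 1, so OLS on the transformed model is efficient again; the estimates land close to the assumed a = 3, b = 2.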
Correction with unknown σᵢ²
• Given an unknown variance, we need to state ad-hoc but
plausible assumptions about our variance σᵢ² (how the
errors vary with the independent variable)
• For example: we can assert that E(eᵢ²) = σ²Xᵢ
• Remember: the Glejser Test allows us to choose a relationship
between the errors and the independent variable
Yᵢ/√Xᵢ = a(1/√Xᵢ) + b(Xᵢ/√Xᵢ) + eᵢ/√Xᵢ
Correction with unknown σᵢ² (2)
• In this example you would transform the estimating
equation by dividing through by √Xᵢ to get:
Yᵢ/√Xᵢ = a(1/√Xᵢ) + b(Xᵢ/√Xᵢ) + eᵢ/√Xᵢ
• Letting:
uᵢ = eᵢ/√Xᵢ
– The expected value of this error squared is:
E(uᵢ²) = E(eᵢ²)/Xᵢ
Correction with unknown σᵢ² (3)
• Recalling an earlier assumption, we find:
E(uᵢ²) = E(eᵢ²)/Xᵢ = σ²Xᵢ/Xᵢ = σ²
• When we don’t know the true variance we re-scale the
estimating equation by the independent variable
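A quick check of this re-scaling (σ² = 4 and the uniform Xᵢ are assumptions for illustration): dividing an error with E(eᵢ²) = σ²Xᵢ by √Xᵢ makes its variance constant:

```python
import math
import random

random.seed(5)

# Assumed variance structure: E(e_i^2) = sigma^2 * X_i with sigma^2 = 4
n = 20_000
sigma2 = 4.0
X = [random.uniform(1, 10) for _ in range(n)]
e = [random.gauss(0, math.sqrt(sigma2 * x)) for x in X]

# Transformed error u_i = e_i / sqrt(X_i) should have variance sigma^2
# at every value of X
u = [err / math.sqrt(x) for x, err in zip(X, e)]

lo = [v * v for x, v in zip(X, u) if x < 3]
hi = [v * v for x, v in zip(X, u) if x > 8]
var_lo = sum(lo) / len(lo)
var_hi = sum(hi) / len(hi)
print(var_lo, var_hi)  # both near sigma^2 = 4
```

Both group variances come out near σ² = 4, so the transformed model can be estimated by OLS without the heteroskedasticity problem.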
Returning to Palm Beach
• On L17.xls we have presidential election data by county in
Florida
– To get a correct estimating equation, we can run a
regression without Palm Beach if we think it’s an
outlier.
– Then we can see if we can obtain a prediction for the
number of Reform votes cast in Palm Beach
– We can perform a Glejser Test for the regression
excluding Palm Beach
– We run a regression of the absolute value of the errors
(|eᵢ|) against registered Reform voters (Xᵢ)
Returning to Palm Beach (2)
• The t-test rejects the null
– this indicates the presence of heteroskedasticity
• We can re-scale the model in different ways or introduce a
new independent variable (such as the total number of
registered voters by county)
• Keep transforming the model and running the Glejser Test
– When we fail to reject the null: there is no longer
heteroskedasticity in the model
Robust estimation
• Heteroskedasticity tests are rarely used any more; most
software reports robust standard errors. This is also the
approach of the textbook.
• We have looked at tests for heteroskedasticity to get you
used to weighted least squares, which is important for the
topics to come.
• Robust standard errors approximate the variance of the
estimated coefficient when the error variance is not
constant. The approximation only holds for large samples.
• Recall that for a homoskedastic error term, Var(uᵢ|Xᵢ) = σ²:
Var(b̂) = σ² / Σxᵢ²
Robust estimation (2)
• Using analogous arguments, we can state that for the
heteroskedastic case, Var(uᵢ|Xᵢ) = σᵢ²:
Var(b̂) = Σxᵢ²σᵢ² / (Σxᵢ²)²
• This can be approximated (in the bivariate model case) by:
Var(b̂) = Σxᵢ²ûᵢ² / (Σxᵢ²)²
• See L17_robust.xls and hetero.pdf to compare the results
from calculating the robust standard error on the
spreadsheet using EXCEL and the results from STATA for
robust estimation.
Summary
• Even with re-weighted equations, we might still have
heteroskedastic errors
– so we have to rerun the Glejser Test until we cannot
reject the null
• If we keep rejecting the null, we may have to rethink our
model transformation
– if we suspect a scale effect, we may want to introduce
new scaling variables
• Coefficients from the re-scaled equation are comparable with
the coefficients from the original model