8.1
Lecture #8
Studenmund (2006): Chapter 8

Objectives
• Perfect and imperfect multicollinearity
• Effects of multicollinearity
• Detecting multicollinearity
• Remedies for multicollinearity
All right reserved by Dr.Bill Wan Sing Hung - HKBU
8.2
The Nature of Multicollinearity
Perfect multicollinearity:
When there is an exact functional relationship among the independent
variables, that is,

    \lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + \cdots + \lambda_i X_i = 0   or   \sum_i \lambda_i X_i = 0

such as

    \lambda_1 X_1 + \lambda_2 X_2 = 0  \Rightarrow  X_1 = -(\lambda_2 / \lambda_1) X_2

If multicollinearity is perfect, the regression coefficients of the
X_i variables, \hat{\beta}_i, are indeterminate and their standard
errors, se(\hat{\beta}_i), are infinite.
8.3
Example: 3-variable case

    Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\varepsilon}

    \hat{\beta}_1 = \frac{(\sum y x_1)(\sum x_2^2) - (\sum y x_2)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}

If x_2 = \lambda x_1, then

    \hat{\beta}_1 = \frac{(\sum y x_1)(\lambda^2 \sum x_1^2) - (\lambda \sum y x_1)(\lambda \sum x_1^2)}{(\sum x_1^2)(\lambda^2 \sum x_1^2) - \lambda^2 (\sum x_1^2)^2} = \frac{0}{0}   (indeterminate)

Similarly, if x_2 = \lambda x_1,

    \hat{\beta}_2 = \frac{(\sum y x_2)(\sum x_1^2) - (\sum y x_1)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \frac{(\lambda \sum y x_1)(\sum x_1^2) - (\sum y x_1)(\lambda \sum x_1^2)}{(\sum x_1^2)(\lambda^2 \sum x_1^2) - \lambda^2 (\sum x_1^2)^2} = \frac{0}{0}   (indeterminate)
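The 0/0 result above can also be seen numerically: when one regressor is an exact multiple of another, the normal-equations matrix X'X that OLS must invert is singular. A minimal sketch in Python with NumPy (the data values are made up for illustration):

```python
import numpy as np

# Hypothetical data with exact linear dependence: x2 = 2 * x1,
# so the normal-equations matrix X'X is singular and the OLS
# coefficients are indeterminate.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2.0 * x1
X = np.column_stack([np.ones(5), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # 2, not 3: X'X cannot be inverted
```

Because X'X has rank 2 but dimension 3, no unique solution of the normal equations exists, which is exactly the 0/0 indeterminacy in the formulas above.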
8.4
If multicollinearity is imperfect,

    x_2 = \lambda_1 x_1 + \varepsilon   (or x_2 = \lambda_0 + \lambda_1 x_1 + \varepsilon)

where \varepsilon is a stochastic error, then the regression
coefficients, although determinate, possess large standard errors,
which means the coefficients can be estimated, but with less accuracy.

    \hat{\beta}_1 = \frac{(\sum y x_1)(\lambda_1^2 \sum x_1^2 + \sum \varepsilon^2) - (\lambda_1 \sum y x_1 + \sum y \varepsilon)(\lambda_1 \sum x_1^2 + \sum x_1 \varepsilon)}{(\sum x_1^2)(\lambda_1^2 \sum x_1^2 + \sum \varepsilon^2) - (\lambda_1 \sum x_1^2 + \sum x_1 \varepsilon)^2} \neq \frac{0}{0}   (Why?)
8.5
Example: Production function

    Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i

Y: Output   X1: Capital   X2: Labor   X3: Land

    Y     X1    X2    X3
    122   10    50    52
    170   15    75    75
    202   18    90    97
    270   24   120   129
    330   30   150   152

Note that X2 = 5X1 exactly in this sample: perfect multicollinearity.
All right reserved by Dr.Bill Wan Sing Hung - HKBU
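The exact dependence in this table can be checked directly. A quick sketch using the capital and labor columns from the data above:

```python
import numpy as np

# Capital (X1) and labor (X2) columns from the production table above.
X1 = np.array([10.0, 15.0, 18.0, 24.0, 30.0])
X2 = np.array([50.0, 75.0, 90.0, 120.0, 150.0])

# X2 is exactly 5 * X1, so their sample correlation is exactly 1.
print(np.allclose(X2, 5.0 * X1))    # True
print(np.corrcoef(X1, X2)[0, 1])    # ~1.0
```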
8.6
Examples: Perfect multicollinearity
a. Suppose D1, D2, D3, and D4 = 1 for spring, summer, autumn, and
   winter, respectively:
       Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 D_{3i} + \beta_4 D_{4i} + \beta_5 X_{1i} + \varepsilon_i
   (The four dummies sum to the constant term: the dummy-variable trap.)
b. Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i
   X1: nominal interest rate;  X2: real interest rate;  X3: CPI
c. Y_t = \beta_0 + \beta_1 X_t + \beta_2 \Delta X_t + \beta_3 X_{t-1} + \varepsilon_t
   where \Delta X_t = X_t - X_{t-1} is called the "first difference"
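The dummy-variable trap in example (a) can be sketched numerically: with an intercept included, the four seasonal dummies sum to the constant column, so the design matrix is rank-deficient. The two years of quarterly data below are hypothetical:

```python
import numpy as np

# Dummy-variable trap sketch: 8 hypothetical quarterly observations.
seasons = np.tile(np.arange(4), 2)                    # quarters 0..3, twice
D = (seasons[:, None] == np.arange(4)).astype(float)  # dummies D1..D4
X = np.column_stack([np.ones(8), D])                  # intercept + dummies

print(D.sum(axis=1))              # every row sums to 1 = the constant column
print(np.linalg.matrix_rank(X))   # 4, not 5: perfect multicollinearity
```

Dropping one dummy (or the intercept) restores full rank, which is why standard practice keeps only three seasonal dummies alongside a constant.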
8.7
Imperfect Multicollinearity

    Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_K X_{Ki} + \varepsilon_i

When some independent variables are linearly correlated but the
relation is not exact, there is imperfect multicollinearity:

    \lambda_0 + \lambda_1 X_{1i} + \lambda_2 X_{2i} + \cdots + \lambda_K X_{Ki} + u_i = 0

where u is a random error term and \lambda_k \neq 0 for some k.

When will it be a problem?
8.8
Consequences of imperfect multicollinearity
1. The estimated coefficients are still BLUE; however, the OLS
   estimators have large variances and covariances, making estimation
   less accurate.
2. The estimated confidence intervals tend to be much wider, leading
   to acceptance of the "zero null hypothesis" more readily.
3. The t-statistics of the coefficients tend to be statistically
   insignificant.
4. The R² can be very high.
5. The OLS estimators and their standard errors can be sensitive to
   small changes in the data.
(These consequences can be detected from the regression results.)
8.9
OLS estimators are still BLUE under imperfect multicollinearity. Why?
Remarks:
• Unbiasedness is a repeated-sampling property, not a property of the
  estimators in any given sample.
• Minimum variance does not mean small variance.
• Imperfect multicollinearity is just a sample phenomenon.
8.10
Effects of Imperfect Multicollinearity
Unaffected:
a. OLS estimators are still BLUE.
b. The overall fit of the equation.
c. The estimates of the coefficients of the non-multicollinear
   variables.
8.11
The variances of the OLS estimators increase with the degree of
multicollinearity.
Regression model:

    Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i

High correlation between X1 and X2 makes it difficult to isolate the
effects of X1 and X2 from each other.
8.12
The closer the relation between X1 and X2:
  ⇒ larger r_{12}^2
  ⇒ larger VIF
  ⇒ larger variances

    VIF_k = \frac{1}{1 - R_k^2},   k = 1, \ldots, K

where R_k^2 is the coefficient of determination from regressing X_k on
all the other (K-1) explanatory variables.
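The chain above (larger r_{12}^2 ⇒ larger VIF ⇒ larger variances) can be illustrated with a small Monte Carlo sketch; the model, coefficients, and sample sizes below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_sd(rho, n=100, reps=2000):
    """Empirical std. dev. of the OLS estimate of beta1 when
    corr(X1, X2) = rho (all numbers here are illustrative)."""
    estimates = []
    for _ in range(reps):
        x1 = rng.standard_normal(n)
        # construct x2 with population correlation rho with x1
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        estimates.append(beta[1])
    return float(np.std(estimates))

sd_low, sd_high = slope_sd(0.0), slope_sd(0.95)
print(sd_low, sd_high)   # the second is several times larger
```

With rho = 0.95 the theoretical VIF is 1/(1 - 0.95²) ≈ 10.3, so the standard error of the slope should be roughly √10.3 ≈ 3.2 times larger than in the uncorrelated case, and the simulation reproduces that pattern.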
8.13
Consequences of a larger var(\hat{\beta}_k):
8.14
a. More likely to get unexpected signs.
b. se(\hat{\beta}_k) tends to be large: larger variances tend to
   increase the standard errors of the estimated coefficients.
c. Larger standard errors ⇒ lower t-values:

    t_k = \frac{\hat{\beta}_k - \beta_k^*}{se(\hat{\beta}_k)}
8.15
d. Larger standard errors ⇒ wider confidence intervals:

    \hat{\beta}_k \pm t_{df, \alpha/2} \cdot se(\hat{\beta}_k)

   That is, less precise interval estimates.
8.16
Detection of Multicollinearity
Example: Data set CONS8 (pp. 254-255)

    CO_i = \beta_0 + \beta_1 Yd_i + \beta_2 LA_i + \varepsilon_i

CO: annual consumption expenditure
Yd: annual disposable income
LA: liquid assets

Studenmund (2006), Eq. 8.9, p. 254:
Since LA (liquid assets, savings, etc.) is highly related to Yd
(disposable income), the results show a high R² and adjusted R² but
less significant t-values.
8.17
Drop one variable.
8.18
OLS estimates and SEs can be sensitive to the specification and to
small changes in the data.
Specification changes: add or drop variables.
Small changes: add or drop some observations; change some data values.
8.19
High Simple Correlation Coefficients

    r_{ij} = \frac{\sum (X_i - \bar{X}_i)(X_j - \bar{X}_j)}{\sqrt{\sum (X_i - \bar{X}_i)^2 \sum (X_j - \bar{X}_j)^2}}

Remark: a high r_{ij} for any i and j is a sufficient indicator of the
existence of multicollinearity, but not a necessary one.
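The formula above is straightforward to compute directly. A minimal sketch, checked against NumPy's built-in correlation (the two series are hypothetical):

```python
import numpy as np

def simple_corr(xi, xj):
    """Pairwise correlation r_ij exactly as in the formula above:
    sum of cross-products of deviations, over the square root of
    the product of the sums of squared deviations."""
    di = xi - xi.mean()
    dj = xj - xj.mean()
    return float((di * dj).sum() / np.sqrt((di**2).sum() * (dj**2).sum()))

# Illustrative data; the result should match np.corrcoef.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.1, 3.9, 6.2, 7.8])
print(simple_corr(a, b))
```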
8.20
Variance Inflation Factor (VIF) method
Procedure:
(1) Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K + \varepsilon
(2) Run the auxiliary regression of each X_k on the others, e.g.
        X_1 = \alpha_1 + \alpha_2 X_2 + \alpha_3 X_3 + \ldots + \alpha_K X_K + u
    and obtain R_k^2.
(3) VIF(\hat{\beta}_k) = \frac{1}{1 - R_k^2}

Rule of thumb: VIF > 5 ⇒ multicollinearity.
Notes: (a) Using the VIF is not a statistical test.
       (b) The cutoff point is arbitrary.
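The three-step procedure above can be sketched in a few lines: regress each regressor on the others, take the auxiliary R², and invert 1 - R². The data below are simulated for illustration:

```python
import numpy as np

def vif(X, k):
    """VIF for regressor k of X (columns are regressors, no constant):
    regress X[:, k] on the other columns plus a constant, compute the
    auxiliary R_k^2, and return 1 / (1 - R_k^2)."""
    n = X.shape[0]
    others = np.delete(X, k, axis=1)
    A = np.column_stack([np.ones(n), others])
    coef = np.linalg.lstsq(A, X[:, k], rcond=None)[0]
    resid = X[:, k] - A @ coef
    r2 = 1.0 - resid.var() / X[:, k].var()
    return 1.0 / (1.0 - r2)

# Illustrative data: x2 closely tracks x1, so both VIFs exceed 5.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x2 = x1 + 0.3 * rng.standard_normal(200)
X = np.column_stack([x1, x2])
print(vif(X, 0), vif(X, 1))   # both well above the rule-of-thumb cutoff
```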
8.21
Remedial Measures
1. Drop the Redundant Variable
Use theory to pick the variable(s) to drop.
Do not drop a variable that is strongly supported by theory
(danger of specification error).
8.22
(Regression output omitted: both coefficients are insignificant,
since M1 and M2 are highly related.)
Other examples: CPI <=> WPI;  CD rate <=> TB rate;  GDP <=> GNP <=> GNI
8.23
Check after dropping variables:
• The estimates of the coefficients of the other variables are not
  affected. (necessary)
• R² does not fall much when some collinear variables are dropped.
  (necessary)
• More significant t-values, i.e., smaller standard errors. (likely)
8.24
2. Redesigning the Regression Model
There is no definite rule for this method.
Example (Studenmund (2006), p. 268):

    F_t = f(PF_t, PB_t, Yd_t, N_t, P_t) + \varepsilon_t

F_t  = average pounds of fish consumed per capita
PF_t = price index for fish
PB_t = price index for beef
Yd_t = real per capita disposable income
N_t  = the number of Catholics
P_t  = dummy: = 1 after the Pope's 1966 decision, = 0 otherwise

    F_t = \beta_0 + \beta_1 PF_t + \beta_2 PB_t + \beta_3 \ln Yd_t + \beta_4 N_t + \beta_5 P_t + \varepsilon_t
8.25
High correlations; some signs are unexpected; most t-values are
insignificant:
    VIF_PF = 43.4    VIF_lnYd = 23.3    VIF_PB = 18.9
    VIF_N  = 18.5    VIF_P   = 4.4
8.26
Dropping N does not improve the results. Using the relative price
(RP_t = PF_t / PB_t) does improve them:

    F_t = f(RP_t, Yd_t, P_t) + \varepsilon_t
    F_t = \beta_0 + \beta_1 RP_t + \beta_2 \ln Yd_t + \beta_3 P_t + \varepsilon_t
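The redesign above replaces two collinear regressors with a single ratio, which removes the collinearity by construction. A minimal sketch with hypothetical price-index values:

```python
import numpy as np

# Relative-price sketch: replace the two collinear price indexes
# PF and PB with their ratio RP = PF / PB (values are hypothetical).
PF = np.array([100.0, 104.0, 109.0, 115.0, 118.0])
PB = np.array([100.0, 103.0, 107.0, 112.0, 116.0])
RP = PF / PB   # one regressor instead of two near-duplicates
print(RP)
```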
8.27
The results improve much further when the lagged term of RP is used
to allow for a lag effect in the regression:

    F_t = \beta_0 + \beta_1 RP_{t-1} + \beta_2 \ln Yd_t + \beta_3 P_t + \varepsilon_t
8.28
3. Using A Priori Information
From previous empirical work, e.g.,

    Cons_i = \beta_0 + \beta_1 Income_i + \beta_2 Wealth_i + \varepsilon_i

with the a priori information \beta_2 = 0.1, construct a new variable

    Cons_i^* = Cons_i - 0.1 \, Wealth_i

and run OLS on

    Cons_i^* = \beta_0 + \beta_1 Income_i + \varepsilon_i
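The a-priori remedy above can be sketched with simulated data, assuming \beta_2 = 0.1 is known from earlier work; every number below (sample size, coefficients, noise levels) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: consumption depends on income and wealth, and
# wealth is nearly collinear with income.
n = 50
income = rng.uniform(20.0, 100.0, n)
wealth = 4.0 * income + rng.normal(0.0, 5.0, n)
cons = 10.0 + 0.8 * income + 0.1 * wealth + rng.normal(0.0, 2.0, n)

# Impose the a priori restriction beta2 = 0.1, then regress on income.
cons_star = cons - 0.1 * wealth
A = np.column_stack([np.ones(n), income])
b0, b1 = np.linalg.lstsq(A, cons_star, rcond=None)[0]
print(b1)   # close to the true beta1 = 0.8
```

Because the wealth effect has been subtracted out using the known coefficient, the remaining one-regressor model is free of the income/wealth collinearity.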
8.29
4. Transformation of the Model
Take first differences of time-series data.
Original regression model:

    Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t

Transformed model (first differencing):

    \Delta Y_t = \beta_0' + \beta_1' \Delta X_{1t} + \beta_2' \Delta X_{2t} + u_t

where \Delta Y_t = Y_t - Y_{t-1} (Y_{t-1} is called a lagged term),
\Delta X_{1t} = X_{1t} - X_{1,t-1}, and \Delta X_{2t} = X_{2t} - X_{2,t-1}.
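The first-difference transformation above is a one-liner in practice. A minimal sketch with hypothetical series:

```python
import numpy as np

# First-difference transformation: Delta Y_t = Y_t - Y_{t-1},
# and likewise for each regressor (series values are hypothetical).
Y  = np.array([100.0, 104.0, 110.0, 113.0, 121.0])
X1 = np.array([10.0, 11.0, 13.0, 14.0, 16.0])

dY  = np.diff(Y)    # [4., 6., 3., 8.]
dX1 = np.diff(X1)   # [1., 2., 1., 2.]
print(dY, dX1)
```

Note that differencing costs one observation, and the differenced series are often far less correlated with each other than the original levels.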
8.30
5. Collect More Data (expand the sample size)
A larger sample size means smaller variances for the estimators.
6. Do Nothing
Leave the model alone unless multicollinearity causes serious problems
and a change of specification gives better results.