Marietta College
Spring 2011
Econ 420: Applied Regression Analysis
Dr. Jacqueline Khorassani
Week 14
Tuesday, April 12
Exam 3: Monday, April 25, 12:00-2:30 PM
Bring your laptops to class on Thursday too
Collect Asst 21
Use the data set FISH in Chapter 8 (p. 274) to run the following regression equation:
F = f (PF, PB, Yd, P, N)
1) Conduct all three tests for an imperfect multicollinearity problem and report your results.
2) If you find evidence of an imperfect multicollinearity problem, suggest and implement a reasonable solution.
Use EViews
• Open FISH in Chapter 8
• Run P = f (PF, PB, Yd, N)
• Click on View on the regression output
• Click on Actual, Fitted, Residual
• Click on Residual Graph
• Do you suspect the residuals to be autocorrelated?
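For anyone replicating these steps outside EViews, here is a minimal Python sketch of the same workflow; the file name fish.csv and the column names P, PF, PB, YD, N are assumptions, not something specified in the slides.

```python
# Minimal sketch of the EViews steps above: estimate the regression and plot residuals.
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

fish = pd.read_csv("fish.csv")                      # hypothetical file name
X = sm.add_constant(fish[["PF", "PB", "YD", "N"]])  # assumed column names
model = sm.OLS(fish["P"], X).fit()                  # P = f(PF, PB, YD, N)

plt.plot(model.resid, marker="o")                   # residual graph, as in EViews
plt.axhline(0, linewidth=0.8)
plt.title("Residuals of P = f(PF, PB, YD, N)")
plt.show()
```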
This is what you should have got:
[Residual graph of the P regression, 1946-1970: a positive residual tends to be followed by another positive residual, suggesting possible positive autocorrelation.]
Causes of Impure Serial Correlation
1. Wrong functional form
   – Example: the effect of the age of a house on its price
2. Omitted variables
   – Example: not including wealth in the consumption equation
3. Data error
Cause of Pure Serial Correlation
• Lingering shock over time
   – War
   – Natural disaster
   – Stock market crash
Consequences of Pure Autocorrelation
• Unbiased estimates but wrong standard errors
   – In the case of positive autocorrelation, the standard errors of the estimated coefficients drop
   – Consequences for the t-test of significance?
Consequences of Impure Autocorrelation
• Biased estimates
• Plus wrong standard errors
Let's look at first-order serial correlation:
єt = ρ єt-1 + ut
ρ (rho) is the first-order autocorrelation coefficient; it takes a value between -1 and +1.
ut is a normally distributed error with a mean of zero and constant variance.
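As a quick illustration (not from the slides), a short Python sketch that simulates errors following єt = ρ єt-1 + ut; the value ρ = 0.8 is an arbitrary choice for the demo.

```python
# Simulate first-order serially correlated errors: e_t = rho * e_{t-1} + u_t
import numpy as np

rng = np.random.default_rng(0)
rho, T = 0.8, 100                  # rho chosen only for illustration
u = rng.normal(0.0, 1.0, T)        # u_t: mean-zero, constant-variance error
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + u[t]   # with positive rho, positive errors tend to follow positive errors
```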
A Formal Test for First-Order Autocorrelation
• Durbin-Watson test
• Estimate the regression equation
• Save the residuals, e
• Then calculate the Durbin-Watson statistic (d stat)
• d stat ≈ 2(1 - ρ)
• What is the d stat under perfect positive correlation? ρ = +1 → d = 0
• What is the d stat under perfect negative correlation? ρ = -1 → d = 4
• What is the d stat under no autocorrelation? ρ = 0 → d = 2
• What is the range of values for the d stat? 0 to 4
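For reference, a small sketch of how the d stat is computed from the saved residuals, using the standard formula d = Σ(e_t - e_{t-1})² / Σ e_t², which is approximately 2(1 - ρ).

```python
# Durbin-Watson statistic from a vector of residuals e.
import numpy as np

def durbin_watson_stat(e):
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# statsmodels ships the same computation:
# from statsmodels.stats.stattools import durbin_watson
```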
If 0 < d stat < 2, suspect (test for) positive autocorrelation.
If 2 < d stat < 4, suspect (test for) negative autocorrelation.
[Number line: d stat = 0 means perfect positive autocorrelation, d stat = 2 means no autocorrelation, d stat = 4 means perfect negative autocorrelation.]
EViews calculates d-stat automatically
• It is included in your regression output
• Run P = f (PF, PB, Yd, N)
• Do you see the d-stat?
Dependent Variable: P
Method: Least Squares
Date: 04/12/11   Time: 08:59
Sample: 1946 1970
Included observations: 25

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           -2.083188      0.271658      -7.668417      0.0000
PF           0.027143      0.017355       1.563934      0.1335
PB          -0.012571      0.011620      -1.081865      0.2922
YD           0.001597      0.000387       4.132263      0.0005
N           -5.54E-05      1.27E-05      -4.376214      0.0003

R-squared            0.801154    Mean dependent var       0.160000
Adjusted R-squared   0.761384    S.D. dependent var       0.374166
S.E. of regression   0.182774    Akaike info criterion   -0.384281
Sum squared resid    0.668123    Schwarz criterion       -0.140506
Log likelihood       9.803514    Hannan-Quinn criter.    -0.316668
F-statistic          20.14505    Durbin-Watson stat       1.498086
Prob(F-statistic)    0.000001

What type of serial correlation shall we test for? Positive (d stat = 1.498086 < 2).
• If d stat < 2, test for positive autocorrelation.
• Null and alternative hypotheses
   – H0: ρ ≤ 0 (no positive autocorrelation)
   – HA: ρ > 0 (positive autocorrelation)
• Choose the level of significance (say 5%)
• Critical d stat (pp. 591-593)
• Decision rule
   – If d stat < dL → reject H0 → there is significant positive first-order autocorrelation
   – If d stat > dU → don't reject H0 → there is no evidence of significant autocorrelation
   – If d stat is between dL and dU → the test is inconclusive.
[Same regression output as above; Durbin-Watson stat = 1.498086]

N = 25, K = 4
At the 5% level: dL = 1.04, dU = 1.77
d stat is between dL and dU → the test is inconclusive
H0: ρ ≤ 0 (no positive autocorrelation)
HA: ρ > 0 (positive autocorrelation)
Level of significance = 5%
Critical d stat: dL = 1.04, dU = 1.77
Decision: d stat is between dL and dU → the test is inconclusive
[Number line: reject H0 below dL = 1.04; inconclusive between 1.04 and 1.77 (our d stat of about 1.5 falls here); fail to reject H0 above 1.77. DW stat = 0 is perfect positive autocorrelation, DW stat = 2 is no autocorrelation, DW stat = 4 is perfect negative autocorrelation.]
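The decision rule can be written as a tiny helper; dL and dU still have to come from the critical-value tables (pp. 591-593). Plugging in this example's numbers (d stat = 1.498086, dL = 1.04, dU = 1.77) reproduces the inconclusive verdict.

```python
# Decision rule for the positive-autocorrelation Durbin-Watson test (used when d stat < 2).
def dw_test_positive(d_stat, dL, dU):
    if d_stat < dL:
        return "reject H0: significant positive first-order autocorrelation"
    if d_stat > dU:
        return "do not reject H0: no evidence of positive autocorrelation"
    return "inconclusive"

print(dw_test_positive(1.498086, dL=1.04, dU=1.77))   # -> inconclusive
```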
• If d stat > 2, test for negative autocorrelation.
• Null and alternative hypotheses
   – H0: ρ ≥ 0 (no negative autocorrelation)
   – HA: ρ < 0 (negative autocorrelation)
• Choose the level of significance (1% or 5%)
• Critical d stat (pp. 591-593)
• Decision rule
   – If d stat > 4 - dL → reject H0 → there is significant negative first-order autocorrelation
   – If d stat < 4 - dU → don't reject H0 → there is no evidence of significant autocorrelation
   – If d stat is between 4 - dU and 4 - dL → the test is inconclusive.
Example: d stat > 2 → test for negative autocorrelation

Dependent Variable: CONSUMPTION
Method: Least Squares
Date: 11/09/08   Time: 20:11
Sample: 1 30
Included observations: 30

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           16222.97       5436.061      2.984324       0.0060
INCOME      0.641166       0.166878      3.842131       0.0007
WEALTH      0.148788       0.041327      3.600281       0.0013

R-squared            0.847738    Mean dependent var      52347.37
Adjusted R-squared   0.836459    S.D. dependent var      31306.54
S.E. of regression   12660.43    Akaike info criterion   21.82499
Sum squared resid    4.33E+09    Schwarz criterion       21.96511
Log likelihood      -324.3748    Hannan-Quinn criter.    21.86982
F-statistic          75.16274    Durbin-Watson stat      2.211726
Prob(F-statistic)    0.000000
Let's test for autocorrelation at the 1% level in our example.
H0: ρ ≥ 0 (no negative autocorrelation)
HA: ρ < 0 (negative autocorrelation)
• 1% level of significance, k = 2, n = 30
• dL = 1.07, dU = 1.34
• 4 - dL = 2.93, 4 - dU = 2.66
• d stat < 4 - dU → don't reject H0
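The mirror-image rule for the negative side (used when d stat > 2) in the same style; plugging in this example's numbers (d stat = 2.211726, dL = 1.07, dU = 1.34) reproduces the conclusion above.

```python
# Decision rule for the negative-autocorrelation Durbin-Watson test (used when d stat > 2).
def dw_test_negative(d_stat, dL, dU):
    if d_stat > 4 - dL:
        return "reject H0: significant negative first-order autocorrelation"
    if d_stat < 4 - dU:
        return "do not reject H0: no evidence of negative autocorrelation"
    return "inconclusive"

print(dw_test_negative(2.211726, dL=1.07, dU=1.34))   # -> do not reject H0
```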
Asst 22: Due Thursday
• Use the data on Soviet defense spending (p. 335; data set DEFEND, Chapter 9) to regress SDH on SDL, USD and NR only.
1. Conduct a Durbin-Watson test for serial correlation at the 5% level of significance.
2. If you find evidence of autocorrelation, is it more likely to be pure or impure autocorrelation? Why?
Thursday, April 14
• Exam 3: Monday, April 25, 12:00-2:30 PM
• Bring your laptops to class next Tuesday
Collect Asst 22
• Use the data on Soviet defense spending (p. 335; data set DEFEND, Chapter 9) to regress SDH on SDL, USD and NR only.
1. Conduct a Durbin-Watson test for serial correlation at the 5% level of significance.
2. If you find evidence of autocorrelation, is it more likely to be pure or impure autocorrelation? Why?
Solutions for the Autocorrelation Problem
• If the D-W test indicates an autocorrelation problem, what should you do?
1. Adjust the functional form
• Sometimes autocorrelation is because we use a linear form while we should have used a non-linear form.
[Scatter plot of revenue against price with a linear fitted line: the errors form a pattern, with the first 3 observations having positive errors and the last 2 having negative errors. The revenue curve is not linear (it is bell shaped). What functional form should we use?]
2. Add other relevant (missing) variables
• Sometimes autocorrelation is caused by omitted variables.
[Scatter plot of consumption against income: we forgot to include wealth in our model. In year one (observation 1) wealth goes up drastically, giving a big positive error, and the effect of that increase in wealth lingers for 3 years, so the errors form a pattern. We should include wealth in our model.]
3. Examine the data
• Any systematic error in the collection or recording of data may result in autocorrelation.
After you make adjustments 1, 2 and 3
• Test for autocorrelation again
• If autocorrelation is still a problem, then suspect pure autocorrelation
   – Follow the Cochrane-Orcutt procedure
   – Say what?????
Suppose our model is
Yt = β0 + β1 Xt + єt        (1)
and the error terms in Equation 1 are correlated:
єt = ρ єt-1 + ut        (2)
where ut is not autocorrelated. Rearranging 2 we get 3:
єt - ρ єt-1 = ut        (3)
Let's lag Equation 1:
Yt-1 = β0 + β1 Xt-1 + єt-1        (4)

Now multiply Equation 4 by ρ:
ρ Yt-1 = ρ β0 + ρ β1 Xt-1 + ρ єt-1        (5)
Now subtract 5 from 1 to get 6:
   Yt = β0 + β1 Xt + єt
- ρ Yt-1 = - (ρ β0 + ρ β1 Xt-1 + ρ єt-1)
___________________________________
Yt - ρ Yt-1 = β0 - ρ β0 + β1 Xt - ρ β1 Xt-1 + єt - ρ єt-1        (6)
Note that the last two terms in Equation 6 are equal to ut, so 6 becomes
Yt - ρ Yt-1 = β0 - ρ β0 + β1 (Xt - ρ Xt-1) + ut        (7)

• What is so special about the error term in Equation 7? It is not autocorrelated.
• So, instead of Equation 1 we can estimate Equation 7.
Define Zt = Yt - ρ Yt-1 and Wt = Xt - ρ Xt-1. Then 7 becomes
Zt = M + β1 Wt + ut        (8)
where M is a constant = β0 (1 - ρ).
Notice that the slope coefficient of Equation 8 is the same as the slope coefficient of our original Equation 1.
The Cochrane-Orcutt Method
So our job will be:
Step 1: Apply OLS to the original model (Equation 1) and find the residuals et.
Step 2: Use the ets to estimate Equation 2 and find ρ^ (note: this equation does not have an intercept).
Step 3: Multiply ρ^ by Yt-1 and Xt-1 and find Zt and Wt.
Step 4: Estimate Equation 8.
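A minimal Python sketch of these four steps for a single regressor, assuming y and x are one-dimensional numpy arrays already in memory (the names are illustrative, not from the slides).

```python
# One pass of the Cochrane-Orcutt steps for a model with a single regressor.
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, x):
    # y, x: 1-D numpy arrays of equal length
    # Step 1: OLS on the original model (Equation 1); save the residuals e_t
    e = sm.OLS(y, sm.add_constant(x)).fit().resid
    # Step 2: regress e_t on e_{t-1} with no intercept to get rho-hat (Equation 2)
    rho = sm.OLS(e[1:], e[:-1]).fit().params[0]
    # Step 3: quasi-difference: Z_t = Y_t - rho*Y_{t-1}, W_t = X_t - rho*X_{t-1}
    z = y[1:] - rho * y[:-1]
    w = x[1:] - rho * x[:-1]
    # Step 4: estimate Equation 8; the slope estimates beta_1 of Equation 1
    fit = sm.OLS(z, sm.add_constant(w)).fit()
    beta0 = fit.params[0] / (1 - rho)    # recover beta_0 from M = beta_0 * (1 - rho)
    return rho, beta0, fit.params[1]
```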
• Luckily, EViews does this (Steps 1-4) automatically
• All you need to do is add AR(1) to the set of your independent variables.
• The estimated coefficient of AR(1) is ρ^
• Let's apply this procedure to Asst 22
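Outside EViews, statsmodels' GLSAR is a rough analogue (an assumption about equivalence, not an exact match: it iterates Cochrane-Orcutt-style feasible GLS rather than reproducing EViews' AR(1) estimator exactly). Toy data are generated only to make the snippet runnable.

```python
# Rough statsmodels analogue of adding AR(1): GLSAR with one autoregressive lag.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)          # toy data, only to make this runnable

glsar = sm.GLSAR(y, sm.add_constant(x), rho=1)   # rho=1 -> one AR lag
res = glsar.iterative_fit(maxiter=10)            # iterated Cochrane-Orcutt-style fit
print(glsar.rho, res.params)                     # glsar.rho plays the role of rho-hat
```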
Regression of SDH without the AR(1) correction:

Dependent Variable: SDH
Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           8.83           2.50          3.52           0.0020
SDL         0.97           0.04          22.18          0.0000
USD         -0.005         0.008         -0.60          0.5553
NR          0.002          0.0002        9.30           0.0000
R-squared 0.996792   Adjusted R-squared 0.996334   Durbin-Watson stat 1.076364

Regression of SDH with AR(1) added:

Dependent Variable: SDH
Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           -9.11          8.4           -1.08          0.2940
SDL         1.38           0.17          8.10           0.0000
USD         6.71E-05       0.013         0.005          0.9959
NR          0.0005         0.0004        1.46           0.1608
AR(1)       0.82           0.10          8.002          0.0000
R-squared 0.997927   Adjusted R-squared 0.997490   Durbin-Watson stat 2.463339

What is the estimated coefficient of AR(1)? It is ρ^ (= 0.82).
What happened to the standard errors as we corrected for serial correlation? They went up (positive autocorrelation drives the standard errors down).
Return and discuss Asst 21
Use the data set FISH in Chapter 8 (p. 274) to run the following regression equation:
F = f (PF, PB, Yd, P, N)
1) Conduct all three tests for an imperfect multicollinearity problem and report your results.
2) If you find evidence of an imperfect multicollinearity problem, suggest and implement a reasonable solution.
Correlation Matrix

        F       P       PB      PF      YD      N
F       1
P       0.58    1
PB      0.82    0.66    1
PF      0.85    0.73    0.96    1
YD      0.79    0.78    0.82    0.92    1
N       0.74    0.57    0.78    0.88    0.93    1

First test:
• PF is more correlated with PB than with F → PF is a problem
• PB is more correlated with PF than with F → PB is a problem
• Yd is more correlated with PB and PF than with F → Yd is a problem
• N is more correlated with PB, PF and Yd than with F → N is a problem
• P is more correlated with everything else than with F → P is a problem
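For reference, the same matrix can be reproduced in Python, assuming the FISH data sit in a pandas DataFrame with columns F, P, PB, PF, YD, N (the file and column names are assumptions, as in the earlier sketch).

```python
# Reproduce the correlation matrix (assumed file/column names).
import pandas as pd

fish = pd.read_csv("fish.csv")   # hypothetical file name
print(fish[["F", "P", "PB", "PF", "YD", "N"]].corr().round(2))
```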
Second test: problem areas (pairs of independent variables that are highly correlated in the matrix above):
• PF and PB (0.96)
• PF and Yd (0.92)
• PF and N (0.88)
• PB and Yd (0.82)
• Yd and N (0.93)
Note: F being highly correlated with the independent variables is a good thing, not a bad thing.
Test 3
• Need 5 regression equations:
1. PF = f (P, Yd, PB, N)
2. P = f (PF, Yd, PB, N)
3. Yd = f (P, PF, PB, N)
4. PB = f (PF, Yd, P, N)
5. N = f (PF, Yd, PB, P)
• For each, find R2 and then find the VIF
• For all of them, VIF > 5 → each independent variable is highly correlated with the rest
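A sketch of Test 3 in Python: run each auxiliary regression, take its R², and compute VIF = 1 / (1 - R²); the fish DataFrame and column names are the same assumptions as in the earlier sketches.

```python
# Test 3: auxiliary regressions and VIF = 1 / (1 - R^2) for each regressor.
import pandas as pd
import statsmodels.api as sm

fish = pd.read_csv("fish.csv")                     # hypothetical file name
regressors = ["PF", "P", "YD", "PB", "N"]
for var in regressors:
    others = [v for v in regressors if v != var]   # regress var on the other four
    aux = sm.OLS(fish[var], sm.add_constant(fish[others])).fit()
    print(var, "VIF =", round(1.0 / (1.0 - aux.rsquared), 2))
# statsmodels.stats.outliers_influence.variance_inflation_factor computes the same quantity.
```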
Solutions
1. Increase sample size
   – Note: we want at least df = 30; we have df = 19
2. Do we have an irrelevant variable?
   – Seth argued N is not needed
   – What is N? (p. 273)
   – Seth, what was your argument?
3. Generate a new variable that measures the ratio of prices
   – Makes sense, but doesn't solve the high correlation between Yd and N
   – Note: make sure your transformed variable makes sense
      • That is, the estimated coefficient has a meaning that people can understand
      • The ratio PF/Yd makes no sense