Conclusion to Bivariate Linear Regression

Economics 224 – Notes for November 19, 2008
Reporting regression results
• Equation format OR table format. For each of these:
– Define x and y and state the units of each, so that a
reader of your report can follow the results.
– Report the sample size and units of observation.
– Report the standard errors or t-statistics associated with
each of the regression coefficients.
– Report the coefficient of determination, along with its
statistical significance. ANOVA table could be provided for
a fuller report.
– Each step involves reorganizing the results from Excel or
other statistical programs to one of these conventions.
– Don’t report too many or too few decimals!
Equation format
• Income and alcohol example. x is mean family income per
capita in thousands of dollars, 1986, and y is alcohol consumption
in litres of alcohol per capita for those aged 15 or over, 1985-86.
n = 10 observations, one for each of the ten provinces of Canada.
• The regression equation, with standard errors reported in
brackets, is as follows. R² for this equation is 0.625 with a
P-value of 0.0065.
ŷ = 0.835 + 0.276x
   (2.332)  (0.076)
• Alternatively, the t statistic could be reported in the brackets –
make sure you indicate whether it is the standard errors or t
statistics that are reported in the brackets.
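The equation format can be sketched in a few lines of Python. The coefficients, standard errors, R² and P-value below are the ones reported above for the income and alcohol example. The function names and the layout of the output are illustrative assumptions, not part of the notes. The sketch also converts each standard error into the corresponding t statistic (coefficient divided by its standard error), which is the alternative that could be reported in the brackets.

```python
# Values are those reported in the notes for the income and alcohol example.
# The helper names (equation_report, t_statistic) are hypothetical.

def equation_report(b0, b1, se0, se1, r2, p_value, n):
    """Return the fitted equation with standard errors in brackets below it."""
    sign = "+" if b1 >= 0 else "-"
    line1 = f"y-hat = {b0:.3f} {sign} {abs(b1):.3f}x"
    line2 = f"       ({se0:.3f})  ({se1:.3f})"
    line3 = f"R2 = {r2:.3f}, P-value = {p_value:.4f}, n = {n}"
    return "\n".join([line1, line2, line3])


def t_statistic(coefficient, standard_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


b0, b1 = 0.835, 0.276      # intercept and slope, as printed in the notes
se0, se1 = 2.332, 0.076    # their standard errors

print(equation_report(b0, b1, se0, se1, r2=0.625, p_value=0.0065, n=10))
print(f"t for intercept: {t_statistic(b0, se0):.2f}")
print(f"t for slope:     {t_statistic(b1, se1):.2f}")
```

With these inputs the slope has a t statistic of about 3.63 and the intercept about 0.36, so only the slope is clearly different from zero.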
Table format
Dependent variable is wages and salaries

Variable        Estimated Coefficient   Standard Error   Probability Value
Constant                      -13,493           23,211               0.568
Yrs schooling                   4,181            1,606               0.017

R² = 0.253, P = 0.017
(+ Other equation test-statistics)
The Standard Error column could report t-statistics instead.
Presenting Multiple Results
Dependent variable is wages and salaries

Variable        Equation I    Equation II   Equation III
Constant        -13,493       ….            ….
                (23,211)
Yrs schooling   4,181         ….            ….
                (1,606) †
R²              0.253         …             …
Significance    0.017

(+ Other equation test-statistics)
Note: Standard errors in brackets. * – significant at the 1% level,
† – significant at the 5% level, ‡ – significant at the 10% level
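The significance markers in the note can be assigned mechanically from each coefficient's P-value. The following is a minimal sketch using the Equation I figures and the P-values from the table-format slide. The function names are hypothetical, and treating each cut-off as a strict "less than" is an assumption, since the notes do not say how a P-value exactly at a threshold is handled.

```python
# Markers follow the table note: * = 1% level, † = 5% level, ‡ = 10% level.
# Function names are hypothetical; strict "<" at each cut-off is an assumption.

def significance_marker(p_value):
    """Return the marker for the strictest level at which p_value is significant."""
    if p_value < 0.01:
        return "*"
    if p_value < 0.05:
        return "†"
    if p_value < 0.10:
        return "‡"
    return ""


def table_cell(coefficient, standard_error, p_value):
    """Return the two text lines of a cell: the coefficient, then (std. error) marker."""
    marker = significance_marker(p_value)
    return f"{coefficient:,.0f}", f"({standard_error:,.0f}) {marker}".rstrip()


# Equation I from the notes: (coefficient, standard error, P-value)
rows = {
    "Constant": (-13493, 23211, 0.568),
    "Yrs schooling": (4181, 1606, 0.017),
}

for name, (coef, se, p) in rows.items():
    top, bottom = table_cell(coef, se, p)
    print(f"{name:<15}{top}")
    print(f"{'':<15}{bottom}")
```

The years-of-schooling coefficient (P = 0.017) receives the † marker and the constant (P = 0.568) receives none, matching the Equation I column.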
Residual analysis
• The t-test and F-test are theoretically valid only if the
assumptions about the error term are met:
E(ε) = 0;
Variance(ε) = σ² is constant for each x;
Values of ε are independent of each other;
ε is normally distributed.
• If these assumptions are not met:
– Must correct how our model is constructed.
– Or, must come up with a new estimator other than OLS
and work on correcting the problems –> Econ 324 and up.
• Can’t see true ε’s –> must look at our estimates:
– Estimated residuals are ei = yi – ŷi.
– Best way: plot them versus xi or ŷi using Excel or another
program.
e = y – ŷ
Residuals (e) for years of schooling (x) and wages
and salaries regression
ŷ = -13,493 + 4,181x
[Figure: scatter plot of the residuals e (vertical axis, -50,000 to 40,000) against x (horizontal axis, 0 to 25).]

    x            e
   17       4914.407
   12     -21180.1
   12      30819.88
   11     -22999
   15     -11223.4
   15     -13223.4
   19       4052.216
   15      -2223.4
   20       9871.121
   16     -25404.5
   18       3233.312
   11      15500.98
   14      27457.69
   12      -3680.12
   14.5   -41132.9
   13.5    19548.24
   15      28276.6
   13       1138.789
   10       7682.075
   12.5   -17770.7
   15      -8223.4
   12.3    14565.56
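The listed x values and residuals are enough to recover several of the figures reported for the wages and salaries regression. The sketch below assumes the usual bivariate OLS formulas: s² = SSE/(n – 2), se(b1) = s/√Sxx, and se(b0) = s·√(1/n + x̄²/Sxx). It obtains R² from SSR = b1²·Sxx using the rounded slope of 4,181, so small rounding differences are expected. The variable names are illustrative.

```python
import math

# (x, e) pairs from the residual listing in the notes:
# x = years of schooling, e = residual in dollars.
data = [
    (17, 4914.407), (12, -21180.1), (12, 30819.88), (11, -22999.0),
    (15, -11223.4), (15, -13223.4), (19, 4052.216), (15, -2223.4),
    (20, 9871.121), (16, -25404.5), (18, 3233.312), (11, 15500.98),
    (14, 27457.69), (12, -3680.12), (14.5, -41132.9), (13.5, 19548.24),
    (15, 28276.6), (13, 1138.789), (10, 7682.075), (12.5, -17770.7),
    (15, -8223.4), (12.3, 14565.56),
]
b1 = 4181.0  # slope as reported (rounded) in the notes

n = len(data)
xs = [x for x, _ in data]
es = [e for _, e in data]

x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)   # sum of squared deviations of x
sse = sum(e ** 2 for e in es)             # sum of squared residuals

s = math.sqrt(sse / (n - 2))              # standard error of the estimate
se_b1 = s / math.sqrt(sxx)                # standard error of the slope
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)  # standard error of the constant

ssr = b1 ** 2 * sxx                       # regression sum of squares
r2 = ssr / (ssr + sse)                    # coefficient of determination

print(f"n = {n}, sum of residuals = {sum(es):.2f}")
print(f"se(b1) = {se_b1:,.0f}, se(b0) = {se_b0:,.0f}")
print(f"t for slope = {b1 / se_b1:.2f}, R2 = {r2:.3f}")
```

The results come out close to the reported standard errors (1,606 and 23,211) and to R² = 0.253, and the residuals sum to approximately zero, as OLS residuals must when the equation includes a constant.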
Last slide
• Example of regression of wages and salaries
on years of schooling. Appears to satisfy
assumptions, although it may have
heteroskedasticity. That is, variance of
residuals may not be equal for all values of x.
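One informal way to look for unequal variance is to compare the average squared residual for low and high values of x. The sketch below does this with the x values and residuals listed for this regression. The split point of 14 years of schooling is an arbitrary assumption chosen near the mean of x, and the comparison is only a rough descriptive check, not a formal test.

```python
# (x, e) pairs from the residual listing in the notes.
data = [
    (17, 4914.407), (12, -21180.1), (12, 30819.88), (11, -22999.0),
    (15, -11223.4), (15, -13223.4), (19, 4052.216), (15, -2223.4),
    (20, 9871.121), (16, -25404.5), (18, 3233.312), (11, 15500.98),
    (14, 27457.69), (12, -3680.12), (14.5, -41132.9), (13.5, 19548.24),
    (15, 28276.6), (13, 1138.789), (10, 7682.075), (12.5, -17770.7),
    (15, -8223.4), (12.3, 14565.56),
]

SPLIT = 14  # hypothetical cut-off in years of schooling, not from the notes


def mean_squared_residual(residuals):
    """Average of e squared, a rough measure of spread around zero."""
    return sum(e ** 2 for e in residuals) / len(residuals)


low = [e for x, e in data if x < SPLIT]
high = [e for x, e in data if x >= SPLIT]

msr_low = mean_squared_residual(low)
msr_high = mean_squared_residual(high)
ratio = msr_high / msr_low

print(f"x <  {SPLIT}: {len(low)} observations, mean squared residual = {msr_low:,.0f}")
print(f"x >= {SPLIT}: {len(high)} observations, mean squared residual = {msr_high:,.0f}")
print(f"ratio (high / low) = {ratio:.2f}")
```

A ratio well above or below 1 would point toward heteroskedasticity. With only 22 observations and an arbitrary split, a ratio near 1 is weak evidence either way, which is why plotting the residuals remains the recommended first step.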
Regression of alcohol consumption on income,
provinces of Canada, 1986.
[Figure: residuals ei = actual – predicted alcohol consumption (vertical axis, -1.5 to 1.5) plotted against x (income) (horizontal axis, 25 to 39).]
Appears to have a reasonable scatter of residuals,
with no obvious violation of assumptions.
Consumption function. Example of serial correlation, or
autocorrelation.
[Figure: plot of residuals (y - predicted y) of consumption (vertical axis, -15,000 to 15,000) with GDP on the horizontal axis (800,000 to 1,150,000).]
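Serial correlation means that neighbouring residuals tend to resemble each other, which shows up in a plot as long runs of residuals on the same side of zero. A simple numerical companion to the plot is the correlation between each residual and the one before it. The sketch below uses two short series of hypothetical residuals invented for illustration, since the notes give the consumption residuals only as a plot. The function name is also an assumption.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the previous one in the sequence."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    numerator = sum(dev[t] * dev[t - 1] for t in range(1, n))
    denominator = sum(d ** 2 for d in dev)
    return numerator / denominator


# Hypothetical residuals, ordered by the explanatory variable (NOT from the notes).
runs_of_same_sign = [-9, -7, -4, -1, 3, 6, 8, 6, 2, -4]   # smooth wave pattern
alternating_sign = [5, -4, 6, -5, 4, -6, 5, -5, 4, -4]    # flips every observation

r_runs = lag1_autocorrelation(runs_of_same_sign)
r_alt = lag1_autocorrelation(alternating_sign)

print(f"wave-like residuals:   lag-1 autocorrelation = {r_runs:.2f}")
print(f"alternating residuals: lag-1 autocorrelation = {r_alt:.2f}")
```

A value near zero is what independent errors would tend to produce. A clearly positive value, as in the wave-like series, corresponds to the kind of pattern that signals serial correlation, and a clearly negative value signals residuals that alternate in sign.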
Solutions
• Tests and results are suspect.
• Violations of assumptions may affect some
estimates more than others.
• Solutions
– May mean we have missing explanatory variables
or wrong equation format.
– May mean that ε does not meet assumptions.
– Use different estimators.
• Examine in detail in courses in Econometrics.
Transformations
• Relationship may not be linear. There are
many different possibilities here. Two
examples are provided in other documents:
– Population growth – exponential growth.
– Earnings and age – parabolic relationship.
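The exponential growth case illustrates how a transformation can turn a non-linear relationship into a linear one. If y = a·e^(bt), then ln y = ln a + b·t, so regressing ln y on t with ordinary least squares gives an estimate of the growth rate b. The sketch below uses a hypothetical population that starts at 1,000 and grows by exactly 2% per period. These figures are invented for illustration and are not the population example referred to in the notes.

```python
import math


def ols(x, y):
    """Bivariate least squares: return (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: population of 1,000 growing 2% per period (not from the notes).
periods = list(range(10))
population = [1000 * 1.02 ** t for t in periods]

# Regress the natural log of population on time.
log_population = [math.log(p) for p in population]
b0, b1 = ols(periods, log_population)

print(f"slope b1 = {b1:.5f} (ln of the growth factor)")
print(f"implied growth per period = {100 * (math.exp(b1) - 1):.2f}%")
print(f"implied starting population = {math.exp(b0):,.0f}")
```

Because the hypothetical data follow the exponential curve exactly, the regression on ln y recovers the 2% growth rate and the starting value of 1,000. With real data the fit would not be exact, and the residuals of the transformed equation should be examined in the same way as before.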
Some Cautions
• If we conclude β1 ≠ 0
–> doesn’t imply x causes y.
• Could still be random relationships.
• Need some theoretical argument too.
• If we conclude β1 is statistically different from 0
–> doesn’t mean a linear relationship exists for
sure.
– Must watch out for non-linear relationships.
Next day
• Begin multiple regression (ASW, Ch. 13).
• Assignment 6 has now been posted.
Remember that this assignment is optional.