Simple OLS - National University of Kaohsiung


1.1 Purpose of Regression Analysis
1. Estimate a relationship among economic variables, such as y = f(x).
2. Forecast or predict the value of one variable, y, based on the value of another variable, x.
1.2 Weekly Food Expenditures
y = dollars spent each week on food items.
x = consumer's weekly income.
The relationship between x and the expected value of y, given x, might be linear:
E(y|x) = β1 + β2x
1.3 [Figure] Probability distribution f(y|x=480) of food expenditures given income x = $480.
1.4 [Figure] Probability distributions of food expenditures given income x = $480 and x = $800.
1.5 [Figure] The economic model: a linear relationship between average expenditure on food, E(y|x) = β1 + β2x, and income x. The slope is β2 = ΔE(y|x)/Δx and the intercept is β1.
1.6 Homoskedastic Case
[Figure] The probability density function f(yt) for yt at two levels of household income, x1 = 480 and x2 = 800, around the line E(y|x) = β1 + β2x.
1.7 Heteroskedastic Case
[Figure] The variance of yt increases as household income, xt, increases.
1.8 The Error Term
y is a random variable composed of two parts:
I. Systematic component: E(y|x) = β1 + β2x. This is the mean of y.
II. Random component: ε = y - E(y) = y - β1 - β2x. This is called the random error.
Together E(y) and ε form the model: y = β1 + β2x + ε
1.9–1.10 [Figure] Observed data points (y1, ..., y4) at income levels (x1, ..., x4).
1.11 [Figure] The relationship among y, ε, and the true regression line E(y) = β1 + β2x: each εt is the vertical distance from the data point to the line.
1.12 Why must the stochastic error term be present in a regression equation?
1. Many minor influences on y are omitted from the equation (for instance, because data are unavailable).
2. It is virtually impossible to avoid some sort of measurement error in at least one of the equation's variables.
3. The underlying theoretical equation might have a different functional form (or shape) than the one chosen for the regression. For example, the underlying equation might be nonlinear in the variables while a linear regression is estimated.
4. All attempts to generalize human behavior must contain at least some amount of unpredictable or purely random variation.
1.13 The Assumptions of Simple Linear Regression Models
1. The value of y, for each value of x, is y = β1 + β2x + ε.
2. The average value of the random error ε is E(ε) = 0.
3. The variance of the random error ε is var(ε) = σ² = var(y).
4. The covariance between any pair of errors is cov(εi, εj) = cov(yi, yj) = 0 for i ≠ j.
5. x must take at least two different values, so that x is not a constant.
6. (optional) ε is normally distributed with mean 0 and variance σ²: ε ~ N(0, σ²).
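As a rough illustration of these assumptions, the sketch below simulates data from the model with normal, homoskedastic errors. The parameter values, sample size, and income range are invented for illustration only; the slides do not specify them.

```python
import numpy as np

# Hypothetical values chosen only for illustration.
beta1, beta2 = 40.0, 0.13      # intercept and slope
sigma = 35.0                   # error standard deviation
T = 40                         # sample size

rng = np.random.default_rng(0)
x = rng.uniform(200, 1000, size=T)      # weekly income; assumption 5: x is not constant
eps = rng.normal(0.0, sigma, size=T)    # assumptions 2, 3, 6: E(ε)=0, var(ε)=σ², normal
y = beta1 + beta2 * x + eps             # assumption 1: y = β1 + β2x + ε
```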
1.14
Population regression values: yt = β1 + β2xt + εt
Population regression line: E(yt|xt) = β1 + β2xt
Sample regression values: yt = b1 + b2xt + ε̂t
Sample regression line: ŷt = b1 + b2xt
1.15 [Figure] The relationship among y, ε̂, and the fitted regression line ŷ = b1 + b2x: each ε̂t is the vertical distance from the data point to the fitted line.
1.16
yt = β1 + β2xt + εt
εt = yt - β1 - β2xt
Minimize the error sum of squared deviations:
S(β1, β2) = Σ_{t=1}^{T} (yt - β1 - β2xt)² = Σ_{t=1}^{T} εt²
1.17
Minimize S(β1, β2) with respect to β1 and β2, where S(β1, β2) = Σ_{t=1}^{T} (yt - β1 - β2xt)²:
∂S/∂β1 = -2 Σ (yt - β1 - β2xt)
∂S/∂β2 = -2 Σ xt (yt - β1 - β2xt)
Set each of these two derivatives equal to zero and solve the two equations for the two unknowns β1, β2.
1.18
Minimize S(·) with respect to β1 and β2, where S(·) = Σ_{t=1}^{T} (yt - β1 - β2xt)².
[Figure] S(·) is bowl-shaped in each βi: ∂S(·)/∂βi < 0 to the left of the minimum, ∂S(·)/∂βi = 0 at βi = bi, and ∂S(·)/∂βi > 0 to the right.
To minimize S(·), set the two derivatives equal to zero:

1.19
∂S/∂β1 = -2 Σ (yt - b1 - b2xt) = 0
∂S/∂β2 = -2 Σ xt (yt - b1 - b2xt) = 0
When these two derivatives are set to zero, β1 and β2 become b1 and b2 because they no longer represent just any values of β1 and β2 but the special values that correspond to the minimum of S(·).
1.20
-2 Σ (yt - b1 - b2xt) = 0
-2 Σ xt (yt - b1 - b2xt) = 0
Σ yt - T b1 - b2 Σ xt = 0
Σ xt yt - b1 Σ xt - b2 Σ xt² = 0

1.21
Normal equations:
T b1 + b2 Σ xt = Σ yt
b1 Σ xt + b2 Σ xt² = Σ xt yt
Solve for b1 and b2 using the definitions of x̄ and ȳ:
b2 = [T Σ xt yt - Σ xt Σ yt] / [T Σ xt² - (Σ xt)²] = [Σ xt yt - T x̄ ȳ] / [Σ xt² - T x̄²]
b1 = ȳ - b2 x̄
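A minimal sketch of these least squares formulas in Python (variable names are my own; x and y could be, for example, the simulated data from the earlier sketch):

```python
import numpy as np

def ols_coefficients(x, y):
    """Least squares slope and intercept from the normal equations."""
    T = len(y)
    xbar, ybar = x.mean(), y.mean()
    # b2 = (Σ x_t y_t - T x̄ ȳ) / (Σ x_t² - T x̄²)
    b2 = (np.sum(x * y) - T * xbar * ybar) / (np.sum(x ** 2) - T * xbar ** 2)
    b1 = ybar - b2 * xbar
    return b1, b2
```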
1.22 Interpretation of Coefficients, b1 and b2
b2 is an estimate of the mean change in y in response to a one-unit change in x.
b1 is an estimate of the mean of y when x = 0. One must be very careful when interpreting the estimated intercept, since we usually do not have any data points near x = 0.
Note that regression analysis cannot be interpreted as a procedure for establishing a cause-and-effect relationship between variables.
1.23 Simple Linear Regression Model
yt = β1 + β2xt + εt
yt = demand for cars
xt = prices
For a given level of xt, the expected level of demand for cars is:
E(yt|xt) = β1 + β2xt
1.24 Assumptions of the Simple Linear Regression Model
1. yt = β1 + β2xt + εt
2. E(εt) = 0 ⟺ E(yt|xt) = β1 + β2xt
3. var(εt) = σ² = var(yt)
4. cov(εi, εj) = cov(yi, yj) = 0 for i ≠ j
5. xt is not constant (no perfect collinearity)
6. εt ~ N(0, σ²) ⟺ yt ~ N(β1 + β2xt, σ²)
1.25
The population parameters β1 and β2 are unknown population constants.
The formulas that produce the sample estimates b1 and b2 are called the estimators of β1 and β2.
When b1 and b2 are used to represent the formulas rather than specific values, they are called estimators of β1 and β2; they are random variables because they differ from sample to sample.
1.26 Estimators are Random Variables (estimates are not)
If the least squares estimators b1 and b2 are random variables, then what are their means, variances, covariances, and probability distributions?
We can then compare the properties of alternative estimators to the properties of the least squares estimators.
1.27 The Expected Values of b1 and b2
The least squares formulas (estimators) in the simple regression case:
b2 = [T Σ xt yt - Σ xt Σ yt] / [T Σ xt² - (Σ xt)²]
b1 = ȳ - b2 x̄, where ȳ = Σ yt / T and x̄ = Σ xt / T
Substitute yt = β1 + β2xt + εt into the formula for b2 to get:
1.28
b2 = β2 + [T Σ xt εt - Σ xt Σ εt] / [T Σ xt² - (Σ xt)²]
The mean of b2 is:
E(b2) = β2 + [T Σ xt E(εt) - Σ xt Σ E(εt)] / [T Σ xt² - (Σ xt)²]
Since E(εt) = 0, it follows that E(b2) = β2.

1.29 An Unbiased Estimator
The result E(b2) = β2 means that the distribution of b2 is centered at β2. Since the distribution of b2 is centered at β2, we say that b2 is an unbiased estimator of β2.
1.30 Wrong Model Specification
The unbiasedness result on the previous slide assumes that we are using the correct model. If the model is of the wrong functional form or is missing important variables, then E(εt) ≠ 0 and, in general, E(b2) ≠ β2.
1.31 Unbiased Estimator of the Intercept
In a similar manner, the estimator b1 of the intercept or constant term can be shown to be an unbiased estimator of β1 when the model is correctly specified:
E(b1) = β1
1.32
Equivalent expressions for b2:
b2 = Σ (xt - x̄)(yt - ȳ) / Σ (xt - x̄)²
Expand, and multiply top and bottom by T:
b2 = [T Σ xt yt - Σ xt Σ yt] / [T Σ xt² - (Σ xt)²] = [Σ xt yt - T x̄ ȳ] / [Σ xt² - T x̄²]
1.33 Variance of b2
Given that both yt and εt have variance σ², the variance of the estimator b2 is:
var(b2) = σ² / Σ (xt - x̄)²
b2 is a function of the yt values, but var(b2) does not involve yt directly.
1.34 Variance of b1
Given b1 = ȳ - b2 x̄, the variance of the estimator b1 is:
var(b1) = σ² Σ xt² / [T Σ (xt - x̄)²]
1.35 Covariance of b1 and b2
cov(b1, b2) = -σ² x̄ / Σ (xt - x̄)²
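A hedged sketch of these variance and covariance formulas (in practice σ² is unknown and is replaced by the estimate σ̂² introduced later):

```python
import numpy as np

def ols_coef_moments(x, sigma2):
    """var(b1), var(b2), and cov(b1, b2) for given x values and error variance sigma2."""
    T = len(x)
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)                 # Σ (x_t - x̄)²
    var_b2 = sigma2 / sxx
    var_b1 = sigma2 * np.sum(x ** 2) / (T * sxx)
    cov_b1b2 = -sigma2 * xbar / sxx
    return var_b1, var_b2, cov_b1b2
```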
1.36 What factors determine the variances and covariance of b1 and b2?
1. The larger σ² is, the greater the uncertainty about b1, b2, and their relationship.
2. The more spread out the xt values are, the more confidence we have in b1, b2, etc.
3. The larger the sample size, T, the smaller the variances and covariances.
4. The variance of b1 is large when the (squared) xt values are far from zero (in either direction).
5. Changing the slope, b2, has no effect on the intercept, b1, when the sample mean is zero. But if the sample mean is positive, the covariance between b1 and b2 will be negative, and vice versa.
1.37 Gauss-Markov Theorem
Under the first five assumptions of the simple linear regression model, the ordinary least squares estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. This means that b1 and b2 are the Best Linear Unbiased Estimators (BLUE) of β1 and β2.
1.38 Implications of Gauss-Markov
1. b1 and b2 are best within the class of linear and unbiased estimators.
2. "Best" means smallest variance within the class of linear unbiased estimators.
3. All of the first five assumptions must hold for the Gauss-Markov Theorem to apply.
4. Gauss-Markov does not require assumption six: normality.
5. Gauss-Markov is based not on the least squares principle but on the estimation rules for b1 and b2.
1.39 Gauss-Markov implications (continued)
6. If we are not satisfied with restricting our estimation to the class of linear and unbiased estimators, we should ignore the Gauss-Markov Theorem and use some nonlinear and/or biased estimator instead. (Note: a biased or nonlinear estimator could have a smaller variance than those satisfying Gauss-Markov.)
7. Gauss-Markov applies to the b1 and b2 estimators and not to particular sample values (estimates) of b1 and b2.
yt and t normally distributed
1.40
The least squares estimator of 2 and 1 can
be
expressed as a linear combination of yt :
b2 = wt yt
x t  x
where wt =
2
x t  x
b1 = y  b2x
This means that b1and b2 are normal since
linear combinations of normals are normal.
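A small check (a sketch, with my own function names) that the weighted-sum form of the slope matches the normal-equation formula from slide 1.21:

```python
import numpy as np

def slope_as_weighted_sum(x, y):
    """b2 written as a linear combination Σ w_t y_t of the y values."""
    w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)   # weights w_t
    return np.sum(w * y)

# For any data, slope_as_weighted_sum(x, y) agrees with the b2
# returned by ols_coefficients(x, y) defined earlier.
```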
1.41 Normally distributed under the Central Limit Theorem
If the first five Gauss-Markov assumptions hold and the sample size, T, is sufficiently large, then the least squares estimators, b1 and b2, have a distribution that approximates the normal distribution, with greater accuracy the larger the value of the sample size, T.
1.42 Probability Distribution of Least Squares Estimators
If one of the above two conditions is satisfied, then the distributions of b1 and b2 are:
b1 ~ N( β1, σ² Σ xt² / [T Σ (xt - x̄)²] )
b2 ~ N( β2, σ² / Σ (xt - x̄)² )
1.43 Consistency
We would like our estimators, b1 and b2, to collapse onto the true population values, β1 and β2, as the sample size, T, goes to infinity.
One way to achieve this consistency property is for the variances of b1 and b2 to go to zero as T goes to infinity.
Since the formulas for the variances of the least squares estimators b1 and b2 show that their variances do, in fact, go to zero, b1 and b2 are consistent estimators of β1 and β2.
1.44 Estimating the variance of the error term, σ²
ε̂t = yt - b1 - b2xt
σ̂² = Σ_{t=1}^{T} ε̂t² / (T - 2)
σ̂² is an unbiased estimator of σ².
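A minimal sketch of this estimator (assumes b1 and b2 were obtained as above):

```python
import numpy as np

def sigma2_hat(x, y, b1, b2):
    """Unbiased estimator of the error variance sigma^2."""
    resid = y - b1 - b2 * x              # residuals ε̂_t
    T = len(y)
    return np.sum(resid ** 2) / (T - 2)  # divide by T-2, not T
```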
1.45 The Least Squares Predictor, ŷo
Given a value of the explanatory variable, xo, we would like to predict a value of the dependent variable, yo. The least squares predictor is:
ŷo = b1 + b2 xo
1.46 Probability Distribution of Least Squares Estimators
b1 ~ N( β1, σ² Σ xt² / [T Σ (xt - x̄)²] )
b2 ~ N( β2, σ² / Σ (xt - x̄)² )
1.47
b2 ~ N( β2, σ² / Σ (xt - x̄)² )
Create a standardized normal random variable, Z, by subtracting the mean of b2 and dividing by its standard deviation:
Z = (b2 - β2) / √var(b2) ~ N(0, 1)
1.48 Error Variance Estimation
Unbiased estimator of the error variance:
σ̂² = Σ ε̂t² / (T - 2)
Transform to a chi-square distribution:
(T - 2) σ̂² / σ² ~ χ²(T-2)
1.49 Chi-Square degrees of freedom
Since the errors εt = yt - β1 - β2xt are not observable, we estimate them with the sample residuals ε̂t = yt - b1 - b2xt.
Unlike the errors, the sample residuals are not independent, since they use up two degrees of freedom by using b1 and b2 to estimate β1 and β2.
We get only T - 2 degrees of freedom instead of T.
1.50 Student-t Distribution
t = Z / √(V/m) ~ t(m)
where Z ~ N(0,1) and V ~ χ²(m), provided Z and V are independent.
1.51
t = Z / √( V / (T-2) ) ~ t(T-2)
where Z = (b2 - β2) / √var(b2) and var(b2) = σ² / Σ (xi - x̄)²
1.52
t = Z / √( V / (T-2) ), with V = (T - 2) σ̂² / σ²
t = [ (b2 - β2) / √var(b2) ] / √( [(T - 2) σ̂² / σ²] / (T - 2) ), where var(b2) = σ² / Σ (xi - x̄)²

1.53
t = [ (b2 - β2) / √( σ² / Σ (xi - x̄)² ) ] / √( σ̂² / σ² )    (notice the cancellations: σ² drops out)
  = (b2 - β2) / √( σ̂² / Σ (xi - x̄)² )
1.54
t = (b2 - β2) / √( σ̂² / Σ (xi - x̄)² ) = (b2 - β2) / se(b2)
where se(b2) = √v̂ar(b2) and v̂ar(b2) = σ̂² / Σ (xi - x̄)²
1.55 Student t-statistic
t = (b2 - β2) / se(b2) ~ t(T-2)
t has a Student-t distribution with T - 2 degrees of freedom.
1.56 The Least Squares Predictor, ŷo
Given a value of the explanatory variable, xo, we would like to predict a value of the dependent variable, yo. The least squares predictor is:
ŷo = b1 + b2 xo
Prediction error: f = ŷo - yo = (b1 - β1) + (b2 - β2)xo - εo
1.57
Prediction error: f = ŷo - yo = (b1 - β1) + (b2 - β2)xo - εo
E[f] = E[ŷo - yo] = 0
var(f) = σ² [ 1 + 1/T + (xo - x̄)² / Σ (xt - x̄)² ]
f ~ N[0, var(f)]
1.58 Prediction Intervals
A (1-α)×100% prediction interval for yo is:
ŷo ± t(T-2),α/2 se(f)
where f = ŷo - yo, se(f) = √v̂ar(f), and
v̂ar(f) = σ̂² [ 1 + 1/T + (xo - x̄)² / Σ (xt - x̄)² ]
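A sketch of this interval at a new value x0 (uses scipy's t quantile; assumes b1 and b2 from the earlier helper):

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, b1, b2, x0, alpha=0.05):
    """(1-alpha)*100% prediction interval for y0 at x = x0."""
    T = len(y)
    s2 = np.sum((y - b1 - b2 * x) ** 2) / (T - 2)             # σ̂²
    sxx = np.sum((x - x.mean()) ** 2)
    var_f = s2 * (1 + 1 / T + (x0 - x.mean()) ** 2 / sxx)     # v̂ar(f)
    y0_hat = b1 + b2 * x0
    half = stats.t.ppf(1 - alpha / 2, df=T - 2) * np.sqrt(var_f)
    return y0_hat - half, y0_hat + half
```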
1.59
The Least Squares Estimator of
Mean Response, ^
o
, when x = x0
^ = b + b x
o
1
2 o
Estimation error :
^ o  E[yo] = (b1 - 1) + (b2 - 2)x0
var( ^ 0) =  2
2

x

x

1 +
o
T
x t  x 2
1.60–1.61 [Figure] Mean response μ̂ and predicted response ŷ, shown at x = x0 and x = x1.
1.62 Explaining Variation in yt
Predicting yt without any explanatory variables:
yt = β1 + εt
Minimize Σ_{t=1}^{T} εt² = Σ_{t=1}^{T} (yt - β1)².
Setting the derivative to zero gives Σ_{t=1}^{T} (yt - b1) = 0, so Σ yt - T b1 = 0 and hence b1 = ȳ.
1.63–1.64 [Figure] Observed data points (y1, ..., y4) at (x1, ..., x4).
1.65 Explaining Variation in yt
yt = b1 + b2xt + ε̂t
Explained variation: ŷt = b1 + b2xt
Unexplained variation: ε̂t = yt - ŷt = yt - b1 - b2xt
1.66 [Figure] The data points, the fitted line ŷ = b1 + b2x, and the residuals ε̂1, ..., ε̂4.
1.67 Explaining Variation in yt
yt = ŷt + ε̂t
Using ȳ as the baseline:
yt - ȳ = (ŷt - ȳ) + ε̂t
Σ_{t=1}^{T} (yt - ȳ)² = Σ_{t=1}^{T} (ŷt - ȳ)² + Σ_{t=1}^{T} ε̂t²    (the cross-product term drops out)
SST = SSR + SSE
1.68 [Figure] The relationship among SST, SSR, and SSE: at each point, y - ȳ (SST) splits into (b1 + b2x) - ȳ (SSR) and y - (b1 + b2x) (SSE).
1.69 Total Variation in yt
SST = total sum of squares
SST measures the variation of yt around ȳ:
SST = Σ_{t=1}^{T} (yt - ȳ)² = Σ yt² - T ȳ²
1.70 Explained Variation in yt
SSR = regression sum of squares
Fitted yt values: ŷt = b1 + b2xt
SSR measures the variation of ŷt around ȳ:
SSR = Σ_{t=1}^{T} (ŷt - ȳ)² = b2² Σ (xt - x̄)²
1.71 Unexplained Variation in yt
SSE = error sum of squares
ε̂t = yt - ŷt = yt - b1 - b2xt
SSE measures the variation of yt around ŷt:
SSE = Σ_{t=1}^{T} (yt - ŷt)² = Σ_{t=1}^{T} ε̂t²
1.72 Analysis of Variance Table
Table 6.1 Analysis of Variance Table
Source of Variation   DF    Sum of Squares   Mean Square
Explained             1     SSR              MSR = SSR/1
Unexplained           T-2   SSE              MSE = SSE/(T-2)  [= σ̂²]
Total                 T-1   SST
1.73 Coefficient of Determination
What proportion of the variation in yt is explained?
0 ≤ R² ≤ 1
R² = SSR / SST
1.74 Coefficient of Determination
SST = SSR + SSE
Dividing by SST:
SST/SST = SSR/SST + SSE/SST
1 = SSR/SST + SSE/SST
R² = SSR/SST = 1 - SSE/SST
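A sketch of the sums-of-squares decomposition and R² (assumes fitted values computed from b1 and b2 as above):

```python
import numpy as np

def r_squared(x, y, b1, b2):
    """Return SST, SSR, SSE, and R^2 = SSR/SST = 1 - SSE/SST."""
    y_hat = b1 + b2 * x
    sst = np.sum((y - y.mean()) ** 2)       # total variation
    ssr = np.sum((y_hat - y.mean()) ** 2)   # explained variation
    sse = np.sum((y - y_hat) ** 2)          # unexplained variation
    return sst, ssr, sse, ssr / sst
```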
1.75 Coefficient of Determination
R² is only a descriptive measure.
R² does not measure the quality of the regression model.
Focusing solely on maximizing R² is not a good idea.
1.76
In simple linear regression models, there are two ways to test H0: β2 = 0 vs. HA: β2 ≠ 0:
1. Under H0, t = b2 / se(b2) ~ t(T-2)
2. Under H0, F = MSR / MSE ~ F(1, T-2)
Note that:
1. It can be shown that t²(T-2) = F(1, T-2)
2. F = MSR / MSE = R² / [(1 - R²) / (T - 2)]
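A quick numerical check of the equivalence t² = F for H0: β2 = 0 (a sketch using the helpers above):

```python
import numpy as np

def t_squared_equals_F(x, y, b1, b2):
    """Check that the squared t statistic for H0: β2 = 0 equals the ANOVA F statistic."""
    T = len(y)
    y_hat = b1 + b2 * x
    ssr = np.sum((y_hat - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    F = (ssr / 1) / (sse / (T - 2))                                # MSR / MSE
    se_b2 = np.sqrt((sse / (T - 2)) / np.sum((x - x.mean()) ** 2))
    t = b2 / se_b2                                                 # t for H0: β2 = 0
    return np.isclose(t ** 2, F)
```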
1.77 Regression Computer Output
Typical computer output of regression estimates:
Table 6.2 Computer Generated Least Squares Results
Variable    Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob>|T|
INTERCEPT   40.7676              22.1387          1.841                   0.0734
X           0.1283               0.0305           4.201                   0.0002
1.78 Regression Computer Output
b1 = 40.7676,  b2 = 0.1283
se(b1) = √v̂ar(b1) = √490.12 = 22.1387
se(b2) = √v̂ar(b2) = √0.0009326 = 0.0305
t = b1 / se(b1) = 40.7676 / 22.1387 = 1.84
t = b2 / se(b2) = 0.1283 / 0.0305 = 4.20
1.79 Regression Computer Output
Sources of variation in the dependent variable:
Table 6.3 Analysis of Variance Table
Source        DF   Sum of Squares   Mean Square
Explained     1    25221.2229       25221.2229
Unexplained   38   54311.3314       1429.2455
Total         39   79532.5544
R-square: 0.3171
1.80 Regression Computer Output
SST = Σ (yt - ȳ)² = 79532
SSR = Σ (ŷt - ȳ)² = 25221
SSE = Σ ε̂t² = 54311
SSE / (T - 2) = σ̂² = 1429.2455
R² = SSR / SST = 1 - SSE / SST = 0.317
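The reported figures can be verified with a few lines of arithmetic (values taken from Table 6.3, with T = 40):

```python
sst = 79532.5544
ssr = 25221.2229
sse = 54311.3314
T = 40

print(ssr + sse)      # ≈ 79532.55, matching SST
print(sse / (T - 2))  # ≈ 1429.2455 = σ̂²
print(ssr / sst)      # ≈ 0.3171 = R-square
```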
1.81 Reporting Regression Results
R² = 0.317
This R² value may seem low, but it is typical in studies involving cross-sectional data analyzed at the individual or micro level. A considerably higher R² value would be expected in studies involving time-series data analyzed at an aggregate or macro level.