
Chapter 4: Properties of the Least Squares Estimators
Least-squares estimators.
The calculus: unconstrained optimization and derivation (one variable); unconstrained optimization (two variables).
Properties. (we are here)
Inference.

Key concepts:
Parameters.
Estimates.
Bias.
Sampling variability.
Best linear unbiased estimation.
Simulation (MATLAB).
Properties of the Least Squares Estimators
Assumptions of the Simple Linear Regression Model
SR1.  $y_t = \beta_1 + \beta_2 x_t + e_t$
SR2.  $E(e_t) = 0 \;\Leftrightarrow\; E(y_t) = \beta_1 + \beta_2 x_t$
SR3.  $\mathrm{var}(e_t) = \sigma^2 = \mathrm{var}(y_t)$
SR4.  $\mathrm{cov}(e_i, e_j) = \mathrm{cov}(y_i, y_j) = 0$
SR5.  $x_t$ is not random and takes at least two values
SR6.  $e_t \sim N(0, \sigma^2)$  (optional)  $\;\Leftrightarrow\; y_t \sim N(\beta_1 + \beta_2 x_t,\ \sigma^2)$
4.1 The Least Squares Estimators as Random Variables
• The least squares estimator $b_2$ of the slope parameter $\beta_2$, based on a sample of T observations, is

$b_2 = \dfrac{T\sum x_t y_t - \sum x_t \sum y_t}{T\sum x_t^2 - \left(\sum x_t\right)^2}$   (3.3.8a)
• The least squares estimator $b_1$ of the intercept parameter $\beta_1$ is

$b_1 = \bar{y} - b_2 \bar{x}$   (3.3.8b)

where $\bar{y} = \sum y_t / T$ and $\bar{x} = \sum x_t / T$ are the sample means of the observations on y and x, respectively.
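As a concrete illustration, the following MATLAB sketch computes $b_2$ and $b_1$ directly from equations (3.3.8a) and (3.3.8b); the data vectors x and y are hypothetical placeholders, not data from the text.

% Minimal sketch: least squares estimates from equations (3.3.8a)-(3.3.8b).
% The vectors x and y stand in for any sample of T paired observations.
x = [3; 5; 7; 9; 11];          % hypothetical x values
y = [10; 14; 19; 25; 28];      % hypothetical y values
T = length(y);

b2 = (T*sum(x.*y) - sum(x)*sum(y)) / (T*sum(x.^2) - sum(x)^2)   % slope, (3.3.8a)
b1 = mean(y) - b2*mean(x)                                       % intercept, (3.3.8b)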
• When the formulas for b1 and b2 are taken to be rules that are used whatever the sample data turn out to be, then b1 and b2 are random variables. In this context we call b1 and b2 the least squares estimators.
• When actual sample values (numbers) are substituted into the formulas, we obtain numbers that are realized values of those random variables. In this context we call b1 and b2 the least squares estimates.
4.2 The Sampling Properties of the Least Squares Estimators
4.2.1 The Expected Values of b1 and b2
• We begin by rewriting the formula in equation 3.3.8a into the following one, which is more convenient for theoretical purposes:

$b_2 = \beta_2 + \sum w_t e_t$   (4.2.1)

where $w_t$ is a constant (non-random) given by

$w_t = \dfrac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}$   (4.2.2)

The expected value of a sum is the sum of the expected values (see Chapter 2.5.1):

$E(b_2) = E\left(\beta_2 + \sum w_t e_t\right) = E(\beta_2) + \sum E(w_t e_t)$
$= \beta_2 + \sum w_t E(e_t) = \beta_2$   [since $E(e_t) = 0$]
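This unbiasedness result can be checked numerically. A minimal MATLAB sketch, with assumed parameter values ($\beta_2 = 2$, $\sigma = 3$) and an assumed x series, draws many samples from the model and averages the slope estimates:

% Monte Carlo check that E(b2) = beta2, using the representation in (4.2.1).
% beta2, sigma, and the x values are assumptions chosen for illustration.
rng(0);
T = 40; beta2 = 2; sigma = 3;
x = (1:T)';                                   % fixed, non-random regressor (SR5)
w = (x - mean(x)) / sum((x - mean(x)).^2);    % the weights in (4.2.2)

R = 5000; b2 = zeros(R,1);
for r = 1:R
    e = sigma*randn(T,1);                     % errors satisfying SR2-SR4 (and SR6)
    b2(r) = beta2 + sum(w .* e);              % equation (4.2.1)
end
mean(b2)    % close to beta2 = 2, illustrating E(b2) = beta2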
4.2.1a The Repeated Sampling Context
Table 4.1 contains least squares estimates of the food expenditure model from 10 random samples of size T = 40 drawn from the same population.
Table 4.1  Least Squares Estimates from 10 Random Samples of Size T = 40

  n        b1        b2
  1    51.1314    0.1442
  2    61.2045    0.1286
  3    40.7882    0.1417
  4    80.1396    0.0886
  5    31.0110    0.1669
  6    54.3099    0.1086
  7    69.6749    0.1003
  8    71.1541    0.1009
  9    18.8290    0.1758
 10    36.1433    0.1626
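A table like this can be generated by simulation. A minimal MATLAB sketch draws 10 samples of size T = 40 and prints the estimates; the population parameters below are assumptions for illustration only, not the actual food expenditure population.

% Sketch of the repeated sampling experiment behind Table 4.1.
% beta1, beta2, sigma, and the x values are illustrative assumptions.
rng(1);
T = 40; beta1 = 40; beta2 = 0.13; sigma = 38;
x = linspace(200, 800, T)';          % fixed x values, held constant across samples

est = zeros(10, 2);
for n = 1:10
    y = beta1 + beta2*x + sigma*randn(T,1);
    X = [ones(T,1) x];
    est(n,:) = (X \ y)';             % least squares estimates [b1 b2]
end
disp([(1:10)' est])                  % columns: sample number, b1, b2

The estimates change from sample to sample even though the population is the same; that is the sampling variability the table illustrates.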
4.2.1b Derivation of Equation 4.2.1
 1

2
2
2
2
2
(
x

x
)

x

2
x
x

T
x

x

2
x
 t
t t
t
 T  xt   T x
 T

  xt2  2T x 2  T x 2   xt2  T x 2
(x  x )   x
2
t
2
t
 T x 2   xt2  x  xt   xt2
 x 

2
t
T
(4.2.4b)
To obtain this result we have used the fact that
x   xt / T , so  xt  T x .
 ( x  x )( y
t
t
 y )   xt yt  Tx y   xt yt
x y


t
t
T
(4.2.5)
b2 in deviation-from-the-mean form is:

$b_2 = \dfrac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2}$   (4.2.6)
• Recall that

$\sum (x_t - \bar{x}) = 0$   (4.2.7)
• Then the formula for b2 becomes

$b_2 = \dfrac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2} = \dfrac{\sum (x_t - \bar{x})y_t - \bar{y}\sum (x_t - \bar{x})}{\sum (x_t - \bar{x})^2}$
$= \dfrac{\sum (x_t - \bar{x})y_t}{\sum (x_t - \bar{x})^2} = \sum \left[\dfrac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}\right] y_t = \sum w_t y_t$   (4.2.8)
where wt is the constant given in equation 4.2.2.
To obtain equation 4.2.1, replace $y_t$ by $\beta_1 + \beta_2 x_t + e_t$ and simplify:

$b_2 = \sum w_t y_t = \sum w_t(\beta_1 + \beta_2 x_t + e_t) = \beta_1 \sum w_t + \beta_2 \sum w_t x_t + \sum w_t e_t$   (4.2.9a)

$\sum w_t = 0$, and this eliminates the term $\beta_1 \sum w_t$.

$\sum w_t x_t = 1$, so $\beta_2 \sum w_t x_t = \beta_2$, and (4.2.9a) simplifies to equation 4.2.1:

$b_2 = \beta_2 + \sum w_t e_t$   (4.2.9b)
The term $\sum w_t = 0$ because

$\sum w_t = \sum \left[\dfrac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}\right] = \dfrac{1}{\sum (x_t - \bar{x})^2}\sum (x_t - \bar{x}) = 0$   [using $\sum (x_t - \bar{x}) = 0$]

To show that $\sum w_t x_t = 1$ we again use $\sum (x_t - \bar{x}) = 0$.
Another expression for $\sum (x_t - \bar{x})^2$ is

$\sum (x_t - \bar{x})^2 = \sum (x_t - \bar{x})(x_t - \bar{x}) = \sum (x_t - \bar{x})x_t - \bar{x}\sum (x_t - \bar{x}) = \sum (x_t - \bar{x})x_t$

Consequently

$\sum w_t x_t = \dfrac{\sum (x_t - \bar{x})x_t}{\sum (x_t - \bar{x})^2} = \dfrac{\sum (x_t - \bar{x})x_t}{\sum (x_t - \bar{x})x_t} = 1$
4.2.2 The Variances and Covariance of b1 and b2
$\mathrm{var}(b_2) = E[b_2 - E(b_2)]^2$

If the regression model assumptions SR1-SR5 are correct (SR6 is not required), then the variances and covariance of b1 and b2 are:

$\mathrm{var}(b_1) = \sigma^2 \left[\dfrac{\sum x_t^2}{T\sum (x_t - \bar{x})^2}\right]$

$\mathrm{var}(b_2) = \dfrac{\sigma^2}{\sum (x_t - \bar{x})^2}$

$\mathrm{cov}(b_1, b_2) = \sigma^2 \left[\dfrac{-\bar{x}}{\sum (x_t - \bar{x})^2}\right]$   (4.2.10)
Let us consider the factors that affect the variances and covariance in equation 4.2.10; a numerical sketch follows this list.
1. The variance of the random error term, $\sigma^2$, appears in each of the expressions.
2. The sum of squares of the values of x about their sample mean, $\sum (x_t - \bar{x})^2$, appears in each of the variances and in the covariance.
3. The larger the sample size T, the smaller the variances and covariance of the least squares estimators; it is better to have more sample data than less.
4. The term $\sum x_t^2$ appears in $\mathrm{var}(b_1)$.
5. The sample mean of the x-values appears in $\mathrm{cov}(b_1, b_2)$.
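The MATLAB sketch below, using assumed values for $\sigma^2$ and for three x series, shows how a wider spread in x and a larger T both reduce $\mathrm{var}(b_2) = \sigma^2/\sum(x_t - \bar{x})^2$:

% Sketch: how the factors in (4.2.10) move var(b2) = sigma^2 / sum((x-xbar).^2).
% The parameter values below are illustrative assumptions, not data from the text.
sigma2 = 4;
x_narrow = (1:20)';          % small spread, small T
x_wide   = (1:2:40)';        % same T, wider spread
x_long   = (1:40)';          % larger T

varb2 = @(x) sigma2 / sum((x - mean(x)).^2);
[varb2(x_narrow), varb2(x_wide), varb2(x_long)]
% a wider spread in x and a larger T both give a smaller var(b2)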
Deriving the variance of b2: the starting point is equation 4.2.1.

$\mathrm{var}(b_2) = \mathrm{var}\left(\beta_2 + \sum w_t e_t\right) = \mathrm{var}\left(\sum w_t e_t\right)$   [since $\beta_2$ is a constant]
$= \sum w_t^2 \,\mathrm{var}(e_t)$   [using $\mathrm{cov}(e_i, e_j) = 0$]
$= \sigma^2 \sum w_t^2$   [using $\mathrm{var}(e_t) = \sigma^2$]
$= \dfrac{\sigma^2}{\sum (x_t - \bar{x})^2}$   (4.2.11)
The very last step uses the fact that

$\sum w_t^2 = \sum \left[\dfrac{x_t - \bar{x}}{\sum (x_t - \bar{x})^2}\right]^2 = \dfrac{\sum (x_t - \bar{x})^2}{\left[\sum (x_t - \bar{x})^2\right]^2} = \dfrac{1}{\sum (x_t - \bar{x})^2}$   (4.2.12)
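The variance formula can be checked against simulation. A small MATLAB sketch (assumed parameter values) compares the Monte Carlo variance of b2 with $\sigma^2/\sum(x_t - \bar{x})^2$:

% Sketch: Monte Carlo variance of b2 versus the formula sigma^2/sum((x-xbar).^2).
% beta1, beta2, sigma, and the x values are illustrative assumptions.
rng(2);
T = 40; beta1 = 10; beta2 = 2; sigma = 3;
x = (1:T)';
w = (x - mean(x)) / sum((x - mean(x)).^2);

R = 20000; b2 = zeros(R,1);
for r = 1:R
    b2(r) = sum(w .* (beta1 + beta2*x + sigma*randn(T,1)));   % b2 = sum(w_t y_t)
end
[var(b2), sigma^2/sum((x - mean(x)).^2)]   % the two numbers should be close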
4.2.3 Linear Estimators
• The least squares estimator $b_2$ is a weighted sum of the observations $y_t$: $b_2 = \sum w_t y_t$.
• Estimators like $b_2$, which are linear combinations of an observable random variable, are called linear estimators.
4.3 The Gauss-Markov Theorem
Gauss-Markov Theorem: Under the assumptions SR1-SR5 of the linear regression model, the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of $\beta_1$ and $\beta_2$. They are the Best Linear Unbiased Estimators (BLUE) of $\beta_1$ and $\beta_2$.
1. The estimators b1 and b2 are "best" when compared to similar estimators, namely those that are linear and unbiased. The theorem does not say that b1 and b2 are the best of all possible estimators.
2. The estimators b1 and b2 are best within their class because they have the minimum variance (illustrated in the sketch after this list).
3. For the Gauss-Markov theorem to hold, the assumptions SR1-SR5 must be true. If any of these assumptions is not true, then b1 and b2 are not the best linear unbiased estimators of $\beta_1$ and $\beta_2$.
4. The Gauss-Markov theorem does not depend on the assumption of normality (SR6).
5. In the simple linear regression model, if we want to use a linear and unbiased estimator, then we have to do no more searching.
6. The Gauss-Markov theorem applies to the least squares estimators. It does not apply to the least squares estimates from a single sample.
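A minimal MATLAB sketch of the theorem, with assumed parameter values: an alternative linear unbiased estimator of $\beta_2$ (the slope through the first and last observations) is compared with the least squares estimator b2 by repeated sampling.

% Sketch of the Gauss-Markov result by simulation (assumed parameter values).
rng(0);
T = 40; beta1 = 10; beta2 = 2; sigma = 3;
x = (1:T)';
w = (x - mean(x)) / sum((x - mean(x)).^2);                     % least squares weights (4.2.2)
k = zeros(T,1); k(1) = -1/(x(T)-x(1)); k(T) = 1/(x(T)-x(1));   % alternative weights
% both weight vectors satisfy sum = 0 and sum(weights.*x) = 1, so both estimators are unbiased

R = 5000; b2 = zeros(R,1); b2star = zeros(R,1);
for r = 1:R
    y = beta1 + beta2*x + sigma*randn(T,1);
    b2(r)     = sum(w .* y);
    b2star(r) = sum(k .* y);
end
[var(b2), var(b2star)]    % the least squares estimator has the smaller variance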
Proof of the Gauss-Markov Theorem:
• Let $b_2^* = \sum k_t y_t$ (where the $k_t$ are constants) be any other linear estimator of $\beta_2$.
• Suppose that $k_t = w_t + c_t$, where $c_t$ is another constant and $w_t$ is given in equation 4.2.2.
• Into this new estimator substitute $y_t$ and simplify, using the properties of $w_t$ in equation 4.2.9:

$b_2^* = \sum k_t y_t = \sum (w_t + c_t) y_t = \sum (w_t + c_t)(\beta_1 + \beta_2 x_t + e_t)$
$= \beta_1\sum (w_t + c_t) + \beta_2\sum (w_t + c_t) x_t + \sum (w_t + c_t) e_t$
$= \beta_1\sum w_t + \beta_1\sum c_t + \beta_2\sum w_t x_t + \beta_2\sum c_t x_t + \sum (w_t + c_t) e_t$
$= \beta_1\sum c_t + \beta_2 + \beta_2\sum c_t x_t + \sum (w_t + c_t) e_t$   (4.3.1)

since $\sum w_t = 0$ and $\sum w_t x_t = 1$.
Hence:

$E(b_2^*) = \beta_1\sum c_t + \beta_2 + \beta_2\sum c_t x_t + \sum (w_t + c_t)E(e_t)$
$= \beta_1\sum c_t + \beta_2 + \beta_2\sum c_t x_t$   (4.3.2)
In order for the linear estimator $b_2^* = \sum k_t y_t$ to be unbiased, it must be true that

$\sum c_t = 0$  and  $\sum c_t x_t = 0$   (4.3.3)

These conditions must hold in order for $b_2^* = \sum k_t y_t$ to be in the class of linear and unbiased estimators.
• So we will assume the conditions (4.3.3) hold and use them to simplify expression (4.3.1):

$b_2^* = \sum k_t y_t = \beta_2 + \sum (w_t + c_t) e_t$   (4.3.4)
We can now find the variance of the linear unbiased estimator $b_2^*$ following the steps in equation 4.2.11 and using the additional fact that

$\sum c_t w_t = \sum \left[\dfrac{c_t(x_t - \bar{x})}{\sum (x_t - \bar{x})^2}\right] = \dfrac{1}{\sum (x_t - \bar{x})^2}\sum c_t x_t - \dfrac{\bar{x}}{\sum (x_t - \bar{x})^2}\sum c_t = 0$
Use the properties of variance to obtain:

$\mathrm{var}(b_2^*) = \mathrm{var}\left(\beta_2 + \sum (w_t + c_t) e_t\right) = \sum (w_t + c_t)^2\,\mathrm{var}(e_t)$
$= \sigma^2\sum (w_t + c_t)^2 = \sigma^2\sum w_t^2 + \sigma^2\sum c_t^2$   [using $\sum c_t w_t = 0$]
$= \mathrm{var}(b_2) + \sigma^2\sum c_t^2$
$\geq \mathrm{var}(b_2)$   since $\sigma^2\sum c_t^2 \geq 0$
4.4 The Probability Distribution of the Least Squares Estimators
• If we make the normality assumption, assumption SR6 about the error term, then the least squares estimators are normally distributed:

$b_1 \sim N\left(\beta_1,\ \dfrac{\sigma^2\sum x_t^2}{T\sum (x_t - \bar{x})^2}\right)$

$b_2 \sim N\left(\beta_2,\ \dfrac{\sigma^2}{\sum (x_t - \bar{x})^2}\right)$   (4.4.1)
• If assumptions SR1-SR5 hold, and if the sample size T is sufficiently large, then the least squares estimators have distributions that approximate the normal distributions shown in equation 4.4.1.
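This approximate normality can be seen in a simulation. A minimal MATLAB sketch, with assumed parameter values and uniform (non-normal) errors that satisfy SR1-SR5 but not SR6, shows that the sampling distribution of b2 is still roughly bell-shaped:

% Sketch of the approximate normality of b2 when the errors are not normal.
% beta1, beta2, the x values, and the uniform errors are illustrative assumptions.
rng(0);
T = 100; beta1 = 10; beta2 = 2;
x = (1:T)';
w = (x - mean(x)) / sum((x - mean(x)).^2);

R = 10000; b2 = zeros(R,1);
for r = 1:R
    e = rand(T,1) - 0.5;            % uniform errors with mean zero
    y = beta1 + beta2*x + e;
    b2(r) = sum(w .* y);
end
histogram(b2, 50)                   % roughly bell-shaped around beta2 = 2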
4.5 Estimating the Variance of the Error Term
The variance of the random variable $e_t$ is

$\mathrm{var}(e_t) = \sigma^2 = E[e_t - E(e_t)]^2 = E(e_t^2)$   (4.5.1)

if the assumption $E(e_t) = 0$ is correct.
Since the “expectation” is an average value, we might consider estimating $\sigma^2$ as the average of the squared errors,

$\hat{\sigma}^2 = \dfrac{\sum e_t^2}{T}$
But we don’t observe the errors, only the residuals.
The least squares residuals are obtained by replacing the
unknown parameters by their least squares estimators,
$\hat{e}_t = y_t - b_1 - b_2 x_t$
This leads to:
$\hat{\sigma}^2 = \dfrac{\sum \hat{e}_t^2}{T}$   (4.5.3)
There is a simple modification that produces an unbiased estimator, and that is

$\hat{\sigma}^2 = \dfrac{\sum \hat{e}_t^2}{T - 2}$,  with  $E(\hat{\sigma}^2) = \sigma^2$   (4.5.4)
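A short MATLAB sketch of this computation, reusing the same kind of hypothetical data as in the earlier snippet:

% Sketch: unbiased estimate of the error variance from the residuals, (4.5.4).
% x and y are hypothetical data; b1 and b2 are the least squares estimates.
x = [3; 5; 7; 9; 11];
y = [10; 14; 19; 25; 28];
T = length(y);
b2 = (T*sum(x.*y) - sum(x)*sum(y)) / (T*sum(x.^2) - sum(x)^2);
b1 = mean(y) - b2*mean(x);

ehat      = y - b1 - b2*x;             % least squares residuals
sigma2hat = sum(ehat.^2) / (T - 2)     % divide by T - 2, not T, for unbiasedness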
4.5.1 Estimating the Variances and Covariances of the Least Squares Estimators
• Replace the unknown error variance $\sigma^2$ in equation 4.2.10 by its estimator to obtain:

$\widehat{\mathrm{var}}(b_1) = \hat{\sigma}^2\left[\dfrac{\sum x_t^2}{T\sum (x_t - \bar{x})^2}\right]$,   $\mathrm{se}(b_1) = \sqrt{\widehat{\mathrm{var}}(b_1)}$

$\widehat{\mathrm{var}}(b_2) = \dfrac{\hat{\sigma}^2}{\sum (x_t - \bar{x})^2}$,   $\mathrm{se}(b_2) = \sqrt{\widehat{\mathrm{var}}(b_2)}$

$\widehat{\mathrm{cov}}(b_1, b_2) = \hat{\sigma}^2\left[\dfrac{-\bar{x}}{\sum (x_t - \bar{x})^2}\right]$   (4.6.6)
4.6.2 The Estimated Variances and Covariances for the Food Expenditure Example
Table 4.2  Least Squares Residuals for Food Expenditure Data

      y       ŷ = b1 + b2·x      ê = y − ŷ
   52.25         73.9045         −21.6545
   58.32         84.7834         −26.4634
   81.79         95.2902         −13.5002
  119.90        100.7424          19.1576
  125.80        102.7181          23.0819
$\hat{\sigma}^2 = \dfrac{\sum \hat{e}_t^2}{T - 2} = \dfrac{54311.3315}{38} = 1429.2456$

$\widehat{\mathrm{var}}(b_1) = \hat{\sigma}^2\left[\dfrac{\sum x_t^2}{T\sum (x_t - \bar{x})^2}\right] = 1429.2456\left[\dfrac{21020623}{40(1532463)}\right] = 490.1200$

$\mathrm{se}(b_1) = \sqrt{\widehat{\mathrm{var}}(b_1)} = \sqrt{490.1200} = 22.1387$

$\widehat{\mathrm{var}}(b_2) = \dfrac{\hat{\sigma}^2}{\sum (x_t - \bar{x})^2} = \dfrac{1429.2456}{1532463} = 0.0009326$

$\mathrm{se}(b_2) = \sqrt{\widehat{\mathrm{var}}(b_2)} = \sqrt{0.0009326} = 0.0305$

$\widehat{\mathrm{cov}}(b_1, b_2) = \hat{\sigma}^2\left[\dfrac{-\bar{x}}{\sum (x_t - \bar{x})^2}\right] = 1429.2456\left[\dfrac{-698}{1532463}\right] = -0.6510$
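These calculations can be reproduced in MATLAB directly from the summary statistics reported above ($\hat{\sigma}^2 = 1429.2456$, $\sum x_t^2 = 21020623$, $\sum(x_t - \bar{x})^2 = 1532463$, $\bar{x} = 698$, T = 40):

% Reproduce the food expenditure variance and standard error calculations
% from the summary statistics reported above.
sigma2hat = 1429.2456;   % estimated error variance
T         = 40;
sumx2     = 21020623;    % sum of x_t^2
ssx       = 1532463;     % sum of (x_t - xbar)^2
xbar      = 698;         % sample mean of x

varb1   = sigma2hat * sumx2 / (T*ssx);    % 490.1200
seb1    = sqrt(varb1);                    % 22.1387
varb2   = sigma2hat / ssx;                % 0.0009326
seb2    = sqrt(varb2);                    % 0.0305
covb1b2 = -sigma2hat * xbar / ssx;        % -0.6510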
On the issue of understanding sampling variability, there are two important exercises. Some experiments with MATLAB will follow in the laboratory. There we will use the least-squares estimator for the model $y_1, y_2, \ldots, y_N \sim$ iid $N(\mu, \sigma^2)$.
Hence, prove, using the Gauss-Markov theorem, that the statistic

$\bar{y} = \dfrac{\sum y_i}{N}$

is BLUE.
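As a warm-up for the laboratory, a minimal MATLAB sketch (assumed values $\mu = 5$, $\sigma = 2$) compares the sampling variability of the sample mean with that of another linear unbiased estimator of $\mu$, a weighted average whose weights still sum to one:

% Sketch for the laboratory exercise: the sample mean versus an alternative
% linear unbiased estimator of mu. mu, sigma, and the weights are assumptions.
rng(0);
N = 30; mu = 5; sigma = 2;
a = linspace(0.5, 1.5, N)'; a = a / sum(a);   % unequal weights, sum(a) = 1

R = 5000; ybar = zeros(R,1); ytilde = zeros(R,1);
for r = 1:R
    y = mu + sigma*randn(N,1);
    ybar(r)   = mean(y);        % the sample mean
    ytilde(r) = sum(a .* y);    % an alternative linear unbiased estimator
end
[var(ybar), var(ytilde)]        % the sample mean has the smaller variance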