PowerPoint Transcript

Distribution of Estimates and Multivariate Regression
Lecture XXIX

Models and Distributional Assumptions

The conditional normal model assumes that the observed random variables are distributed

y_i \sim N(\alpha + \beta x_i, \sigma^2)

Thus, E[y_i | x_i] = \alpha + \beta x_i and the variance of y_i equals \sigma^2. The conditional normal model can be expressed as

y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)

Further, the \varepsilon_i are independently and identically distributed (consistent with our BLUE proof).

Given this formulation, the likelihood function for the simple linear model can be written:

L(\alpha, \beta, \sigma^2 \mid x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2} \right]

Taking the log of this likelihood function yields:

\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2

As discussed in Lecture XVII, this likelihood function can be concentrated so that

\ln L = -\frac{n}{2}\ln\hat{\sigma}^2 - \frac{n}{2}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2

Thus the least squares estimators are also maximum likelihood estimators if the error terms are normal.
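As a quick numerical check (not part of the original slides), the sketch below maximizes the concentrated log-likelihood on simulated data and compares the result to the closed-form least squares estimates; the data-generating values, numpy, and scipy are assumptions of the example.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data (illustrative values, not from the lecture)
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.5, size=n)   # alpha = 2, beta = 0.5, sigma = 1.5

def neg_concentrated_loglik(params):
    """Negative of ln L = -(n/2) ln(sigma_hat^2) - n/2."""
    a, b = params
    resid = y - a - b * x
    sigma2_hat = np.mean(resid ** 2)        # (1/n) * sum of squared residuals
    return 0.5 * n * np.log(sigma2_hat) + 0.5 * n

mle = minimize(neg_concentrated_loglik, x0=[0.0, 0.0]).x

# Closed-form least squares estimates for comparison
Sxx = np.sum((x - x.mean()) ** 2)
b_ols = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
a_ols = y.mean() - b_ols * x.mean()

print("MLE:", mle)                 # matches OLS up to optimizer tolerance
print("OLS:", [a_ols, b_ols])
```
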
 Proof of the variance of b can be derived
from the Gauss-Markov results. Note from last lecture:

\hat{\beta} = \sum_{i=1}^{n} d_i y_i = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{S_{xx}} (\alpha + \beta x_i + \varepsilon_i) = \alpha \sum_{i=1}^{n} d_i + \beta \sum_{i=1}^{n} d_i x_i + \sum_{i=1}^{n} d_i \varepsilon_i

Remember that the objective function of the minimization problem that we solved to get these results was the variance of the estimate:

V(\hat{\beta}) = \sigma^2 \sum_{i=1}^{n} d_i^2
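Two properties of the weights d_i = (x_i - \bar{x})/S_{xx}, implicit in the lecture, justify this expression:

\sum_{i=1}^{n} d_i = \frac{\sum_{i=1}^{n}(x_i - \bar{x})}{S_{xx}} = 0, \qquad \sum_{i=1}^{n} d_i x_i = \frac{\sum_{i=1}^{n}(x_i - \bar{x})x_i}{S_{xx}} = \frac{S_{xx}}{S_{xx}} = 1

so that \hat{\beta} = \beta + \sum_{i=1}^{n} d_i \varepsilon_i and V(\hat{\beta}) = V\left(\sum_{i=1}^{n} d_i \varepsilon_i\right) = \sigma^2 \sum_{i=1}^{n} d_i^2.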

This assumes that the errors are independently distributed. Thus, substituting the final result for d_i into this expression yields:

V(\hat{\beta}) = \sigma^2 \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{S_{xx}^2} = \sigma^2 \frac{S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}

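A small Monte Carlo sketch can confirm this variance formula numerically; the sample size, parameter values, and use of numpy are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Monte Carlo check of V(beta_hat) = sigma^2 / S_xx
rng = np.random.default_rng(0)
n, alpha, beta, sigma = 50, 1.0, 2.0, 3.0
x = rng.uniform(0.0, 10.0, size=n)          # fixed regressors across replications
Sxx = np.sum((x - x.mean()) ** 2)

reps = 20_000
b_hats = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0.0, sigma, size=n)
    y = alpha + beta * x + eps
    b_hats[r] = np.sum((x - x.mean()) * (y - y.mean())) / Sxx

print("simulated variance :", b_hats.var())
print("sigma^2 / S_xx     :", sigma ** 2 / Sxx)   # the two should be close
```
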
Multivariate Regression Models

In general, the multivariate relationship can be written in matrix form as:

y = \begin{bmatrix} 1 & x_1 & x_2 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \beta_0 + \beta_1 x_1 + \beta_2 x_2

If we expand the system to three observations, this system becomes:

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} =
\begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} =
\begin{bmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} \\ \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} \end{bmatrix}

where x_{ij} denotes the value of the j-th regressor for observation i.

Expanding the exactly identified model to a fourth observation, we get:

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} =
\begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}

In matrix form this can be expressed as

y = X\beta + \varepsilon

The sum of squared errors can then be written as:

SSE = (y - \hat{y})'(y - \hat{y}) = (y - Xb)'(y - Xb) = (y' - b'X')(y - Xb)

A little matrix calculus is a dangerous thing:

d\,SSE = d(y - Xb)'(y - Xb) + (y - Xb)'\,d(y - Xb) = -db'X'(y - Xb) - (y - Xb)'X\,db

Note that each term on the right-hand side is a scalar. Since the transpose of a scalar is itself, the two terms are equal and the differential can be rewritten as:

d\,SSE = -2(y - Xb)'X\,db

\frac{d\,SSE}{db} = -2(y - Xb)'X = 0
y'X - b'X'X = 0
y'X = b'X'X
X'y = X'Xb
(X'X)^{-1}X'y = b

Variance of the estimated parameters

The variance of the parameter matrix can be written as:

V(\hat{\beta}) = E\left[ (\hat{\beta} - \beta)(\hat{\beta} - \beta)' \right]

Given y = X\beta + \varepsilon,

\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon = \beta + (X'X)^{-1}X'\varepsilon

Substituting this back into the variance relationship yields:

V(\hat{\beta}) = E\left[ (X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \right]

Note that E[\varepsilon\varepsilon'] = \sigma^2 I, therefore

V(\hat{\beta}) = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}

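A minimal sketch of how this covariance matrix is computed in practice, assuming numpy and illustrative simulated data; note that it replaces \sigma^2 with the usual degrees-of-freedom-corrected estimate s^2 = e'e/(n - k), which differs from the maximum likelihood estimate that divides by n.

```python
import numpy as np

# Estimated covariance matrix of beta_hat: s^2 (X'X)^{-1}
rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(0.0, 2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)           # unbiased estimate of sigma^2

V_beta_hat = s2 * XtX_inv              # sigma^2 (X'X)^{-1} with sigma^2 replaced by s^2
print(np.sqrt(np.diag(V_beta_hat)))    # standard errors of the estimated coefficients
```
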
Theorem 12.2.1 (Gauss-Markov): Let b* = C'y where C is a T × K constant matrix such that C'X = I. Then \hat{\beta} is better than b* if b* \neq \hat{\beta}.

b* = \beta + C'\varepsilon

◦ This choice of C guarantees that the estimator b* is an unbiased estimator of \beta. The variance of b* can then be written as:

V(b*) = E[C'\varepsilon\varepsilon'C] = C'E[\varepsilon\varepsilon']C = \sigma^2 C'C

To complete the proof, we want to add a special form of zero. Specifically, we want to add \sigma^2(X'X)^{-1} - \sigma^2(X'X)^{-1} = 0:

V(b*) = \sigma^2(X'X)^{-1} + \sigma^2 C'C - \sigma^2(X'X)^{-1}

Focusing on the last terms, we note that by the orthogonality conditions for the C matrix,

\sigma^2 C'C - \sigma^2(X'X)^{-1} = \sigma^2 \left[ C' - (X'X)^{-1}X' \right]\left[ C - X(X'X)^{-1} \right]

V(b*) = \sigma^2(X'X)^{-1} + \sigma^2 \left[ C' - (X'X)^{-1}X' \right]\left[ C - X(X'X)^{-1} \right]

To verify this, define Z = C - X(X'X)^{-1} and use the orthogonality conditions C'X = X'C = I:

Z'Z = \left[ C - X(X'X)^{-1} \right]'\left[ C - X(X'X)^{-1} \right] = C'C - C'X(X'X)^{-1} - (X'X)^{-1}X'C + (X'X)^{-1}X'X(X'X)^{-1} = C'C - (X'X)^{-1}

so that \sigma^2 Z'Z = \sigma^2 C'C - \sigma^2(X'X)^{-1}.

Substituting backwards,

\sigma^2 C'C - \sigma^2(X'X)^{-1} = \sigma^2 \left[ C'C - C'X(X'X)^{-1} - (X'X)^{-1}X'C + (X'X)^{-1}X'X(X'X)^{-1} \right] = \sigma^2 \left[ C - X(X'X)^{-1} \right]'\left[ C - X(X'X)^{-1} \right]

V b
*
Thus,
 s X 'X 
2

1


1
1

s
C '  X ' X  X ' C  X  X ' X  


2

The minimum variance estimator is therefore obtained with C = X(X'X)^{-1}, so that b* = C'y = (X'X)^{-1}X'y, which is the ordinary least squares estimator.

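The following sketch illustrates the theorem numerically; the particular construction of the alternative C matrix (adding a component orthogonal to the columns of X) is an illustrative assumption, not taken from the lecture.

```python
import numpy as np

# Gauss-Markov illustration: any C with C'X = I gives an unbiased estimator C'y,
# but its covariance sigma^2 C'C is never "smaller" than sigma^2 (X'X)^{-1}.
rng = np.random.default_rng(3)
n, k, sigma2 = 30, 3, 4.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

C_ols = X @ np.linalg.inv(X.T @ X)                 # OLS choice: C = X (X'X)^{-1}

# An alternative C satisfying C'X = I: add a component orthogonal to the columns of X
D = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # projects off the column space of X
C_alt = C_ols + M @ D                              # then C_alt' X = C_ols' X = I

V_ols = sigma2 * C_ols.T @ C_ols                   # equals sigma^2 (X'X)^{-1}
V_alt = sigma2 * C_alt.T @ C_alt

# The difference V_alt - V_ols should be positive semidefinite (eigenvalues >= 0)
print(np.linalg.eigvalsh(V_alt - V_ols).min() >= -1e-10)
print(np.diag(V_alt) >= np.diag(V_ols))            # each coefficient variance is at least as large
```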