Transcript PowerPoint
Distribution of Estimates and
Multivariate Regression
Lecture XXIX
Models and Distributional Assumptions
The conditional normal model assumes that the observed random variables are distributed
$$y_i \sim N\left(\alpha + \beta x_i,\ \sigma^2\right)$$
Thus, $E[y_i \mid x_i] = \alpha + \beta x_i$ and the variance of $y_i$ equals $\sigma^2$. The conditional normal can be expressed as
$$y_i = \alpha + \beta x_i + \epsilon_i, \qquad \epsilon_i \sim N\left(0,\ \sigma^2\right)$$
Further, the $\epsilon_i$ are independently and identically distributed (consistent with our BLUE proof).
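A minimal numerical sketch of this model, assuming purely illustrative values $\alpha = 2$, $\beta = 0.5$, $\sigma = 1$ and simulated regressors (none of these numbers come from the lecture):

```python
import numpy as np

# Illustrative parameter values, chosen only for this sketch
alpha, beta, sigma = 2.0, 0.5, 1.0

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=200)            # fixed regressors
eps = rng.normal(0.0, sigma, size=x.size)       # eps_i ~ N(0, sigma^2), i.i.d.
y = alpha + beta * x + eps                      # y_i | x_i ~ N(alpha + beta*x_i, sigma^2)

# The sample mean of y should be close to the mean of the conditional means
print(y.mean(), (alpha + beta * x).mean())
```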
Given this formulation, the likelihood
function for the simple linear model can
be written:
$$L\left(\alpha, \beta, \sigma^2 \mid x\right) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{\left(y_i - \alpha - \beta x_i\right)^2}{2\sigma^2}\right]$$
Taking the log of this likelihood function
yields:
$$\ln L = -\frac{n}{2}\ln\left(2\pi\right) - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \alpha - \beta x_i\right)^2$$
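The log-likelihood can be checked numerically. The sketch below assumes simulated data and evaluates the expression at the parameter values used to generate the data, confirming that it matches the sum of the normal log-densities:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma2 = 2.0, 0.5, 1.0              # assumed values, illustration only
x = rng.uniform(0.0, 10.0, 100)
y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), x.size)

n = y.size
resid = y - alpha - beta * x

# ln L = -(n/2) ln(2 pi) - (n/2) ln sigma^2 - sum(resid^2)/(2 sigma^2)
loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2) - resid @ resid / (2 * sigma2)

# The same value obtained by summing the normal log-density observation by observation
check = np.sum(-0.5 * np.log(2 * np.pi * sigma2) - resid ** 2 / (2 * sigma2))
print(np.isclose(loglik, check))                 # True
```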
As discussed in Lecture XVII, this likelihood function can be concentrated so that
$$\ln L = -\frac{n}{2}\ln \hat{\sigma}^2 - \frac{n}{2}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \alpha - \beta x_i\right)^2$$
Thus, the least squares estimators are also maximum likelihood estimators if the error terms are normal.
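A quick numerical illustration of this equivalence, assuming simulated data: the least squares estimates should maximize the concentrated log-likelihood, so perturbing them can only lower it.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)     # illustrative data

n = x.size

def conc_loglik(a, b):
    """Concentrated log-likelihood -(n/2) ln(sigma_hat^2) - n/2."""
    s2 = np.mean((y - a - b * x) ** 2)
    return -0.5 * n * np.log(s2) - 0.5 * n

# Least squares estimates
b_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_ols = y.mean() - b_ols * x.mean()

print(conc_loglik(a_ols, b_ols) >= conc_loglik(a_ols + 0.1, b_ols))    # True
print(conc_loglik(a_ols, b_ols) >= conc_loglik(a_ols, b_ols + 0.05))   # True
```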
The variance of $\hat{\beta}$ can be derived from the Gauss-Markov results. Note from the last lecture:
$$\hat{\beta} = \sum_{i=1}^{n} d_i y_i = \sum_{i=1}^{n} d_i\left(\alpha + \beta x_i + \epsilon_i\right), \qquad d_i = \frac{x_i - \bar{x}}{S_{xx}}$$
$$= \alpha\sum_{i=1}^{n} d_i + \beta\sum_{i=1}^{n} d_i x_i + \sum_{i=1}^{n} d_i \epsilon_i$$
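The weights $d_i$ satisfy $\sum_i d_i = 0$ and $\sum_i d_i x_i = 1$, which is what collapses this decomposition to $\hat{\beta} = \beta + \sum_i d_i \epsilon_i$. A short check with simulated, purely illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)     # illustrative data

S_xx = np.sum((x - x.mean()) ** 2)
d = (x - x.mean()) / S_xx                            # d_i weights

# sum(d_i) = 0 and sum(d_i x_i) = 1
print(np.isclose(d.sum(), 0.0), np.isclose(d @ x, 1.0))

# beta_hat = sum(d_i y_i) reproduces the usual least squares slope
beta_hat = d @ y
slope = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
print(np.isclose(beta_hat, slope))                   # True
```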
Remember that the objective function of the minimization problem that we solved to get these results was the variance of the estimate:
$$V\left(\hat{\beta}\right) = \sigma^2 \sum_{i=1}^{n} d_i^2$$
This assumes that the errors are
independently distributed. Thus,
substituting the final result for di into this
expression yields:
$$V\left(\hat{\beta}\right) = \sigma^2 \sum_{i=1}^{n} \frac{\left(x_i - \bar{x}\right)^2}{S_{xx}^2} = \sigma^2\,\frac{S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}$$
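A Monte Carlo sketch of this result, holding an arbitrarily chosen design fixed and redrawing the errors (all numbers illustrative): the sampling variance of the slope should be close to $\sigma^2 / S_{xx}$.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 30)                       # fixed design, illustrative values
alpha, beta, sigma2 = 2.0, 0.5, 1.0
S_xx = np.sum((x - x.mean()) ** 2)

reps = 20000
slopes = np.empty(reps)
for r in range(reps):
    y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), x.size)
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / S_xx

print(slopes.var(), sigma2 / S_xx)                   # the two should be close
```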
Multivariate Regression Models
In general, the multivariate relationship
can be written in matrix form as:
$$y = \begin{bmatrix} 1 & x_1 & x_2 \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
If we expand the system to three
observations, this system becomes:
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{21} \\ \beta_0 + \beta_1 x_{12} + \beta_2 x_{22} \\ \beta_0 + \beta_1 x_{13} + \beta_2 x_{23} \end{bmatrix}$$
where $x_{ji}$ denotes the $j$th explanatory variable for observation $i$.
Expanding the exactly identified model to four observations and adding the error terms, we get
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\ 1 & x_{14} & x_{24} \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}$$
In matrix form this can be expressed as
$$y = X\beta + \epsilon$$
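As a sketch of how this stacking looks numerically, assuming four hypothetical observations and arbitrary coefficient values:

```python
import numpy as np

# Hypothetical values for two explanatory variables over four observations
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])
beta = np.array([2.0, 0.5, -1.0])                    # assumed (beta_0, beta_1, beta_2)

# Stack a column of ones with the regressors to form X
X = np.column_stack([np.ones(4), x1, x2])

rng = np.random.default_rng(4)
eps = rng.normal(0.0, 1.0, 4)
y = X @ beta + eps                                   # y = X beta + eps
print(X)
print(y)
```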
The sum of squared errors can then be
written as:
$$SSE = \left(y - \hat{y}\right)'\left(y - \hat{y}\right) = \left(y - Xb\right)'\left(y - Xb\right) = \left(y' - b'X'\right)\left(y - Xb\right)$$
A little matrix calculus is a dangerous
thing
$$d\,SSE = d\left(y - Xb\right)'\left(y - Xb\right) + \left(y - Xb\right)'d\left(y - Xb\right) = -db'\,X'\left(y - Xb\right) - \left(y - Xb\right)'X\,db$$
Note that each term on the right-hand side is a scalar. Since the transpose of a scalar is itself, the differential can be rewritten as:
$$d\,SSE = -2\left(y - Xb\right)'X\,db$$
$$\frac{d\,SSE}{db} = -2\left(y - Xb\right)'X = 0$$
$$y'X - b'X'X = 0$$
$$y'X = b'X'X$$
$$X'y = X'Xb$$
$$b = \left(X'X\right)^{-1}X'y$$
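A short sketch of this solution in code, with simulated data and arbitrary true coefficients; solving the normal equations $X'Xb = X'y$ directly avoids forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
beta_true = np.array([2.0, 0.5, -1.0])               # assumed coefficients
y = X @ beta_true + rng.normal(0.0, 1.0, n)

# Normal equations: X'X b = X'y  =>  b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq minimizes the same sum of squared errors
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))                       # True
```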
Variance of the estimated parameters
The variance of the parameter matrix can
be written as:
$$V\left(\hat{\beta}\right) = E\left[\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)'\right]$$
With $y = X\beta + \epsilon$,
$$\hat{\beta} = \left(X'X\right)^{-1}X'y = \left(X'X\right)^{-1}X'\left(X\beta + \epsilon\right) = \left(X'X\right)^{-1}X'X\beta + \left(X'X\right)^{-1}X'\epsilon = \beta + \left(X'X\right)^{-1}X'\epsilon$$
so that $\hat{\beta} - \beta = \left(X'X\right)^{-1}X'\epsilon$.
Substituting this back into the variance
relationship yields:
$$V\left(\hat{\beta}\right) = E\left[\left(X'X\right)^{-1}X'\epsilon\epsilon'X\left(X'X\right)^{-1}\right]$$
Note that $E\left[\epsilon\epsilon'\right] = \sigma^2 I$, therefore
$$V\left(\hat{\beta}\right) = \left(X'X\right)^{-1}X'\,\sigma^2 I\,X\left(X'X\right)^{-1} = \sigma^2\left(X'X\right)^{-1}X'X\left(X'X\right)^{-1} = \sigma^2\left(X'X\right)^{-1}$$
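A numerical sketch of this covariance formula, holding a simulated design fixed and redrawing the errors (the design, coefficients, and replication count are all arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2 = 50, 1.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
beta = np.array([2.0, 0.5, -1.0])                    # assumed coefficients
XtX_inv = np.linalg.inv(X.T @ X)

# Theoretical covariance of the OLS estimator: sigma^2 (X'X)^{-1}
V_theory = sigma2 * XtX_inv

reps = 10000
est = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    est[r] = XtX_inv @ X.T @ y

# Largest discrepancy between the Monte Carlo covariance and the formula (should be small)
print(np.abs(np.cov(est, rowvar=False) - V_theory).max())
```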
Theorem 12.2.1 (Gauss-Markov). Let $b^* = C'y$ where $C$ is a $T \times K$ constant matrix such that $C'X = I$. Then $\hat{\beta}$ is better than $b^*$ if $b^* \neq \hat{\beta}$.
$$b^* = \beta + C'\epsilon$$
◦ This choice of $C$ guarantees that the estimator $b^*$ is an unbiased estimator of $\beta$.
The variance of b* can then be written as:
$$V\left(b^*\right) = E\left[C'\epsilon\epsilon'C\right] = C'E\left[\epsilon\epsilon'\right]C = \sigma^2 C'C$$
To complete the proof, we want to add a special form of zero. Specifically, we want to add $\sigma^2\left(X'X\right)^{-1} - \sigma^2\left(X'X\right)^{-1} = 0$.
$$V\left(b^*\right) = \sigma^2\left(X'X\right)^{-1} + \sigma^2 C'C - \sigma^2\left(X'X\right)^{-1}$$
Focusing on the last terms, we note that
by the orthogonality conditions for the C
matrix
$$\sigma^2 C'C - \sigma^2\left(X'X\right)^{-1} = \sigma^2\left(C' - \left(X'X\right)^{-1}X'\right)\left(C - X\left(X'X\right)^{-1}\right)$$
so that
$$V\left(b^*\right) = \sigma^2\left(X'X\right)^{-1} + \sigma^2\left(C' - \left(X'X\right)^{-1}X'\right)\left(C - X\left(X'X\right)^{-1}\right)$$
To verify this factorization, let $Z = C - X\left(X'X\right)^{-1}$, expand $Z'Z$, and apply the orthogonality condition $C'X = I$ for the $C$ matrix:
$$Z'Z = \left(C - X\left(X'X\right)^{-1}\right)'\left(C - X\left(X'X\right)^{-1}\right) = C'C - C'X\left(X'X\right)^{-1} - \left(X'X\right)^{-1}X'C + \left(X'X\right)^{-1}X'X\left(X'X\right)^{-1}$$
With $C'X = I$, the last three terms collapse to $-\left(X'X\right)^{-1}$, so
$$\sigma^2 Z'Z = \sigma^2 C'C - \sigma^2\left(X'X\right)^{-1}$$
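This cancellation can be checked numerically. The sketch below builds an arbitrary $C$ satisfying $C'X = I$ (a weighted construction chosen purely for illustration) and verifies that $Z'Z = C'C - \left(X'X\right)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
XtX_inv = np.linalg.inv(X.T @ X)

# Any C with C'X = I will do; this weighted choice is purely illustrative
W = np.diag(rng.uniform(0.5, 2.0, n))
C = W @ X @ np.linalg.inv(X.T @ W @ X)
print(np.allclose(C.T @ X, np.eye(3)))               # orthogonality condition C'X = I

Z = C - X @ XtX_inv
# The cross terms cancel, leaving Z'Z = C'C - (X'X)^{-1}
print(np.allclose(Z.T @ Z, C.T @ C - XtX_inv))       # True
```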
Substituting backwards,
$$\sigma^2 C'C - \sigma^2\left(X'X\right)^{-1} = \sigma^2\left[C'C - C'X\left(X'X\right)^{-1} - \left(X'X\right)^{-1}X'C + \left(X'X\right)^{-1}X'X\left(X'X\right)^{-1}\right] = \sigma^2\left(C - X\left(X'X\right)^{-1}\right)'\left(C - X\left(X'X\right)^{-1}\right)$$
Thus,
$$V\left(b^*\right) = \sigma^2\left(X'X\right)^{-1} + \sigma^2\left(C - X\left(X'X\right)^{-1}\right)'\left(C - X\left(X'X\right)^{-1}\right)$$
Since this last term is positive semidefinite, the minimum variance estimator is obtained at $C = X\left(X'X\right)^{-1}$, which is the ordinary least squares estimator.
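A closing numerical sketch of the Gauss-Markov result, again using an arbitrary weighted choice of $C$ with $C'X = I$ (purely illustrative): the difference $V\left(b^*\right) - V\left(\hat{\beta}\right)$ should be positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma2 = 40, 1.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])

# Alternative estimator b* = C'y with C'X = I (weighted construction, illustration only)
W = np.diag(rng.uniform(0.5, 2.0, n))
C = W @ X @ np.linalg.inv(X.T @ W @ X)

V_star = sigma2 * C.T @ C                            # variance of the alternative estimator
V_ols = sigma2 * np.linalg.inv(X.T @ X)              # variance of the OLS estimator

# The eigenvalues of V(b*) - V(beta_hat) should be (numerically) non-negative
print(np.linalg.eigvalsh(V_star - V_ols).min() >= -1e-10)
```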