Distribution of Estimates and Multivariate Regression (Lecture XXIX)

Models and Distributional Assumptions

The conditional normal model assumes that the observed random variables are distributed

\[ y_i \sim N\left(\alpha + \beta x_i,\; \sigma^2\right). \]

Thus, \( E[y_i \mid x_i] = \alpha + \beta x_i \) and the variance of \( y_i \) equals \( \sigma^2 \). The conditional normal model can equivalently be expressed as

\[ y_i = \alpha + \beta x_i + \epsilon_i, \qquad \epsilon_i \sim N\left(0, \sigma^2\right). \]

Further, the \( \epsilon_i \) are independently and identically distributed (consistent with our BLUE proof).

Given this formulation, the likelihood function for the simple linear model can be written

\[ L\left(\alpha, \beta, \sigma^2 \mid x\right) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{\left(y_i - \alpha - \beta x_i\right)^2}{2\sigma^2}\right]. \]

Taking the log of this likelihood function yields

\[ \ln L = -\frac{n}{2}\ln\left(2\pi\right) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \alpha - \beta x_i\right)^2. \]

As discussed in Lecture XVII, this likelihood function can be concentrated (up to an additive constant) so that

\[ \ln L = -\frac{n}{2}\ln\hat{\sigma}^2 - \frac{n}{2}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \alpha - \beta x_i\right)^2. \]

Maximizing the concentrated likelihood over \( \alpha \) and \( \beta \) is equivalent to minimizing the sum of squared errors, so the least squares estimators are also maximum likelihood estimators if the error terms are normal.

The variance of \( \hat{\beta} \) can be derived from the Gauss-Markov results. Note from last lecture that

\[ \hat{\beta} = \sum_{i=1}^{n} d_i y_i, \qquad d_i = \frac{x_i - \bar{x}}{S_{xx}}, \]

so substituting \( y_i = \alpha + \beta x_i + \epsilon_i \) gives

\[ \hat{\beta} = \alpha\sum_{i=1}^{n} d_i + \beta\sum_{i=1}^{n} d_i x_i + \sum_{i=1}^{n} d_i \epsilon_i, \]

which reduces to \( \hat{\beta} = \beta + \sum_i d_i \epsilon_i \) because \( \sum_i d_i = 0 \) and \( \sum_i d_i x_i = 1 \). Remember that the objective function of the minimization problem we solved to obtain these results was the variance of the estimate:

\[ V\left(\hat{\beta}\right) = \sigma^2 \sum_{i=1}^{n} d_i^2. \]

This assumes that the errors are independently distributed. Thus, substituting the expression for \( d_i \) into this formula yields

\[ V\left(\hat{\beta}\right) = \sigma^2 \sum_{i=1}^{n} \frac{\left(x_i - \bar{x}\right)^2}{S_{xx}^2} = \sigma^2\,\frac{S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}. \]

Multivariate Regression Models

In general, the multivariate relationship can be written in matrix form as

\[ y = \begin{bmatrix} 1 & x_1 & x_2 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \beta_0 + \beta_1 x_1 + \beta_2 x_2. \]

If we expand the system to three observations (writing \( x_{ij} \) for observation \( i \) on regressor \( j \)), this system becomes

\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} \\ \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} \end{bmatrix}. \]

Expanding beyond the exactly identified model to a fourth observation introduces the error terms:

\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}. \]

In matrix form this can be expressed as

\[ y = X\beta + \epsilon. \]

The sum of squared errors can then be written as

\[ SSE = \left(y - \hat{y}\right)'\left(y - \hat{y}\right) = \left(y - Xb\right)'\left(y - Xb\right) = \left(y' - b'X'\right)\left(y - Xb\right). \]

A little matrix calculus is a dangerous thing:

\[ d\,SSE = d\left[\left(y - Xb\right)'\right]\left(y - Xb\right) + \left(y - Xb\right)'d\left(y - Xb\right) = -db'\,X'\left(y - Xb\right) - \left(y - Xb\right)'X\,db. \]

Note that each term on the right-hand side is a scalar. Since the transpose of a scalar is itself, the two terms are equal, and the differential can be rewritten as

\[ d\,SSE = -2\left(y - Xb\right)'X\,db. \]

Setting the derivative equal to zero,

\[ \frac{d\,SSE}{db} = -2\left(y - Xb\right)'X = 0 \;\Rightarrow\; y'X - b'X'X = 0 \;\Rightarrow\; X'y = X'Xb \;\Rightarrow\; \hat{b} = \left(X'X\right)^{-1}X'y. \]

Variance of the estimated parameters

The variance of the parameter matrix can be written as

\[ V\left(\hat{b}\right) = E\left[\left(\hat{b} - \beta\right)\left(\hat{b} - \beta\right)'\right]. \]

Substituting \( y = X\beta + \epsilon \) into the estimator,

\[ \hat{b} = \left(X'X\right)^{-1}X'y = \left(X'X\right)^{-1}X'\left(X\beta + \epsilon\right) = \left(X'X\right)^{-1}X'X\beta + \left(X'X\right)^{-1}X'\epsilon = \beta + \left(X'X\right)^{-1}X'\epsilon. \]

Substituting this back into the variance relationship yields

\[ V\left(\hat{b}\right) = E\left[\left(X'X\right)^{-1}X'\epsilon\epsilon'X\left(X'X\right)^{-1}\right]. \]

Note that \( E\left[\epsilon\epsilon'\right] = \sigma^2 I \); therefore

\[ V\left(\hat{b}\right) = \left(X'X\right)^{-1}X'\left(\sigma^2 I\right)X\left(X'X\right)^{-1} = \sigma^2\left(X'X\right)^{-1}X'X\left(X'X\right)^{-1} = \sigma^2\left(X'X\right)^{-1}. \]
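As a rough numerical illustration (not part of the original lecture), the sketch below simulates data from \( y = X\beta + \epsilon \), computes \( \hat{b} = \left(X'X\right)^{-1}X'y \) through the normal equations, and forms the variance matrix \( \hat{\sigma}^2\left(X'X\right)^{-1} \). The sample size, the "true" coefficients, and the degrees-of-freedom correction used for \( \hat{\sigma}^2 \) are illustrative assumptions (the concentrated-likelihood estimator above divides by \( n \) instead).

```python
import numpy as np

# Illustrative sketch: simulate y = X beta + eps and recover the OLS
# estimates via the normal equations b_hat = (X'X)^{-1} X'y.
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])   # columns: 1, x_i1, x_i2
beta_true = np.array([1.0, 2.0, -0.5])      # assumed "true" parameters
eps = rng.normal(scale=0.75, size=n)        # iid N(0, sigma^2) errors
y = X @ beta_true + eps

# Solve the normal equations X'X b = X'y for the coefficient vector.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Estimate sigma^2 from the residuals and form V(b_hat) = sigma^2 (X'X)^{-1}.
resid = y - X @ beta_hat
k = X.shape[1]
sigma2_hat = resid @ resid / (n - k)        # degrees-of-freedom corrected
V_beta_hat = sigma2_hat * np.linalg.inv(XtX)

print("beta_hat:", beta_hat)
print("std. errors:", np.sqrt(np.diag(V_beta_hat)))
```

Solving the normal equations with np.linalg.solve avoids inverting \( X'X \) just to get the coefficients; the inverse is formed only to report the variance matrix.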
Theorem 12.2.1 (Gauss-Markov)

Let \( b^{*} = C'y \), where \( C \) is a \( T \times K \) constant matrix such that \( C'X = I \). Then \( \hat{b} \) is better than \( b^{*} \) if \( b^{*} \neq \hat{b} \) (that is, \( V\left(b^{*}\right) \) exceeds \( V\left(\hat{b}\right) \) by a positive semidefinite matrix).

Substituting \( y = X\beta + \epsilon \),

\[ b^{*} = C'\left(X\beta + \epsilon\right) = C'X\beta + C'\epsilon = \beta + C'\epsilon. \]

This choice of \( C \) therefore guarantees that \( b^{*} \) is an unbiased estimator of \( \beta \). The variance of \( b^{*} \) can then be written as

\[ V\left(b^{*}\right) = E\left[C'\epsilon\epsilon'C\right] = C'E\left[\epsilon\epsilon'\right]C = \sigma^2 C'C. \]

To complete the proof, we add a special form of zero, namely \( \sigma^2\left(X'X\right)^{-1} - \sigma^2\left(X'X\right)^{-1} = 0 \):

\[ V\left(b^{*}\right) = \sigma^2\left(X'X\right)^{-1} + \sigma^2 C'C - \sigma^2\left(X'X\right)^{-1}. \]

Focusing on the last two terms, define \( Z = C - X\left(X'X\right)^{-1} \) and expand:

\[ Z'Z = \left[C - X\left(X'X\right)^{-1}\right]'\left[C - X\left(X'X\right)^{-1}\right] = C'C - C'X\left(X'X\right)^{-1} - \left(X'X\right)^{-1}X'C + \left(X'X\right)^{-1}X'X\left(X'X\right)^{-1}. \]

By the orthogonality condition \( C'X = I \) (and hence \( X'C = I \)),

\[ Z'Z = C'C - \left(X'X\right)^{-1} - \left(X'X\right)^{-1} + \left(X'X\right)^{-1} = C'C - \left(X'X\right)^{-1}. \]

Substituting backwards,

\[ \sigma^2 C'C - \sigma^2\left(X'X\right)^{-1} = \sigma^2\left[C' - \left(X'X\right)^{-1}X'\right]\left[C - X\left(X'X\right)^{-1}\right]. \]

Thus,

\[ V\left(b^{*}\right) = \sigma^2\left(X'X\right)^{-1} + \sigma^2\left[C' - \left(X'X\right)^{-1}X'\right]\left[C - X\left(X'X\right)^{-1}\right]. \]

Since the second term is positive semidefinite, the minimum-variance choice is \( C = X\left(X'X\right)^{-1} \), which is the ordinary least squares estimator.
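The Gauss-Markov comparison can also be checked numerically. The sketch below (an illustration under assumed values, not part of the lecture) builds an arbitrary linear unbiased estimator \( b^{*} = C'y \) with \( C'X = I \) by perturbing the OLS choice \( C = X\left(X'X\right)^{-1} \) in directions orthogonal to the columns of \( X \), and verifies that \( V\left(b^{*}\right) - V\left(\hat{b}\right) = \sigma^2 Z'Z \) is positive semidefinite.

```python
import numpy as np

# Illustrative check of the Gauss-Markov result with assumed T, K, sigma^2.
rng = np.random.default_rng(0)
T, K = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
sigma2 = 1.5                                 # assumed error variance

XtX_inv = np.linalg.inv(X.T @ X)
C_ols = X @ XtX_inv                          # OLS choice: C = X(X'X)^{-1}

# Build another T x K matrix C with C'X = I by adding a perturbation Z
# whose columns are orthogonal to X, so b* = C'y remains unbiased.
M = np.eye(T) - X @ XtX_inv @ X.T            # projection off the columns of X
Z = M @ rng.normal(size=(T, K))
C = C_ols + Z
assert np.allclose(C.T @ X, np.eye(K))       # orthogonality condition C'X = I

V_ols = sigma2 * XtX_inv                     # sigma^2 (X'X)^{-1}
V_star = sigma2 * C.T @ C                    # sigma^2 C'C

# The difference should equal sigma^2 Z'Z and be positive semidefinite.
diff = V_star - V_ols
print(np.allclose(diff, sigma2 * Z.T @ Z))   # True
print(np.linalg.eigvalsh(diff) >= -1e-10)    # all eigenvalues nonnegative
```

Any \( C \) with \( C'X = I \) can be written as \( X\left(X'X\right)^{-1} \) plus columns orthogonal to \( X \), which is exactly how the perturbation \( Z \) is constructed here; the eigenvalue check mirrors the final step of the proof.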