
3 Simple Regression

The model: $Y_i = \alpha + \beta X_i + \varepsilon_i$, with $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$, $i = 1, \ldots, n$.

Classical normal linear regression: 1) How to estimate the parameters. 2) Properties of the estimators. 3) Hypothesis testing. 4) Prediction.

3.1 Model Specification

Terms:

$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$$

- $Y_i$: dependent variable (regressand)
- $X_i$: systematic component / explanatory variable / independent variable / regressor
- $\alpha, \beta$: parameters
- $\varepsilon_i$: disturbance

Three parameters are to be estimated: $\alpha$, $\beta$ and $\sigma^2$.
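To make the notation concrete, here is a minimal simulation sketch of this model in Python; the parameter values, sample size, and variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative values for the three parameters alpha, beta, sigma
alpha, beta, sigma = 2.0, 0.5, 1.0
n = 50

X = rng.uniform(0, 10, size=n)        # explanatory variable / regressor X_i
eps = rng.normal(0, sigma, size=n)    # disturbances eps_i ~ i.i.d. N(0, sigma^2)
Y = alpha + beta * X + eps            # dependent variable / regressand Y_i
```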

3.1.1 Classical Disturbance

- Classical: $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$.
- Heteroskedastic: $\varepsilon_i \sim \text{independent } N(0, \sigma_i^2)$, the variance differing across observations. Homoskedastic: $\sigma_i^2 = \sigma^2$ for all $i$.
- Temporal dependence: autocorrelated disturbances.

3.1.2 Explanatory Variable

3.1.2.1 Nonstochastic Explanatory Variable -- the explanatory variable is under the control of the researcher, e.g. the response of crop yield to fertilizer application.

$X$: fertilizer; $\varepsilon$: soil quality, rainfall, and so on. But in reality, data are random. So the random aspect of $Y$ results only from $\varepsilon_i$:

$$E(Y_i) = \alpha + \beta X_i + E(\varepsilon_i) = \alpha + \beta X_i, \qquad \operatorname{var}(Y_i) = \operatorname{var}(\varepsilon_i) = \sigma^2.$$

Since $Y_i$ is linear in $\varepsilon_i$, $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$.

Some cases are special: with nonstochastic $X_i$,

$$\operatorname{cov}(X_i, \varepsilon_i) = E(X_i \varepsilon_i) - E(X_i)E(\varepsilon_i) = X_i E(\varepsilon_i) = 0.$$

3.1.2.2 The explanatory variable must exhibit variation

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 \neq 0.$$

If the same amount of fertilizer is applied to the whole field, then it is impossible to learn the relationship between crop and fertilizer.

3.1.3 Normality & Linearity

Normal: $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$. Since $\hat\alpha$ and $\hat\beta$ are linear in the $Y_i$ (and hence in the $\varepsilon_i$), the t-test applies even for small sample sizes. Linear: linear in the parameters.

3.1.4 Attention

For the current normal linear regression model, OLS is the best way to derive the estimators. It may not be, however, when 1) there are restrictions on the parameters, or 2) a system of regressions must be estimated simultaneously.

3.2 Least Squares Estimation

OLS; NLS (nonlinear); GLS (heteroskedastic).

$E(Y_i) = \alpha + \beta X_i$: the population regression line. We want $\hat\alpha$ and $\hat\beta$, functions of the sample data. They define a sample regression line:

$$\hat{Y}_i = \hat\alpha + \hat\beta X_i$$

$\hat{Y}_i$: fitted or predicted value; $e_i = Y_i - \hat{Y}_i$: residual. (Diagram)

3.2.1 Estimation Method – Minimize the sum of squares function

-- Sum of squares function:

$$S = \sum_{i=1}^{n} (Y_i - EY_i)^2 = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2.$$

In previous parts, $S = \sum_{i=1}^{n} (Y_i - \mu)^2$.

-- Least squares normal equations:

$$\frac{\partial S}{\partial \alpha} = \sum_{i=1}^{n} 2 (Y_i - \alpha - \beta X_i)(-1), \qquad \frac{\partial S}{\partial \beta} = \sum_{i=1}^{n} 2 (Y_i - \alpha - \beta X_i)(-X_i).$$

Optimal solution:

$$\sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta X_i) = 0, \qquad \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta X_i) X_i = 0.$$

Using the definition of residuals: $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} X_i e_i = 0$.

Properties of the residuals: 1) positives cancel with negatives; 2) the residuals are orthogonal to the regressor.

-- Solving the normal equations:

$$\sum_{i=1}^{n} Y_i = n\hat\alpha + \hat\beta \sum_{i=1}^{n} X_i \;\Rightarrow\; \bar{Y} = \hat\alpha + \hat\beta \bar{X} \text{ (the line passes through } (\bar{X}, \bar{Y})\text{)} \;\Rightarrow\; \hat\alpha = \bar{Y} - \hat\beta \bar{X},$$

$$\sum_{i=1}^{n} X_i Y_i = \hat\alpha \sum_{i=1}^{n} X_i + \hat\beta \sum_{i=1}^{n} X_i^2 \;\Rightarrow\; \hat\beta = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}.$$

In deviation form,

$$\hat\beta = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}, \qquad \text{where } x_i = X_i - \bar{X}, \; y_i = Y_i - \bar{Y}.$$

$s_{XY}$, $r_{XY}$ and $\hat\beta$ have the same sign.
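The closed-form solution above translates directly into code. Below is a minimal sketch, assuming X and Y are NumPy arrays (e.g. the simulated data from Section 3.1); the function name is illustrative.

```python
import numpy as np

def ols_simple(X, Y):
    """OLS estimates for Y_i = alpha + beta*X_i + eps_i, via the deviation form."""
    x = X - X.mean()                              # x_i = X_i - X-bar
    y = Y - Y.mean()                              # y_i = Y_i - Y-bar
    beta_hat = (x * y).sum() / (x ** 2).sum()     # beta-hat = sum(x_i*y_i) / sum(x_i^2)
    alpha_hat = Y.mean() - beta_hat * X.mean()    # alpha-hat = Y-bar - beta-hat*X-bar
    return alpha_hat, beta_hat

# Usage: alpha_hat, beta_hat = ols_simple(X, Y)
```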

e.g. Use least squares to estimate a linear regression between production cost and output for 145 electric power companies (Nerlove (1963)).

Solution: The least squares coefficient estimates are $\hat\alpha = -0.741$ and $\hat\beta = 0.00643$. The sample regression line may be expressed as:

$$\hat{Y}_i = -0.741 + 0.00643\, X_i.$$

3.2.2 Estimating $\sigma^2$

From its definition: $\sigma^2 = E(\varepsilon_i^2) = E(Y_i - EY_i)^2 = E(Y_i - \alpha - \beta X_i)^2$.

$e_i = Y_i - \hat\alpha - \hat\beta X_i$ is the analog of $\varepsilon_i$, so the estimator for $\sigma^2$ can be an average with numerator $\sum_{i=1}^{n} e_i^2$ (the sum of squared errors, SSE).

To obtain the unbiased estimator,

$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2.$$

Standard deviation: $s = \sqrt{s^2}$, the standard error of the regression.
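A sketch of this estimator, assuming X, Y and the coefficient estimates from the `ols_simple` sketch above:

```python
import numpy as np

def s_squared(X, Y, alpha_hat, beta_hat):
    """Unbiased estimator s^2 = SSE/(n-2) and the standard error of the regression s."""
    e = Y - (alpha_hat + beta_hat * X)   # residuals e_i = Y_i - alpha-hat - beta-hat*X_i
    sse = (e ** 2).sum()                 # sum of squared errors (SSE)
    s2 = sse / (len(Y) - 2)              # divide by n - 2, not n, for unbiasedness
    return s2, np.sqrt(s2)
```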

3.3 Sampling Properties of the OLS Estimators

3.3.1 Unbiasedness

We say $\hat\beta$ is unbiased if its sampling distribution is centered around $\beta$.

-- OLS estimators are unbiased:

$$E(\hat\beta) = E\!\left(\frac{\sum x_i y_i}{\sum x_i^2}\right) = \frac{1}{\sum x_i^2} E\!\left(\sum x_i y_i\right).$$

$$
\begin{aligned}
E(\hat\beta) &= E\!\left(\frac{\sum x_i y_i}{\sum x_i^2}\right) = \frac{1}{\sum x_i^2} E\!\left(\sum x_i y_i\right) && \text{by Corollary B.1.1 in Appendix B} \\
&= \frac{1}{\sum x_i^2} E\!\left(\sum x_i Y_i\right) && \text{by Law A.5 in Appendix A} \\
&= \frac{1}{\sum x_i^2} \sum x_i E(Y_i) && \text{by Law B.3 in Appendix B} \\
&= \frac{1}{\sum x_i^2} \sum x_i (\alpha + \beta X_i) && \text{using the population regression line (4.8)} \\
&= \frac{1}{\sum x_i^2} \left(\alpha \sum x_i + \beta \sum x_i^2\right) = \beta.
\end{aligned}
$$

The last step uses Law A.4: the sum of any deviation-form variable is zero.

$$\hat\alpha = \bar{Y} - \hat\beta \bar{X} = \alpha + \beta\bar{X} + \bar\varepsilon - \hat\beta\bar{X} = \alpha + \bar{X}(\beta - \hat\beta) + \bar\varepsilon,$$

$$E(\hat\alpha) = \alpha + \bar{X}\, E(\beta - \hat\beta) + E(\bar\varepsilon) = \alpha.$$

-- Efficiency & Best Linear Unbiasedness (Gauss-Markov Theorem): In the context of the classical linear regression model

$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{i.i.d.}(0, \sigma^2),$$

the least squares estimators $\hat\alpha$ and $\hat\beta$ have the smallest variances in the class of all linear unbiased estimators.

That is, least squares is BLUE. Note that $\hat\beta$ is indeed linear in the $Y_i$ values because it is of the form

$$\hat\beta = \frac{\sum x_i y_i}{\sum x_i^2} = \sum_i \frac{x_i}{\sum x_i^2}\, Y_i.$$

Because $\hat\alpha = \bar{Y} - \hat\beta \bar{X}$, and each of $\bar{Y}$ and $\hat\beta$ is linear in the $Y_i$, it follows that $\hat\alpha$ is also linear.

3.4 The Sampling Distribution of $\hat\alpha$ and $\hat\beta$

- Both are linear functions of the $Y_i$, so they are normally distributed.
- Since $E(\hat\alpha) = \alpha$ and $E(\hat\beta) = \beta$, $\hat\alpha \sim N(\alpha, \cdot)$ and $\hat\beta \sim N(\beta, \cdot)$.

-- The variances of the least squares estimators:

- Precision of estimation depends on the sample size.
- $\operatorname{var}(\hat\beta)$ depends on the variation of the data.
- When $\bar{X}$ is 0, $\operatorname{var}(\hat\alpha)$ is minimized.

$$\hat\alpha \sim N\!\left(\alpha,\; \sigma^2\!\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right)\right), \qquad \hat\beta \sim N\!\left(\beta,\; \frac{\sigma^2}{\sum x_i^2}\right).$$

$$
\begin{aligned}
V(\hat\beta) &= V\!\left(\frac{\sum x_i y_i}{\sum x_i^2}\right) = \left(\frac{1}{\sum x_i^2}\right)^2 V\!\left(\sum x_i y_i\right) && \text{by Corollary B.2.1 in Appendix B} \\
&= \left(\frac{1}{\sum x_i^2}\right)^2 V\!\left(\sum x_i Y_i\right) && \text{by Law A.5 in Appendix A} \\
&= \left(\frac{1}{\sum x_i^2}\right)^2 \sum x_i^2\, V(Y_i) && \text{by Corollary B.4.1 in Appendix B} \\
&= \left(\frac{1}{\sum x_i^2}\right)^2 \sigma^2 \sum x_i^2 && \text{using the population regression line (4.8)} \\
&= \frac{\sigma^2}{\sum x_i^2}.
\end{aligned}
$$
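Unbiasedness (Section 3.3.1) and this variance formula can both be checked by simulation. The sketch below repeatedly redraws the disturbances while holding the (nonstochastic) regressor fixed; the parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma, n = 2.0, 0.5, 1.0, 50       # assumed "true" parameter values
X = rng.uniform(0, 10, size=n)                  # fixed regressor across replications
x = X - X.mean()

beta_hats = []
for _ in range(10_000):                         # many repeated samples
    Y = alpha + beta * X + rng.normal(0, sigma, size=n)
    beta_hats.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())

print(np.mean(beta_hats))                       # close to beta = 0.5 (unbiasedness)
print(np.var(beta_hats), sigma**2 / (x**2).sum())   # close to sigma^2 / sum(x_i^2)
```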

-- Estimating the sampling variances:

$$s^2_{\hat\alpha} = s^2\!\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right), \qquad s^2_{\hat\beta} = \frac{s^2}{\sum x_i^2}.$$

Since $E(s^2_{\hat\beta}) = E\!\left(\dfrac{s^2}{\sum x_i^2}\right) = \dfrac{E(s^2)}{\sum x_i^2} = \dfrac{\sigma^2}{\sum x_i^2} = V(\hat\beta)$ (and similarly for $s^2_{\hat\alpha}$), they are unbiased.

- Estimated standard errors:

$$s_{\hat\alpha} = \sqrt{s^2_{\hat\alpha}} = s\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}}, \qquad s_{\hat\beta} = \sqrt{s^2_{\hat\beta}} = \frac{s}{\sqrt{\sum x_i^2}}.$$
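A sketch of these estimated standard errors, again assuming X and Y are NumPy arrays and the coefficient estimates come from the earlier sketches:

```python
import numpy as np

def std_errors(X, Y, alpha_hat, beta_hat):
    """Estimated standard errors s_alpha-hat and s_beta-hat."""
    n = len(Y)
    e = Y - (alpha_hat + beta_hat * X)
    s2 = (e ** 2).sum() / (n - 2)                  # s^2 = SSE/(n-2)
    sum_x2 = ((X - X.mean()) ** 2).sum()           # sum of x_i^2
    se_alpha = np.sqrt(s2 * (1.0 / n + X.mean() ** 2 / sum_x2))
    se_beta = np.sqrt(s2 / sum_x2)
    return se_alpha, se_beta
```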

3.5 Hypothesis Testing

- Since $\hat\beta \sim N\!\left(\beta, \frac{\sigma^2}{\sum x_i^2}\right)$,

$$\frac{\hat\beta - \beta}{\sigma / \sqrt{\sum x_i^2}} \sim N(0, 1).$$

- Replacing $\sigma$ with $s$ gives Result 4.2: In the context of the classical normal linear regression model,

$$\frac{\hat\alpha - \alpha}{s_{\hat\alpha}} \sim t(n-2) \qquad \text{and} \qquad \frac{\hat\beta - \beta}{s_{\hat\beta}} \sim t(n-2).$$

- This lets us test theoretical explanations for economic events:

$$H_0: \beta = \beta_0, \qquad H_A: \beta \neq \beta_0, \qquad t = \frac{\hat\beta - \beta_0}{s_{\hat\beta}} \sim t(n-2) \text{ under } H_0.$$

e.g. In Nerlove's electricity generation data, do production costs depend on the level of output? (Nerlove (1963))

Solution: $H_0: \beta = 0$, $H_1: \beta \neq 0$. Sample size $n = 145$; the critical value at the 1% significance level is $t_{0.005}(n-2) = 2.576$.

$$t = \frac{\hat\beta - 0}{s_{\hat\beta}} = \frac{0.00643 - 0}{0.0001719} = 37.40.$$

This greatly exceeds the critical value, and so the null hypothesis is rejected. The conclusion is that Nerlove's data offer strong evidence that production costs depend on the level of output.

3.6 Decomposition of Sample Variation

$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad Y_i = \hat{Y}_i + e_i.$$

Variation in $Y_i$: the total sum of squares

$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} y_i^2.$$

In the other direction, using $\bar{Y} = \hat\alpha + \hat\beta\bar{X}$,

$$\hat{Y}_i - \bar{Y} = (\hat\alpha + \hat\beta X_i) - (\hat\alpha + \hat\beta \bar{X}) = \hat\beta (X_i - \bar{X}) = \hat\beta x_i,$$

so the regression sum of squares is

$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \hat\beta^2 \sum_{i=1}^{n} x_i^2, \qquad R^2 = \frac{SSR}{SST} = \frac{\hat\beta^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}.$$

If $\hat\beta = 0$, then $R^2 = 0$.


-- Relationship to the correlation coefficient:

$$r_{XY} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\sqrt{\sum y_i^2}},$$

$$R^2 = \hat\beta^2 \frac{\sum x_i^2}{\sum y_i^2} = \left(\frac{\sum x_i y_i}{\sum x_i^2}\right)^2 \frac{\sum x_i^2}{\sum y_i^2} = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2} = r_{XY}^2.$$

e.g. (Nerlove) The decomposition of sample variation is $SST = 56422.86$, $SSR = 51190.39$, $SSE = 5232.47$. What is $R^2$ and how does it relate to the sample correlation?

$$R^2 = \frac{SSR}{SST} = \frac{51190.39}{56422.86} = 0.9073.$$

The correlation between the variables is $r_{XY} = 0.9525$, and $r_{XY}^2 = (0.9525)^2 = 0.9073 = R^2$.
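The identity $R^2 = r_{XY}^2$ is easy to confirm numerically. The snippet below is a self-contained sketch with illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1.0, size=50)       # illustrative data

x, y = X - X.mean(), Y - Y.mean()
beta_hat = (x * y).sum() / (x ** 2).sum()
r_xy = np.corrcoef(X, Y)[0, 1]                        # sample correlation coefficient

r2 = beta_hat ** 2 * (x ** 2).sum() / (y ** 2).sum()  # R^2 = beta-hat^2 * sum x_i^2 / sum y_i^2
print(r2, r_xy ** 2)                                  # the two values coincide
```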

Analysis of Variance for Simple Regression:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
| --- | --- | --- | --- | --- |
| Regression | $SSR = \hat\beta^2 \sum x_i^2$ | 1 | $SSR$ | $\dfrac{SSR}{SSE/(n-2)}$ |
| Error | $SSE = \sum e_i^2$ | $n-2$ | $\dfrac{SSE}{n-2}$ | |
| Total | $SST = \sum y_i^2$ | $n-1$ | | |

-- SSR and SSE are statistically independent.
-- The F statistic is the ratio of two independent chi-square random variables, each divided by its respective degrees of freedom.
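A sketch that computes the quantities of this ANOVA table and the F statistic from raw data; the p-value line uses SciPy and goes slightly beyond the table itself.

```python
import numpy as np
from scipy import stats

def anova_simple(X, Y):
    """ANOVA quantities for a simple regression of Y on X."""
    n = len(Y)
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = (x * y).sum() / (x ** 2).sum()
    ssr = beta_hat ** 2 * (x ** 2).sum()        # regression sum of squares (1 df)
    sst = (y ** 2).sum()                        # total sum of squares (n-1 df)
    sse = sst - ssr                             # error sum of squares (n-2 df)
    F = ssr / (sse / (n - 2))                   # F = SSR / (SSE/(n-2)) ~ F(1, n-2) under H0
    p_value = stats.f.sf(F, 1, n - 2)           # upper-tail probability
    return ssr, sse, sst, F, p_value
```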

e.g. Present the ANOVA table for the cost function regression using Nerlove's data. Perform an F test of the hypothesis $\beta = 0$.

ANOVA:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
| --- | --- | --- | --- | --- |
| Regression | 51190.39 | 1 | 51190.39 | 1399.0 |
| Error | 5232.47 | 143 | 36.591 | |
| Total | 56422.86 | 144 | | |

$$F = \frac{SSR}{SSE/(n-2)} = \frac{51190.39}{5232.47 / 143} = 1399.0 > F_{0.01}(1, 143) = 6.63,$$

so $H_0$ is rejected at the significance level $a = 0.01$.

The F statistic is the square of the t statistic: $F = 1399.0 = 37.40^2 = t^2$, and $F_{0.01}(1, 143) = 6.63 = 2.576^2 = t_{0.005}^2(143)$.

-- Special case: $Y_i = \alpha + \varepsilon_i$, so $\hat\alpha = \bar{Y}$ and $Y_i = \hat\alpha + e_i = \bar{Y} + e_i$, i.e. $e_i = Y_i - \bar{Y} = y_i$ and $\sum y_i^2 = \sum e_i^2$: all sample variation is attributed to the unsystematic component.

-- Generalized case:

$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y}) = e_i + (\hat{Y}_i - \bar{Y}),$$

$$SST = \sum \left[ e_i + (\hat{Y}_i - \bar{Y}) \right]^2 = \sum e_i^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2 \sum (\hat{Y}_i - \bar{Y}) e_i = SSE + SSR$$

(sum of squared errors + sum of squares of the regression), because the cross term vanishes:

$$\sum (\hat{Y}_i - \bar{Y}) e_i = \sum \hat{Y}_i e_i - \bar{Y} \sum e_i = 0, \qquad \sum \hat{Y}_i e_i = \sum (\hat\alpha + \hat\beta X_i) e_i = \hat\alpha \sum e_i + \hat\beta \sum X_i e_i = 0.$$
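The two facts used above — the residuals sum to zero and are orthogonal to the regressor, hence the cross term drops out — can be verified numerically. A self-contained sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1.0, size=50)     # illustrative data

x = X - X.mean()
beta_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_hat = alpha_hat + beta_hat * X
e = Y - Y_hat

print(e.sum(), (X * e).sum(), (Y_hat * e).sum())    # all ~ 0 up to rounding
sst = ((Y - Y.mean()) ** 2).sum()
ssr = ((Y_hat - Y.mean()) ** 2).sum()
print(sst, ssr + (e ** 2).sum())                    # SST = SSR + SSE
```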

3.6.1 $R^2$ (Coefficient of Determination)

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad R^2 \in [0, 1].$$

3.6.2 Analysis of Variance & the F Test

$$H_0: \beta = 0, \qquad H_A: \beta \neq 0.$$

That is, is the model $Y_i = \alpha + \varepsilon_i$ or $Y_i = \alpha + \beta X_i + \varepsilon_i$? Equivalently, is the systematic component SSR relatively large in relation to the unsystematic component SSE?

$$F = \frac{SSR}{SSE/(n-2)} \sim F(1, n-2). \qquad \text{If } F > F_a(1, n-2), \text{ reject } H_0.$$

Relationship between the t-test and the F-test:

$$F = \frac{SSR}{SSE/(n-2)} = \frac{\hat\beta^2 \sum x_i^2}{s^2} = \frac{\hat\beta^2}{s^2 / \sum x_i^2} = \left(\frac{\hat\beta}{s_{\hat\beta}}\right)^2 = t^2,$$

and in general $F(1, v) = t^2(v)$.

Analysis of Variance (ANOVA): review of F statistics.
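A quick numerical check of $F = t^2$, self-contained with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1.0, size=50)     # illustrative data

n = len(Y)
x = X - X.mean()
beta_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
e = Y - (Y.mean() + beta_hat * x)                   # residuals: Y_hat = Y-bar + beta-hat*x_i
s2 = (e ** 2).sum() / (n - 2)                       # s^2 = SSE/(n-2)

t_stat = beta_hat / np.sqrt(s2 / (x ** 2).sum())    # t statistic for H0: beta = 0
F_stat = beta_hat ** 2 * (x ** 2).sum() / s2        # F = SSR / (SSE/(n-2))
print(F_stat, t_stat ** 2)                          # equal: F = t^2
```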

3.7 Presentation of Regression Results