Ch.6 Simple Linear Regression: Continued


6.1

Ch.6 Simple Linear Regression: Continued

To complete the analysis of the simple linear regression model, in this chapter we will consider:
• how to measure the variation in $y_t$ that is explained by the model
• how to report the results of a regression analysis
• how changes in the units of measurement affect the estimates
• some alternative functional forms that may be used to represent possible relationships between $y_t$ and $x_t$.

6.2

The Coefficient of Determination ($R^2$)

Two major reasons for analyzing the model $y = \beta_1 + \beta_2 x + e$ are:
• To explain how the dependent variable ($y_t$) changes as the independent variable ($x_t$) changes
• To predict $y_o$ given $x_o$.

We want the independent variable ($x_t$) to explain as much of the variation in the dependent variable ($y_t$) as possible. We introduced $x_t$ in the hope that its variation would explain the variation in $y_t$. A measure of goodness of fit tells us how much of the variation in $y_t$ has been explained by variation in $x_t$.

6.3

Separate $y_t$ into its explainable and unexplainable components:
$$y_t = E(y_t) + e_t$$
where $E(y_t) = \beta_1 + \beta_2 x_t$ is explainable. The error term $e_t$ is unexplainable.

Using our estimates for $\beta_1$ and $\beta_2$, we get estimates of $E(y_t)$, and our residuals give us estimates of the error terms.

$$\hat{y}_t = b_1 + b_2 x_t$$
$$\hat{e}_t = y_t - \hat{y}_t$$
The residual is defined as the difference between the actual and the predicted values of y.
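As a quick illustration, here is a minimal sketch in Python (NumPy) of these formulas. The income/expenditure numbers are made up for illustration; they are not the chapter's actual data.

```python
import numpy as np

# hypothetical data: weekly income (x) and food expenditure (y)
x = np.array([250., 380., 460., 590., 720., 850., 980., 1100.])
y = np.array([110., 125., 120., 140., 145., 160., 155., 175.])

# least squares estimates of beta1 and beta2
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()

y_hat = b1 + b2 * x   # predicted values: estimates of E(y_t)
e_hat = y - y_hat     # residuals: actual minus predicted

print(b1, b2)
print(np.isclose(e_hat.sum(), 0.0))  # residuals sum to zero when an intercept is included
```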

6.4

The total variation in $y_t$ is measured as the sum of the squared deviations from the mean:
$$\sum (y_t - \bar{y})^2$$
Also known as SST (Total Sum of Squares).

A single deviation of $y_t$ from its mean can be split into two parts:
$$y_t - \bar{y} = (\hat{y}_t - \bar{y}) + \hat{e}_t$$

The sum of squared deviations from the mean is:
$$\sum (y_t - \bar{y})^2 = \sum \left[(\hat{y}_t - \bar{y}) + \hat{e}_t\right]^2 = \sum (\hat{y}_t - \bar{y})^2 + \sum \hat{e}_t^2 + 2\sum (\hat{y}_t - \bar{y})\,\hat{e}_t$$
The last (cross-product) term is zero.

Graphically, a single y deviation from the mean can be split into the two parts:

6.5

[Figure: a data point $y_t$ and the fitted line $\hat{y} = b_1 + b_2 x$. The total variation $y_t - \bar{y}$ splits into the unexplained part $\hat{e}_t = y_t - \hat{y}_t$ and the explained part $\hat{y}_t - \bar{y}$.]

Analysis of Variance (ANOVA):
$$\sum (y_t - \bar{y})^2 = \sum (\hat{y}_t - \bar{y})^2 + \sum \hat{e}_t^2$$
$$\text{SST} = \text{SSR} + \text{SSE}$$

Where:
• SST: Total Sum of Squares, with T−1 degrees of freedom. It measures the total variation in the actual $y_t$ values about their mean.
• SSR: Regression Sum of Squares, with 1 degree of freedom. It measures the variation in the predicted values $\hat{y}_t$ about their mean. It is the part of the total variation that is explained by the model.
• SSE: Error Sum of Squares, with T−2 degrees of freedom. It measures the variation in the actual $y_t$ values about the predicted values $\hat{y}_t$. It is the part of the total variation that is left unexplained.

6.6
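Continuing the numerical sketch from earlier (same hypothetical income/expenditure data), the decomposition can be verified directly:

```python
import numpy as np

# same hypothetical data as in the earlier sketch
x = np.array([250., 380., 460., 590., 720., 850., 980., 1100.])
y = np.array([110., 125., 120., 140., 145., 160., 155., 175.])
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

SST = np.sum((y - y.mean())**2)      # total variation, T-1 df
SSR = np.sum((y_hat - y.mean())**2)  # explained variation, 1 df
SSE = np.sum((y - y_hat)**2)         # unexplained variation, T-2 df

print(np.isclose(SST, SSR + SSE))    # True: the cross-product term vanishes
```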

6.7

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}$$

Regression Statistics
  Multiple R          0.563132517
  R Square            0.317118231
  Adjusted R Square   0.299147658
  Standard Error      37.80536423
  Observations        40

ANOVA
              df   SS                  MS            F             Significance F
  Regression   1   25221.22299 (SSR)   25221.22299   17.64652878   0.00015495
  Residual    38   54311.33145 (SSE)   1429.245564
  Total       39   79532.55444 (SST)

             Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
  Intercept  40.76755647    22.13865442      1.841464964   0.073369453   -4.049807902   85.58492083
  x          0.128288601    0.030539254      4.200777164   0.00015495    0.066465111    0.190112091
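Output like the table above can be reproduced with any regression package; here is one sketch using Python's statsmodels on simulated data (the chapter's actual food-expenditure dataset is not reproduced in this transcript, so the numbers will differ):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(100, 1500, size=40)              # simulated weekly incomes
y = 40 + 0.13*x + rng.normal(0, 38, size=40)     # simulated food expenditures

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.summary())  # reports R-squared, the F statistic, coefficients,
                      # standard errors, t stats, p-values, and 95% intervals
```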

Coefficient of Determination: $R^2$

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\sum \hat{e}_t^2}{\sum (y_t - \bar{y})^2}$$

• $R^2$ is the proportion of the total variation (SST) that is explained by the model. We can also think of it as one minus the proportion of the total variation that is unexplained (left in the residuals).
• $0 \le R^2 \le 1$
• The closer $R^2$ is to 1.0, the better the fit of the model and the greater the predictive ability of the model over the sample.
• If $R^2 = 1$, the model has explained everything: all the data points lie on the regression line (very unlikely) and there are no residuals.
• If $R^2 = 0$, the model has explained nothing.

6.8
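A one-line check of the definition, continuing the hypothetical-data sketch from earlier:

```python
import numpy as np

x = np.array([250., 380., 460., 590., 720., 850., 980., 1100.])
y = np.array([110., 125., 120., 140., 145., 160., 155., 175.])
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
e_hat = y - (b1 + b2 * x)

# R^2 = 1 - SSE/SST: the share of the total variation the model explains
R2 = 1 - np.sum(e_hat**2) / np.sum((y - y.mean())**2)
print(R2)  # always between 0 and 1
```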

[Figure: four scatter plots of y against x, Graphs A–D]

Graph A: $R^2$ appears to be 1.0. All data points lie on a line.

6.9

Graph B: $R^2$ appears to be 0. The best line through these points appears to have a slope of zero.

Graph C: $R^2$ appears to be close to 1.0.

6.10

Graph D: $R^2$ appears to be greater than 0 but less than the $R^2$ in Graph C.

• In the food expenditure example, $R^2 = 0.317$: "31.7% of the total variation in food expenditures has been explained by variation in household income."
• More examples:

6.11

6.12

Correlation Analysis

• The correlation coefficient between x and y is:
$$\rho = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\operatorname{Var}(y)}}$$
• The sample correlation between x and y is:
$$r = \frac{\widehat{\operatorname{Cov}}(x, y)}{\sqrt{\widehat{\operatorname{Var}}(x)\,\widehat{\operatorname{Var}}(y)}}$$
• It is always true that $-1 \le r \le 1$.
• It measures the strength of a linear relationship between x and y.

$$\widehat{\operatorname{Cov}}(x, y) = \frac{1}{T - 1} \sum (x_t - \bar{x})(y_t - \bar{y})$$
$$r = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum (x_t - \bar{x})^2 \sum (y_t - \bar{y})^2}}$$

Correlation and $R^2$

• It can be shown that the square of the sample correlation coefficient for x and y is equal to $R^2$.
• $R^2$ can also be computed as the square of the sample correlation coefficient for the $y_t$ values and the predicted values $\hat{y}_t$.
• It can also be shown that
$$b_2 = r\,\frac{s_y}{s_x}$$
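All three claims are easy to verify numerically (hypothetical data again; ddof=1 gives the sample standard deviations with the T−1 divisor):

```python
import numpy as np

x = np.array([250., 380., 460., 590., 720., 850., 980., 1100.])
y = np.array([110., 125., 120., 140., 145., 160., 155., 175.])

r = np.corrcoef(x, y)[0, 1]   # sample correlation of x and y
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x
R2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)

print(np.isclose(r**2, R2))                               # r^2 equals R^2
print(np.isclose(np.corrcoef(y, y_hat)[0, 1]**2, R2))     # corr(y, y-hat)^2 = R^2
print(np.isclose(b2, r * y.std(ddof=1) / x.std(ddof=1)))  # b2 = r * s_y / s_x
```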

6.13

6.14

Reporting Regression Results

$$\hat{y}_t = 40.768 + 0.1283\,x_t$$
(s.e.)  (22.139)  (0.0305)    $R^2 = 0.317$

• The numbers in parentheses are the standard errors of the coefficient estimates. These can be used to construct the t-statistics needed to ascertain the significance of the estimates.
• Sometimes authors will report the t-statistic instead of the standard error. This would be the t-statistic for $H_0: \beta = 0$.

$$\hat{y}_t = 40.768 + 0.1283\,x_t$$
(t-stat)  (1.841)  (4.201)    $R^2 = 0.317$

Units of Measurement

$$b_1 = \bar{y} - b_2 \bar{x} \qquad b_2 = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2}$$

$b_1$ is measured in "y units"; $b_2$ is measured in "y units over x units".

Example 3.15 from the Chapter 3 exercises: y = number of sodas sold, x = temperature in degrees (°F).
$$\hat{y}_t = -240 + 6\,x_t$$
If $x_o = 0°$, then the model predicts $\hat{y}_o = -240$. So $b_1 = -240$ is measured in y units (# of sodas), and $b_2 = 6$, where 6 is in (# of sodas / degree).

If x increases by 10 degrees, then $\hat{y}$ increases by 60 sodas: $\Delta\hat{y} = 6\,\Delta x$.

6.15
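A tiny sketch of using the estimated soda equation for prediction (the function name is made up for illustration):

```python
def predict_sodas(temp_f: float) -> float:
    """Prediction from the estimated equation y-hat = -240 + 6x."""
    return -240 + 6 * temp_f

print(predict_sodas(0))                       # -240: the intercept, in sodas
print(predict_sodas(80) - predict_sodas(70))  # 60: 10 more degrees -> 60 more sodas
```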

6.16

Let newx = x/100.
• no change to $b_1$
• $b_2$ is multiplied by 100

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.563132517
  R Square            0.317118231
  Adjusted R Square   0.299147658
  Standard Error      37.80536423
  Observations        40

ANOVA
              df   SS            MS            F             Significance F
  Regression   1   25221.22299   25221.22299   17.64652878   0.00015495
  Residual    38   54311.33145   1429.245564
  Total       39   79532.55444

             Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
  Intercept  40.76755647    22.13865442      1.841464964   0.073369453   -4.049807902   85.58492083
  newx       12.82886011    3.053925406      4.200777164   0.00015495    6.646511122    19.01120909
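The rescaling rule is easy to confirm: regress y on x and on newx = x/100 and compare (a sketch with the same hypothetical data as before):

```python
import numpy as np

def ols(x, y):
    """Return (b1, b2) from the simple regression of y on x."""
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    return y.mean() - b2 * x.mean(), b2

x = np.array([250., 380., 460., 590., 720., 850., 980., 1100.])
y = np.array([110., 125., 120., 140., 145., 160., 155., 175.])

b1, b2 = ols(x, y)
c1, c2 = ols(x / 100, y)         # newx = x/100
print(np.isclose(b1, c1))        # intercept unchanged
print(np.isclose(c2, 100 * b2))  # slope multiplied by 100
```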

6.17

Functional Forms

A linear model is one that is linear in the parameters with an additive error term.

The coefficient $\beta_2$ in $y = \beta_1 + \beta_2 x + e$ measures the effect of a one-unit change in x on y. As the model is written above, this effect is assumed to be constant. However, we want the ability to model relationships among economic variables where the effect of x on y is not constant. Example: our food expenditure example assumes that the increase in food spending from an additional dollar of income is the same whether the family has a high or low income. We can capture these effects using logs, powers, and reciprocals, yet still maintain a model that is linear in the parameters with an additive error term.

The Natural Logarithm

• We will use this derivative property often: let y be the log of x, $y = \ln(x)$; then $dy/dx = 1/x$, or $dy = dx/x$.
• This means that the absolute change in the log of x is (approximately) the relative change in the level of x.

Let x = 50, so ln(x) = 3.912.
Let x = 52, so ln(x) = 3.951, and d ln(x) = 3.951 − 3.912 = 0.039.

The absolute change in ln(x) is 0.039, which can be interpreted as a relative change in x: x increases from 50 to 52, a relative change of 2/50 = 4%, close to the 3.9% given by the log difference.

6.18
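A quick numerical check of the approximation just described (sketch):

```python
import numpy as np

x0, x1 = 50.0, 52.0
log_change = np.log(x1) - np.log(x0)  # absolute change in ln(x): about 0.0392
rel_change = (x1 - x0) / x0           # relative change in x: 0.04

print(log_change, rel_change)  # nearly equal; the approximation degrades
                               # as the relative change grows
```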

Using Logs

$$y_t = \beta_1 + \beta_2 \ln(x_t) + e_t$$
What does $\beta_2$ measure?

6.19

$$\ln(y_t) = \beta_1 + \beta_2 x_t + e_t$$
What does $\beta_2$ measure?

$$\ln(y_t) = \beta_1 + \beta_2 \ln(x_t) + e_t$$
What does $\beta_2$ measure?
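Each of these forms is still linear in the parameters, so each can be estimated by least squares after transforming the data. (The standard interpretations, for reference: in the linear-log form a 1% change in x changes y by roughly $\beta_2/100$ units; in the log-linear form a one-unit change in x changes y by roughly $100\beta_2$ percent; in the double-log form $\beta_2$ is the elasticity of y with respect to x.) A sketch on simulated data follows; np.polyfit with degree 1 is used purely as a convenience for the simple regression, and the data are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(100, 1500, size=40)           # simulated incomes
y = 40 + 0.13*x + rng.normal(0, 38, size=40)  # simulated food spending
y = np.clip(y, 1.0, None)                     # keep y > 0 so ln(y) is defined

# np.polyfit(u, v, 1) fits v = b2*u + b1 and returns [b2, b1]
linlog = np.polyfit(np.log(x), y, 1)          # y     = b1 + b2*ln(x)
loglin = np.polyfit(x, np.log(y), 1)          # ln(y) = b1 + b2*x
loglog = np.polyfit(np.log(x), np.log(y), 1)  # ln(y) = b1 + b2*ln(x)

print(linlog[0], loglin[0], loglog[0])        # the three slope estimates b2
```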

Example — y: food $, x: weekly income
$$y_t = \beta_1 + \beta_2 \ln(x_t) + e_t \qquad (\bar{x} = 698,\ \bar{y} = 130)$$

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.571864156
  R Square            0.327028613
  Adjusted R Square   0.30931884
  Standard Error      37.5300348
  Observations        40

ANOVA
              df   SS            MS            F
  Regression   1   26009.42098   26009.42098   18.46599654
  Residual    38   53523.13346   1408.503512
  Total       39   79532.55444

             Coefficients    Standard Error   t Stat         P-value
  Intercept  -415.5556981    127.1672145      -3.267789578   0.002303753
  lnx        83.91235619     19.52718051      4.297207994    0.000115804

6.20

SUMMARY OUTPUT (Log-Linear Model: dependent variable is lny)

Regression Statistics
  Multiple R          0.586638275
  R Square            0.344144465
  Adjusted R Square   0.326885109
  Standard Error      0.27659846
  Observations        40

ANOVA
              df   SS            MS            F
  Regression   1   1.525512305   1.525512305   19.9395888
  Residual    38   2.907254917   0.076506708
  Total       39   4.432767221

             Coefficients   Standard Error   t Stat        P-value
  Intercept  4.118265149    0.161974838      25.42533897   1.84238E-25
  x          0.00099773     0.000223437      4.46537667    6.94104E-05

SUMMARY OUTPUT (Double-Log Model: dependent variable is lny)

Regression Statistics
  Multiple R          0.633562676
  R Square            0.401401664
  Adjusted R Square   0.385649077
  Standard Error      0.264249039
  Observations        40

ANOVA
              df   SS            MS            F
  Regression   1   1.77932014    1.77932014    25.48163324
  Residual    38   2.653447081   0.069827555
  Total       39   4.432767221

             Coefficients   Standard Error   t Stat        P-value
  Intercept  0.299762103    0.895384575      0.334785869   0.73962763
  lnx        0.694044986    0.137490911      5.047933561   1.14292E-05

6.21