Forecasting by Regression


Regression Method

Slide 1

Chapter Topics

• Multiple regression
• Autocorrelation

Slide 2

Regression Methods

• To forecast an outcome (response variable, dependent variable) of a study based on a certain number of factors (explanatory variables, regressors).

• The outcome has to be quantitative, but the factors can be either quantitative or categorical.

• Simple regression deals with situations with one explanatory variable, whereas multiple regression tackles cases with more than one regressor.

Slide 3

Simple Linear Regression

– Collect data: a random sample from the population

Unknown population relationship:

Y = β_0 + β_1 X + ε

For each observation i in the sample:

Y_i = β_0 + β_1 X_i + ε_i

(Scatter plot of the sample data omitted.)

Slide 4
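For the simple model the least squares estimates have a closed form: b_1 = S_xy/S_xx and b_0 = Ȳ − b_1 X̄. A minimal sketch in plain Python (the data points are illustrative, not from the slides):

```python
def simple_ols(x, y):
    """Closed-form least squares fit of y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = S_xy / S_xx, where S_xy and S_xx are centered cross-products
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Points lying exactly on y = 1 + 2x should be recovered exactly.
b0, b1 = simple_ols([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```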

Multiple Regression

• Two or more explanatory variables
• Multiple linear regression model

Y = β_0 + β_1 X_1 + β_2 X_2 + ... + β_p X_p + ε

where ε is the error term and ε ~ N(0, σ²)

• Multiple linear regression equation

E(Y) = β_0 + β_1 X_1 + β_2 X_2 + ... + β_p X_p

• Estimated multiple linear regression equation

Ŷ = b_0 + b_1 X_1 + b_2 X_2 + ... + b_p X_p

Slide 5

Multiple Regression

• Least squares criterion

min Σ_{i=1}^{n} e_i² = min Σ_{i=1}^{n} (Y_i − Ŷ_i)²

• The formulae for the regression coefficients b_0, b_1, b_2, ..., b_p involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
• b_i represents an estimate of the change in Y corresponding to a one-unit change in X_i when all other independent variables are held constant.

Slide 6
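The matrix algebra behind the criterion amounts to solving the normal equations (XᵀX)b = Xᵀy. A small illustrative sketch in plain Python (Gaussian elimination on synthetic data; in practice a software package does this):

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least squares coefficients minimizing sum of squared residuals."""
    n, p = len(X), len(X[0])
    # Build X'X and X'y, then solve the normal equations (X'X) b = X'y.
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    return solve(XtX, Xty)

# Synthetic data generated exactly by y = 1 + 2*x1 + 3*x2;
# the first column of 1s carries the intercept.
X = [[1, 0, 0], [1, 1, 0], [1, 2, 1], [1, 0, 1], [1, 1, 2]]
y = [1, 3, 8, 4, 9]
b = ols(X, y)
```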

Multiple Regression

• Coefficient of determination

R² = SSR/SST = 1 − SSE/SST

• Adjusted R²

R_a² = 1 − (1 − R²)(n − 1)/(n − p − 1)

where n is the number of observations and p is the number of independent variables.
• The adjusted R² compensates for the number of independent variables in the model. It may rise or fall.
• It will fall if the increase in R² due to the inclusion of additional variables is not enough to offset the reduction in the degrees of freedom.

Slide 7
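The adjusted R² formula above is a one-liner; the sketch below also illustrates that adding variables (larger p) can pull R_a² down even when R² is unchanged (numbers are illustrative):

```python
def adjusted_r2(r2, n, p):
    """R_a^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R^2 = 0.9 on n = 11 observations: with p = 2 regressors
# R_a^2 = 0.875, while with p = 5 regressors it drops to 0.8.
ra_small = adjusted_r2(0.9, 11, 2)
ra_large = adjusted_r2(0.9, 11, 5)
```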

Test for Significance

• Test for individual significance: t test
– Hypothesis

H_0: β_i = 0
H_a: β_i ≠ 0

– Test statistic

t = b_i / s_{b_i}

– Decision rule: reject the null hypothesis at significance level α if

|t| > t_{α/2}(n − p − 1), or p-value < α

Slide 8

Test for Significance

• Testing for overall significance: F test
– Test whether the multiple regression model as a whole is useful in explaining Y, i.e., whether at least one X-variable in the regression model is useful in explaining Y.
– Hypothesis

H_0: all slope coefficients are equal to zero (i.e., β_1 = β_2 = ... = β_p = 0)
H_a: not all slope coefficients are equal to zero

Slide 9

Test for Significance

• Testing for overall significance: F test
– Test statistic

F = MSR/MSE = [SSR/p] / [SSE/(n − p − 1)] = [Σ(Ŷ_i − Ȳ)² / p] / [Σ(Y_i − Ŷ_i)² / (n − p − 1)]

– Decision rule: reject the null hypothesis if
• F > F_α, where F_α is based on an F distribution with p degrees of freedom in the numerator and n − p − 1 degrees of freedom in the denominator, or
• p-value < α

Slide 10
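The F statistic is just the ratio of the two mean squares. A minimal sketch (the SSR/SSE numbers are illustrative, not from the slides):

```python
def f_statistic(ssr, sse, n, p):
    """F = MSR/MSE = (SSR/p) / (SSE/(n - p - 1))."""
    msr = ssr / p            # mean square due to regression
    mse = sse / (n - p - 1)  # mean square error
    return msr / mse

# Toy numbers: SSR = 90, SSE = 10, n = 13 observations, p = 2 regressors
# -> MSR = 45, MSE = 1, so F = 45.
F = f_statistic(90, 10, 13, 2)
```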

Example: District Sales

• Use both target population and per capita discretionary income to forecast district sales.

District i | Sales Y_i (gross of jars; 1 gross = 12 dozen) | Target population X_1i ('000 persons) | Per capita discretionary income X_2i ($)
1  | 162 | 274 | 2450
2  | 120 | 180 | 3254
3  | 223 | 375 | 3802
4  | 131 | 205 | 2838
5  |  67 |  86 | 2347
6  | 169 | 265 | 3782
7  |  81 |  98 | 3008
8  | 192 | 330 | 2450
9  | 116 | 195 | 2137
10 |  55 |  53 | 2560
11 | 252 | 430 | 4020
12 | 232 | 372 | 4427
13 | 144 | 236 | 2660
14 | 103 | 157 | 2088
15 | 212 | 370 | 2605

Slide 11

Example: District Sales

• Excel output (omitted)

Slide 12

Example: District Sales

• Multiple regression model

Y = β_0 + β_1 X_1 + β_2 X_2 + ε

where
Y = district sales
X_1 = target population
X_2 = per capita discretionary income

• Multiple regression equation
Using the assumption E(ε) = 0, we obtain

E(Y) = β_0 + β_1 X_1 + β_2 X_2

Slide 13

Example: District Sales

• Estimated regression equation
b_0, b_1, b_2 are the least squares estimates of β_0, β_1, β_2. Thus

Ŷ = b_0 + b_1 X_1 + b_2 X_2

• For this example,

Ŷ = 3.4526 + 0.4960 X_1 + 0.0092 X_2

– Predicted sales are expected to increase by 0.496 gross when the target population increases by one thousand persons, holding per capita discretionary income constant.
– Predicted sales are expected to increase by 0.0092 gross when per capita discretionary income increases by one dollar, holding population constant.

Slide 14
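The estimated equation can be reproduced from the data table by solving the normal equations directly. A plain-Python sketch (Gaussian elimination; the coefficients should come out close to the slide's Excel output of about 3.4526, 0.4960, and 0.0092):

```python
sales = [162, 120, 223, 131, 67, 169, 81, 192, 116, 55,
         252, 232, 144, 103, 212]
population = [274, 180, 375, 205, 86, 265, 98, 330, 195, 53,
              430, 372, 236, 157, 370]
income = [2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450, 2137,
          2560, 4020, 4427, 2660, 2088, 2605]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least squares coefficients via the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    return solve(XtX, Xty)

# Design matrix with a leading column of 1s for the intercept.
X = [[1.0, x1, x2] for x1, x2 in zip(population, income)]
b = ols(X, sales)
residuals = [yi - (b[0] + b[1] * x1 + b[2] * x2)
             for yi, x1, x2 in zip(sales, population, income)]
```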

Example: District Sales

• t test for significance of individual parameters
– Hypothesis

H_0: β_i = 0
H_a: β_i ≠ 0

– Decision rule
For α = .05 and d.f. = 15 − 2 − 1 = 12, t_.025 = 2.179. Reject H_0 if |t| > 2.179.
– Test statistics

t = b_1 / s_{b_1} = 0.49600 / 0.00605 = 81.92
t = b_2 / s_{b_2} = 0.00920 / 0.000968 = 9.50

– Conclusions
Reject H_0: β_1 = 0. Reject H_0: β_2 = 0.

Slide 15

Example: District Sales

• To test whether sales are related to population and per capita discretionary income
– Hypothesis

H_0: β_1 = β_2 = 0
H_a: not both β_1 and β_2 equal to zero

– Decision rule
For α = .05 and d.f. = 2, 12: F_.05 = 3.89. Reject H_0 if F > 3.89.
– Test statistic

F = MSR/MSE = 26922/4.74 = 5679.47

– Conclusion
Reject H_0; sales are related to population and per capita discretionary income.

Slide 16

Example: District Sales

• R² = 99.89% means that 99.89% of the total variation of sales can be explained by its linear relation with population and per capita discretionary income.
• R_a² = 99.88%. Both R² and R_a² mean the model fits the data very well.

Slide 17

Regression Diagnostics

• Model assumptions about the error term ε
– The error ε is a random variable with mean zero, i.e., E(ε) = 0.
– The variance of ε, denoted by σ², is the same for all values of the independent variable(s), i.e., Var(ε) = σ².
– The values of ε are independent.
– The error ε is a normally distributed random variable.

Slide 18

Regression Diagnostics

• Residual analysis: validating model assumptions
• Calculate the residuals and check the following.
– Are the errors normally distributed?
• Normal probability plot
– Is the error variance constant?
• Plot of residuals against ŷ
– Are the errors uncorrelated (time series data)?
• Plot of residuals against time periods
– Are there observations that are inaccurately recorded or do not belong to the target population?
• Double check the accuracy of outliers and influential observations.

Slide 19
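Before any of those plots, the residuals themselves are cheap to compute. A minimal sketch with illustrative data: with an intercept in the model, least squares forces the residuals to sum to zero and to be uncorrelated with the regressor, so any remaining pattern seen in the plots reflects a genuine assumption violation rather than an artifact of the fit:

```python
# Illustrative data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
# Closed-form simple OLS fit.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```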

Autocorrelation

• Autocorrelation is present if the disturbance terms are correlated. Three issues need to be addressed.
– How does autocorrelation arise?
– How can autocorrelation be detected?
– What are the alternative estimation strategies under autocorrelation?

Slide 20

Causes of Autocorrelation

1. Omitting relevant regressors
Suppose the true model is

Y_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + ε_t

but the model is mis-specified as

Y_t = β_0 + β_1 X_{1t} + ν_t

That is, ν_t = β_2 X_{2t} + ε_t. If X_{2t} is correlated with X_{2,t−1}, then ν_t is also correlated with ν_{t−1}. This is particularly serious if X_{2t} represents a lagged dependent variable.

Slide 21

Causes of Autocorrelation

2. Specification errors in the functional form
Suppose the true model is

Y_t = β_0 + β_1 X_t + β_2 X_t² + ε_t

but the model is mis-specified as

Y_t = β_0 + β_1 X_t + ν_t

Then ν_t would tend to be positive for X < A and X > B, and negative for A < X < B, where A and B are the points at which the true quadratic curve crosses the fitted straight line.

Slide 22

Causes of Autocorrelation

3. Measurement errors in the variables
Suppose Y_t = Y_t* + ν_t, where Y_t is the observed value, Y_t* is the true value and ν_t is the measurement error. Hence, the true model is

Y_t* = β_0 + β_1 X_{1t} + β_2 X_{2t} + ... + β_p X_{pt} + ε_t

and the observed model is

Y_t = β_0 + β_1 X_{1t} + β_2 X_{2t} + ... + β_p X_{pt} + (ε_t + ν_t) = β_0 + β_1 X_{1t} + ... + β_p X_{pt} + u_t

Given a "common" measurement method, it is likely that measurement errors in periods t and t−1 are correlated.

Slide 23

Causes of Autocorrelation

4. Pattern of the business cycle
Time-series data relating to business and economics often exhibit the pattern of the business cycle: sluggishness during a recession persists over a certain period, while prosperity in a boom continues for a certain duration. It is apparent that successive observations tend to be correlated.

Slide 24

Testing for First Order Autocorrelation

• First-order autocorrelation
– The error term in time period t is related to the error term in time period t−1 by the equation

ε_t = ρ ε_{t−1} + a_t, where a_t ~ N(0, σ_a²)

– Use the Durbin-Watson test to test for the existence of first-order autocorrelation.

Slide 25

Testing for First Order Autocorrelation

• Durbin-Watson test
– For positive autocorrelation
H_0: The error terms are not autocorrelated (ρ = 0)
H_a: The error terms are positively autocorrelated (ρ > 0)
– For negative autocorrelation
H_0: The error terms are not autocorrelated (ρ = 0)
H_a: The error terms are negatively autocorrelated (ρ < 0)
– For positive or negative autocorrelation
H_0: The error terms are not autocorrelated (ρ = 0)
H_a: The error terms are positively or negatively autocorrelated (ρ ≠ 0)
– Test statistic

DW = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²

Slide 26

Testing for First Order Autocorrelation

DW = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²

   = [Σ_{t=2}^{n} e_t² + Σ_{t=2}^{n} e_{t−1}² − 2 Σ_{t=2}^{n} e_t e_{t−1}] / Σ_{t=1}^{n} e_t²

   = [(Σ_{t=1}^{n} e_t² − e_1²) + (Σ_{t=1}^{n} e_t² − e_n²) − 2 Σ_{t=2}^{n} e_t e_{t−1}] / Σ_{t=1}^{n} e_t²

   = [2 Σ_{t=1}^{n} e_t² − 2 Σ_{t=2}^{n} e_t e_{t−1} − (e_1² + e_n²)] / Σ_{t=1}^{n} e_t²

   = 2(1 − r) − (e_1² + e_n²) / Σ_{t=1}^{n} e_t²

where r is the sample autocorrelation coefficient expressed as

r = Σ_{t=2}^{n} e_t e_{t−1} / Σ_{t=1}^{n} e_t²

Slide 27

Testing for First Order Autocorrelation

• In "large samples", DW ≈ 2(1 − r).
– If the disturbances are uncorrelated, then r = 0 and DW ≈ 2.
– If negative first-order autocorrelation exists, then r < 0 and DW > 2.
– If positive first-order autocorrelation exists, then r > 0 and DW < 2.
• Exact critical values of the Durbin-Watson test cannot be calculated. Instead, Durbin and Watson established upper (d_U) and lower (d_L) bounds for the critical values. They are for testing first-order autocorrelation only.

Slide 28
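The DW statistic is a direct translation of the formula above. A minimal sketch (the residual series are illustrative): alternating-sign residuals (strong negative autocorrelation) push DW toward 4, while residuals that barely change from one period to the next push it toward 0:

```python
def durbin_watson(e):
    """DW = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

dw_negative = durbin_watson([1.0, -1.0, 1.0, -1.0])  # alternating: DW = 3, above 2
dw_positive = durbin_watson([1.0, 1.0, 1.0, 1.0])    # never changes: DW = 0, below 2
```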

Testing for First Order Autocorrelation

• Test for positive autocorrelation

H_0: ρ = 0
H_a: ρ > 0

• Decision rules
– If DW < d_{L,α}, we reject H_0.
– If DW > d_{U,α}, we do not reject H_0.
– If d_{L,α} ≤ DW ≤ d_{U,α}, the test is inconclusive.

Slide 29
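The three-way decision rule can be encoded directly. A small sketch (function name and return strings are mine, not from the slides); the bounds used in the test below are the d_L = 0.95, d_U = 1.15 values quoted later for n = 20, k′ = 1:

```python
def dw_positive_test(dw, d_lower, d_upper):
    """Durbin-Watson decision rule for H0: rho = 0 vs Ha: rho > 0."""
    if dw < d_lower:
        return "reject H0"          # evidence of positive autocorrelation
    if dw > d_upper:
        return "do not reject H0"   # no evidence of positive autocorrelation
    return "inconclusive"           # DW falls between the bounds
```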

Example: Company Sales

• The Blaisdell Company wished to predict its sales by using industry sales as a predictor variable.

Year | Quarter | t | X | Y
1977 | 1 |  1 | 127.3 | 20.96
1977 | 2 |  2 | 130.0 | 21.40
1977 | 3 |  3 | 132.7 | 21.96
1977 | 4 |  4 | 129.4 | 21.52
1978 | 1 |  5 | 135.0 | 22.39
1978 | 2 |  6 | 137.1 | 22.76
1978 | 3 |  7 | 141.2 | 23.48
1978 | 4 |  8 | 142.8 | 23.66
1979 | 1 |  9 | 145.5 | 24.10
1979 | 2 | 10 | 145.3 | 24.01
1979 | 3 | 11 | 148.3 | 24.54
1979 | 4 | 12 | 146.4 | 24.30
1980 | 1 | 13 | 150.2 | 25.00
1980 | 2 | 14 | 153.1 | 25.64
1980 | 3 | 15 | 157.3 | 26.36
1980 | 4 | 16 | 160.7 | 26.98
1981 | 1 | 17 | 164.2 | 27.52
1981 | 2 | 18 | 165.6 | 27.78
1981 | 3 | 19 | 168.7 | 28.24
1981 | 4 | 20 | 171.7 | 28.78

Slide 30

Example: Company Sales

• The scatter plot of Y against X suggests that a linear regression model is appropriate. (Scatter plot omitted.)

Slide 31

Example: Company Sales

• SAS output (omitted)

Slide 32

Example: Company Sales

• Estimated regression equation

Ŷ = −1.45475 + 0.17628 X

• The market research analyst was concerned with the possibility of positively correlated errors. Using the Durbin-Watson test:

H_0: ρ = 0
H_a: ρ > 0

Slide 33

Example: Company Sales

DW = Σ_{t=2}^{20} (e_t − e_{t−1})² / Σ_{t=1}^{20} e_t² = 0.09794 / 0.13330 = 0.735

Suppose α = 0.01. For n = 20 (n denotes the number of observations) and k′ = 1 (k′ denotes the number of independent variables), d_L = 0.95 and d_U = 1.15.

Since DW < d_L, we conclude that the error terms are positively autocorrelated.

The residuals e_t, t = 1, ..., 20, are:

−0.02605, −0.06202, 0.02202, 0.16375, 0.04657, 0.04638, 0.04362, −0.05844, −0.09440, −0.14914, −0.14799, −0.05305, −0.02293, 0.10585, 0.08546, 0.10610, 0.02911, 0.04232, −0.04416, −0.03301

with Σ_{t=2}^{20} (e_t − e_{t−1})² = 0.09794 and Σ_{t=1}^{20} e_t² = 0.13330.

Slide 34
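The slide's DW value can be checked directly from the listed residuals. A minimal sketch (residuals transcribed from the slide):

```python
# Residuals from the Blaisdell regression, t = 1..20 (as listed on the slide).
residuals = [-0.02605, -0.06202, 0.02202, 0.16375, 0.04657,
             0.04638, 0.04362, -0.05844, -0.09440, -0.14914,
             -0.14799, -0.05305, -0.02293, 0.10585, 0.08546,
             0.10610, 0.02911, 0.04232, -0.04416, -0.03301]

# Numerator: sum of squared successive differences, t = 2..20.
num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, 20))
# Denominator: sum of squared residuals, t = 1..20.
den = sum(e ** 2 for e in residuals)
dw = num / den  # about 0.735, below d_L = 0.95 at alpha = 0.01
```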

Testing for First Order Autocorrelation

• Remark
– In order to use the Durbin-Watson table, there must be an intercept term in the model.

Slide 35

Testing for First Order Autocorrelation

• Test for negative autocorrelation

H_0: ρ = 0
H_a: ρ < 0

• Decision rules
– If 4 − DW < d_{L,α}, we reject H_0.
– If 4 − DW > d_{U,α}, we do not reject H_0.
– If d_{L,α} ≤ 4 − DW ≤ d_{U,α}, the test is inconclusive.

Slide 36

Slide 36

Testing for First Order Autocorrelation

• Test for positive or negative autocorrelation H 0 H a : :

ρ ρ

= 0  0 • Decision rules – If DW <

d

L, α/2 – If DW >

d U,

α/2 reject H 0 .

or 4 – DW <

d

L, α/2 , we reject H 0 .

and 4 – DW > – If

d

L, α/2 ≤ DW ≤

d

U, α/2 or the test is inconclusive.

d

L, α/2

d

U, α/2 , we do not 4 – DW ≤

d

U, α/2 , Slide 37

Testing for First Order Autocorrelation

• Remarks
– The validity of the Durbin-Watson test depends on the assumption that the population of all possible residuals at any time t has a normal distribution.

– Positive autocorrelation is found in practice more commonly than negative autocorrelation.

– First-order autocorrelation is not the only type of autocorrelation.

Slide 38

Solutions to Autocorrelation (1)

1. Re-examine the model. The typical causes of autocorrelation are omitted regressors or wrong functional forms.
2. Go for an alternative estimation strategy. Several approaches are commonly used. The approach considered here is the two-step Cochrane-Orcutt procedure.

Consider the following model with AR(1) disturbances:

(1) Y_t = β_1 + β_2 X_t + ε_t, with ε_t = ρ ε_{t−1} + u_t.

Slide 39

Solutions to Autocorrelation (2)

Since equation (1) holds for all observations, in terms of the (t−1)th observation we have

(2) Y_{t−1} = β_1 + β_2 X_{t−1} + ε_{t−1}, where ε_{t−1} = ρ ε_{t−2} + u_{t−1}.

Now, multiplying (2) by ρ, we obtain

(3) ρ Y_{t−1} = ρ β_1 + ρ β_2 X_{t−1} + ρ ε_{t−1}.

Subtracting (3) from (1), we get

(Y_t − ρ Y_{t−1}) = β_1 (1 − ρ) + β_2 (X_t − ρ X_{t−1}) + (ε_t − ρ ε_{t−1})

That is,

(4) Y_t* = β_1* + β_2 X_t* + u_t

Note that the u_t's are uncorrelated. However, ρ is unknown and needs to be estimated.

Slide 40

Two-step Cochrane-Orcutt

1. Estimate equation (1) by the least squares method and obtain the resulting residuals e_t. Regress e_t = ρ e_{t−1} + u_t and obtain

r = Σ_{t=2}^{n} e_t e_{t−1} / Σ_{t=2}^{n} e_{t−1}²

2. Substitute r into equation (4) and obtain OLS estimates of the coefficients based on equation (4).

Slide 41
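The two steps can be sketched in plain Python (function names and data are mine, illustrative only): first estimate ρ from the OLS residuals by a regression through the origin, then build the quasi-differenced series that feed the second OLS fit:

```python
def estimate_rho(e):
    """Step 1: r = sum e_t * e_{t-1} / sum e_{t-1}^2 (no-intercept regression)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den

def transform(y, x, r):
    """Step 2 inputs: Y_t* = Y_t - r*Y_{t-1}, X_t* = X_t - r*X_{t-1}.

    One observation is lost to the lag, leaving n - 1 pairs."""
    y_star = [y[t] - r * y[t - 1] for t in range(1, len(y))]
    x_star = [x[t] - r * x[t - 1] for t in range(1, len(x))]
    return y_star, x_star
```

With residuals that decay exactly by a factor of 0.5 each period, `estimate_rho` recovers r = 0.5 exactly, which makes the sketch easy to check by hand.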

• The following table gives the annual U.S. personal consumption expenditure (C) in billions of 1978 dollars from 1976 to 1990 inclusive. (Table omitted.)

Slide 42

• An OLS linear trend model has been fitted to the above data, giving the following residuals. (Residuals omitted.)

Slide 43

• To test for positive first-order autocorrelation in the error, and hence estimate a model for this error process, consider

H_0: ρ = 0
H_a: ρ > 0

• Using the Durbin-Watson test,

DW = Σ_{t=2}^{15} (e_t − e_{t−1})² / Σ_{t=1}^{15} e_t²

Slide 44

• When k′ = 1 and n = 15, d_L = 1.08 and d_U = 1.36. Since DW < d_L, we reject H_0.
• By regressing e_t on e_{t−1}, we obtain r = 0.79. Hence the estimated error process is

e_t = 0.79 e_{t−1} + u_t

• Re-estimate the trend model for consumption using the two-step Cochrane-Orcutt procedure.

Slide 45

• Using the transformed model

C_t − r C_{t−1} = β_1 (1 − r) + β_2 [t − r(t − 1)] + u_t

with t = 1 indicating year 1976, sequentially until t = 15 representing year 1990, the transformed data are tabulated in the following table. (Table omitted.)

Slide 46

• Applying OLS to the transformed data yields estimates b_1* and b_2 in

C_t* = b_1* + b_2 t*

or, equivalently, C_t = r C_{t−1} + b_1* + b_2 [t − r(t − 1)].
• That is, b_1 = b_1*/(1 − r) and b_2 are the parameter estimates of the original model.

Slide 47

Note that
1. Because lagged values of Y and X had to be formed, we are left with n − 1 observations only.
2. The estimate r is obtained from an OLS regression that assumes a standard linear model satisfying all classical assumptions. It may not be an efficient estimator of ρ. This leads to the iterative Cochrane-Orcutt estimator.

Slide 48

Chapter Summary

• Simple linear regression
• Multiple regression
• Regression on dummy variables
• Autocorrelation
– Durbin-Watson test
– Two-step Cochrane-Orcutt procedure

Slide 49