Transcript Chapter 18

Multiple Regression

Reference: Chapter 18 of Statistics for Management and Economics, 7th Edition, Gerald Keller.

Earlier: Simple Linear Regression

[Figure: scatter of Y against X with a fitted straight line]

Y = β0 + β1X + randomness

• Now we extend the simple linear regression model to multiple regression, which allows any number of independent variables.

• We expect to build a model that fits the data better than the simple linear regression model.

Multiple regression: Two or more independent variables

[Figure: candidate independent variables X1, X2, …, X8, each with an arrow toward Y; the question marks indicate that we do not yet know which variables belong in the model]

Multiple Regression Model:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

Y – dependent variable
X1, X2, …, Xk – k independent variables

An independent variable can be a function of others, e.g. X2 = X1², X5 = X3·X4, …
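The point that an independent variable can be a function of the others can be sketched by building a design matrix with derived columns. A minimal sketch; the predictor values below are made up for illustration:

```python
import numpy as np

# hypothetical raw predictors (values made up for illustration)
x1 = np.array([1.0, 2.0, 3.0])
x3 = np.array([0.5, 1.0, 1.5])
x4 = np.array([2.0, 2.0, 4.0])

x2 = x1 ** 2      # X2 = X1^2: a squared (polynomial) term
x5 = x3 * x4      # X5 = X3*X4: an interaction term

# design matrix whose columns are the k independent variables
X = np.column_stack([x1, x2, x3, x4, x5])
print(X.shape)    # (3, 5)
```

The regression machinery treats x2 and x5 like any other column; only the interpretation of their coefficients changes.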

Eg:

Y = return; X1 = share price, X2 = interest, X3 = inflation

Motive:

• More "realistic" models
• Better predictions
• Separate effects of the different variables

The simple linear regression model allows for one independent variable, "x":

y = β0 + β1x + ε

The multiple linear regression model allows for more than one independent variable:

Y = β0 + β1x1 + β2x2 + ε

[Figure: the regression surface plotted against x1 and x2, shown from two angles]

Note how the straight line becomes a plane.

Required conditions for the error variable

• The error ε is normally distributed.
• The mean is equal to zero and the standard deviation σε is constant for all values of y.
• The errors are independent.

Earlier: Procedure for Regression Diagnostics

• ……
• ……
• ……
• Determine the regression equation.
• Check the required conditions for the errors.
• ……
• ……
• If the model fits the data, use the regression equation.

Now

Estimate β0, β1, β2, …, βk with the estimates b0, b1, b2, …, bk using the LS-method.

Use SPSS for estimation and get the least squares regression equation:

ŷ = b0 + b1x1 + b2x2 + … + bkxk

In practice, before trying to interpret regression coefficients, we check how well the model fits.
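The transcript uses SPSS for the LS-method, but the same least squares fit can be sketched with NumPy on simulated data (the true coefficients, noise level, and sample size below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta = np.array([5.0, 2.0, -3.0])          # true beta0, beta1, beta2 (illustrative)
X = rng.normal(size=(n, 2))                 # two independent variables
y = beta[0] + X @ beta[1:] + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X])        # design matrix with intercept column
b, *_ = np.linalg.lstsq(A, y, rcond=None)   # b0, b1, b2: least squares estimates
print(np.round(b, 2))                       # close to [5, 2, -3]
```

The estimates b0, b1, b2 land close to the true β values because the simulated data actually follow the model; with real data the fit must first be assessed, as the slide says.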

Estimating the Coefficients and Assessing the Model

• The procedure used to perform regression analysis:

– Obtain the model coefficients and statistics using statistical software.

– Diagnose violations of required conditions. Try to remedy problems when identified.

– Assess the model fit using statistics obtained from the sample.

– If the model assessment indicates good fit to the data, use it to interpret the coefficients and generate predictions.


Estimating the Coefficients and Assessing the Model, Example

• Example: Where to locate a new motor inn?

– La Quinta Motor Inns is planning an expansion.
– Management wishes to predict which sites are likely to be profitable.
– Several areas where predictors of profitability can be identified are:
  • Competition
  • Market awareness
  • Demand generators
  • Demographics
  • Physical quality

Area              Variable(s)
Profitability     Margin
Competition       Rooms – number of hotel/motel rooms within 3 miles of the site
Market awareness  Nearest – distance to the nearest La Quinta inn
Customers         Office space, College enrollment
Community         Income – median household income
Physical          Disttwn – distance to downtown

Data were collected from 100 randomly selected inns that belong to La Quinta, and the following model was run:

Margin = β0 + β1 Rooms + β2 Nearest + β3 Office + β4 College + β5 Income + β6 Disttwn + ε

Sample of the data:

Margin  Number  Nearest  Office Space  Enrollment  Income  Distance
55.5    3203    4.2      549           8           37      2.7
33.8    2810    2.8      496           17.5        35      14.4
49      2890    2.4      254           20          35      2.6
31.9    3422    3.3      434           15.5        38      12.1
57.4    2687    0.9      678           15.5        42      6.9
49      3759    2.9      635           19          33      10.8

Descriptive Statistics (N = 100 for each variable)

Variable   Mean      Std. Deviation
MARGIN     45.7390   7.75213
NUMBER     2985.21   447.979
NEAREST    2.3100    0.88563
OFFICE_S   492.23    165.652
ENROLLME   16.0750   4.21480
INCOME     36.2200   4.05662
DISTANCE   7.1610    3.70053

Regression Analysis, SPSS Output

Margin = 37.149 - 0.008Number + 1.591Nearest + 0.020Office Space + 0.196Enrollment + 0.421Income - 0.004Distance

Coefficients (dependent variable: MARGIN)

Variable    B        Std. Error  Beta    t       Sig.   95% CI for B
(Constant)  37.149   7.044               5.274   .000   (23.162, 51.136)
NUMBER      -0.008   0.001       -.447   -6.112  .000   (-0.010, -0.005)
NEAREST     1.591    0.651       .182    2.442   .016   (0.297, 2.884)
OFFICE_S    0.020    0.003       .418    5.690   .000   (0.013, 0.026)
ENROLLME    0.196    0.134       .107    1.461   .147   (-0.070, 0.463)
INCOME      0.421    0.141       .220    2.994   .004   (0.142, 0.701)
DISTANCE    -0.004   0.156       -.002   -0.028  .978   (-0.314, 0.305)
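Plugging a site's values into the estimated equation gives a predicted margin. A minimal sketch using the first sample row of the La Quinta data (note the printed coefficients are rounded, so this differs slightly from SPSS's full-precision prediction):

```python
# SPSS-estimated coefficients from the printout (rounded)
coef = {"const": 37.149, "number": -0.008, "nearest": 1.591,
        "office": 0.020, "enroll": 0.196, "income": 0.421, "dist": -0.004}

# first sample site from the data table
site = {"number": 3203, "nearest": 4.2, "office": 549,
        "enroll": 8, "income": 37, "dist": 2.7}

margin = coef["const"] + sum(coef[v] * site[v] for v in site)
print(round(margin, 2))   # predicted operating margin, about 46.3 (observed: 55.5)
```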

Model Assessment

• The model is assessed using three tools:
– The standard error of estimate
– The coefficient of determination
– The F-test of the analysis of variance

• The standard error of estimate is used in building the other two tools.

Standard Error of Estimate

• The standard deviation of the error, σε, is not known. It is estimated by the standard error of estimate:

sε = √( SSE / (n − k − 1) )

• The magnitude of sε is judged by comparing it with ȳ.
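The formula can be checked against the printout: using SSE = 2873.875 and n − k − 1 = 93 from the ANOVA table later in the transcript, sε should reproduce the "Std. Error of the Estimate" (5.55895) up to rounding:

```python
import math

sse = 2873.875   # residual sum of squares from the ANOVA printout
n, k = 100, 6

s_eps = math.sqrt(sse / (n - k - 1))
print(round(s_eps, 4))   # about 5.5589, matching SPSS's 5.55895 up to rounding
```

Compared with the mean margin ȳ ≈ 45.7, this sε is roughly 12% of ȳ, which is how its magnitude is judged.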

Model Summary

R      R Square  Adjusted R Square  Std. Error of the Estimate
.719   .517      .486               5.55895

Change Statistics: R Square Change = .517, F Change = 16.588, df1 = 6, df2 = 93, Sig. F Change = .000
Predictors: (Constant), DISTANCE, INCOME, NUMBER, ENROLLME, OFFICE_S, NEAREST

As in simple linear regression, the coefficient of determination R² is interpreted the same way.

Coefficient of Determination

From the printout, R² = 0.517.

That is, 51.7% of the variation in operating margin is explained by the six independent variables, and 48.3% of it remains unexplained.

Adjusted R² = 1 − [ SSE / (n − k − 1) ] / [ Σ(yi − ȳ)² / (n − 1) ]

When adjusted for degrees of freedom, Adjusted R² = 0.486.
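Since SSE/SST = 1 − R², the adjusted formula above can equivalently be written Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), which lets us check the printout directly:

```python
r2 = 0.517    # R Square from the model summary
n, k = 100, 6

# equivalent form of the adjusted R-squared formula
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))   # 0.486, matching the printout's Adjusted R Square
```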

Testing for a linear relationship between the dependent and independent variable(s):

Earlier (simple regression): test H0: β1 = 0.
Now (multiple regression): test the overall validity of the model.

Testing the Validity of the Model

• We pose the question: Is there at least one independent variable linearly related to the dependent variable?

• To answer the question we test the hypothesis
H0: β1 = β2 = … = βk = 0
H1: At least one βi is not equal to zero.

• If at least one βi is not equal to zero, the model has some validity.

Make an overall test of the model

Model:

Y = β0 + β1X1 + β2X2 + … + β6X6 + ε, where ε ~ N(0, σ²)

Hypothesis:

H0: β1 = β2 = … = β6 = 0
H1: Not all βi = 0

Total variation in Y = SST = Σ(yi − ȳ)²
SST = SSR + SSE
SSR = explained variation in Y
SSE = unexplained variation in Y

The bigger SSR is relative to SSE (i.e., the higher R² is), the better the model.

Test statistic:

F = [ SSR / k ] / [ SSE / (n − k − 1) ] ~ F(k, n − k − 1) if H0 is true.

(Degrees of freedom: k = 6, n − k − 1 = 93)

Level of significance:

Let α = 0.05

Rejection region:

Reject H0 if Fobs > Fcrit, where F0.05,6,93 ≈ 2.2.

Observation:

Fobs = 16.588 (from the ANOVA printout)

Conclusion: Since Fobs > Fcrit, reject H0.

Interpretation: At least one βi is not equal to zero, so the model has some validity.

We can have k t-tests, one for each coefficient:

Test 1: H0: β1 = 0 vs H1: β1 ≠ 0
Test 2: H0: β2 = 0 vs H1: β2 ≠ 0
…
Test k: H0: βk = 0 vs H1: βk ≠ 0

The F-test in the analysis of variance combines all these t-tests into one test. It has a lower probability of Type I error than conducting multiple t-tests.
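The Type I error point can be made concrete. If the k individual t-tests were independent (they are not exactly, so this is only a rough illustration), the chance that at least one of k = 6 tests at α = 0.05 falsely rejects would be:

```python
alpha, k = 0.05, 6

# probability that at least one of k independent tests commits a Type I error
familywise = 1 - (1 - alpha) ** k
print(round(familywise, 3))   # about 0.265, far above the single-test 0.05
```

The single F-test, by contrast, keeps the overall Type I error probability at α = 0.05.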

ANOVA (dependent variable: MARGIN)

Source      Sum of Squares  df  Mean Square  F       Sig.
Regression  3075.583        6   512.597      16.588  .000
Residual    2873.875        93  30.902
Total       5949.458        99

Predictors: (Constant), DISTANCE, INCOME, NUMBER, ENROLLME, OFFICE_S, NEAREST

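The F statistic in the ANOVA table can be reproduced from the sums of squares and degrees of freedom it reports:

```python
ssr, sse = 3075.583, 2873.875   # regression and residual sums of squares
k, df_res = 6, 93                # df1 = k, df2 = n - k - 1

msr = ssr / k                    # mean square regression: about 512.597
mse = sse / df_res               # mean square error: about 30.902
f_stat = msr / mse
print(round(f_stat, 3))          # 16.588, matching the printout
```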