Transcript Chapter 18

Procedure for Regression Diagnostics

• Develop a model that has a theoretical basis.

• Gather data for the two variables in the model.

• Draw the scatter diagram to determine whether a linear model appears to be appropriate.

• Determine the regression equation.

• Check the required conditions for the errors.

• Check for outliers and influential observations.

• Assess the model fit.

• If the model fits the data, use the regression equation.

Chapter 18

Earlier: Y = β0 + β1 X + ε (one independent variable X plus randomness ε)

• In this chapter we extend the simple linear regression model, and allow for any number of independent variables.

• We expect to build a model that fits the data better than the simple linear regression model.

Now: Two or more variables

[Diagram: candidate independent variables X1, X2, …, X8, each with a possible (?) link to Y]

Model: Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε

Y – dependent variable

X1, X2, …, Xk – k independent variables

Motive:

• More "realistic" models

• Better predictions

• Separate effects of different variables

Y: return

X1: share price, X2: interest, X3: inflation

The simple linear regression model allows for one independent variable, x:

y = β0 + β1 x + ε

The multiple linear regression model allows for more than one independent variable:

y = β0 + β1 x1 + β2 x2 + ε

Note how the straight line becomes a plane, and...


Required conditions for the error variable

• The error ε is normally distributed.

• The mean is equal to zero and the standard deviation is constant (σ_ε) for all values of y.

• The errors are independent.

Earlier:

Procedure for Regression Diagnostics

• ……

• …….

• …....

• Determine the regression equation.

• Check the required conditions for the errors.

• …….

• ……

• If the model fits the data, use the regression equation.

Now

Estimate β0, β1, β2, …, βk with the estimates b0, b1, b2, …, bk using the least-squares (LS) method.
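The least-squares step can be sketched in a few lines of code. This is a minimal illustration, not the routine a statistical package uses: it solves the normal equations (X'X)b = X'y by Gaussian elimination, and the function names and the toy data (generated exactly from y = 1 + 2 x1 + 3 x2) are assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(xs, ys):
    """Return [b0, b1, ..., bk] minimising the sum of squared errors."""
    # Prepend a column of ones so that b0 is the intercept.
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefficients = least_squares(xs, ys)
print([round(b, 6) for b in coefficients])
```

Because the toy data contain no error term, the estimates should recover the generating coefficients up to rounding. With real data the estimates b0, …, bk only approximate β0, …, βk.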

Estimating the Coefficients and Assessing the Model

• The procedure used to perform regression analysis:

– Obtain the model coefficients and statistics using statistical software.

– Diagnose violations of required conditions. Try to remedy problems when identified.

– Assess the model fit using statistics obtained from the sample.

– If the model assessment indicates good fit to the data, use it to interpret the coefficients and generate predictions.

Estimating the Coefficients and Assessing the Model, Example

• Example: Where to locate a new motor inn?

– La Quinta Motor Inns is planning an expansion.

– Management wishes to predict which sites are likely to be profitable.

– Several areas where predictors of profitability can be identified are:

• Competition

• Market awareness

• Demand generators

• Demographics

• Physical quality

Category          Variable             Description
Profitability     Margin
Competition       Rooms                Number of hotel/motel rooms within 3 miles of the site.
Market awareness  Nearest              Distance to the nearest La Quinta inn.
Customers         Office space
                  College enrollment
Community         Income               Median household income.
Physical          Disttwn              Distance to downtown.

Data were collected from 100 randomly selected inns belonging to La Quinta, and the following suggested model was estimated:

Margin = β0 + β1 Rooms + β2 Nearest + β3 Office + β4 College + β5 Income + β6 Disttwn + ε

Sample of the data (six observations shown):

Margin   Number   Nearest   Office Space   Enrollment   Income   Distance
55.5     3203     4.2       549            8            37       2.7
33.8     2810     2.8       496            17.5         35       14.4
49       2890     2.4       254            20           35       2.6
31.9     3422     3.3       434            15.5         38       12.1
57.4     2687     0.9       678            15.5         42       6.9
49       3759     2.9       635            19           33       10.8

Regression Analysis, SPSS Output

Margin = 38.14 - 0.0076 Number + 1.65 Nearest + 0.020 Office Space + 0.21 Enrollment + 0.41 Income - 0.23 Distance
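As an illustration of how the estimated equation is used, the sketch below plugs one site's values into it. The coefficients are the B column of the SPSS output. The dictionary keys, the function name and the choice of site (the first values listed for each variable in the data sample) are assumptions made for this example.

```python
# Coefficients copied from the B column of the SPSS output.
COEFFICIENTS = {
    "number": -0.00762,      # hotel/motel rooms within 3 miles
    "nearest": 1.646,        # distance to nearest La Quinta inn
    "office_space": 0.01977,
    "enrollment": 0.212,
    "income": 0.413,
    "distance": -0.225,      # distance to downtown
}
INTERCEPT = 38.139


def predict_margin(site):
    """Return the predicted operating margin for one site."""
    return INTERCEPT + sum(COEFFICIENTS[name] * site[name] for name in COEFFICIENTS)


# Hypothetical site: the first value of each variable in the data sample.
first_site = {
    "number": 3203,
    "nearest": 4.2,
    "office_space": 549,
    "enrollment": 8,
    "income": 37,
    "distance": 2.7,
}
predicted = predict_margin(first_site)
print(round(predicted, 2))
```

The gap between the predicted value and the margin actually observed for a site is that site's residual, which is what the diagnostic checks on the error variable examine.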

Coefficients (a)

Model 1        Unstandardized Coefficients    Standardized Coefficients
               B            Std. Error        Beta          t         Sig.
(Constant)     38,139       6,993                           5,454     ,000
NUMBER         -7,62E-03    ,001              -,440         -6,069    ,000
NEAREST        1,646        ,633              ,188          2,601     ,011
OFFICE_S       1,977E-02    ,003              ,422          5,796     ,000
ENROLLME       ,212         ,133              ,115          1,587     ,116
INCOME         ,413         ,140              ,216          2,960     ,004
DISTANCE       -,225        ,179              -,091         -1,260    ,211

a. Dependent Variable: MARGIN

Model Assessment

• The model is assessed using three tools:

– The standard error of estimate

– The coefficient of determination

– The F-test of the analysis of variance

• The standard error of estimate is also used in computing the other two tools.

Standard Error of Estimate

• The standard deviation of the error is estimated by the standard error of estimate:

s_ε = sqrt( SSE / (n - k - 1) )

• The magnitude of s_ε is judged by comparing it to ȳ.
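A short sketch of this formula, using the residual sum of squares (SSE = 2825.626), the sample size (n = 100) and the number of independent variables (k = 6) reported for the La Quinta example. The function name is an assumption made for the illustration.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """s_eps = sqrt(SSE / (n - k - 1)), with k independent variables."""
    return math.sqrt(sse / (n - k - 1))


# Values taken from the SPSS output for the La Quinta example.
s_eps = standard_error_of_estimate(sse=2825.626, n=100, k=6)
print(round(s_eps, 5))
```

The result should agree with the "Std. Error of the Estimate" entry in the SPSS Model Summary.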

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       ,725 (a)   ,525       ,494                5,51208

a. Predictors: (Constant), DISTANCE, OFFICE_S, ENROLLME, NEAREST, NUMBER, INCOME

Coefficient of Determination

From the printout, R² = 0.5251

52.51% of the variation in operating margin is explained by the six independent variables. 47.49% remains unexplained.

When adjusted for degrees of freedom, Adjusted R² = 1 - [SSE/(n-k-1)] / [SS(Total)/(n-1)] = 49.44%
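Both figures can be reproduced from the sums of squares in the ANOVA output for this example (SSE = 2825.626, SS(Total) = 5949.458, n = 100, k = 6). The sketch below is only an arithmetic check, and the function names are assumptions.

```python
def r_squared(sse, ss_total):
    """Share of the total variation explained by the model."""
    return 1 - sse / ss_total


def adjusted_r_squared(sse, ss_total, n, k):
    """R-squared adjusted for degrees of freedom (k independent variables)."""
    return 1 - (sse / (n - k - 1)) / (ss_total / (n - 1))


# Sums of squares from the SPSS ANOVA table for the La Quinta example.
r2 = r_squared(sse=2825.626, ss_total=5949.458)
adj_r2 = adjusted_r_squared(sse=2825.626, ss_total=5949.458, n=100, k=6)
print(round(r2, 4), round(adj_r2, 4))
```

The adjustment penalises each added independent variable, so the adjusted value is never larger than R² itself.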

Testing the Validity of the Model

• We pose the question: Is there at least one independent variable linearly related to the dependent variable?

• To answer the question we test the hypotheses

H0: β1 = β2 = … = βk = 0

H1: At least one βi is not equal to zero.

• If at least one βi is not equal to zero, the model has some validity.

ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    3123,832         6    520,639       17,136   ,000 (a)
Residual      2825,626         93   30,383
Total         5949,458         99

a. Predictors: (Constant), DISTANCE, OFFICE_S, ENROLLME, NEAREST, NUMBER, INCOME

b. Dependent Variable: MARGIN
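The F value in the ANOVA table is the ratio of the two mean squares, each a sum of squares divided by its degrees of freedom. The sketch below reproduces it from the table entries. The function name is an assumption made for the illustration.

```python
def f_statistic(ssr, sse, n, k):
    """F = (SSR / k) / (SSE / (n - k - 1)) for k independent variables."""
    mean_square_regression = ssr / k
    mean_square_residual = sse / (n - k - 1)
    return mean_square_regression / mean_square_residual


# Sums of squares from the SPSS ANOVA table; n = 100 inns, k = 6 variables.
f_obs = f_statistic(ssr=3123.832, sse=2825.626, n=100, k=6)
print(round(f_obs, 3))
```

The result should match the F entry of the table. Whether it is large enough to reject H0 depends on the critical value of the F distribution with k and n-k-1 degrees of freedom.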

Make an overall test of the model

Model:

Y = β0 + β1 X1 + β2 X2 + … + β6 X6 + ε, where ε ~ N(0, σ²)

Hypothesis:

H0: β1 = β2 = … = β6 = 0

H1: Not all βi = 0

Test statistic:

F = (SSR / k) / (SSE / (n - k - 1)) ~ F(k, n - k - 1) if H0 is true.

(Degrees of freedom: k = 6, n - k - 1 = 93)

Level of significance:

Let α=0.05

Rejection area:

Reject H0 if F_obs > F_crit, where F_crit = F(0.05; 6, 93) ≈ 2.2

Observation:

F_obs =

Conclusion:

Interpretation: