Chap. 12: Multiple Regression

© 2001 Prentice-Hall, Inc.
Statistics for Business and
Economics
Multiple Regression and Model
Building
Chapter 11
Learning Objectives
1. Explain the Linear Multiple Regression Model
2. Test Overall Significance
3. Describe Various Types of Models
4. Evaluate Portions of a Regression Model
5. Interpret Linear Multiple Regression Computer Output
6. Describe Stepwise Regression
7. Explain Residual Analysis
8. Describe Regression Pitfalls
Types of Regression Models
Regression Models
- 1 Explanatory Variable: Simple (Linear, NonLinear)
- 2+ Explanatory Variables: Multiple (Linear, NonLinear)
Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation
Linear Multiple Regression
Model
Hypothesizing the
Deterministic Component
Linear Multiple Regression Model
1. Relationship between 1 dependent & 2 or more independent variables is a linear function
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
   - $Y_i$: dependent (response) variable
   - $\beta_0$: population Y-intercept
   - $\beta_1, \ldots, \beta_k$: population slopes
   - $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
   - $\varepsilon_i$: random error
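Population Multiple Regression Model
Bivariate model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ (Observed Y)
Response plane: $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
[Figure: response plane over the X1 and X2 axes with intercept $\beta_0$; the observed Y at $(X_{1i}, X_{2i})$ lies $\varepsilon_i$ away from the plane]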
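Sample Multiple Regression Model
Bivariate model: $Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\varepsilon_i$ (Observed Y)
Response plane: $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$
[Figure: fitted response plane over the X1 and X2 axes with intercept $\hat\beta_0$; the observed Y at $(X_{1i}, X_{2i})$ lies $\hat\varepsilon_i$ away from the plane]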
Parameter Estimation
Multiple Linear Regression Equations
Too complicated by hand! Ouch!
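In matrix notation (an assumption here; the slides do not define $\mathbf{X}$ or $\mathbf{y}$), the least-squares estimates are usually written as the solution of the normal equations. A compact sketch, with one row of $\mathbf{X}$ per case and a leading column of 1's for the intercept:

```latex
\hat{\boldsymbol\beta} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\mathbf{X} =
\begin{bmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k1} \\
1 & X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{kn}
\end{bmatrix},
\qquad
\hat{\boldsymbol\beta} =
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix}
```

This is why the arithmetic is normally left to software: even with two X variables it means solving a 3 x 3 linear system.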
Interpretation of Estimated Coefficients
1. Slope ($\hat\beta_k$)
   - Estimated Y Changes by $\hat\beta_k$ for Each 1 Unit Increase in $X_k$, Holding All Other Variables Constant
   - Example: If $\hat\beta_1 = 2$, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising ($X_1$), Given the Number of Sales Reps ($X_2$)
2. Y-Intercept ($\hat\beta_0$)
   - Average Value of Y When All $X_k = 0$
Parameter Estimation Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00).
You've collected the following data:
Resp  Size  Circ
1     1     2
4     8     8
1     3     1
3     5     7
2     6     4
4     10    6
Parameter Estimation Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    0.0640               0.2599           0.246               0.8214
ADSIZE     1    0.2049               0.0588           3.656               0.0399
CIRC       1    0.2805               0.0686           4.089               0.0264
The Parameter Estimate column gives $\hat\beta_0$ (INTERCEP), $\hat\beta_1$ (ADSIZE), and $\hat\beta_2$ (CIRC).
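The parameter estimates for the ad-response example can be reproduced with a few lines of code. The sketch below is a plain-Python illustration that solves the normal equations directly; the helper names `solve` and `ols` are illustrative and are not part of the slides or of any statistics package.

```python
# Least-squares fit of Resp on Size and Circ via the normal equations (X'X) b = X'y.
# Pure-Python sketch; helper names are illustrative, not from the slides.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Return [b0, b1, ..., bk] for a list of predictor rows and responses y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)

# Data from the example: responses (00), ad size (sq. in.), circulation (000).
resp = [1, 4, 1, 3, 2, 4]
size = [1, 8, 3, 5, 6, 10]
circ = [2, 8, 1, 7, 4, 6]

b0, b1, b2 = ols(list(zip(size, circ)), resp)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

Rounded to four decimals, the three values agree with the INTERCEP, ADSIZE, and CIRC estimates in the computer output.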
Interpretation of Coefficients Solution
1. Slope ($\hat\beta_1$)
   - # Responses to Ad Is Expected to Increase by .2049 (20.49) for Each 1 Sq. In. Increase in Ad Size, Holding Circulation Constant
2. Slope ($\hat\beta_2$)
   - # Responses to Ad Is Expected to Increase by .2805 (28.05) for Each 1 Unit (1,000) Increase in Circulation, Holding Ad Size Constant
Evaluating the Model
Evaluating Multiple Regression Model Steps
1. Examine Variation Measures
2. Do Residual Analysis
3. Test Parameter Significance
   - Overall Model
   - Individual Coefficients
4. Test for Multicollinearity
Variation Measures
Coefficient of Multiple Determination
1. Proportion of Variation in Y 'Explained' by All X Variables Taken Together
   $R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SSR}{SS_{yy}}$
2. Never Decreases When New X Variable Is Added to Model
   - Only Y Values Determine $SS_{yy}$
   - Disadvantage When Comparing Models
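As a small worked instance of the $R^2$ formula, the sketch below plugs in the sums of squares that appear in the Analysis of Variance output for the ad-response example later in this chapter. The variable names are illustrative.

```python
# R^2 = SSR / SSyy, using the sums of squares from the ad-response ANOVA output.
ssr = 9.2497    # explained variation (Model sum of squares)
ss_yy = 9.5000  # total variation (C Total sum of squares)

r_squared = ssr / ss_yy
print(f"R^2 = {r_squared:.4f}")
```

Read this way, roughly 97% of the variation in responses is 'explained' by ad size and circulation taken together.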
Testing Parameters
Testing Overall Significance
1. Shows If There Is a Linear Relationship Between All X Variables Together & Y
2. Uses F Test Statistic
3. Hypotheses
   - $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No Linear Relationship)
   - $H_a$: At Least One Coefficient Is Not 0 (At Least One X Variable Affects Y)
Testing Overall Significance Computer Output
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model     2    9.2497           4.6249        55.440    0.0043
Error     3    0.2503           0.0834
C Total   5    9.5000
DF: Model = k, Error = n - k - 1, C Total = n - 1
F Value = MS(Model) / MS(Error); Prob>F is the P-Value
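The F value and P-value in this output can be rebuilt from the sums of squares and degrees of freedom. The sketch below is illustrative only; the closed-form tail probability it uses is a special case that holds when the numerator has exactly 2 degrees of freedom, as it does here, and is not a general F-distribution routine.

```python
# Rebuild the F statistic from the ANOVA table: F = MS(Model) / MS(Error).
n, k = 6, 2                          # cases and X variables in the ad-response example
ss_model, ss_error = 9.2497, 0.2503  # sums of squares as printed in the output

ms_model = ss_model / k              # printed as 4.6249
ms_error = ss_error / (n - k - 1)    # printed as 0.0834
f_value = ms_model / ms_error

# Upper-tail probability of an F distribution, valid ONLY for 2 numerator
# degrees of freedom: P(F > f) = (1 + 2 f / d2) ** (-d2 / 2).
d2 = n - k - 1
p_value = (1 + 2 * f_value / d2) ** (-d2 / 2)

print(round(f_value, 2), round(p_value, 4))
```

Because the printed sums of squares are already rounded, the recomputed F differs from 55.440 in the second decimal place; the P-value still rounds to 0.0043.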
Types of Regression Models
Explanatory Variable
- 1 Quantitative Variable: 1st Order Model, 2nd Order Model, 3rd Order Model
- 2 or More Quantitative Variables: 1st Order Model, Interaction Model, 2nd Order Model
- 1 Qualitative Variable: Dummy Variable Model
Models With a Single
Quantitative Variable
First-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Is Linear
   $E(Y) = \beta_0 + \beta_1 X_{1i}$
2. Used When Expected Rate of Change in Y Per Unit Change in X Is Stable
3. Used With Curvilinear Relationships If Relevant Range Is Linear
First-Order Model Relationships
$E(Y) = \beta_0 + \beta_1 X_{1i}$
[Figure: two plots of Y against X1, one for $\beta_1 > 0$ and one for $\beta_1 < 0$]
First-Order Model Worksheet
Case, i   Yi   X1i   X1i^2
1         1    1     1
2         4    8     64
3         1    3     9
4         3    5     25
:         :    :     :
Run regression with Y, X1
Second-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Is a Quadratic Function
2. Useful 1st Model If Non-Linear Relationship Suspected
3. Model
   $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2$
   - $\beta_1 X_{1i}$: Linear effect
   - $\beta_2 X_{1i}^2$: Curvilinear effect
Second-Order Model Relationships
[Figure: four plots of Y against X1, two labeled $\beta_2 > 0$ and two labeled $\beta_2 < 0$]
Second-Order Model Worksheet
Case, i   Yi   X1i   X1i^2
1         1    1     1
2         4    8     64
3         1    3     9
4         3    5     25
:         :    :     :
Create $X_1^2$ column.
Run regression with Y, $X_1$, $X_1^2$.
Third-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Has a 'Wave'
2. Used If 1 Reversal in Curvature
3. Model
   $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{1i}^3$
   - $\beta_1 X_{1i}$: Linear effect
   - $\beta_2 X_{1i}^2$ and $\beta_3 X_{1i}^3$: Curvilinear effects
Third-Order Model Relationships
$E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{1i}^3$
[Figure: two plots of Y against X1, one for $\beta_3 > 0$ and one for $\beta_3 < 0$]
Third-Order Model Worksheet
Case, i   Yi   X1i   X1i^2   X1i^3
1         1    1     1       1
2         4    8     64      512
3         1    3     9       27
4         3    5     25      125
:         :    :     :       :
Multiply X1 by X1 to get $X_1^2$.
Multiply X1 by X1 by X1 to get $X_1^3$.
Run regression with Y, $X_1$, $X_1^2$, $X_1^3$.
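The worksheet step of creating the squared and cubed columns can be sketched in a few lines. This is an illustration using the four X1 values shown above; `power_column` is a hypothetical helper name, not a function from any package.

```python
# Build the X1^2 and X1^3 worksheet columns from X1 (values from the worksheet).
x1 = [1, 8, 3, 5]

def power_column(values, order):
    """Return each value raised to the given order."""
    return [v ** order for v in values]

x1_sq = power_column(x1, 2)
x1_cu = power_column(x1, 3)

# Each design row holds the predictors for one case: X1, X1^2, X1^3.
design_rows = list(zip(x1, x1_sq, x1_cu))
for case, row in enumerate(design_rows, start=1):
    print(case, row)
```

The regression is then run on Y with these three columns as the X variables; dropping the last column gives the second-order worksheet.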
Models With Two or More
Quantitative Variables
First-Order Model With 2 Independent Variables
1. Relationship Between 1 Dependent & 2 Independent Variables Is a Linear Function
2. Assumes No Interaction Between X1 & X2
   - Effect of X1 on E(Y) Is the Same Regardless of X2 Values
3. Model
   $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
No Interaction
$E(Y) = 1 + 2X_1 + 3X_2$
[Figure: E(Y) (0 to 12) against X1 (0 to 1.5), one line for each X2 value]
X2 = 3: E(Y) = 1 + 2X1 + 3(3) = 10 + 2X1
X2 = 2: E(Y) = 1 + 2X1 + 3(2) = 7 + 2X1
X2 = 1: E(Y) = 1 + 2X1 + 3(1) = 4 + 2X1
X2 = 0: E(Y) = 1 + 2X1 + 3(0) = 1 + 2X1
Effect (slope) of X1 on E(Y) does not depend on X2 value
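The parallel lines can be checked numerically. The sketch below evaluates the slide's example equation at each X2 level; the function and variable names are illustrative.

```python
def expected_y(x1, x2):
    """E(Y) = 1 + 2*X1 + 3*X2, the no-interaction example from the slide."""
    return 1 + 2 * x1 + 3 * x2

# Intercept and slope in X1 (change in E(Y) per 1-unit increase in X1) at each X2 level.
intercepts = {x2: expected_y(0, x2) for x2 in range(4)}
slopes = {x2: expected_y(1, x2) - expected_y(0, x2) for x2 in range(4)}

print(intercepts)
print(slopes)
```

Only the intercept moves with X2 (1, 4, 7, 10); the slope stays at 2 for every level.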
First-Order Model Relationships
[Figure: response surface for Y over the X1 and X2 axes]
First-Order Model Worksheet
Case, i   Yi   X1i   X2i
1         1    1     3
2         4    8     5
3         1    3     2
4         3    5     6
:         :    :     :
Run regression with Y, X1, X2
Interaction Model With 2 Independent Variables
1. Hypothesizes Interaction Between Pairs of X Variables
   - Response to One X Variable Varies at Different Levels of Another X Variable
2. Contains Two-Way Cross Product Terms
   $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}$
3. Can Be Combined With Other Models
   - Example: Dummy-Variable Model
Effect of Interaction
1. Given:
   $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}$
2. Without Interaction Term, Effect of X1 on Y Is Measured by $\beta_1$
3. With Interaction Term, Effect of X1 on Y Is Measured by $\beta_1 + \beta_3 X_{2i}$
   - Effect Increases As X2i Increases (When $\beta_3 > 0$)
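The expression for the effect of X1 follows from holding X2 fixed and asking how E(Y) changes with X1. Written as a partial derivative (notation added here; the slides state only the result):

```latex
\frac{\partial E(Y)}{\partial X_{1i}}
  = \frac{\partial}{\partial X_{1i}}
    \left( \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} \right)
  = \beta_1 + \beta_3 X_{2i}
```

Setting $\beta_3 = 0$ recovers the no-interaction case, where the effect is $\beta_1$ at every level of X2.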
Interaction Model Relationships
$E(Y) = 1 + 2X_1 + 3X_2 + 4X_1X_2$
[Figure: E(Y) (0 to 12) against X1 (0 to 1.5), one line for each X2 value]
X2 = 1: E(Y) = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0: E(Y) = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
Effect (slope) of X1 on E(Y) does depend on X2 value
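A short numerical check of the interaction example, comparing the slope read off the equation with the slope predicted by $\beta_1 + \beta_3 X_2$. The function names are illustrative, and the coefficients are those of the slide's example equation.

```python
def expected_y(x1, x2):
    """E(Y) = 1 + 2*X1 + 3*X2 + 4*X1*X2, the interaction example from the slide."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_in_x1(b1, b3, x2):
    """Effect of a 1-unit increase in X1 when the model contains a b3*X1*X2 term."""
    return b1 + b3 * x2

for x2 in (0, 1):
    intercept = expected_y(0, x2)
    direct_slope = expected_y(1, x2) - expected_y(0, x2)
    print(x2, intercept, direct_slope, slope_in_x1(2, 4, x2))
```

At X2 = 0 the line is 1 + 2X1, at X2 = 1 it is 4 + 6X1, so the lines are no longer parallel.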
Interaction Model Worksheet
Case, i   Yi   X1i   X2i   X1i X2i
1         1    1     3     3
2         4    8     5     40
3         1    3     2     6
4         3    5     6     30
:         :    :     :     :
Multiply X1 by X2 to get X1X2.
Run regression with Y, X1, X2, X1X2
Second-Order Model With 2 Independent Variables
1. Relationship Between 1 Dependent & 2 or More Independent Variables Is a Quadratic Function
2. Useful 1st Model If Non-Linear Relationship Suspected
3. Model
   $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \beta_4 X_{1i}^2 + \beta_5 X_{2i}^2$
Second-Order Model Relationships
$E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \beta_4 X_{1i}^2 + \beta_5 X_{2i}^2$
[Figure: three response surfaces for Y over the X1 and X2 axes, labeled $\beta_4 + \beta_5 > 0$, $\beta_3^2 > 4\beta_4\beta_5$, and $\beta_4 + \beta_5 < 0$]
Second-Order Model Worksheet
Case, i   Yi   X1i   X2i   X1i X2i   X1i^2   X2i^2
1         1    1     3     3         1       9
2         4    8     5     40        64      25
3         1    3     2     6         9       4
4         3    5     6     30        25      36
:         :    :     :     :         :       :
Multiply X1 by X2 to get X1X2; then $X_1^2$, $X_2^2$.
Run regression with Y, $X_1$, $X_2$, $X_1X_2$, $X_1^2$, $X_2^2$.
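The three derived columns of this worksheet can be generated from X1 and X2 alone. A minimal sketch using the four cases shown; the variable names are illustrative.

```python
# Derive the cross-product and squared columns for the second-order worksheet.
x1 = [1, 8, 3, 5]
x2 = [3, 5, 2, 6]

x1x2 = [a * b for a, b in zip(x1, x2)]  # interaction column
x1_sq = [a * a for a in x1]             # X1 squared
x2_sq = [b * b for b in x2]             # X2 squared

# One row of five predictors per case, in the order used by the model equation.
design_rows = list(zip(x1, x2, x1x2, x1_sq, x2_sq))
for case, row in enumerate(design_rows, start=1):
    print(case, row)
```

Note that the full second-order model has six parameters, so far more than the four illustrative cases would be needed to actually fit it.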
Models With One Qualitative
Independent Variable
Dummy-Variable Model
1. Involves Categorical X Variable With 2 Levels
   - e.g., Male-Female; College-No College
2. Variable Levels Coded 0 & 1
3. Number of Dummy Variables Is 1 Less Than Number of Levels of Variable
4. May Be Combined With Quantitative Variable (1st Order or 2nd Order Model)
Dummy-Variable Model Worksheet
Case, i   Yi   X1i   X2i
1         1    1     1
2         4    8     0
3         1    3     1
4         3    5     1
:         :    :     :
X2 levels: 0 = Group 1; 1 = Group 2.
Run regression with Y, X1, X2
Interpreting Dummy-Variable Model Equation
Given: $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$
Y = Starting salary of college grads
X1 = GPA
X2 = 0 if Male, 1 if Female
Males (X2 = 0):
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 (0) = \hat\beta_0 + \hat\beta_1 X_{1i}$
Females (X2 = 1):
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 (1) = (\hat\beta_0 + \hat\beta_2) + \hat\beta_1 X_{1i}$
Same slopes
Dummy-Variable Model Relationships
[Figure: Y against X1; two lines with the same slope $\hat\beta_1$: Females with intercept $\hat\beta_0 + \hat\beta_2$, Males with intercept $\hat\beta_0$]
Dummy-Variable Model Example
Computer Output: $\hat Y_i = 3 + 5X_{1i} + 7X_{2i}$
X2 = 0 if Male, 1 if Female
Males (X2 = 0):
$\hat Y_i = 3 + 5X_{1i} + 7(0) = 3 + 5X_{1i}$
Females (X2 = 1):
$\hat Y_i = 3 + 5X_{1i} + 7(1) = (3 + 7) + 5X_{1i}$
Same slopes
Testing Model Portions
1. Tests the Contribution of a Set of X Variables to the Relationship With Y
2. Null Hypothesis $H_0: \beta_{g+1} = \cdots = \beta_k = 0$
   - Variables in Set Do Not Significantly Improve the Model When All Other Variables Are Included
3. Used in Selecting X Variables or Models
4. Part of Most Computer Programs
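The slides do not show the test statistic for this hypothesis. A common textbook form is the nested-model (partial) F statistic sketched below; the symbols $SSE_R$ (error sum of squares of the reduced model with g X variables) and $SSE_C$ (error sum of squares of the complete model with k X variables) are notation assumed here.

```latex
F = \frac{(SSE_R - SSE_C) / (k - g)}{SSE_C / \left[\, n - (k + 1) \,\right]}
```

Under $H_0$ this is compared with an F distribution with $k - g$ numerator and $n - (k + 1)$ denominator degrees of freedom; a large value suggests that at least one of the tested variables improves the model.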
Selecting Variables
in Model Building
A Butterfly Flaps its Wings in Japan, Which Causes It to Rain in Nebraska. -- Anonymous
Use Theory Only! Use Computer Search!
Model Building with Computer Searches
1. Rule: Use as Few X Variables As Possible
2. Stepwise Regression
   - Computer Selects X Variable Most Highly Correlated With Y
   - Continues to Add or Remove Variables Depending on SSE
3. Best Subset Approach
   - Computer Examines All Possible Sets
Residual Analysis
1. Graphical Analysis of Residuals
   - Plot Estimated Errors vs. Xi Values
   - Estimated Errors Are Called Residuals: Difference Between Actual Yi & Predicted Yi
   - Plot Histogram or Stem-&-Leaf of Residuals
2. Purposes
   - Examine Functional Form (Linear vs. Non-Linear Model)
   - Evaluate Violations of Assumptions
Linear Regression Assumptions
1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance
3. Probability Distribution of Error Is Normal
4. Errors Are Independent
Residual Plot for Functional Form
[Figure: two plots of residuals $\hat e$ against X, titled "Add X^2 Term" and "Correct Specification"]
Residual Plot for Equal Variance
[Figure: two plots of standardized residuals (SR) against X, titled "Unequal Variance" (fan-shaped) and "Correct Specification"]
Standardized residuals used typically.
Residual Plot for Independence
[Figure: two plots of standardized residuals (SR) against X, titled "Not Independent" and "Correct Specification"]
Plots reflect sequence data were collected.
Residual Analysis Computer Output
Obs   Dep Var SALES   Predict Value   Residual   Student Residual   -2-1-0 1 2
1     1.0000          0.6000          0.4000     1.044              |    |**  |
2     1.0000          1.3000          -0.3000    -0.592             |   *|    |
3     2.0000          2.0000          0          0.000              |    |    |
4     2.0000          2.7000          -0.7000    -1.382             |  **|    |
5     4.0000          3.4000          0.6000     1.567              |    |*** |
The last column is a plot of standardized (student) residuals.
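The Residual column in this output is simply the observed value minus the predicted value. The sketch below recomputes it from the SALES and Predict Value columns; the studentized residuals need each observation's leverage, which the output does not print, so they are not recomputed here.

```python
# Residual = actual Y minus predicted Y, using the five observations in the output.
sales = [1.0, 1.0, 2.0, 2.0, 4.0]
predicted = [0.6, 1.3, 2.0, 2.7, 3.4]

residuals = [round(y - y_hat, 4) for y, y_hat in zip(sales, predicted)]
sse = sum(e * e for e in residuals)  # sum of squared residuals

print(residuals)
print(round(sse, 2))
```

For a least-squares fit with an intercept the residuals sum to zero, which is a quick sanity check on output like this.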
Regression Pitfalls
Multicollinearity
1. High Correlation Between X Variables
2. Coefficients Measure Combined Effect
3. Leads to Unstable Coefficients Depending on X Variables in Model
4. Always Exists -- Matter of Degree
5. Example: Using Both Age & Height as Explanatory Variables in Same Model
Detecting Multicollinearity
1. Examine Correlation Matrix
   - Correlations Between Pairs of X Variables Are Higher Than With Y Variable
2. Examine Variance Inflation Factor (VIF)
   - If $VIF_j > 5$, Multicollinearity Exists
3. Few Remedies
   - Obtain New Sample Data
   - Eliminate One Correlated X Variable
Correlation Matrix Computer Output
Correlation Analysis
Pearson Corr Coeff / Prob>|R| under H0: Rho=0 / N=6
            RESPONSE   ADSIZE    CIRC
RESPONSE    1.00000    0.90932   0.93117
            0.0        0.0120    0.0069
ADSIZE      0.90932    1.00000   0.74118
            0.0120     0.0       0.0918
CIRC        0.93117    0.74118   1.00000
            0.0069     0.0918    0.0
$r_{Y1}$ = 0.90932, $r_{Y2}$ = 0.93117, $r_{12}$ = 0.74118; the diagonal is all 1's.
Variance Inflation Factors Computer Output
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|   Variance Inflation
INTERCEP   1    0.0640               0.2599           0.246               0.8214     0.0000
ADSIZE     1    0.2049               0.0588           3.656               0.0399     2.2190
CIRC       1    0.2805               0.0686           4.089               0.0264     2.2190
$VIF_1 < 5$
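The VIF values in this output can be tied back to the correlation matrix. The usual definition is $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other X variables; with only two X variables, $R_j^2$ is the squared correlation between them. The sketch below assumes that definition, which the slides do not state.

```python
# VIF for a two-predictor model, from the ADSIZE-CIRC correlation in the output.
r12 = 0.74118              # correlation between ADSIZE and CIRC

r_squared_j = r12 ** 2     # R^2 from regressing one X variable on the other
vif = 1 / (1 - r_squared_j)

print(round(vif, 4))
print("multicollinearity flagged" if vif > 5 else "below the VIF > 5 threshold")
```

Both X variables share the same VIF here because each is regressed on the other, and the value of about 2.219 is below the threshold of 5 used in these slides.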
Extrapolation
[Figure: Y against X; Interpolation inside the Relevant Range, Extrapolation on either side of it]
Cause & Effect
[Figure: Liquor Consumption plotted against # Teachers]
Conclusion
1. Explained the Linear Multiple Regression Model
2. Tested Overall Significance
3. Described Various Types of Models
4. Evaluated Portions of a Regression Model
5. Interpreted Linear Multiple Regression Computer Output
6. Described Stepwise Regression
7. Explained Residual Analysis
8. Described Regression Pitfalls
End of Chapter