Multiple Regression Models

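The Multiple Regression Model
The relationship between one dependent variable and two or more independent variables is a linear function.
Population model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$
where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_p$ are the population slopes, and $\varepsilon_i$ is the random error.
Sample model:
$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + e_i$$
where $Y_i$ is the dependent (response) variable and $X_{1i}, \ldots, X_{pi}$ are the independent (explanatory) variables for the sample.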
Multiple Regression Model: Example
Develop a model for estimating the heating oil used by a single-family home in the month of January, based on the average temperature and the amount of insulation in inches.
Oil (gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
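The Excel coefficients on the next slide can be reproduced from this table with a short least-squares sketch. It assumes exactly two explanatory variables and solves the 2x2 normal equations by Cramer's rule on centered sums of squares; `fit_two_predictors` is a hypothetical helper name, not part of any library.

```python
# Heating-oil data from the table: (oil gallons, temperature °F, insulation inches)
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6),
    (94.30, 64, 6), (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10),
    (237.80, 23, 10), (121.40, 63, 3), (31.40, 65, 10), (203.50, 41, 6),
    (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def fit_two_predictors(rows):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (sketch for p = 2 only)."""
    n = len(rows)
    y_bar = sum(r[0] for r in rows) / n
    x1_bar = sum(r[1] for r in rows) / n
    x2_bar = sum(r[2] for r in rows) / n
    # Centered sums of squares and cross-products
    s11 = sum((r[1] - x1_bar) ** 2 for r in rows)
    s22 = sum((r[2] - x2_bar) ** 2 for r in rows)
    s12 = sum((r[1] - x1_bar) * (r[2] - x2_bar) for r in rows)
    s1y = sum((r[1] - x1_bar) * (r[0] - y_bar) for r in rows)
    s2y = sum((r[2] - x2_bar) * (r[0] - y_bar) for r in rows)
    # Solve [s11 s12; s12 s22] [b1; b2] = [s1y; s2y] by Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # The fitted plane passes through the point of means
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(DATA)
print(f"b0={b0:.4f}, b1={b1:.4f}, b2={b2:.4f}")
```

With more than two predictors, a general linear-algebra solver would replace the Cramer's-rule step.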
Sample Multiple Regression Model: Example
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi}$$
Excel output:
              Coefficients
Intercept     562.1510092
X Variable 1  -5.436580588
X Variable 2  -20.01232067
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$$
For each 1°F increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
For each 1-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
Interpretation of Estimated Coefficients
Slope ($b_i$)
The estimated average value of Y changes by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant.
For example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1°F increase in temperature ($X_1$), holding the inches of insulation ($X_2$) constant.
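The "holding all other variables constant" reading can be checked numerically: raising one X by one unit while fixing the others changes the prediction by exactly that slope. A minimal sketch, assuming the rounded heating-oil coefficients from the Excel output; `predict` is a hypothetical helper.

```python
def predict(coefs, xs):
    """Y-hat = b0 + b1*x1 + ... + bp*xp, with coefs = [b0, b1, ..., bp]."""
    b0, *slopes = coefs
    return b0 + sum(b * x for b, x in zip(slopes, xs))


# Heating-oil equation: b0 = 562.151, b1 = -5.437 (temperature), b2 = -20.012 (insulation)
coefs = [562.151, -5.437, -20.012]

# Raise temperature from 30 to 31 °F while holding insulation fixed at 3, 6 and 10 inches
for insulation in (3, 6, 10):
    change = predict(coefs, [31, insulation]) - predict(coefs, [30, insulation])
    print(insulation, round(change, 3))  # -5.437 at every insulation level
```

Because the model is linear with no interaction terms, the change is the same at every insulation level.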
Interpretation of Estimated Coefficients
Intercept ($b_0$)
The intercept ($b_0$) is the estimated average value of Y when all $X_i = 0$.
Using the Model to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$
The predicted heating oil used is 278.97 gallons.
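The same prediction as a short function, a sketch that simply hard-codes the rounded estimated equation; `predict_heating_oil` is a hypothetical name.

```python
def predict_heating_oil(temp_f, insulation_in):
    """Estimated equation from the Excel output: Y-hat = 562.151 - 5.437 X1 - 20.012 X2."""
    return 562.151 - 5.437 * temp_f - 20.012 * insulation_in


y_hat = predict_heating_oil(30, 6)
print(round(y_hat, 2))  # 278.97 gallons
```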
Developing the Model
• Checking for problems.
• Making sure the model passes all tests for model quality.
Identifying Problems
• Do all the residual tests listed for simple regression.
• Check for multicollinearity.
Multicollinearity
• This occurs when there is a high correlation between the explanatory variables.
• This leads to unstable coefficients.
• The variance inflation factor (VIF) is used to measure collinearity (values exceeding 5 are a concern, and values exceeding 10 indicate a serious problem):
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$
where $R_j^2$ is the coefficient of multiple determination of $X_j$ with all the other explanatory variables.
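For the heating-oil example there are only two explanatory variables, so $R_j^2$ for either one is just the squared simple correlation between temperature and insulation, and both VIFs are equal. A minimal sketch under that two-predictor assumption; `pearson_r` and `vif` are hypothetical helper names.

```python
def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif(r_squared_j):
    """VIF_j = 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r_squared_j)


temperature = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]

# With p = 2, regressing one X on the other gives R_j^2 = r^2
r2 = pearson_r(temperature, insulation) ** 2
print(round(vif(r2), 4))  # very close to 1: almost no collinearity
```

With three or more predictors, each $R_j^2$ would instead come from a separate multiple regression of $X_j$ on the remaining X variables.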
Is the fit to the data good?
Coefficient of Multiple Determination
Excel output:
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371   ($r^2$)
Adjusted R Square   0.959878766   (adjusted $r^2$)
Standard Error      26.01378323
Observations        15
The adjusted $r^2$ is $r^2$ adjusted downward to account for the sample size $n$ and the number of explanatory variables $p$:
$$r^2_{\text{adj}} = 1 - (1 - r^2)\frac{n - 1}{n - p - 1}$$
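Both statistics can be recomputed from the sums of squares in this example's Excel ANOVA output (SSR = 228014.6, SST = 236135.2, n = 15, p = 2). A minimal sketch; the function names are hypothetical.

```python
def r_squared(ssr, sst):
    """Share of total variation in Y explained by the regression."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusts r^2 downward for the number of explanatory variables p relative to n."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2 = r_squared(228014.6, 236135.2)
adj = adjusted_r_squared(r2, n=15, p=2)
print(round(r2, 4), round(adj, 4))  # 0.9656 0.9599
```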
Do the variables collectively pass the test?
Testing for Overall Significance
• Shows whether there is a linear relationship between all of the X variables taken together and Y.
• Hypotheses:
H0: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$ (no linear relationship)
H1: At least one $\beta_i \neq 0$ (at least one independent variable affects Y)
Test for Overall Significance
Excel Output: Example
ANOVA
             df   SS         MS         F          Significance F
Regression    2   228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2
• Regression df = p = 2, the number of explanatory variables; total df = n - 1 = 14.
• F test statistic = MSR / MSE.
• Significance F is the p-value.
Test for Overall Significance
H0: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$
H1: At least one $\beta_i \neq 0$
$\alpha = 0.05$
df = 2 and 12
Critical value: $F_{0.05} = 3.89$ (reject H0 if F > 3.89)
Test statistic: $F = \text{MSR}/\text{MSE} = 168.47$ (Excel output)
Decision: Reject H0 at $\alpha = 0.05$.
Conclusion: There is evidence that at least one independent variable affects Y.
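The F statistic follows directly from the ANOVA sums of squares. A minimal sketch, assuming the SSR and SSE values from the Excel output; `overall_f` is a hypothetical helper, and the critical value 3.89 is the one used in the test above rather than something the code computes.

```python
def overall_f(ssr, sse, n, p):
    """F = MSR / MSE for the overall significance test."""
    msr = ssr / p            # mean square due to regression, df = p
    mse = sse / (n - p - 1)  # mean square error, df = n - p - 1
    return msr / mse


F = overall_f(228014.6, 8120.603, n=15, p=2)
F_CRITICAL = 3.89  # upper 5% point of F with 2 and 12 df, as used on the slide
print(round(F, 2), F > F_CRITICAL)  # 168.47 True
```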
Test for Significance: Individual Variables
• Shows whether there is a linear relationship between each variable $X_i$ and Y.
• Hypotheses:
H0: $\beta_i = 0$ (no linear relationship)
H1: $\beta_i \neq 0$ (linear relationship between $X_i$ and Y)
t Test Statistic Excel Output: Example
                Coefficients   Standard Error   t Stat
Intercept       562.151009     21.09310433      26.65094
X Variable 1    -5.4365806     0.336216167      -16.1699    (t test statistic for X1, temperature)
X Variable 2    -20.012321     2.342505227      -8.54313    (t test statistic for X2, insulation)
$$t = \frac{b_k}{S_{b_k}}$$
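Each t Stat in the Excel output is just the coefficient divided by its standard error. A minimal sketch using the values from that output; `t_statistic` is a hypothetical helper, and 2.1788 is the two-tailed 5% critical value of t with 12 degrees of freedom.

```python
def t_statistic(b, s_b):
    """t = b_k / S_bk for testing H0: beta_k = 0."""
    return b / s_b


# Coefficients and standard errors copied from the Excel output
t_temperature = t_statistic(-5.4365806, 0.336216167)
t_insulation = t_statistic(-20.012321, 2.342505227)

T_CRITICAL = 2.1788  # two-tailed, alpha = 0.05, n - p - 1 = 12 df
for name, t in (("temperature", t_temperature), ("insulation", t_insulation)):
    print(name, round(t, 4), "reject H0" if abs(t) > T_CRITICAL else "do not reject H0")
```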
t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
df = n - p - 1 = 15 - 2 - 1 = 12
Critical values: $\pm 2.1788$ (0.025 in each tail; reject H0 if t < -2.1788 or t > 2.1788)
Test statistic: t = -16.1699
Decision: Reject H0 at $\alpha = 0.05$.
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$$b_1 \pm t_{n-p-1} S_{b_1}$$
               Coefficients   Lower 95%      Upper 95%
Intercept      562.151009     516.1930837    608.108935
X Variable 1   -5.4365806     -6.169132673   -4.7040285
X Variable 2   -20.012321     -25.11620102   -14.90844
$$-6.169 \le \beta_1 \le -4.704$$
The estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1°F increase in temperature, holding insulation constant.
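The interval is $b_1 \pm t_{n-p-1} S_{b_1}$ with the critical value 2.1788 for 12 degrees of freedom. A minimal sketch using $b_1$ and $S_{b_1}$ from the Excel output; `slope_ci` is a hypothetical helper. Excel uses an unrounded critical value, so its limits differ in the sixth decimal place.

```python
def slope_ci(b, s_b, t_crit):
    """Confidence interval b +/- t_crit * S_b for a population slope."""
    half_width = t_crit * s_b
    return b - half_width, b + half_width


lower, upper = slope_ci(-5.4365806, 0.336216167, 2.1788)
print(round(lower, 3), round(upper, 3))  # -6.169 -4.704
```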
Special Regression Topics
Dummy-Variable Models
• Create a categorical variable (dummy variable) with 2 levels, for example yes and no, or male and female.
• The data are coded as 0 or 1.
• The coding makes the intercepts different.
• This analysis assumes equal slopes.
• The regression model has the same form:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$
Dummy-Variable Models: Assumption
Given:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$
Y = assessed value of house
X1 = square footage of house
X2 = desirability of neighborhood = 0 if undesirable, 1 if desirable
Desirable ($X_2 = 1$):
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$$
Undesirable ($X_2 = 0$):
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$$
Same slopes.
Dummy-Variable Models: Assumption
[Figure: assessed value (Y) plotted against square footage (X1), showing two parallel lines with the same slope and different intercepts, $b_0 + b_2$ and $b_0$.]
Interpretation of the Dummy Variable Coefficient
For example:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$$
Y: annual salary of a college graduate, in thousands of dollars
X1: GPA
X2: 0 = female, 1 = male
The coefficient 6 is interpreted as follows: given the same GPA, a male college graduate earns an estimated 6 thousand dollars more than a female college graduate, on average.
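Because the dummy variable only shifts the intercept, the predicted gap between the two groups equals $b_2$ at every GPA. A minimal sketch of the salary equation above; `predicted_salary` is a hypothetical helper.

```python
def predicted_salary(gpa, male):
    """Estimated annual salary (thousand $): Y-hat = 20 + 5*GPA + 6*X2."""
    x2 = 1 if male else 0  # dummy coding: 1 = male, 0 = female
    return 20 + 5 * gpa + 6 * x2


gpa = 3.0
gap = predicted_salary(gpa, male=True) - predicted_salary(gpa, male=False)
print(gap)  # 6.0, the dummy coefficient b2
```

Trying other GPA values gives the same gap, which is the equal-slopes assumption in action.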