Multiple Regression Analysis

The principles of simple regression analysis can be extended to two or more explanatory variables.
With two explanatory variables we get the equation $Y = \alpha + \beta_1 X_1 + \beta_2 X_2$.
It is customary to write it as $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$.
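For reference, the least-squares estimates of $\beta_0$, $\beta_1$ and $\beta_2$ can be written compactly in matrix form. This is the standard ordinary least squares result, stated here assuming $n$ subjects and that the columns of $\mathbf{X}$ are linearly independent; $X_{1i}$ and $X_{2i}$ denote the values of $X_1$ and $X_2$ for subject $i$.

```latex
\mathbf{X} =
\begin{pmatrix}
1 & X_{11} & X_{21} \\
\vdots & \vdots & \vdots \\
1 & X_{1n} & X_{2n}
\end{pmatrix},
\qquad
\hat{\boldsymbol{\beta}} =
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{pmatrix}
= (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}
```

These estimates minimise the residual sum of squares $\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i})^2$.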
As an example, if a hypotensive agent is administered before surgery, the time taken for blood pressure to return to its normal value during recovery will depend on the dose of the hypotensive agent and on the blood pressure during surgery.
This can be modelled as $\text{Recovery time} = \beta_0 + \beta_1 \log(\text{dose}) + \beta_2 \, \text{Surgery B.P.}$
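A minimal sketch of fitting this two-variable model by ordinary least squares with NumPy. The data are hypothetical (not the study's) and generated without noise from made-up coefficients 20, 10 and -0.7, so the fit should recover them exactly; the function name `fit_two_predictor_model` is illustrative and not from any package.

```python
import numpy as np

def fit_two_predictor_model(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    # Design matrix: a column of ones for the intercept, then the two predictors.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical, noise-free data generated from known coefficients.
rng = np.random.default_rng(0)
log_dose = rng.uniform(2.5, 6.5, size=50)
bp_surg = rng.uniform(50, 90, size=50)
recovery = 20 + 10 * log_dose - 0.7 * bp_surg

print(fit_two_predictor_model(log_dose, bp_surg, recovery))  # close to [20, 10, -0.7]
```

With real data the outcome contains random variation, so the estimates will only approximate the underlying coefficients.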
Categorical Explanatory Variables

Binary variables are coded 0 and 1. For example, a binary variable $x_1$ (Gender) is coded male = 0, female = 1.
Recovery time for Blood Pressure and dose of hypotensive
There are many outliers because of the individual variability of subjects and because of the different types of surgical operation.
[Figure: Recovery time for Blood Pressure and dose of hypotensive. Scatter plot of RecvTime against Logdose with fitted regression line and 95% CI]
RecvTime = -14.2576 + 8.00772 Logdose
S = 14.7103, R-Sq = 15.5 %, R-Sq(adj) = 13.8 %
The scatter plot shows a linear relationship. Blood Pressure takes longer to come back to normal value the larger the dose of the hypotensive.
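As a small illustration of how the fitted line is read, the sketch below evaluates it at a few Logdose values chosen (as an assumption) from within the plotted range of roughly 2.5 to 6.5. The base of the logarithm is not stated on the slide, and with S of about 14.7 and R-Sq of 15.5 % individual predictions will be imprecise.

```python
def predicted_recovery_time(logdose):
    """Fitted simple regression from the slide: RecvTime = -14.2576 + 8.00772 Logdose."""
    return -14.2576 + 8.00772 * logdose

for ld in (3.0, 4.0, 5.0):
    # Each one-unit increase in Logdose adds 8.00772 to the predicted recovery time.
    print(ld, round(predicted_recovery_time(ld), 2))
```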
Recovery time for Blood Pressure and lowest Blood Pressure reading during surgery
[Figure: Recovery time for Blood Pressure and lowest B.P. reading during surgery. Scatter plot of RecvTime against Bpsurg with fitted regression line and 95% CI]
RecvTime = 34.4692 - 0.183546 Bpsurg
S = 15.9386, R-Sq = 0.8 %, R-Sq(adj) = 0.0 %
The lower the blood pressure achieved during surgery, the longer the time for it to reach its normal value during recovery from anaesthesia. On its own, however, Bpsurg explains very little of the variability in recovery time (R-Sq = 0.8 %).
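In a regression with a single explanatory variable, R-Sq equals the square of the correlation coefficient r, and r has the same sign as the slope. The sketch below uses that identity to turn the R-Sq values reported on the two scatter plots into correlations; because the reported R-Sq values are rounded, the results are approximate.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Simple regression only: R^2 = r^2, and r has the same sign as the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r_bp = correlation_from_r_squared(0.008, -0.183546)   # Bpsurg slide
r_dose = correlation_from_r_squared(0.155, 8.00772)   # Logdose slide
print(round(r_bp, 3), round(r_dose, 3))               # -0.089 0.394
```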
Multiple Regression Analysis
The effects of the two explanatory variables acting jointly are described by the equation
Recov. Time = 22.3 + 10.6 Log dose - 0.740 Surg. B.P.
As noted on the scatter plots, several observations were outliers or had larger than expected X values.
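A sketch of how the joint equation is interpreted: each coefficient is the change in predicted recovery time for a one-unit change in that variable while the other is held fixed. The input values 4.0 and 60 are hypothetical. Note that the partial coefficients (10.6 and -0.740) differ from the simple-regression slopes (8.00772 and -0.183546) because each is now adjusted for the other variable.

```python
def joint_recovery_time(log_dose, surg_bp):
    """Joint fitted equation from the slide:
    Recov. Time = 22.3 + 10.6 Log dose - 0.740 Surg. B.P."""
    return 22.3 + 10.6 * log_dose - 0.740 * surg_bp

base = joint_recovery_time(4.0, 60)
# Raise log dose by 1 while holding surgical B.P. fixed: the change equals b1.
print(round(joint_recovery_time(5.0, 60) - base, 2))   # 10.6
# Raise surgical B.P. by 10 while holding log dose fixed: the change equals 10 * b2.
print(round(joint_recovery_time(4.0, 70) - base, 2))   # -7.4
```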
Categorical Explanatory Variables
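Binary variables are coded 0 and 1. For example, a variable $x_1$ (Gender) is coded male = 0, female = 1. Then in the regression equation
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, when $x_1 = 1$ the value of $Y$ indicates what is obtained for females, and when $x_1 = 0$ it indicates what is obtained for males. The coefficient $\beta_1$ is therefore the difference between females and males at the same value of $x_2$.
If we have a nominal variable with more than two categories, we have to create a number of new dummy (also called indicator) binary variables: one fewer than the number of categories, with the omitted category acting as the reference.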
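A minimal sketch of dummy coding in plain Python. The helper `dummy_code` and the surgery categories are hypothetical, chosen only for illustration. With an intercept in the model, k categories need k - 1 indicators; including all k would make the columns perfectly collinear with the intercept. Each indicator's coefficient is then the difference from the reference category.

```python
def dummy_code(values, reference):
    """One 0/1 indicator column per category other than the reference (k - 1 for k categories)."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Binary case from the slide: male = 0, female = 1.
print(dummy_code(["male", "female", "female"], reference="male"))
# {'female': [0, 1, 1]}

# Hypothetical three-category variable (type of surgical operation).
surgery = ["abdominal", "cardiac", "orthopaedic", "cardiac", "abdominal"]
print(dummy_code(surgery, reference="abdominal"))
# {'cardiac': [0, 1, 0, 1, 0], 'orthopaedic': [0, 0, 1, 0, 0]}
```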

How many Explanatory Variables?

As a rule of thumb, multiple regression analysis should not be performed if the total number of variables is greater than the number of subjects divided by 10 (that is, there should be at least 10 subjects per variable).
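The rule of thumb reduces to simple arithmetic. In the sketch below, the function name is hypothetical, and rounding down is an assumption that follows from "not greater than subjects / 10".

```python
def max_variables(n_subjects, subjects_per_variable=10):
    """Rule of thumb: the total number of variables should not exceed subjects / 10."""
    return n_subjects // subjects_per_variable

print(max_variables(61))   # 6, e.g. for a study of 61 towns
```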
Analysis
In the computer output look for:
- Adjusted R². It represents the proportion of the variability of Y explained by the X's, adjusted so that models with different numbers of variables can be compared.
- The F-test in the ANOVA table. A significant F indicates a linear relationship between Y and at least one of the X's.
- The t-test of each partial regression coefficient. A significant t indicates that the variable in question influences the Y response while controlling for the other explanatory variables.
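To show where these three quantities come from, here is a from-scratch sketch with NumPy on hypothetical data in which y depends only on the first variable. It computes R², adjusted R², the ANOVA F statistic and the coefficient t statistics. It does not compute p-values; those would come from the F(p, n - p - 1) and t(n - p - 1) distributions, for example via `scipy.stats.f.sf` and `scipy.stats.t.sf`. Statistical packages report the same quantities.

```python
import numpy as np

def regression_summary(X, y):
    """OLS with an intercept. Returns R^2, adjusted R^2, the overall F statistic,
    the coefficient estimates and their t statistics (intercept first)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                          # p = number of explanatory variables
    A = np.column_stack([np.ones(n), X])    # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid                     # residual sum of squares
    sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
    df_resid = n - p - 1
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    # ANOVA F: regression mean square / residual mean square
    f_stat = ((sst - sse) / p) / (sse / df_resid)
    # t statistic for each coefficient: estimate / standard error
    sigma2 = sse / df_resid
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return r2, adj_r2, f_stat, beta, beta / se

# Hypothetical data: y depends on the first variable only.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=40)
r2, adj_r2, f_stat, beta, t_stats = regression_summary(X, y)
print(f"R2={r2:.3f} adjR2={adj_r2:.3f} F={f_stat:.1f} t={np.round(t_stats, 2)}")
```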
Usefulness of Scatter Plots - I

The scatter plot illustrates the relationship between water hardness (Calcium) and mortality in 61 large towns in England and Wales.
The regression line indicates an inverse relationship between water hardness and mortality rates.
[Figure: Mortality and Water Hardness. Scatter plot of Mortal against Calcium with fitted regression line and 95% CI]
Mortal = 1676.36 - 3.22609 Calcium
S = 143.029, R-Sq = 42.9 %, R-Sq(adj) = 41.9 %
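The adjusted R² reported here can be checked from R-Sq, the 61 towns and the single explanatory variable using the standard formula $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p - 1)$. A short sketch:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Water-hardness slide: R-Sq = 42.9 %, 61 towns, one explanatory variable (Calcium).
print(round(adjusted_r_squared(0.429, n=61, p=1), 3))   # 0.419, i.e. R-Sq(adj) = 41.9 %
```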
Usefulness of Scatter Plots - II
[Figure: two panels, Mortality and Water Hardness in Towns in the North (MortalN against CalciumN) and in Towns in the South (MortalS against CalciumS), each with fitted regression line and 95% CI]
North: MortalN = 1692.31 - 1.93134 CalciumN (S = 129.209, R-Sq = 13.6 %, R-Sq(adj) = 11.0 %)
South: MortalS = 1522.82 - 2.09272 CalciumS (S = 114.297, R-Sq = 36.3 %, R-Sq(adj) = 33.6 %)
The inverse relationship between water hardness and mortality is still maintained. However,
for towns in the North the regression line is less steep than for towns in the South (slope -1.93 versus -2.09), indicating that other causes of mortality are stronger in the North than in the South.
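One way to compare the two fitted lines is to evaluate both at the same water hardness. The calcium values 10, 30 and 50 in the sketch below are chosen only for illustration, and predictions are meaningful only within the range of calcium actually observed in each region. The gap of roughly 170 between the lines shows that northern towns have higher predicted mortality at the same hardness. This is consistent with the suggestion that other causes of mortality are stronger in the North.

```python
def mortality_north(calcium):
    """Fitted line for towns in the North (from the slide)."""
    return 1692.31 - 1.93134 * calcium

def mortality_south(calcium):
    """Fitted line for towns in the South (from the slide)."""
    return 1522.82 - 2.09272 * calcium

for ca in (10, 30, 50):   # illustrative calcium values
    north, south = mortality_north(ca), mortality_south(ca)
    print(ca, round(north, 1), round(south, 1), round(north - south, 1))
```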