Clinical Research Training Program 2021
MULTIPLE REGRESSION
Fall 2004
www.edc.gsph.pitt.edu/faculty/dodge/clres2021.html
OUTLINE
Multiple (Multivariable) Regression Model
Estimation
ANOVA Table
Partial F tests
Multiple Partial F tests
MULTIPLE REGRESSION
Features
- Simple linear regression: one outcome variable; one predictor
- Multiple regression (also called multivariable regression): one outcome variable; several predictors
- Multivariate regression: several (correlated) outcome variables
MULTIPLE REGRESSION
Model
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$$

Suppose there are n subjects in the data set. For the ith individual,

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i,$$

where yi is the observed outcome; x1i, …, xpi are the predictors; and β0, β1, …, βp are unknown regression coefficients corresponding to x1, …, xp, respectively.
MULTIPLE REGRESSION
Model
$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i$$

Assumptions
• Independence – the yi are independent of one another.
• Linearity – E(yi) is a linear function of x1i, …, xpi. That is, E(yi | x1i, …, xpi) = β0 + β1x1i + ··· + βpxpi.
• Homoscedasticity – var(yi) is the same for any fixed combination of x1i, …, xpi. That is, var(yi | x1i, …, xpi) = σ².
• Normality – for any fixed combination of x1i, …, xpi, yi ~ N(β0 + β1x1i + ··· + βpxpi, σ²), or equivalently ε ~ N(0, σ²).
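As an illustration, the sketch below fits such a model in Python with statsmodels. The data frame df and its columns (y, x1, x2, x3) are simulated stand-ins, not the course data; the coefficient values are arbitrary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (n = 12 subjects, p = 3 predictors).
rng = np.random.default_rng(0)
n = 12
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y"] = 2 + 1.5 * df["x1"] + 0.5 * df["x2"] + rng.normal(scale=1.0, size=n)

# Fit y = b0 + b1*x1 + b2*x2 + b3*x3 + e by least squares.
fit = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(fit.params)     # estimated coefficients b0_hat, ..., b3_hat
print(fit.summary())  # includes the overall F test discussed below
```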
ESTIMATION
The Least-squares Method
The least-squares solution gives the best-fitting regression plane by minimizing the sum of squared distances between the observed outcomes yi and the corresponding expected outcome values $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_p x_{pi}$. That is, the quantity

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \cdots - \hat{\beta}_p x_{pi} \right)^2$$

is minimized to find the least-squares estimates β̂0, β̂1, …, β̂p of β0, β1, …, βp.
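A minimal sketch of this computation itself, assuming a simulated design matrix X with a leading column of ones (all names here are illustrative):

```python
import numpy as np

# X is the n x (p+1) design matrix with a leading column of ones.
rng = np.random.default_rng(1)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 1.5, 0.5, 0.0]) + rng.normal(size=n)

# Minimize sum_i (y_i - yhat_i)^2 over beta.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# Equivalent normal-equations solution: (X'X) beta = X'y.
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_hat, beta_ne))  # True
```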
[Figure: data points scattered in (x1, x2, y) space, with the best-fitting regression plane through them.]
ESTIMATION
The least-squares regression equation
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_p x_{pi}$$

is the unique linear combination of the independent variables x1, …, xp that has the maximum possible correlation with the dependent variable y. In other words, of all possible linear combinations of the form $b_0 + b_1 x_1 + \cdots + b_p x_p$, the linear combination ŷ is such that the correlation

$$r_{y,\hat{y}} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}$$

is a maximum. This r is called the multiple correlation coefficient.
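As a sketch, the multiple correlation coefficient is just the ordinary Pearson correlation between y and the fitted values; the array names carry over from the least-squares sketch above and are assumptions, not part of the slides.

```python
import numpy as np

def multiple_correlation(y, y_hat):
    """Pearson correlation between observed y and fitted y_hat,
    i.e. the multiple correlation coefficient r_{y, yhat}."""
    yc = y - y.mean()
    fc = y_hat - y_hat.mean()
    return (yc @ fc) / np.sqrt((yc @ yc) * (fc @ fc))

# e.g., with X and beta_hat from the earlier sketch:
# r = multiple_correlation(y, X @ beta_hat)  # equals sqrt(R^2)
```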
ANOVA TABLE
Source of variation   Sum of Squares (SS)   Df         Mean Squares (MS)                F Statistic   P-value
Regression            SSR = Σ(ŷi − ȳ)²      p          MSR = SSR/p                      F = MSR/MSE   P(F ≥ calculated F)
Residuals (Errors)    SSE = Σ(yi − ŷi)²     n − p − 1  MSE = SSE/(n − p − 1) = σ̂² = s²
Total                 SST = Σ(yi − ȳ)²      n − 1

(All sums run over i = 1, …, n.) The F statistic tests H0: β1 = ··· = βp = 0.

SST (Total sum of squares) = SSR (Regression sum of squares) + SSE (Residual sum of squares)
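A sketch that computes the quantities in this table from observed and fitted values; the function and argument names are illustrative, not from the slides.

```python
import numpy as np
from scipy import stats

def anova_table(y, y_hat, p):
    """Sums of squares, F statistic, and p-value for the overall test
    (n subjects, p predictors), following the table above."""
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)      # total SS
    sse = np.sum((y - y_hat) ** 2)         # residual SS
    ssr = sst - sse                        # regression SS = SST - SSE
    msr, mse = ssr / p, sse / (n - p - 1)
    f = msr / mse
    p_value = stats.f.sf(f, p, n - p - 1)  # P(F >= calculated F)
    return {"SSR": ssr, "SSE": sse, "SST": sst, "F": f, "p": p_value}
```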
ANOVA TABLE
Null hypothesis for the ANOVA table
H0: All p independent variables considered
together do not explain a significant amount of
the variation in y.
H0: There is no significant overall regression
using all p independent variables in the model.
H0: 1 = 2 = ··· = p = 0
10
ANOVA TABLE
Test for Significant Overall Regression
Hypothesis: H0: β1 = β2 = ··· = βp = 0

Test statistic:

$$F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)} \sim F_{p,\,n-p-1}$$

Decision: If F > F_{p, n−p−1, 1−α}, then reject H0.
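As a quick check of the decision rule with scipy, plugging in the worked example that appears later in these slides (p = 3, n = 12, calculated F = 9.47):

```python
from scipy import stats

alpha, p, n = 0.05, 3, 12
f_crit = stats.f.ppf(1 - alpha, p, n - p - 1)  # F_{p, n-p-1, 1-alpha}
print(f_crit)         # about 4.07 for F(3, 8)
print(9.47 > f_crit)  # True, so H0 is rejected at the 0.05 level
```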
ANOVA TABLE
Relationship between F and R2
$$F = \frac{MSR}{MSE} = \frac{SSR/p}{(SST - SSR)/(n-p-1)} = \frac{R^2/p}{(1-R^2)/(n-p-1)}$$

Note that R² = SSR/SST, so that SSR = R²(SST).
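A numerical check of this identity, using the sums of squares from the worked example on the following slides (SSR = 693.06, SST = 888.25, p = 3, n = 12):

```python
ssr, sst, p, n = 693.06, 888.25, 3, 12
r2 = ssr / sst
f_from_ss = (ssr / p) / ((sst - ssr) / (n - p - 1))
f_from_r2 = (r2 / p) / ((1 - r2) / (n - p - 1))
print(round(f_from_ss, 2), round(f_from_r2, 2))  # 9.47 9.47
```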
INTERPRETATIONS OF THE
OVERALL TEST
If H0: β1 = β2 = ··· = βp = 0 is not rejected, one of the following is true:
• For a true underlying linear model, linear combinations of x1, …, xp provide little or no help in predicting y; that is, ȳ is essentially as good as β̂0 + β̂1x1 + ··· + β̂pxp for predicting y.
• The true model may involve quadratic, cubic, or other more complex functions of x1, …, xp.
INTERPRETATIONS OF THE
OVERALL TEST
If H0: β1 = β2 = ··· = βp = 0 is rejected:
• Linear combinations of x1, …, xp provide significant information for predicting y; that is, the model β̂0 + β̂1x1 + ··· + β̂pxp is far better than the naive model ȳ for predicting y.
• However, the result does not mean that all independent variables x1, …, xp are needed for significant prediction of y; perhaps only one or two of them are sufficient. In other words, a more parsimonious model than the one involving all p variables may be adequate.
PARTIAL F TEST
ANOVA table for y regressed on x1, x2, and x3.
Source of variation     SS       Df   MS       F Statistic           P-value
Regression (x1,x2,x3)   693.06   3    231.02   9.47 (overall test)   0.005
  x1                    588.92   1    588.92   24.1                  0.001
  x2|x1                 103.90   1    103.90   4.26                  0.057
  x3|x1,x2              0.24     1    0.24     0.01                  0.923
Residual                195.19   8    24.40
Total                   888.25   11
PARTIAL F TEST
In the previous ANOVA table, the regression sum of squares was partitioned into three components:
SS(x1) – the sum of squares explained by using only x1 to predict y.
SS(x2|x1) – the extra sum of squares explained by using x2 in addition to x1 to predict y.
SS(x3|x1, x2) – the extra sum of squares explained by using x3 in addition to x1 and x2 to predict y.
PARTIAL F TEST
This extra information in the ANOVA table answers the following questions:
Does x1 alone significantly aid in predicting y?
Does the addition of x2 significantly contribute to the prediction of y after we account (or control) for the contribution of x1?
Does the addition of x3 significantly contribute to the prediction of y after we account (or control) for the contribution of x1 and x2?
PARTIAL F TEST
• To answer the first question, we simply fit the simple linear regression model using x1 as the single independent variable. The corresponding F statistic is used to test the significance of the contribution of x1.
• To answer the second and third questions, we must use what is called a partial F test. This test assesses whether the addition of any specific independent variable, given others already in the model, significantly contributes to the prediction of y.
PARTIAL F TEST
Test for Significant Additional Information (I)
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \beta^* x^* + \varepsilon$$

H0: β* = 0
The addition of x* to a model already containing x1, …, xp does not significantly improve the prediction of y.
PARTIAL F TEST
Test for Significant Additional Information (I)
Extra SS from adding x*, given x1, …, xp = (SSR when x1, …, xp, x* are all in the model) − (SSR when x1, …, xp are all in the model)

SS(x*|x1, …, xp)
= SSR(x1, …, xp, x*) − SSR(x1, …, xp)
= SSE(x1, …, xp) − SSE(x1, …, xp, x*)
PARTIAL F TEST
Test for Significant Additional Information (I)
Test statistic (Partial F test)
$$F(x^* \mid x_1, \ldots, x_p) = \frac{SS(x^* \mid x_1, \ldots, x_p)}{MSE(x_1, \ldots, x_p, x^*)} \sim F_{1,\,n-p-2}$$

If F > F_{1, n−p−2, 1−α}, then reject H0.
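In practice the partial F test can be carried out as a nested-model comparison; the sketch below uses statsmodels' anova_lm and the hypothetical data frame df from the earlier fitting sketch.

```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

reduced = smf.ols("y ~ x1 + x2", data=df).fit()    # x1, ..., xp only
full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()  # ... plus x*
print(anova_lm(reduced, full))  # F row tests H0: beta* = 0 on 1, n-p-2 df
```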
PARTIAL F TEST
From the example,
SS x 2 | x1 
F  x 2 | x1  
MSE x1 , x 2 
SS x 2 | x1 

SSE x1 , x 2 / 12  1  2
103.90

SSE x1 , x 2 , x 3   SS x 3 | x1 , x 2 /9
103.90

 4.78 ~ F1,9
195.19  0.24/9
22
PARTIAL F TEST
From the example,
SS x 3 | x1 , x 2 
F  x 3 | x1 , x 2  
MSE x1 , x 2 , x 3 
SS x 3 | x1 , x 2 

SSE x1 , x 2 , x 3 / 12  2  2
0.24

195.19/8
0.24

 0.01 ~ F1,8
24.40
23
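Both worked statistics can be re-derived from the tabled sums of squares; a small scipy sketch (n = 12 as in the example):

```python
from scipy import stats

f_x2 = 103.90 / ((195.19 + 0.24) / 9)  # F(x2 | x1)
f_x3 = 0.24 / (195.19 / 8)             # F(x3 | x1, x2)
print(round(f_x2, 2), round(stats.f.sf(f_x2, 1, 9), 3))  # 4.78 0.057
print(round(f_x3, 2), round(stats.f.sf(f_x3, 1, 8), 3))  # 0.01 0.923
```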
MULTIPLE PARTIAL F TEST
Test for Significant Additional Information (II)
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \beta_1^* x_1^* + \cdots + \beta_k^* x_k^* + \varepsilon$$

H0: β1* = ··· = βk* = 0
The addition of x1*, …, xk* to a model already containing x1, …, xp does not significantly improve the prediction of y.
MULTIPLE PARTIAL F TEST
Test for Significant Additional Information (II)
Extra SS from adding x1*, …, xk*, given x1, …, xp = (SSR when x1, …, xp and x1*, …, xk* are all in the model) − (SSR when x1, …, xp are all in the model)

SS(x1*, …, xk*|x1, …, xp)
= SSR(x1, …, xp, x1*, …, xk*) − SSR(x1, …, xp)
= SSE(x1, …, xp) − SSE(x1, …, xp, x1*, …, xk*)
MULTIPLE PARTIAL F TEST
Test for Significant Additional Information (II)
Test statistic (Multiple Partial F test)
$$F = \frac{SS(x_1^*, \ldots, x_k^* \mid x_1, \ldots, x_p)/k}{MSE(x_1, \ldots, x_p, x_1^*, \ldots, x_k^*)} \sim F_{k,\,n-p-k-1}$$

If F > F_{k, n−p−k−1, 1−α}, then reject H0.
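The multiple partial F test is likewise a nested-model comparison; the sketch below jointly tests two added variables, again using the hypothetical data frame df from the earlier examples.

```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Do x2 and x3 (k = 2) jointly add information to a model with x1?
reduced = smf.ols("y ~ x1", data=df).fit()
full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(anova_lm(reduced, full))  # F ~ F(k, n-p-k-1) under H0: beta2* = beta3* = 0
```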
Should a variable be included in the model?
Two methods:
1. Partial F tests for variables added in order (Type I F tests).
2. Partial F tests for variables added last (Type III F tests): the significance of a variable is tested as if it were the last variable to enter the model.
TYPE III F TEST
ANOVA table for y regressed on x1, x2, and x3.
Source of variation     SS       Df   MS       F Statistic           P-value
Regression (x1,x2,x3)   693.06   3    231.02   9.47 (overall test)   0.005
  x1|x2,x3              166.58   1    166.58   6.83                  0.031
  x2|x1,x3              101.81   1    101.81   4.17                  0.075
  x3|x1,x2              0.24     1    0.24     0.01                  0.923
Residual                195.19   8    24.40
Total                   888.25   11
TYPE III F TEST
From the example,
SS x 2 | x1 , x 3 
F  x 2 | x1 , x 3  
MSE x1 , x 2 , x 3 
SS x 2 | x1 , x 3 

SSE x1 , x 2 , x 3 / 12  2  1  1
101.81

 4.17 ~ F1,8
195.19/8
29
TYPE III F TEST
From the example,
SS x 3 | x1 , x 2 
F  x 3 | x1 , x 2  
MSE x1 , x 2 , x 3 
SS x 3 | x1 , x 2 

SSE x1 , x 2 , x 3 / 12  2  1  1
0.24

 0.01 ~ F1,8
195.19/8
30
Models fitted in sequence (used for the Type I SS below):

Model #   Model                                SSE
6         y = ε                                48,139.00
7         y = β0 + ε                           888.25
8         y = β0 + β1x1 + ε                    299.33
4         y = β0 + β1x1 + β2x2 + ε             195.43
5         y = β0 + β1x1 + β2x2 + β3x3 + ε      195.19

Models with one term omitted (used for the Type III SS below):

Model #   Model                                SSE
1         y = β1x1 + β2x2 + β3x3 + ε           24,421.45
2         y = β0 + β2x2 + β3x3 + ε             361.77
3         y = β0 + β1x1 + β3x3 + ε             297.01
4         y = β0 + β1x1 + β2x2 + ε             195.43
5         y = β0 + β1x1 + β2x2 + β3x3 + ε      195.19
Regression SS

Parameter   Variable    Type I SS                      Type III SS
β0          Intercept   SSE(6) − SSE(7) = 47,250.75    SSE(1) − SSE(5) = 24,226.26
β1          x1          SSE(7) − SSE(8) = 588.92       SSE(2) − SSE(5) = 166.58
β2          x2          SSE(8) − SSE(4) = 103.90       SSE(3) − SSE(5) = 101.82
β3          x3          SSE(4) − SSE(5) = 0.24         SSE(4) − SSE(5) = 0.24

Type I SS = SS for the partial F test (sequential SS)
Type III SS = SS for the multiple partial F test
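These SSE-difference definitions translate directly into code. A sketch with the hypothetical data frame df from the earlier examples; the two lines shown reproduce the x2 row of the table above.

```python
import statsmodels.formula.api as smf

def sse(formula):
    # In statsmodels, .ssr is the residual (error) sum of squares.
    return smf.ols(formula, data=df).fit().ssr

type1_x2 = sse("y ~ x1") - sse("y ~ x1 + x2")            # SSE(8) - SSE(4)
type3_x2 = sse("y ~ x1 + x3") - sse("y ~ x1 + x2 + x3")  # SSE(3) - SSE(5)
print(type1_x2, type3_x2)
```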
Forward addition

Source   df   Type I SS   F value   Pr > F
x1       1    588.92      24.14     0.0012
x2       1    103.90      4.26      0.0730
x3       1    0.24        0.01      0.9238

Backward elimination

Source   df   Type III SS   F value   Pr > F
x1       1    166.58        6.83      0.0310
x2       1    101.81        4.17      0.0754
x3       1    0.24          0.01      0.9238