Regression with 2 IVs

Generalization of Regression from 1 to 2 Independent Variables
Questions
• Write a raw score regression equation with 2 IVs in it.
• What is the difference in interpretation of b weights in simple regression vs. multiple regression?
• What happens to b weights if we add new variables to the regression equation that are highly correlated with ones already in the equation?
• Why do we report beta weights (standardized b weights)?
More Questions
• Write a regression equation with beta weights in it.
• How is it possible to have a significant R-square and non-significant b weights?
• What are the three factors that influence the standard error of the b weight?
• Describe R-square in two different ways, that is, using two distinct formulas. Explain the formulas.
Equations

1 IV (define terms):

$Y = a + bX + e$

Multiple IVs (one score, 1 intercept, 1 error, many slopes):

$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$

Predicted value:

$Y' = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

Recall the slope and intercept for 1 IV:

$b = r_{XY}\dfrac{SD_Y}{SD_X} = \dfrac{\sum xy}{\sum x^2}, \qquad a = \bar{Y} - b\bar{X}$

where $\sum xy = \sum (X - \bar{X})(Y - \bar{Y})$ is the sum of cross-products.
Equations (2)

$b_1 = \dfrac{(\sum x_2^2)(\sum x_1 y) - (\sum x_1 x_2)(\sum x_2 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$

$b_2 = \dfrac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$

Note: the b weights use $SS_{x_1}$, $SS_{x_2}$, and all 3 cross-products.

$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2$

Unlike the slopes, the intercept is a simple extension of the 1-IV case.
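These formulas translate directly into code. Here is a minimal sketch in Python (my own illustration; the function and argument names are not from the lecture):

```python
# Sketch: raw-score regression weights for 2 IVs from deviation
# sums of squares (ss_*) and cross-products (sp_*), following the
# formulas above. Names are illustrative, not the lecture's.

def two_iv_weights(ss_x1, ss_x2, sp_x1x2, sp_x1y, sp_x2y,
                   mean_y, mean_x1, mean_x2):
    """Return (b1, b2, a) for Y' = a + b1*X1 + b2*X2."""
    denom = ss_x1 * ss_x2 - sp_x1x2 ** 2
    b1 = (ss_x2 * sp_x1y - sp_x1x2 * sp_x2y) / denom
    b2 = (ss_x1 * sp_x2y - sp_x1x2 * sp_x1y) / denom
    a = mean_y - b1 * mean_x1 - b2 * mean_x2
    return b1, b2, a
```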
Numerical Example

Chevy mechanics; mechanical aptitude & conscientiousness. Find sums. (Note: only some of the data are shown; a size problem in PowerPoint.)

     Job Perf  Mech Apt  Consc
        Y         X1       X2      X1*Y     X2*Y    X1*X2
        1         40       25        40       25     1000
        2         45       20        90       40      900
        1         38       30        38       30     1140
        3         58       38       174      114     2204
       ...
Sum    65       1038      655      3513     2219    34510
N      20         20       20        20       20       20
M    3.25       51.9    32.75    175.65   110.95   1725.5
SD   1.25       7.58     5.24     84.33    54.73   474.60
USS 29.75     1091.8   521.75

Deviation sums of cross-products come from the raw sums:

$\sum x_1 y = \sum X_1 Y - \dfrac{(\sum X_1)(\sum Y)}{N}$

$\sum x_2 y = \sum X_2 Y - \dfrac{(\sum X_2)(\sum Y)}{N}$

$\sum x_1 x_2 = \sum X_1 X_2 - \dfrac{(\sum X_1)(\sum X_2)}{N}$

For example: $\sum x_1 y = 3513 - \dfrac{(1038)(65)}{20} = 139.5$
SSCP Matrix

SSCP means sums of squares and cross-products.

       Y        X1       X2
Y    29.75
X1   139.5   1091.8
X2   90.25    515.5   521.75

Symbolically, the entries are:

       Y                X1                X2
Y    $\sum y^2$
X1   $\sum x_1 y$    $\sum x_1^2$
X2   $\sum x_2 y$    $\sum x_1 x_2$    $\sum x_2^2$
Find Estimates

SSCP:
            Y (Perf)   X1 (MA)   X2 (Consc)
Y (Perf)     29.75
X1 (MA)     139.5      1091.8
X2 (Consc)   90.25      515.5     521.75

$b_1 = \dfrac{(\sum x_2^2)(\sum x_1 y) - (\sum x_1 x_2)(\sum x_2 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \dfrac{(521.75)(139.5) - (515.5)(90.25)}{(1091.8)(521.75) - (515.5)(515.5)} = \dfrac{72784.13 - 46523.88}{569646.7 - 265740.3} = \dfrac{26260.25}{303906.4} = .086409 \approx .09$

$b_2 = \dfrac{(\sum x_1^2)(\sum x_2 y) - (\sum x_1 x_2)(\sum x_1 y)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \dfrac{(1091.8)(90.25) - (515.5)(139.5)}{303906.4} = \dfrac{26622.7}{303906.4} = .087602 \approx .09$

$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 = 3.25 - .086409(51.9) - .087602(32.75) = -4.10$

$Y' = -4.10 + .09 X_1 + .09 X_2$

Predicted job performance as a function of test scores.
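The same estimates can be checked by solving the normal equations in deviation form with numpy; a sketch using the SSCP values above:

```python
import numpy as np

# Normal equations in deviation form: Sxx @ [b1, b2] = sxy,
# with Sxx and sxy taken from the SSCP matrix above.
Sxx = np.array([[1091.8, 515.5],
                [515.5, 521.75]])
sxy = np.array([139.5, 90.25])

b1, b2 = np.linalg.solve(Sxx, sxy)     # 0.0864..., 0.0876...
a = 3.25 - b1 * 51.9 - b2 * 32.75      # -4.10
print(b1, b2, a)
```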
Scatterplots

[Figure: three scatterplots, Job Performance vs. Mechanical Aptitude, Job Performance vs. Conscientiousness, and Conscientiousness vs. Mechanical Aptitude.]
Scatterplot 2

[Figure: scatterplot of predicted (Y') vs. actual (Y) values.]
Scatterplot 3

[Figure: 3D plot of the regression surface.]

Predicted Y is a plane. Y is a linear function of the Xs plus error.
R²

$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$

Use capital R for multiple regression.

$R = r_{Y,Y'}$

In our example, $R = r_{Y,Y'} = .78$, so $R^2 = .61$.

R² is the proportion of variance in Y due to regression:

$R^2 = \dfrac{V(reg)}{V(tot)} = \dfrac{1.05}{1.57} \approx .67$

(Note: the table below has N = 19; one case was lost, which is why this value runs a bit higher than the full-data R² of .61.)
       Y     X1    X2     Y'     Resid
       2     45    20    1.54    0.46
       1     38    30    1.81   -0.81
       3     50    30    2.84    0.16
       2     48    28    2.50   -0.50
       3     55    30    3.28   -0.28
       3     53    34    3.45   -0.45
       4     55    36    3.80    0.20
       4     58    32    3.71    0.29
       3     40    34    2.33    0.67
       5     55    38    3.98    1.02
       3     48    28    2.50    0.50
       3     45    30    2.41    0.59
       2     55    36    3.80   -1.80
       4     60    34    4.06   -0.06
       5     60    38    4.41    0.59
       5     60    42    4.76    0.24
       5     65    38    4.84    0.16
       4     50    34    3.19    0.80
       3     58    38    4.24   -1.24
M =  3.25   51.9  32.75  3.25    0
V =  1.57  57.46  27.46  1.05    0.52
USS = 29.83 (Y); 19.95 (Y'); 9.42 (Resid)
Correlations Among Data

        Y     X1    X2    Pred   Resid
Y       1
X1     .73    1
X2     .68   .64    1
Pred   .78   .94   .87     1
Resid  .62    0     0      0      1
Excel Example
Grab from the web under Lecture, Excel Example.
Review
• Write a raw score regression equation with 2 IVs in it. Describe terms.
• Describe a concrete example where you would use multiple regression to analyze the data.
• What does R² mean in multiple regression?
• For your concrete example, what would an R² of .15 mean?
• With 1 IV, the IV and the predicted values correlate 1.0. Not so with 2 or more IVs. Why?
Significance Test for R²

$F = \dfrac{R^2 / k}{(1 - R^2)/(N - k - 1)}$

When the null is true, the result is distributed as F with k and (N - k - 1) df.

In our example, R² = .61, k = 2, and N = 20:

$F = \dfrac{.61/2}{(1 - .61)/(20 - 2 - 1)} = \dfrac{.305}{.39/17} = 13.29$

$F_{(\alpha=.05,\,2,\,17)} = 3.59$, so the observed F is significant.
The Problem of Variable Importance

With 1 IV, the correlation provides a simple index of the 'importance' of that variable. Both r and r² are good indices of importance with 1 IV.

With multiple IVs, total R-square will be the sum of the individual IV r² values if and only if the IVs are mutually uncorrelated, that is, they correlate to some degree with Y, but not with each other.

When multiple IVs are correlated, there are many different statistical indices of the 'importance' of the IVs, and they do not agree with one another. There is no simple answer to questions about the importance of correlated IVs. Rather, there are many reasonable answers depending on what you mean by importance.
Venn Diagrams {easy but not always right}

[Figure: two Venn diagrams. Fig 1 (IVs uncorrelated): X1 and X2 each overlap Y (unique areas U Y:X1 and U Y:X2) but not each other; each X's overlap with Y is its r² with Y. Fig 2 (IVs correlated): X1 and X2 overlap each other as well as Y (Shared X, Shared Y).]

Fig 1. IVs uncorrelated:
      Y     X1    X2
Y     1
X1   .50    1
X2   .60   .00    1

$R^2 = .5^2 + .6^2 = .61$

Fig 2. IVs correlated:
      Y     X1    X2
Y     1
X1   .40    1
X2   .50   .30    1

$R^2 = .32$, not $.4^2 + .5^2 = .16 + .25 = .41$. What to do with shared Y?
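Both figures can be checked numerically. Here is a sketch using the general matrix form R² = c'Rxx⁻¹c, where c holds the IV-Y correlations and Rxx the IV intercorrelations (this form is not in the slides, but it reduces to the 2-IV formulas used later in this lecture):

```python
import numpy as np

def r_squared(ryx, Rxx):
    """R^2 = c' Rxx^{-1} c from IV-Y correlations and IV intercorrelations."""
    ryx = np.asarray(ryx, dtype=float)
    return float(ryx @ np.linalg.solve(np.asarray(Rxx, dtype=float), ryx))

# Fig 1: uncorrelated IVs, so R^2 is just the sum of squared r's.
print(r_squared([.50, .60], [[1, 0], [0, 1]]))      # 0.61
# Fig 2: correlated IVs, so R^2 is ~.32, not .16 + .25 = .41.
print(r_squared([.40, .50], [[1, .30], [.30, 1]]))  # ~0.32
```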
More Venn Diagrams

[Figure: Venn diagrams of the desired state (X1, X2, X3 each overlap Y but not one another) vs. the typical state (X1, X2, X3 overlap Y and one another).]

In a regression problem, we want to predict Y from X as well as possible (maximize R²). To do so, we want X variables that are correlated with Y but not with each other. Such variables are hard to find; cognitive ability tests, for example, tend to correlate with one another.
Raw & Standardized Regression Weights

• Each X has a raw score slope, b.
• The slope tells the expected change in Y if X changes 1 unit*.
• Large b weights should indicate important variables, but b depends on the variance of X.
• A b for height in feet would be 12 times larger than the b for the same height measured in inches.
• If we standardize X and Y, all units of X are the same.
• The relative size of b is now meaningful.

*Strictly speaking, holding the other X variables constant.
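A tiny simulated demonstration of these bullets (my own toy data, not the lecture's): rescaling X rescales b, while standardizing makes the slope unit-free.

```python
import numpy as np

rng = np.random.default_rng(0)
height_ft = rng.normal(5.5, 0.3, size=200)           # height in feet
y = 2.0 * height_ft + rng.normal(0, 1.0, size=200)   # toy criterion

def slope(x, y):
    """Simple-regression b = sum(xy) / sum(x^2) in deviation form."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / (x @ x)

b_ft, b_in = slope(height_ft, y), slope(height_ft * 12, y)
print(b_ft / b_in)    # 12.0: the feet slope is 12 times the inches slope

z = lambda v: (v - v.mean()) / v.std()
print(slope(z(height_ft), z(y)), slope(z(height_ft * 12), z(y)))  # equal betas
```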
Computing Standardized Regression Weights

With 1 IV, the standardized regression equation is

$z_y' = r_{xy} z_x$

More generally, $z_y' = \beta z_x$, where β is the standardized regression weight, aka beta weight. (A poor choice of names & symbols.) With 1 IV, $r = \beta$.

If you have a correlation matrix, you can calculate beta weights (standardized regression weights):

      Y      x1     x2
Y     1
x1   0.73    1
x2   0.68   0.64    1

$\beta_1 = \dfrac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2} = \dfrac{.73 - .68(.64)}{1 - .64^2} = .50$

$\beta_2 = \dfrac{r_{y2} - r_{y1} r_{12}}{1 - r_{12}^2} = \dfrac{.68 - .73(.64)}{1 - .64^2} = .36$

What is r12? What impact does it have?
Calculating R²

$R^2 = r^2_{Y,Y'}$

$R^2 = \dfrac{SS(reg)}{SS(tot)} = \dfrac{V(reg)}{V(tot)}$

$R^2 = r_{y1}^2 + r_{y2}^2$ (only if $r_{12} = 0$): the sum of squared simple (zero-order) r's.

$R^2 = \beta_1 r_{y1} + \beta_2 r_{y2}$: the sum of products of standardized regression weight and r.

This is really interesting because the products add up to R², and because r, β, and the product of the two are all reasonable indices of the importance of the IV.
Calculating R² (2)

      Y      x1     x2
Y     1
x1   0.73    1
x2   0.68   0.64    1

$\beta_1 = \dfrac{.73 - .68(.64)}{1 - .64^2} = .50 \qquad \beta_2 = \dfrac{.68 - .73(.64)}{1 - .64^2} = .36$

$R^2 = \beta_1 r_{y1} + \beta_2 r_{y2} = .50(.73) + .36(.68) = .365 + .24 \approx .61$

Equivalently, directly from the correlations:

$R^2 = \dfrac{r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}}{1 - r_{12}^2} = \dfrac{.73^2 + .68^2 - 2(.73)(.68)(.64)}{1 - .64^2} = .61$

Comparing the two IVs on each index of importance:

        X1      X2     X1/X2
r      .73     .68     1.07
β      .50     .36     1.39
β·r    .365    .24     1.52
Review
• What is the problem with correlated independent variables if we want to maximize variance accounted for in the criterion?
• Why do we report beta weights (standardized b weights)?
• Describe R-square in two different ways, that is, using two distinct formulas. Explain the formulas.
Tests of Regression Coefficients (b Weights)

Each slope tells the expected change in Y when X changes 1 unit, with X controlled for all other X variables. Consider the Venn diagrams. Standard errors of b weights with 2 IVs:

$s_{b_1} = \sqrt{\dfrac{s^2_{y.12}}{\sum x_1^2 (1 - r_{12}^2)}} \qquad s_{b_2} = \sqrt{\dfrac{s^2_{y.12}}{\sum x_2^2 (1 - r_{12}^2)}}$

where $s^2_{y.12} = \dfrac{SS_{res}}{N - k - 1}$ is the variance of estimate (variance of residuals), the first term in the denominator is the sum of squares for X1 or X2, and $r_{12}^2$ is the squared correlation between predictors.
Tests of b Weights (2)

$SS_{res} = 9.42 \qquad s^2_{y.12} = \dfrac{SS_{res}}{N - k - 1} = \dfrac{9.42}{17} = .55$

$\sum x_1^2 = 1091.8 \qquad \sum x_2^2 = 521.75$

$s_{b_1} = \sqrt{\dfrac{.55}{1091.8(1 - .64^2)}} = .03 \qquad s_{b_2} = \sqrt{\dfrac{.55}{521.75(1 - .64^2)}} = .04$

For significance of a b weight, compute a t:

$t_{b_1} = \dfrac{b_1}{s_{b_1}} = \dfrac{.09}{.03} = 3 \qquad t_{b_2} = \dfrac{b_2}{s_{b_2}} = \dfrac{.09}{.04} = 2.25$

$t_{(\alpha=.05,\,17)} = 2.11$. Degrees of freedom for each t are N - k - 1.
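A sketch of the same computation in Python with unrounded arithmetic; note that the slide's t of 2.25 reflects rounding b and its standard error to two decimals, and the exact values land closer to the critical value:

```python
from scipy.stats import t

# Standard errors and t tests for the b weights (slide's inputs).
ss_res, N, k = 9.42, 20, 2
s2_est = ss_res / (N - k - 1)                      # ~.55
adj = 1 - .64 ** 2                                 # 1 - r12^2
se_b1 = (s2_est / (1091.8 * adj)) ** 0.5           # ~.029
se_b2 = (s2_est / (521.75 * adj)) ** 0.5           # ~.042
t1 = .086409 / se_b1                               # ~2.95
t2 = .087602 / se_b2                               # ~2.07
p1 = 2 * t.sf(t1, N - k - 1)
p2 = 2 * t.sf(t2, N - k - 1)
print(se_b1, se_b2, t1, t2, p1, p2)
```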
Tests of R² vs. Tests of b

• Slopes (b) tell about the relation between Y and the unique part of X. R² tells about the proportion of variance in Y accounted for by the set of predictors all together.
• Correlations among X variables increase the standard errors of b weights but not R².
• It is possible to get a significant R² but no or few significant b weights (see the Venn diagrams).
• It is possible, but unlikely, to have a significant b but a non-significant R². Look to R² first. If it is n.s., avoid interpreting the b weights.
Review
• How is it possible to have a significant R-square and non-significant b weights?
• Write a regression equation with beta weights in it. Describe terms.
Testing Incremental R²

You can start the regression with a set of one or more variables and then add predictors one or more at a time. When you add predictors, R² will never go down. It usually goes up, and you can test whether the increment in R² is significant or likely due to chance:

$F = \dfrac{(R^2_L - R^2_S)/(k_L - k_S)}{(1 - R^2_L)/(N - k_L - 1)}$

where
$R^2_L$ = R-square for the larger model
$R^2_S$ = R-square for the smaller model
$k_L$ = number of predictors in the larger model
$k_S$ = number of predictors in the smaller model
Examples of Testing Increments

Suppose we start with 1 variable and R-square is .52. We add a second variable and R-square increases to .67. We have 20 people. Then

$F = \dfrac{(.67 - .52)/(2 - 1)}{(1 - .67)/(20 - 2 - 1)} = \dfrac{.15}{.33/17} = 7.73$

$F_{(\alpha=.05,\,1,\,17)} = 4.45$, p < .05.

Suppose we start with 3 IVs and R-square is .25. We add 2 more IVs in a block and R-square climbs to .35. We have 100 people. Then:

$F = \dfrac{(.35 - .25)/(5 - 3)}{(1 - .35)/(100 - 5 - 1)} = \dfrac{.1/2}{.65/94} = 7.23$

$F_{(\alpha=.05,\,2,\,94)} = 3.09$, p < .05.
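Both examples as a small reusable function (a sketch; scipy supplies the F distribution):

```python
from scipy.stats import f

def f_increment(r2_large, r2_small, k_large, k_small, n):
    """F test for the increment in R^2 from the smaller to the larger model."""
    df1, df2 = k_large - k_small, n - k_large - 1
    F = ((r2_large - r2_small) / df1) / ((1 - r2_large) / df2)
    return F, f.sf(F, df1, df2)

print(f_increment(.67, .52, 2, 1, 20))    # F ~ 7.73, p < .05
print(f_increment(.35, .25, 5, 3, 100))   # F ~ 7.23, p < .05
```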
Another Look at Importance

• In regression problems, the most commonly used indices of importance are the correlation, r, and the increment to R-square when the variable of interest is considered last. The second is sometimes called a last-in R-square change. The last-in increment corresponds to the Type III sums of squares and is closely related to the b weight.
• The correlation tells about the importance of the variable ignoring all other predictors.
• The last-in increment tells about the importance of the variable as a unique contributor to the prediction of Y, above and beyond all other predictors in the model.
• You can assign shared variance in Y to specific X variables by adding variables to the equation in a chosen order, but then the importance is somewhat arbitrary and under your influence.
• "Importance" is not well defined statistically when IVs are correlated. None of this covers mediated models (path analysis).
Review
• Find the data on the website: Labs, then the 2IV example.
• Find r, beta, and r*beta.
• Describe importance.