
Introduction to Multiple Regression
James R. Stacks, Ph.D.
[email protected]
The best way to have a good idea is to have lots of ideas
Linus Pauling
Standardized form of a regression equation with three predictor variables:

Z’c = b1Zp1 + b2Zp2 + b3Zp3

where Z’c is the predicted criterion score (Zc - Ze), Zp1, Zp2, and Zp3 are the predictor variables (standardized z scores), and b1, b2, and b3 are the standardized regression coefficients.
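As a concrete illustration (a minimal sketch, not part of the original lecture, with made-up numbers), the predicted criterion z score is just the weighted sum of the standardized predictors:

```python
import numpy as np

# Hypothetical standardized regression coefficients (made up for illustration)
b = np.array([0.4, 0.3, 0.2])

# Raw scores for four people on three predictors (also made up)
X = np.array([[12.0, 7.0, 55.0],
              [15.0, 9.0, 60.0],
              [ 9.0, 6.0, 48.0],
              [14.0, 8.0, 52.0]])

# Convert each predictor to z scores: z = (x - mean) / sd (sample sd, ddof=1)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Z'c = b1*Zp1 + b2*Zp2 + b3*Zp3, computed for every person at once
z_c_predicted = Z @ b
print(z_c_predicted)
```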
Recall that the predicted criterion score is the actual criterion score minus the error, so:

Zc = b1Zp1 + b2Zp2 + b3Zp3 + Ze
Recall that multiplying an entire equation by any value results in an equivalent equation:

y = bx

is the same as yx = bxx, or as yx = bx²
The following demonstration of solving for standardized regression coefficients is taken largely from:

Maruyama, Geoffrey M. (1998). Basics of structural equation modeling. Thousand Oaks, CA: Sage Publications, Inc.
Let’s write three equivalent forms of the previous
multiple regression equation by multiplying the original
equation by each of the three predictor variables:
ZcZp1 = b1Zp1Zp1 + b2Zp2Zp1 + b3Zp3Zp1 + ZeZp1
ZcZp2 = b1Zp1Zp2 + b2Zp2Zp2 + b3Zp3Zp2 + ZeZp2
ZcZp3 = b1Zp1Zp3 + b2Zp2Zp3 + b3Zp3Zp3 + ZeZp3
Now notice all the zz cross products in the equations. Recall that the expected (mean) cross product is something we are familiar with. The unbiased estimate of the mean cross product for paired z values is:

E(cross product) = Σ(zz)/(n - 1), or: Pearson r!
The Pearson product-moment correlation coefficient (written as r for the sample estimate, ρ for the parameter):

r = Σ (za zb) / (n - 1), with the sum running over i = 1, ..., n

where za and zb are z scores for each person on some measure a and some measure b, and n is the number of people.
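This identity is easy to check numerically. A minimal sketch with simulated data (not from the lecture), comparing the mean cross product of z scores to the Pearson r computed by NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = 0.6 * a + rng.normal(size=n)      # two correlated measures

# z scores computed with the sample standard deviation (ddof=1)
za = (a - a.mean()) / a.std(ddof=1)
zb = (b - b.mean()) / b.std(ddof=1)

print(np.sum(za * zb) / (n - 1))      # sum of cross products over n - 1
print(np.corrcoef(a, b)[0, 1])        # Pearson r: identical up to float error
```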
So, I could just as easily write:

rc p1 = b1 rp1 p1 + b2 rp2 p1 + b3 rp3 p1 + re p1
rc p2 = b1 rp1 p2 + b2 rp2 p2 + b3 rp3 p2 + re p2
rc p3 = b1 rp1 p3 + b2 rp2 p3 + b3 rp3 p3 + re p3
Now, let’s look at some interesting things about the correlation coefficients we have substituted:

rc p1 = b1 rp1 p1 + b2 rp2 p1 + b3 rp3 p1 + re p1
rc p2 = b1 rp1 p2 + b2 rp2 p2 + b3 rp3 p2 + re p2
rc p3 = b1 rp1 p3 + b2 rp2 p3 + b3 rp3 p3 + re p3

Correlations of variables with themselves are necessarily unity, so rp1 p1, rp2 p2, and rp3 p3 are each 1. In regression, error is by definition the variance that does not correlate with any other variable, so re p1, re p2, and re p3 are necessarily 0.
rc p1 = b1(1) + b2 rp2 p1 + b3 rp3 p1
rc p2 = b1 rp1 p2 + b2(1) + b3 rp3 p2
rc p3 = b1 rp1 p3 + b2 rp2 p3 + b3(1)
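These three equations can be verified with simulated data (a sketch under the assumption of least-squares fitting, which forces the error to be uncorrelated with every predictor):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
P = rng.normal(size=(n, 3))
P[:, 1] += 0.6 * P[:, 0]                  # let the predictors correlate
c = P @ np.array([0.5, 0.3, -0.2]) + rng.normal(size=n)

# Standardize the predictors and the criterion
Zp = (P - P.mean(axis=0)) / P.std(axis=0, ddof=1)
zc = (c - c.mean()) / c.std(ddof=1)

# Least-squares standardized coefficients (no intercept needed: means are 0)
b, *_ = np.linalg.lstsq(Zp, zc, rcond=None)

r_cp = zc @ Zp / (n - 1)                  # rc p1, rc p2, rc p3
r_pp = Zp.T @ Zp / (n - 1)                # predictor intercorrelations

print(r_cp)                               # left-hand sides
print(r_pp @ b)                           # right-hand sides: they match
```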
This system can be written in matrix form:

[rc p1]   [ b1(1)     + b2 rp2 p1 + b3 rp3 p1 ]
[rc p2] = [ b1 rp1 p2 + b2(1)     + b3 rp3 p2 ]
[rc p3]   [ b1 rp1 p3 + b2 rp2 p3 + b3(1)     ]
Note that the matrix on the right side above is a vector, and it is the product of the correlation matrix of the predictor variables and a vector of b coefficients:

[rc p1]   [   1      rp2 p1   rp3 p1 ] [b1]
[rc p2] = [ rp1 p2     1      rp3 p2 ] [b2]
[rc p3]   [ rp1 p3   rp2 p3     1    ] [b3]
The moral of this story: assuming all the Pearson correlations among the variables are known (and they are easily calculated), we can use the matrix equation above to solve for the b vector, which contains the standardized regression coefficients of

Z’c = b1Zp1 + b2Zp2 + b3Zp3
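In modern practice this is one call to a linear solver. A minimal sketch with made-up correlations (not the lecture's data):

```python
import numpy as np

# Hypothetical predictor intercorrelation matrix Rii (made up for illustration)
R_ii = np.array([[1.00, 0.40, 0.30],
                 [0.40, 1.00, 0.25],
                 [0.30, 0.25, 1.00]])

# Hypothetical criterion/predictor correlations Riy (made up)
R_iy = np.array([0.50, 0.45, 0.35])

# Solve Rii @ B = Riy for the standardized regression coefficients
B = np.linalg.solve(R_ii, R_iy)
print(B)      # b1, b2, b3
```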
The system above is a matrix equation, which can be symbolized as:

Riy = Rii Bi

From ordinary algebra it would seem such an equation could be solved for Bi by dividing both sides by Rii, but there is no such thing as division in matrix math.

(The matrix notation used here corresponds to your text: Tabachnick, Barbara G. & Fidell, Linda S. (2001). Using multivariate statistics, 4th edition. Needham Heights, MA: Allyn & Bacon.)
What is necessary to accomplish the same goal is to multiply both sides of the equation by the inverse of Rii, written as Rii⁻¹:

Rii⁻¹ Riy = Rii⁻¹ Rii Bi

therefore

Rii⁻¹ Riy = Bi
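Mirroring that derivation in code (the same made-up correlations as in the earlier sketch; numerically, a direct solver is usually preferred over forming an explicit inverse):

```python
import numpy as np

R_ii = np.array([[1.00, 0.40, 0.30],
                 [0.40, 1.00, 0.25],
                 [0.30, 0.25, 1.00]])   # hypothetical Rii, as before
R_iy = np.array([0.50, 0.45, 0.35])     # hypothetical Riy, as before

B = np.linalg.inv(R_ii) @ R_iy          # Bi = Rii^-1 Riy
print(B)                                # same result as np.linalg.solve
```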
If you have studied the appendix assigned on matrix algebra, you know that, while matrix multiplication is quite simple, matrix inversion is a real chore!
[rc p1]   [   1      rp2 p1   rp3 p1 ] [b1]
[rc p2] = [ rp1 p2     1      rp3 p2 ] [b2]
[rc p3]   [ rp1 p3   rp2 p3     1    ] [b3]

  Riy   =             Rii              Bi

To get the solution we must find the inverse of the predictor correlation matrix Rii, in order to get the Rii⁻¹ needed for the equation:

Rii⁻¹ Riy = Bi
The following method of inverting a matrix is taken largely from:

Swokowski, Earl W. (1979). Fundamentals of College Algebra. Boston, MA: Prindle, Weber & Schmidt.
The first step is to form a matrix which has the same number of rows as the original correlation matrix of predictors, but twice as many columns. The original predictor correlations are placed in the left half, and an identity matrix of equal order is placed in the right half:

[ (Predictor correlations) | (Identity matrix) ]
Through a series of calculations called elementary row transformations, the goal is to change all the numbers in the matrix so that the identity matrix is on the left and a new matrix, the inverse, is on the right:

[ (Identity matrix) | (Inverse matrix) ]
“MATRIX ROW TRANSFORMATION THEOREM

Given a matrix of a system of linear equations, each of the following transformations results in a matrix of an equivalent system of linear equations:

(i) Interchanging any two rows.
(ii) Multiplying all of the elements in a row by the same nonzero real number k.
(iii) Adding to the elements in a row k times the corresponding elements of any other row, where k is any real number.”
1st transformation: a2j = a2j + (-.488) a1j
2nd transformation: a3j = a3j + (-.354) a1j
3rd transformation: a2j = a2j / .761856
4th transformation: a3j = a3j + (-.199248) a2j
5th transformation: a3j = a3j / .822574723
6th transformation: a1j = a1j + (-.488) a2j
7th transformation: a1j = a1j + (-.226373488) a3j
8th transformation: a2j = a2j + (-.261529737) a3j
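These steps can be automated with exactly the row operations of the theorem. The sketch below (mine, not from the sources) uses the predictor intercorrelations implied by the multipliers listed above, rp1 p2 = .488, rp1 p3 = .354, and rp2 p3 = .372; with those values the algorithm's first elimination multiplier is -.488 and its pivots are .761856 and .822574..., matching the hand calculation:

```python
import numpy as np

def gauss_jordan_inverse(M):
    """Row-reduce the augmented matrix [M | I] to [I | M^-1].
    No row interchanges are needed here, since a correlation matrix
    already has 1s on the diagonal."""
    n = M.shape[0]
    A = np.hstack([M.astype(float), np.eye(n)])  # [M | I]
    for i in range(n):
        A[i] = A[i] / A[i, i]                    # theorem op (ii): scale pivot row
        for j in range(n):
            if j != i:
                A[j] = A[j] - A[j, i] * A[i]     # theorem op (iii): eliminate column i
    return A[:, n:]                              # right half is now M^-1

# Predictor correlations implied by the multipliers above (inferred; the
# slides' own matrix did not survive transcription)
R_ii = np.array([[1.000, 0.488, 0.354],
                 [0.488, 1.000, 0.372],
                 [0.354, 0.372, 1.000]])

R_inv = gauss_jordan_inverse(R_ii)
print(R_inv)
print(np.allclose(R_inv @ R_ii, np.eye(3)))      # True
```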
After the last transformation, the original matrix on the left has become the identity, and the inverted matrix is on the right.

Multiplying the inverse of the predictor correlations (Rii⁻¹) by the predictor/criterion correlations (Riy) gives the beta vector (Bi):

             [b1]
Rii⁻¹ Riy  = [b2]  = Bi
             [b3]
OUR CALCULATIONS vs. VALUES FROM SPSS:

b1 = -.257
b2 = .873
b3 = .150
The difference has to do with rounding error. There are so many transformations in matrix math that all computations must be carried out with many significant figures, because the errors accumulate; I only used what was visible on my calculator, while good matrix software should use much more precision. This is a relatively brief equation to solve; imagine the error that can accumulate over hundreds of matrix transformations. This is a very important point, and one should always be certain the software is using the appropriate degree of precision.
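A quick way to see this effect (a sketch; the matrix below is a deliberately ill-conditioned textbook example, not the lecture's data) is to run the same solve in single and double precision and watch the answers drift apart:

```python
import numpy as np

# 5x5 Hilbert matrix: notoriously ill-conditioned, so rounding error is amplified
H = np.array([[1.0 / (i + j + 1) for j in range(5)] for i in range(5)])
y = np.ones(5)

x64 = np.linalg.solve(H.astype(np.float64), y.astype(np.float64))
x32 = np.linalg.solve(H.astype(np.float32), y.astype(np.float32))

print(x64)                          # high-precision solution
print(x32)                          # low-precision solution drifts visibly
print(np.abs(x64 - x32).max())      # size of the accumulated discrepancy
```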
The regression equation can then be written:

Z’c = (-.255)Zp1 + (.872)Zp2 + (.149)Zp3