Transcript Slide 1

Introduction to Regression
Analysis
Two Purposes
• Explanation
– Explain (or account for) the variance in a
variable (e.g., explain why children’s test
scores vary).
– We’ll cover this later.
• Prediction
– Construct an equation to predict scores
on some variable.
– Construct an equation that can be used in
selecting individuals.
Prediction
• Use a set of scores collected from a sample
to make predictions about individuals in the
population (not in the sample).
• Use the scores to construct a mathematical
(typically linear) equation that allows us to
predict performance.
• Two types of scores are collected:
– Usually a measure on one criterion (outcome,
dependent) variable.
– Scores on one or more predictor (independent)
variables.
The equations
The equation for one individual's criterion score:
$Y_1 = f(X_{11}, X_{12}, \ldots) + e_1$
The prediction equation for that individual's score:
$\hat{Y}_1 = f(X_{11}, X_{12}, \ldots)$
The difference between the two equations (called a residual):
$e_1 = Y_1 - \hat{Y}_1$
The function
The linear function has the form:
$f(X_1) = \beta_1 X_{11} + \beta_2 X_{12} + \ldots$
where the βs are weights (regression weights) selected such that the sum of squared errors is minimized (the least squares criterion):
$\mathrm{Min} \sum e^2 = \mathrm{Min} \sum (Y - \hat{Y})^2$
Multiple Correlation
Minimizing the sum of squared errors causes
the correlation between the actual criterion
scores and the predicted scores to be
maximized (as large as possible). This
correlation is called a multiple correlation. It
is the correlation between the criterion
variable and a linear composite of the
predictor variables.
$R_{Y\hat{Y}} = \text{maximum}$
Coefficient of Determination
The square of the multiple correlation, $R^2$, is called the coefficient of determination. It gives the proportion of shared variance (i.e., covariance) between the criterion variable and the weighted linear composite. Hence, the larger the $R^2$, the better the prediction equation.
Basic regression equation
The parametric regression model is given by
$Y_i = \alpha + \beta X_i + \varepsilon_i$
The model for a sample is given by
$Y_i = a + bX_i + e_i$
Computing the constants in the regression equation
$b = \frac{\sum xy}{\sum x^2}, \qquad a = \bar{Y} - b\bar{X}$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviation scores.
A closer look at the regression
equation
$\hat{Y} = a + bX = (\bar{Y} - b\bar{X}) + bX = \bar{Y} + b(X - \bar{X}) = \bar{Y} + bx$
Partitioning the Sum of Squares (SSy)
SSy is given by
$SS_y = \sum (Y - \bar{Y})^2$
Now, consider the following identity:
$Y = \bar{Y} + (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$
Subtracting $\bar{Y}$ from each side gives
$Y - \bar{Y} = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$
Squaring and summing gives
$\sum (Y - \bar{Y})^2 = \sum [(\hat{Y} - \bar{Y}) + (Y - \hat{Y})]^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2 + 2\sum (\hat{Y} - \bar{Y})(Y - \hat{Y})$
Simplifying the previous equation,
$\sum y^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2 = SS_{reg} + SS_{res}$
Where SSreg = Sum of squares due to regression, and
SSres = Residual sum of squares.
Dividing through by the total sum of squares gives
$1 = \frac{SS_{reg}}{\sum y^2} + \frac{SS_{res}}{\sum y^2}$, or $\frac{SS_{reg}}{\sum y^2} = 1 - \frac{SS_{res}}{\sum y^2}$
Example

Y     X
3     1
1     0
0    -1
4     1
5     2
Calculation of squares and cross-products

Deviation squares and cross-products:

Y      y      y²     X      x      x²     xy
3      .4     .16    1      .4     .16    .16
1     -1.6    2.56   0     -.6     .36    .96
0     -2.6    6.76  -1     -1.6    2.56   4.16
4      1.4    1.96   1      .4     .16    .56
5      2.4    5.76   2      1.4    1.96   3.36

Sums of squares and cross-products: Σy² = 17.2, Σx² = 5.2, Σxy = 9.2
Calculation of the coefficients
The slope: $b = \frac{\sum xy}{\sum x^2} = \frac{9.2}{5.2} = 1.769$
The intercept: $a = \bar{Y} - b\bar{X} = 2.6 - 1.769(.6) = 2.6 - 1.0615 = 1.5385$
The regression line: $\hat{Y} = 1.538 + 1.769X$
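A minimal Python sketch (an illustration, not part of the original slides) of the same computation, using deviation scores from the example data:

# Slope and intercept for the example data, computed from deviation scores.
Y = [3, 1, 0, 4, 5]
X = [1, 0, -1, 1, 2]

n = len(Y)
Y_bar = sum(Y) / n                      # 2.6
X_bar = sum(X) / n                      # 0.6

y = [yi - Y_bar for yi in Y]            # deviation scores y = Y - Y_bar
x = [xi - X_bar for xi in X]            # deviation scores x = X - X_bar

sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 9.2
sum_x2 = sum(xi ** 2 for xi in x)               # 5.2

b = sum_xy / sum_x2                     # slope, about 1.769
a = Y_bar - b * X_bar                   # intercept, about 1.538
print(round(b, 3), round(a, 3))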
Calculation of SSreg
$SS_{reg} = \sum (\hat{Y} - \bar{Y})^2$
From an earlier equation, $\hat{Y} = \bar{Y} + bx$, so
$SS_{reg} = \sum [(\bar{Y} + bx) - \bar{Y}]^2 = \sum (bx)^2 = b^2 \sum x^2 = \left(\frac{\sum xy}{\sum x^2}\right)^2 \sum x^2 = \frac{(\sum xy)^2}{\sum x^2} = \frac{84.64}{5.2} = 16.28$
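The partition of Σy² can also be checked numerically. This is a small, self-contained Python sketch for the example data; the variable names are my own:

# Partition SSy into SSreg and SSres for the example data.
Y = [3, 1, 0, 4, 5]
X = [1, 0, -1, 1, 2]
n = len(Y)
Y_bar, X_bar = sum(Y) / n, sum(X) / n

x = [xi - X_bar for xi in X]
y = [yi - Y_bar for yi in Y]
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
a = Y_bar - b * X_bar

Y_hat = [a + b * xi for xi in X]                          # predicted scores
SS_y   = sum(yi ** 2 for yi in y)                         # 17.2
SS_reg = sum((yh - Y_bar) ** 2 for yh in Y_hat)           # about 16.28
SS_res = sum((yi - yh) ** 2 for yi, yh in zip(Y, Y_hat))  # about 0.92
print(round(SS_reg, 2), round(SS_res, 2), round(SS_reg + SS_res, 2))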
Some additional equations for SSreg
$SS_{reg} = b^2 \sum x^2 = \frac{(\sum xy)^2}{\sum x^2}$
Hence,
$SS_{reg} = b \sum xy$

SSreg computed from a correlation
The formula for the Pearson correlation is
$r_{xy} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
therefore,
$SS_{reg} = \frac{(\sum xy)^2}{\sum x^2} = r_{xy}^2 \sum y^2$
A Closer Look at the
Equations in Regression
Analysis
The Variance
$s_x^2 = \frac{\sum x^2}{n - 1}$
The standard deviation
$s_x = \sqrt{s_x^2}$
The covariance
$s_{xy} = \frac{\sum xy}{n - 1}$
The Pearson product-moment correlation
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
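As an illustration, the following Python sketch (not from the slides) applies these definitional formulas to the example data used earlier:

# Variance, covariance, and the Pearson correlation r = s_xy / (s_x * s_y).
import math

Y = [3, 1, 0, 4, 5]
X = [1, 0, -1, 1, 2]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n

s2_x = sum((xi - X_bar) ** 2 for xi in X) / (n - 1)   # variance of X
s2_y = sum((yi - Y_bar) ** 2 for yi in Y) / (n - 1)   # variance of Y
s_xy = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y)) / (n - 1)  # covariance

r_xy = s_xy / (math.sqrt(s2_x) * math.sqrt(s2_y))
print(round(r_xy, 3))   # about 0.973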
The normal equations (for the regression of Y on X)
$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{s_{xy}}{s_x^2}$
$a = \bar{Y} - b_{yx}\bar{X}$
The structural model (for an
observation on individual i)
$Y_i = a + b_{yx} X_i + e_i$
The regression equation
$\hat{Y} = a + b_{yx}X = (\bar{Y} - b_{yx}\bar{X}) + b_{yx}X = \bar{Y} + b_{yx}(X - \bar{X}) = \bar{Y} + b_{yx}x$
Partitioning a deviation score, y
$y = Y - \bar{Y}$
$= \hat{Y} + (Y - \hat{Y}) - \bar{Y}$
$= (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$
The score, Y, is partitioned
Hence, Y is partitioned into a deviation of
a predicted score from the mean of the
scores PLUS a deviation of the actual
score from the predicted score.
Our next step is to square the deviation,
$y = Y - \bar{Y}$
and sum over all the scores.
Partitioning the sum of squared deviations (sum of squares, SSy)
$\sum y^2 = \sum (Y - \bar{Y})^2 = \sum [(\hat{Y} - \bar{Y}) + (Y - \hat{Y})]^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2 = SS_{reg} + SS_{res}$
What happened to the term $2\sum (\hat{Y} - \bar{Y})(Y - \hat{Y})$?
Showing that $2\sum (\hat{Y} - \bar{Y})(Y - \hat{Y})$ reduces to zero requires some algebra, recalling that $\hat{Y} = a + b_{yx}X$ and that $b_{yx} = \sum xy / \sum x^2$. In outline: $\hat{Y} - \bar{Y} = b_{yx}x$ and $Y - \hat{Y} = e$, so the term equals $2 b_{yx} \sum xe = 2 b_{yx}\left(\sum xy - b_{yx}\sum x^2\right) = 0$ by the definition of $b_{yx}$.
Calculation of proportions of sums of squares due to regression and due to error (or residual)
$\frac{\sum y^2}{\sum y^2} = 1 = \frac{SS_{reg}}{\sum y^2} + \frac{SS_{res}}{\sum y^2}$
so the proportion due to regression is
$\frac{SS_{reg}}{\sum y^2} = 1 - \frac{SS_{res}}{\sum y^2}$
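Applied to the earlier example (using Σy² = 17.2, SSreg ≈ 16.28, and SSres ≈ 0.92 from the calculations above), the two proportions are approximately
$\frac{SS_{reg}}{\sum y^2} = \frac{16.28}{17.2} \approx .95, \qquad \frac{SS_{res}}{\sum y^2} = \frac{0.92}{17.2} \approx .05$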
Alternative formulas for computing the sums of squares due to regression
$SS_{reg} = \sum (\hat{Y} - \bar{Y})^2 = \sum (\bar{Y} + bx - \bar{Y})^2 = \sum (bx)^2 = b^2 \sum x^2$
$= \left(\frac{\sum xy}{\sum x^2}\right)^2 \sum x^2 = \frac{(\sum xy)^2}{\sum x^2} = \frac{\sum xy}{\sum x^2} \sum xy = b \sum xy$
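For the example data these alternative forms agree, as a quick check shows:
$SS_{reg} = b \sum xy = 1.769 \times 9.2 \approx 16.28 = \frac{(9.2)^2}{5.2}$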
Test of the regression coefficient, byx,
(i.e. test the null hypothesis that byx = 0)
First compute the variance of estimate,
$s_{y \cdot x}^2 = \operatorname{est}(\sigma_{y \cdot x}^2) = \frac{\sum (Y - \hat{Y})^2}{N - k - 1} = \frac{SS_{res}}{N - k - 1}$
where k is the number of predictors.
Test of the regression coefficient, byx,
(i.e. test the null hypothesis that byx = 0)
Then obtain the standard error of estimate,
$s_{y \cdot x} = \sqrt{s_{y \cdot x}^2}$
Then compute the standard error of the regression coefficient, $s_b$:
$s_b = \sqrt{\frac{s_{y \cdot x}^2}{\sum x^2}} = \frac{s_{y \cdot x}}{s_x \sqrt{N - 1}}$
The test of significance of the regression
coefficient (byx)
The significance of the regression coefficient is tested using a t test with (N − k − 1) degrees of freedom:
$t = \frac{b_{yx}}{s_b} = \frac{b_{yx}}{s_{y \cdot x} / \left(s_x \sqrt{n - 1}\right)}$
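A minimal Python sketch (illustrative only) of this test for the example data, with N = 5 cases and k = 1 predictor:

# t test of the slope b (H0: b = 0), following the formulas above.
import math

Y = [3, 1, 0, 4, 5]
X = [1, 0, -1, 1, 2]
N, k = len(Y), 1
X_bar, Y_bar = sum(X) / N, sum(Y) / N

sum_x2 = sum((xi - X_bar) ** 2 for xi in X)
sum_xy = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y))
b = sum_xy / sum_x2
a = Y_bar - b * X_bar

Y_hat = [a + b * xi for xi in X]
SS_res = sum((yi - yh) ** 2 for yi, yh in zip(Y, Y_hat))

s2_yx = SS_res / (N - k - 1)          # variance of estimate
s_b   = math.sqrt(s2_yx / sum_x2)     # standard error of b
t     = b / s_b                       # about 7.27, with N - k - 1 = 3 df
print(round(t, 2))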
Computing regression using
correlations
The correlation in the population is given by
$\rho_{xy} = \frac{\sum xy}{N \sigma_x \sigma_y}$
The population correlation coefficient, $\rho_{xy}$, is estimated by the sample correlation coefficient, $r_{xy}$:
$r_{xy} = \frac{\sum z_x z_y}{N} = \frac{s_{xy}}{s_x s_y} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
Sums of squares, regression (SSreg)
Recalling that R2 gives the proportion of
variance of Y accounted for (or explained) by
X, we can obtain
$SS_{reg} = r^2 \sum y^2$
$SS_{res} = (1 - r^2) \sum y^2$
or, in other words, SSreg is that portion of SSy
predicted or explained by the regression of Y
on X.
Standard error of estimate
From SSres we can compute the variance
of estimate and standard error of
estimate as
$s_{y \cdot x}^2 = \frac{(1 - r^2) \sum y^2}{N - k - 1}, \qquad s_{y \cdot x} = \sqrt{s_{y \cdot x}^2}$
(Note alternative formulas were given
earlier.)
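A brief Python sketch (illustrative; it assumes the Σy² = 17.2 and r² ≈ .946 obtained from the example data above) of these formulas:

# SSreg, SSres, and the standard error of estimate recovered from r².
import math

sum_y2, r2 = 17.2, 0.946
N, k = 5, 1

SS_reg = r2 * sum_y2                 # about 16.3
SS_res = (1 - r2) * sum_y2           # about 0.93
s2_yx  = SS_res / (N - k - 1)        # variance of estimate
s_yx   = math.sqrt(s2_yx)            # standard error of estimate
print(round(SS_reg, 2), round(s_yx, 2))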
Testing the Significance of r
The significance of a correlation coefficient, r, is tested using a t test with N − 2 degrees of freedom:
$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$
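For the example data, r ≈ .97 (since r² ≈ .946), so a minimal sketch of this test (an illustration, not from the slides) looks like:

# t test for the significance of r, with N - 2 degrees of freedom.
import math

r, N = 0.973, 5
t = (r * math.sqrt(N - 2)) / math.sqrt(1 - r ** 2)
print(round(t, 2))   # about 7.3, with N - 2 = 3 df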
Testing the difference between
two correlations
To test the difference between two
Pearson correlation coefficients, use
the “Comparing two correlation
coefficients” calculator on my web site.
Testing the difference between
two regression coefficients
This, also, is a t test:
$t = \frac{b_1 - b_2}{\sqrt{s_{b_1}^2 + s_{b_2}^2}}$
where $s_b^2$ was given earlier.
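A minimal sketch of this test; the two slopes and their squared standard errors below are hypothetical values, used only for illustration:

# t test for the difference between two independent regression coefficients.
import math

b1, b2 = 1.77, 1.20      # slopes from two independent samples (hypothetical)
se2_b1 = 0.06            # squared standard error of b1 (hypothetical)
se2_b2 = 0.08            # squared standard error of b2 (hypothetical)

t = (b1 - b2) / math.sqrt(se2_b1 + se2_b2)
print(round(t, 2))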
Point-biserial and Phi correlation
These are both Pearson product-moment correlations.
The Point-biserial correlation is used when one variable is a scale variable and the other represents a true dichotomy.
For instance, the correlation between a
performance on an item—the dichotomous
variable—and the total score on a test—the
scaled variable.
Point-biserial and Phi correlation
The Phi correlation is used when both
variables represent a true dichotomy.
For instance, the correlation between two
test items.
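A small illustration: because the point-biserial (and phi) coefficient is just a Pearson r, it can be computed with an ordinary correlation routine. The item and total scores below are hypothetical:

# Point-biserial correlation as the Pearson r between a 0/1 item and a scale score.
import numpy as np

item  = np.array([1, 0, 1, 1, 0, 1])        # right/wrong on one item (hypothetical)
total = np.array([27, 15, 30, 22, 18, 25])  # total test scores (hypothetical)

r_pb = np.corrcoef(item, total)[0, 1]
print(round(r_pb, 3))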
Biserial and Tetrachoric
correlation
These are non-Pearson correlations.
Both are rarely used anymore.
The biserial correlation is used when one
variable is truly a scaled variable and
the other represents an artificial
dichotomy.
The Tetrachoric correlation is used when
both variables represent an artificial
dichotomy.
Spearman’s Rho Coefficient and
Kendall’s Tau Coefficient
Spearman’s rho is used to compute the
correlation between two ordinal (or
ranked) variables.
It is the correlation between two sets of
ranks.
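A small illustration of this idea: rank each variable, then correlate the ranks. The scores below are hypothetical and have no ties:

# Spearman's rho as the Pearson correlation between two sets of ranks.
import numpy as np

x = np.array([10, 25, 17, 8, 30])
y = np.array([12, 29, 20, 6, 25])

rank_x = x.argsort().argsort() + 1   # ranks 1..n (no ties assumed)
rank_y = y.argsort().argsort() + 1

rho = np.corrcoef(rank_x, rank_y)[0, 1]
print(round(rho, 2))

With tied scores, average (midrank) ranks would be needed instead of this simple ranking.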