Transcript Chapter 11

11-1
Chapter 11
Correlation and
Regression
© The McGraw-Hill Companies, Inc., 2000
11-2
Outline

11-1 Introduction

11-2 Scatter Plots

11-3 Correlation

11-4 Regression
11-3
Outline

11-5 Coefficient of
Determination and
Standard Error of Estimate
11-4
Objectives




Draw a scatter plot for a set of
ordered pairs.
Find the correlation coefficient.
Test the hypothesis H0: ρ = 0.
Find the equation of the
regression line.
11-5
Objectives


Find the coefficient of
determination.
Find the standard error of
estimate.
11-6
11-2 Scatter Plots

A scatter plot is a graph of the
ordered pairs (x, y) of numbers
consisting of the independent
variable, x, and the dependent
variable, y.
11-7
11-2 Scatter Plots - Example


Construct a scatter plot for the data
obtained in a study of age and systolic
blood pressure of six randomly selected
subjects.
The data is given on the next slide.
11-8
11-2 Scatter Plots - Example
Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152
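As a rough illustration of drawing the scatter plot for these six pairs, the sketch below prints a crude text plot. The function name `ascii_scatter` and the grid size are assumptions made for this example only; a plotting library would normally be used instead.

```python
# Age (x) and systolic pressure (y) for subjects A-F, from the table above.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def ascii_scatter(xs, ys, width=40, height=12):
    """Return a crude text scatter plot with one '*' per (x, y) pair.

    Hypothetical helper; assumes xs and ys each contain at least two
    distinct values so the scaling below never divides by zero.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


plot = ascii_scatter(ages, pressures)
print(plot)
```

The points rise from lower left to upper right, which is what a positive relationship looks like.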
11-9
11-2 Scatter Plots - Example
Positive Relationship
[Figure: scatter plot of pressure (vertical axis, 120 to 150) against age (horizontal axis, 40 to 70).]
11-10
11-2 Scatter Plots - Other Examples
Negative Relationship
[Figure: scatter plot of final grade (vertical axis, 40 to 90) against number of absences (horizontal axis, 5 to 15).]
11-11
11-2 Scatter Plots - Other Examples
No Relationship
[Figure: scatter plot of y (vertical axis, 0 to 10) against x (horizontal axis, 0 to 70).]
11-12
11-3 Correlation Coefficient



The correlation coefficient
computed from the sample data
measures the strength and direction
of a relationship between two
variables.
Sample correlation coefficient, r.
Population correlation coefficient, ρ.
11-13
11-3 Range of Values for the
Correlation Coefficient
Strong negative relationship: r near -1
No linear relationship: r near 0
Strong positive relationship: r near +1
11-14
11-3 Formula for the Correlation
Coefficient r
\[ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} \]
where n is the number of data pairs.
11-15
11-3 Correlation Coefficient Example (Verify)

Compute the correlation coefficient
for the age and blood pressure data.
Σx = 345, Σy = 819, Σxy = 47 634,
Σx² = 20 399, Σy² = 112 443.
Substituting in the formula for r gives
r = 0.897.
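One way to carry out the "verify" step is to compute the sums and r directly from the six data pairs. This is a minimal sketch using only the Python standard library; the function name `correlation` is chosen here for illustration.

```python
import math

# Age (x) and systolic pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def correlation(xs, ys):
    """Sample correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation(ages, pressures)
print(round(r, 3))  # 0.897
```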
11-16
11-3 The Significance of the
Correlation Coefficient

The population correlation
coefficient, ρ, is the correlation
between all possible pairs of
data values (x, y) taken from a
population.
11-17
11-3 The Significance of the
Correlation Coefficient


H0: ρ = 0    H1: ρ ≠ 0
This tests for a significant
correlation between the variables
in the population.
11-18
11-3 Formula for the t-tests for the
Correlation Coefficient
\[ t = r \sqrt{\frac{n - 2}{1 - r^2}} \]
with d.f. = n - 2
11-19
11-3 Example



Test the significance of the correlation
coefficient for the age and blood
pressure data. Use α = 0.05 and
r = 0.897.
Step 1: State the hypotheses.
H0: ρ = 0    H1: ρ ≠ 0
11-20
11-3 Example


Step 2: Find the critical values. Since
α = 0.05 and there are 6 – 2 = 4 degrees
of freedom, the critical values are
t = +2.776 and t = –2.776.
Step 3: Compute the test value.
t = 4.059 (verify).
11-21
11-3 Example


Step 4: Make the decision. Reject the
null hypothesis, since the test value
falls in the critical region (4.059 > 2.776).
Step 5: Summarize the results. There is
a significant relationship between the
variables of age and blood pressure.
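Steps 3 and 4 can be checked with a short calculation. The sketch below takes r = 0.897, n = 6, and the critical value 2.776 directly from the slides rather than looking the critical value up in a t table; the variable names are illustrative.

```python
import math

r = 0.897          # sample correlation coefficient from the example
n = 6              # number of data pairs
critical = 2.776   # two-tailed critical value for alpha = 0.05, d.f. = 4 (from the slide)

# Test value: t = r * sqrt((n - 2) / (1 - r^2))
t = r * math.sqrt((n - 2) / (1 - r ** 2))

# Reject H0 when the test value falls in either tail of the critical region.
reject_h0 = abs(t) > critical

print(round(t, 3), reject_h0)  # 4.059 True
```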
11-22
11-4 Regression




The scatter plot for the age and blood
pressure data displays a linear pattern.
We can model this relationship with a
straight line.
This line is called the line of
best fit or the regression line.
The equation of the line is y' = a + bx.
11-23
11-4 Formulas for the Regression
Line y' = a + bx
\[ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n \sum x^2 - (\sum x)^2} \]
\[ b = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} \]
where a is the y' intercept and b is
the slope of the line.
11-24
11-4 Example




Find the equation of the regression line
for the age and the blood pressure data.
Substituting into the formulas gives
a = 81.048 and b = 0.964 (verify).
Hence, y' = 81.048 + 0.964x.
Note, a represents the intercept and b
the slope of the line.
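The intercept and slope can be verified from the same sums used for r. This is a sketch under the assumption that the six age and pressure pairs from the earlier table are the data; the function name `regression_line` is hypothetical.

```python
# Age (x) and systolic pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + bx."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


a, b = regression_line(ages, pressures)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```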
11-25
11-4 Example
[Figure: scatter plot of pressure (vertical axis, 120 to 150) against age (horizontal axis, 40 to 70) with the regression line y' = 81.048 + 0.964x.]
11-26
11-4 Using the Regression Line to
Predict


The regression line can be used to
predict a value for the dependent
variable (y) for a given value of the
independent variable (x).
Caution: Use x values within the
experimental region when
predicting y values.
11-27
11-4 Example



Use the equation of the regression line
to predict the blood pressure for a
person who is 50 years old.
Since y' = 81.048 + 0.964x, then
y' = 81.048 + 0.964(50) = 129.248 ≈ 129.2
Note that the value of 50 is within the
range of x values.
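The prediction and the caution about staying within the experimental region can be combined in one small function. This is a sketch only: the name `predict_pressure` and the choice to raise an error outside the sampled ages (43 to 70) are assumptions of this example, and the coefficients are the rounded values from the slides.

```python
A = 81.048   # intercept from the example (rounded)
B = 0.964    # slope from the example (rounded)
X_MIN, X_MAX = 43, 70  # smallest and largest ages in the sample


def predict_pressure(age):
    """Predict systolic pressure y' = A + B * age for ages within the sample range."""
    if not X_MIN <= age <= X_MAX:
        raise ValueError(
            f"age {age} is outside the sampled range {X_MIN} to {X_MAX}"
        )
    return A + B * age


print(round(predict_pressure(50), 1))  # 129.2
```

Calling `predict_pressure(90)` raises `ValueError`, since 90 lies outside the range of ages that produced the line.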
11-28
11-5 Coefficient of Determination
and Standard Error of Estimate

The coefficient of determination,
denoted by r², is a measure of
the variation of the dependent
variable that is explained by the
regression line and the
independent variable.
11-29
11-5 Coefficient of Determination
and Standard Error of Estimate



r² is the square of the correlation
coefficient.
The coefficient of
nondetermination is (1 – r²).
Example: If r = 0.90, then
r² = 0.81.
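The example arithmetic, together with the matching coefficient of nondetermination, can be sketched in a few lines. The variable names are illustrative.

```python
r = 0.90

r_squared = r ** 2                  # coefficient of determination
nondetermination = 1 - r_squared    # coefficient of nondetermination

# About 81% of the variation in y is explained; about 19% is not.
print(f"{r_squared:.2f} {nondetermination:.2f}")  # 0.81 0.19
```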
11-30
11-5 Coefficient of Determination
and Standard Error of Estimate


The standard error of estimate,
denoted by s_est, is the standard
deviation of the observed y
values about the predicted y'
values.
The formula is given on the next
slide.
11-31
11-5 Formula for the Standard
Error of Estimate
\[ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} \]
or
\[ s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} \]
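To see that the two forms agree, the sketch below evaluates both on the age and blood pressure data from the earlier slides (not the data behind the next example, which the slides do not list). It uses the unrounded a and b, because the shortcut form subtracts large, nearly equal quantities and rounding the coefficients first can shift its result noticeably.

```python
import math

# Age (x) and systolic pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]
n = len(ages)

sum_x = sum(ages)
sum_y = sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# Unrounded intercept and slope from the regression formulas.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

# First form: squared differences between observed y and predicted y'.
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, pressures))
s_est_residuals = math.sqrt(residual_ss / (n - 2))

# Second form: shortcut that uses the sums directly.
s_est_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(s_est_residuals, 2), round(s_est_shortcut, 2))
```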
11-32
11-5 Standard Error of Estimate Example



From the regression equation,
y' = 55.57 + 8.13x and n = 6, find s_est.
Here, a = 55.57, b = 8.13, and n = 6.
Substituting into the formula gives
s_est = 6.48 (verify).