Transcript Chapter 11
11-1
Chapter 11
Correlation and
Regression
© The McGraw-Hill Companies, Inc., 2000
11-2
Outline
11-1 Introduction
11-2 Scatter Plots
11-3 Correlation
11-4 Regression
11-3
Outline
11-5 Coefficient of
Determination and
Standard Error of Estimate
11-4
Objectives
Draw a scatter plot for a set of
ordered pairs.
Find the correlation coefficient.
Test the hypothesis H0: ρ = 0.
Find the equation of the
regression line.
11-5
Objectives
Find the coefficient of
determination.
Find the standard error of
estimate.
11-6
11-2 Scatter Plots
A scatter plot is a graph of the
ordered pairs (x, y) of numbers
consisting of the independent
variable, x, and the dependent
variable, y.
11-7
11-2 Scatter Plots - Example
Construct a scatter plot for the data
obtained in a study of age and systolic
blood pressure of six randomly selected
subjects.
The data is given on the next slide.
11-8
11-2 Scatter Plots - Example
Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152
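A rough scatter plot of these ordered pairs can be drafted without a plotting library. The sketch below is only an illustration and is not part of the original slides: the function name `ascii_scatter` and the 30-by-10 character grid are assumptions made for this example, and it assumes the x values (and the y values) are not all identical.

```python
# Age (x) and systolic pressure (y) for subjects A-F, from the data table.
pairs = [(43, 128), (48, 120), (56, 135), (61, 143), (67, 141), (70, 152)]


def ascii_scatter(points, width=30, height=10):
    """Return a text grid with one '*' per (x, y) pair.

    Hypothetical helper: x increases to the right and y increases upward.
    Assumes at least two distinct x values and two distinct y values.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate onto the grid, then flip rows so that
        # larger y values appear nearer the top of the printed plot.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)


plot = ascii_scatter(pairs)
print(plot)
```

The points rise from the lower left to the upper right, which is the pattern of a positive relationship.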
11-9
11-2 Scatter Plots - Example
[Figure: scatter plot of pressure (y-axis, ticks 120 to 150) against age (x-axis, ticks 40 to 70), showing a positive relationship.]
11-10
11-2 Scatter Plots - Other Examples
[Figure: scatter plot of final grade (y-axis, ticks 40 to 90) against number of absences (x-axis, ticks 5 to 15), showing a negative relationship.]
11-11
11-2 Scatter Plots - Other Examples
[Figure: scatter plot of y (ticks 0 to 10) against x (ticks 0 to 70), showing no relationship.]
11-12
11-3 Correlation Coefficient
The correlation coefficient
computed from the sample data
measures the strength and direction
of a relationship between two
variables.
Sample correlation coefficient, r.
Population correlation coefficient, ρ.
11-13
11-3 Range of Values for the
Correlation Coefficient
The value of r ranges from -1 to +1.
r near -1: strong negative relationship
r near 0: no linear relationship
r near +1: strong positive relationship
11-14
11-3 Formula for the Correlation
Coefficient r
\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} \]
Where n is the number of data pairs
11-15
11-3 Correlation Coefficient Example (Verify)
Compute the correlation coefficient
for the age and blood pressure data.
Σx = 345, Σy = 819, Σxy = 47 634,
Σx² = 20 399, Σy² = 112 443.
Substituting in the formula for r gives
r = 0.897.
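The "verify" step can be checked with a short script. This is a minimal sketch that applies the formula for r to the six data pairs; the function names `correlation_sums` and `correlation_coefficient` are assumptions introduced for this example.

```python
import math

# Age (x) and systolic pressure (y) for subjects A-F, from the data table.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def correlation_sums(xs, ys):
    """Return n and the five sums that appear in the formula for r."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return n, sum_x, sum_y, sum_xy, sum_x2, sum_y2


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed from the raw sums."""
    n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = correlation_sums(xs, ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


print(correlation_sums(ages, pressures))
# (6, 345, 819, 47634, 20399, 112443)

r = correlation_coefficient(ages, pressures)
print(round(r, 3))  # 0.897
```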
11-16
11-3 The Significance of the
Correlation Coefficient
The population correlation
coefficient, ρ, is the correlation
between all possible pairs of
data values (x, y) taken from a
population.
11-17
11-3 The Significance of the
Correlation Coefficient
H0: ρ = 0    H1: ρ ≠ 0
This tests for a significant
correlation between the variables
in the population.
11-18
11-3 Formula for the t-tests for the
Correlation Coefficient
\[ t = r \sqrt{\frac{n - 2}{1 - r^2}} \]
with d.f. = n - 2
11-19
11-3 Example
Test the significance of the correlation
coefficient for the age and blood
pressure data. Use α = 0.05 and
r = 0.897.
Step 1: State the hypotheses.
H0: ρ = 0    H1: ρ ≠ 0
11-20
11-3 Example
Step 2: Find the critical values. Since
α = 0.05 and there are 6 – 2 = 4 degrees
of freedom, the critical values are
t = +2.776 and t = –2.776.
Step 3: Compute the test value.
t = 4.059 (verify).
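The test value can be verified with a few lines of code. This sketch assumes the rounded value r = 0.897 and takes the critical value 2.776 directly from Step 2 rather than looking it up in a t table; the function name `t_statistic` is an assumption made for this example.

```python
import math


def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.897               # rounded sample correlation coefficient
n = 6                   # number of data pairs
critical_value = 2.776  # two-tailed, alpha = 0.05, d.f. = 4 (from Step 2)

t = t_statistic(r, n)
print(round(t, 2))  # 4.06, in line with the 4.059 quoted above

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t) > critical_value
print(reject_h0)  # True
```

Using the unrounded r instead of 0.897 shifts the test value slightly in the third significant figure, but it does not change the decision.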
11-21
11-3 Example
Step 4: Make the decision. Reject the
null hypothesis, since the test value
falls in the critical region (4.059 > 2.776).
Step 5: Summarize the results. There is
a significant relationship between the
variables of age and blood pressure.
11-22
11-4 Regression
The scatter plot for the age and blood
pressure data displays a linear pattern.
We can model this relationship with a
straight line.
This line is called the line of
best fit or the regression line.
The equation of the line is y = a + bx.
11-23
11-4 Formulas for the Regression
Line y = a + bx.
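\[ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \]
\[ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \]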
Where a is the y intercept and b is
the slope of the line.
11-24
11-4 Example
Find the equation of the regression line
for the age and the blood pressure data.
Substituting into the formulas gives
a = 81.048 and b = 0.964 (verify).
Hence, y = 81.048 + 0.964x.
Note, a represents the intercept and b
the slope of the line.
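The values of a and b can be verified by coding the two formulas directly. This is a minimal sketch; the function name `regression_line` is an assumption made for this example.

```python
# Age (x) and systolic pressure (y) for subjects A-F, from the data table.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def regression_line(xs, ys):
    """Return (a, b) for the line y = a + bx, computed from the raw sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Both formulas share the same denominator.
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b


a, b = regression_line(ages, pressures)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```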
11-25
11-4 Example
[Figure: scatter plot of pressure (y-axis, ticks 120 to 150) against age (x-axis, ticks 40 to 70) with the regression line y = 81.048 + 0.964x.]
11-26
11-4 Using the Regression Line to
Predict
The regression line can be used to
predict a value for the dependent
variable (y) for a given value of the
independent variable (x).
Caution: Use x values within the
experimental region when
predicting y values.
11-27
11-4 Example
Use the equation of the regression line
to predict the blood pressure for a
person who is 50 years old.
Since y = 81.048 + 0.964x, then
y = 81.048 + 0.964(50) = 129.248 ≈ 129.2
Note that the value of 50 is within the
range of x values.
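The prediction and the caution about staying within the range of x values can be combined in one small function. This is a sketch under stated assumptions: the function name `predict_pressure` is hypothetical, the coefficients are the rounded values from the regression equation, and the "experimental region" is taken to be the observed ages, 43 to 70.

```python
def predict_pressure(age, a=81.048, b=0.964, x_min=43, x_max=70):
    """Predict systolic pressure from age using y = a + bx.

    Raises ValueError when age falls outside the observed range
    of x values, since the line was only fitted on that range.
    """
    if not x_min <= age <= x_max:
        raise ValueError(
            f"age {age} is outside the observed range {x_min} to {x_max}"
        )
    return a + b * age


print(round(predict_pressure(50), 1))  # 129.2

try:
    predict_pressure(90)
except ValueError as error:
    print("Refused to extrapolate:", error)
```

Raising an error is one possible design choice; a warning would also work if extrapolated values are still wanted.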
11-28
11-5 Coefficient of Determination
and Standard Error of Estimate
The coefficient of determination,
denoted by r², is a measure of
the variation of the dependent
variable that is explained by the
regression line and the
independent variable.
11-29
11-5 Coefficient of Determination
and Standard Error of Estimate
r² is the square of the correlation
coefficient.
The coefficient of
nondetermination is (1 – r²).
Example: If r = 0.90, then
r² = 0.81.
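The example can be extended to show both coefficients together. This is a small sketch; the variable names are chosen for this illustration only.

```python
r = 0.90

r_squared = r ** 2                # coefficient of determination
nondetermination = 1 - r_squared  # coefficient of nondetermination

print(round(r_squared, 2))         # 0.81
print(round(nondetermination, 2))  # 0.19
```

With r = 0.90, about 81% of the variation in y is explained by the regression line, and about 19% is left unexplained.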
11-30
11-5 Coefficient of Determination
and Standard Error of Estimate
The standard error of estimate,
denoted by s_est, is the standard
deviation of the observed y
values about the predicted y
values.
The formula is given on the next
slide.
11-31
11-5 Formula for the Standard
Error of Estimate
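\[ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} \]
or
\[ s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} \]
where y' is the predicted value a + bx.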
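The two forms of the formula can be compared numerically. The sketch below applies both to the age and blood pressure data from the earlier slides; that choice of data set is an assumption made for illustration, and the function names are hypothetical.

```python
import math

# Age (x) and systolic pressure (y) for subjects A-F, from the data table.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def fit_line(xs, ys):
    """Return unrounded (a, b) for the regression line y = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


def s_est_from_residuals(xs, ys, a, b):
    """First form: based on the squared differences y - (a + bx)."""
    squared_errors = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_errors / (len(xs) - 2))


def s_est_from_sums(xs, ys, a, b):
    """Second form: based on the sums of y^2, y, and xy."""
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (len(xs) - 2))


a, b = fit_line(ages, pressures)
s_residuals = s_est_from_residuals(ages, pressures, a, b)
s_sums = s_est_from_sums(ages, pressures, a, b)
print(round(s_residuals, 2), round(s_sums, 2))  # both about 5.64
```

The second form is sensitive to rounding: it subtracts large, nearly equal quantities, so a and b should be carried unrounded until the final step.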
11-32
11-5 Standard Error of Estimate Example
From the regression equation,
y = 55.57 + 8.13x and n = 6, find s_est.
Here, a = 55.57, b = 8.13, and n = 6.
Substituting into the formula gives
s_est = 6.48 (verify).