Powerpoint 3.2 C
AP Statistics Section 3.2 C
Coefficient of Determination
A residual plot is a graphical tool for
evaluating how well a linear model fits
the data. The numerical quantity that
tells us how well the least-squares line
(LSL) does at predicting values of the
response variable y is called the
coefficient of determination.
The symbol is $r^2$. Some computer
packages call it “R-sq”.
We have seen instances where the
least-squares regression line does not
fit the data, and therefore, does not
help predict the values of the
response variable, y, as x changes. In
such cases, our “best guess” for the
value of y at any given value of x is
simply $\bar{y}$, the mean of the y values.
The idea of $r^2$ is this: How much
better is the LSL at predictions than
if we just used $\bar{y}$ as our prediction
each time?
Once again we consider the NEA vs Fat Gain
example from section 3.2 A. The LSL and the
horizontal line $y = \bar{y}$ have been drawn in the residual plot to
the right. We would like to know which line
comes closer to the actual y-values.
We know that the LSL minimizes the sum
of the squared residuals.
For this data:
$\sum \text{residual}^2 = \sum (y - \hat{y})^2 = 7.663$
We will call this SSE, for sum of squared
errors.
If we use $y = \bar{y}$ to make predictions, then our
prediction errors would be the vertical distances
of the points away from the horizontal line.
For this data: $\sum (y - \bar{y})^2 = 19.4575$
We will call this SST, for sum of squared total variation.
The difference SST - SSE (in this case
19.4575 - 7.663 = 11.7945) shows how much the
LSL reduces the total variation in
the responses y.
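To make SSE and SST concrete, here is a minimal sketch in plain Python. The data values are made up for illustration and are not the NEA study data; the variable names are assumptions as well.

```python
# Hypothetical data, invented for illustration (NOT the NEA vs Fat Gain values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b and intercept a for the line y-hat = a + b*x.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

# SSE: squared vertical distances from the points to the LSL.
y_hat = [a + b * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# SST: squared vertical distances from the points to the horizontal line y = y-bar.
sst = sum((yi - y_bar) ** 2 for yi in y)

print("SSE =", sse)
print("SST =", sst)
print("SST - SSE =", sst - sse)
```

Because the LSL minimizes the sum of squared residuals, SSE can never be larger than SST for the same data.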
We define the coefficient of
determination, $r^2$, as the fraction of
the variation in the values of y that is
explained by the least-squares
regression line. We can calculate $r^2$ as
follows:
$$r^2 = \frac{\text{SST} - \text{SSE}}{\text{SST}}$$
For the NEA vs Fat Gain data:
$$r^2 = \frac{19.4575 - 7.663}{19.4575} = 0.606$$
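As a quick arithmetic check, the sketch below plugs the SSE and SST values quoted above into the formula. The variable names are chosen here for illustration only.

```python
# Values quoted in the slides for the NEA vs Fat Gain data.
sse = 7.663     # sum of squared errors about the LSL
sst = 19.4575   # sum of squared total variation about y-bar

# Fraction of the variation in y explained by the LSL.
r_squared = (sst - sse) / sst
print(round(r_squared, 3))
```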
We have already seen how to
calculate $r^2$ on our calculators (i.e.
the same way we found r). Find $r^2$
on your calculator for the NEA vs
Fat Gain data.
$r^2 = 0.606$
Many factors, such as metabolism,
affect the variation in the y-values. We can say
60.6% of the variation in fat gain is explained
by the least-squares regression line relating fat
gain and non-exercise activity. The other 39.4% is
individual variation among the subjects that is
not explained by the linear relationship.
Facts about Least-Squares
Regression
The distinction between
explanatory and response variables
is essential in regression. This
means we cannot reverse the roles
of the two variables to make
predictions. Be sure you know
which variable is the explanatory.
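One way to see why the roles cannot be reversed is to fit both lines and compare them. The sketch below uses made-up data and a hypothetical helper named `fit_line`; it shows that the regression of x on y is not simply the y-on-x line solved for x.

```python
def fit_line(u, v):
    """Least-squares line predicting v from u; returns (intercept, slope)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    slope = suv / suu
    return v_bar - slope * u_bar, slope

# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a_yx, b_yx = fit_line(x, y)  # y-hat = a_yx + b_yx * x  (x explanatory)
a_xy, b_xy = fit_line(y, x)  # x-hat = a_xy + b_xy * y  (y explanatory)

# If we just solved the y-on-x line for x, the slope would be 1 / b_yx.
inverted_slope = 1 / b_yx

print("slope of x on y:", b_xy)
print("slope from inverting y on x:", inverted_slope)
```

The two slopes agree only when the points fall exactly on a line, which is why the choice of explanatory variable matters for prediction.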
There is a close connection
between correlation and the slope
of the least-squares line. We know
$$b = r \frac{s_y}{s_x}.$$
This equation says that
along the regression line, a change
of one standard deviation in x
corresponds to a change of r
standard deviations in y.
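The slope relationship can be checked numerically. The following is a sketch on made-up data, with all variable names chosen here for illustration; it computes the slope directly and again from r and the two sample standard deviations.

```python
import math

# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample standard deviations and the correlation r.
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
r = sxy / math.sqrt(sxx * syy)

b_direct = sxy / sxx       # least-squares slope computed directly
b_from_r = r * s_y / s_x   # slope from the formula b = r * s_y / s_x

# Moving one standard deviation in x changes y-hat by b * s_x = r * s_y.
change_in_y_hat = b_direct * s_x

print(b_direct, b_from_r)
print(change_in_y_hat, r * s_y)
```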
The least-squares regression line of
y on x always passes through the
point $(\bar{x}, \bar{y})$.
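A short sketch can confirm this fact on made-up data. Here the slope and intercept are computed from the raw sums (the normal equations), and the fitted line is then evaluated at the mean of x; the data and names are assumptions for illustration.

```python
# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope and intercept from the normal equations for least squares.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

x_bar = sum_x / n
y_bar = sum_y / n

# Prediction at x-bar should equal y-bar (up to rounding).
prediction_at_mean = a + b * x_bar
print(prediction_at_mean, y_bar)
```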
The correlation r describes the
strength of a straight-line relationship.
In the regression setting, the square of
the correlation, $r^2$, is the fraction of
the variation in the values of y that is
explained by the least-squares
regression of y on x.
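The two descriptions of $r^2$ (the square of the correlation, and the fraction (SST - SSE)/SST) can be compared directly. This is a sketch on made-up data with illustrative variable names, not the NEA values.

```python
import math

# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation r, then its square.
r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Fit the LSL, then compute the fraction of variation explained.
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = syy  # SST is the sum of squared deviations of y about y-bar
r_squared_from_sums = (sst - sse) / sst

print(r_squared_from_r, r_squared_from_sums)
```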