Linear Regression

AP Statistics - Chapter 8
$\hat{y} = b_0 + b_1 x$
We are predicting the y-values, thus the “hat” over the “y”.
We use actual values for “x”… so no hat here.
$b_1$: slope
$b_0$: y-intercept
Is a linear model appropriate?
Check 2 things:
• Is the scatterplot fairly linear?
• Is there a pattern in the plot of the residuals?
Residuals
(difference between observed value and predicted value)
Believe it or not, our “best fit line” will actually
MISS most of the points.
Residual:
Observed y − Predicted y
$e = y - \hat{y}$
Every point has
a residual...
and if we plot them all, we
have a residual
plot.
We do NOT
want a pattern in
the residual plot!
This residual plot has no distinct pattern…
so it looks like a linear model is appropriate.
Does a linear model seem
appropriate?
OOPS!!!
Although the scatterplot is fairly linear… the
residual plot has a clear curved pattern.
A linear model is NOT
appropriate here.
Is a linear model appropriate?
A residual plot that has no distinct pattern is
an indication that a linear model might be appropriate.
[Figure: two plots of residuals vs. x, one labeled “Linear” and one labeled “Not linear”]
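As a rough illustration of this check, the sketch below fits a least squares line to a small made-up data set (y = x², not from the slides) and prints the sign of each residual in x order. The helper names `fit_line` and `residuals` are hypothetical, and the closed-form slope is the standard least squares formula, assumed here rather than taken from this slide.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Residual for each point: observed y minus predicted y."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical curved data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

# A run of same-sign residuals in x order suggests a pattern.
signs = "".join("+" if e > 0 else "-" for e in res)
print(signs)
```

The signs run positive, then negative, then positive again: a curved pattern, even though the correlation for these six points is high (about 0.98). In practice you would look at the residual plot itself rather than only the signs.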
Note about residual plots:
residuals vs. $x$ and residuals vs. $\hat{y}$ will look the same,
but don’t plot residuals vs. $y$ (that will look different).
Least Squares Regression Line
Consider the following 4 points:
(1, 3) (3, 5) (5, 3) (7, 7)
How do we find the best fit line?
Least Squares Regression Line
is the line (model) which
minimizes the sum of the
squared residuals.
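One way to see what “minimizes the sum of the squared residuals” means is a brute-force search. The sketch below tries a grid of candidate intercepts and slopes for the 4 points above and keeps the pair with the smallest sum of squared residuals. The grid ranges and the function name `sum_squared_residuals` are assumptions chosen for illustration; this is not how the line is normally computed.

```python
points = [(1, 3), (3, 5), (5, 3), (7, 7)]


def sum_squared_residuals(b0, b1, pts):
    """Sum of (observed y - predicted y)^2 for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in pts)


# Candidate lines: intercepts 0.0 to 5.0 and slopes -1.0 to 2.0, in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(0, 51) for j in range(-10, 21)]

best_b0, best_b1 = min(
    candidates, key=lambda c: sum_squared_residuals(c[0], c[1], points)
)
best_sse = sum_squared_residuals(best_b0, best_b1, points)

print(best_b0, best_b1, best_sse)  # 2.5 0.5 6.0
```

On this grid the winner is $\hat{y} = 2.5 + 0.5x$, with residuals 0, 1, −2, 1. Every other candidate line gives a larger sum of squared residuals.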
Facts about LSRL
• sum of all residuals is zero
(some are positive, some negative)
• sum of all squared residuals is
the lowest possible value (but usually not 0).
(since we square them, none are negative)
• goes through the point $(\bar{x}, \bar{y})$
Regression line always contains (x-bar, y-bar)
slope $= r \dfrac{s_y}{s_x}$
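These facts can be checked numerically on the 4 points from the earlier slide. The sketch below builds the line from the slope formula and the point (x-bar, y-bar), then looks at the residuals. The variable names are illustrative, and the sample standard deviations use the usual n − 1 divisor, which is an assumption about the convention intended here.

```python
import math

points = [(1, 3), (3, 5), (5, 3), (7, 7)]
xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample standard deviations (divide by n - 1).
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

# Correlation coefficient r.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

# Slope from r, s_y and s_x; intercept chosen so the line contains (x_bar, y_bar).
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print("slope:", round(b1, 4), "intercept:", round(b0, 4))
print("sum of residuals is ~0:", math.isclose(sum(residuals), 0.0, abs_tol=1e-9))
print("sum of squared residuals:", round(sum(e ** 2 for e in residuals), 4))
```

The residuals sum to zero (up to floating-point rounding), while the sum of their squares is positive because squaring removes the signs.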
Regression Wisdom
Chapter 9
Another look at
height vs. age:
(this is cm vs months!)
$\widehat{\text{height}} = 64.93 + 0.635 \cdot \text{age}$
What does the model predict about the height of a
180-month (15-year) old person?
$\widehat{\text{height}} = 64.93 + 0.635(180)$
$\widehat{\text{height}} = 179.23$ cm… or about 70.56 inches!
(that’s about 5 feet, 10.56 inches!)
THAT’S A TALL 15-YEAR OLD!!!
…what about a 40-year-old (480-month-old) human…
$\widehat{\text{height}} = 64.93 + 0.635 \cdot \text{age}$
$\widehat{\text{height}} = 64.93 + 0.635(480)$
$\widehat{\text{height}} = 369.73$ cm… or 145.56 inches!
(that’s 12 feet, 1.56 inches!)
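The arithmetic on these slides is easy to reproduce. The sketch below plugs both ages into the height model and converts the result from centimeters to feet and inches. The function names are hypothetical; the coefficients come from the model above, and 2.54 cm per inch is the standard conversion.

```python
CM_PER_INCH = 2.54


def predict_height_cm(age_months):
    """Predicted height in cm from the model: 64.93 + 0.635 * age (in months)."""
    return 64.93 + 0.635 * age_months


def cm_to_feet_inches(cm):
    """Convert centimeters to (whole feet, remaining inches)."""
    total_inches = cm / CM_PER_INCH
    feet = int(total_inches // 12)
    inches = total_inches - 12 * feet
    return feet, inches


for age in (180, 480):
    cm = predict_height_cm(age)
    feet, inches = cm_to_feet_inches(cm)
    print(f"{age} months: {cm:.2f} cm = {cm / CM_PER_INCH:.2f} in "
          f"= {feet} ft {inches:.2f} in")
```

Running the conversion in code is a quick check on the feet-and-inches figures, which are easy to get wrong by hand.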
Extrapolation
(going beyond the useful ends of our mathematical model)
Whenever we go beyond
the ends of our data
(specifically the x-values), we
are extrapolating.
Extrapolation leads us to results
that may be unreliable.
Outliers… Leverage… Influential points…
Outliers, leverage, and influence
• If a point’s x-value is far from the
mean of the x-values, it is said to
have high leverage.
(it has the potential to change the
regression line significantly)
• A point is considered influential
if omitting it gives a very
different model.
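A minimal sketch of “omit the point and compare the models”, using made-up data (none of these values come from the slides). One extra point sits at the mean of the x-values (low leverage), the other far to the right (high leverage). The helper `fit_line` is hypothetical and uses the standard least squares formulas.

```python
def fit_line(pts):
    """Return (b0, b1) for the least squares line through pts."""
    n = len(pts)
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical base data, roughly following y = x.
base = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8), (5, 5.0)]

low_leverage_outlier = (3, 8.0)    # x equals the mean of the x-values
high_leverage_point = (20, 5.0)    # x is far from the mean of the x-values

_, slope_without = fit_line(base)
_, slope_low = fit_line(base + [low_leverage_outlier])
_, slope_high = fit_line(base + [high_leverage_point])

print("slope without extra point:   ", round(slope_without, 3))
print("slope with low-leverage point:", round(slope_low, 3))
print("slope with high-leverage point:", round(slope_high, 3))
```

With this data the low-leverage outlier leaves the slope essentially unchanged, while the high-leverage point pulls it far from its original value, so omitting that point gives a very different model.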
Outlier or Influential point? (or neither?)
Outlier:
- Low leverage
- Weakens “r”
(model does not change drastically)
[Figure: panels WITH “outlier” and WITHOUT “outlier”]
Outlier or Influential point? (or neither?)
Influential Point:
- HIGH leverage
- Weakens “r”
(slope changes drastically!)
[Figure: panels WITH “outlier” and WITHOUT “outlier”]
Outlier or Influential point? (or neither?)
- HIGH leverage
- STRENGTHENS “r”
[Figure: linear model WITH and WITHOUT “outlier”]
fin~