Ch 9 PP - Lyndhurst Schools

Chapter 9
Estimating ABILITY with Confidence
Intervals
Objectives
Students will be able to:
1) Construct confidence intervals to estimate a proportion or
a mean
2) Construct confidence intervals to estimate a difference
between two proportions or a difference between two
means
3) Interpret confidence intervals in the context of the data
4) Calculate the margin of error for a confidence interval
5) Use technology to calculate confidence intervals
• In the 2013 NFL regular season, quarterback
Matt Ryan completed 439 out of 651 pass
attempts, for a completion percentage of
67.4%.
• Would you be confident in making a claim that
in the 2013 NFL season, Matt Ryan had an
ABILITY to complete a pass of exactly 67.4%?
Why or why not? Discuss using previous
concepts learned in this course.
• Would you be more or less confident if you
made the claim that Ryan’s ABILITY to
complete a pass fell between two values, say
between 55% and 76%?
• In this chapter, we’re going to look at how we
can use an athlete’s PERFORMANCES to
estimate their ABILITY by creating an interval of
values that their ABILITY will be between.
• Some examples:
– In 2010, Josh Hamilton’s ABILITY to get a hit was
somewhere between 0.317 and 0.401.
– NFL teams have the ABILITY to score somewhere
between 2.1 and 4.5 more points at home than on
the road.
The Idea of a Confidence Interval
• Remember that the law of large numbers says
it is impossible to know an athlete’s ABILITY
exactly unless we could observe an infinite
number of PERFORMANCES in the same
context.
• Instead of claiming to know an athlete’s exact
ABILITY, we can provide an interval of values
that their ABILITY is between.
• The interval of plausible values for an athlete’s
ABILITY or an interval of plausible values for
the difference in an athlete’s ABILITY in two
different contexts is known as a confidence
interval.
• We use an interval of plausible values rather
than a single value to increase our chances of
arriving at the correct estimate of an athlete’s
ABILITY.
• Obviously we would be pretty confident in this
interval. However, it doesn’t really tell us
anything about the weather.
• We don’t know if it will be hot or cold.
• Similarly, we wouldn’t want to create a confidence
interval saying something like LeBron’s ABILITY to
make a three-pointer is between 0% and 100%.
This really doesn’t tell us much about what
LeBron can do on a basketball court.
• Later on we will learn about how to calculate a
confidence interval.
• Right now, let’s concentrate on what information
a confidence interval provides.
Interpreting Confidence Intervals
• A confidence interval is constructed so that we
know how much confidence we should have in
the interval.
• All of the intervals we construct in this course will
utilize a 95% confidence level. Meaning, if we
were to calculate intervals for lots and lots of
athletes using our methods, about 95% of the
intervals will succeed in containing the ABILITY of
the athlete for whom the interval was calculated.
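To make "about 95% of the intervals" concrete, here is a minimal simulation sketch. It assumes a hypothetical athlete whose true ABILITY is 0.30, simulates many seasons of 500 attempts each, and builds an interval for each season as the observed proportion plus or minus two estimated standard deviations (the approach developed later in this chapter). The ABILITY value, the season length, the number of seasons, and the function name are all assumptions chosen for illustration.

```python
import math
import random


def coverage_rate(true_ability, attempts, seasons, seed=0):
    """Fraction of simulated seasons whose interval contains the true ABILITY."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(seasons):
        # Simulate one season of PERFORMANCES (True = success, False = failure).
        successes = sum(rng.random() < true_ability for _ in range(attempts))
        p_hat = successes / attempts
        margin = 2 * math.sqrt(p_hat * (1 - p_hat) / attempts)
        if p_hat - margin <= true_ability <= p_hat + margin:
            hits += 1
    return hits / seasons


rate = coverage_rate(true_ability=0.30, attempts=500, seasons=2000)
print(f"Share of intervals containing the true ABILITY: {rate:.3f}")
```

The printed share should land close to 0.95, though the exact value depends on the seed and the assumed settings.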
• In 2010, Josh Hamilton had a batting average
of 0.359 (186 hits in 518 at-bats).
• Since his average is based on only 518 at-bats
(not an infinite number), it is very unlikely that
his ABILITY to get a hit was exactly 0.359.
• We would probably have close to 0%
confidence that his ABILITY to get a hit would
be exactly 0.359 if Hamilton had millions and
millions of at-bats under the same conditions.
• We can calculate a 95% confidence interval of
Hamilton’s ABILITY to get a hit.
• Using his 2010 PERFORMANCES, the interval is
0.317 to 0.401.
• This means that in millions and millions of
at-bats under the same conditions, we would
expect his batting average to end up being
between 0.317 and 0.401.
• In short, we can say that we are 95% confident
that the interval of plausible values from
0.317 to 0.401 contains Hamilton’s ABILITY to
get a hit in 2010.
• The generic statement you can use to
interpret a confidence interval for an athlete’s
ABILITY is:
We are 95% confident that the interval of
plausible values from ______ to ______ includes
______’s ABILITY to ______.
• We can also use confidence intervals to
estimate the difference in an athlete’s ABILITY
in two different contexts.
• Example: In 2010, Hamilton hit 0.271 when
facing lefties, as opposed to 0.401 when facing
righties. He PERFORMED 0.130 batting
average points better when facing
right-handed pitchers.
• Again, since this is based on 518 total
at-bats, it’s unlikely that this is the exact
difference in his ABILITY.
• A 95% confidence interval for his difference in
ABILITY is from 0.043 to 0.217.
• Interpretation: We are 95% confident that the
interval of plausible values from 0.043 to 0.217
contains the true difference in Hamilton’s ABILITY
to get a hit against right-handed pitchers and his
ABILITY to get a hit against left-handed pitchers.
• Generic statement:
We are 95% confident that the interval of plausible
values from ______ to ______ includes the
difference in ______’s ABILITY to ______
in context 1 and context 2.
Using Confidence Intervals to Make Decisions
• Do Josh Hamilton’s PERFORMANCES in 2010
provide convincing evidence that he has a
greater ABILITY to get a hit against
right-handed pitchers than against left-handed
pitchers? What would our hypotheses be for
this question?
• Remember, his average against lefties was
.271 and his average against righties was .401.
The test statistic will be the difference in
batting averages (0.130).
• The p-value for this test is approximately 0.
What does that tell us?
– Reject the null, and we have convincing
evidence that Hamilton was a better hitter
against right-handed pitchers.
• We can also use confidence intervals to
address the hypotheses.
• The null hypothesis says that Hamilton has the
same ABILITY to get a hit vs righties and
lefties. Therefore, if this null is correct, what
should be the true difference in his ABILITY
(righty – lefty)?
• 0. If there is no difference in his ABILITY, then
when you take the difference, you should get
0.
• If the alternative hypothesis is correct, then
the true difference (righty – lefty) should be
greater than 0.
• The confidence interval for Hamilton’s
difference in ABILITY was between 0.043 and
0.217.
• This entire interval is positive. 0 is not a part
of the interval. Every value in the interval
suggests that his ABILITY to get a hit was
higher against righties than lefties.
• The hypothesis test and the confidence
interval gave us the same conclusion.
However, the confidence interval gives us
more information. It tells us just how much better
he was against righties (between .043 and
.217 batting average points).
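The decision rule used here (check whether 0 falls inside the interval) can be sketched in a few lines. The function name is an illustrative choice, and the second interval is a hypothetical one included only for contrast; it does not come from the slides.

```python
def supports_difference(lower, upper):
    """True when every plausible difference is on the same side of 0."""
    return lower > 0 or upper < 0


# Hamilton's righty - lefty interval from the slides above.
righty_minus_lefty = (0.043, 0.217)
# Hypothetical interval that straddles 0, for contrast (not from the slides).
straddles_zero = (-0.030, 0.080)

print(supports_difference(*righty_minus_lefty))  # True
print(supports_difference(*straddles_zero))      # False
```

An interval that excludes 0 matches the "reject the null" conclusion; an interval that includes 0 leaves "no difference" as a plausible value.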
• Was Hamilton a better hitter at home than
on the road?
• His home PERFORMANCE was .390.
• His road PERFORMANCE was .327.
• A 95% confidence interval for the true
difference in his ABILITY to get a hit at
home and his ABILITY to get a hit on the
road is -0.021 to 0.147.
• This interval includes the value 0, so it is
possible there is no difference in his
ABILITY.
The Structure of Confidence Intervals
• Confidence intervals contain two components:
– 1) A single-value estimate (a single number that
represents our best guess for an athlete’s ABILITY)
– 2) A margin of error (value that is added and
subtracted from the single-value estimate)
• A confidence interval is:
single-value estimate ± margin of error
• A 95% confidence interval for LeBron James’s
ABILITY to make a three-point shot in the
2007-2008 regular season is:
0.315 ± 0.049
• This means that the interval goes from 0.266 to
0.364.
• Interpretation: We are 95% confident that the
interval of plausible values from 0.266 to 0.364
contains LeBron’s ABILITY to make a three-point
shot in the 2007-2008 regular season.
• The single-value estimate of 0.315 is LeBron’s
observed three-point shooting percentage in
the 2007-2008 regular season (his
PERFORMANCE during the season).
• His observed PERFORMANCE is certainly our
best guess for his true ABILITY. However, it’s
likely incorrect due to RANDOM CHANCE
being involved.
• To compensate for RANDOM CHANCE, we
include the margin of error.
• Keep in mind we are making a 95% confidence
interval. Let’s think about a Normal
distribution for a minute. What do we know
about 95% of the data in a Normal
distribution?
– 95% of the data is within two standard deviations
of the mean.
• Therefore, to calculate the margin of error, we
need to estimate the standard deviation, and
multiply that value by 2.
• For our purposes, we’ll use a formula to
estimate the standard deviation.
• However, the formula for standard deviation
changes based on the quantity we are trying
to estimate.
– For example, the formula to estimate the standard
deviation for LeBron’s ABILITY to make a
three-pointer is different from the formula for
estimating the difference in ABILITY to make a
three-point shot when playing at home and when
playing on the road.
Calculating a Confidence Interval for a
Proportion
• Let’s say we want to estimate LeBron’s ABILITY
to make a three-point shot. We would have to
try and estimate the proportion of three-point
shots that he would make if he could keep
shooting three-pointers indefinitely under the
same conditions as in the 2007-2008 regular
season.
Notation and Formula
• In the 2007-2008 regular season, James made
113 out of 359 three-point shots.
• Therefore, we now know our variables.
• From the previous slide, we can see our
confidence interval goes from 0.266 to 0.364.
• Thus, we are 95% confident that the interval
of plausible values from 0.266 to 0.364
contains LeBron’s ABILITY to make a
three-pointer during the 2007-2008 season.
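As a sketch of the calculation, assume the standard deviation of an observed proportion is estimated by sqrt(p̂(1 − p̂)/n), where p̂ is the observed proportion and n is the number of attempts. With the "multiply by 2" rule described earlier, the interval is p̂ ± 2·sqrt(p̂(1 − p̂)/n). The function name below is an illustrative choice, not notation from the slides.

```python
import math


def proportion_interval(successes, attempts):
    """Approximate 95% interval: p_hat +/- 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / attempts
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(113, 359)  # LeBron's three-pointers, 2007-2008
print(f"{low:.3f} to {high:.3f}")  # 0.266 to 0.364
```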
• Keep in mind that this formula will only work
when we are estimating ABILITY with a single
proportion (think categorical variables: Ch 2).
• It will not work for estimating ABILITY that is
measured by a mean or even a difference in
proportions.
• Also, for the formula you must use the
proportion (decimal) equivalent for the
PERFORMANCE; not the percent equivalent.
• Note: there must be at least 15 successes and
15 failures to use this interval.
• Since this is so much fun, let’s try some more!
In the first two games of the 2009-2010 regular season,
Kobe Bryant had made 17 of 45 shot attempts. Calculate a
95% confidence interval for Kobe’s ABILITY to make a shot in
the 2009-2010 regular season.
• We are 95% confident that the interval of
plausible values from 0.233 to 0.523 contains
Kobe’s ABILITY to make a shot during the
2009-2010 regular season.
• For this interval we only used Kobe’s
PERFORMANCES for two games. What do you
think would happen to the width of the
interval if we used Kobe’s PERFORMANCES
from the entire season? Would the interval
get wider or narrower?
For the entire season, Kobe made 716 of 1569 shot
attempts. Calculate a new 95% confidence interval.
• When using n=45, our interval went from
0.233 to 0.523.
• When using n=1569, our interval went from
0.431 to 0.481.
• The margin of error will be smaller when the
interval is calculated using a larger number of
observations.
• Increasing sample size decreases variability,
resulting in a more precise interval.
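The effect of sample size can be checked directly by comparing the two margins of error for Kobe. This is a minimal sketch that assumes the same "two estimated standard deviations" rule for a single proportion; the function name is illustrative.

```python
import math


def margin_of_error(successes, attempts):
    """Two estimated standard deviations for a single proportion."""
    p_hat = successes / attempts
    return 2 * math.sqrt(p_hat * (1 - p_hat) / attempts)


two_games = margin_of_error(17, 45)       # first two games
full_season = margin_of_error(716, 1569)  # entire season

print(f"n=45:   margin of error = {two_games:.3f}")    # 0.145
print(f"n=1569: margin of error = {full_season:.3f}")  # 0.025
```

Because n sits in the denominator under the square root, about 35 times as many shots shrinks the margin of error by roughly a factor of 6.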
Calculating a Confidence Interval for a
Mean
• In the previous section, we used categorical
data to create a confidence interval.
• We can also use numerical data to create
confidence intervals for a player’s ABILITY.
• One of the most common numerical measures
of ABILITY in basketball is scoring average, or
mean points per game.
Notation and Formula
• Keep in mind that just as with calculating a
95% confidence interval for a proportion, we
want to have a large enough sample size in
order for our “95% confidence” to be
accurate.
• Generally speaking, a sample size of at least
30 is recommended, especially for a
distribution that is not Normal.
• Let’s create a confidence interval for LeBron’s
ABILITY to score points in 2007-2008 regular
season.
• During the regular season, he scored 2250
points in 75 games. That’s an average of 30.0
points per game. This is our best estimate at
his ABILITY.
• His observed standard deviation was 8.04
points. We now have all the information we
need.
• We are 95% confident that the interval of
plausible values from 28.14 points per game
to 31.86 points per game contains LeBron
James’s ABILITY to score points in the
2007-2008 regular season.
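As a sketch of the calculation, assume the standard deviation of a sample mean is estimated by s/sqrt(n), where s is the observed standard deviation and n is the number of games. The interval is then x̄ ± 2·s/sqrt(n). The function name is an illustrative choice.

```python
import math


def mean_interval(mean, std_dev, n):
    """Approximate 95% interval: mean +/- 2 * std_dev / sqrt(n)."""
    margin = 2 * std_dev / math.sqrt(n)
    return mean - margin, mean + margin


low, high = mean_interval(mean=30.0, std_dev=8.04, n=75)  # LeBron, 2007-2008
print(f"{low:.2f} to {high:.2f} points per game")  # 28.14 to 31.86
```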
• Let’s try another.
Here are the rushing totals (in yards) for each
game of the 1985-1986 NFL season for running
back Walter Payton, of the Chicago Bears.
120 39 62 6 63 132 112 118
192 107 132 102 121 111 53 81
Calculate a 95% confidence interval for Payton’s
rushing ABILITY in the 1985-1986 regular
season.
We need to calculate our mean and standard
deviation.
120 39 62 6 63 132 112 118
192 107 132 102 121 111 53 81
Create a list to calculate the mean and st. dev.
• We are 95% confident that the interval of
plausible values from 74.6 to 119.2 yards
contains Payton’s rushing ABILITY in the
1985-1986 regular season.
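The same calculation can be sketched from the raw game totals, assuming the sample standard deviation (dividing by n − 1) is the one intended, which is what a calculator list reports as Sx.

```python
import math
import statistics

# Walter Payton's game-by-game rushing yards, as listed above.
yards = [120, 39, 62, 6, 63, 132, 112, 118,
         192, 107, 132, 102, 121, 111, 53, 81]

mean = statistics.mean(yards)
std_dev = statistics.stdev(yards)  # sample standard deviation (divides by n - 1)
margin = 2 * std_dev / math.sqrt(len(yards))

print(f"mean = {mean:.1f}, st. dev. = {std_dev:.1f}")
print(f"{mean - margin:.1f} to {mean + margin:.1f} yards")
```

Carrying full precision puts the lower end near 74.7 rather than 74.6; the small gap presumably comes from rounding the mean and margin of error before subtracting.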
Confidence Intervals for Difference of
Two Proportions
• We will create a confidence interval for a
difference in two proportions when 1) we are
using categorical data and 2) we are interested
in comparing an athlete’s ABILITY in two
different contexts (examples: home vs road;
day vs night; grass vs turf).
95% Confidence Interval Formula
Note: As with confidence intervals for a single
proportion, this formula will only work if we have
at least 15 successes and 15 failures in each
setting.
• In football, does “icing the kicker” really work?
• From 2000 to 2009, kickers in the NFL made 377
of 488 field goal attempts (77.3%) without a
timeout being called before the attempt (without
being iced). This compares to kickers making 157
of 197 field goal attempts (79.7%) after being
iced.
• Based on these numbers, it looks like icing the
kicker might actually be a bad strategy.
• Let’s construct a 95% confidence interval for
the true difference in the ABILITY of kickers to
make a field goal when they are iced and
when they are not.
• Let’s first determine our values.
• Instead of using “A” and “B”, I’ll use “I” for iced
and “N” for not iced.
• We are 95% confident that the interval of plausible
values from -0.045 to 0.093 includes the true
difference in the ABILITY of kickers to make a field
goal when iced and their ABILITY to make a field goal
when not iced.
• The negative value would indicate that kickers have a
lower ABILITY when iced. So this interval means the
true success rate for iced kickers could be lower by up
to 0.045 or higher by up to 0.093.
• You could have reversed the order of subtraction
when constructing the interval. If you did, the
interval would have gone from -0.093 to 0.045, but
the interpretation would be the same.
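A minimal sketch of this interval, assuming the standard deviation of the difference is estimated by sqrt(p̂_A(1 − p̂_A)/n_A + p̂_B(1 − p̂_B)/n_B) and multiplied by 2 as before. The function name is illustrative. The second call shows what happens when the order of subtraction is reversed.

```python
import math


def diff_proportion_interval(x_a, n_a, x_b, n_b):
    """Approximate 95% interval for p_A - p_B using two standard deviations."""
    p_a, p_b = x_a / n_a, x_b / n_b
    std_dev = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - 2 * std_dev, diff + 2 * std_dev


iced = (157, 197)      # makes, attempts after a timeout
not_iced = (377, 488)  # makes, attempts without a timeout

low, high = diff_proportion_interval(*iced, *not_iced)
print(f"iced - not iced: {low:.3f} to {high:.3f}")

# Reversing the order of subtraction flips the signs and swaps the endpoints.
rev_low, rev_high = diff_proportion_interval(*not_iced, *iced)
print(f"not iced - iced: {rev_low:.3f} to {rev_high:.3f}")
```

Without intermediate rounding the lower endpoint comes out near -0.044 rather than -0.045; the conclusion is unchanged because 0 is inside the interval either way.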
• Let’s try creating another interval.
• In the 2009 regular season, first baseman and
left-handed hitter Ryan Howard of the
Philadelphia Phillies had 126 hits in 394
at-bats when facing a right-handed pitcher and
only 46 hits in 222 at-bats when facing a
left-handed pitcher. Calculate a 95% confidence
interval for the difference in Howard’s ABILITY
to get a hit against a right-handed pitcher and
his ABILITY to get a hit against a left-handed
pitcher.
• Let’s determine our values. Remember,
Howard has 126 hits in 394 at-bats when
facing a right-handed pitcher and only 46 hits
in 222 at-bats when facing a left-handed
pitcher.
• We are 95% confident that the interval of
plausible values from 0.041 to 0.185 contains
the true difference in Howard’s ABILITY to get
a hit against right-handed pitchers and his
ABILITY to get a hit against left-handed
pitchers.
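For Howard's numbers, the sketch below first checks the "at least 15 successes and 15 failures in each setting" condition and then builds the interval with the same two-standard-deviation rule. The helper name `enough_data` is a hypothetical choice for illustration.

```python
import math


def enough_data(hits, at_bats, minimum=15):
    """Check the 'at least 15 successes and 15 failures' condition."""
    return hits >= minimum and (at_bats - hits) >= minimum


vs_right = (126, 394)  # hits, at-bats against right-handed pitchers
vs_left = (46, 222)    # hits, at-bats against left-handed pitchers
assert enough_data(*vs_right) and enough_data(*vs_left)

p_r = vs_right[0] / vs_right[1]
p_l = vs_left[0] / vs_left[1]
std_dev = math.sqrt(p_r * (1 - p_r) / vs_right[1] + p_l * (1 - p_l) / vs_left[1])
low, high = (p_r - p_l) - 2 * std_dev, (p_r - p_l) + 2 * std_dev

print(f"righty - lefty: {low:.3f} to {high:.3f}")
print("0 is inside the interval" if low <= 0 <= high else "0 is outside the interval")
```

Endpoints may differ from the slide's values in the third decimal place, depending on when rounding is done.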
Confidence Intervals for the Difference
of Two Means
• We will create a confidence interval for the
difference in two means when 1) we are using
numerical data and 2) we are interested in
comparing an athlete’s ABILITY in two
different contexts (examples: home vs road;
day vs night; grass vs turf).
• This differs from a confidence interval for a
difference in two proportions, which deals with
categorical data.
95% Confidence Interval Formula
• Note: We need both sample sizes to be large
enough for the 95% to be accurate. Samples
are considered large if they each have 30 or
more observations.
• In the modern NFL, the passing game rules. But
was this always the case? Let’s compare using
data from 2009 and 1979.
• In 2009, the mean passing PERFORMANCE of the
32 NFL teams was 218.5 yards, with a standard
deviation of 44.1 yards.
• In 1979, the mean passing PERFORMANCE of the
28 NFL teams was 180.4 yards, with a standard
deviation of 35.7 yards.
• We are 95% confident that the interval of
plausible values from 17.5 to 58.7 yards contains
the true difference in passing ABILITY for NFL
teams in 2009 and 1979.
• Because all of the plausible values are greater
than 0, we have convincing evidence that
teams in 2009 have a greater ABILITY to pass
than did the teams in 1979.
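A minimal sketch of this interval from the summary statistics, assuming the standard deviation of the difference is estimated by sqrt(s_A²/n_A + s_B²/n_B) and multiplied by 2. The function name is illustrative.

```python
import math


def diff_mean_interval(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Approximate 95% interval for mean_A - mean_B using two standard deviations."""
    std_dev = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    diff = mean_a - mean_b
    return diff - 2 * std_dev, diff + 2 * std_dev


# Passing yards per team: 2009 (32 teams) minus 1979 (28 teams).
low, high = diff_mean_interval(218.5, 44.1, 32, 180.4, 35.7, 28)
print(f"{low:.1f} to {high:.1f} yards")  # 17.5 to 58.7
```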
• Let’s try another.
• Here are the points allowed by the New England
Patriots in their 2009 regular season games at
home and on the road:
Home: 24 10 21 0 17 14 10 7
Away: 16 20 7 35 38 22 10 34
• Calculate and interpret a 95% confidence
interval for the difference in the Patriots’
ABILITY to play defense at home and their
ABILITY to play defense on the road.
• Let’s get our numbers. Again, here is the
distribution:
Home: 24 10 21 0 17 14 10 7
Away: 16 20 7 35 38 22 10 34
• We are 95% confident that the interval of
plausible values from -19.869 to 0.119
includes the true difference in the Patriots’
ABILITY to play defense at home and their
ABILITY to play defense on the road.
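Working from the raw scores, the interval can be sketched as follows, again assuming sample variances (dividing by n − 1) and the two-standard-deviation rule.

```python
import math
import statistics

home = [24, 10, 21, 0, 17, 14, 10, 7]
away = [16, 20, 7, 35, 38, 22, 10, 34]

diff = statistics.mean(home) - statistics.mean(away)
std_dev = math.sqrt(statistics.variance(home) / len(home)
                    + statistics.variance(away) / len(away))
low, high = diff - 2 * std_dev, diff + 2 * std_dev

print(f"home - away: {low:.3f} to {high:.3f} points allowed")  # -19.869 to 0.119
```

Since 0 sits just inside the upper end of the interval, "no difference" remains a plausible value, even though most of the interval is negative (fewer points allowed at home).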
Using Technology to Calculate
Confidence Intervals
• The TI-84 calculator can calculate confidence
intervals for us. Let’s look at how we can do
this.
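• Note: Confidence intervals calculated with
technology will be slightly different from
confidence intervals calculated by hand because
the calculator uses a more exact multiplier
(about 1.96 for proportions, and a t value for
means) in place of the 2 we use by hand.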
Confidence Interval for a Proportion
• Let’s use Josh Hamilton’s numbers in 2010 for our
example:
– 186 hits in 518 at-bats
1) Press STAT, scroll to TESTS, choose A: 1-PropZInt
2) Enter the number of successful PERFORMANCES for “x”,
the total number of attempts for “n”, and the desired
confidence level for “C-level”.
3) Press calculate
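To see why the calculator's answer differs slightly from the by-hand interval, the sketch below computes Hamilton's interval twice: once with the multiplier 2 and once with the Normal critical value for 95% confidence (about 1.96). It assumes 1-PropZInt is based on that Normal critical value; the variable names are illustrative.

```python
import math
from statistics import NormalDist

hits, at_bats = 186, 518  # Josh Hamilton, 2010
p_hat = hits / at_bats
std_dev = math.sqrt(p_hat * (1 - p_hat) / at_bats)

z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

for label, multiplier in (("by hand (2)", 2.0), ("z* (about 1.96)", z_star)):
    low, high = p_hat - multiplier * std_dev, p_hat + multiplier * std_dev
    print(f"{label}: {low:.4f} to {high:.4f}")
```

The interval built with 1.96 is slightly narrower than the one built with 2, and the gap shows up in the third or fourth decimal place.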
Confidence Interval for Mean
Let’s use Walter Payton’s numbers from a previous
example. We have to enter the data into a list.
120 39 62 6 63 132 112 118
192 107 132 102 121 111 53 81
1) Press STAT, scroll to TESTS, choose 8: TInterval.
2) Inpt: Data, select the list, freq: 1, C-Level: .95 (if you
have the mean and st. dev., choose Stats instead of
Data). Then press calculate.
Confidence Interval for a Difference in Proportions
• Hamilton vs righties: 141 hits in 352 at-bats
• Hamilton vs lefties: 45 hits in 166 at-bats
1) Press STAT, scroll to TESTS, choose B: 2-PropZInt
2) Enter the first context data for x1 and n1, and the
second context data for x2 and n2.
3) Press calculate.
Confidence Interval for a Difference in Means
1) Enter data sets into two lists.
2) Press STAT, scroll to TESTS, choose 0: 2-SampTInt
3) Choose Data, select the lists, make sure both Freq
are 1, choose “No” for pooled data.
4) Press Calculate