Introducing Hypothesis Tests

STAT 101
Dr. Kari Lock Morgan
10/4/12
Hypothesis Testing,
Synthesis
SECTION 4.5, Essential Synthesis B
• Connecting intervals and tests (4.5)
• Statistical versus practical significance (4.5)
• Multiple testing (4.5)
• Synthesis activities
Statistics: Unlocking the Power of Data
Lock5
Exam 1
• Exam 1: Thursday 10/11
• Open only to a calculator and one double-sided
page of notes prepared by you
• Emphasis on conceptual understanding
Practice
• Last year’s midterm, with solutions, is
available on the course website (under
documents)
• Review problems are posted for you to work
through
• Doing problems is the key to success!!!
Keys to In-Class Exam Success
• Work lots of practice problems!
• Take last year’s exams under realistic
conditions (time yourself, do it all before
looking at the solutions, etc.)
• Prepare a good cheat sheet and use it when
working problems
• Read the corresponding sections in the book if
there are concepts you are still confused about
Office Hours Next Week
• Monday
  • Heather 4 – 6 pm, Old Chem 211A
  • Sam 6 – 9 pm, Old Chem 211A
• Tuesday
  • Kari 1:30 – 2:30 pm, Old Chem 216
  • Tracy 5 – 7 pm, Old Chem 211A
• Wednesday
  • Kari 1 – 3 pm, Old Chem 216
  • Tracy 4:30 – 5:30 pm, Old Chem 211A
  • Heather 8 – 9 pm, Old Chem 211A
• Thursday
  • Kari 1 – 2:30 pm, Old Chem 216
Clickers
Reminder: sharing clickers is a case of academic
dishonesty and will be treated as such.
If caught clicking in with two clickers, everyone
involved will
• receive a 0 for their entire clicker grade (10%
of the final grade)
• be reported to the dean to follow up
regarding academic dishonesty
Body Temperature
• We created a bootstrap distribution for average
body temperature by resampling with
replacement from the original sample (x̄ = 98.26):
Body Temperature
• We also created a randomization distribution to see
if average body temperature differs from 98.6°F by
adding 0.34 to every value to make the null true, and
then resampling with replacement from this
modified sample:
Body Temperature
• These two distributions are identical (up to
random variation from simulation to
simulation) except for the center
• The bootstrap distribution is centered around
the sample statistic, 98.26, while the
randomization distribution is centered around
the null hypothesized value, 98.6
• The randomization distribution is equivalent
to the bootstrap distribution, but shifted over
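The shift described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 50 temperatures are simulated stand-ins (not the class data), rescaled so their mean is 98.26, and the helper name resample_means is made up for this example.

```python
import random
import statistics

random.seed(0)

# Hypothetical sample of 50 body temperatures (NOT the class data),
# rescaled so that its mean is 98.26 like the sample in the slides.
raw = [random.gauss(98.26, 0.7) for _ in range(50)]
offset = 98.26 - statistics.mean(raw)
sample = [x + offset for x in raw]

def resample_means(data, reps=2000):
    """Means of `reps` resamples drawn with replacement from `data`."""
    n = len(data)
    return [statistics.mean(random.choices(data, k=n)) for _ in range(reps)]

null_value = 98.6
shift = null_value - statistics.mean(sample)  # about 0.34, as in the slides

# Bootstrap distribution: resample the original sample.
bootstrap = resample_means(sample)
# Randomization distribution: add the shift so the null is true, then resample.
randomization = resample_means([x + shift for x in sample])

boot_center = statistics.mean(bootstrap)
rand_center = statistics.mean(randomization)
print("centers:", round(boot_center, 2), round(rand_center, 2))
print("spreads:", round(statistics.stdev(bootstrap), 3),
      round(statistics.stdev(randomization), 3))
```

The two centers should differ by roughly the shift, while the two spreads should agree up to simulation noise.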
Body Temperature
[Figure: bootstrap distribution centered at 98.26 and randomization distribution centered at 98.6]
H0: μ = 98.6
Ha: μ ≠ 98.6
Body Temperature
[Figure: bootstrap distribution centered at 98.26 and randomization distribution centered at 98.4]
H0: μ = 98.4
Ha: μ ≠ 98.4
Intervals and Tests
If a 95% CI contains the parameter in H0,
then a two-tailed test should not reject H0
at a 5% significance level.
If a 95% CI misses the parameter in H0,
then a two-tailed test should reject H0
at a 5% significance level.
Intervals and Tests
• A confidence interval represents the range of
plausible values for the population parameter
• If the null hypothesized value IS NOT within
the CI, it is not a plausible value and should be
rejected
• If the null hypothesized value IS within the CI,
it is a plausible value and should not be
rejected
Body Temperatures
• Using bootstrapping, we found a 95%
confidence interval for the mean body
temperature to be (98.05, 98.47)
• This does not contain 98.6, so at α = 0.05 we
would reject H0 for the hypotheses
H0: μ = 98.6
Ha: μ ≠ 98.6
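The interval-to-test rule can be written as a small Python function. This is a minimal sketch: the function name two_tailed_decision is hypothetical, and treating a null value exactly on the endpoint as "inside" is an assumption the slides do not address.

```python
def two_tailed_decision(ci, null_value):
    """Decision for a two-tailed test at level alpha, given a (1 - alpha) CI.

    Assumption: a null value exactly on an endpoint counts as inside the CI.
    """
    low, high = ci
    if low <= null_value <= high:
        return "do not reject H0"
    return "reject H0"

body_temp_ci = (98.05, 98.47)  # 95% bootstrap CI from the slides

print(two_tailed_decision(body_temp_ci, 98.6))  # reject H0
print(two_tailed_decision(body_temp_ci, 98.4))  # do not reject H0
```

The second call uses the alternative null value 98.4 from the earlier figure, which falls inside the interval.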
Both Father and Mother
“Does a child need both a father and a mother to
grow up happily?”
• Let p be the proportion of adults aged 18-29 in
2010 who say yes. A 95% CI for p is (0.487, 0.573).
• Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we
a) Reject H0
b) Do not reject H0
c) Reject Ha
d) Do not reject Ha
Answer: (b). 0.5 is within the CI, so it is a
plausible value for p.
http://www.pewsocialtrends.org/2011/03/09/formillennials-parenthood-trumps-marriage/#fn-7199-1
Both Father and Mother
“Does a child need both a father and a mother to
grow up happily?”
• Let p be the proportion of adults aged 18-29 in
1997 who say yes. A 95% CI for p is (0.533, 0.607).
• Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we
a) Reject H0
b) Do not reject H0
c) Reject Ha
d) Do not reject Ha
Answer: (a). 0.5 is not within the CI, so it is
not a plausible value for p.
http://www.pewsocialtrends.org/2011/03/09/formillennials-parenthood-trumps-marriage/#fn-7199-1
Intervals and Tests
• Confidence intervals are most useful when you
want to estimate population parameters
• Hypothesis tests and p-values are most useful
when you want to test hypotheses about
population parameters
• Confidence intervals give you a range of
plausible values; p-values quantify the strength
of evidence against the null hypothesis
Interval, Test, or Neither?
Is the following question best assessed using a
confidence interval, a hypothesis test, or is
statistical inference not relevant?
On average, how much more do adults who played
sports in high school exercise than adults who did
not play sports in high school?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant
Interval, Test, or Neither?
Is the following question best assessed
using a confidence interval, a hypothesis
test, or is statistical inference not relevant?
Do a majority of adults riding a bicycle wear
a helmet?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant
Interval, Test, or Neither?
Is the following question best assessed using a
confidence interval, a hypothesis test, or is
statistical inference not relevant?
On average, were the 23 players on the 2010
Canadian Olympic hockey team older than the 23
players on the 2010 US Olympic hockey team?
a) Confidence interval
b) Hypothesis test
c) Statistical inference not relevant
Statistical vs Practical Significance
• With small sample sizes, even large
differences or effects may not be significant
• With large sample sizes, even a very small
difference or effect can be significant
• A statistically significant result is not always
practically significant, especially with large
sample sizes
Statistical vs Practical Significance
• Example: Suppose a weight loss program
recruits 10,000 people for a randomized
experiment.
• A difference in average weight loss of only 0.5
lbs could be found to be statistically significant
• Suppose the experiment lasted for a year. Is a
loss of ½ a pound practically significant?
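To see how sample size drives significance in the weight loss example, here is a rough Python calculation. It is a sketch under assumptions the slide does not give: the 10,000 people are split evenly into two groups, weight loss has a standard deviation of 10 lbs in both groups, and a normal-approximation z-test stands in for the simulation methods used so far in the course.

```python
import math

def two_sample_z(diff, sd, n_per_group):
    """z statistic and two-tailed p-value for a difference in two means,
    assuming equal group sizes and a common standard deviation."""
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return z, p_value

# Same 0.5 lb difference, same assumed SD, very different sample sizes.
z_big, p_big = two_sample_z(diff=0.5, sd=10, n_per_group=5000)
z_small, p_small = two_sample_z(diff=0.5, sd=10, n_per_group=50)

print("n = 5000 per group:", round(z_big, 2), round(p_big, 4))
print("n = 50 per group:  ", round(z_small, 2), round(p_small, 4))
```

Under these assumptions the half-pound difference is significant at α = 0.05 with 5,000 per group but not with 50 per group, even though the practical meaning of half a pound is the same in both cases.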
Diet and Sex of Baby
• Are certain foods in your diet associated with
whether you conceive a boy or a girl?
• To study this, researchers asked women about
their eating habits, including asking whether or
not they ate 133 different foods regularly
• A significant difference was found for breakfast
cereal (mothers of boys eat more), prompting
the headline “Breakfast Cereal Boosts
Chances of Conceiving Boys”.
http://www.newscientist.com/article/dn13754-breakfast-cereals-boost-chances-of-conceiving-boys.html
“Breakfast Cereal Boosts Chances
of Conceiving Boys”
I’m pregnant (with identical twins!), and am very
curious about whether I’m going to have boys or
girls!
I eat breakfast cereal every morning. Do you
think this boosts my chances of having boys?
a) yes
b) no
c) impossible to tell
Hypothesis Tests
For each of the 133 foods studied, a hypothesis test
was conducted for a difference between mothers
who conceived boys and mothers who conceived girls
in the proportion who consume that food
• State the null and alternative hypotheses
• If there are NO differences (all null hypotheses
are true), about how many significant differences
would be found using α = 0.05?
• A significant difference was found for breakfast
cereal (mothers of boys eat more), prompting the
headline “Breakfast Cereal Boosts Chances of
Conceiving Boys”. How might you explain this?
Hypothesis Tests
• State the null and alternative hypotheses
pb: proportion of mothers of boys who consume the food regularly
pg: proportion of mothers of girls who consume the food regularly
H0: pb = pg
Ha: pb ≠ pg
• If there are NO differences (all null hypotheses are true),
about how many significant differences would be found
using α = 0.05?
133 × 0.05 = 6.65
• A significant difference was found for breakfast cereal
(mothers of boys eat more), prompting the headline
“Breakfast Cereal Boosts Chances of Conceiving Boys”. How
might you explain this?
Random chance; several tests (about 6 or 7) are going
to be significant, even if no differences exist
Multiple Testing
When multiple hypothesis tests are
conducted, the chance that at least one test
incorrectly rejects a true null hypothesis
increases with the number of tests.
If the null hypotheses are all true, about a
proportion α of the tests will yield statistically
significant results just by random chance.
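A quick Python calculation puts numbers on this for the 133 food tests. The expected count follows directly from the slides; the "at least one" probability additionally assumes the tests are independent, which the slides do not state and which may not hold for related foods.

```python
n_tests = 133
alpha = 0.05

# Expected number of significant results when every null hypothesis is true.
expected_false_positives = n_tests * alpha  # about 6.65

# Chance that at least one test rejects a true null.
# Assumption: the tests are independent of one another.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print("expected false positives:", round(expected_false_positives, 2))
for m in (1, 5, 20, 133):
    print(m, "tests -> P(at least one false positive) =",
          round(1 - (1 - alpha) ** m, 3))
```

The loop shows how quickly the chance of at least one false positive grows as the number of tests increases.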
www.causeweb.org
Author: JB Landers
Multiple Comparisons
• Consider a topic that is being
investigated by research teams all over
the world
• Using α = 0.05, about 5% of teams are going
to find something significant, even if the
null hypothesis is true
Multiple Comparisons
• Consider a research team/company
doing many hypothesis tests
• Using α = 0.05, about 5% of tests are going
to be significant, even if the null
hypotheses are all true
Multiple Comparisons
• This is a serious problem
• The most important thing is to be aware of this
issue, and not to trust claims that are obviously
one of many tests (unless they specifically
mention an adjustment for multiple testing)
• There are ways to account for this (e.g.
Bonferroni’s Correction), but these are beyond
the scope of this class
Publication Bias
• Publication bias refers to the fact that
usually only the significant results get
published
• The one study that turns out significant gets
published, and no one knows about all the
non-significant results
• This, combined with the problem of multiple
comparisons, can yield very misleading results
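The interaction of many teams and publication bias can be simulated. The Python sketch below is purely illustrative: the number of teams (2000), the study size (30), and the use of a z-test on simulated N(0, 1) data are all assumptions made for the example, and the true effect is set to zero so that every "published" result is a false positive.

```python
import math
import random

random.seed(1)

def null_study_p_value(n=30):
    """Two-tailed z-test p-value for one study in which H0 (mean = 0) is true.

    Data are simulated from N(0, 1), so sigma = 1 is known by construction.
    """
    mean = sum(random.gauss(0, 1) for _ in range(n)) / n
    z = mean * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

n_teams = 2000  # hypothetical number of teams studying the same question
p_values = [null_study_p_value() for _ in range(n_teams)]

# Publication bias: only the significant results get published.
published = [p for p in p_values if p < 0.05]

share_significant = len(published) / n_teams
print("teams finding significance:", len(published), "of", n_teams)
print("share significant:", round(share_significant, 3))
```

A reader who sees only the published list would see nothing but significant findings, although no real effect exists in this simulation.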
Jelly Beans Cause Acne!
http://xkcd.com/882/
Summary
• If a null hypothesized value lies inside a 95% CI, a
two-tailed test using α = 0.05 would not reject H0
• If a null hypothesized value lies outside a 95% CI,
a two-tailed test using α = 0.05 would reject H0
• Statistical significance is not always the same as
practical significance
• Using α = 0.05, about 5% of hypothesis tests will lead
to rejecting the null even when all the null
hypotheses are true
Synthesis
• You’ve now learned how to successfully collect
and analyze data to answer a question!
• Let’s put that to use…
Exercise and Pulse
• Does just 5 seconds of exercise increase pulse rate?
What are the cases and variables? Are they categorical or
quantitative? Identify explanatory and response.
• Does the question imply causality? How would you collect data
to answer it?
• Merge with 3 other groups to collect data. (check pulse rate)
• Visualize and summarize your data. Before doing any formal
inference, take a guess at answering the question.
• Conduct a hypothesis test to answer the question. State your
hypotheses, calculate the p-value, make a conclusion in context.
• How much does 5 seconds of exercise increase pulse rate?
State the parameter of interest and give and interpret a
confidence interval.

Tongue Curling
• What proportion of people can roll
their tongue?
Can you roll your tongue? (a) Yes (b) No
• Visualize and summarize the data. What is your
point estimate?
• Give and interpret a confidence interval.
• Tongue rolling has been said to be a dominant
trait, in which case theoretically 75% of all people
should be able to roll their tongues. Do our data
provide evidence otherwise?
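One possible way to carry out the last step is sketched below in Python. The counts (68 of 100) are hypothetical placeholders, not class data, and the function name is made up. An exact binomial calculation is used here in place of the simulation-based randomization test from class; the two-tailed p-value is found by doubling the smaller tail.

```python
from math import comb

def binomial_two_tailed_p(successes, n, p_null):
    """Exact two-tailed p-value for H0: p = p_null, found by doubling the
    smaller tail of the Binomial(n, p_null) distribution (capped at 1)."""
    pmf = [comb(n, k) * p_null**k * (1 - p_null)**(n - k) for k in range(n + 1)]
    lower = sum(pmf[: successes + 1])  # P(X <= successes)
    upper = sum(pmf[successes:])       # P(X >= successes)
    return min(1.0, 2 * min(lower, upper))

# Hypothetical class counts, for illustration only (not real data).
n_students = 100
n_rollers = 68

p_hat = n_rollers / n_students  # point estimate of the proportion
p_value = binomial_two_tailed_p(n_rollers, n_students, 0.75)
print("point estimate:", p_hat)
print("two-tailed p-value for H0: p = 0.75:", round(p_value, 3))
```

Replace the two counts with the class results; the hypotheses being tested are H0: p = 0.75 versus Ha: p ≠ 0.75.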

Tuesday
• Tuesday’s class will be a review session
• There will be no clicker questions and no new
material, so attendance is optional
• I’ll spend the first half reviewing the key
topics we’ve covered so far, and then will have
open Q and A
To Do
• Read Essential Synthesis A, B
• Prepare for Exam 1 (Thursday, 10/11)
  • Study
  • Make page of notes for Exam 1
  • Do review problems
  • Take practice exam
  • Solutions under documents on course webpage