Lecture 10: p-value functions and likelihoods


Lecture 9: p-value functions and intro to Bayesian thinking
Matthew Fox
Advanced Epidemiology
If you are a hypothesis tester, which p-value is more likely to avoid a type I error?
• P = 0.049
• P = 0.005
What is the p-value fallacy?

If you go to the doctor with a set of symptoms, does the doctor develop a hypothesis and test it?

Anyone heard of Bayesian statistics?

After completing a study, would you rather know the probability of the data given the null hypothesis, or the probability of the null hypothesis given the data?
Last Session

• Randomization
– Leads to zero confounding on average; gives meaning to p-values
– Provides a known distribution for all possible observed results
– Observational data does not have average 0 confounding
• P-values
– Probability under the null that a test statistic would be greater than or equal to its observed value, assuming no bias
– Not the probability of the data, the null, or a significance level
• Confidence intervals
– Calculated assuming infinite repetitions of the data
– Don't give a probability of containing the true value
Today

• The p-value fallacy
• p-value functions
– Show p-values for the data at a range of hypotheses
• Bayesian statistics
– The difference between Frequentists and Bayesians
– Bayesian theory
– How to apply Bayes theory in practice
P-value fallacy

• P-value developed by Fisher as an informal measure of compatibility of data with the null
– Provides no guidance on significance
– Should be interpreted in light of what we know
• Hypothesis testing developed by Neyman and Pearson to minimize errors in the long run
– P = 0.04 is no more evidence than p = 0.00001
• Fallacy is that the p-value can do both

[Photos: RA Fisher; Jerzy Neyman; Egon Pearson (not Karl, his father)]

The goal of epidemiology:
–

The goal of policy:
–

To measure precisely and accurately the
effect of an exposure on a disease
To make decisions
Given our goal:
–
–
Why hypothesis testing?
Why compare to the null?
p-value functions (1)

• Recall that a p-value is:
– "The probability under the test hypothesis (usually the null) that a test statistic would be ≥ its observed value, assuming no bias."
• We can calculate p-values under test hypotheses other than the null
– Particularly easy if we use a normal approximation
– If we assume we can fix the margins with observational data
p-value functions (1)

              Exposed   Unexposed
Disease           6          3
No disease       14         17
Total            20         20
Risk             0.30       0.15

RR observed = 2.00
SE(ln(RR)) = 0.63
p-value functions (2a)

To calculate a test statistic (z score) for a p-value, usually given:

$$z = \frac{\ln(RR)}{SE(\ln(RR))} = \frac{\ln(2.0)}{0.63} = 1.1, \qquad p(|z| \ge 1.1) = 0.27$$

(Under the null, $\ln(RR_H) = \ln(1) = 0$, so nothing is subtracted in the numerator.)
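The slides show no code, but the arithmetic is easy to check. Here is a minimal Python sketch (using scipy, an assumption, since the lecture specifies no software) that reproduces the RR, SE, z score, and two-sided p-value from the 2×2 table above.

```python
from math import log, sqrt
from scipy.stats import norm

# 2x2 table from the slide: a, b = exposed/unexposed cases;
# c, d = exposed/unexposed noncases
a, b = 6, 3
c, d = 14, 17
n_e, n_u = a + c, b + d                     # totals: 20 exposed, 20 unexposed

rr = (a / n_e) / (b / n_u)                  # risk ratio = 0.30 / 0.15 = 2.0
se = sqrt(c / (a * n_e) + d / (b * n_u))    # SE(ln RR) ~= 0.63

z = log(rr) / se                            # ln(2.0) / 0.63 ~= 1.10 (null: ln(1) = 0)
p = 2 * (1 - norm.cdf(abs(z)))              # two-sided p ~= 0.27
print(f"RR = {rr:.2f}, SE(ln RR) = {se:.2f}, z = {z:.2f}, p = {p:.2f}")
```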
p-value functions (2b)

$$z = \frac{\ln(RR_o) - \ln(RR_H)}{SE(\ln(RR_o))}, \qquad SE(\ln(RR_o)) = \sqrt{\frac{c}{a\,N_E} + \frac{d}{b\,N_{\bar E}}}$$

$$p = 2\left(1 - \int_{-\infty}^{|z|} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz\right)$$

where $RR_o$ is the observed RR, $RR_H$ the hypothesized RR, a and b the exposed and unexposed cases, c and d the exposed and unexposed noncases, and $N_E$ and $N_{\bar E}$ the exposed and unexposed totals.
p-value functions (3)

Hypothesis         RR      z-value   p-value
RR a               0.10     4.737    0.0000
RR b               0.20     3.641    0.0000
RR c               0.33     2.849    0.0040
RR d               0.50     2.192    0.0280
RR e               1.00     1.096    0.2730
RR f (observed)    2.00     0.000    1.0000
RR g               3.00    -0.641    0.5210
RR h               5.00    -1.449    0.1470
RR i              10.00    -2.545    0.0110
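As a hypothetical sketch, the same calculation can be repeated over a range of test hypotheses in Python; this reproduces the table above and traces the full p-value function, and the hypotheses at which p = 0.05 are the 95% confidence limits shown on the next slide.

```python
import numpy as np
from scipy.stats import norm

ln_rr_obs, se = np.log(2.0), 0.6325     # observed ln(RR) and SE from the example

# Two-sided p-value at each hypothesized RR (the table above)
rr_hyp = np.array([0.1, 0.2, 0.33, 0.5, 1, 2, 3, 5, 10])
z = (ln_rr_obs - np.log(rr_hyp)) / se
p = 2 * (1 - norm.cdf(np.abs(z)))
for r, zi, pi in zip(rr_hyp, z, p):
    print(f"RR_H = {r:5.2f}: z = {zi:6.3f}, p = {pi:.4f}")

# A dense grid of hypotheses traces the full p-value function;
# the hypotheses where p = 0.05 are the 95% confidence limits.
grid = np.exp(np.linspace(np.log(0.1), np.log(10), 500))
curve = 2 * (1 - norm.cdf(np.abs((ln_rr_obs - np.log(grid)) / se)))
ci = grid[curve >= 0.05]
print(f"95% CI ~ ({ci.min():.2f}, {ci.max():.2f})")   # ~ (0.58, 6.9)
```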
p-value functions (4)

[Figure: p-value function for the example. x-axis: RR hypothesis, 0.1 to 10 (log scale); y-axis: 2-sided p-value, 0% to 100%. The curve peaks at the point estimate (RR = 2.0); the null p-value is 0.27; the lower and upper 95% confidence limits (LCLM, UCLM) are 0.58 and 6.9.]
p-value functions (4)

[Figure: p-value function for a case-control study of spermicides and Down Syndrome. x-axis: RR hypothesis, 0.1 to 10 (log scale); y-axis: 2-sided p-value, 0% to 100%.]
Interpretation
Introduction to Bayesian Thinking
What is the best estimate of the true effect of E on D?

• Given a disease D and
• An exposure E+ vs. unexposed E-:
– The relative risk associating D w/ E+ (vs E-) = 2.0
– with 95% confidence interval 1.5 to 2.7
• OK, but why did you say what you said?
– Have no other info to go on
What is the best estimate of the true effect of E on D?

• Given a disease D and
• An exposure E+ vs. unexposed E-:
– The relative risk associating D w/ E+ (vs E-) = 2.0
– with 95% confidence interval 0.5 to 8.0
• Note that the width of the interval doesn't affect the best estimate in this case
What is the best estimate of the true effect of E on D?

• Given a disease D (breast cancer) and
• An exposure E+ (ever-smoking) vs. unexposed E- (never-smoking):
– The relative risk associating D w/ E+ (vs E-) = 2.0
– with 95% confidence interval 1.0 to 4.0
• Most previous studies of smoking and BC have shown no association
What is the best estimate of the true effect of E on D?

• Given a disease D (lung cancer) and
• An exposure E+ (ever-smoking) vs. unexposed E- (never-smoking):
– The relative risk associating D w/ E+ (vs E-) = 2.0
– with 95% confidence interval 1.0 to 4.0
• Most previous studies of smoking and LC have shown much larger effects
What is the best estimate of the true probability of heads?

• Given 100 flips of a fair coin flipped in a fair way
• Observed number of heads = 40
– The probability of heads equals 0.40
– with 95% confidence interval 0.304 to 0.496
• Given what we know about a fair coin, why should this data override what we know?
• So why would we interpret our study data as if it existed in a vacuum?
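The interval on the slide appears to be the usual normal (Wald) approximation; a quick check in Python:

```python
from math import sqrt

# 100 flips, 40 heads: frequentist point estimate and Wald 95% CI
n, heads = 100, 40
p_hat = heads / n                                # 0.40
se = sqrt(p_hat * (1 - p_hat) / n)               # ~0.049
lcl, ucl = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"p = {p_hat:.2f}, 95% CI ({lcl:.3f}, {ucl:.3f})")   # (0.304, 0.496)
```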
The Monty Hall Problem
An alternative to frequentist statistics

• Something important about the data is not being captured by the p-value and confidence interval, or at least in the way they're used
• What is missing is a measure of the evidence provided by the data
• Evidence is a property of data that makes us alter our beliefs
Frequentist statistics fail as measures of evidence

• The logical underpinning of frequentist statistics is that
– "if an observation is rare under a hypothesis, then the observation can be used as evidence against the hypothesis."
• Life is full of rare events we accord little attention
– What makes us react is a plausible competing hypothesis under which the data are more probable
Frequentist statistics fail as measures of evidence

• Null p-value provides no information about the probability of alternatives to the null
• Measurement of evidence requires 3 things:
– The observations (data)
– 2 competing hypotheses (often null and alternative)
• The p-value incorporates data & only 1 hypothesis
– usually the null
The likelihood as a measure of evidence

• Likelihood = c × Probability(data | H)
• Data are fixed and hypotheses variable
– p-values are calculated with a fixed (null) hypothesis, assuming the data are randomly variable
• Evidence supporting one hypothesis versus another = ratio of their likelihoods (see the sketch below)
– Log of the ratio is an additive measure of support
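As a hedged illustration: if ln(OR) is treated as approximately normal with known SE (the approximation used throughout this lecture), the likelihood ratio comparing two point hypotheses reduces to a ratio of two normal densities, and the arbitrary constant c cancels. The SE below is the Sandler value from the problem table later in the lecture.

```python
from math import exp, log

def likelihood_ratio(or_obs, se_ln_or, rr_h1=2.0, rr_h0=1.0):
    """LR for H1 vs H0, treating ln(OR) as normal with the given SE.
    The normalizing constant c cancels in the ratio of densities."""
    z1 = (log(or_obs) - log(rr_h1)) / se_ln_or
    z0 = (log(or_obs) - log(rr_h0)) / se_ln_or
    return exp((z0**2 - z1**2) / 2)

# Sandler study: OR = 1.6, SE(ln OR) = 0.38
print(round(likelihood_ratio(1.6, 0.38), 2))   # ~1.8, matching the slide's table
```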
Evidence versus belief

• The hypothesis with the higher likelihood is better supported by the evidence
– But that does not make it more likely to be true
• Belief also depends on prior knowledge, and can be incorporated w/ Bayes Theorem
– It is the likelihood ratio that represents the data
• Priors can be subjective or empirical
– But not arbitrary
Bayesian analysis (1)

• Given (1) the prior odds that a hypothesis is true, and (2) data to measure the effect
– Update the prior odds using the data to calculate the posterior odds that the hypothesis is true
– A formal algorithm to accomplish what many do informally
Bayesian analysis (2)

$$\frac{p(H_1 \mid D)}{p(H_0 \mid D)} = \frac{p(H_1)}{p(H_0)} \times \frac{p(D \mid H_1)}{p(D \mid H_0)}$$

• Prior odds times the likelihood ratio equals the posterior odds
• Only for people with an ignorant (uniform) prior distribution can we say that the frequentist 95% CI covers the true value with 95% certainty
Bayesian analysis (3): Environmental tobacco smoke and breast cancer

study      observation     likelihood ratio   prior odds (A)   posterior odds
Sandler    1.6 (0.8-3.4)   1.78               1.0              1.78
Hirayama   1.3 (0.8-2.1)   0.38               1.8              0.7
Smith      1.6 (0.8-3.1)   2.12               0.7              1.4

• H1 = [OR = 2]; H0 = [OR = 1]
• Initially, the analyst has no preference (prior odds = 1)
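Because posterior odds = prior odds × likelihood ratio, the table can be reproduced by multiplying the slide's LRs in sequence; a minimal sketch (small differences from the slide's values are rounding):

```python
# Sequential Bayesian updating: each study's posterior odds become
# the prior odds for the next study.
studies = [("Sandler", 1.78), ("Hirayama", 0.38), ("Smith", 2.12)]

odds = 1.0   # the analyst starts with no preference between OR=2 and OR=1
for name, lr in studies:
    odds *= lr
    print(f"{name:9s} LR = {lr:.2f} -> posterior odds = {odds:.2f}")
# Sandler 1.78, Hirayama 0.68, Smith 1.43 (slide rounds to 1.78, 0.7, 1.4)
```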
Bayesian analysis (4): concepts

• Keep these concepts separate:
– The hypotheses under comparison (e.g., RR=2 vs RR=1)
– The prior odds (>1 favors the 1st hypothesis (RR=2), <1 favors the 2nd hypothesis (RR=1))
– The estimate of effect for a study (this is the data used to modify the prior odds)
Bayesian analysis (5): concepts

• Keep these concepts separate:
– The likelihood ratio (probability of the data under the 1st hypothesis versus under the 2nd hypothesis; >1 favors the 1st hypothesis, <1 favors the 2nd)
– The posterior odds (compares the hypotheses after observing the data; >1 favors the 1st hypothesis, <1 favors the 2nd)
Bayesian analysis: Part V: Problem 3 (20 points total)

The following table shows odds ratios, 95% confidence intervals, and the standard error of the ln(OR) from three studies of the association between passive exposure to tobacco smoke and the occurrence of breast cancer. The fourth column shows the likelihood ratio calculated as the likelihood under the hypothesis H1 that the true relative risk equals 2.0 divided by the likelihood under the hypothesis H0 that the true relative risk equals 1.0.

study      observation 95% CI   SE(ln(OR))   likelihood ratio   prior odds   posterior odds
Sandler    1.6 (0.8-3.4)        0.38         1.8                0.75         1.34
Hirayama   1.3 (0.8-2.1)        0.24         0.4                1.34         0.50
Smith      1.6 (0.8-3.1)        0.34         2.1                0.50         1.07

A. (7 points) Assume that someone favors the hypothesis that the true odds ratio equals 1.0 over the hypothesis that the true odds ratio equals 2.0. The person quantifies their preference by stating that their prior odds for the hypothesis of a relative risk equal to 2.0 versus the hypothesis of a relative risk equal to 1.0 is 0.75 (see the first row of the table). Complete the shaded cells of the table using Bayesian analysis. After seeing these three studies, should the person favor (circle one):

the hypothesis of 2.0        the hypothesis of 1.0
http://statpages.org/bayes.html
Connection between p-values and the likelihood ratio
Bayesian intervals

• Bayesian intervals require specification of prior odds for an entire distribution of hypotheses, not just two hypotheses
– Distribution will look like a p-value function, but incorporate only prior knowledge
• Update the distribution with data
– Posterior distribution
• Choose interval limits (a grid-based sketch follows below)
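A minimal grid-based sketch of this idea, assuming a hypothetical "skeptic" prior (normal on the log scale, centered at RR = 1) updated with one study's normal-approximation likelihood; the lecture's actual priors are not specified numerically, so the numbers here are illustrative only.

```python
import numpy as np
from scipy.stats import norm

# Grid of hypotheses on the log-RR scale, 0.1 to 10
ln_rr = np.linspace(np.log(0.1), np.log(10), 2001)

# Assumed skeptic prior: ln(RR) ~ Normal(0, 0.5); Sandler likelihood:
# observed ln(OR) = ln(1.6) with SE 0.38, evaluated at each hypothesis
prior = norm.pdf(ln_rr, loc=0.0, scale=0.5)
like = norm.pdf(np.log(1.6), loc=ln_rr, scale=0.38)

post = prior * like
post /= np.trapz(post, ln_rr)                 # normalize to a proper density

# Central 95% posterior interval: 2.5th and 97.5th percentiles of the mass
cdf = np.cumsum(post) * (ln_rr[1] - ln_rr[0])
lo = np.exp(ln_rr[np.searchsorted(cdf, 0.025)])
hi = np.exp(ln_rr[np.searchsorted(cdf, 0.975)])
print(f"95% posterior interval for RR: ({lo:.2f}, {hi:.2f})")
```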
Priors

[Figure: four prior probability densities over the RR hypothesis (x-axis: RR hypothesis, 0.1 to 10, log scale; y-axis: probability density, 0 to 1): advocate, skeptic, ignorant, bimodal.]

+ Sandler

[Figure: the four distributions after updating with the Sandler study.]

+ Hirayama

[Figure: the four distributions after further updating with the Hirayama study.]

+ Smith

[Figure: the four distributions after further updating with the Smith study.]

+ Morabia

[Figure: the four distributions after further updating with the Morabia study.]

+ Johnson

[Figure: the four distributions after further updating with the Johnson study.]
Conclusion

• P-value fallacy
– P-values cannot serve both the long-run perspective and the individual-study perspective
• P-value functions
– Can help see the entire distribution of probabilities
• Bayesian analysis
– Allows us to change our beliefs with new information and measure the probability of a hypothesis