Estimating the Proportion of False Null Hypotheses


Needles in Haystacks:
Are There Any?
How Many Are There?
Where Are They?
John Rice
University of California, Berkeley
Outline
• Classical testing: significance levels, p-values, power
• Testing many hypotheses: issues and recent
developments (false discovery rate)
• Higher Criticism: are any null hypotheses false?
• Motivation: the Taiwanese-American Occultation Survey
• Estimating the proportion of false nulls
Classical Testing
H0: null hypothesis vs HA: alternative hypothesis
T: test statistic
Reject H0 for large values of T, say T > t0 (threshold)
Type I error: reject H0 when it holds
Significance level α = Prob(Type I error) = P(T > t0 | H0)
Fix α and find t0 by considering only the null distribution of T
P-value: if we observe T = t, the P-value = Prob(T > t | H0). Under H0, with continuity, the distribution of the P-value is uniform on [0, 1]
Type II error: fail to reject H0 when HA holds
Power: 1 - Prob(Type II error)
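
As a concrete illustration of these definitions (a sketch of mine, not from the talk, assuming a one-sided z-test with T ~ N(0,1) under H0 and T ~ N(2,1) under HA):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, alpha = 100_000, 0.05
t0 = norm.ppf(1 - alpha)                 # threshold: P(T > t0 | H0) = alpha

# Under H0: T ~ N(0,1); P-values are uniform on [0,1]
T_null = rng.normal(0.0, 1.0, n)
p_null = norm.sf(T_null)                 # P-value = Prob(T > t | H0)
print("Type I error rate:", np.mean(T_null > t0))    # approx alpha
print("P(P-value < 0.5): ", np.mean(p_null < 0.5))   # approx 0.5 (uniform)

# Under HA: T ~ N(2,1); power = 1 - Prob(Type II error)
T_alt = rng.normal(2.0, 1.0, n)
print("Power:", np.mean(T_alt > t0))     # approx norm.sf(t0 - 2.0), about 0.64
```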
Multiple Testing
Many null and alternative hypotheses; e.g.
source detection --- each pixel is either
background or source
Collection of test statistics and P-values, one for
each hypothesis. May or may not be
independent random variables
Possible questions: Are any null hypotheses
false? How many are false? Which ones are
false? Or probabilities of such.
Analogues of Type I Error
              # not rejected   # rejected   total
True Null           U               V         m0
False Null          T               S         m1
total             m - R             R         m = m0 + m1
Per-Comparison Error Rate (PCER): E(V/m)
Ignores multiplicity; each test is carried out at significance level α
Per-Family Error Rate (PFER): E(V)
Family-Wise Error Rate (FWER): P(V > 0)
The latter two can be controlled by Bonferroni, e.g. testing each hypothesis at level α/m
Recent Analogues
False Discovery Proportion: FDP = V/R
False Discovery Rate: FDR = E(FDP)
Positive FDR: p-FDR = E(FDP| R>0)
Exceedance Control: P(FDP > c)
k-FWER: the probability of k or more false rejections, P(V ≥ k)
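
To make these quantities concrete, a small simulation of mine (assuming a normal model, m0 = 900 true nulls, m1 = 100 false nulls, and a fixed threshold t = 2):

```python
import numpy as np

rng = np.random.default_rng(2)
m0, m1, t = 900, 100, 2.0
fdps = []
for _ in range(500):
    T = np.concatenate([rng.normal(0.0, 1.0, m0),   # true nulls
                        rng.normal(3.0, 1.0, m1)])  # false nulls
    reject = T > t
    V = reject[:m0].sum()                 # false rejections
    R = reject.sum()                      # total rejections
    fdps.append(V / R if R > 0 else 0.0)  # FDP, with 0/0 taken as 0
print("FDR = E(FDP) approx:", np.mean(fdps))
```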
False Discovery Rate
Determination of the FDR threshold for a desired level α = E(V/R):
Order the P-values: P(1) ≤ P(2) ≤ … ≤ P(m)
Find d = max{ j : P(j) ≤ (j/m) α }
Reject all hypotheses with P_k ≤ P(d)
The quantity controlled by FDR can be more meaningful than that controlled by PCER, which treats 10 false detections out of 20 detections the same as 10 out of 2000.
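
A minimal Python sketch of this step-up procedure (my illustration):

```python
import numpy as np

def benjamini_hochberg(p, alpha):
    """Reject at FDR level alpha: find d = max{ j : P_(j) <= (j/m) alpha }
    and reject every hypothesis with P_k <= P_(d)."""
    p = np.asarray(p)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        d = np.max(np.nonzero(below)[0])   # 0-indexed position of P_(d)
        reject[order[:d + 1]] = True       # reject the d+1 smallest P-values
    return reject

p = [0.001, 0.008, 0.039, 0.041, 0.09, 0.205, 0.5]
print(benjamini_hochberg(p, alpha=0.05))   # rejects the two smallest
```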
[Figure: empirical distribution of P-values plotted against the uniform distribution, with the FDR line t(p) = p/α]
Note that the threshold is chosen adaptively, in contrast to the threshold for PCER, which controls E(V/m) with, say, a fixed kσ threshold. For example, the FDR threshold adapts to the distribution of source intensity relative to background intensity.
Hopkins et al.
[Figure from Hopkins et al. (2002): source detection using the false discovery rate]
Higher Criticism
Are there any false nulls, any sources? Are there any
needles in the haystack?
The test statistic is based on comparing the distribution of P-values to the uniform distribution -- are there too many small ones? Under the null, we expect about i P-values at or below i/n.
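
One standard form of the Higher Criticism statistic, sketched in Python (this follows the Donoho & Jin construction as commonly stated; maximizing over the smallest half of the P-values is one usual convention):

```python
import numpy as np

def higher_criticism(p):
    """HC = max_i sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    Large HC indicates an excess of small P-values relative to uniform."""
    p_sorted = np.clip(np.sort(np.asarray(p)), 1e-12, 1 - 1e-12)
    n = len(p_sorted)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p_sorted) / np.sqrt(p_sorted * (1 - p_sorted))
    return np.max(hc[: n // 2])   # maximize over the smallest half
```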
Donoho & Jin
Consider a large number of tests for a rare but moderately strong signal. There are scenarios in which it can be determined that there are signals, but not which tests correspond to signals. The smallest few P-values will not correspond to signals.
[Figure: detection phase diagram with axes signal strength and sparsity]
Estimating the Proportion
Seemingly harder question: what is the proportion of
needles in the haystack?
Motivation: The Taiwanese-American Occultation Survey (TAOS) will search for Kuiper Belt Objects (KBOs) by monitoring star fields for occultations.
Occultations
Time series of flux
[Figure: flux time series of an occultation by an asteroid on two cameras, and a simulated occultation]
Thousands of stars will be simultaneously monitored every night, searching for rare events lasting about 1/5 second. In the course of a year, the survey will try to detect 10-1000(?) occultations among 10^10-10^12 measurements!
Proposed Detection Scheme
Consider basing a test on the flux from a single hold, for a particular star.
Initial data: f_{kh} = flux from the star on telescope k at hold h = 1, …, n, used for calibrating subsequent test statistics.
New observation to be tested for a possible occultation: Y_k
R_k = rank of Y_k among the f_{kh}
Test statistic: the product of the R_k
The construction is based on the following fact: if Y_1, …, Y_n are iid and Y is independent of them with the same distribution, then the rank R of Y among the combined sample is uniformly distributed:
P(R = r) = 1/(n+1), r = 1, …, n+1
Thus the null distribution of the product of the ranks can be calculated explicitly. Alternatively, the log of the product can be approximated by treating the ranks as independent uniform random variables.
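
A sketch of that approximation in Python (the function name and the Gamma form are my rendering): with K telescopes, treat R_k/(n+1) as roughly independent Uniform(0,1) under the null, so that minus the log of the normalized rank product is approximately Gamma(K, 1):

```python
import numpy as np
from scipy.stats import gamma

def rank_product_pvalue(y, calib):
    """Approximate null P-value for the product of ranks.

    y     : length-K array, new flux on each of K telescopes
    calib : K x n array, calibration fluxes f_kh per telescope

    An occultation dims the star, so small fluxes give small ranks;
    we test for a small rank product, i.e. a large -log product.
    """
    K, n = calib.shape
    ranks = 1 + (calib < y[:, None]).sum(axis=1)   # rank of y_k in 1..n+1
    stat = -np.sum(np.log(ranks / (n + 1)))        # each term approx Exp(1)
    return gamma.sf(stat, a=K)                     # sum of K Exp(1) ~ Gamma(K,1)
```

For small K the exact discrete null distribution could be enumerated instead, as noted above.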
Retrospective Estimation of
Occultation Rate
Suppose we have a year of data. What can we say about the occultation rate (and thus the abundance of KBOs)?
Note the distinction between this question and that of identifying individual occultations in real time.
The problem:
• Given a very large number of independent hypothesis tests, where in the vast majority of cases the null hypothesis is true, estimate the proportion of false null hypotheses.
• The power of the test is unknown and varies from test to test.
• The distribution of the test statistic under the alternative is not known.
• We would like to be able to state, at a specified level of confidence, that there are at least a specified number of false null hypotheses.
Suppose a proportion π of the tests correspond to false null hypotheses. Then the distribution of the p-values is
G(t) = (1 − π) t + π F(t),
where F is the (unknown) distribution of a p-value under the alternative. Since F(t) ≤ 1,
Lower bound: π ≥ (G(t) − t) / (1 − t) for all t ∈ (0, 1)
Empirical version of numerator: G_n(t) − t, where G_n is the empirical distribution of the observed p-values.
Motivation for construction: we want to bound the contribution of the true nulls to G_n(t).
Suppose there exists a bounding sequence β_{n,α} such that, for the empirical distribution U_n of n uniform p-values,
P( sup_t (U_n(t) − t) / δ(t) ≤ β_{n,α} ) ≥ 1 − α
for a bounding function δ, e.g. δ(t) = √(t(1 − t)).
Since, with probability at least 1 − α, the true nulls contribute at most t + β_{n,α} δ(t) to G_n(t), the proportion of p-values in excess of this bound can be attributed to false nulls.
Thus a (biased) estimate of the proportion of false nulls:
π̂ = sup_{t ∈ (0,1)} [ G_n(t) − t − β_{n,α} δ(t) ] / (1 − t)
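
A sketch of this estimator in Python (my simplification: δ(t) = sqrt(t(1−t)) as the bounding function and a simulation-based calibration of the bounding sequence; the howmany R package cited in the references implements the authors' version):

```python
import numpy as np

def bounding_sequence(n, alpha, reps=2000, eps=1e-10, rng=None):
    """Simulated beta_{n,alpha}: the 1-alpha quantile of
    sup_t (U_n(t) - t) / sqrt(t(1-t)) for n iid Uniform(0,1) p-values."""
    rng = rng or np.random.default_rng(0)
    i = np.arange(1, n + 1)
    sups = np.empty(reps)
    for r in range(reps):
        t = np.sort(rng.uniform(size=n))
        # evaluate the empirical process at its jump points t = p_(i)
        sups[r] = np.max((i / n - t) / np.sqrt(np.clip(t * (1 - t), eps, None)))
    return np.quantile(sups, 1 - alpha)

def proportion_lower_bound(p, alpha=0.10, eps=1e-10):
    """1-alpha lower confidence bound on the proportion pi of false nulls:
    sup_t ( G_n(t) - t - beta * sqrt(t(1-t)) ) / (1 - t)."""
    p = np.sort(np.asarray(p))
    n = len(p)
    beta = bounding_sequence(n, alpha)
    i = np.arange(1, n + 1)                       # G_n(p_(i)) = i/n
    num = i / n - p - beta * np.sqrt(np.clip(p * (1 - p), eps, None))
    return float(max(0.0, np.max(num / np.clip(1 - p, eps, None))))
```

Multiplying the bound by the number of tests gives, with the same confidence, a lower bound on the number of false nulls, i.e. on the number of occultations.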
Lower confidence bound: the bounding sequence guarantees P(π̂ ≤ π) ≥ 1 − α, so π̂ is a 1 − α lower confidence bound for the proportion of false nulls.
Thus we can state, for example, that with 90% confidence there were at least 777 occultations.
Note that there is no meaningful upper bound, because occultations could be arbitrarily shallow.
Analysis shows that there are scenarios in which the proportion of false nulls can be consistently estimated, but in which one cannot identify which nulls were false.
Surprise! You would think that estimating the proportion of false nulls is harder than testing whether any nulls are false, but for the normal model presented earlier, when you can do one, you can do the other (Cai, Jin & Low).
References
Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate. J. Royal Stat. Soc. B. 57, 289.
T. Cai, J. Jin, and M. Low (2005). Estimation and confidence sets for sparse normal mixtures.
www.stat.purdue.edu/~jinj/Research/ESTEPS.pdf
D. Donoho & J. Jin (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics 32, 962
C. Genovese and L. Wasserman (2005). Exceedance control of the false discovery proportion.
http://www.stat.cmu.edu/~genovese/papers/exceedance.pdf
A. Hopkins et al (2002). A new source detection algorithm using the false discovery rate. Astr. J. 123, 1086
N. Meinshausen and J. Rice (2006). Estimating the proportion of false null hypotheses among a large number of
independently tested hypotheses. Annals of Statistics 34(1), in press.
Software (R): cran.r-project.org/doc/packages/howmany.pdf
C. Miller et al (2001). Controlling the false discovery rate in astrophysical data analysis. Astr. J 122, 3492
J. Shaffer (2005). Recent developments towards optimality in multiple hypothesis testing. Contact
[email protected]
J. Storey (2002). A direct approach to false discovery rates. J. Roy. Stat. Soc. B.64, 479
M. van der Laan, S. Dudoit, and K. Pollard (2004). Augmentation procedures for control of the generalized familywise
error rate and tail probabilities for the proportion of false positives. Statistical Applications in Genetics and Molecular
Biology 3, Article 15
There are many additional relevant references and the literature is rapidly evolving. Those given above are for
starters and contain further references.