
Parametric Thresholding Methods
(Random Field Theory & False Discovery Rate)
Thomas Nichols, Ph.D.
Assistant Professor
Department of Biostatistics
University of Michigan
http://www.sph.umich.edu/~nichols
SPM Course
May 13, 2004
1
[Figure: SPM analysis pipeline — image data → realignment & motion correction → normalisation (anatomical reference) → smoothing (kernel) → General Linear Model / model fitting (design matrix → parameter estimates) → statistic image → Thresholding & Random Field Theory → Statistical Parametric Map]
2
Corrected thresholds & p-values
Assessing Statistic Images
Where’s the signal?
• High threshold, t > 5.5: good specificity, poor power (risk of false negatives)
• Medium threshold, t > 3.5
• Low threshold, t > 0.5: poor specificity (risk of false positives), good power
...but why threshold?!
Blue-sky inference:
What we’d like
• Don’t threshold, model the signal!
  – Signal location?
    • Estimates and CI’s on (x,y,z) location
  – Signal magnitude?
    • CI’s on % change
  – Spatial extent?
    • Estimates and CI’s on activation volume
    • Robust to choice of cluster definition
• ...but this requires an explicit spatial model
4
Blue-sky inference:
What we need
• Need an explicit spatial model
• No routine spatial modeling methods exist
– High-dimensional mixture modeling problem
– Activations don’t look like Gaussian blobs
– Need realistic shapes, sparse representation
• Some work by Hartvig et al., Penny et al.
5
Real-life inference:
What we get
• Signal location
– Local maximum – no inference
– Center-of-mass – no inference
• Sensitive to the blob-defining threshold
• Signal magnitude
– Local maximum intensity – P-values (& CI’s)
• Spatial extent
– Cluster volume – P-value, no CI’s
• Sensitive to the blob-defining threshold
6
Voxel-level Inference
• Retain voxels above α-level threshold u
• Gives best spatial specificity
  – The null hypothesis at a single voxel can be rejected
[Figure: statistic image over space; voxels exceeding threshold u are significant, the rest are not]
7
Cluster-level Inference
• Two-step process
  – Define clusters by arbitrary threshold uclus
  – Retain clusters larger than α-level threshold k
[Figure: clusters above uclus; a cluster smaller than k is not significant, a larger cluster is significant]
8
Cluster-level Inference
• Typically better sensitivity
• Worse spatial specificity
  – The null hypothesis for the entire cluster is rejected
  – Only implies that one or more voxels in the cluster are active
[Figure: as above; a cluster smaller than k is not significant, a larger cluster is significant]
9
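As a concrete illustration of the two-step cluster procedure above, here is a minimal sketch using connected-component labelling from scipy.ndimage; the statistic image, cluster-defining threshold uclus, and extent threshold k are all illustrative stand-ins, not values prescribed by the slides.

import numpy as np
from scipy import ndimage

t_img = np.random.randn(64, 64, 30)   # stand-in for a statistic image
u_clus = 2.5                          # cluster-defining threshold (arbitrary)
k = 50                                # cluster extent threshold, in voxels

# Step 1: define clusters as connected suprathreshold voxels
suprathreshold = t_img > u_clus
labels, n_clusters = ndimage.label(suprathreshold)

# Step 2: retain only clusters containing at least k voxels
sizes = ndimage.sum(suprathreshold, labels, index=range(1, n_clusters + 1))
keep = np.isin(labels, 1 + np.flatnonzero(np.asarray(sizes) >= k))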
Set-level Inference
• Count number of blobs c
– Minimum blob size k
• Worst spatial specificity
  – Can only reject the global null hypothesis
[Figure: two clusters above uclus; here c = 1 — only one cluster is larger than k]
10
Multiple comparisons…
11
Hypothesis Testing
• Null hypothesis H0
• Test statistic T
  – t: observed realization of T
• α level
  – Acceptable false positive rate
  – Level α = P(T > u | H0)
  – Threshold u controls the false positive rate at level α
• P-value
  – Assessment of t assuming H0: P(T > t | H0)
  – Probability of obtaining a statistic as large or larger in a new experiment
  – P(Data | Null), not P(Null | Data)
[Figure: null distribution of T, threshold u, observed t, and the P-value as the upper-tail area]
12
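As a small illustration of the definitions above, the level-α threshold u and the P-value P(T > t | H0) can be computed for a t statistic with scipy; the observed t and the degrees of freedom are assumed example values.

from scipy import stats

t_obs, df = 3.2, 20                # observed statistic and degrees of freedom (assumed)
p_value = stats.t.sf(t_obs, df)    # P(T > t | H0): upper-tail probability under the null
u = stats.t.isf(0.05, df)          # threshold u with P(T > u | H0) = 0.05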
Multiple Comparisons Problem
• Which of 100,000 voxels are significant?
  – α = 0.05 ⇒ 5,000 false positive voxels
• Which of (say) 100 clusters are significant?
  – α = 0.05 ⇒ 5 false positive clusters
[Figure: a statistic image thresholded at t > 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
13
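A quick simulation of the point made above: with α = 0.05 applied independently at every voxel of a pure-noise image, roughly α·V voxels exceed the threshold by chance. Gaussian null statistics are assumed purely for simplicity.

import numpy as np
from scipy import stats

V, alpha = 100_000, 0.05
rng = np.random.default_rng(0)
null_t = rng.standard_normal(V)          # a null statistic image: no true signal anywhere
u = stats.norm.isf(alpha)                # uncorrected 5% threshold
n_false = int((null_t > u).sum())
print(n_false, "false positives, vs. expected", alpha * V)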
MCP Solutions:
Measuring False Positives
• Familywise Error Rate (FWER)
– Familywise Error
• Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
14
FWE Multiple comparisons
terminology…
• Family of hypotheses
  – Hk, k ∈ Ω = {1,…,K}
  – H_Ω = ⋂k Hk
• Familywise Type I error
  – Weak control – omnibus test
    • Pr(“reject” H_Ω | H_Ω) ≤ α
    • “anything, anywhere”?
  – Strong control – localising test
    • Pr(“reject” H_W | H_W) ≤ α, for all W ⊆ Ω
    • “anything, & where”?
• Adjusted p-values
  – Test level at which Hk is rejected
16
FWER MCP Solutions:
Controlling FWER w/ Max
• FWER & the distribution of the maximum
  FWER = P(FWE)
       = P(⋃i {Ti ≥ u} | H0)
       = P(maxi Ti ≥ u | H0)
• The 100(1−α)th percentile of the max distribution controls FWER
  FWER = P(maxi Ti ≥ uα | H0) = α, where uα = F⁻¹max(1−α)
[Figure: null distribution of maxi Ti with threshold uα cutting off upper-tail area α]
17
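A Monte Carlo sketch of the idea above: thresholding at u = F⁻¹max(1−α), the 100(1−α)th percentile of the null distribution of the maximum, controls FWER at α. Independent Gaussian voxels are assumed here only to keep the simulation trivial; in practice RFT (or permutation) supplies this max distribution.

import numpy as np

rng = np.random.default_rng(0)
V, alpha, n_sim = 1_000, 0.05, 5_000

# Null distribution of the maximum statistic over the image (independence assumed)
max_null = rng.standard_normal((n_sim, V)).max(axis=1)

# u = F_max^{-1}(1 - alpha): thresholding here gives P(max_i Ti >= u | H0) ~= alpha
u = np.quantile(max_null, 1 - alpha)
print("FWER-controlling threshold u =", u)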
Voxel-level test…
• Threshold uα
  – tk > uα ⇒ reject Hk
  – Reject any Hk ⇒ reject H_Ω
  – Reject H_Ω if tmax > uα
• Valid test
  – Weak control: Pr(Tmax > uα | H_Ω) ≤ α
  – Strong control: Pr(TW,max > uα | H_W) ≤ α, since W ⊆ Ω
• Adjusted p-values
  – Pr(Tmax > tk | H_Ω)
[Figure: maps thresholded at corrected p = 0.05, 0.0001, 0.0000001]
18
FWE MCP Solutions:
Bonferroni
• For a statistic image T...
  – Ti: ith voxel of statistic image T
• ...use α = α0/V
  – α0: FWER level (e.g. 0.05)
  – V: number of voxels
  – uα: α-level statistic threshold, P(Ti ≥ uα) = α
• By the Bonferroni inequality...
  FWER = P(FWE)
       = P(⋃i {Ti ≥ uα} | H0)
       ≤ Σi P(Ti ≥ uα | H0)
       = Σi α
       = Σi α0/V = α0
• Conservative under correlation:
  Independent:      V tests
  Some dependence:  ? tests
  Total dependence: 1 test
19
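The Bonferroni recipe above in two lines, assuming a Gaussian statistic image; α0 and V follow the slide's example values, and the voxel value used for the adjusted p-value is arbitrary.

from scipy import stats

alpha0, V = 0.05, 100_000
u_bonf = stats.norm.isf(alpha0 / V)         # per-voxel threshold at level alpha0 / V
p_adj = min(1.0, V * stats.norm.sf(4.5))    # Bonferroni-adjusted p-value for a voxel with T = 4.5
print(u_bonf, p_adj)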
Random field theory…
20
SPM approach:
Random fields…
• Consider statistic image as lattice representation of
a continuous random field
• Use results from continuous random field theory
[Figure: continuous random field and its lattice representation]
21
FWER MCP Solutions:
Random Field Theory
• Euler characteristic χu
  – Topological measure of the suprathreshold set
    • #blobs − #holes
  – At high thresholds, χu just counts blobs
  – FWER = P(max voxel ≥ u | H0)
         = P(one or more blobs | H0)      (no holes at high u)
         ≈ P(χu ≥ 1 | H0)
         ≈ E(χu | H0)                     (never more than 1 blob at high u)
[Figure: random field, threshold u, and the resulting suprathreshold sets]
22
RFT Details:
Expected Euler Characteristic
E[χu] ≈ λ(Ω) |Λ|^(1/2) (u² − 1) exp(−u²/2) / (2π)²
  – Ω: search region, Ω ⊂ R³
  – λ(Ω): volume
  – |Λ|^(1/2): roughness
• Assumptions
  – Multivariate normal
  – Stationary*
  – ACF twice differentiable at 0
Note: only the very upper tail approximates 1 − Fmax(u)
* Stationarity
  – Results valid without stationarity
  – More accurate when stationarity holds
23
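The 3-D formula above, transcribed directly; the term λ(Ω)|Λ|^(1/2) is re-expressed through the RESEL count R introduced on the following slides (λ(Ω)|Λ|^(1/2) = (4 log 2)^(3/2) R), and the RESEL count passed in is an assumed example value.

import numpy as np

def expected_ec_3d(u, resels):
    # E[chi_u] = R * (4 ln 2)^(3/2) * (u^2 - 1) * exp(-u^2/2) / (2*pi)^2   (3-D Gaussian field)
    return resels * (4 * np.log(2)) ** 1.5 * (u ** 2 - 1) * np.exp(-u ** 2 / 2) / (2 * np.pi) ** 2

# At high thresholds, E[chi_u] approximates the FWE-corrected p-value P(max T >= u | H0)
print(expected_ec_3d(u=4.5, resels=1000.0))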
Random Field Theory
Smoothness Parameterization
• E[χu] depends on |Λ|^(1/2)
  – Λ: roughness matrix (variance of the field’s spatial derivatives)
• Smoothness parameterized as Full Width at Half Maximum
  – FWHM of the Gaussian kernel needed to smooth a white-noise random field to roughness Λ
[Figure: autocorrelation function with its FWHM marked]
24
Random Field Theory
Smoothness Parameterization
• RESELS
  – Resolution Elements
  – 1 RESEL = FWHMx × FWHMy × FWHMz
  – RESEL count R
    • R = λ(Ω) |Λ|^(1/2) / (4 log 2)^(3/2) = λ(Ω) / (FWHMx × FWHMy × FWHMz)
    • Volume of the search region in units of smoothness
    • E.g.: 10 voxels at 2.5-voxel FWHM ⇒ 4 RESELS
• Beware RESEL misinterpretation
  – RESELS are not the “number of independent ‘things’ in the image”
    • See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.
25
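The RESEL count as defined above, reproducing the slide's 1-D example (10 voxels at 2.5-voxel FWHM gives 4 RESELs); in 3-D the same division is done with the product of the three FWHMs.

# RESEL count: search volume measured in units of smoothness
n_voxels = 10            # 1-D search region of 10 voxels
fwhm = 2.5               # smoothness in voxels
resels = n_voxels / fwhm
print(resels)            # 4.0 RESELs

# 3-D case: R = volume / (FWHM_x * FWHM_y * FWHM_z), all in the same units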
Random Field Theory
Smoothness Estimation
• Smoothness estimated from standardized residuals
  – Variance of spatial gradients
  – Yields resels per voxel (RPV)
[Figure: GLM Y = Xβ + e over scans and voxels — design matrix X, parameter estimates β̂, residuals, estimated variance σ², and the estimated component fields used for smoothness estimation]
• RPV image
  – Local roughness estimate
  – Can be transformed into a local smoothness estimate
    • FWHM img = (RPV img)^(−1/D)
    • Dimension D, e.g. D = 2 or 3
26
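A tiny sketch of the RPV-to-FWHM transformation named above, FWHM img = (RPV img)^(−1/D); the constant RPV value used here is purely an assumption for illustration, since a real RPV image comes from the residual-based smoothness estimation described on the slide.

import numpy as np

D = 3                                     # image dimension
rpv_img = np.full((64, 64, 30), 0.01)     # stand-in resels-per-voxel (RPV) image
fwhm_img = rpv_img ** (-1.0 / D)          # local smoothness: FWHM = RPV^(-1/D), in voxels
print(fwhm_img[0, 0, 0])                  # about 4.64 voxels FWHM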
Random Field Intuition
• Corrected P-value for voxel value t
  Pc = P(max T > t)
     ≈ E[χt]
     ≈ λ(Ω) |Λ|^(1/2) t² exp(−t²/2) / (2π)²    (for large t)
• Statistic value t increases
– Pc decreases (but only for large t)
• Search volume increases
– Pc increases (more severe MCP)
• Roughness increases (Smoothness decreases)
– Pc increases (more severe MCP)
27
RFT Details:
Unified Formula
• General form for the expected Euler characteristic
  – Applies to χ², F, & t fields • restricted search regions • D dimensions
  E[χu(Ω)] = Σd Rd(Ω) ρd(u)
• Rd(Ω): d-dimensional Minkowski functional of Ω
  – Function of dimension, space Ω, and smoothness:
    R0(Ω) = χ(Ω), the Euler characteristic of Ω
    R1(Ω) = resel diameter
    R2(Ω) = resel surface area
    R3(Ω) = resel volume
• ρd(u): d-dimensional EC density of Z(x)
  – Function of dimension and threshold, specific to the RF type
  – E.g. Gaussian RF:
    ρ0(u) = 1 − Φ(u)
    ρ1(u) = (4 ln 2)^(1/2) exp(−u²/2) / (2π)
    ρ2(u) = (4 ln 2) u exp(−u²/2) / (2π)^(3/2)
    ρ3(u) = (4 ln 2)^(3/2) (u² − 1) exp(−u²/2) / (2π)²
    ρ4(u) = (4 ln 2)² (u³ − 3u) exp(−u²/2) / (2π)^(5/2)
28
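The unified formula above as a small function: the Gaussian EC densities ρ0…ρ3 follow the list on the slide, and E[χu(Ω)] = Σd Rd(Ω) ρd(u). The vector of Minkowski functionals R = [R0, R1, R2, R3] passed in is an assumed example, not values from any real search region.

import numpy as np
from scipy import stats

def gaussian_ec_densities(u):
    # EC densities rho_d(u), d = 0..3, for a Gaussian random field
    f = 4 * np.log(2)
    return np.array([
        stats.norm.sf(u),                                                  # rho_0 = 1 - Phi(u)
        f ** 0.5 * np.exp(-u ** 2 / 2) / (2 * np.pi),                      # rho_1
        f * u * np.exp(-u ** 2 / 2) / (2 * np.pi) ** 1.5,                  # rho_2
        f ** 1.5 * (u ** 2 - 1) * np.exp(-u ** 2 / 2) / (2 * np.pi) ** 2,  # rho_3
    ])

def expected_ec(u, R):
    # Unified formula: E[chi_u(Omega)] = sum_d R_d(Omega) * rho_d(u)
    return float(np.dot(R, gaussian_ec_densities(u)))

print(expected_ec(4.5, R=[1.0, 30.0, 300.0, 1000.0]))   # R_d values are illustrative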
Random Field Theory
Cluster Size Tests
• Expected cluster size
  – E(S) = E(N)/E(L)
  – S: cluster size
  – N: suprathreshold volume, λ({T > uclus})
  – L: number of clusters
• E(N) = λ(Ω) P(T > uclus)
• E(L) ≈ E[χu]
  – Assuming no holes
[Figure: simulated suprathreshold clusters at 5 mm, 10 mm, and 15 mm FWHM]
29
Random Field Theory
Cluster Size Distribution
• Gaussian random fields (Nosko, 1969)
  S^(2/D) ~ Exp( [Γ(D/2 + 1) E(L) / E(N)]^(2/D) )
  – D: dimension of the RF
• t random fields (Cao, 1999)
  – Cluster size distributed via a Beta-distributed variable B and χ²-distributed variables U
  – Scaling constant c chosen s.t. E(S) = E(N)/E(L)
30
Random Field Theory
Cluster Size Corrected P-Values
• Previous results give uncorrected P-value
• Corrected P-value
– Bonferroni
• Correct for expected number of clusters
• Corrected Pc = E(L) Puncorr
– Poisson Clumping Heuristic (Adler, 1980)
• Corrected Pc = 1 - exp( -E(L) Puncorr )
31
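The two corrections above side by side; E(L) and the uncorrected cluster p-value are assumed example numbers.

import numpy as np

E_L, P_uncorr = 5.0, 0.01                   # expected cluster count and uncorrected cluster p (assumed)
P_bonf = min(1.0, E_L * P_uncorr)           # Bonferroni over the expected number of clusters
P_pch = 1 - np.exp(-E_L * P_uncorr)         # Poisson clumping heuristic (Adler, 1980)
print(P_bonf, P_pch)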
Review:
Levels of inference & power
32
Random Field Theory
Limitations
• Sufficient smoothness
  – FWHM smoothness 3–4× voxel size (Z)
  – More like ~10× for low-df t images
• Smoothness estimation
  – Estimate is biased when images are not sufficiently smooth
• Multivariate normality
  – Virtually impossible to check
• Several layers of approximations
• Stationarity required for cluster size results
[Figure: lattice image data vs. continuous random field]
33
SPM results... [screenshots, slides 34–40]
Real Data
• fMRI study of working memory — Marshuetz et al. (2000)
  – 12 subjects, block design
  – Item recognition task
    • Active: view five letters (e.g. “UBKDA”), 2 s pause, view probe letter (e.g. “D”), respond yes/no
    • Baseline: view “XXXXX”, 2 s pause, view “Y” or “N”, respond
• Second-level RFX
  – Difference image, A−B, constructed for each subject
  – One-sample t test
41
Real Data:
RFT Result
• Threshold
  – S = 110,776 voxels
  – 2 × 2 × 2 voxels, 5.1 × 5.8 × 6.9 mm FWHM
  – u = 9.870
• Result
  – 5 voxels above the threshold
  – 0.0063 minimum FWE-corrected p-value
[Figure: −log10 p-value map]
42
MCP Solutions:
Measuring False Positives
• Familywise Error Rate (FWER)
– Familywise Error
• Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
43
False Discovery Rate
• For any threshold, all voxels can be cross-classified:

               Accept Null   Reject Null   Total
  Null True        V0A           V0R         m0
  Null False       V1A           V1R         m1
  Total            NA            NR          V

• Realized FDR
  rFDR = V0R/(V1R + V0R) = V0R/NR
  – If NR = 0, define rFDR = 0
• But we can only observe NR; V1R and V0R are unknown
  – So we control the expected rFDR: FDR = E(rFDR)
44
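The realized FDR from the table above, computed for one (made-up) thresholded map: rFDR = V0R/NR, defined as 0 when nothing is rejected. Which tests are truly null and which are rejected are both assumed here, since V0R is unobservable in practice.

import numpy as np

null_true = np.array([True] * 90 + [False] * 10)   # truly null tests (unknown in practice)
rejected = np.zeros(100, dtype=bool)
rejected[85:] = True                               # an example rejection pattern

NR = int(rejected.sum())                           # number of rejections (observable)
V0R = int((rejected & null_true).sum())            # false discoveries (unobservable)
rfdr = V0R / NR if NR > 0 else 0.0
print(NR, V0R, rfdr)                               # 15 rejections, 5 false, rFDR = 1/3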
False Discovery Rate
Illustration:
[Figure: simulated images — noise, signal, and signal + noise]
45
• Control of per-comparison rate at 10%
  – Percentage of null pixels that are false positives, across 10 realizations:
    11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%
• Control of familywise error rate at 10%
  – Realizations containing a familywise error are marked “FWE”
• Control of false discovery rate at 10%
  – Percentage of activated pixels that are false positives, across 10 realizations:
    6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%
46
Benjamini & Hochberg
Procedure
• Select desired limit q on FDR
• Order the p-values, p(1) ≤ p(2) ≤ ... ≤ p(V)
• Let r be the largest i such that
    p(i) ≤ (i/V) × q/c(V)
• Reject all hypotheses corresponding to p(1), ..., p(r)
[Figure: ordered p-values p(i) plotted against i/V, with the line (i/V) × q/c(V)]
JRSS-B (1995) 57:289-300
47
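The Benjamini & Hochberg step-up rule above, as a short function; c(V) = 1 for the independence/positive-dependence case and c(V) = Σ 1/i for arbitrary covariance (both discussed on the Procedure Details slide below). The example p-values are made up.

import numpy as np

def bh_threshold(pvals, q=0.05, arbitrary_cov=False):
    # Find the largest i with p_(i) <= (i/V) * q / c(V); reject all p-values at or below p_(i)
    p = np.sort(np.asarray(pvals, dtype=float))
    V = p.size
    c = np.sum(1.0 / np.arange(1, V + 1)) if arbitrary_cov else 1.0
    below = p <= np.arange(1, V + 1) / V * q / c
    if not below.any():
        return None                            # nothing survives
    return p[np.nonzero(below)[0].max()]       # p-value cutoff: reject all p <= this

print(bh_threshold([0.001, 0.008, 0.039, 0.041, 0.2, 0.6], q=0.05))   # -> 0.008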
Adaptiveness of
Benjamini & Hochberg FDR
• P-value threshold when there is no signal: α/V
• P-value threshold when everything is signal: α
[Figure: ordered p-values p(i) vs. fractional index i/V; the BH threshold adapts between α/V and α]
48
Benjamini & Hochberg
Procedure Details
• c(V) = 1
  – Positive Regression Dependency on Subsets
    P(X1 ≤ c1, X2 ≤ c2, ..., Xk ≤ ck | Xi = xi) is non-decreasing in xi
  • Only required of test statistics for which the null is true
  • Special cases include
    – Independence
    – Multivariate normal with all positive correlations
    – Same, but studentized with a common std. err.
• c(V) = Σi=1,...,V 1/i ≈ log(V) + 0.5772
  – Arbitrary covariance structure
Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188
49
Benjamini & Hochberg:
Key Properties
• FDR is controlled
  E(rFDR) ≤ q m0/V
  – Conservative if a large fraction of the nulls are false
• Adaptive
  – Threshold depends on the amount of signal
    • More signal ⇒ more small p-values ⇒ more p(i) less than (i/V) × q/c(V)
50
Controlling FDR:
Varying Signal Extent
(Signal intensity 3.0 and noise smoothness 3.0 throughout)

  Signal extent   p           z
  1.0             –           –
  2.0             –           –
  3.0             –           –
  5.0             0.000252    3.48
  9.5             0.001628    2.94
  16.5            0.007157    2.45
  25.0            0.019274    2.07
Controlling FDR:
Benjamini & Hochberg
• Illustrating BH under dependence
  – Extreme example of positive dependence: a 32-voxel image interpolated from an 8-voxel image
[Figure: ordered p-values p(i) vs. i/V with the (i/V) × q/c(V) line, for the 8-voxel and the 32-voxel image]
58
Other FDR Methods
• James Troendle
JSPI (2000) 84:139-158
– Normal theory FDR
• More powerful than BH FDR
• Requires numerical integration to obtain thresholds
– Exactly valid if whole correlation matrix known
• John Storey
JRSS-B (2002) 64:479-498
– pFDR “Positive FDR”
• FDR conditional on one or more rejections
• Critical threshold is fixed, not estimated
• pFDR and Empirical Bayes
– Asymptotically valid under “clumpy” dependence
59
Real Data: FDR Example
• Threshold
  – Indep/PosDep: u = 3.83
  – Arbitrary covariance: u = 13.15
• Result
  – 3,073 voxels above the Indep/PosDep threshold
  – < 0.0001 minimum FDR-corrected p-value
[Figure: FDR threshold = 3.83 (3,073 voxels) vs. FWER permutation threshold = 9.87 (7 voxels)]
60
Conclusions
• Must account for multiplicity
– Otherwise have a fishing expedition
• FWER
– Very specific, not very sensitive
• FDR
– Less specific, more sensitive
– Sociological calibration still underway
61
References
• Most of this talk covered in these papers
TE Nichols & S Hayasaka, Controlling the Familywise
Error Rate in Functional Neuroimaging: A Comparative
Review. Statistical Methods in Medical Research, 12(5):
419-446, 2003.
TE Nichols & AP Holmes, Nonparametric Permutation
Tests for Functional Neuroimaging: A Primer with
Examples. Human Brain Mapping, 15:1-25, 2001.
CR Genovese, N Lazar & TE Nichols, Thresholding of
Statistical Maps in Functional Neuroimaging Using the
False Discovery Rate. NeuroImage, 15:870-878, 2002.
62