Inference on SPMs: Random Field Theory & Alternatives Thomas Nichols, Ph.D. Department of Statistics & Warwick Manufacturing Group University of Warwick http://www.fmrib.ox.ac.uk/~nichols FIL SPM Course 22 October, 2009

Download Report

Transcript Inference on SPMs: Random Field Theory & Alternatives Thomas Nichols, Ph.D. Department of Statistics & Warwick Manufacturing Group University of Warwick http://www.fmrib.ox.ac.uk/~nichols FIL SPM Course 22 October, 2009

Inference on SPMs:
Random Field Theory &
Alternatives
Thomas Nichols, Ph.D.
Department of Statistics &
Warwick Manufacturing Group
University of Warwick
http://www.fmrib.ox.ac.uk/~nichols
FIL SPM Course
22 October, 2009
1
image data
parameter
estimates
design
matrix
kernel
realignment &
motion
correction
General Linear Model
smoothing
model fitting
statistic image
Thresholding &
Random Field
Theory
normalisation
anatomical
reference
Statistical
Parametric Map
2
Corrected thresholds & p-values
Assessing Statistic
Images…
3
Assessing Statistic Images
Where’s the signal?
High Threshold
t > 5.5
Good Specificity
Poor Power
(risk of false negatives)
Med. Threshold
t > 3.5
Low Threshold
t > 0.5
Poor Specificity
(risk of false positives)
Good Power
...but why threshold?!
Blue-sky inference:
What we’d like
• Don’t threshold, model the signal!
– Signal location?
• Estimates and CI’s on
(x,y,z) location
ˆMag.
– Signal magnitude?
• CI’s on % change
– Spatial extent?
ˆLoc.
ˆExt.
space
• Estimates and CI’s on activation volume
• Robust to choice of cluster definition
• ...but this requires an explicit spatial model
5
Real-life inference:
What we get
• Signal location
– Local maximum – no inference
– Center-of-mass – no inference
• Sensitive to blob-defining-threshold
• Signal magnitude
– Local maximum intensity – P-values (& CI’s)
• Spatial extent
– Cluster volume – P-value, no CI’s
• Sensitive to blob-defining-threshold
7
Voxel-level Inference
• Retain voxels above -level threshold u
• Gives best spatial specificity
– The null hyp. at a single voxel can be rejected
u
space
Significant
Voxels
No significant
Voxels
8
Cluster-level Inference
• Two step-process
– Define clusters by arbitrary threshold uclus
– Retain clusters larger than -level threshold k
uclus
space
Cluster not
significant
k
k
Cluster
significant
9
Cluster-level Inference
• Typically better sensitivity
• Worse spatial specificity
– The null hyp. of entire cluster is rejected
– Only means
that one or more of voxels in
cluster active
uclus
space
Cluster not
significant
k
k
Cluster
significant
10
Set-level Inference
• Count number of blobs c
– Minimum blob size k
• Worst spatial specificity
– Only can reject global null hypothesis
uclus
space
k
k
Here c = 1; only 1 cluster larger than k
11
Multiple comparisons…
12
Hypothesis Testing
• Null Hypothesis H0
• Test statistic T
u
– t observed realization of T
•  level

– Acceptable false positive rate
Null Distribution of T
– Level  = P( T>u | H0 )
– Threshold u controls false positive rate at level 
• P-value
– Assessment of t assuming H0
– P( T > t | H0 )
• Prob. of obtaining stat. as large
or larger in a new experiment
– P(Data|Null) not P(Null|Data)
t
P-val
13
Null Distribution of T
Multiple Comparisons Problem
• Which of 100,000 voxels are sig.?
– =0.05  5,000 false positive voxels
• Which of (random number, say) 100 clusters significant?
– =0.05  5 false positives clusters
t > 0.5
t > 1.5
t > 2.5
t > 3.5
t > 4.5
t > 5.5
t > 6.5
14
MCP Solutions:
Measuring False Positives
• Familywise Error Rate (FWER)
– Familywise Error
• Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
15
MCP Solutions:
Measuring False Positives
• Familywise Error Rate (FWER)
– Familywise Error
• Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
16
FWE MCP Solutions:
Bonferroni
• For a statistic image T...
– Ti
ith voxel of statistic image T
• ...use  = 0/V
– 0 FWER level (e.g. 0.05)
– V number of voxels
– u -level statistic threshold, P(Ti  u) = 
• By Bonferroni inequality...
FWER = P(FWE)
= P( i {Ti  u} | H0)
 i P( Ti  u| H0 )
= i 
= i 0 /V = 0
Conservative under correlation
Independent:
Some dep.:
Total dep.:
V tests
? tests
1 test
18
Random field theory…
19
SPM approach:
Random fields…
• Consider statistic image as lattice representation of
a continuous random field
• Use results from continuous random field theory

lattice represtntation
20
FWER MCP Solutions:
Random Field Theory
• Euler Characteristic u
– Topological Measure
• #blobs - #holes
– At high thresholds,
Threshold
just counts blobs
Random Field
– FWER = P(Max voxel  u | Ho)
No holes
= P(One or more blobs | Ho)
 P(u  1 | Ho)
Never more
than 1 blob
 E(u | Ho)
22 Sets
Suprathreshold
RFT Details:
Expected Euler Characteristic
E(u)  () ||1/2 (u 2 -1) exp(-u 2/2) / (2)2
– 
 Search region   R3
– (  volume
– ||1/2  roughness
• Assumptions
– Multivariate Normal
– Stationary*
– ACF twice differentiable at 0
Only very
upper tail
approximates
1-Fmax(u)
* Stationary
– Results valid w/out stationary
– More accurate when stat. holds
23
Random Field Theory
Smoothness Parameterization
• E(u) depends on ||1/2
–  roughness matrix:
• Smoothness
parameterized as
Full Width at Half Maximum
– FWHM of Gaussian kernel
needed to smooth a white
noise random field to
roughness 
FWHM
Autocorrelation Function
24
Random Field Theory
Smoothness Parameterization
• RESELS
– Resolution Elements
– 1 RESEL = FWHMx FWHMy FWHMz
– RESEL Count R
• R = ()  || = (4log2)3/2 () / ( FWHMx FWHMy FWHMz )
• Volume of search region in units of smoothness
• Eg: 10 voxels, 2.5 FWHM 4 RESELS
1
2
1
3
4
2
5
6
7
3
8
9
10
4
• Beware RESEL misinterpretation
– RESEL are not “number of independent ‘things’ in the image”
• See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.
.
25
• Smoothness est’d
from standardized
residuals
– Variance of
gradients
– Yields resels per
voxel (RPV)
voxels
data matrix
scans
=
design matrix
Random Field Theory
Smoothness Estimation

Y = X
?
parameters
+
b
+
errors
?
e
s2
variance
 estimate

parameter
estimates

^
b
¸
=
residuals
estimated variance
estimated
component
fields
• RPV image
– Local roughness est.
– Can transform in to local smoothness est.
• FWHM Img = (RPV Img)-1/D
• Dimension D, e.g. D=2 or 3
26
Random Field Intuition
• Corrected P-value for voxel value t
Pc = P(max T > t)
 E(t)
 () ||1/2 t2 exp(-t2/2)
• Statistic value t increases
– Pc decreases (better signal!)
• Search volume increases
( () ↑ )
– Pc increases (more severe MCP)
• Smoothness increases
( ||1/2 ↓ )
– Pc decreases (less severe MCP)
27
RFT Details:
Unified Formula
• General form for expected Euler characteristic
• 2, F, & t fields • restricted search regions • D dimensions •
E[u()] = Sd
Rd (): d-dimensional Minkowski
functional of 
– function of dimension,
space  and smoothness:
R0()
R1()
R2()
R3()
=
=
=
=
() Euler characteristic of 
resel diameter
resel surface area
resel volume
Rd () rd (u)
rd (): d-dimensional EC density of Z(x)
– function of dimension and threshold,
specific for RF type:
E.g. Gaussian RF:
r0(u)
r1(u)
r2(u)
r3(u)
r4(u)

= 1- (u)
= (4 ln2)1/2 exp(-u2/2) / (2)
= (4 ln2)
exp(-u2/2) / (2)3/2
= (4 ln2)3/2 (u2 -1) exp(-u2/2) / (2)2
= (4 ln2)2 (u3 -3u) exp(-u2/2) / (2)5/2
28
Random Field Theory
Cluster Size Tests
• Expected Cluster Size
– E(S) = E(N)/E(L)
– S cluster size
– N suprathreshold volume
({T > uclus})
– L number of clusters
• E(N) = () P( T > uclus )
• E(L)  E(u)
– Assuming no holes
5mm FWHM
10mm FWHM
15mm FWHM
29
Random Field Theory
Limitations
• Sufficient smoothness
Lattice Image
Data
– FWHM smoothness 3-4× voxel size (Z)
– More like ~10× for low-df T images

• Smoothness estimation
Random
– Estimate is biased when images not sufficiently Continuous
Field
smooth
• Multivariate normality
– Virtually impossible to check
• Several layers of approximations
• Stationary required for cluster size results
33
Real Data
• fMRI Study of Working Memory
– 12 subjects, block design
– Item Recognition
Active
D
Marshuetz et al (2000)
• Active:View five letters, 2s pause,
view probe letter, respond
• Baseline: View XXXXX, 2s pause,
view Y or N, respond
UBKDA
Baseline
• Second Level RFX
– Difference image, A-B constructed
for each subject
– One sample t test
yes
N
XXXXX
no
34
Real Data:
RFT Result
• Threshold
• Result
– 5 voxels above
the threshold
– 0.0063 minimum
FWE-corrected
p-value
-log10 p-value
– S = 110,776
– 2  2  2 voxels
5.1  5.8  6.9 mm
FWHM
– u = 9.870
35
Real Data:
SnPM Promotional
uRF = 9.87
uBonf = 9.80
5 sig. vox.
t11 Statistic, RF & Bonf. Threshold
• Nonparametric method more
powerful than RFT for low DF
• “Variance Smoothing” even
more sensitive
• FWE controlled all the while!
uPerm = 7.67
378 sig. vox.
58 sig. vox.
t11 Statistic, Nonparametric Threshold
Smoothed Variance t Statistic, 36
Nonparametric Threshold
False Discovery Rate…
37
MCP Solutions:
Measuring False Positives
• Familywise Error Rate (FWER)
– Familywise Error
• Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)
– FDR = E(V/R)
– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
38
False Discovery Rate
• For any threshold, all voxels can be cross-classified:
Accept Null
Reject Null
Null True
V0A
V0R
m0
Null False
V1A
V1R
m1
NA
NR
V
• Realized FDR
rFDR = V0R/(V1R+V0R) = V0R/NR
– If NR = 0, rFDR = 0
• But only can observe NR, don’t know V1R & V0R
– We control the expected rFDR
FDR = E(rFDR)
39
False Discovery Rate
Illustration:
Noise
Signal
Signal+Noise
40
Control of Per Comparison Rate at 10%
11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2%
Percentage of Null Pixels that are False Positives
9.5%
Control of Familywise Error Rate at 10%
Occurrence of Familywise Error
FWE
Control of False Discovery Rate at 10%
6.7%
10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2%
Percentage of Activated Pixels that are False Positives
8.7%
41
Benjamini & Hochberg
Procedure
• Select desired limit q on FDR
• Order p-values, p(1)  p(2)  ...  p(V)
• Let r be largest i such that
1
p-value
• Reject all hypotheses
corresponding to
p(1), ... , p(r).
p(i)
i/V  q
0
p(i)  i/V  q
JRSS-B (1995)
57:289-300
0
1
i/V
42
Adaptiveness of
Benjamini & Hochberg FDR
P-value
threshold
when no
signal:
/V
Ordered p-values p(i)
1
0.8
0.6
0.4
0.2
0
0
0.2
0.4
0.6
0.8
Fractional index i/V
1
P-value
threshold
when all
signal:
43
Real Data: FDR Example
• Threshold
– Indep/PosDep
u = 3.83
– Arb Cov
u = 13.15
• Result
– 3,073 voxels above
Indep/PosDep u
– <0.0001 minimum
FDR-corrected
p-value
FDR Threshold = 3.83
3,073 voxels
FWER Perm. Thresh. = 9.87
44
7 voxels
FDR Changes
• Before SPM8
– Only voxel-wise FDR
• SPM8
– Cluster-wise FDR
– Peak-wise FDR
Item Recognition data
Cluster-forming threshold P = 0.001
Cluster-wise FDR: 40 voxel cluster, PFDR = 0.07
Peak-wise FDR: t = 4.84, PFDR = 0.836
Cluster-forming threshold P = 0.01
Cluster-wise FDR: 1250 - 4380 voxel clusters, PFDR < 0.001
Cluster-wise FDR: 80 voxel cluster, PFDR = 0.516
Peak-wise FDR: t = 4.84, PFDR = 0.027
45
Conclusions
• Must account for multiplicity
– Otherwise have a fishing expedition
• FWER
– Very specific, not very sensitive
• FDR
– Less specific, more sensitive
– Sociological calibration still underway
56
References
• Most of this talk covered in these papers
TE Nichols & S Hayasaka, Controlling the Familywise
Error Rate in Functional Neuroimaging: A Comparative
Review. Statistical Methods in Medical Research, 12(5):
419-446, 2003.
TE Nichols & AP Holmes, Nonparametric Permutation
Tests for Functional Neuroimaging: A Primer with
Examples. Human Brain Mapping, 15:1-25, 2001.
CR Genovese, N Lazar & TE Nichols, Thresholding of
Statistical Maps in Functional Neuroimaging Using the
False Discovery Rate. NeuroImage, 15:870-878, 2002.
57
Backup Slides
• Permutation & FWE
58
Nonparametric Inference
• Parametric methods
– Assume distribution of
statistic under null
hypothesis
– Needed to find P-values, u
5%
Parametric Null Distribution
• Nonparametric methods
– Use data to find
distribution of statistic
under null hypothesis
– Any statistic!
5%
Nonparametric Null Distribution
59
Controlling FWE:
Permutation Test
• Parametric methods
– Assume distribution of
max statistic under null
hypothesis
• Nonparametric methods
5%
Parametric Null Max Distribution
– Use data to find
distribution of max statistic
5%
under null hypothesis
– Again, any max statistic! Nonparametric Null Max Distribution
60
RFT vs Bonf. vs Perm.
Verbal Fluency
Location Switching
Task Switching
Faces: Main Effect
Faces: Interaction
Item Recognition
Visual Motion
Emotional Pictures
Pain: Warning
Pain: Anticipation
df
4
9
9
11
11
11
11
12
22
22
t Threshold
(0.05 Corrected)
RF
Bonf
Perm
4701.32 42.59 10.14
11.17
9.07
5.83
10.79 10.35
5.10
10.43
9.07
7.92
10.70
9.07
8.26
9.87
9.80
7.67
11.07
8.92
8.40
8.48
8.41
7.15
5.93
6.05
4.99
5.87
6.05
5.05
RFT vs Bonf. vs Perm.
Verbal Fluency
Location Switching
Task Switching
Faces: Main Effect
Faces: Interaction
Item Recognition
Visual Motion
Emotional Pictures
Pain: Warning
Pain: Anticipation
df
4
9
9
11
11
11
11
12
22
22
RF
0
0
4
127
0
5
626
0
127
74
No. Significant Voxels
(0.05 Corrected)
t
SmVar t
Bonf Perm
Perm
0
0
0
0
158
354
6
2241
3447
371
917
4088
0
0
0
5
58
378
1260
1480
4064
0
0
7
116
221
347
55
182
402
Reliability with Small Groups
• Consider n=50 group study
– Event-related Odd-Ball paradigm, Kiehl, et al.
• Analyze all 50
– Analyze with SPM and SnPM, find FWE thresh.
• Randomly partition into 5 groups 10
– Analyze each with SPM & SnPM, find FWE
thresh
• Compare reliability of small groups with full
Skip
– With and without variance smoothing
.
63
SPM t11: 5 groups of 10 vs all 50
5% FWE Threshold
T>10.93
T>11.04
T>11.01
10 subj
10 subj
10 subj
2 8 11 15 18 35 41 43 44 50
1 3 20 23 24 27 28 32 34 40
9 13 14 16 19 21 25 29 30 45
T>10.69
T>10.10
T>4.66
10 subj
10 subj
all 50
4 5 10 22 31 33 36 39 42 47
6 7 12 17 26 37 38 46 48 49
64
SnPM t: 5 groups of 10 vs. all 50
5% FWE Threshold
T>7.06
T>8.28
10 subj
10 subj
2 8 11 15 18 35 41 43 44 50
T>6.49
10 subj
4 5 10 22 31 33 36 39 42 47
1 3 20 23 24 27 28 32 34 40
T>6.3
10 subj
9 13 14 16 19 21 25 29 30 45
T>6.19
T>9.00
T>4.09
10 subj
all 50
6 7 12 17 26 37 38 46 48 49
Arbitrary thresh 65of 9.0
SnPM SmVar t: 5 groups of 10 vs. all 50
5% FWE Threshold
T>4.69
T>5.04
T>4.57
10 subj
10 subj
10 subj
2 8 11 15 18 35 41 43 44 50
1 3 20 23 24 27 28 32 34 40
9 13 14 16 19 21 25 29 30 45
T>9.00
T>4.84
10 subj
4 5 10 22 31 33 36 39 42 47
T>4.64
10 subj
6 7 12 17 26 37 38 46 48 49
all 50
Arbitrary thresh 66of 9.0