Hierarchical Models and Variance Components Will Penny Wellcome Department of Imaging Neuroscience, University College London, UK SPM Course, London, May 2003


Hierarchical Models and
Variance Components
Will Penny
Wellcome Department of Imaging Neuroscience,
University College London, UK
SPM Course, London, May 2003
Outline

Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level)

General Framework
Multiple variance components and Hierarchical models

Multiple variance components
F-tests and conjunctions @2nd level
Modelling fMRI serial correlation @1st level

Hierarchical models for Bayesian Inference
SPMs versus PPMs
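Random Effects Analysis: Summary-Statistic Approach
[Figure: at the 1st level, each subject's Data and Design Matrix yield a Contrast Image with subject-wise estimates (indexed 1, 2, ..., 11, 12). At the 2nd level the Contrast Images enter a one-sample t-test @2nd level, giving SPM(t).]
$t = \hat{c} / \sqrt{\widehat{\mathrm{Var}}(\hat{c})}$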
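The 2nd-level one-sample t-test can be sketched for a single voxel as follows. This is a minimal illustration rather than SPM code: the function name `one_sample_t` and the contrast values are made up, and each entry of `con` is assumed to be one subject's first-level contrast estimate at that voxel.

```python
import numpy as np

def one_sample_t(con):
    """Return (t, degrees of freedom) for H0: population mean effect = 0."""
    con = np.asarray(con, dtype=float)
    n_subjects = con.size
    c_hat = con.mean()                         # estimated population mean effect
    var_c_hat = con.var(ddof=1) / n_subjects   # estimated variance of that mean
    return c_hat / np.sqrt(var_c_hat), n_subjects - 1

# Illustrative contrast values for 12 subjects at one voxel (made-up numbers).
con_values = [0.8, 1.1, 0.3, 0.9, 1.4, 0.2, 0.7, 1.0, 0.5, 1.2, 0.6, 0.9]
t_value, dof = one_sample_t(con_values)
```

Only the between-subject spread of the contrast values enters the test, which is what makes it a random effects analysis.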
Validity of approach

Gold Standard approach is EM (see later), which estimates
the population mean effect as MEAN_EM
the variance of this estimate as VAR_EM

For N subjects, n scans per subject and equal within-subject variance
we have
VAR_EM = Var-between/N + Var-within/(Nn)

In this case, the SS approach gives the same results, on average:
Avg[SS estimate of the population mean] = MEAN_EM
Avg[SS estimate of its variance] = VAR_EM

In other cases, with N~12, and typical ratios of between-subject to within-subject
variance found in fMRI, the SS approach will give very similar results to EM.
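The balanced-design claim can be checked with a small simulation. This is a sketch under stated assumptions, not the analysis from the slides: each subject's first-level estimate is taken to be the mean of n scans (so its noise variance is Var-within/n), and the function names, variance values and number of simulations are all illustrative.

```python
import numpy as np

def var_em(var_between, var_within, n_subjects, n_scans):
    """Variance of the population-mean estimate for a balanced design."""
    return var_between / n_subjects + var_within / (n_subjects * n_scans)

def simulate_summary_statistic(var_between, var_within, n_subjects, n_scans,
                               n_sims=5000, seed=0):
    """Average, over simulated studies, of the SS variance estimate."""
    rng = np.random.default_rng(seed)
    shape = (n_sims, n_subjects)
    subject_effects = rng.normal(0.0, np.sqrt(var_between), shape)
    # Assumption: first-level estimate = subject effect + mean of n noisy scans.
    first_level_noise = rng.normal(0.0, np.sqrt(var_within / n_scans), shape)
    con = subject_effects + first_level_noise
    ss_var = con.var(axis=1, ddof=1) / n_subjects  # squared standard error per study
    return ss_var.mean()

expected = var_em(1.0, 4.0, 12, 50)
average_ss = simulate_summary_statistic(1.0, 4.0, 12, 50)
```

With these settings `average_ss` should land close to `expected`, illustrating that the SS variance estimate is unbiased for VAR_EM when the design is balanced.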
Example: Multi-session study of
auditory processing
SS results
EM results
Friston et al. (2003) Mixed effects and fMRI studies, Submitted.
Two populations
Contrast images
Estimated
population
means
Two-sample
t-test @2nd level
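The General Linear Model
$y = X\theta + e$, where $y$ is $N \times 1$, $X$ is $N \times L$, $\theta$ is $L \times 1$ and $e$ is $N \times 1$
Error covariance ($N \times N$): $C_e = \sigma^2 I$
2 Basic Assumptions
Identity
Independence
We assume ‘sphericity’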
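Under sphericity the parameters can be estimated by ordinary least squares. The sketch below assumes a full-rank design matrix; the name `ols_fit` and the toy straight-line data are illustrative, not taken from the slides.

```python
import numpy as np

def ols_fit(y, X):
    """Ordinary least squares for y = X theta + e with Cov(e) = sigma^2 I."""
    N, L = X.shape
    XtX = X.T @ X
    theta_hat = np.linalg.solve(XtX, X.T @ y)       # parameter estimates
    resid = y - X @ theta_hat
    sigma2_hat = resid @ resid / (N - L)            # unbiased error variance
    cov_theta = sigma2_hat * np.linalg.inv(XtX)     # covariance of the estimates
    return theta_hat, sigma2_hat, cov_theta

# Toy example: a noise-free straight line, so the fit is exact.
X_demo = np.column_stack([np.ones(5), np.arange(5.0)])
y_demo = X_demo @ np.array([1.0, 2.0])
theta_demo, sigma2_demo, cov_demo = ols_fit(y_demo, X_demo)
```

The single scalar `sigma2_hat` is all that describes the error covariance here, which is exactly the restriction that multiple variance components relax.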
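Multiple variance components
$y = X\theta + e$, where $y$ is $N \times 1$, $X$ is $N \times L$, $\theta$ is $L \times 1$ and $e$ is $N \times 1$
Error covariance ($N \times N$): $C_e = \sum_k \lambda_k Q_k$
Errors can now have different variances and there can be correlations
We allow for ‘nonsphericity’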
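One way to picture an error covariance built as a weighted sum of basis matrices is the case of two groups with different variances. The sketch below is illustrative only: the helper names, the group sizes and the weights are assumptions chosen for the example.

```python
import numpy as np

def group_variance_basis(group_sizes):
    """One diagonal basis matrix Q_k per group (independent, non-identical errors)."""
    n_total = sum(group_sizes)
    Q = []
    start = 0
    for size in group_sizes:
        d = np.zeros(n_total)
        d[start:start + size] = 1.0     # ones on this group's diagonal entries
        Q.append(np.diag(d))
        start += size
    return Q

def error_covariance(weights, Q):
    """Weighted sum of the basis matrices Q_k."""
    return sum(w * Qk for w, Qk in zip(weights, Q))

Q_groups = group_variance_basis([12, 11])       # e.g. two groups of subjects
C_e = error_covariance([1.0, 2.5], Q_groups)    # made-up variance weights
```

Setting both weights equal recovers the spherical case, a multiple of the identity.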
Non-Sphericity
Error Covariance

Errors are independent
but not identical

Errors are not
independent
and not identical
General Framework
Hierarchical Models
$y = X^{(1)}\theta^{(1)} + e^{(1)}$
$\theta^{(1)} = X^{(2)}\theta^{(2)} + e^{(2)}$
$\vdots$
$\theta^{(n-1)} = X^{(n)}\theta^{(n)} + e^{(n)}$
Multiple variance components at each level:
$C_e^{(i)} = \sum_k \lambda_k^{(i)} Q_k^{(i)}$
With hierarchical models we can define priors and make Bayesian inferences.
If we know the variance components we can compute the distributions over the parameters at each level.
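Estimation
$y = X^{(1)}\theta^{(1)} + e^{(1)}$
$\theta^{(1)} = X^{(2)}\theta^{(2)} + e^{(2)}$
$\vdots$
$\theta^{(n-1)} = X^{(n)}\theta^{(n)} + e^{(n)}$
$C_e^{(i)} = \sum_k \lambda_k^{(i)} Q_k^{(i)}$
EM algorithm
E-Step
$C_{\theta|y} = (X^T C_e^{-1} X)^{-1}$
$\eta_{\theta|y} = C_{\theta|y} X^T C_e^{-1} y$
M-Step
$r = y - X\eta_{\theta|y}$
for i and j {
$g_i = -\tfrac{1}{2}\mathrm{tr}\{Q_i C_e^{-1}\} + \tfrac{1}{2} r^T C_e^{-1} Q_i C_e^{-1} r + \tfrac{1}{2}\mathrm{tr}\{C_{\theta|y} X^T C_e^{-1} Q_i C_e^{-1} X\}$
$J_{ij} = \tfrac{1}{2}\mathrm{tr}\{Q_j C_e^{-1} Q_i C_e^{-1}\}$
}
$\lambda = \lambda + J^{-1} g$
$C_e = \sum_k \lambda_k Q_k$
Friston, K. et al. (2002), Neuroimage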
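The E-step and M-step can be sketched for a single-level model as below. This is a minimal illustration, not the SPM implementation: the function name, the starting values and the fixed iteration count are assumptions, the gradient and curvature carry factors of one half (which cancel in the update), and nothing stops the weights becoming negative on awkward data.

```python
import numpy as np

def em_variance_components(y, X, Q, n_iter=50):
    """Estimate weights lam in C_e = sum_k lam[k] * Q[k] for y = X theta + e."""
    K = len(Q)
    lam = np.ones(K)                                 # assumed starting values
    for _ in range(n_iter):
        iCe = np.linalg.inv(sum(l * Qk for l, Qk in zip(lam, Q)))
        # E-step: conditional covariance and mean of the parameters
        C_post = np.linalg.inv(X.T @ iCe @ X)
        eta = C_post @ X.T @ iCe @ y
        # M-step: gradient g and curvature J with respect to lam
        r = y - X @ eta
        g = np.zeros(K)
        J = np.zeros((K, K))
        for i in range(K):
            A_i = iCe @ Q[i] @ iCe
            g[i] = (-0.5 * np.trace(Q[i] @ iCe)
                    + 0.5 * r @ A_i @ r
                    + 0.5 * np.trace(C_post @ X.T @ A_i @ X))
            for j in range(K):
                J[i, j] = 0.5 * np.trace(Q[j] @ iCe @ Q[i] @ iCe)
        lam = lam + np.linalg.solve(J, g)            # lam = lam + J^-1 g
    # Final E-step with the converged weights
    iCe = np.linalg.inv(sum(l * Qk for l, Qk in zip(lam, Q)))
    C_post = np.linalg.inv(X.T @ iCe @ X)
    eta = C_post @ X.T @ iCe @ y
    return lam, eta, C_post

# Toy check: with a single component Q = I the model is spherical.
rng = np.random.default_rng(0)
N = 20
X = np.column_stack([np.ones(N), np.arange(N)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=N)
lam, eta, C_post = em_variance_components(y, X, [np.eye(N)])
```

In this spherical toy case the estimated weight converges to the residual sum of squares divided by N minus the number of regressors, the familiar unbiased variance estimate.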
Algorithm Equivalence
Hierarchical model, estimated with Parametric Empirical Bayes (PEB):
$y = X^{(1)}\theta^{(1)} + e^{(1)}$
$\theta^{(1)} = X^{(2)}\theta^{(2)} + e^{(2)}$
$\vdots$
$\theta^{(n-1)} = X^{(n)}\theta^{(n)} + e^{(n)}$
Single-level model, estimated with Restricted Maximum Likelihood (ReML):
$y = e^{(1)} + X^{(1)}e^{(2)} + \dots + X^{(1)} \cdots X^{(n-1)} e^{(n)} + X^{(1)} \cdots X^{(n)} \theta^{(n)}$
EM = PEB = ReML
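For two levels, the collapse into a single-level model can be written out directly: second-level errors are projected into data space by the first-level design matrix. The sketch below is illustrative; the function name and the tiny 3-subject, 4-scan design are assumptions made for the example.

```python
import numpy as np

def collapse_two_level(X1, X2, Q1, Q2):
    """Rewrite a two-level model as one level.

    The collapsed model has design matrix X1 @ X2, and its error covariance
    is spanned by the level-1 bases plus the projected level-2 bases.
    """
    X = X1 @ X2
    Q = list(Q1) + [X1 @ Qk @ X1.T for Qk in Q2]
    return X, Q

# Tiny example: 3 subjects, 4 scans each, one mean per subject, one group mean.
n_subjects, n_scans = 3, 4
X1 = np.kron(np.eye(n_subjects), np.ones((n_scans, 1)))   # scans -> subject means
X2 = np.ones((n_subjects, 1))                              # subject means -> group mean
X_single, Q_single = collapse_two_level(
    X1, X2, [np.eye(n_subjects * n_scans)], [np.eye(n_subjects)])
```

The projected second-level component is block diagonal, with one block of ones per subject: scans from the same subject share that subject's random effect.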
Non-Sphericity
Error can be Independent but Non-Identical when…
1) One parameter but from different groups
e.g. patients and control groups
2) One parameter but design matrices differ across subjects
e.g. subsequent memory effect
Non-Sphericity
Error can be Non-Independent and Non-Identical when…
1) Several parameters per subject
e.g. Repeated Measurement design
2) Conjunction over several parameters
e.g. Common brain activity for different cognitive processes
3) Complete characterization of the hemodynamic response
e.g. F-test combining HRF, temporal derivative and dispersion regressors
Example I
U. Noppeney et al.
Stimuli: Auditory Presentation (SOA = 4 secs) of (i) words and (ii) words spoken backwards
Example stimuli: “click”, jump, touch, koob
Subjects: (i) 12 control subjects, (ii) 11 blind subjects
Scanning: fMRI, 250 scans per subject, block design
Q. What are the regions that activate for real words relative to
reverse words in both blind and control groups?
Independent but Non-Identical Error
[Figure: 1st Level contrast images for Controls and for Blinds; 2nd Level model for Controls and Blinds; result: Conjunction between the 2 groups.]
Example II
U. Noppeney et al.
Stimuli: Auditory Presentation (SOA = 4 secs) of words
Word categories and examples: motion (“jump”), sound (“click”), visual (“pink”), action (“turn”)
Subjects: (i) 12 control subjects
Scanning: fMRI, 250 scans per subject, block design
Q. What regions are affected by the semantic content of
the words?
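Non-Independent and Non-Identical Error
[Figure: 1st Level contrast images for visual, sound, hand and motion words; 2nd Level F-test asking whether the visual, sound, hand and motion effects are equal.]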
Example III
U. Noppeney et al.
Stimuli: (i) Sentences presented visually
(ii) False fonts (symbols)
Some of the sentences are syntactically primed
Scanning: fMRI, 250 scans per subject, block design
Q. Which brain regions of the “sentence reading system”
are affected by Priming?
Non-Independent and Non-Identical Error
[Figure: 1st Level orthogonal contrasts Sentence > Symbols and No-Priming > Priming; 2nd Level Conjunction of 2 contrasts; result: Left Anterior Temporal.]
Example IV
Modelling serial correlation in fMRI time series
Model errors for each subject
as AR(1) + white noise.
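An AR(1) plus white noise error model can be expressed with two covariance basis matrices, one for each part. The sketch below is an illustration only: the function name is hypothetical, and fixing the AR coefficient in advance (here at an arbitrary 0.2) is an assumption made so that only the two weights remain to be estimated.

```python
import numpy as np

def ar1_plus_white_basis(n_scans, a=0.2):
    """Return [Q_white, Q_ar1] for an n_scans-long time series.

    Q_white is the identity; Q_ar1[i, j] = a**|i - j| is the autocorrelation
    matrix of an AR(1) process with an assumed, fixed coefficient a.
    """
    idx = np.arange(n_scans)
    lags = np.abs(np.subtract.outer(idx, idx))   # |i - j| for every pair of scans
    return [np.eye(n_scans), a ** lags]

Q_serial = ar1_plus_white_basis(5, a=0.5)
# Made-up weights: 1.0 on the white component, 0.5 on the AR(1) component.
C_serial = 1.0 * Q_serial[0] + 0.5 * Q_serial[1]
```

The weights on the two matrices play the role of the variance components to be estimated for each subject.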
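Bayes Rule
Example 2: Univariate model
Likelihood and Prior
$y = \theta^{(1)} + e^{(1)}$
$\theta^{(1)} = \theta^{(2)} + e^{(2)}$
$p(y \mid \theta^{(1)}) = N(\theta^{(1)}, \lambda^{(1)})$
$p(\theta^{(1)}) = N(\theta^{(2)}, \lambda^{(2)})$
Posterior
$p(\theta^{(1)} \mid y) = N(m^{(1)}, P^{(1)})$
$P^{(1)} = \lambda^{(1)} + \lambda^{(2)}$
$m^{(1)} = \frac{\lambda^{(1)}}{P^{(1)}}\, y + \frac{\lambda^{(2)}}{P^{(1)}}\, \theta^{(2)}$
Here the second argument of $N(\cdot, \cdot)$ is a precision (inverse variance).
Relative Precision Weighting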
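Relative precision weighting is easy to see numerically: the posterior mean is a weighted average of the data and the prior mean, with weights given by their precisions. The function name and the numbers below are illustrative assumptions.

```python
def posterior_univariate(y, prior_mean, data_precision, prior_precision):
    """Posterior mean and precision for one Gaussian observation and a Gaussian prior."""
    post_precision = data_precision + prior_precision
    post_mean = (data_precision / post_precision) * y \
        + (prior_precision / post_precision) * prior_mean
    return post_mean, post_precision

# Equal precisions: the posterior mean sits halfway between data and prior.
m_equal, P_equal = posterior_univariate(2.0, 0.0, 1.0, 1.0)
# Precise data: the posterior mean moves most of the way to the observation.
m_precise, P_precise = posterior_univariate(2.0, 0.0, 9.0, 1.0)
```

With a prior mean of zero the second term vanishes and the posterior mean is simply the observation shrunk towards zero.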
Example 2: Univariate model
AIM: Make inferences based on posterior distribution
Likelihood and Prior
$y = \theta^{(1)} + e^{(1)}$
$\theta^{(1)} = \theta^{(2)} + e^{(2)}$
$p(y \mid \theta^{(1)}) = N(\theta^{(1)}, \lambda^{(1)})$
$p(\theta^{(1)}) = N(\theta^{(2)}, \lambda^{(2)})$
Posterior
$p(\theta^{(1)} \mid y) = N(m^{(1)}, P^{(1)})$
$P^{(1)} = \lambda^{(1)} + \lambda^{(2)}$
$m^{(1)} = \frac{\lambda^{(1)}}{P^{(1)}}\, y + \frac{\lambda^{(2)}}{P^{(1)}}\, \theta^{(2)}$
Similar expressions exist for posterior distributions in multivariate models
But how do we compute the variance components or ‘hyperparameters’?
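Estimation
$y = X^{(1)}\theta^{(1)} + e^{(1)}$
$\theta^{(1)} = X^{(2)}\theta^{(2)} + e^{(2)}$
$\vdots$
$\theta^{(n-1)} = X^{(n)}\theta^{(n)} + e^{(n)}$
$C_e^{(i)} = \sum_k \lambda_k^{(i)} Q_k^{(i)}$
EM algorithm
E-Step
$C_{\theta|y} = (X^T C_e^{-1} X)^{-1}$
$\eta_{\theta|y} = C_{\theta|y} X^T C_e^{-1} y$
M-Step
$r = y - X\eta_{\theta|y}$
for i and j {
$g_i = -\tfrac{1}{2}\mathrm{tr}\{Q_i C_e^{-1}\} + \tfrac{1}{2} r^T C_e^{-1} Q_i C_e^{-1} r + \tfrac{1}{2}\mathrm{tr}\{C_{\theta|y} X^T C_e^{-1} Q_i C_e^{-1} X\}$
$J_{ij} = \tfrac{1}{2}\mathrm{tr}\{Q_j C_e^{-1} Q_i C_e^{-1}\}$
}
$\lambda = \lambda + J^{-1} g$
$C_e = \sum_k \lambda_k Q_k$
Friston, K. et al. (2002), Neuroimage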
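Estimating mean and variance
Here $\mu$ is the mean and $\lambda$ the noise precision (inverse variance).
Maximum Likelihood (ML), maximises $p(Y \mid \mu, \lambda)$
$\mu = \frac{1}{N}\sum_{n=1}^{N} y_n$
$\frac{1}{\lambda} = \frac{1}{N}\sum_{n=1}^{N} (y_n - \mu)^2$
Expectation-Maximisation (EM), maximises $p(Y \mid \lambda) = \int p(Y \mid \mu, \lambda)\, p(\mu)\, d\mu$
$\mu = \frac{1}{N}\sum_{n=1}^{N} y_n$ for ‘vague’ prior on $\mu$
$\frac{1}{\lambda} = \frac{1}{N-1}\sum_{n=1}^{N} (y_n - \mu)^2$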
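Estimating mean and variance
For a prior on $\mu$ with prior mean 0 and prior precision $\alpha$,
Expectation-Maximisation (EM) gives
$\mu = \frac{\gamma}{N}\sum_{n=1}^{N} y_n$
$\frac{1}{\lambda} = \frac{1}{N-\gamma}\sum_{n=1}^{N} (y_n - \mu)^2$
where $\lambda$ is the noise precision and
$0 \le \gamma = \frac{N\lambda}{N\lambda + \alpha} \le 1$
Larger $\alpha$ more shrinkage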
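The shrinkage estimates for a single mean can be obtained by iterating the updates until they settle. The following is a sketch under assumptions: the prior precision `alpha` is treated as known, the function name, starting value and iteration count are illustrative, and the data are simulated.

```python
import numpy as np

def em_shrinkage(y, alpha, n_iter=200):
    """Iterate the mean and noise-precision updates for a zero-mean prior on the mean."""
    y = np.asarray(y, dtype=float)
    N = y.size
    y_bar = y.mean()
    lam = 1.0 / y.var()                       # ML starting value for the noise precision
    for _ in range(n_iter):
        gamma = N * lam / (N * lam + alpha)   # shrinkage factor, between 0 and 1
        mu = gamma * y_bar                    # mean shrunk towards the prior mean 0
        lam = (N - gamma) / np.sum((y - mu) ** 2)
    return mu, lam, gamma

rng = np.random.default_rng(1)
y_obs = 1.0 + rng.normal(size=30)             # simulated data with true mean 1
mu_vague, lam_vague, gamma_vague = em_shrinkage(y_obs, alpha=1e-9)
mu_shrunk, lam_shrunk, gamma_shrunk = em_shrinkage(y_obs, alpha=5.0)
```

With a nearly flat prior the shrinkage factor approaches 1, so the estimates reduce to the sample mean and the N-1 variance estimate; a larger `alpha` pulls the mean towards zero.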
Estimating mean and variance at multiple voxels
For a prior on $\mu$ over voxels with prior mean 0 and prior precision $\alpha$,
Expectation-Maximisation (EM) gives at voxel i=1..V, scan n=1..N
$\mu_i = \frac{\gamma_i}{N}\sum_{n=1}^{N} y_{i,n}$
$\frac{1}{\lambda_i} = \frac{1}{N-\gamma_i}\sum_{n=1}^{N} (y_{i,n} - \mu_i)^2$
where $\lambda_i$ is the noise precision at voxel i and
$0 \le \gamma_i = \frac{N\lambda_i}{N\lambda_i + \alpha} \le 1$
$\frac{1}{\alpha} = \frac{1}{\sum_i \gamma_i}\sum_{i=1}^{V} \mu_i^2$
Prior precision can be estimated from data. If mean activation over
all voxels is 0 then these EM estimates are more accurate than ML.
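With many voxels the prior precision can itself be updated from the data. The sketch below simulates voxels whose true means are drawn around zero and compares the shrunk estimates with the plain voxel-wise means. The function name, simulation settings and starting values are assumptions, and the loop has no safeguard for the degenerate case where every estimated mean collapses to zero.

```python
import numpy as np

def em_shrinkage_voxels(Y, n_iter=200):
    """Y has shape (V voxels, N scans). Returns mu, lam, alpha, gamma."""
    V, N = Y.shape
    y_bar = Y.mean(axis=1)
    lam = 1.0 / Y.var(axis=1)          # voxel-wise ML starting values
    alpha = 1.0                        # assumed starting prior precision
    for _ in range(n_iter):
        gamma = N * lam / (N * lam + alpha)
        mu = gamma * y_bar
        lam = (N - gamma) / np.sum((Y - mu[:, None]) ** 2, axis=1)
        alpha = gamma.sum() / np.sum(mu ** 2)   # 1/alpha = sum(mu^2) / sum(gamma)
    return mu, lam, alpha, gamma

# Simulated data: 1000 voxels, 20 scans, true means scattered around zero.
rng = np.random.default_rng(0)
V, N = 1000, 20
true_mu = rng.normal(0.0, 0.3, size=V)
Y = true_mu[:, None] + rng.normal(0.0, 1.0, size=(V, N))

mu_em, lam_em, alpha_em, gamma_em = em_shrinkage_voxels(Y)
mse_em = np.mean((mu_em - true_mu) ** 2)            # error of shrunk estimates
mse_ml = np.mean((Y.mean(axis=1) - true_mu) ** 2)   # error of plain voxel means
```

In this simulation `mse_em` comes out smaller than `mse_ml`, in line with the claim that the EM estimates are more accurate when activations average to zero over voxels.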
The Interface
[Figure: interface options. WLS: Parameters. REML: Hyperparameters. PEB: Parameters and Hyperparameters. Labels: No Priors, Shrinkage priors.]
Bayesian Inference
1st level = within-voxel
$y = X^{(1)}\theta^{(1)} + e^{(1)}$ (Likelihood)
2nd level = between-voxels
$\theta^{(1)} = X^{(2)}\theta^{(2)} + e^{(2)}$ (Shrinkage Prior)
In the absence of evidence to the contrary, parameters will shrink to zero
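Bayesian Inference: Posterior Probability Maps
PPMs: Posterior ∝ Likelihood × Prior
$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$
SPMs: $t = f(y)$, compared with a threshold $u$ under $p(t \mid \theta = 0)$
[Figure: posterior density $p(\theta \mid y)$ for PPMs; null density $p(t \mid \theta = 0)$ with threshold $u$ for SPMs.]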
SPMs and PPMs
[Figure: two results displays from SPMresults: C:\home\spm\analysis_PET, each with a maximum intensity projection (SPMmip [0, 0, 0]), the contrast(s) and the Design matrix. Left: PPM 2.06 for contrast "rest [2.06]", Height threshold P = 0.95, Extent threshold k = 0 voxels. Right: SPM{T39.0}, Height threshold T = 5.50, Extent threshold k = 0 voxels.]
PPMs: Show activations of a given size
SPMs: show voxels with non-zero activations
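The idea of showing activations of a given size can be sketched as a probability computed from the posterior at one voxel. This is an illustration under assumptions: the posterior is taken to be Gaussian, the function names are hypothetical, and the effect size 2.06 and probability threshold 0.95 are borrowed from the figure labels.

```python
import math

def posterior_prob_exceeds(post_mean, post_var, effect_size):
    """P(effect > effect_size | data) under an assumed Gaussian posterior."""
    z = (effect_size - post_mean) / math.sqrt(post_var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of the normal

def in_ppm(post_mean, post_var, effect_size=2.06, prob_threshold=0.95):
    """True if the voxel would be shown in a PPM at these thresholds."""
    return posterior_prob_exceeds(post_mean, post_var, effect_size) > prob_threshold

# Made-up voxels: a large, precisely estimated effect, and a borderline one.
shown = in_ppm(post_mean=3.0, post_var=0.04)
hidden = in_ppm(post_mean=2.1, post_var=1.0)
```

Unlike a t threshold, this criterion asks about the size of the effect itself, so a tiny but very reliable effect would not pass it.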
PPMs
Advantages
One can infer a cause DID NOT elicit a response
SPMs conflate effect-size and effect-variability
P-values don’t change with search volume
For reasonable thresholds have intrinsically high specificity
Disadvantages
Use of shrinkage priors over voxels is computationally demanding
Utility of Bayesian approach is yet to be established
Summary

Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level)

Multiple variance components
F-tests and conjunctions @2nd level
Modelling fMRI serial correlation @1st level

Hierarchical models for Bayesian Inference
SPMs versus PPMs

General Framework
Multiple variance components and Hierarchical models