PowerPoint Presentation - Computer Science Department


Computational Biology VI
Jianfeng Feng
Warwick University
Our module
Data Processing Pipeline
[pipeline diagram] Experimental Design → Raw Data Acquisition → Reconstruction → Preprocessing (Slice-time Correction; Motion Correction, Co-registration & Normalization; Spatial Smoothing) → Data Analysis (Localizing Brain Activity; Linear Model; Granger causality; Connectivity) → Applications: Clinical
Outline
1. The General Linear Model (GLM)
2. Model building
3. Noise models
4. Inference
Multiple-comparison correction (after next week)
1: The General Linear Model
Statistical Analysis
• There are multiple goals in the statistical
analysis or machine learning side of
fMRI data.
• They include:
– localizing brain areas activated by the task;
– determining networks corresponding to brain
function; and
– making predictions about psychological or disease
states.
Human Brain Mapping
• The most common use of fMRI to date has been
to localize areas of the brain that activate in
response to a certain task.
• These types of human brain
mapping studies are necessary
for the development of
biomarkers and increasing our
understanding of brain
function.
Massive Univariate Approach
• Typically analysis is performed by constructing a
separate model at each voxel
– The ‘massive univariate approach’.
– Assumes an improbable independence between voxel pairs.
• Typically dependencies between voxels are dealt
with later using random field theory, which makes
assumptions about the spatial dependencies
between voxels.
General Linear Model
• The general linear model (GLM) approach treats
the data as a linear combination of model
functions (predictors) plus noise (error).
• The model functions are assumed to have known
shapes, but their amplitudes are unknown and
need to be estimated.
y(t) = b0 + b1 f1(t) + … + bp fp(t) + e(t)
• The GLM framework encompasses many of the
commonly used techniques in fMRI data analysis
(and big data analysis more generally).
Illustration
• Consider an experiment of alternating blocks of
finger-tapping and rest.
• Construct a model to study data from a single
voxel for a single subject.
• We seek to determine
whether activation is higher
during finger-tapping
compared with rest.
fMRI Data
[slide figure] The BOLD signal (fMRI data) is modelled as the design matrix X times the model parameters, plus residuals:

    [ y_1   ]   [ 1 0 ]
    [  ⋮    ]   [ ⋮ ⋮ ]
    [ y_300 ] = [ 1 1 ] × [ β0 ] + [ e_1 ]
    [ y_301 ]   [ 1 1 ]   [ β1 ]   [  ⋮  ]
    [  ⋮    ]   [ ⋮ ⋮ ]            [ e_N ]
    [ y_N   ]   [ 1 0 ]

The first column of X is the intercept; the second column is the predicted task response, equal to 0 during rest and 1 during the finger-tapping blocks (rows such as 300–301 fall in a tapping block and have entries [1 1]). Row by row, y_i = β0 + β1 x_i + e_i.
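As a concrete illustration, the following MATLAB sketch builds such a two-column design matrix for an alternating block design and fits β0 and β1 by least squares; the number of scans, block lengths and noise level are made-up values, not those behind the slide.

% Simulate a block-design voxel time series and fit the two-parameter GLM.
% All timing and amplitude values below are illustrative assumptions.
N   = 200;                            % number of scans
x   = zeros(N,1);                     % predicted task response (boxcar)
for blk = 0:20:180                    % alternate 10 scans rest, 10 scans tapping
    x(blk+11 : blk+20) = 1;           % 1 during finger-tapping, 0 during rest
end
X   = [ones(N,1) x];                  % design matrix: intercept + task regressor
beta_true = [100; 2];                 % assumed true intercept and activation
y   = X*beta_true + 5*randn(N,1);     % simulated BOLD signal with white noise
beta_hat = X \ y;                     % least-squares estimate of [b0; b1]
fprintf('b0 = %.2f, b1 = %.2f\n', beta_hat(1), beta_hat(2));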
In simple formula: y(t) = b0 + b1 f1(t) + e(t); that is, fMRI data = design matrix × model parameters + residuals.
The question of whether activation is higher during finger-tapping than during rest becomes a test of the null hypothesis H0: b1 = 0 on the task-response parameter.
GLM
A standard GLM can be written:
y(t) = b0 + b1 f1(t) + … + bp fp(t) + e(t), or in matrix form
Y = Xβ + ε,   ε ~ N(0, V)
where

    [ Y_1 ]   [ 1 X_11 ⋯ X_1p ]   [ β_0 ]   [ ε_1 ]
    [ Y_2 ] = [ 1 X_21 ⋯ X_2p ] × [ β_1 ] + [ ε_2 ]
    [  ⋮  ]   [ ⋮   ⋮       ⋮  ]   [  ⋮  ]   [  ⋮  ]
    [ Y_T ]   [ 1 X_T1 ⋯ X_Tp ]   [ β_p ]   [ ε_T ]

    fMRI data = design matrix × regression coefficients + noise

V is the covariance matrix, whose form depends on the noise model. The quality of the model depends on our choice of X and V.
In matrix format (lecture 4, time series): for the k-dimensional case
Y(t) = A Y(t−1) + e(t)
with estimate
A = Y[T] Y[T−1]' ( Y[T−1] Y[T−1]' )⁻¹
where, for recorded data (Y(1), Y(2), Y(3), …, Y(T)), we denote

    Y[T]   = [ y_1(2) y_1(3) … y_1(T)
               …      …      …
               y_k(2) y_k(3) … y_k(T) ]

    Y[T−1] = [ y_1(1) y_1(2) … y_1(T−1)
               …      …      …
               y_k(1) y_k(2) … y_k(T−1) ]
Problem Formulation
• Assume the model:
Y = Xβ + ε,   ε ~ N(0, σ²I)
• The matrices X and Y are assumed to be known, and the noise is considered to be uncorrelated.
• Our goal is to find the value of β that minimizes:
(Y − Xβ)ᵀ(Y − Xβ)
OLS Solution
Ordinary least squares solution:
β̂ = (XᵀX)⁻¹Xᵀy
Can we do better?
Properties:
• It is the maximum likelihood estimate.
• E[β̂] = β (unbiased).
• Var(β̂) = σ²(XᵀX)⁻¹.
Gauss Markov Theorem
• The Gauss-Markov theorem states that any other linear unbiased estimator of β will have a variance at least as large as that of the OLS solution.
• Assume β̃ is a linear unbiased estimator of β. Then, according to the G-M theorem,
Var(β̃) ≥ Var(β̂)
• β̂ is the best linear unbiased estimator (BLUE) of β.
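These optimality properties can be checked by simulation. The MATLAB sketch below (illustrative only, with an arbitrary design and parameter values) compares the empirical mean and covariance of β̂ over repeated noise draws with β and σ²(XᵀX)⁻¹.

% Empirical check of E[beta_hat] = beta and Var(beta_hat) = sigma^2*inv(X'X).
rng(0);                                  % reproducible simulation
N = 100;  sigma = 2;
X = [ones(N,1) randn(N,1)];              % arbitrary design: intercept + one regressor
beta = [1; 0.5];                         % assumed true parameters
B = zeros(2, 5000);                      % OLS estimates across 5000 noise draws
for s = 1:5000
    y = X*beta + sigma*randn(N,1);
    B(:,s) = (X'*X) \ (X'*y);            % OLS estimate for this draw
end
disp(mean(B, 2));                        % close to beta (unbiasedness)
disp(cov(B'));                           % close to sigma^2 * inv(X'*X)
disp(sigma^2 * inv(X'*X));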
Estimation
• If ε is i.i.d., then the Ordinary Least Squares (OLS) estimate is optimal:
model: Y = Xβ + ε      estimate: β̂ = (X'X)⁻¹X'Y
• If Var(ε) = Vσ² ≠ Iσ², then the Generalized Least Squares (GLS) estimate is optimal:
model: Y = Xβ + ε      estimate: β̂ = (X'V⁻¹X)⁻¹X'V⁻¹Y
Summary
model: Y = Xβ + ε
estimate: β̂ = (X'V⁻¹X)⁻¹X'V⁻¹Y
fitted values: Ŷ = Xβ̂
residuals: r = Y − Ŷ = (I − X(X'V⁻¹X)⁻¹X'V⁻¹)Y = RY
Estimating the Variance
• Even if we assume ε is i.i.d., we still need to estimate the residual variance σ².
• For OLS:
σ̂² = rᵀr / (N − p)
• Estimating V ≠ I is more difficult and requires iterative methods.
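A minimal MATLAB sketch of the OLS variance estimate (all simulation values are assumptions):

% Estimate the residual variance from OLS residuals: sigma2_hat = r'r / (N - p).
N = 200;
x = mod(floor((0:N-1)'/10), 2);          % alternating 10-scan boxcar regressor
X = [ones(N,1) x];                       % intercept + task
y = X*[100; 2] + 5*randn(N,1);           % simulated data, true sigma = 5
p = size(X,2);                           % number of model parameters
beta_hat   = (X'*X) \ (X'*y);            % OLS estimate
r          = y - X*beta_hat;             % residuals
sigma2_hat = (r'*r) / (N - p);           % unbiased estimate of sigma^2
fprintf('estimated sigma = %.2f (true value 5)\n', sqrt(sigma2_hat));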
Model Refinement
• This model has a number of shortcomings.
• We want to use our understanding of the
signal and noise properties of BOLD fMRI to
aid us in constructing appropriate models.
• This includes deciding on an appropriate
design matrix, as well as an appropriate noise
model.
Issues
1. BOLD responses have a delayed and dispersed form. (The MATLAB snippet below, from the slide, sketches this shape: a gamma-like peak followed by a small undershoot.)

% Toy hemodynamic response: a gamma-like peak, x.^4 .* exp(-x), with a
% delayed, scaled copy subtracted to mimic the post-stimulus undershoot.
clear all
close all
h = 0.1;                 % time step
x = (1:400) * h;         % time axis
y = x.^4 .* exp(-x);     % gamma-like response
for i = 41:350           % subtract a copy delayed by 40 samples, scaled by 0.2
    y(i) = y(i) - 0.2 * y(i-40);
end
plot(x(1:300), y(1:300) / max(y))   % plot the normalized response

2. The fMRI signal includes substantial amounts of low-frequency noise.
3. The data are serially correlated, which needs to be considered in the model.
2: Model Building
General Linear Model
Recall the standard GLM:
Y = Xβ + ε,   ε ~ N(0, V)
with Y the fMRI data, X the design matrix, β the model parameters and ε the noise. V is the covariance matrix, whose form depends on the noise model; the quality of the model depends on our choice of X and V.
Model Building
• Proper construction of the design matrix is critical for effective use of
the GLM.
• This process can be complicated by the following properties of the
BOLD response:
– It includes low-frequency noise and artifacts related to
head movement and cardiopulmonary-induced brain
movement.
– The neural response shape may not be known.
– The hemodynamic response varies in shape across the
brain.
BOLD Response
• Predict the shape of the BOLD response to a
given stimulus pattern. Assume the shape is
known and the amplitude is unknown.
• The relationship between stimuli and the BOLD
response is typically modeled using a linear time
invariant (LTI) system.
• In an LTI system an impulse (i.e., neuronal
activity) is convolved with an impulse response
function (i.e., HRF).
Convolution: Examples
[figure] The experimental stimulus function is convolved with the hemodynamic response function to give the predicted response (block-design example shown).
HRF Models
• Often a fixed canonical HRF is used to model the response to neuronal activity:
– a linear combination of 2 gamma functions;
– optimal if correct;
– if wrong, leads to bias and power loss.
• It is unlikely that the same HRF is valid for all voxels: the true response may be faster or slower, and may have a smaller or bigger undershoot.
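One common concrete choice is a difference of two gamma densities: a large early peak minus a smaller, later undershoot. The MATLAB sketch below builds such a canonical-style HRF and convolves it with a block stimulus; the parameter values and block timings are assumptions, loosely following commonly used defaults, not values taken from the slides.

% Canonical-style HRF: difference of two gamma densities, then convolution
% of a boxcar stimulus with the HRF. Parameter values are assumptions.
TR  = 1;                                  % repetition time (s), assumed
t   = (0:TR:30)';                         % HRF support, 0-30 s
a1  = 6;  a2 = 16;  b = 1;  c = 1/6;      % peak delay, undershoot delay, scale, ratio (assumed)
gam = @(t,a,b) (t.^(a-1) .* exp(-t/b)) ./ (b^a * gamma(a));   % gamma density
hrf = gam(t,a1,b) - c*gam(t,a2,b);        % peak minus undershoot
hrf = hrf / max(hrf);                     % normalize to unit peak

stim = zeros(200,1);                      % 200-scan run
stim(21:40) = 1;  stim(101:120) = 1;      % two 20-scan task blocks (assumed timing)
pred = conv(stim, hrf);                   % LTI prediction: stimulus convolved with HRF
pred = pred(1:numel(stim));               % truncate to the run length
plot((0:199)*TR, [stim pred]);            % stimulus and predicted BOLD response
legend('stimulus', 'predicted response');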
Recap: fMRI data (BOLD signal) = design matrix (intercept + predicted task response) × model parameters (β0, β1) + residuals, with the task effect tested via H0: b1 = 0.
Example
[figure] A single-HRF model: the image of the predictors and the data with fitted values.
Example
In mathematical terms, with f1(t) the predicted task response, we have
y(t) = b0 + b1 f1(t) + e(t), or Y = Xβ + ε,
and the task effect is tested via H0: b1 = 0.
Problems
The HRF shape depends both on the vasculature and the time course of neural activity, so assuming a fixed HRF is usually not appropriate.
[figure] Measured responses to a checkerboard, thermal pain, an aversive picture and aversive anticipation (stimulus onset marked) differ markedly in shape.
Temporal Basis Functions
• To allow for different types of HRFs in different
brain regions, it is typically better to use temporal
basis functions.
• A linear combination of functions can be used to
account for delays and dispersions in the HRF.
– The stimulus function is convolved with each of the
basis functions to give a set of regressors.
– The parameter estimates give the coefficients that
determine the combination of basis functions that best
models the HRF for the trial type and voxel in question.
Temporal Basis Functions
• In an LTI system the BOLD response is modeled as
x(t) = (s ∗ h)(t)
where s(t) is the stimulus function, h(t) the HRF, and ∗ denotes convolution.
• Model the HRF as a linear combination of temporal basis functions fi(t), such that
h(t) = Σi bi fi(t)
Temporal Basis Functions
[figure] h(t) = b1 f1(t) + b2 f2(t) + b3 f3(t): a weighted sum of three basis functions reproduces the HRF shape.
Temporal Basis Functions
• The BOLD response can be rewritten:
x(t) = Σi bi (s ∗ fi)(t)
• In the GLM framework the convolution of the stimulus function with each basis function makes up a separate column of the design matrix (sketched below).
• Each corresponding bi describes the weight of that component.
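As a sketch of this construction (illustrative MATLAB; the toy basis set and the block timings are assumptions, not a recommended basis), each column of the task part of the design matrix is the stimulus convolved with one basis function.

% Build design-matrix columns by convolving the stimulus with each basis function.
T  = 200;                                % number of scans (TR = 1 s assumed)
s  = zeros(T,1);                         % boxcar stimulus function
s(21:40) = 1;  s(101:120) = 1;           % assumed block timing
t  = (0:30)';                            % basis support (s)
f1 = t.^5 .* exp(-t);  f1 = f1/max(f1);  % toy HRF-shaped basis function
f2 = [diff(f1); 0];                      % its temporal derivative (finite difference)
F  = [f1 f2];                            % basis set, one basis function per column
Xtask = zeros(T, size(F,2));
for i = 1:size(F,2)
    col = conv(s, F(:,i));               % (s * f_i)(t)
    Xtask(:,i) = col(1:T);               % truncate to the run length
end
X = [ones(T,1) Xtask];                   % design matrix: intercept + basis regressors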
Temporal Basis Functions
• Typically-used models vary in the degree they make a priori
assumptions about the shape of the response.
• In the most extreme case, the shape of the HRF is fixed and
only the amplitude is allowed to vary.
• By contrast, a finite impulse response (FIR) basis set contains one free parameter for every time point following stimulation, for every cognitive event type.
Finite Impulse Response
The model estimates an HRF of arbitrary shape for each event type, in each voxel of the brain (a sketch of an FIR design matrix is given below).
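A minimal sketch of an FIR design matrix, assuming arbitrary event onsets and a 15-point post-stimulus window: one column per post-stimulus time point, each a lagged copy of the onset vector.

% FIR design matrix: one regressor per post-stimulus time point.
% Onset times and window length are illustrative assumptions.
T      = 200;                            % number of scans
onsets = [21 61 101 141];                % event onsets in scans (assumed)
L      = 15;                             % FIR window: 15 post-stimulus time points
u      = zeros(T,1);  u(onsets) = 1;     % stick (impulse) function of the onsets
Xfir   = zeros(T, L);
for k = 1:L
    Xfir(k:end, k) = u(1:end-k+1);       % column k = onsets shifted by (k-1) scans
end
% Regressing y on [ones(T,1) Xfir] gives L free parameters: an estimate of the
% response at each post-stimulus time point, i.e. an HRF of arbitrary shape.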
Basis sets
[figure] Comparison of a single HRF, HRF + derivatives, and finite impulse response (FIR) models: the image of the predictors over time (s) and the data with fitted values.
Basis sets
[figure, repeated] The FIR fit follows the data very closely. Overfitting?
Nuisance Covariates
• Factors associated with known sources of variability, but not related to the experimental hypothesis, often need to be included in the GLM.
• Examples of possible 'nuisance regressors':
– Signal drift
– Physiological (e.g., respiration) artifacts
– Head motion, e.g. six regressors comprising three translations and three rotations; sometimes transformations of the six regressors are also included.
Drift
• Slow changes in voxel intensity over time (low-frequency noise) are present in the fMRI signal.
• Scanner instabilities, rather than motion or physiological noise, may be the main cause of the drift, as drift has been observed in cadavers.
• We need to include drift parameters in our models:
– use splines, a polynomial basis or a discrete cosine basis.
Drift
[figure] The blue curve is the data; the red curve is what we expected.
Model with Drift
• Select the drift basis functions fi as cos(iωt) and sin(iωt), for example (as in MP3, only the cosine terms are used).
[figure] Y = Xβ + e, where the design matrix X contains the task regressor together with the drift basis functions, and β = (β1, …, β11) collects the corresponding coefficients.
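A sketch of how such drift regressors can be generated with a discrete cosine basis (illustrative MATLAB; the number of components is an assumption):

% Discrete cosine drift basis: slow cosines added as columns of the design matrix.
T = 200;                                 % number of scans
n = (0:T-1)';                            % scan index
K = 5;                                   % number of low-frequency components (assumed)
D = zeros(T, K);
for i = 1:K
    D(:,i) = cos(pi * i * (n + 0.5) / T);    % discrete cosine basis function of order i
    D(:,i) = D(:,i) / norm(D(:,i));          % unit-normalize the column
end
% Appending D (and a constant column) to the design matrix lets the GLM absorb
% slow drift; in practice K is tied to a chosen high-pass cut-off period.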
High Pass Filtering
[figure] Blue = data; black = mean + low-frequency drift; green = predicted response, taking the low-frequency drift into account; red = predicted response with the low-frequency drift explained away.
Physiological Noise
• Respiration and heartbeat give rise to high-frequency noise.
• This type of noise is difficult to remove and is often left in the data, giving rise to temporal autocorrelation.
3: Noise Models
GLM
Recall that a standard GLM can be written:
Y = Xβ + ε,   ε ~ N(0, V)
with Y the fMRI data (n time points), X the design matrix, β the regression coefficients and ε the noise. V is the covariance matrix, whose form depends on the noise model; the quality of the model depends on our choice of X and V.
Design Matrix
• We have previously discussed various signal and nuisance components that can be included in the design matrix to improve the model:
– temporal basis functions, which allow for a flexible HRF;
– motion parameters, which correct for 'spin history' artifacts.
fMRI Noise
• Functional MRI data typically exhibit significant autocorrelation.
– It is caused by physiological noise and low-frequency drift that has not been appropriately modeled.
– It is typically modeled using an AR(p) process.
– Single-subject statistics are not valid without an accurate model of the noise.
AR(1) model
• Serial correlation can be modeled using a first-order autoregressive model, i.e.
e(t) = φ e(t−1) + u(t),   u(t) ~ N(0, σ²)
• The error term e(t) depends on the previous error term e(t−1) and a new disturbance term u(t).
AR(1) model
• The autocorrelation function (ACF) of an AR(1) process is
ρ(h) = 1 if h = 0, and ρ(h) = φ^|h| if h ≠ 0.
[figure] ACF of an AR(1) process with φ = 0.7, lags 1–16.
Error Term
• The format of V will depend on what noise
model is used.
IID Case
V ∝ I, the identity matrix: ones on the diagonal and zeros everywhere else.
AR(1) Case
V ∝ the matrix with (i, j) entry φ^|i−j|: ones on the diagonal, φ on the first off-diagonals, φ² on the second, and so on up to φ^(n−1) in the corners.
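The MATLAB sketch below (illustrative only, with φ chosen arbitrarily) builds this matrix of entries φ^|i−j| and, as a check, simulates an AR(1) series and compares its empirical lag-1 correlation with φ.

% AR(1) noise: build V with entries phi^|i-j| and simulate a series to compare.
phi = 0.4;  n = 200;  sigma = 1;          % illustrative parameter choices
[i, j] = ndgrid(1:n, 1:n);
V = phi .^ abs(i - j);                    % AR(1) correlation structure, V(i,j) = phi^|i-j|

e = zeros(n,1);
e(1) = sigma * randn / sqrt(1 - phi^2);   % draw e(1) from the stationary distribution
for t = 2:n
    e(t) = phi * e(t-1) + sigma * randn;  % e(t) = phi*e(t-1) + u(t)
end
c = corrcoef(e(1:end-1), e(2:end));
fprintf('empirical lag-1 correlation = %.2f (phi = %.2f)\n', c(1,2), phi);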
GLM Summary
model: Y = Xβ + ε
estimate: β̂ = (X'V⁻¹X)⁻¹X'V⁻¹Y
fitted values: Ŷ = Xβ̂
residuals: r = Y − Ŷ = (I − X(X'V⁻¹X)⁻¹X'V⁻¹)Y = RY
Estimating V
• In general the form of the covariance matrix is
unknown, which means it has to be
estimated.
• Estimating V depends on the β's, and estimating the β's depends on V, so an iterative procedure is needed.
Iterative Procedure
1. Assume that V = I and calculate the OLS solution.
2. Estimate the parameters of V using the residuals.
3. Re-estimate the β values using the estimated covariance matrix V̂ from step 2.
4. Iterate until convergence.
A sketch of this loop is given below.
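A minimal MATLAB sketch of the loop, assuming an AR(1) noise model and using the lag-1 autocorrelation of the residuals as a crude estimate of φ (real packages use more careful estimators):

% Iterative estimation of beta and an AR(1) covariance matrix V (simulated data).
n = 200;
X = [ones(n,1) mod(floor((0:n-1)'/10), 2)];     % intercept + boxcar task regressor
phi_true = 0.3;
e = filter(1, [1 -phi_true], randn(n,1));       % AR(1) noise: e(t) = phi*e(t-1) + u(t)
y = X*[10; 1] + e;                              % simulated voxel data

[i, j] = ndgrid(1:n, 1:n);
V = eye(n);                                     % step 1: start with V = I (OLS)
for iter = 1:10
    beta = (X' / V * X) \ (X' / V * y);         % (X'V^-1 X)^-1 X'V^-1 y
    r    = y - X*beta;                          % step 2: residuals
    c    = corrcoef(r(1:end-1), r(2:end));
    phi  = c(1,2);                              % crude AR(1) estimate from lag-1 correlation
    V    = phi .^ abs(i - j);                   % step 3: rebuild V and re-estimate beta
end                                             % step 4: iterate (fixed count here)
fprintf('beta = [%.2f %.2f], phi = %.2f\n', beta(1), beta(2), phi);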
Spatio-temporal Behavior
• The spatiotemporal behavior of these noise
processes is complex.
[figure] Spatial maps of the model parameters from an AR(2) model estimated for each voxel's noise data.
4: Inference
GLM Summary
Recall:
model: Y = Xβ + ε
estimate: β̂ = (X'V⁻¹X)⁻¹X'V⁻¹Y
fitted values: Ŷ = Xβ̂
residuals: r = Y − Ŷ = (I − X(X'V⁻¹X)⁻¹X'V⁻¹)Y = RY
Inference
• After fitting the GLM, use the estimated parameters to determine whether there is significant activation present in the voxel.
• Inference is based on the fact that
β̂ ~ N(β, (XᵀV⁻¹X)⁻¹)
• Use a t or F test to perform tests on effects of interest.
Contrasts
• It is often of interest to see whether a linear combination of the parameters is significant.
• The term cᵀβ specifies a linear combination of the estimated parameters, i.e.
cᵀβ = c1 β1 + c2 β2 + … + cn βn
• Here c is called a contrast vector.
Example
• Experiment with two types of stimuli, with the hypothesis
H0: β2 = β3
[figure] Y = X (β1, β2, β3)ᵀ + noise, where columns 2 and 3 of X model the two stimulus types.
• This can be written as H0: cᵀβ = 0 with contrast vector cᵀ = (0, 1, −1).
T-test
• To test
H0: cᵀβ = 0   against   Ha: cᵀβ ≠ 0
use the t-statistic:
T = cᵀβ̂ / sqrt( Var(cᵀβ̂) )
• Under H0, T is approximately t(ν) with
ν = (tr(RV))² / tr((RV)²)
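A MATLAB sketch of this test under an i.i.d. noise assumption (V = I), in which case the degrees of freedom reduce to n − p; the data are simulated and all values are illustrative.

% t-test of a single contrast c'*beta under an i.i.d. noise assumption (V = I).
n = 200;
X = [ones(n,1) mod(floor((0:n-1)'/10), 2)];     % intercept + boxcar task regressor
y = X*[100; 2] + 5*randn(n,1);                  % simulated voxel data
p = size(X,2);
beta_hat = (X'*X) \ (X'*y);                     % OLS estimate (V = I)
r        = y - X*beta_hat;
sigma2   = (r'*r) / (n - p);                    % residual variance estimate
c        = [0; 1];                              % contrast: test the task effect
se       = sqrt(sigma2 * (c' * ((X'*X) \ c)));  % sqrt of Var(c'*beta_hat)
T        = (c'*beta_hat) / se;                  % t-statistic
nu       = n - p;                               % degrees of freedom when V = I
pval     = betainc(nu / (nu + T^2), nu/2, 0.5); % two-sided p-value for t(nu)
fprintf('T = %.2f on %d df, p = %.3g\n', T, nu, pval);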
Multiple Contrasts
• We often want to make simultaneous tests of
several contrasts at once.
• c is now a contrast matrix.
• Suppose
C = [ 1 0 0 0 0
      0 1 0 0 0 ]
then
Cβ = (β1, β2)ᵀ
Example
Consider a model with box-car shaped activation and drift modeled using the discrete cosine basis.
[figure] Y = Xβ + e, where X contains an intercept, the box-car task regressor and the discrete cosine drift regressors, and β = (β1, …, β9) collects the corresponding coefficients.
Example
Do the drift components add anything to the model?
Test H0: Cβ = 0, where C is the 7 × 9 matrix
C = [ 0 0 1 0 0 0 0 0 0
      0 0 0 1 0 0 0 0 0
      0 0 0 0 1 0 0 0 0
      0 0 0 0 0 1 0 0 0
      0 0 0 0 0 0 1 0 0
      0 0 0 0 0 0 0 1 0
      0 0 0 0 0 0 0 0 1 ]
whose rows pick out the drift coefficients β3, …, β9.
Example
This is equivalent to testing:
H0: (β3, β4, β5, β6, β7, β8, β9)ᵀ = 0
To understand what this implies, we split the design matrix into two parts, X = [X0 X1], where X0 contains the intercept and task columns and X1 contains the drift columns.
Example
• Do the drift components add anything to the
model?
• The X1 matrix explains the drift. Does it
contribute in a significant way to the model?
• Compare the results using the full model, with
design matrix X, with those obtained using a
reduced model, with design matrix X0.
F-test
• Test the hypothesis using the F-statistic:
F = (r0ᵀr0 − rᵀr) / ( σ̂² tr((R0 − R)V) )
where r0 and R0 are the residuals and residual-forming matrix of the reduced model.
• Assuming the errors are normally distributed, F has an approximate F-distribution with (ν0, ν) degrees of freedom, where
ν0 = ( tr((R0 − R)V) )² / tr( ((R0 − R)V)² )
and
ν = ( tr(RV) )² / tr( (RV)² )
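A MATLAB sketch of this comparison under an i.i.d. noise assumption (V = I), where the general formula reduces to the familiar extra-sum-of-squares F-test; the simulated drift terms and their sizes are assumptions.

% F-test comparing the full model X = [X0 X1] with the reduced model X0 (V = I).
n  = 200;  t = (0:n-1)';
task  = mod(floor(t/10), 2);                    % boxcar task regressor
drift = [t/n, cos(pi*t/n), cos(2*pi*t/n)];      % simple drift regressors (assumed)
X0 = [ones(n,1) task];                          % reduced model: intercept + task
X  = [X0 drift];                                % full model: adds the drift columns
y  = X*[100; 2; 3; 1; 0.5] + 5*randn(n,1);      % simulated data containing drift

rss  = @(M) sum((y - M*((M'*M)\(M'*y))).^2);    % residual sum of squares of a model
p    = size(X,2);  p0 = size(X0,2);
sig2 = rss(X) / (n - p);                        % residual variance of the full model
F    = (rss(X0) - rss(X)) / ((p - p0) * sig2);  % extra-sum-of-squares F statistic
fprintf('F(%d, %d) = %.2f\n', p - p0, n - p, F);% compare with the F(p-p0, n-p) distribution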
Statistical Images
• For each voxel a hypothesis test is performed.
The statistic corresponding to that test is used to
create a statistical image over all voxels.
[figure] Map of T-values over all voxels.
Localizing Activation
1. Construct a model for each voxel of the brain.
– "Massive univariate approach"
– Regression models (GLM) are commonly used.
[figure] Y = Xβ + e, fitted separately at every voxel.
Localizing Activation
2. Perform a statistical test to determine whether task-related activation is present in the voxel:
H0: cᵀβ = 0
Statistical image: a map of t-statistics across all voxels (a.k.a. a t-map).
Localizing Activation
3. Choose an appropriate threshold for
determining statistical significance.
Statistical parametric map (SPM): each significant voxel is color-coded according to the size of its p-value.
Statistical Images
How do we determine which voxels are actually active?
Problems:
• The statistics are obtained by performing a large
number of hypothesis tests.
• Many of the test statistics will be artificially inflated
due to the noise.
• This leads to many false positives.
Multiple Comparisons
• Which of 100,000 voxels are significant?
– α = 0.05 ⇒ 5,000 false-positive voxels
• Choosing a threshold is a balance between sensitivity (true positive rate) and specificity (true negative rate).
[figure] The same statistical map thresholded at t > 1, t > 2, t > 3, t > 4 and t > 5.