Jody Culham
Brain and Mind Institute
Department of Psychology
Western University
http://www.fmri4newbies.com/
fMRI Techniques to
Investigate Neural Coding:
Multivoxel Pattern Analysis (MVPA)
Last Update: November 24, 2014
Last Course: Psychology 9223, F2014
Limitations of Subtraction Logic
• Example: We know that neurons in the brain can be tuned
for individual faces
“Jennifer Aniston” neuron in human medial temporal lobe
Quiroga et al., 2005, Nature
[Figure: fMRI spatial resolution, 1 voxel: a single 3 mm voxel in the Fusiform Face Area (FFA); color scale from low to high activity]
A voxel might contain millions of neurons, so the fMRI signal
represents the population activity
Limitations of Subtraction Logic
• fMRI resolution is typically around 3 x 3 x 6 mm so each sample comes
from millions of neurons. Let’s consider just three neurons.
Even though there are neurons tuned to each object, the population as a whole shows no preference.
[Figure: firing-rate tuning curves for three neurons (Neuron 1 "likes" Jennifer Aniston, Neuron 2 "likes" Julia Roberts, Neuron 3 "likes" Brad Pitt) and the summed population activation, which shows no preference among the three faces]
Two Techniques with “Subvoxel Resolution”
• "subvoxel resolution" = the ability to investigate coding in neuronal populations smaller than the voxel size being sampled
1. Multi-Voxel Pattern Analysis (MVPA or
decoding or “mind reading”)
2. fMR Adaptation
(or repetition suppression or priming)
Multivoxel Pattern Analyses
(or decoding or “mind reading”)
[Figure: fMRI spatial resolution, 1 voxel: a single 3 mm voxel; and a Region Of Interest (ROI): a group of voxels; color scale from low to high activity]
Voxel Pattern Information
[Figure: the pattern of activity across 3 mm voxels (left and right hemispheres) differs between Condition 1 and Condition 2]
Spatial Smoothing
[Figure: the same activation map with no smoothing and with 4 mm, 7 mm, and 10 mm FWHM smoothing]
• most conventional fMRI studies spatially smooth (blur) the data
– increases signal-to-noise
– facilitates intersubject averaging
• but smoothing loses information about the patterns across voxels
Effect of Spatial Smoothing
and Intersubject Averaging
[Figure: the fine-grained pattern across 3 mm voxels is blurred by spatial smoothing and intersubject averaging]
Standard fMRI Analysis
[Figure: FACES and HOUSES trials (trial 1, trial 2, trial 3) are each reduced to the average summed activation across the region, and the two averages are compared]
Perhaps voxels contain useful information
• In traditional fMRI analyses, we average across
the voxels within an area, but these voxels may
contain valuable information
• In traditional fMRI analyses, we assume that an
area encodes a stimulus if it responds more, but
perhaps encoding depends on pattern of high
and low activation instead
• But perhaps there is information in the pattern of
activation across voxels
Decoding for Dummies
Kerri Smith, 2013, Nature, “Reading Minds”
Approaches to Multi-Voxel Pattern Analysis
1. MVPA classifier
2. MVPA correlation: Basic approach
3. MVPA correlation: Representational similarity
analysis
Preparatory Steps
Initial Steps
• Step 1: Select a region of interest (ROI)
– e.g. a cube centred on an activation hotspot
• [15 mm (5 functional voxels)]³ = 3,375 mm³ = 125 functional voxels
• DO NOT SPATIALLY SMOOTH THE DATA
• Step 2: Extract a measure of brain activation from each of the functional voxels within the ROI (see the sketch after this list)
• β weights
– z-normalized
– %-transformed
• % BOLD signal change
– minus baseline
• t-values
– β/error
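As a minimal sketch of Step 2 (not from the course materials), the NumPy snippet below z-normalizes trial-wise β estimates within an unsmoothed ROI; the array names, sizes, and simulated values are hypothetical.

```python
import numpy as np

# Hypothetical inputs: beta weights for every voxel in the volume,
# one row per trial, one column per voxel, plus a boolean ROI mask.
n_trials, n_voxels = 30, 10000
rng = np.random.default_rng(0)
betas = rng.normal(size=(n_trials, n_voxels))   # trial-wise beta estimates
roi_mask = np.zeros(n_voxels, dtype=bool)
roi_mask[:125] = True                           # e.g., a 5 x 5 x 5 voxel cube = 125 voxels

# Keep only the ROI voxels (the data were NOT spatially smoothed)
roi_betas = betas[:, roi_mask]                  # shape: (n_trials, 125)

# z-normalize each voxel's betas across trials
roi_betas_z = (roi_betas - roi_betas.mean(axis=0)) / roi_betas.std(axis=0)

print(roi_betas_z.shape)                        # (30, 125): one pattern per trial
```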
MVPA Methods
• block or event-related data
• resolution
– works even with moderate resolution (e.g., 3 mm
isovoxel)
– tradeoff between resolution and coverage, SNR
– preprocessing
• the usual steps apply (slice scan time correction, motion correction, low-pass temporal filter)
– EXCEPT: No spatial smoothing!
• Model single subjects, not combined group data
(at least initially)
Classifier Approach
Classifier Approach
[Figure: FACES and HOUSES trials (trial 1, trial 2, trial 3, …) are split into training trials and test trials (not in the training set)]
Can an algorithm correctly "guess" trial identity better than chance (50%)?
[Figure: activity in Voxel 1 plotted against activity in Voxel 2. Each dot is one measurement (trial) from one condition (Faces, red circles) or the other (Houses, green circles). A classifier boundary is fit to the training set.]
Can the classifier generalize to untrained data?
[Figure: the trained classifier applied to the test set]
Classifier Accuracy = Correct / (Correct + Incorrect) = 6/8 = 75%
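A minimal sketch of this train/test logic using scikit-learn's linear SVM on simulated voxel patterns; the data, category offsets, and train/test split are illustrative stand-ins, not the example on the slide.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_voxels = 125

# Simulated ROI patterns: 10 face trials and 10 house trials, with a small
# category-specific offset added to the first 20 voxels
faces = rng.normal(size=(10, n_voxels))
houses = rng.normal(size=(10, n_voxels))
faces[:, :20] += 0.8
houses[:, :20] -= 0.8
X = np.vstack([faces, houses])
y = np.array([0] * 10 + [1] * 10)           # 0 = face, 1 = house

# Training set = first 6 trials of each category; test set = remaining 4 of each
train = np.r_[0:6, 10:16]
test = np.r_[6:10, 16:20]

clf = SVC(kernel="linear").fit(X[train], y[train])
accuracy = clf.score(X[test], y[test])      # proportion of test trials "guessed" correctly
print(f"decoding accuracy = {accuracy:.0%} (chance = 50%)")
```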
Iterative testing (“folds”)
• Example: Leave one-pair out
– 10 trials of faces + 10 trials of houses
– There are 100 possible combinations of trial pairs: (F1, H1), (F1, H2), …, (F2, H1), (F2, H2), …, (F10, H10)
– We can train on 9/10 trials of each with 1/10 excluded, for 100 iterations (see the sketch after this list)
– Average the accuracy across the 100 iterations
• Many options: e.g., Leave one run out; classify the
average of several trials left out
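A sketch of the leave-one-pair-out scheme just described, again on simulated face/house patterns: each of the 100 face/house pairs is held out once and the accuracy is averaged across iterations.

```python
import itertools
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n_voxels = 125
X = rng.normal(size=(20, n_voxels))
X[:10, :20] += 0.8        # trials 0-9  = faces
X[10:, :20] -= 0.8        # trials 10-19 = houses
y = np.array([0] * 10 + [1] * 10)

accuracies = []
# 10 face trials x 10 house trials = 100 possible left-out pairs
for f, h in itertools.product(range(10), range(10, 20)):
    test = [f, h]
    train = [i for i in range(20) if i not in test]
    clf = SVC(kernel="linear").fit(X[train], y[train])
    accuracies.append(clf.score(X[test], y[test]))

print(f"mean accuracy over 100 folds = {np.mean(accuracies):.0%}")
```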
[Figure (Haynes & Rees, 2006, Nat Rev Neurosci): a simple 2D example; with 9 voxels there would be 9 dimensions. Each dot is one measurement (trial) from one trial type (red circles) or the other (blue squares), and the line is the decision boundary. In some cases the classifier can act on single voxels, where a conventional fMRI analysis would also detect the difference; in others the classifier cannot act on single voxels because the distributions overlap, but it can act on a combination of voxels using a linear decision boundary; still other cases would require a curved decision boundary. White and black circles show examples of correct and erroneous classification in the test set.]
Where to “Draw the Line”?
• There are different approaches to determining what
line/plane/hyperplane to use as the boundary between classes
• We want an approach with good generalization to untrained data
The most common
approach is the linear
support vector machine
(SVM)
Support Vector Machine (SVM)
• SVM finds a linear decision boundary
that discriminates between two sets of
points
• the boundary is constrained to have the largest possible distance from the closest points on both sides
• response patterns closest to the decision boundary (yellow circles), which define the margins, are called "support vectors"
Mur et al., 2009
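A short sketch of fitting a linear SVM and inspecting its margin-defining support vectors with scikit-learn; the toy two-voxel data are made up, and nothing here is taken from Mur et al. (2009).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Toy two-voxel patterns for two conditions (15 trials each)
X = np.vstack([rng.normal([1.0, 0.0], 0.4, size=(15, 2)),
               rng.normal([0.0, 1.0], 0.4, size=(15, 2))])
y = np.array([0] * 15 + [1] * 15)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("decision boundary weights:", clf.coef_[0])      # defines the separating line
print("number of support vectors per class:", clf.n_support_)
print("support vectors (patterns on the margins):")
print(clf.support_vectors_)
```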
Is decoding better than chance?
Two options
1. Use intersubject variability to determine significance
[Figure: mean decoding accuracy ± 95% CI plotted against chance]
Permutation Testing
• randomize all the condition labels
• run SVMs on the randomized data
• repeat this many times (e.g., 1000X)
• get a distribution of expected decoding accuracy
• test the null hypothesis (H0) that the decoding accuracy you found came from this permuted distribution
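A minimal sketch of this permutation recipe on simulated data: shuffle the labels, re-run the cross-validated SVM each time, and compare the observed accuracy with the permuted distribution. The simulated data and the choice of 5-fold cross-validation are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 125))
X[:10, :20] += 0.8                      # simulated faces vs. houses, as before
y = np.array([0] * 10 + [1] * 10)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
observed = cross_val_score(SVC(kernel="linear"), X, y, cv=cv).mean()

# Null distribution: repeat the whole analysis with randomized condition labels
n_perm = 1000
null = np.empty(n_perm)
for i in range(n_perm):
    y_perm = rng.permutation(y)
    null[i] = cross_val_score(SVC(kernel="linear"), X, y_perm, cv=cv).mean()

# p-value: how often the permuted accuracy is at least as high as the observed one
p = (np.sum(null >= observed) + 1) / (n_perm + 1)
print(f"observed accuracy = {observed:.0%}, permutation p = {p:.3f}")
```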
Is decoding better than chance?
Two options
2. Permutation Testing
[Figure: the permuted decoding-accuracy distribution (lower quartile, median, which should be 33.3%, upper quartile, and the upper bound of the 95% confidence limits), with our data falling above the upper bound → reject H0]
Example of MVPA classifier approach:
decoding future actions
Gallivan et al., 2013, eLife
[Figure: experimental conditions]
[Figure: hand and tool decoding accuracies (± 1 SEM)]
Cross-decoding Logic
• Task-Across-Effector
– Train Grasp vs. Reach for one effector (e.g. Hand)
– Test Grasp vs. Reach for other effector (e.g., Tool)
– If (Accuracy > chance), then area codes task
regardless of effector
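A sketch of this cross-decoding logic on simulated data: train a grasp-vs-reach classifier on hand trials and test it on tool trials. The simulated "effector-invariant task pattern" is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
n_voxels = 125
task_code = rng.normal(size=n_voxels)          # hypothetical effector-invariant task pattern

def simulate(n_trials, task_sign):
    """Simulate trial patterns that carry the task code plus noise."""
    return task_sign * task_code + rng.normal(scale=2.0, size=(n_trials, n_voxels))

# Hand trials (training) and tool trials (test); grasp = +1, reach = -1
X_hand = np.vstack([simulate(10, +1), simulate(10, -1)])
X_tool = np.vstack([simulate(10, +1), simulate(10, -1)])
y = np.array([1] * 10 + [0] * 10)              # 1 = grasp, 0 = reach

clf = SVC(kernel="linear").fit(X_hand, y)      # train on one effector (hand)
accuracy = clf.score(X_tool, y)                # test on the other effector (tool)
print(f"cross-decoding accuracy = {accuracy:.0%}; "
      "above chance -> the area codes the task regardless of effector")
```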
[Figure: % decoding accuracy (± 1 SEM) for hand and tool decoding in L SPOC]
[Figure: % decoding accuracy (± 1 SEM) for hand and tool decoding in L SPOC, L SMG, L M1, L aIPS, L PMd, and L PMv]
[Figure: single-TR decoding accuracies (± 1 SEM) plotted over time (volumes)]
Basic Correlation Approach
First Demonstration
MVPA correlation approach
[Figure: voxel-wise activation patterns for the individual Faces and Houses trials (trial 1, trial 2, trial 3), rather than the average summed activation]
• The same category evokes similar patterns of activity across trials
• Compute the average similarity within the same category
• Compute the average similarity between different categories
Within-category similarity > Between-category similarity
The brain area contains distinct information about
faces and houses
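A sketch of the basic correlation approach on simulated data: correlate trial-wise patterns and compare the average within-category similarity with the average between-category similarity; the simulated face/house prototypes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n_voxels = 125
face_proto = rng.normal(size=n_voxels)        # hypothetical category-specific patterns
house_proto = rng.normal(size=n_voxels)

faces = face_proto + rng.normal(scale=1.5, size=(3, n_voxels))    # 3 face trials
houses = house_proto + rng.normal(scale=1.5, size=(3, n_voxels))  # 3 house trials

patterns = np.vstack([faces, houses])
r = np.corrcoef(patterns)                      # 6 x 6 trial-by-trial correlation matrix

# Within-category: face-face and house-house pairs; between-category: face-house pairs
within = np.mean([r[i, j] for i in range(3) for j in range(3) if i < j] +
                 [r[i, j] for i in range(3, 6) for j in range(3, 6) if i < j])
between = np.mean(r[:3, 3:])

print(f"within-category similarity  = {within:.2f}")
print(f"between-category similarity = {between:.2f}")
# within > between -> the region carries information distinguishing faces from houses
```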
Category-specificity of patterns of response in the ventral
temporal cortex
Haxby et al., 2001, Science
[Figure: similarity matrix in which the activity pattern for each category from the ODD RUNS is correlated with the pattern for each category from the EVEN RUNS; color scale from low to high similarity. Diagonal cells show within-category similarity; off-diagonal cells show between-category similarity. Haxby et al., 2001, Science]
Correlation Approach Using
Representational Similarity Analysis
Representational similarity approach (RSA)
• Unlike the basic MVPA correlation approach, RSA does not separate stimuli into a priori categories
[Figure: MVPA correlation vs. RSA: an ODD RUNS × EVEN RUNS similarity (correlation) matrix, color scale from low to high. Kriegeskorte et al. (2008)]
No class boundaries!
[Figure: a 96 × 96 condition-by-condition similarity matrix (conditions C1–C96) computed from the trials, with no class boundaries imposed; color scale from low to high similarity]
Can compare theoretical models to data
[Figure: similarity matrices predicted by different theoretical models, compared with the REAL DATA matrix; color scale from low to high similarity. Kriegeskorte et al. (2008)]
Which prediction matrix is more similar to the real data?
“Metacorrelations”
• Calculate the correlation between the model correlation matrix and the data correlation matrix
• Metacorrelations can be used to determine the best model or to see the similarity between areas, e.g.:
– the right FFA pattern is similar to the left FFA pattern
– the right FFA pattern is similar to the face-animate prototype theoretical model
– the right FFA pattern is not very similar to a low-level vision theoretical model
[Figure: metacorrelation matrix]
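A sketch of the RSA/metacorrelation idea: build a condition-by-condition dissimilarity matrix (1 − correlation) from simulated data, build candidate model matrices, and correlate their off-diagonal entries. Spearman rank correlation is used here as one common choice; the conditions, models, and data are all made up.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(7)
n_cond, n_voxels = 8, 125

# Simulated condition patterns: the first 4 conditions share one prototype, the last 4 another
patterns = rng.normal(size=(n_cond, n_voxels))
patterns[:4] += rng.normal(size=n_voxels)
patterns[4:] += rng.normal(size=n_voxels)

# Data RDM: 1 - Pearson correlation between every pair of condition patterns
data_rdm = squareform(pdist(patterns, metric="correlation"))

# Two hypothetical model RDMs: a "two-category" model and a random control model
category = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model_cat = (category[:, None] != category[None, :]).astype(float)
model_rand = squareform(rng.random(n_cond * (n_cond - 1) // 2))

# Compare each model to the data using only the lower-triangular (off-diagonal) entries
tril = np.tril_indices(n_cond, k=-1)
for name, model in [("category model", model_cat), ("random model", model_rand)]:
    rho, _ = spearmanr(data_rdm[tril], model[tril])
    print(f"{name}: metacorrelation (Spearman rho) = {rho:.2f}")
```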
Multidimensional Scaling (MDS)
Input = matrix of distances (km here), upper triangle:
Vancouver:   Winnipeg 1869, Toronto 3366, Montreal 3694, Halifax 4439, St. John's 5046, Yellowknife 1566, Whitehorse 1484
Winnipeg:    Toronto 1518, Montreal 1825, Halifax 2581, St. John's 3250, Yellowknife 1753, Whitehorse 2463
Toronto:     Montreal 503, Halifax 1266, St. John's 2112, Yellowknife 3078, Whitehorse 4093
Montreal:    Halifax 792, St. John's 1613, Yellowknife 3194, Whitehorse 4261
Halifax:     St. John's 885, Yellowknife 3768, Whitehorse 4867
St. John's:  Yellowknife 4127, Whitehorse 5233
Yellowknife: Whitehorse 1109
Multidimensional Scaling (MDS)
Output = representational space (2D here)
[Figure: the 2D MDS solution places the eight cities (Vancouver, Whitehorse, Yellowknife, Winnipeg, Toronto, Montreal, Halifax, St. John's) in an arrangement approximating their positions on a map of Canada]
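A sketch of metric MDS applied to the city-distance table above with scikit-learn; dissimilarity="precomputed" tells MDS that the input is already a distance matrix, and the same call could be applied to an RDM from MVPA data.

```python
import numpy as np
from sklearn.manifold import MDS

cities = ["Vancouver", "Winnipeg", "Toronto", "Montreal",
          "Halifax", "St. John's", "Yellowknife", "Whitehorse"]

# Upper triangle of the distance table (km), row by row
upper = [
    [1869, 3366, 3694, 4439, 5046, 1566, 1484],   # Vancouver
    [1518, 1825, 2581, 3250, 1753, 2463],         # Winnipeg
    [503, 1266, 2112, 3078, 4093],                # Toronto
    [792, 1613, 3194, 4261],                      # Montreal
    [885, 3768, 4867],                            # Halifax
    [4127, 5233],                                 # St. John's
    [1109],                                       # Yellowknife
]

n = len(cities)
D = np.zeros((n, n))
for i, row in enumerate(upper):
    D[i, i + 1:] = row
D = D + D.T                                       # make the matrix symmetric

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)                     # 2D "map" recovered from distances alone
for city, (x, y) in zip(cities, coords):
    print(f"{city:12s} {x:8.0f} {y:8.0f}")
```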
MDS on MVPA Data
MDS
Different Representational Spaces in
Different Areas
Metacorrelation Matrix
MDS on Metacorrelations
Searchlights
Searchlight: 8 Voxel Example
Let’s zoom in on 8 voxels
Spherical Searchlight Cross-Section
• Ideally we’d like to test a spherical volume but the
functional brain image is voxelized so we end up with
a Lego-like sphere
• Typical radius = 4 mm
Kriegeskorte, Goebel &
Bandettini, 2006, PNAS
Moving the Searchlight
55 62 73 67 60 52 48 51
Each value in white is the decoding accuracy for a sphere of 5-voxel diameter centered on a given voxel
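A NumPy-only sketch of the searchlight idea: for every voxel centre, take the voxels within a small radius and compute a cross-validated decoding accuracy for that sphere. Real analyses would typically use a dedicated implementation (e.g., nilearn's SearchLight); the tiny simulated volume, radius, and classifier settings here are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)

# Simulated data: a tiny 10 x 10 x 10 volume, 20 trials (10 per condition)
shape, n_trials = (10, 10, 10), 20
data = rng.normal(size=(n_trials,) + shape)
y = np.array([0, 1] * 10)
data[y == 1, 4:7, 4:7, 4:7] += 1.0            # informative cluster in the middle

radius = 2                                    # in voxels (roughly 4 mm at 2 mm resolution)
offsets = np.array([(i, j, k)
                    for i in range(-radius, radius + 1)
                    for j in range(-radius, radius + 1)
                    for k in range(-radius, radius + 1)
                    if i**2 + j**2 + k**2 <= radius**2])

accuracy_map = np.zeros(shape)
for cx in range(radius, shape[0] - radius):
    for cy in range(radius, shape[1] - radius):
        for cz in range(radius, shape[2] - radius):
            coords = offsets + (cx, cy, cz)   # voxels inside the sphere at this centre
            X = data[:, coords[:, 0], coords[:, 1], coords[:, 2]]
            accuracy_map[cx, cy, cz] = cross_val_score(
                SVC(kernel="linear"), X, y, cv=5).mean()

print("peak searchlight accuracy:", accuracy_map.max().round(2))
```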
First- and Second-Level Analysis
First-level analysis: SVM classifier decoding accuracies for spheres centred at each of the eight voxels (the same 8 voxels in stereotaxic space, e.g., Tal space) in each of the 15 Ss:

       V1   V2   V3   V4   V5   V6   V7   V8
S1     55   62   73   67   60   52   48   51
S2     46   52   65   69   60   59   53   48
S3     48   55   62   70   58   52   50   49
…
S15    52   55   59   57   56   43   42   52

Second-level analysis: do a univariate t-test (which is an RFX test based on intersubject variability) at each voxel to calculate the probability that the decoding accuracy is higher than chance:

Average decoding accuracy:   51    59    69    67    59    55    50    50
t(14):                      0.3   2.0   4.1   3.7   2.9   1.9   1.2   0.8
p:                          .81   .06  .001  .008  .012   .08   .25   .44

Threshold at p < .05 (or use your favorite way of correcting for multiple comparisons) → thresholded t-map
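A sketch of this second-level RFX test: a one-sample t-test across subjects, at each voxel, of searchlight decoding accuracy against the 50% chance level. The 15 × 8 accuracy matrix is simulated, not the values shown above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Hypothetical decoding accuracies (%): 15 subjects x 8 searchlight centres
chance = 50.0
acc = rng.normal(loc=[50, 55, 68, 66, 58, 54, 50, 50], scale=6.0, size=(15, 8))

# One-sample t-test (an RFX test based on intersubject variability) at each voxel:
# is the group decoding accuracy higher than chance?
t, p_two_tailed = stats.ttest_1samp(acc, popmean=chance, axis=0)
p_one_tailed = p_two_tailed / 2                 # we only care about accuracy ABOVE chance

significant = (p_one_tailed < 0.05) & (t > 0)   # threshold at p < .05 (uncorrected)
for v in range(8):
    print(f"V{v + 1}: t(14) = {t[v]:5.2f}, p = {p_one_tailed[v]:.3f}, sig = {significant[v]}")
```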
Second-level Analysis
The first-level analysis for each subject (S1, S2, S3, …, S15) yields one value per voxel (V1–V8); the second-level analysis then tests those values across subjects. The principles of a second-level analysis are the same regardless of what dependent variable we are testing:

                 V1    V2    V3    V4    V5    V6    V7    V8
Beta weights (or differences in beta weights = contrasts):
                0.1   0.7   0.2   1.2   1.5   1.1   0.5   0.3    → Are they sig diff than zero?
Decoding accuracies:
                 51    59    69    67    59    55    50    50    → Are they sig diff than chance?
Correlations between model and MVPA data:
                .03   .22   .41   .50   .38   .19  -.01   .04    → Are they sig diff than zero?
UNIVARIATE VOXELWISE ANALYSIS vs. MULTIVARIATE SEARCHLIGHT ANALYSIS
Regions vs. Brains
• Univariate ROI analysis is to univariate
voxelwise analysis as multivariate ROI analysis
is to multivariate searchlight analysis
There are no differences at the second-level analysis
• It’s a way to find things by searching the whole brain
• Subjects’ brains must be aligned (Talairach, MNI or surface space)
• The same problems and solutions for multiple comparisons arise
• Degrees of freedom = #Ss - 1
There are differences at the first-level analysis
• Univariate voxelwise analyses are done one voxel at a time
• Multivariate searchlight analyses are done one sphere at a time
MVPA Searchlight
Kriegeskorte, Goebel & Bandettini, 2006, PNAS
Activation- vs. information-based analysis
Activation-based (standard fMRI analysis):
regions more strongly active during face than house
perception
Information-based (searchlight MVPA analysis):
regions whose activity pattern distinguished the two
categories
35% of voxels are marked only in the information-based map: category information is lost when data are smoothed
Kriegeskorte, Goebel & Bandettini, 2006
Activation- vs. information-based analysis
Mur et al., 2009, Social Cognitive and Affective Neuroscience
What Is MVPA Picking Up On?
Limitations of MVPA
• MVPA will use whatever information is available, including confounds
– e.g., reaction time
• MVPA works best for attributes that are coded at certain spatial scales (e.g., topography: retinotopy, somatotopy, etc.)
• A failure to find effects does not mean that neural representations do not differ
– information may be present at a finer scale
– the choice of classifier may not have been optimal (e.g., maybe a nonlinear classifier would work better)
• Good classification indicates the presence of information (not necessarily neuronal selectivity) (Logothetis, 2008)
– e.g., successful face decoding in primary visual cortex
• Pattern-classifier analysis requires many decisions that affect the results (see Misaki et al., 2010)
• Classifiers and correlations don't always agree
“Mind-Reading”:
Reconstructing new stimuli from brain activity
Reconstruct new images
Miyawaki et al., 2008
Decoding Vision
Gallant Lab,
UC Berkeley
Lie detector
• A non-linear classifier was applied to fMRI data to discriminate the spatial patterns of activity associated with lying vs. telling the truth in 22 individual participants
• 88% accuracy in detecting lies in participants not included in the training set
(Davatzikos et al., 2005)
Lie detector
The real world is more complex!
Reconstruct dreams
• Brain activity was measured while 3 participants slept; on waking, they were asked to describe their dreams
• Brain activity during sleep was compared with activity while viewing pictures from frequently dreamt categories
• Activity in higher-order visual areas (e.g., FFA) could successfully decode the dream contents (accuracy of 75-80%) 9 seconds before waking the participant!
Kamitani Lab
ATR Japan
Shared Semantic Space from brain activity during
observation of movies
Similar colors for categories similarly represented in the brain
Huth et al., 2012
People and communication verbs are represented similarly
Huth et al., 2012
Continuous Semantic Space across the surface
Each voxel is colored according to the part of the semantic space it is selective for
http://gallantlab.org/semanticmovies/
Continuous Semantic Space across the surface
Click on each voxel to see which
categories it represents
FUSIFORM FACE AREA
http://gallantlab.org/semanticmovies/
Jody Culham
Brain and Mind Institute
Department of Psychology
Western University
http://www.fmri4newbies.com/
MVPA Tutorial
Last Update: March 10, 2013
Last Course: Psychology 9223, W2013, Western University
Test Data Set
• Two runs: A and B (same protocol)
• 5 trials per condition for 3 conditions
Measures of Activity
• β weights
– z-normalized
– %-transformed
• t-values
– β/error
• % BOLD signal change
– minus baseline
[Figure: maps of the same data shown as activity, z-normalized β (βz), %-transformed β (β%), and t values, each on a low-to-high color scale]
Step 1: Trial Estimation
• Just as in the Basic GLM, we are running
one GLM per voxel
• Now however, each GLM is estimating
activation not across a whole condition but
for each instance (trial or block) of a
condition
Three Predictors Per Instance
• 2-gamma (HRF)
• constant
• linear (within trial)
[Figure: the design matrix contains these three predictors for each of the 5 instances of motor imagery, 5 instances of mental calculation, and 5 instances of mental singing]
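A simplified sketch of trial-wise estimation: build a design matrix with one HRF-convolved predictor per instance (rather than the full 2-gamma + constant + linear-trend set per instance used on the slide) and solve the GLM by least squares for one voxel. The HRF, timing, and data are crude stand-ins.

```python
import numpy as np

TR, n_vols = 2.0, 130
n_trials, trial_dur = 15, 6                      # 15 instances (5 per task), 6 s each
onsets = np.arange(n_trials) * 16 + 10.0         # hypothetical onsets in seconds

# Simplified double-gamma HRF sampled at the TR
t = np.arange(0, 30, TR)
hrf = t**5 * np.exp(-t) / 120 - 0.1 * (t**15 * np.exp(-t) / 1.3e12)

# One HRF-convolved boxcar predictor per trial instance
X = np.zeros((n_vols, n_trials))
for j, onset in enumerate(onsets):
    box = np.zeros(n_vols)
    box[int(onset / TR): int((onset + trial_dur) / TR)] = 1.0
    X[:, j] = np.convolve(box, hrf)[:n_vols]
X = np.column_stack([X, np.ones(n_vols)])        # plus a constant term

# Simulated time course for one voxel, then least-squares betas (one per instance)
rng = np.random.default_rng(10)
true_betas = rng.normal(1.0, 0.3, size=n_trials)
timecourse = X @ np.append(true_betas, 0.0) + rng.normal(scale=0.5, size=n_vols)
betas = np.linalg.lstsq(X, timecourse, rcond=None)[0][:n_trials]
print(np.round(betas, 2))                        # one activation estimate per instance
```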
Step 1: Trial Estimation Dialog
Step 1: Trial Estimation Output
• Now for each instance of each condition in each
run, for each voxel we have an estimate of
activation
Step 2: Support Vector Machine
• SVMs are usually run in a subregion of the brain
– e.g., a region of interest (= volume of interest)
sample data:
SMA ROI
sample data:
3 Tasks ROI
Step 2: Support Vector Machine
• test data must be independent of training data
– leave-one-run-out
– leave-one-trial-out
– leave-one-trial-set-out
• often we will run a series of iterations to test multiple
combinations of leave-X-out
– e.g., with two runs, we can run two iterations of leave-one-run-out
– e.g., with 10 trials per condition and 3 conditions, we could run up to 10³ = 1000 iterations of leave-one-trial-set-out (see the sketch below)
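A sketch of leave-one-run-out with the two-run, three-condition tutorial design: train on Run A, test on Run B, then swap, and average the two accuracies. The simulated trial estimates and the 98-voxel ROI size are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(11)
n_voxels = 98                                   # e.g., the voxels in an SMA ROI
conds = np.repeat([0, 1, 2], 5)                 # 5 trials x 3 conditions per run

prototypes = rng.normal(size=(3, n_voxels))     # hypothetical condition-specific patterns

def simulate_run():
    """Simulate one run of trial estimates: condition pattern plus noise."""
    return prototypes[conds] + rng.normal(scale=1.5, size=(15, n_voxels))

runs = {"A": simulate_run(), "B": simulate_run()}

# Two iterations of leave-one-run-out: train A / test B, then train B / test A
accs = []
for train_run, test_run in [("A", "B"), ("B", "A")]:
    clf = SVC(kernel="linear").fit(runs[train_run], conds)
    accs.append(clf.score(runs[test_run], conds))
    print(f"train {train_run}, test {test_run}: {accs[-1]:.0%}")
print(f"mean accuracy = {np.mean(accs):.0%} (chance = 33.3%)")
```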
MVP file plots
[Figure: MVP file plots, 98 functional voxels × 15 trials per run, intensity = activation; Run A = training set, Run B = test set]
SVM Output: Train Run A; Test Run B
[Figure: guessed condition vs. actual condition for the 15 test trials: 15/15 correct]
SVM Output: Train Run B; Test Run A
[Figure: guessed condition vs. actual condition for the 15 test trials: 10/15 correct (chance = 5/15)]
Permutation Testing
• randomize all the condition labels
• run SVMs on the randomized data
• repeat this many times (e.g., 1000X)
• get a distribution of expected decoding accuracy
• test the null hypothesis (H0) that the decoding accuracy you found came from this permuted distribution
Output from Permutation Testing
[Figure: the permuted decoding-accuracy distribution (lower quartile, median, which should be 33.3%, upper quartile, and the upper bound of the 95% confidence limits), with our data falling above the upper bound → reject H0]
Voxel Weight Maps
• voxels with high weights contribute
strongly to the classification of a trial to a
given condition