Jody Culham
Brain and Mind Institute, Department of Psychology, Western University
http://www.fmri4newbies.com/

fMRI Techniques to Investigate Neural Coding: Multivoxel Pattern Analysis (MVPA)
Last Update: November 24, 2014
Last Course: Psychology 9223, F2014, Western University
Limitations of Subtraction Logic
• Example: we know that neurons in the brain can be tuned for individual faces, such as the "Jennifer Aniston" neuron in human medial temporal lobe (Quiroga et al., 2005, Nature)
• [Figure: a single voxel at typical fMRI spatial resolution (3 mm on a side) overlaid on the Fusiform Face Area (FFA); colour scale runs from low to high activity.]
• A voxel might contain millions of neurons, so the fMRI signal represents the population activity

Limitations of Subtraction Logic
• fMRI resolution is typically around 3 x 3 x 6 mm, so each sample comes from millions of neurons
• Consider just three neurons: Neuron 1 "likes" Jennifer Aniston, Neuron 2 "likes" Julia Roberts, and Neuron 3 "likes" Brad Pitt. Even though there are neurons tuned to each object, the population as a whole shows no preference in its summed activation.

Two Techniques with "Subvoxel Resolution"
• "Subvoxel resolution" = the ability to investigate coding in neuronal populations smaller than the voxel size being sampled
1. Multi-Voxel Pattern Analysis (MVPA, or decoding, or "mind reading")
2.
fMR Adaptation (or repetition suppression or priming)

Multivoxel Pattern Analyses (or decoding, or "mind reading")
• [Figure: single 3 mm voxels at typical fMRI spatial resolution are grouped into a Region Of Interest (ROI); the pattern of high and low activity across the voxels of the ROI differs between Condition 1 and Condition 2.]

Spatial Smoothing
• most conventional fMRI studies spatially smooth (blur) the data (e.g., with 4, 7, or 10 mm FWHM kernels vs. no smoothing)
– increases signal-to-noise
– facilitates intersubject averaging
• but smoothing loses information about the patterns across voxels

Effect of Spatial Smoothing and Intersubject Averaging

Standard fMRI Analysis
• [Figure: for FACES and HOUSES, activation is averaged across trials (trial 1, trial 2, trial 3) and summed across voxels.]

Perhaps voxels contain useful information
• In traditional fMRI analyses, we average across the voxels within an area, but these voxels may contain valuable information
• We also assume that an area encodes a stimulus if it responds more overall, but perhaps encoding depends instead on the pattern of high and low activation across voxels

Decoding for Dummies
• Kerri Smith, 2013, Nature, "Reading Minds"

Approaches to Multi-Voxel Pattern Analysis
1. MVPA classifier
2. MVPA correlation: basic approach
3. MVPA correlation: representational similarity analysis

Preparatory Steps
• Step 1: Select a region of interest (ROI), e.g.
a cube centred on an activation hotspot
– [15 mm]^3 = [5 functional voxels]^3 = 3,375 mm^3 = 125 functional voxels
– DO NOT SPATIALLY SMOOTH THE DATA
• Step 2: Extract a measure of brain activation from each of the functional voxels within the ROI
– β weights (z-normalized or %-transformed)
– % BOLD signal change (minus baseline)
– t-values (β/error)

MVPA Methods
• block or event-related data
• resolution
– works even with moderate resolution (e.g., 3 mm isovoxel)
– tradeoff between resolution and coverage/SNR
• preprocessing
– the usual steps apply (slice scan time correction, motion correction, low-pass temporal filter)
– EXCEPT: no spatial smoothing!
• model single subjects, not combined group data (at least initially)

Classifier Approach
• Train on a set of FACES and HOUSES trials; test on trials not in the training set
• Can an algorithm correctly "guess" trial identity better than chance (50%)?
• [Figure: activity in Voxel 1 plotted against activity in Voxel 2; each dot is one measurement (trial) from one condition (faces) or the other (houses). A classifier fit to the training set is asked to generalize to a left-out test set.]
• Classifier accuracy on the test set = correct / (correct + incorrect) = 6/8 = 75%

Iterative testing ("folds")
• Example: leave-one-pair-out
– 10 trials of faces + 10 trials of houses
– there are 100 possible combinations of left-out trial pairs: (F1, H1), (F1, H2), ..., (F2, H1), (F2, H2), ..., (F10, H10)
– we can train on 9/10 trials of each condition, with 1/10 excluded, for 100 iterations
– average the accuracy across the 100 iterations
• Many options: e.g., leave one run out; classify the average of several left-out trials

Decision boundaries
• With 9 voxels the decision boundary lives in 9 dimensions; as a simple 2D example, each dot is one measurement (trial) from one trial type (red circles) or the other (blue squares)
• In one case, the classifier can act on single voxels.
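The leave-one-pair-out scheme described above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: it assumes scikit-learn is available and uses synthetic random patterns as stand-ins for per-trial activation estimates (e.g., β weights) in a 125-voxel ROI.

```python
# Leave-one-pair-out decoding sketch: 10 faces + 10 houses trials,
# 100 folds, each holding out one (face, house) pair for testing.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 10, 125                          # 10 trials/condition, 125-voxel ROI
faces = rng.normal(0.5, 1.0, (n_trials, n_voxels))    # synthetic "faces" patterns
houses = rng.normal(-0.5, 1.0, (n_trials, n_voxels))  # synthetic "houses" patterns

accuracies = []
for i in range(n_trials):                 # leave out face trial i ...
    for j in range(n_trials):             # ... paired with house trial j
        train_X = np.vstack([np.delete(faces, i, axis=0),
                             np.delete(houses, j, axis=0)])
        train_y = [0] * (n_trials - 1) + [1] * (n_trials - 1)
        test_X = np.vstack([faces[i], houses[j]])
        clf = SVC(kernel="linear").fit(train_X, train_y)
        accuracies.append(clf.score(test_X, [0, 1]))  # fraction of the pair correct

print(f"mean accuracy over {len(accuracies)} folds: {np.mean(accuracies):.2f}")
```

With two well-separated synthetic categories the mean accuracy comes out near ceiling; with real data it would sit closer to chance (50%), and its significance would be assessed against chance or a permuted null as described below.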
Here conventional fMRI analysis would also detect the difference. In a second case, the classifier cannot act on single voxels because the distributions overlap, but it can act on a combination of voxels using a linear decision boundary. In a third case, the classifier would require a curved (nonlinear) decision boundary. (Haynes & Rees, 2006, Nat Rev Neurosci; white and black circles show examples of correct and erroneous classification in the test set.)

Where to "Draw the Line"?
• There are different approaches to determining which line/plane/hyperplane to use as the boundary between classes
• We want an approach with good generalization to untrained data
• The most common approach is the linear support vector machine (SVM)

Support Vector Machine (SVM)
• An SVM finds a linear decision boundary that discriminates between two sets of points
• It is constrained to have the largest possible distance from the closest points on both sides
• The response patterns closest to the decision boundary, which define the margins, are called "support vectors" (Mur et al., 2009)

Is decoding better than chance? Two options
1. Use intersubject variability to determine significance: is the mean accuracy (+/- 95% CI) above chance?
2. Permutation testing

Permutation Testing
• randomize all the condition labels
• run SVMs on the randomized data
• repeat this many times (e.g., 1000X)
• get a distribution of expected decoding accuracies
• test the null hypothesis (H0) that the decoding accuracy you found came from this permuted distribution
• [Figure: reject H0 if our data exceed the upper bound of the 95% confidence limits on the permuted distribution; the upper quartile, median, and lower quartile of the permuted distribution are also shown, with the median expected at chance (here 33.3%).]

Example of the MVPA classifier approach: decoding future actions
• Gallivan et al., 2013, eLife
• [Figures: conditions; hand and tool decoding accuracies (+/- 1 SEM).]

Cross-decoding Logic
• Task-Across-Effector
– Train Grasp vs. Reach for one effector (e.g., Hand)
– Test Grasp vs.
Reach for the other effector (e.g., Tool)
– If accuracy > chance, then the area codes the task regardless of effector
• [Figures: % decoding accuracy (+/- 1 SEM) for hand and tool decoding in L SPOC, L SMG, L M1, L aIPS, L PMd, and L PMv; single-TR decoding accuracies over time (volumes), +/- 1 SEM.]

Basic Correlation Approach: First Demonstration

MVPA correlation approach
• Rather than averaging the summed activation, keep the per-trial activation patterns for Faces (trials 1-3) and Houses (trials 1-3)
• The same category evokes similar patterns of activity across trials
• Compute similarity within the same category and between different categories
• Within-category similarity > between-category similarity implies the brain area contains distinct information about faces and houses

Category-specificity of patterns of response in the ventral temporal cortex (Haxby et al., 2001, Science)
• [Figure: similarity matrix correlating patterns from ODD RUNS with patterns from EVEN RUNS (similarity scale from low to high); diagonal cells show within-category similarity, off-diagonal cells show between-category similarity.]

Correlation Approach Using Representational Similarity Analysis

Representational similarity analysis (RSA)
• Unlike the basic MVPA correlation approach, RSA does not separate stimuli into a priori categories
• The basic MVPA correlation matrix contrasts ODD RUNS and EVEN RUNS, with the colour scale showing similarity
(correlation) from low to high (Kriegeskorte et al., 2008)
• In RSA there are no class boundaries! The matrix correlates all conditions (C1 ... C96) with one another, rather than trials grouped into categories (Kriegeskorte et al., 2008)

Can compare theoretical models to data
• Which prediction matrix is more similar to the real data? (Kriegeskorte et al., 2008)

"Metacorrelations"
• Calculate the correlation between a model correlation matrix and the data correlation matrix
• Metacorrelations can identify the best model or reveal similarity between areas, e.g.:
– the right FFA pattern is similar to the left FFA pattern
– the right FFA pattern is similar to a face-animate prototype theoretical model
– the right FFA pattern is not very similar to a low-level vision theoretical model
• [Figure: metacorrelation matrix.]

Multidimensional Scaling (MDS)
• Input = a matrix of pairwise distances; here, distances in km among eight Canadian cities (e.g., Vancouver-Winnipeg 1869 km, Toronto-Montreal 503 km)
• Output = a representational space (here, 2D) that recovers the approximate geographic arrangement of Halifax, Toronto, Montreal, St.
John’s Winnipeg Vancouver Yellowknife Whitehorse MDS on MVPA Data MDS Different Representational Spaces in Different Areas Metacorrelation Matrix MDS on Metacorrelations Searchlights Searchlight: 8 Voxel Example Let’s zoom in on 8 voxels Spherical Searchlight Cross-Section • Ideally we’d like to test a spherical volume but the functional brain image is voxelized so we end up with a Lego-like sphere • Typical radius = 4 mm Kriegeskorte, Goebel & Bandettini, 2006, PNAS Moving the Searchlight 55 62 73 67 60 52 48 51 Each value in white is the decoding accuracy for a sphere of 5-voxels diameter centered on a given voxel First- and Second-Level Analysis V1 V2 V3 V4 V5 V6 V7 V8 S1 First-level Analysis Second-level Analysis 55 62 73 67 60 52 48 51 S2 46 52 65 69 60 59 53 48 S3 48 55 62 70 58 52 50 49 … … S15 52 55 59 57 56 43 42 52 Average Decoding Accuracy 51 59 69 67 59 55 50 50 t(14) The same 8 voxels in stereotaxic space (e.g., Tal space) 0.3 2.0 4.1 3.7 2.9 1.9 1.2 SVM Classifier Decoding accuracies for spheres centred at each of the eight voxels in each of the 15 Ss Do a univariate t-test (which is an RFX test based on intersubject variability) at each voxel to calculate the probability that the decoding accuracy is higher than chance 0.8 threshold at p < .05 (or use your favorite way of correcting for multiple comparisons) p .81 .06 .001 .008 .012 .08 .25 .44 Thresholded t-map Second-level Analysis V1 V2 V3 V4 V5 V6 V7 V8 S1 S2 First-level Analysis The principles of a second-level analysis are the same regardless of what dependent variable we are testing S3 … … S15 Beta Weights (or Differences in Beta Weights = Contrasts) Second-level Analysis V1 V2 V3 V4 V5 V6 V7 V8 0. 1 0. 7 0. 2 Are they sig diff than zero? Decoding Accuracies 51 59 69 67 59 55 50 50 Are they sig diff than chance? Correlations Between Model and MVPA data .03 Are they sig diff than zero? .22 1. 2 .41 1. 5 .50 1. 1 .38 0. 5 .19 0. 
Univariate Voxelwise Analysis vs. Multivariate Searchlight Analysis: Regions vs. Brains
• Univariate ROI analysis is to univariate voxelwise analysis as multivariate ROI analysis is to multivariate searchlight analysis
• There are no differences at the second-level analysis:
– it's a way to find things by searching the whole brain
– subjects' brains must be aligned (Talairach, MNI, or surface space)
– the same problems and solutions for multiple comparisons arise
– degrees of freedom = #Ss - 1
• There are differences at the first-level analysis:
– univariate voxelwise analyses are done one voxel at a time
– multivariate searchlight analyses are done one sphere at a time

MVPA Searchlight (Kriegeskorte, Goebel & Bandettini, 2006, PNAS)

Activation- vs. information-based analysis
• Activation-based (standard fMRI analysis): regions more strongly active during face than house perception
• Information-based (searchlight MVPA analysis): regions whose activity pattern distinguished the two categories
• 35% of voxels are marked only in the information-based map: category information is lost when the data are smoothed (Kriegeskorte, Goebel & Bandettini, 2006; see also Mur et al., 2009, Social Cognitive and Affective Neuroscience)

What Is MVPA Picking Up On? Limitations of MVPA
• MVPA will use whatever information is available, including confounds (e.g., reaction time)
• MVPA works best for attributes that are coded at certain spatial scales (e.g., topography: retinotopy, somatotopy, etc.)
• A failure to find effects does not mean that neural representations do not differ
– information may be present at a finer scale
– the choice of classifier may not have been optimal (e.g., maybe a nonlinear classifier would work better)
• Good classification indicates the presence of information, not necessarily neuronal selectivity (Logothetis, 2008)
– e.g., successful face decoding in primary visual cortex
• Pattern-classifier analysis requires many decisions that affect the results (see Misaki et al., 2010)
• Classifiers and correlations don't always agree

"Mind-Reading": Reconstructing new stimuli from brain activity
• Reconstructing new images (Miyawaki et al., 2008)
• Decoding vision (Gallant Lab, UC Berkeley)

Lie detector
• A non-linear classifier applied to fMRI data discriminated the spatial patterns of activity associated with lying and truth-telling in 22 individual participants
• 88% accuracy in detecting lies in participants not included in the training set (Davatzikos et al., 2005)
• But the real world is more complex!

Reconstructing dreams
• Measure brain activity while 3 participants were asleep and ask them to describe their dream when awake
• Compare brain activity during sleep with activity during viewing of pictures from frequently dreamt categories
• Activity in higher-order visual areas (e.g., FFA) could successfully decode the dream contents (accuracy of 75-80%) 9 seconds before waking the participant!
(Kamitani Lab, ATR, Japan)

Shared Semantic Space from brain activity during observation of movies
• Similar colours indicate categories that are similarly represented in the brain; for example, people and communication verbs are represented similarly (Huth et al., 2012)

Continuous Semantic Space across the cortical surface
• Each voxel is coloured according to the part of the semantic space it is selective for
• Click on a voxel (e.g., in the fusiform face area) to see which categories it represents: http://gallantlab.org/semanticmovies/

MVPA Tutorial
Jody Culham
Brain and Mind Institute, Department of Psychology, Western University
http://www.fmri4newbies.com/
Last Update: March 10, 2013
Last Course: Psychology 9223, W2013, Western University

Test Data Set
• Two runs: A and B (same protocol)
• 5 trials per condition for 3 conditions

Measures of Activity
• β weights (z-normalized, βz, or %-transformed, β%)
• t-values (β/error)
• % BOLD signal change (minus baseline)

Step 1: Trial Estimation
• Just as in the basic GLM, we run one GLM per voxel
• Now, however, each GLM estimates activation not across a whole condition but for each instance (trial or block) of a condition
• Three predictors per instance: a 2-gamma response function plus a constant and a linear trend within the trial
• Here: 5 instances each of motor imagery, mental calculation, and mental singing
• Output: for each instance of each condition in each run, for each voxel, we have an estimate of activation

Step 2: Support Vector Machine
• SVMs are usually run in a subregion of the brain, e.g., a region of interest (= volume of interest); sample data:
an SMA ROI and a 3-Tasks ROI

Step 2: Support Vector Machine (continued)
• test data must be independent of training data
– leave-one-run-out
– leave-one-trial-out
– leave-one-trial-set-out
• often we run a series of iterations to test multiple combinations of leave-X-out
– e.g., with two runs, we can run two iterations of leave-one-run-out
– e.g., with 10 trials per condition and 3 conditions, we could run up to 10^3 = 1000 iterations of leave-one-trial-set-out

MVP file plots
• [Figure: 98 functional voxels x 15 trials; intensity = activation; Run A = training set, Run B = test set.]

SVM Output
• Train on Run A, test on Run B: the confusion matrix of guessed vs. actual condition shows 15/15 correct
• Train on Run B, test on Run A: 10/15 correct (chance = 5/15)

Permutation Testing
• randomize all the condition labels
• run SVMs on the randomized data
• repeat this many times (e.g., 1000X)
• get a distribution of expected decoding accuracies
• test the null hypothesis (H0) that the decoding accuracy you found came from this permuted distribution
• [Figure: reject H0 if our data exceed the upper bound of the 95% confidence limits on the permuted distribution; the median of the permuted distribution should be at chance (33.3%).]

Voxel Weight Maps
• voxels with high weights contribute strongly to the classification of a trial to a given condition
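The permutation-testing recipe above can be sketched as follows. This is a hypothetical illustration, assuming scikit-learn: three synthetic "task" conditions (chance = 1/3, i.e., 33.3%), an odd/even trial split standing in for leave-one-run-out, and 200 label shuffles to build the null distribution (the slides suggest on the order of 1000 in practice). The helper name train_test_accuracy is made up for this sketch.

```python
# Permutation test for decoding accuracy: shuffle condition labels,
# re-run the classifier each time, and compare the observed accuracy
# to the resulting null distribution.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_per_cond, n_voxels = 10, 98
X = np.vstack([rng.normal(mu, 1.0, (n_per_cond, n_voxels))
               for mu in (-0.5, 0.0, 0.5)])   # synthetic patterns for 3 tasks
y = np.repeat([0, 1, 2], n_per_cond)          # condition labels

def train_test_accuracy(X, y):
    """Odd/even trial split (a stand-in for leave-one-run-out)."""
    train, test = slice(None, None, 2), slice(1, None, 2)
    clf = SVC(kernel="linear").fit(X[train], y[train])
    return clf.score(X[test], y[test])

observed = train_test_accuracy(X, y)
null = [train_test_accuracy(X, rng.permutation(y)) for _ in range(200)]
# Add-one correction so p is never exactly zero
p_value = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
print(f"observed accuracy {observed:.2f}, permutation p = {p_value:.3f}")
```

The median of the null distribution should land near 33.3%, as the slides note; the observed accuracy is significant if it exceeds nearly all permuted accuracies.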