2nd Level GLM


Transcript 2nd Level GLM

2nd Level GLM

Emily Falk, Ph.D.

1

[Pipeline overview figure: determine scanning parameters → acquire functionals (de-noise, slice-timing correct, realign, smooth) and structurals (T1; co-register, normalize to a template) → 1st level (subject) GLM, y = Xβ + ε → contrast (β how - β why) → 2nd level (group) GLM → threshold] 2

Groups of Subjects
• So far: Analyzing each individual voxel from one person
• How do we combine data from groups of subjects?

– Often referred to as 2 nd -level random effects analysis • Basic approach: – Normalize SPMs from each subject into a standard space – Test whether statistic from a given voxel is significantly different from 0 across subjects – Correct for multiple comparisons 3

An Example 4

Region A contrast values (β1 vs. β2), one per subject:
Subject 1: 32   Subject 2: 18   Subject 3: -4   Subject 4: 45   Subject 5: 23
Mean: 22.8 *
(repeat for all regions)
5 (Aron et al., 2005)
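As a minimal illustration of this group-level test (a Python sketch, not SPM code), the five Region A contrast values above can be entered into a one-sample t-test against zero; the voxel-wise 2nd-level analysis described later simply repeats this at every voxel.

```python
# Group random-effects test on the Region A contrast values shown above.
import numpy as np
from scipy import stats

betas = np.array([32.0, 18.0, -4.0, 45.0, 23.0])   # one contrast value per subject

t_stat, p_val = stats.ttest_1samp(betas, popmean=0.0)  # df = N - 1 = 4
print(f"mean = {betas.mean():.1f}, t({betas.size - 1}) = {t_stat:.2f}, p = {p_val:.3f}")
```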

Fixed and Random Effects
• Fixed effect
– Always the same from experiment to experiment; levels are not draws from a random variable
– Examples: Sex (M/F), Drug type (Prozac)
• Random effect
– Levels are randomly sampled from a population
– Examples: Subject; Day, in a longitudinal design
• If an effect is treated as fixed, error terms in the model do not include variability across levels
– Cannot generalize to unobserved levels
– e.g., if subject is fixed, cannot generalize to new subjects
6 Courtesy of Tor Wager

Fixed vs. Random Effects: Bottom Line • If I treat subject as a fixed effect, the error term reflects only scan-to-scan variability, and the degrees of freedom are determined by the number of observations (scans).

• If I treat subject as a random effect, the error term reflects the variability across subjects, which includes two parts:
– Error due to scan-to-scan variability
– Error due to subject-to-subject variability
and the degrees of freedom are determined by the number of subjects.

7 Courtesy of Tor Wager
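A small simulation sketch of this bottom line (illustrative Python with assumed variance values, not data from the lecture): the true group effect is zero, yet the fixed-effects test, whose error term reflects only scan-to-scan variability and whose df come from the number of scans, will typically report a hugely significant "group" effect; the random-effects summary-statistic test will not.

```python
# Illustrative simulation: subject is the random effect, group mean is truly zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subj, n_scan = 12, 100
subj_effect = rng.normal(0.0, 1.0, n_subj)                             # subject-to-subject variability (true mean 0)
scans = subj_effect[:, None] + rng.normal(0.0, 0.2, (n_subj, n_scan))  # plus scan-to-scan noise

# Fixed-effects style: error term uses only within-subject (scan-to-scan) residuals
grand_mean = scans.mean()
resid = scans - scans.mean(axis=1, keepdims=True)
df_fixed = scans.size - n_subj
mse_within = (resid ** 2).sum() / df_fixed
t_fixed = grand_mean / np.sqrt(mse_within / scans.size)
p_fixed = 2 * stats.t.sf(abs(t_fixed), df_fixed)

# Random-effects (summary statistic): one mean per subject, df = N - 1
t_rand, p_rand = stats.ttest_1samp(scans.mean(axis=1), 0.0)

print(f"fixed effects:  t({df_fixed}) = {t_fixed:.1f}, p = {p_fixed:.1e}")
print(f"random effects: t({n_subj - 1}) = {t_rand:.2f}, p = {p_rand:.2f}")
```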

Random Effects Analysis •

Subjects treated as “random” effect

– Randomly sampled from population of interest • Sample is used to make estimates of population effects • Results lead to inferences on the population 8

Random vs. Fixed Effects •

Whereas some early studies used fixed effects models, virtually all current studies use random effects models

Use random effects

All analyses that follow treat subject as a random effect

9

More specifically…

10

Voxel-Wise 2nd-Level Analysis
[Figure: for a single voxel, the series of subject values goes through model specification, parameter estimation, and hypothesis testing, yielding a statistic at that voxel; repeating this over voxels builds up the SPM]
11

Model Specification: Building the Design Matrix

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\beta + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

Stat values (Y) = Design matrix (X) × Model parameters (β) + Residuals (e), i.e. Y = Xβ + e
12

Parameter Estimation / Model Fitting: find the β values that produce the best fit to the observed data, y = β0 + error
13

The SPM Way of Plotting the Variables: y = Xβ + e
14

Group Analysis Using Summary Statistics: a simple kind of 'random effects' model, the "Holmes and Friston" (HF) approach
[Figure: first-level data, design matrices, and contrast images feed a one-sample t-test at the second level, producing an SPM(t)]
15 Courtesy of Tor Wager

Summary Statistic Approach: 2-Sample t-test (from Mumford & Nichols, 2006)
16

Summary Statistic Approach: Inference
• In a 1-sample t-test, the contrast C = 1 gives the group mean
– If the images taken to the second level represent the contrast A – B, then
• C = 1 tests the mean difference (A > B)
• C = -1 tests the mean difference (B > A)
• Dividing by the standard error of the mean yields a t statistic
– Degrees of freedom is N – 1, where N is the number of subjects
• Comparison of the t-statistic with the t-distribution yields a p-value
– P(Data | Null)
17

• Tech Note: Sufficiency of the Summary Statistic Approach
With simple t-tests under the summary statistic approach, within-subject variance is assumed to be homogeneous (within a group)
– SPM's approach, but other packages can act differently
• If all subjects (within a group) have equal within-subject variance (homoscedastic), this is OK
• If within-subject variance differs among subjects (heteroscedastic), this may lead to a loss of precision
– May want to weight individuals as a function of within-subject variability
• Practically speaking, the simple approach is good enough (Mumford & Nichols, 2009, NeuroImage)
– Inferences are valid under heteroscedasticity
– Slightly conservative under heteroscedasticity
– Near optimal sensitivity under heteroscedasticity
– Computationally efficient
18

For an extended example of ways that you could do this wrong, check out Derek Nee's second-level GLM lecture from last year

19

The GLM Family
DV: One continuous
– Predictors: Continuous, one predictor → Regression
– Predictors: Continuous, two+ predictors → Multiple Regression
– Predictors: Categorical, 1 predictor, 2 levels → 2-sample t-test
– Predictors: Categorical, 1 predictor, 3+ levels → One-way ANOVA
– Predictors: Categorical, 2+ predictors → Factorial ANOVA
DV: Repeated measures
– Two measures, one factor → Paired t-test
– More than two measures → Repeated measures ANOVA
All of these are instances of the General Linear Model
20

Correlations
• To perform mass bivariate correlations, use SPM's "Multiple Regression" option with a single covariate
– Can also specify multiple covariates and perform true multiple regression
• Be cautious of multicollinearity!
• Correlations are done voxel-wise
• The % of explained variance necessary to reach significance with appropriate correction for multiple comparisons may be very high
– Interpret location, not effect size (more later)
• May be more realistic to perform correlations on a small set of regions-of-interest (more later)
21
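A minimal sketch of what "correlations are done voxel-wise" means in practice (Python with made-up arrays standing in for contrast images and a behavioral covariate; in SPM this is set up through the Multiple Regression design):

```python
# Mass bivariate correlation: one Pearson correlation per voxel.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_vox = 20, 1000                        # hypothetical dimensions
con_images = rng.normal(size=(n_subj, n_vox))   # stand-in for subjects' contrast values
covariate = rng.normal(size=n_subj)             # e.g. trait empathy scores

r = np.empty(n_vox)
p = np.empty(n_vox)
for v in range(n_vox):
    r[v], p[v] = stats.pearsonr(covariate, con_images[:, v])

print(f"max |r| = {np.abs(r).max():.2f}; voxels with p < .001 (uncorrected): {(p < 1e-3).sum()}")
```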

Examples
• First level: Why > How
– Regression with…
• trait empathy
• trait narcissism
• scan on weekday or weekend
• friends on facebook
• First level: Loved one > Other
– Regression with…
• relationship closeness
• relationship satisfaction
• age
22

Example • Costly exclusion predicts susceptibility to peer influence

Correlations and Outliers Null-hypothesis data, N = 50 Same data, with one outlier 24 Courtesy of Tor Wager

Robust Regression • Outliers can be problematic, especially for correlations • Robust regression reduces the impact of outliers – 1) Weight data by inverse of leverage – 2) Fit weighted least squares model – 3) Scale and weight residuals – 4) Re-fit model – 5) Iterate steps 2-4 until convergence – 6) Adjust variances or degrees of freedom for p-values • Can be applied to simple group results or correlations – Whole brain: http://wagerlab.colorado.edu/ – ROI: whatever software you prefer (more later) 25
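A simplified sketch of the iterative reweighting idea listed above, using Tukey bisquare weights (illustrative Python; this is not the Wager-lab implementation linked above, and it omits the variance/df adjustment in step 6):

```python
# Iteratively reweighted least squares (IRLS): outliers get down-weighted toward zero.
import numpy as np

def robust_fit(X, y, tune=4.685, n_iter=20):
    """Robust regression of y on X (X should include an intercept column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]              # ordinary least squares start
    for _ in range(n_iter):
        resid = y - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745 + 1e-12    # robust scale estimate (MAD-like)
        u = resid / (tune * scale)
        w = np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)  # bisquare weights: gross outliers -> ~0
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)           # weighted least-squares re-fit
    return beta, w

# Example: a real correlation plus one gross outlier
rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 0.8 * x + rng.normal(scale=0.5, size=50)
x[0], y[0] = 4.0, -6.0                                       # the outlier
X = np.column_stack([np.ones_like(x), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_rob, w = robust_fit(X, y)
print(f"OLS slope = {beta_ols[1]:.2f}, robust slope = {beta_rob[1]:.2f}, outlier weight = {w[0]:.2f}")
```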

Null-hypothesis data, N = 50 Same data, with one outlier Robust IRLS solution 26 Courtesy of Tor Wager

Case Study: Visual Activation Visual responses 27 Courtesy of Tor Wager

[Slides 28-29 repeat the pipeline overview figure as a recap: determine scanning parameters → acquire functionals (de-noise, slice-timing correct, realign, smooth) and structurals (T1; co-register, normalize to a template) → 1st level (subject) GLM, y = Xβ + ε → contrast (β how - β why) → 2nd level (group) GLM → threshold]

Up Next…
• Hypothesis Testing
• Levels of Inference
• Multiple Comparisons
– Family-wise Error Correction
– False-Discovery Rate Correction
– Non-parametric Correction
30

Hypothesis Testing
• Null Hypothesis H0
– No effect
• T-test: no difference from zero
• F-test: no variance explained
• α level
– Set to an acceptable false positive rate
– α = P(T > u_α | H0); thresholding at u_α controls the false positive rate at level α
• P-value
– Test statistics are compared with appropriate distributions
• Changes as a function of degrees of freedom
• T-distribution: bell-shaped
• F-distribution: skewed
– Assessment of the probability of the test statistic assuming H0
– P(Data | Null)
• But not P(Null | Data)!

31

Information for Making Inferences on Activation
• Where? Signal location
– Local maximum
– No inference in SPM; could extract peak coordinates and test (e.g., Woods lab; Ploghaus, 1999)
• How strong? Signal magnitude
– Local contrast intensity
– Main thing tested in SPM
• How large? Spatial extent
– Cluster volume
– Can get p-values from SPM
• Sensitive to the blob-defining threshold
• When? Signal timing
– No inference in SPM; but see Aguirre 1998; Bellgowan 2003; Miezin et al. 2000; Lindquist & Wager, 2007
32

Unit of Analysis
• The fundamental unit of analysis is the voxel
– The GLM is run voxel-by-voxel
– Statistical parametric maps (SPMs) are calculated voxel-by-voxel
• The unit of interest may instead be a "region"
– Functional unit
– Pool data across voxels
• May also be broadly interested in the brain as a whole
– Considering the brain as a whole, do these 2 conditions differ?

33

Levels of Inference • Inferences can be made at any “level” depending upon your unit of interest • Voxel-level – This/these particular voxels are significant – Most spatially specific, least sensitive • Cluster-level – These contiguous voxels together are significant – Less spatially specific, more sensitive • Set-level – The brain shows an effect – No spatial specificity, but can be most sensitive SPM’s results table shows p values for voxel-level, cluster level , and set-level tests.

34

Voxel-Level Inference
• Retain voxels above the α-level threshold u_α
• Gives best spatial specificity
– The null hyp. at a single voxel can be rejected
[Figure: statistic values over space with threshold u_α; voxels exceeding it are significant, the rest are not]
35 Courtesy of Tor Wager

Cluster-Level Inference
• Two-step process
– Define clusters by an arbitrary threshold u_clus
– Retain clusters larger than the α-level extent threshold k_α
[Figure: statistic values over space thresholded at u_clus; a cluster smaller than k_α is not significant, a cluster larger than k_α is significant]
36 Courtesy of Tor Wager

Cluster-Level Inference
• Typically better sensitivity
• Worse spatial specificity
– The null hyp. of the entire cluster is rejected
– Only means that one or more voxels in the cluster are active
[Figure: same thresholded statistic image; only the cluster exceeding k_α is significant]
37 Courtesy of Tor Wager

Multiple Comparisons Problem • Often over 100,000 voxels in the brain – Voxel-level tests are repeated over 100,000 times – If α = 0.05 (i.e. p < 0.05), over 5,000 false positive voxels!

• Need to control false positive rate at α across all tests – Otherwise, difficult to know if result is believable 38

Multiple Comparisons
• Perform statistical tests at every voxel: tens of thousands of tests
• Quite likely that some would pass threshold by chance even if there was absolutely no effect
• Need to correct for multiple comparisons.

39

Some Approaches • Bonferroni correction: Insist on p<.05/#voxels – Severely reduces sensitivity, but works with small ROIs • Gaussian random field theory: Suppose there is no effect but data is spatially smooth. What’s the chance of seeing a blob of X contiguous voxels all of which are above a threshold V?

– Default approach to controlling familywise error (FWE) in SPM • False Discovery Rate (FDR): Set threshold so that less than 5% of the voxels above threshold would be false positives under null hypothesis 40

Family-Wise Error (FWE)
• The FWE rate is the probability of finding one or more false positives among all hypothesis tests
– If FWE α = 0.05, the probability of finding one or more false positives is 5%
• Based on the maximum distribution
– If no true positives are present, the most significant voxel will exceed the threshold 5% of the time
• Several approaches to control FWE
41

Bonferroni Correction
• Simplest method for controlling FWE
• α_corrected = α / V
– α is the desired alpha level
– α_corrected is the alpha level corrected for FWE
– V is the number of voxels/tests
• 0.05 / 100,000 = 0.0000005

– E.g. t(20) = 6.93

When examining results in SPM, you can find the # of voxels in the statistics table (bottom of the table under Volume). Divide α by the # of voxels to determine a Bonferroni corrected threshold.

42
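The same arithmetic as a quick Python sketch (the df of 20 is an assumed example, e.g. 21 subjects at the 2nd level):

```python
# Bonferroni: divide alpha by the number of tests, then find the critical t.
from scipy import stats

alpha, n_voxels = 0.05, 100_000
alpha_corrected = alpha / n_voxels              # 0.05 / 100,000 = 5e-7
df = 20                                         # e.g. 21 subjects at the 2nd level
t_crit = stats.t.isf(alpha_corrected, df)       # one-tailed critical t (the slide quotes t(20) = 6.93)
print(f"alpha_corrected = {alpha_corrected:.1e}, critical t({df}) = {t_crit:.2f}")
```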

Bonferroni Correction: Limitations • Correction assumes that each test is independent • Data are actually spatially smooth, so not independent!

• Correction tends to be overly conservative
– False positives appropriately controlled
– But the threshold is too high to detect many true positives
43

Gaussian Random Fields
• SPM's default method of FWE correction takes into account the smoothness of the data
• Intuition
– Smooth data → lower resolution of the search space → fewer comparisons → less stringent correction
• Assumes that an image of residuals can be described by Gaussian noise convolved with a 3D kernel
– Forms a Gaussian Random Field
– The FWHM of the kernel describes the smoothness of the data
44

Tech Note – Estimating Smoothness: RESELS
• RESELS = RESolution ELements
– 1 RESEL = FWHM_x × FWHM_y × FWHM_z
[Figure: a row of 10 voxels corresponds to 4 RESELs when the FWHM spans several voxels]
Note, when examining results in SPM, you can find the # of resels and FWHM in the statistics table (bottom of the table under Volume)
45
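A quick sketch of the RESEL arithmetic with assumed example numbers (not values from any particular dataset):

```python
# A RESEL is a block of FWHM_x x FWHM_y x FWHM_z mm, so the search volume in
# RESELs is the in-mask volume in mm^3 divided by the volume of one RESEL.
n_voxels = 100_000                    # hypothetical in-mask voxel count
voxel_size = (3.0, 3.0, 3.0)          # voxel dimensions in mm
fwhm = (9.0, 9.0, 9.0)                # estimated smoothness in mm

volume_mm3 = n_voxels * voxel_size[0] * voxel_size[1] * voxel_size[2]
resel_mm3 = fwhm[0] * fwhm[1] * fwhm[2]
print(f"search volume = {volume_mm3 / resel_mm3:.0f} RESELs")
```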

• Threshold needed to correct
– Increases with greater search volume (in RESELs)
• Need more stringent correction
– Decreases with greater smoothness (fewer RESELs)
• Greater smoothness leads to a less stringent correction
46

Gaussian Random Fields: Clusters
1) Threshold at the voxel level
2) Estimate the chance of clusters of size ≥ k, taking into account the mean expected cluster size, search volume, and smoothness → p_uncorrected for a cluster of size ≥ k
3) Apply the previously described correction → p_corrected
[Figure: simulated z² random fields at 5mm, 10mm, and 15mm FWHM]
47 Courtesy of Tor Wager

Gaussian Random Fields: Limitations • Requires sufficient smoothness of data – FWHM 3-4x voxel size • Performs poorly with low df – Better with df > 20 and sufficient smoothness • Tends to be conservative, especially with rough data (FWHM < 6) • Based on approximations – Approximations can be thrown off by “roughness spikes” – Approximations will vary on a contrast by contrast basis • Different contrasts in same data will have different thresholds • Typically regarded as better at individual level where df are high Select “FWE” in SPM results to threshold using Gaussian Random Field Theory. Expect a conservative threshold.

48

False-Discovery Rate
• Correction of FWE ensures that false positives are controlled per family of tests
– At α FWE-corrected = 0.05, 5% of contrasts (across all voxels) will contain at least one false positive
• False-Discovery Rate (FDR) controls the proportion of false positives within a family of tests
– At α FDR-corrected = 0.05, 5% of reportedly active voxels in a contrast will be false positives
• Upside: will find more true signal
• Downside: will have a few false positives
49

False-Discovery Rate: Method
1. Establish a rate q of acceptable proportion of false positives (e.g. 0.05)
2. Sort observed p-values from smallest to largest
3. Find max(i) such that p(i) < i*(q/V), where V is the # of voxels
• In other words
– Smallest p-value must pass Bonferroni, second smallest Bonferroni*2, third smallest Bonferroni*3, etc. (i.e. i = 1: Bonferroni*1, i = 2: Bonferroni*2, i = 3: Bonferroni*3, etc.)
– Highest such i gives the threshold
– If no p-value passes, the threshold cannot be determined (SPM will say the threshold is t = infinity)
[Figure: sorted p-values p(i) plotted against i/V from 0 to 1, with the line (i/V)q as the cutoff]
50
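A minimal Python sketch of the step-up rule just described (the standard Benjamini-Hochberg procedure; the p-values here are simulated stand-ins):

```python
# Benjamini-Hochberg: find the largest i with p(i) <= (i/V)*q.
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Return the p-value threshold, or None if no p-value passes."""
    p_sorted = np.sort(np.asarray(p_values))
    V = p_sorted.size
    passes = p_sorted <= (np.arange(1, V + 1) / V) * q
    if not passes.any():
        return None                               # SPM would report the threshold as t = infinity
    i_max = np.nonzero(passes)[0].max()           # highest i that passes
    return p_sorted[i_max]

# Example: mostly null p-values plus a handful of strong effects
rng = np.random.default_rng(3)
p = np.concatenate([rng.uniform(size=10_000), rng.uniform(0, 1e-5, size=50)])
print("FDR threshold:", fdr_threshold(p, q=0.05))
```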

False-Discovery Rate: Limitations
• Limits inference
– Cannot say which activated voxels are true positives or false positives
• Adaptive
– Good in some cases
– Maps with lots of activation (many voxels with low p) will have low thresholds
– Maps with little activation (few voxels with low p) will have high thresholds or no determinable threshold
• Hard to find signal in small areas
Select "FDR" in SPM results to threshold using the False-Discovery Rate. The threshold may be "infinity" if effects are weak, or it may be very low if results are strong.
51

51

Simulations Noise Signal Signal+Noise 52 Courtesy of Tor Wager

[Figure: ten simulated datasets per method. Control of the per-comparison rate at 10%: roughly 9.5-12.5% of null pixels are false positives in every simulation. Control of the familywise error rate at 10%: occurrences of a familywise error are marked (FWE). Control of the false discovery rate at 10%: 6.7-16.2% of activated pixels are false positives across simulations.]
53

Nonparametric Inference
• Parametric methods
– Assume the distribution of the statistic under the null hypothesis
– Needed to find P-values and the threshold u
[Figure: parametric null distribution with the 5% tail marked]
• Nonparametric methods
– Use the data to find the distribution of the statistic under the null hypothesis
– Any statistic!
[Figure: nonparametric null distribution with the 5% tail marked]
54 Courtesy of Tor Wager

Permutation Test: Toy Example
• Under H0
– Consider all equivalent relabelings
– Compute all possible statistic values
– Find the 95%ile of the permutation distribution
[Figure: permutation distribution of the statistic, ranging from -8 to 8]
55 Courtesy of Tor Wager

Permutation Test: Details
• Requires only the assumption of exchangeability:
– Under H0, the distribution is unperturbed by permutation
• Subjects are exchangeable (good for group analysis)
– Under H0, each subject's A/B labels can be flipped
• fMRI scans are not exchangeable under H0
– Due to temporal autocorrelation (bad for time-series analysis / single subject)
On the SPM website, click on Extensions and search under Toolboxes to download SnPM
56 Courtesy of Tom Nichols
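A minimal sketch of a sign-flipping permutation test for a 2nd-level one-sample design (simulated contrast values; SnPM implements this, and more, within SPM):

```python
# Under H0 each subject's contrast (A - B) is symmetric around zero,
# so its sign can be flipped to generate the null distribution.
import numpy as np

rng = np.random.default_rng(4)
con_values = rng.normal(0.5, 1.0, size=16)      # hypothetical contrast values, one per subject

observed = con_values.mean()
n_perm = 10_000
perm_means = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=con_values.size)   # random relabeling (A/B flip)
    perm_means[i] = (signs * con_values).mean()

p_value = (np.sum(perm_means >= observed) + 1) / (n_perm + 1)   # one-tailed
threshold_95 = np.quantile(perm_means, 0.95)
print(f"observed mean = {observed:.2f}, permutation p = {p_value:.4f}, 95%ile = {threshold_95:.2f}")
```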

Other Approaches • AlphaSim (AFNI): create random fields based on smoothness and observe rate of false positive clusters – Provides cluster-level correction – Can input appropriate voxel-level threshold and cluster extent to use with SPM • Once estimated, threshold can be used for all contrasts • Threshold-free Cluster Enhancement (FSL): combines signal strength and voxel extent into a single measure – Provides cluster-level correction without need to first specify voxel-level threshold – Avoids ambiguities that can arise from getting different results at different voxel-level thresholds with other cluster methods 57

AlphaSim
• FDR correction based on simulation…
– "All whole-brain analyses were thresholded using an uncorrected p-value of .001 combined with an extent threshold of 21 contiguous voxels, corresponding to a false-positive discovery rate of 5% across the whole brain as estimated by Monte Carlo simulation implemented using AlphaSim in the software package AFNI (http://afni.nimh.gov/afni/doc/manual/AlphaSim)".
– "Whole-brain analyses were conducted using a statistical criterion of at least 21 contiguous voxels exceeding a voxel-wise threshold of p < .001. A Monte Carlo simulation (http://afni.nimh.gov/afni/doc/manual/AlphaSim) of our brain volume demonstrated that this cluster extent cutoff provided an experiment-wise threshold of p < .05, corrected for multiple comparisons."
Cox, R. W. (1996). AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 29, 162-173.

58

AlphaSim: Which volume?

Analysis mask Cube 59

AlphaSim: Usage

> AlphaSim -nxyz [nx] [ny] [nz] -dxyz [dx] [dy] [dz] -fwhm 8 -pthr .001 -iter 10000 -quiet

Where:
nx, ny, nz = resolution in each dimension (e.g. 64 64 34)
dx, dy, dz = size of voxels in mm (e.g. 3 3 3)
fwhm = size of smoothing kernel (mm)
pthr = voxel-level threshold
iter = number of simulations to run (usually 10k)
quiet = suppress output
60

AlphaSim: Usage

> AlphaSim -mask [name.hdr] -fwhm 8 -pthr .001 -iter 10000 -quiet

Where:
mask = name of normalized anatomical mask (2nd level)
fwhm = size of smoothing kernel (mm)
pthr = voxel-level threshold
iter = number of simulations to run (usually 10k)
quiet = suppress output
61

Thresholding Summary
• Most thresholding is done at either the voxel or cluster level
– Depends on which level of inference is of interest
– Spatial precision: voxel-level; regional inference: cluster-level
• Methods differ in control over false positives and sensitivity
– Bonferroni: strong control over false positives, somewhat conservative
– False-Discovery Rate: admits false positives, more sensitive
– Non-parametric: most adaptive, reputedly most accurate
• Whole-brain search isn't your only option
• Next up: Avoid the problem!
– Can use SVCs and ROIs to reduce/eliminate multiple comparisons issues and improve power in areas with an a priori hypothesis
– But conclusions and inferences are different
62

Some notes on inference in brain mapping studies

63

What Brain Mapping is Good For
• Making inferences on the presence of activity, to either a) test a theory, or b) characterize the pattern of brain responses to a task
• Limiting the false positive rate to a specified level
• Leveraging hypothesis testing to provide evidence on a variety of theories: Is Area r involved in Task x?

64

What Brain Mapping is Not Good For
• Estimating effect sizes (effect strength, or predictive power)
• Testing the assumptions involved in the analysis
– 'Neural' timing and temporal profile of neural response
– Link between neural activity and observed signal: hemodynamic response profile
– Appropriateness of the additive linear model
– Normality and homogeneity of variance (needed for valid p-values)
• Building a cumulative knowledge base
65

Kinds of Inferences
• Is there an effect? → Hypothesis tests
• How well can I predict…? → Effect size estimation; cross-validation
• Where are the effects? → Spatial statistics; mixture models
Key: There are tradeoffs among these goals. With current analysis options, they cannot all be maximized at once.
66

Effect Size Estimation is Important for Development of Applications
• Medicine
– Predicting treatment response, diagnosis, 'personalized medicine', neuro-rehabilitation and prosthetics
– Sensitivity and specificity of tests relate to effect size
• Law
– Lie detection, guilty knowledge, tort cases; evidence on pain and cognitive deficits
• Psychology and neuroscience
– Testing for meaningfully large effects
• Marketing, military, homeland security
67

Spatial Pattern Estimation is Important for Theory Building
• Need to know both which areas are 'active' and which are not
• Balance of false positives and false negatives is important for building a cumulative science
• Often ignored because of bias towards hypothesis testing and 'strong inference'
68

[Figure: true signal plus noise yields the observed signal; a hypothesis-test threshold splits the results into positives and negatives, each containing a mix of signal and noise]
69 Courtesy of Tor Wager

The Problem with Estimating Effect Sizes: Conditioning on Significant Test Results
[Figure: the observed signal is passed through a hypothesis test with multiple comparisons correction; the surviving results contain both signal and noise]
• Conditioning on significance selects for high noise values (red/purple)
• Equally true for all effect size measures: Pearson's r, t-values, Z-scores, p-values, etc.

70 Courtesy of Tor Wager

The 'File Drawer Problem'
[Figure, left: across voxels, only those whose observed effect size exceeds the threshold are 'significant' (a mix of true and false positives), so the average observed effect size among them exceeds the true effect size (blue). Figure, right: the same logic applies to the sample of published studies (Study 1-4 vs. the true effect size): conditioning on publication causes bias in the effect size estimate.]
71 Courtesy of Tor Wager

The ‘File Drawer Problem’: Meta-analysis of 5 Antidepressants 72 Melander et al., 2003 BMJ

Regression to the Mean • Average height of a Chinese male is 5’7” • Yao Ming is 7’6” tall • If Yao had a son, is he likely to be shorter or taller than Yao?

73

Tradeoff #1: Is there… vs. How big…
• Is there an effect? → Hypothesis tests (inference)
• How well can I predict…? → Effect size estimation
• Where are the effects? → Spatial statistics; mixture models
Stringent multiple comparisons correction is good for inference, but bad for size estimation
74

The Problem with Estimating Effect Sizes Conditioning on Significant Test Results • More stringent multiple comparisons: less accurate estimation of effect size • Increased power: reduced selection bias – Larger effects – Larger sample sizes – Less noise Yarkoni, 2009 75 Courtesy of Tor Wager

Problems with Estimating Effect Sizes
• If the true correlation looks like this… r = .5
• A typical 'significant' voxel looks like this… r = .78 (N = 20, p < .001)
• Why? In looking at 'significant' tests, we are conditioning on having a high observed effect size: r must be at least 0.67 in order to consider it!

76
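A small simulation sketch of this selection effect (illustrative Python; every voxel's true correlation is set to .5 and voxels are "selected" at p < .001 with N = 20):

```python
# Conditioning on significance inflates the average observed correlation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_subj, n_vox, true_r = 20, 5_000, 0.5

behavior = rng.normal(size=n_subj)
noise = rng.normal(size=(n_vox, n_subj))
brain = true_r * behavior + np.sqrt(1 - true_r ** 2) * noise   # each voxel truly correlates ~.5

results = [stats.pearsonr(behavior, brain[v]) for v in range(n_vox)]
r = np.array([res[0] for res in results])
p = np.array([res[1] for res in results])

sig = p < 0.001
print(f"mean r over all voxels:         {r.mean():.2f}")
print(f"mean r over 'significant' ones: {r[sig].mean():.2f} ({sig.sum()} voxels)")
```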

So why would you ever show this? (r = .78)
Descriptive reporting and plotting of results: checking statistical assumptions
[Figure: event-related time courses and brain-behavior scatterplots, contrasting OK and pathological patterns]
77

Non-Solutions • Omitting display of scatterplots • Using cross-validation for everything, even if theory calls for a hypothesis test • Using regions of interest only and ignoring the information in much of the brain 78

Solutions: Effect Size Estimation
1. When performing a hypothesis test, interpret the results literally: "Given the model assumptions, this brain area shows a non-zero effect." (…Not as an estimate of effect size)
2. Increase power!
3. To estimate effect size, use 'hold out' cross-validation data, i.e., select a small number of a priori ROIs
[Figure: unbiased estimates of the true effect size across voxels]
79

Avoiding the Problem (Preview of What is Up Next)
• Small Volume Correction (SVC)
– If you know the region(s) you are interested in a priori, you can limit examination to just those voxels
– Reduces the number of voxels and thus reduces the multiple comparisons correction
• Region-of-interest (ROI) Analysis
– If you have a strong prior and reason to believe that areas of interest are homogeneous, you can simply average the signal across an ROI
– Single comparison (per ROI)
• The WFUPickAtlas toolbox for SPM provides a good means to create anatomical masks for SVCs or ROIs
• Downside: can only confirm prior hypotheses
– May miss new discoveries!

80
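A minimal sketch of the ROI-averaging approach (Python with nibabel and hypothetical file names rather than SPM/MarsBaR): average each subject's contrast image within an a priori mask, then run a single test per ROI.

```python
# One number per subject per ROI -> one t-test per ROI (no voxel-wise correction needed).
import numpy as np
import nibabel as nib
from scipy import stats

mask = nib.load("roi_mask.nii").get_fdata() > 0                        # e.g. a WFU PickAtlas mask
subject_files = [f"con_subject{i:02d}.nii" for i in range(1, 21)]      # hypothetical file names

roi_means = []
for fname in subject_files:
    con = nib.load(fname).get_fdata()
    roi_means.append(np.nanmean(con[mask]))                            # mean contrast value in the ROI

t_stat, p_val = stats.ttest_1samp(roi_means, 0.0)
print(f"ROI mean = {np.mean(roi_means):.2f}, t({len(roi_means) - 1}) = {t_stat:.2f}, p = {p_val:.3f}")
```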

Region of Interest and Other Analysis Methods

81

[Figure: true signal plus noise yields the observed signal; a hypothesis test with multiple comparisons correction yields the results. The contrast can be a task comparison at the individual-subject level (effect size: d, Z, p), a brain-behavior correlation (effect size: r, p), or information-based mapping against chance (effect size: accuracy, p).]
82

Whole Brain vs. ROI
Precision of prior spatial information: None → Some → Lots
• None: test each voxel in the whole brain; stringent multiple comparisons correction required (need very strong evidence)
• Some: test each voxel in a set of regions, or each voxel in a single region; some correction required (need strong evidence)
• Lots: test the average in a single region; no correction required (need less evidence)
83

What is the Question?
[Figure: observed data with several candidate areas. Does this brain area respond to my task? Does this one? Does this one? Does this one?]

84

What is the Answer?
[Figure: observed data] "Given the model assumptions, this brain area shows a non-zero effect." (Not necessarily informative about how big the effect is.)
85

Up Next…
• Masking
– Limiting multiple comparisons
– Conjunction Analysis
– Disjunction
• ROI Analysis
– ROI selection
• Reverse Inference
• Degrees of Freedom
• Brain Mapping Considerations
86

Masking 87

Whole Brain vs. ROI
Precision of prior spatial information: None → Some → Lots
• None: test each voxel in the whole brain; stringent multiple comparisons correction required (need very strong evidence)
• Some: test each voxel in a set of regions, or each voxel in a single region; some correction required (need strong evidence)
• Lots: test the average in a single region; no correction required (need less evidence)
88

Masking
• In a 1st-level analysis, all voxels with a nonzero value for every subject are estimated
– gray matter, white matter, CSF, skull, eyeballs
• Estimating a 1st-level model yields a mask representing all voxels included in the model
89

Masking
• Masking at the 1st level can eliminate unnecessary tests from your analysis, enabling better correction for multiple comparisons
90

Masking
• Masking can also be used to test more targeted hypotheses
– E.g. are the areas activated in one contrast also activated in another (independent) contrast?
Create a mask from the first contrast's spmT*.img file using ImCalc, then use that mask in the analysis of the second.
91
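A sketch of the same masking idea outside of ImCalc (Python with nibabel; the file name and threshold are placeholders, not values from the lecture): binarize the first contrast's t-map and save it as a mask to apply when examining the second contrast.

```python
# Build a binary inclusion mask from an independent contrast's t-map.
import numpy as np
import nibabel as nib

tmap = nib.load("spmT_0001.img")                        # t-map from the first (independent) contrast
mask_data = (tmap.get_fdata() > 3.0).astype(np.uint8)   # threshold chosen for illustration only
mask_img = nib.Nifti1Image(mask_data, tmap.affine)
nib.save(mask_img, "contrastA_mask.nii")
```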

Process Comparison • One use of fMRI is to compare brain processes • Two main interests: – Conjunction: tasks recruit common brain areas and thus common processes – Disjunction: tasks recruit distinct brain areas and thus distinct processes 92

Conjunctions
• Often interested in processes shared across tasks
– Task 1 recruits regions A, B, C, and D
– Task 2 recruits regions A, D, E, and F
• Task 1 and Task 2 share common process(es) instantiated by regions A and D
• Conjunction analysis aims to demonstrate neural overlap
Select 2 or more contrasts in SPM's results menu (ctrl+click to select multiple) to perform conjunctions.
Different null hypotheses can be tested. Conjunction null is the accepted standard.

93

Conjunction Analysis
• SPM8 gives 2 flavors, which differ in the null hypothesis that they test
• Global null: assesses whether the contrasts are likely to be sampled from the null distribution
– Essentially a meta-analysis
– Contrasts may not be individually significant
• Consider three contrasts with t-scores of 0.5, 1.1, 1.3
– None is significant individually
– But together, they reject the global null
• Conjunction null: logical AND, i.e. all contrasts are individually significant
– Typically what researchers are interested in
– The joint test can be conservative
• Each contrast must be individually significant at the corrected threshold
• Conjunction null is the generally accepted approach
– SVCs or ROIs may be appropriate to overcome conservativeness at the whole-brain level
Different null hypotheses can be tested. Conjunction null is the accepted standard.

94
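At the voxel level, the conjunction null amounts to a logical AND of individually significant contrasts, as in this toy sketch (random stand-in maps and an illustrative threshold, not SPM's implementation):

```python
# Conjunction null: a voxel survives only if every contrast is individually significant there.
import numpy as np

rng = np.random.default_rng(6)
t_contrast_1 = rng.normal(size=(4, 4, 4))   # stand-in t-maps
t_contrast_2 = rng.normal(size=(4, 4, 4))
t_crit = 1.7                                # whichever (corrected) threshold applies

conjunction = (t_contrast_1 > t_crit) & (t_contrast_2 > t_crit)
print(f"{conjunction.sum()} voxels survive the conjunction null")
```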

Conjunction: Practical Issues • Contrast A activates a wide network of regions • Contrast B activates a smaller network, which differs from A, but some voxels overlap • Is this overlap meaningful?

– How much overlap would be expected by chance?

95

Conjunction: Practical Issues
• Behavioral studies report low correlations between Task A and Task B
• Subjects can concurrently perform Task A and Task B with little interference, and additive factors analysis reveals no interactions
• fMRI of the tasks reveals that they both activate the dorsal ACC
• Researcher concludes that the tasks do share processes and touts the superiority of fMRI
• 20% of published studies find activation in the dorsal ACC (Yarkoni et al., 2011, Nature Methods)
• Conclusion justified? Should this base-rate be taken into account?

96

Conjunction: Summary • Conjunction analysis is one means of process comparison and should likely be done against the conjunction null (rather than the global null) • Conjunctions may occur simply because one contrast is very encompassing • Conjunctions may occur in areas that sub-serve domain general processing across many tasks • Important to be mindful of these matters when interpreting conjunctions 97

Disjunctions
• How to characterize areas involved in contrast A, but not contrast B?
• Requirements
– Active in contrast A
– A > B
– Not active in contrast B
• Bad practice
– Significant in A, not significant in B
– Could be active in B, just failed to detect it!
– One sample demonstrated that nearly half of neuroscience papers made this error (Nieuwenhuis et al., 2011, Nature Neuroscience)
When looking at results of contrast A, use an exclusive mask of contrast B to look for voxels active in A, but not B. Note, this is not sufficient in and of itself to determine an interaction! Be sure to directly contrast A > B, as well.

98

Disjunctions: Practical Concerns
• Want: area X involved in contrast A, but not contrast B
– Need: 1) contrast A activates area X, 2) A > B in area X, 3) B does not activate area X
• Issue #1: how to define area X?
– Voxels active in contrast A?
• No! This is biased to show 1 and 2 (more on this in a moment)
– A contrast orthogonal to A and B
• Independent data (e.g. functional localizer)
• Anatomically defined
• Issue #2: how to demonstrate not active in contrast B?
– Difficult, but at the least should show that area X is not significant in contrast B at a very liberal criterion
• E.g. p < 0.05, uncorrected

99

Outline
• Masking
– Limiting multiple comparisons
– Conjunction Analysis
– Disjunction
• ROI Analysis
– ROI selection
• Reverse Inference
• Degrees of Freedom
• Brain Mapping Considerations
100

ROI Analysis • Often want to make inferences on a particular region-of-interest (ROI) • Must be careful to define ROI in such a way that does not make inferences dubious • Inappropriate ROI definition has led to a great deal of controversy in neuroscience – Double-dipping: Kriegeskorte et al., 2009, Nat. Neurosci.

101

ROIs Revisited
• ROIs based upon a particular contrast are biased to show a greater effect size than is truly present.
– Cannot estimate the effect size of contrast A in a region defined from contrast A
– Cannot examine A > B in a region defined from contrast A
– Some researchers object to even visually depicting effects from A
• E.g. the fitted response will look too good
• NOTE: I DISAGREE WITH THIS STRONGLY
102

ROIs Revisited
• Inferences should be performed on unbiased ROIs
– Orthogonal to the contrast of interest
– Defined from independent data
• Separate functional localizer
• Cross-validation (separate data into training and test sets)
• Separate study
– Expect regression to the mean!
– Defined anatomically
103

– Defined anatomically 103

Methods for Selecting Unbiased ROIs • • Anatomical Functional – Functional localizer – Meta analysis • • Curated Neurosynth 104

Anatomical WFUPickAtlas toolbox in SPM can define ROIs using popular atlases such as AAL and Talairach Daemon 105

Functional
• Localizer
106

Functional
• Meta-analysis
[Image from Bartra et al., 2012]

107

Example: Neurosynth (ToM) 108

Outline
• Masking
– Limiting multiple comparisons
– Conjunction Analysis
– Disjunction
• ROI Analysis
– ROI selection
• Reverse Inference
• Degrees of Freedom
• Brain Mapping Considerations
109

Reverse inference: When the heat is on, the house gets hot The house is hot… what can I conclude?

110

Reverse Inference
1. In the present study, when task comparison A was presented, brain area Z was active
2. In other studies, when cognitive process X was putatively engaged, brain area Z was active
3. Thus, the activity of area Z in the present study demonstrates engagement of cognitive process X by task comparison A
• Example:
– The Stroop task activates the dorsal ACC
– In several studies, pain activated the dorsal ACC
– -> The Stroop task hurts (Poldrack, 2006, TICS)
• Reverse Inference:
– Reasoning backwards from activation in a region to engagement of a cognitive function
111 Yarkoni et al., 2011, Nature Methods

Reverse Inference: Problems • Logical fallacy: affirming the consequent – If one takes the fMRI course, they will know fMRI.

– Neo knows fMRI.

– Therefore, Neo took the fMRI course (the fallacious conclusion).

• Brain regions are engaged by diverse demands – Even the presumably selective, fusiform “face” area is activated in response to diverse stimuli 112

Reverse Inference: What to Do?

• On some level, reverse inference is necessary
– Trying to build a collective knowledge of the brain
– Need to link results with prior data
• Keep selectivity in mind when making inferences
– Brain area X is engaged in your context of interest. What about other contexts?

– Check out neurosynth.org

• For a given region, can output words associated with that region in the published literature 113

Analysis Choices & Degrees of Freedom

114

The Degrees of Freedom Problem
• Many aspects of fMRI analysis have multiple solutions and options
– Order of pre-processing steps
– Size of smoothing kernel
– Spatial normalization template
– High-pass filter length
– Basis set
– Motion regression
• Choices can strongly affect results
[Figure: mean activation and its variation as a result of analysis choices; Carp, 2012, Frontiers in Neuroscience]
115

Degrees of Freedom and Bias
• Problematic scenario
– "This doesn't look how I expected it to look. I wonder if I did something wrong?"
• Re-analyze until it looks right
• fMRI analysis is complex and it is likely that some degree of optimization is necessary post-data collection
– E.g. the planned basis function does not appropriately fit the data
– Must be very careful not to bias results
• Ninja Derek's recommendations
– Embed contrasts with known solutions that are orthogonal to the contrasts of interest
• E.g. Right vs. Left motor response, Error vs. Correct response
– Use these contrasts as the criterion for optimization
– Design experiments to adjudicate between multiple equally interesting hypotheses
• Theory A predicts X, Theory B predicts Y, Theory C predicts Z
• Reduces bias towards any one result
116

Conclusions • Most studies have been based on null-hypothesis tests • These are useful for exploratory purposes, and for constraining theories about the physical basis of mind • In the brain imaging setting, there has been little attention paid to estimating effect sizes, and the standard framework produced biased post-hoc estimates • Effect sizes may be increasingly important in the future, as applications are developed • There are tradeoffs in analysis choices, and the best option depends on your goal and what kinds of effects (local vs. distributed) you expect 117

Questions?

…for you and for me!

118