Anova on SPSS - Plymouth University


Factorial Analysis of Variance (ANOVA) on SPSS

Practice reproducing the analyses yourself:

2 Factor Between (2 levels x 2 levels).sav

2 Factor Between (2 levels x3 levels).sav

3 Factor Between (2 levels x 2 levels x 2 levels).sav

2 Factor Within (2 levels x 2 levels).sav

All on Portal

Reading

http://www.socialresearchmethods.net/kb/expfact.htm - a simple summary of factorial designs

http://davidmlane.com/hyperstat/index.html - see sections 11 & 12 for between subjects designs and section 13 for within subjects (repeated measures) designs. This is recommended: it’s concise, clear and to the point. It also contains a very good glossary from which you can quickly refresh your memory for definitions of such things as the Standard Error etc.

Chapters 10, 11, 12 of Gravetter & Forzano cover between, within, and factorial design issues.

Chapters 13, 14, 15 of Gravetter & Wallnau cover the stats - ANOVA etc. However don’t get bogged down with formulas for calculating sums of squares…. See next slide

Things you should know:
•How to interpret interaction plots
•How to interpret ANOVA tables and assumption tests
•That the Total degrees of freedom is always N-1 (N = total number of data points). In a between subjects design the Error degrees of freedom = N - the number of cells in the design (e.g. 20 - 4 = 16 for a 2 x 2 design with 5 subjects per cell)
•That the degrees of freedom for a test of a main effect of a factor = number of levels the factor contains - 1
•That the degrees of freedom for a test of an interaction between two or more factors = (the number of levels in one factor - 1) x (the number of levels in the other - 1) x…etc. Thus the DF for a 3 way interaction between factors having 2, 2 and 4 factor levels is 1 x 1 x 3 = 3
•That ANOVA uses F tests and that the F statistic for any effect is the Mean Square for the Effect divided by the Error Mean Square: F = MS_condition / MS_error
•That when you have an alpha level of .05 this means that, when there is no real effect, the probability of not making a Type 1 error is 95% (.95) for each test you do
•Thus if you have 20 F tests in your ANOVA table the probability of none of them being spurious is .95 x .95 x .95 x .95…… or .95^20, i.e. (1 - α)^20
•This actually = .36 or 36%, which is why (in complex designs especially) you should stick to examining a few predictions.
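The arithmetic behind the 36% figure can be checked with a short sketch. It assumes the 20 tests are independent and that no real effects exist; the function name is illustrative and not part of SPSS.

```python
def prob_no_type1_errors(alpha: float, n_tests: int) -> float:
    """Probability that none of n_tests independent tests gives a false positive."""
    return (1.0 - alpha) ** n_tests


# 20 F tests, each run at an alpha level of .05
p_none = prob_no_type1_errors(0.05, 20)
p_at_least_one = 1.0 - p_none

print(f"P(no spurious result in 20 tests) = {p_none:.2f}")
print(f"P(at least one spurious result)   = {p_at_least_one:.2f}")
```

So with 20 tests there is roughly a 64% chance of at least one spurious "significant" result, which is the reason for restricting attention to a few predicted effects.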

Things you needn’t worry about:
•The precise way that Sums of Squares are calculated (but it will help your understanding of ANOVA if you at least understand the gist of how variability is partitioned).
•How Levene’s test or Mauchly’s test are calculated - only that they test the assumption of homogeneity of variance for between subjects designs and its (more or less) equivalent in within subjects designs.

In the SPSS output you can largely ignore the following when doing repeated measures analyses (at this stage at least):
•The multivariate tests which you get at the beginning
•Tests of within subject contrasts (although these can be a useful tool for examining patterns in the data)
•Any tests of between subjects effects that only involve an intercept (i.e. you can ignore this output when all your factors are within subjects)

1. Between Subjects Designs

2 Factor designs

Data Format All scores in a single column Additional columns for each Factor

Main assumptions of ANOVA:

There are 3 main assumptions underlying ANOVA:

1. Homogeneity of variance

The error variance within each condition should be statistically equal. Thus any differences between conditions should only be a shift in the mean. Put another way the effect of treatment/condition manipulations is to add a constant to each individual’s score.

[Figure: two panels labelled OK and NOT OK, showing score distributions for conditions A, B and C with means m_A, m_B, m_C and variances s²_A, s²_B, s²_C.]

2. Normality

The distribution of errors within each condition should be normal. By errors we mean deviations from the mean for that condition.

Because the errors are the deviations from the condition means this is equivalent to saying that the scores should be distributed normally about the condition means.

3. Independence of observations

The data points should represent independent observations. Knowing the value of one should not tell you anything about the value of any other.

N.B. This assumption is obviously violated in repeated measures experiments, because knowing that one data point comes from subject x (who might be a particularly fast responder, say) does tell you something about the likelihood of another observation from subject x being relatively fast. This is why Subjects have to be included as a factor in the analysis of repeated measures designs: the non-independent component is partialled out.

Design

Experiment to investigate the effect of stimulus duration and modality (Word vs Picture) on Recognition performance.

Dependent Variable: Score
Two Factors: Modality and Duration

Factor Levels
Modality - two levels: Word, Picture
Duration - two levels: 200msec, 800msec
= 2 x 2 design

                 Duration
Modality         200msec        800msec
Pictures         5 subjects     5 subjects
Words            5 subjects     5 subjects

Data entry

View Factor Level Labels

[Figure: SPSS data view.] This person scored 127.19 and was tested in the ‘word’ modality and with the 800msec duration.

Analyse / General Linear Model / Univariate

Dependent Variable : Score Fixed Factors: duration + modality

Main effect plot for modality

Main effect plot for duration

Interaction plot

Interaction plot ( duration*modality )

Options – Condition means, descriptive stats, test for homogeneity (equality) of variances.

Means

Displays overall mean, means for each level of duration, mean for each level of modality and the means for each combination of duration by modality (= the interaction means).

Homogeneity Test

Produces Levene’s test for homogeneity of variance (one of the assumptions of ANOVA, i.e. that the variances within each cell of the design are not significantly different).

Descriptive stats

Gives descriptive statistics (mean, max, min, SD etc. by the experimental groups)

Output: Factors and Factor level labels

Output: Descriptives - cell means & SDs

Output: Levene’s test. This significant result means the assumption of equal group variances has not been met. In this case the analysis is not valid! A data transformation may be of use here.

Output: some cell SDs considerably different

At this point either:
•Abandon the analysis
•See if a data transformation removes the problem (e.g. Log(score))
•Report results but with ‘extreme caution’

2 Factor Between (2 levels x 2 levels).sav

Assume we have different data:

Levene’s test, and any test that checks assumptions for an analysis, should not be significant.

Here the p value of .271 says that ‘there is no evidence for any differences in variances’ between the groups, which is what we want.

ANOVA Table (Ignore shaded items)

Test for the Main Effect of Duration (i.e. 200 vs 800 ms pooling across both Modalities)

Significant effect of Duration, F(1,16) = 5.5, p = .032

There was a significant effect of Stimulus Duration. Participants who viewed the stimulus for 200 msec scored higher (M = 134) than those who viewed it for 800 msec (M = 115), F(1,16) = 5.5, p = .032.

[Figure: Duration Profile Plot. Estimated Marginal Means of SCORE by DURATION (200, 800). This difference is significant.]

ANOVA Table (Ignore shaded items) Test for the Main Effect of Modality (i.e. Pictures vs Words pooling across both Durations).

No Significant effect of Modality.

[Figure: Profile Plot for Modality. Estimated Marginal Means of SCORE by MODALITY (Picture, Word). This difference is not significant.]

Any graphs you present should be using the same scale. By default SPSS changes the scale so that the data takes up the whole graph area. Here are the two graphs on the same scale:

[Figure: Estimated Marginal Means of SCORE by MODALITY (Picture, Word) and by DURATION (200, 800), both drawn on the same vertical scale.]

ANOVA Table (Ignore shaded items)

Test for the Interaction between Modality and Duration.

There was a significant two-way interaction between modality and duration, F(1,16) = 7.2, p = .017.

Profile Plot of Modality by Duration interaction

[Figure: Estimated Marginal Means of SCORE by DURATION (200, 800), separate lines for MODALITY (Picture, Word).]

Main effect of Duration is still observable in the graph

[Figure: the same interaction plot with the 200 msec Average and the 800 msec Average marked, shown alongside the Duration profile plot.]

Interpretation of the Modality by Duration Interaction

Several ways of describing the interaction:

[Figure: interaction plot. Estimated Marginal Means of SCORE by DURATION (200, 800), separate lines for MODALITY (Picture, Word).]

“…At the 200 msec duration pictures resulted in scores approximately 20 points higher than words, whereas at the 800 msec duration the opposite pattern was true, with pictures producing scores approximately 20 points below words, F(1,16) = 7.2, p = .017…”

[Figure: the same interaction plot. Estimated Marginal Means of SCORE by DURATION (200, 800), separate lines for MODALITY (Picture, Word).]

“For words there was a small increase in performance going from the 200 msec (M = …) to the 800 msec duration. With pictures, however, there was a large decrease in performance…”

Alternative Plot - same data

[Figure: Estimated Marginal Means of SCORE by MODALITY (Picture, Word), separate lines for DURATION (200, 800).]

At the 200 msec duration performance was better with pictures (M = 144) than words (M = 124), whereas at the 800 msec duration the opposite was true, with words giving better performance (M = 127) than pictures (M = 103), F(1,16) = 7.2, p = .017.
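The link between the cell means, the main effect means and the interaction can be made concrete with a small sketch. It uses only the four rounded cell means quoted above and assumes equal numbers of subjects per cell; the variable and function names are illustrative.

```python
# Rounded cell means quoted in the text (recognition score).
cell_means = {
    ("Picture", 200): 144, ("Word", 200): 124,
    ("Picture", 800): 103, ("Word", 800): 127,
}


def marginal_mean(factor_index, level):
    """Average the cell means over the other factor (valid for equal n per cell)."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)


duration_means = {d: marginal_mean(1, d) for d in (200, 800)}
modality_means = {m: marginal_mean(0, m) for m in ("Picture", "Word")}

# Simple effect of modality (Picture minus Word) at each duration.
simple_effects = {
    d: cell_means[("Picture", d)] - cell_means[("Word", d)] for d in (200, 800)
}
# Interaction contrast: how much the simple effect changes across durations.
interaction_contrast = simple_effects[200] - simple_effects[800]

print(duration_means)
print(modality_means)
print(simple_effects, interaction_contrast)
```

The duration means (134 vs 115) reproduce the main effect reported earlier, the modality means differ by only 2 points, and the simple effect of modality reverses sign between 200 and 800 msec, which is what the significant interaction reflects.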

Extension to factors with 3 Levels

10 extra participants at 500 msec duration - 5 with Words, 5 with Pictures

2 Factor Between (2 levels x 3 levels).sav

The analysis is the same, however the interpretation of the main effect of DURATION is a little more complex. Note the increased Degrees of Freedom for Duration and for the interaction.
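As a rough check on the degrees of freedom you should expect to see in the ANOVA table, the rules for a fully between subjects design can be sketched as below. The function name and the dictionary layout are illustrative assumptions, not SPSS output; the sample sizes (20 and 30) come from 5 subjects per cell.

```python
from itertools import combinations
from math import prod


def between_subjects_df(levels: dict, n_total: int) -> dict:
    """Degrees of freedom for every effect in a fully between subjects design."""
    names = list(levels)
    df = {}
    for size in range(1, len(names) + 1):
        for effect in combinations(names, size):
            # Effect df = product of (levels - 1) over the factors involved.
            df[" x ".join(effect)] = prod(levels[f] - 1 for f in effect)
    n_cells = prod(levels.values())
    df["Error"] = n_total - n_cells   # N minus the number of cells
    df["Total"] = n_total - 1         # N minus 1
    return df


df_2x2 = between_subjects_df({"duration": 2, "modality": 2}, n_total=20)
df_2x3 = between_subjects_df({"duration": 3, "modality": 2}, n_total=30)
print(df_2x2)
print(df_2x3)
```

With three duration levels the Duration and interaction tests each have 2 degrees of freedom and the error term has 24, compared with 1 and 16 in the 2 x 2 design.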

Duration Profile Plot: A significant F test only says that ‘not all the means are equal’

To examine individual pair-wise comparisons:
1. If you make a priori predictions about which means you are interested in comparing: you can use simple t tests (LSD) for 3 means, or Sidak or Bonferroni for a greater number of comparisons.
2. If you want to make post hoc comparisons: you can use Tukey’s Test.

Note that the more conservative Tukey test only finds one significant difference whereas LSD finds two. Note the Tukey test requires equal sample sizes.

There was a significant main effect of stimulus duration, F(2,24) = 8.07, p = .02. Post hoc comparisons using Tukey’s HSD showed that only the difference between the 200 (M = 134.3) and 500 (M = 101.8) durations was significant, p = .001.

3 Factor Designs

3 Factor Between (2 levels x 2 levels x 2 levels).sav

Adding a third ‘Noise’ factor with two levels (Low, High) requires doubling the number of subjects, assuming you still want 5 in each cell. In the following analysis, for ease of interpretation, we will go back to having just two levels of the duration factor (200 vs. 800).

•Logic of the analysis is the same but we now have:
•3 possible main effects: Duration, Modality, Noise
•3 possible 2-way interactions: Duration x modality, Duration x noise, Modality x noise
•1 possible 3-way interaction: Duration x modality x noise

Both main effects of duration and noise significant. 3-way interaction also significant.

Interpreting 3-Way interactions.

•Much easier if you have some predictions about the expected pattern
•For instance in this example we might predict that, as well as generally decreasing performance, high levels of noise might obscure any differences between the picture and word conditions:

3-way interaction is a difference in the pattern of a 2-way interaction at levels of the third factor

There was a significant 3 way interaction between duration, modality and noise, F(1,32) = 4.5, p = .041. In the low noise condition pictures and words produced opposite effects on performance at the two durations. At stimulus presentations of 200msec words gave rise to performance some 20 points lower than pictures whereas the reverse pattern was true for the 800 msec duration. With high noise, however, there was very little evidence of any interaction.

If you want to provide a bit more ‘weight’ to your conclusions concerning the interpretation of the 3-way interaction you could perform a simple interaction effects analysis.

•This is actually very easy
•You just run two separate ANOVAs, one at each level of (in this example) the noise factor.
•Each of these analyses has the factors duration and modality but one uses the data from the high noise condition and the other from the low noise condition.
•You then interpret the 2-way interactions between duration and modality at each level of noise

[Figure: two interaction plots, low noise on the left and high noise on the right, with one ANOVA run on each set of data.] You can then say whether it is true that the interaction on the left (low noise) is significant whilst the one on the right (high noise) is not.

There is one catch: the F ratio for the 2-way interactions in each separate analysis needs to be computed using the MS_error from the original analysis.

Original 3 Factor ANOVA: MS_error from the original analysis = 400.8 on 32 DF

You now need to run the two separate 2 way ANOVAs on the data from the high and low noise conditions.

On SPSS the easiest way to do this is to first split the data using the split data command.

Any subsequent commands be they Tables, Plots or, as in this case ANOVAs, will now be done separately for each level of the grouping variable (noise):

Having split the data file by the noise variable you now simply perform a 2 way ANOVA, with factors duration and modality as before: Analyse / General Linear Model / Univariate

This factor (noise) is left out as it is the one used to split the file. SPSS will now compute the two 2 way ANOVAs.

This table is simply 2 ANOVA tables put together – one for the low noise data and one for the high noise data.

However the F ratios are wrong as they need to be computed using the MS_error from the original 3 way ANOVA.

Original 3 Factor ANOVA: MS_error from the original analysis = 400.8 on 32 DF

F ratios are simply the result of dividing the Mean Square for the effect by the error Mean Square (MS_error). E.g. the duration F ratio is simply MS_duration / MS_error.

For the simple interaction effects follow up we need to compute our own F ratios for the modality by duration interactions at each noise level by substituting the MS_error from the original analysis.

MS_error from the original analysis = 400.8 on 32 DF

For the low noise interaction the correct F ratio is 2393.978 / 400.8 = 5.97
For the high noise interaction the correct F ratio is 129.97 / 400.8 = .32
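The same substitution can be written as a couple of lines of code. This is only a sketch of the division described above, using the mean squares quoted in the slides; the variable names are illustrative.

```python
MS_ERROR_ORIGINAL = 400.8   # error mean square from the original 3 factor ANOVA
DF_ERROR_ORIGINAL = 32      # its degrees of freedom

# Duration x modality interaction mean squares from the two split-file ANOVAs.
ms_interaction = {"low noise": 2393.978, "high noise": 129.97}

# Corrected F ratio = interaction mean square / original error mean square.
f_ratios = {level: ms / MS_ERROR_ORIGINAL for level, ms in ms_interaction.items()}

for level, f in f_ratios.items():
    print(f"{level}: F(1,{DF_ERROR_ORIGINAL}) = {f:.2f}")
```

Note that the error degrees of freedom reported with each corrected F are those of the original analysis (32), not those printed in the split-file tables.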

To work out the p value you need either to look it up in F tables, or to calculate the exact probability (very easily) using a package such as Excel.

E.g. to calculate the p value associated with the low noise modality x duration interaction: the F value we got was 5.97. This is based on 1 df for the effect and 32 df for error.

Click in any cell in Excel and type =FDIST(5.97,1,32) and press return. NB. Don’t forget the ‘=’ at the start of the formula.

Excel then gives the answer: the simple interaction effect at the low noise level was significant, F(1,32) = 5.97, p = .02.
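If Excel is not to hand, the same p value can be approximated with the Python standard library. The sketch below is not how Excel computes FDIST; it assumes the effect has exactly 1 degree of freedom, in which case F equals t squared, and integrates the t density numerically with Simpson's rule. The function names are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_f_1df(f_value: float, df_error: int, steps: int = 2000) -> float:
    """Upper-tail p for F(1, df_error), using F = t**2 and Simpson's rule."""
    t = math.sqrt(f_value)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    area_0_to_t = total * h / 3
    # Two-tailed t probability = upper-tail F probability when df_effect = 1.
    return 1.0 - 2.0 * area_0_to_t


p_low_noise = p_value_f_1df(5.97, 32)
p_high_noise = p_value_f_1df(0.32, 32)
print(f"low noise:  p = {p_low_noise:.3f}")
print(f"high noise: p = {p_high_noise:.3f}")
```

For effects with more than 1 degree of freedom this shortcut does not apply, and F tables or a statistics package should be used instead.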

Repeated Measures designs

These are where the same subject is tested in the different experimental conditions.
•Advantages are that the test is more sensitive
•Disadvantages - things like order effects, practice effects etc.
•Not always possible in principle - e.g. if partaking in one condition exposes subjects to information that will ‘ruin’ them for any other condition

2 Factor Within (2 levels x 2 levels).sav

Test is more sensitive because:
•Individual differences are controlled for
•e.g. suppose a reaction time study: some people are just faster average responders than others. What we are usually interested in is the relative effect of a treatment on performance.

Repeated measures (or within subjects) designs examine the relative effect of conditions on individuals.

Repeated measures ANOVA on SPSS

•Interpretation of effects from the ANOVA table is the same
•Main difference is in the data entry
•Designs can be all repeated measures or a mixture
•E.g. a two factor repeated measures design could have:
•Both factors as repeated measures (or within subjects), or
•One repeated measure and one between subjects measure

Both factors as repeated measures:
•Each subject is tested under every combination
•The order of the combinations would normally be randomised for each subject, or
•Pseudo-randomised so that equal numbers of subjects receive each order (this is the most common method)

Modality by Stimulus duration data Assuming this experiment was carried out with both factors as repeated measures: This is how the data is entered into SPSS.

Each row represents scores from a single subject.

Each subject has 4 data points.

These could be single scores or the average of many trials under that condition. The latter is common with measures such as RT which are ‘inherently noisy’ (i.e. you need to take the average of many raw data points to get a good estimate for that subject under those conditions).

Give the columns meaningful names – the first column contains data from the Duration level 1 (200msec) and Modality level 1 (picture). You can use short hand for the actual column names and put the longer, more meaningful, description as the variable label:

To avoid confusion later the columns should always be ordered in a hierarchy - take a 3 Factor example (all with 2 levels and where F1(1) = Factor 1 Level 1):

F1(1)                                  F1(2)
F2(1)              F2(2)               F2(1)              F2(2)
F3(1)    F3(2)     F3(1)    F3(2)      F3(1)    F3(2)     F3(1)    F3(2)
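The hierarchical column order can be generated mechanically, which is a useful check when a design has many within subjects columns. This is a minimal sketch; the function name and label format are illustrative, and the rule it applies is simply that the last factor changes fastest.

```python
from itertools import product


def column_order(levels_per_factor):
    """Column labels in hierarchical order: the last factor changes fastest."""
    ranges = [range(1, n + 1) for n in levels_per_factor]
    return [
        " ".join(f"F{i}({lvl})" for i, lvl in enumerate(combo, start=1))
        for combo in product(*ranges)
    ]


columns = column_order([2, 2, 2])
for position, label in enumerate(columns, start=1):
    print(position, label)
```

For the two factor example, `column_order([2, 2])` gives F1(1) F2(1), F1(1) F2(2), F1(2) F2(1), F1(2) F2(2), i.e. with duration as Factor 1 and modality as Factor 2 the first column holds duration level 1 with modality level 1.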

To run the analysis:

First Factor is Duration and this has two levels. NB - the first factor is the one at the top of the hierarchy.

Order in which you define the factors in SPSS:

1    F1(1)                                  F1(2)
2    F2(1)              F2(2)               F2(1)              F2(2)
3    F3(1)    F3(2)     F3(1)    F3(2)      F3(1)    F3(2)     F3(1)    F3(2)

Second factor is modality - with two levels. This sets up all the factors - now click Define to tell SPSS where the columns are that correspond to each factor level combination.

The first question mark is asking “where is the column containing the data from level 1 of factor 1 and level 1 of factor 2?” This is our column 1 (d1m1)

Note at the top where it says Within-Subjects Variables you get a reminder of which is the first and second factors. The order we defined the factors in was duration then modality hence at the top we have (duration, modality). The numbers in the brackets refer to the levels of the corresponding factors.

The process continues until all the within subject variables have been set up. NB: only when you set up the factors in the data sheet according to the hierarchy and define the factors starting from the top of the hierarchy will they be in the correct order already.

Once set up you can use the plots and options (display means) in exactly the same way as with between subjects designs.

SPSS Output This is not quite the same as for between subjects designs.

The first box just summarises the within-subjects factors and allows you to check that they have been entered in the right order:

You can ignore the multivariate tests output unless you have special reason to question certain assumptions.

Mauchly’s Sphericity test is an important assumption test –it is the repeated measures equivalent of Levene’s test for homogeneity of variance. IT SHOULD NOT BE SIGNIFICANT. NB when, as in this case, a factor only has two levels the sphericity cannot be violated and there is never a problem. The dots in the SIG column simply mean that the test is not appropriate.

The Tests of Within Subjects Effects are where you find the significance tests for all your within subjects factors and any interactions involving any within subjects factor. Highlighted here is the test for a main effect of Duration.

If for any test there is no violation of sphericity use the ‘sphericity assumed’ F and p value.

Suppose the test of sphericity for the interaction had given a significant result (p < .05). Then when you came to interpret the interaction effect in the main ANOVA table you would use the Greenhouse-Geisser adjustment:

Notice also that in a repeated measures design any within subjects variable has its own error term and this should be checked when giving the DFs for a test: E.g. …..interaction was significant F (1,19) = ….

Here the 1 , as before, comes from the DF associated with the test of the interaction and the 19 comes from the DF associated with the Specific Duration x Modality error term.

The Tests of Within-Subjects contrasts only really apply:
•When you have a factor with more than 2 levels, and
•You want to test for a particular trend (e.g. that performance increases in a straight line (linear) fashion as drug dosage increases).

Plots and tables of means can be interpreted in exactly the same way as between subjects designs.