Quantitative Measures - University of Oxford

Statistics for Linguistics
Students
Michaelmas 2004
Week 6
Bettina Braun
www.phon.ox.ac.uk/~bettina/teaching.html
Overview
• Recap
• χ²-test for frequency data
• Introduction to Analysis of variance
(ANOVA)
– One-factor between-subjects design
– Two-factor between-subjects design
What do we report from the results table?
Differences in the mean:
There is a significant difference (t=2.94, df=15, p = 0.01)
What are frequency data?
• Frequency count
– Number of subjects/events in a given category (e.g.
number of high and low accents)
– What about the number of correct responses of
different subjects? This is not frequency data!!!
• χ²-test for
– Test of deviation from expected frequencies: Test
whether the observed frequencies deviate from
expected frequencies (e.g. when rolling a die, there is an a
priori chance of 16.67% for each number)
– Test of association: Finding a relationship between two
or more independent variables (e.g. is there a relation
between gender and the use of high or low accents?)
χ²-test for deviation from expected
frequencies
• Null-hypothesis: there is no difference between
expected and observed frequencies
• Example data
          1     2     3     4     5     6     total
observed  12    17    16    18    13    24    100
expected  16.7  16.7  16.7  16.7  16.7  16.7  100
(the two totals have to be identical)
• Calculation
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 5.48$
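The calculation for the die example can be retraced with a few lines of Python. This is only a sketch and not part of the course materials (which use SPSS); the variable names are made up for illustration, and the critical value 11.07 is the standard table entry for df = 5 at the 0.05 level. With exact expected frequencies of 100/6 per face, the sum comes to about 5.48.

```python
# Observed counts for the six faces of the die (from the example data).
observed = [12, 17, 16, 18, 13, 24]

n = sum(observed)                                  # total number of throws
expected = [n / len(observed)] * len(observed)     # equal chance per face under H0

# chi-square = sum over categories of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                             # one independent variable: df = a - 1

# Critical value for df = 5 at alpha = 0.05, taken from a chi-square table.
CRITICAL_05_DF5 = 11.07

significant = chi2 > CRITICAL_05_DF5
print(f"chi2 = {chi2:.2f}, df = {df}, significant at 0.05: {significant}")
```

Since the calculated value stays below the critical value, the null-hypothesis is not rejected for these data.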
Looking up the p-value
The calculated value for χ²
must be larger than the
one found in the table
to be significant
Degrees of freedom:
• If there is one
independent variable:
df = a – 1
• If there are two
independent variables:
df = (a-1)(b-1)
(a, b: number of levels of each independent variable)
Further Notes
• The χ² test for the deviation from expected
frequencies can be used for one
independent variable only
• If the independent variable has only two
levels (e.g. high vs. low accent), a
correction for continuity has to be used:
$\chi^2 = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}$
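A short Python sketch of the continuity correction, assuming the correction meant here is the usual Yates form, which subtracts 0.5 from each absolute deviation before squaring. The counts (30 high and 50 low accents) are hypothetical and only serve to show how the correction lowers the χ² value.

```python
def chi2_plain(observed, expected):
    """Uncorrected chi-square: sum((fo - fe)^2 / fe)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_yates(observed, expected):
    """Chi-square with continuity correction: sum((|fo - fe| - 0.5)^2 / fe)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 80 accents, expected to split evenly between high and low.
observed = [30, 50]
expected = [40.0, 40.0]

uncorrected = chi2_plain(observed, expected)
corrected = chi2_yates(observed, expected)
print(f"uncorrected = {uncorrected:.4f}, corrected = {corrected:.4f}")
```

The corrected value is always the smaller of the two, so the correction makes the test more conservative.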
χ² as test of association
• Calculation of expected frequencies:
$\text{Cell freq} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$

                 Past tense  Present tense  Aspect total
Progressive      308         476            784
Non-progressive  315         297            612
Total            623         773            1396
Useful checks
• The sum of the expected frequencies must equal
the sum of the observed frequencies:
$\sum f_o = \sum f_e = N$
• The sum of the observed frequencies
minus the expected frequencies must
equal zero:
$\sum (f_o - f_e) = 0$
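The expected frequencies for the tense/aspect table and the two checks can be worked through in Python. This is a sketch under the assumption that no continuity correction is applied; the variable names are chosen for illustration only.

```python
# Observed frequencies from the tense/aspect table (rows: aspect, columns: tense).
observed = [
    [308, 476],   # progressive:     past, present
    [315, 297],   # non-progressive: past, present
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected cell frequency = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Flatten both tables into matching (observed, expected) pairs.
pairs = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

# Check 1: the expected frequencies add up to N.
sum_expected = sum(e for _, e in pairs)
# Check 2: the deviations (observed minus expected) add up to zero.
sum_deviations = sum(o - e for o, e in pairs)

chi2 = sum((o - e) ** 2 / e for o, e in pairs)
df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (a-1)(b-1)

print(f"expected = {[[round(e, 2) for e in row] for row in expected]}")
print(f"chi2 = {chi2:.2f}, df = {df}")
```

Both checks hold up to floating-point rounding, which is why they are compared against a small tolerance rather than exactly zero.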
χ²-test
• Limitations:
– All raw data for χ² must be frequencies
(not percentages!)
– Each subject or event is counted only once,
i.e. contributes to only one cell value
(strictly between-subjects)
– The total number of observations should be
greater than 20
– The expected frequency in any cell should be
greater than 5
An Example
• You want to test how well non-Chinese-speaking
students can learn Chinese characters using
different kinds of mnemonics. There are three
groups of subjects, one with no mnemonic, one
with mnemonic 1 and one with mnemonic 2. You
count how many characters were correctly
recalled.
• What are the independent and dependent
variables? IV: kind of mnemonic, DV: recall
– How many levels does the IV have? 3
– What is the type of the dependent variable? interval
ANOVA: general
• Analysis of variance
– Test the null-hypothesis that all the samples
are taken from the same population
– Compares the variance within the samples
(random error) to the variance between the
samples (systematic variation)
– If the variance between the samples is
sufficiently larger than the variance within the samples,
we can reject the null-hypothesis
ANOVA: limitations
• All samples must be selected randomly
• The scores must be interval
• The scores in the samples must be
normally distributed
• The variances of the samples must be
homogeneous
• There needs to be an equal number of
scores in each sample
ANOVA: general
• Conventions
– Independent variables are called “factors”
– ANOVA calculates an F-statistic that
determines whether the null-hypothesis can
be rejected or not
– In SPSS, you find the ANOVAs in
Analyze => General Linear Model
• “univariate”: analysis of one DV (between)
• “multivariate”: for more than one DV (between)
• “repeated measures”: within-subjects designs
F-statistic
• The F-statistic is the ratio of the between-group
variance to the within-group variance.
To be significant, it has to be larger than
a critical value in a table
• The p-value of the F-statistic depends on two df-values: F(dfn, dfd) = value
– Df of the numerator: dfn = k-1
– Df of the denominator: dfd = N-k
(N: total number of scores across all groups, k: number of groups)
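The ratio and its two df-values can be computed by hand for a small data set. The following Python sketch uses hypothetical recall scores (three groups, 12 scores in total) and a made-up function name; it is meant to show where dfn and dfd enter the calculation, not to replace the SPSS procedure.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, dfn, dfd) for a list of groups, each a list of scores."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # N: total number of scores
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    dfn = k - 1
    dfd = n_total - k
    f_value = (ss_between / dfn) / (ss_within / dfd)
    return f_value, dfn, dfd


# Hypothetical scores for three groups of four subjects each.
groups = [[4, 5, 6, 5], [6, 7, 8, 7], [9, 8, 10, 9]]
f_value, dfn, dfd = one_way_anova(groups)
print(f"F({dfn},{dfd}) = {f_value:.1f}")
```

The F-value would then be compared against the critical value for the same pair of df-values.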
Reporting the F-value
• As the p-value of the F-statistic depends
on two df-values, you have to report them
(similarly to the t-value, the df, and the p-value for t-tests!)
• Suppose we have 3 groups (3 levels of an
independent variable) and 12 scores per
group, so that dfn = 3-1 = 2 and dfd = 36-3 = 33.
We report the F-statistic as follows:
F(2,33) = 2.9, p = ???
Critical values for the F-statistic
…
One factor between-subjects
ANOVA
• If the independent variable has two levels, the
results are comparable to an independent t-test
(F = t²)
• If we have more than two levels, we could in
principle run multiple independent t-tests
• BUT: This increases our risk of a Type I error
– With one test we can be 95% sure our conclusion is
correct
– With two tests, this percentage drops to 0.95 * 0.95 ≈
0.90 (we can only be 90% sure of our conclusion)
– With even more tests …
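The same arithmetic can be written as a small Python sketch. It assumes, as the 0.95 * 0.95 calculation does, that the tests are independent of each other; the function names are made up for illustration.

```python
from math import comb


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def n_pairwise_tests(k):
    """Number of pairwise comparisons needed to compare k groups."""
    return comb(k, 2)


for k in (2, 3, 4, 5):
    n = n_pairwise_tests(k)
    print(f"{k} groups: {n} t-tests, risk of a Type I error = {familywise_error(n):.3f}")
```

With three groups, three pairwise t-tests are already needed, so the risk grows quickly with the number of levels.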
One factor between-subjects
ANOVA
• A one-factor ANOVA avoids this increased
risk of a Type I error by testing all groups in a single test
• There are fixed factors and random factors:
– If you choose the IV to be a fixed factor, the model is
calculated for just the levels of the independent variable
you have (e.g. gender, accentedness)
– If you choose the IV to be a random factor, you want
to generalise from the levels of your independent
variable to other levels (e.g. the IV contains three
different degrees of blood alcohol but you want to
generalise the effect on e.g. speech control to other
levels)
SPSS output
Ignore these!
• There is a significant effect of the kind of mnemonic on
the number of characters recalled:
F(2,27) = 17.7, p < 0.001
But between which of the groups??
Post-hoc tests
• If the IV has more than 2 levels, we have to do
post-hoc tests to find out which of the groups
are significantly different
• Scheffé test:
– Suitable for pair-wise comparison between all groups
– Corrects for the increased risk of a Type I error
(most conservative post-hoc test)
• Dunnett test:
– Useful for “planned comparisons”, e.g. comparing two
different groups against a control group
– Less stringent than Scheffé
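To show why the Scheffé test is conservative, here is a Python sketch of one common textbook form of the pairwise Scheffé comparison: the squared difference between two group means is scaled by the within-group variance and compared against (k-1) times the critical F-value. The data, the function name and the critical value 4.26 for F(2,9) at the 0.05 level (taken from an F table) are assumptions for illustration; SPSS reports the result in a different layout.

```python
from statistics import mean


def scheffe_pairwise(groups, f_crit):
    """Return {(i, j): (F for the pair, significant?)} for all group pairs."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]

    # Within-group variance (mean square error) from the overall ANOVA.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n_total - k)

    # Scheffé criterion: the pairwise F must exceed (k - 1) * critical F.
    threshold = (k - 1) * f_crit

    results = {}
    for i in range(k):
        for j in range(i + 1, k):
            scale = ms_within * (1 / len(groups[i]) + 1 / len(groups[j]))
            f_pair = (means[i] - means[j]) ** 2 / scale
            results[(i, j)] = (f_pair, f_pair > threshold)
    return results


# Hypothetical scores: group 0 and group 1 are close, group 2 is higher.
groups = [[4, 5, 6, 5], [5, 6, 7, 6], [9, 8, 10, 9]]
F_CRIT_2_9 = 4.26   # critical F(2,9) at alpha = 0.05, from a table

results = scheffe_pairwise(groups, F_CRIT_2_9)
for pair, (f_pair, significant) in results.items():
    print(f"groups {pair}: F = {f_pair:.1f}, significant: {significant}")
```

Multiplying the critical value by k-1 raises the bar for every pair, which is what keeps the overall risk of a Type I error down.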
Post-hoc tests with SPSS
SPSS output for multiple
comparisons (here Scheffé test)
• Significant differences between “no
mnemonic” and the other two groups.
SPSS output for homogeneous
subsets (Scheffé test)
• There are two subsets
Two factor between-subject ANOVA
• In an ANOVA, you can also investigate the effect
of more than one independent variable
• This is called a factorial design
• Example: You would like to investigate how
two different speech rates affect the duration of
words in sentence-initial, -medial, and -final
position.
• What are the IVs and the DV? IVs: speech rate, position; DV: word duration
– How many levels do the IVs have? 2 and 3
– What is the type of the DV? interval
Factorial Design:
Example
• Every level of each factor is combined with
every level of the other factor
• Factorial designs have to be completely
randomised, i.e. every group contributes
data to only one cell
      Initial  Medial  Final
Fast  Group1   Group2  Group3
Slow  Group4   Group5  Group6
Factorial Design:
Main effects and interaction
• For every independent variable there can be a significant main
effect. Would you expect main effects here?
– We would expect to find a main effect of speech rate
on duration
(i.e. higher speech rate => shorter durations)
– Also, there might be a main effect of position
(final segments undergo phrase-final lengthening,
initial and medial ones don’t)
• An interaction would indicate that the effect of
one IV is different in the conditions of another IV
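What an interaction means for the cell means can be illustrated with a Python sketch. The mean durations below are hypothetical values in milliseconds, not data from the course: the effect of speech rate is computed separately for each position, and an interaction shows up as a rate effect that differs between positions.

```python
# Hypothetical mean word durations in ms for each rate x position cell.
cell_means = {
    "fast": {"initial": 200, "medial": 200, "final": 240},
    "slow": {"initial": 260, "medial": 260, "final": 340},
}
positions = ["initial", "medial", "final"]

# Effect of speech rate (slow minus fast) at each position.
rate_effect = {p: cell_means["slow"][p] - cell_means["fast"][p] for p in positions}

# If the rate effect is the same everywhere, the two lines in a plot are parallel.
parallel = len(set(rate_effect.values())) == 1

print(f"rate effect per position: {rate_effect}")
print(f"lines parallel (no interaction in the means): {parallel}")
```

Here the slowing-down effect is larger in final position than elsewhere, so the lines would not be parallel; whether such a pattern is significant still has to be tested with the ANOVA.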
Factorial Design:
Hypothesised results
Does this graph show an interaction?
[Figure: duration (y-axis) by position (initial, medial, final; x-axis), with one line for slow speech and one for fast speech]
Non-parallel lines always show interactions!!!
Factorial Design:
Degrees of freedom
• Degrees of freedom are stated differently for main
effects and for interactions
• Numerator (the first value in round brackets):
– For main effects: df = k-1
(number of levels of that IV - 1)
– For interactions: df = (k-1)*(j-1)
(k, j: number of levels of the two IVs)
• Denominator (second value in round brackets):
– For both: df = N-j*k
(N: total number of scores)
– Note: df for the denominator is always found in the row
labelled “error”
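The df rules for a two-factor design can be collected in one small Python function. This is a sketch with made-up names; the 60 scores (10 per cell) for the speech-rate example are an assumption, since the number of subjects is not given.

```python
def factorial_dfs(j, k, n_total):
    """Degrees of freedom for a j x k between-subjects design with N scores."""
    return {
        "main_effect_j": j - 1,              # numerator df for the first IV
        "main_effect_k": k - 1,              # numerator df for the second IV
        "interaction": (j - 1) * (k - 1),    # numerator df for the interaction
        "error": n_total - j * k,            # denominator df for all three tests
    }


# Speech-rate example: 2 rates x 3 positions, assuming 10 scores per cell.
dfs = factorial_dfs(2, 3, 60)
print(dfs)
```

Under these assumptions a main effect of position would be reported as F(2,54) and the interaction as F(2,54) as well. Values such as F(2,126) and F(4,126) would be consistent with a 3 x 3 design and 135 scores.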
Output for factorial design:
Please interpret
Interaction
From http://www.uvm.edu/~dhowell/gradstat/psych341/lectures/Factorial1Folder/class4.html
• Significant main effect of task: F(2,126)=132.9,
p<0.001
• Significant interaction: F(4,126)=6.3, p<0.001
Why are inhomogeneous variances
a problem?
• Assume that the means and the variances
are correlated (i.e. a higher mean in one
sample co-occurs with a higher variance in
that sample – possibly caused by outliers
and extreme values)
– Then the mean is very unreliable
– Since the ANOVA compares the variances
and the means, you might get a significant
difference which is not actually in the data!