ANOVA: ANalysis Of VAriance

In the general linear model
x = μ + σ²(Age) + σ²(Genotype) + σ²(Measurement) + σ²(Condition) + σ²(ε)
each of the σ² terms can be questioned (tested).
Moreover, their particular combinations can be studied:
x = μ + … + σ²(Age × Genotype) + … + σ²(Age × Genotype × Condition) + … + σ²(ε)
…discrete classes (~bins, levels, etc.) for one variable…
[Figure: axes X and Y, with groups Class 1 and Class 2]
Sampling
• Random
• Should provide sufficient sample size
given the signal/noise ratio
• The population from which the sample is
taken should correspond to the studied
general population
• When only two means are compared, ANOVA
gives the same result as the ordinary t-test.
• However, it allows comparing multiple
means and thus multiple groups (factor
levels) as well as multiple factors
simultaneously.
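The two-group equivalence can be checked numerically. The sketch below is only an illustration: it assumes the ordinary t-test means Student's pooled-variance version, the function names `pooled_t` and `one_way_f` are hypothetical, and the data are made up. Under those assumptions the squared t statistic should coincide with the one-way ANOVA F statistic.

```python
from statistics import mean

def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled (equal) variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """One-way ANOVA F statistic: MS_factor / MS_error."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    return (ss_factor / df_factor) / (ss_error / df_error)

# Made-up observations for two factor levels.
group_a = [4.1, 5.0, 5.6, 4.8, 5.3]
group_b = [6.2, 5.9, 6.8, 7.1, 6.0]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(f"t^2 = {t ** 2:.6f}, F = {f:.6f}")
```

With more than two groups only `one_way_f` still applies, which is the practical reason to move from the t-test to ANOVA.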
Basic terms
• Factor: an independent variable to be tested in the ANOVA
design. Example: gender
• Factor level: an individual value of the variable specifying
the factor, defines a group of observations. Example:
MALE
• Observation: an individual element of the dataset; the factor
levels it belongs to must be identified unambiguously
• ANOVA design: a chart to delineate which factors are
analysed, with which levels and in which combinations
• Factor interaction: a joint action of two or more
factors that cannot be predicted from their known individual
signals
• Effect: a signal of a factor or of an interaction of factors
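One way to make the idea of an interaction concrete is to compare each cell mean with what the individual factor effects alone would predict. The sketch below assumes a balanced two-factor layout described only by its cell means; the function name `interaction_effects`, the factor levels, and all numbers are hypothetical.

```python
from statistics import mean

def interaction_effects(cell_means):
    """Cell mean minus the additive prediction (grand mean + row effect + column effect)."""
    rows = sorted({r for r, _ in cell_means})
    cols = sorted({c for _, c in cell_means})
    grand = mean(list(cell_means.values()))
    row_eff = {r: mean([cell_means[(r, c)] for c in cols]) - grand for r in rows}
    col_eff = {c: mean([cell_means[(r, c)] for r in rows]) - grand for c in cols}
    return {
        (r, c): cell_means[(r, c)] - (grand + row_eff[r] + col_eff[c])
        for r in rows
        for c in cols
    }

# Made-up cell means for Age x Genotype.
additive = {("young", "wt"): 10.0, ("young", "mutant"): 12.0,
            ("old", "wt"): 14.0, ("old", "mutant"): 16.0}
interacting = {("young", "wt"): 10.0, ("young", "mutant"): 12.0,
               ("old", "wt"): 14.0, ("old", "mutant"): 22.0}

no_interaction = interaction_effects(additive)
with_interaction = interaction_effects(interacting)
print(no_interaction)
print(with_interaction)
```

In the first table every cell equals the additive prediction, so all interaction terms are zero. In the second, the old mutant cell is higher than the two individual signals would predict, and the non-zero terms are the interaction.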
Basic terms
• Sum of squares, SS: the sum of squared
individual deviations from a mean (~the
cumulative estimate of the variability due to the
factor in the dataset)
• Number of degrees of freedom, df: an estimate
of the number of individual elements that have
contributed to SS
• Mean square, MS: SS/df, the normalized
measure of the variability due to the factor
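As a small worked illustration of SS, df and MS, the sketch below builds these quantities for a one-way design. It assumes a single factor with three levels; the function name `one_way_table` and the data are made up for the example.

```python
from statistics import mean

def one_way_table(groups):
    """Return SS, df and MS for the factor and for the error (within-group) term."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Factor SS: squared deviations of group means from the grand mean,
    # weighted by the number of observations in each group.
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error SS: squared deviations of observations from their own group mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    return {
        "factor": {"SS": ss_factor, "df": df_factor, "MS": ss_factor / df_factor},
        "error": {"SS": ss_error, "df": df_error, "MS": ss_error / df_error},
    }

# Made-up observations for three factor levels.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
table = one_way_table(groups)
for source, row in table.items():
    print(source, row)
```

For these numbers the group means are 2, 3 and 7 around a grand mean of 4, which gives SS = 42 on 2 df for the factor (MS = 21) and SS = 6 on 6 df for the error (MS = 1).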
ANOVA (as well as the t-test) takes into account:
• Mean differences (~effect magnitude)
• Variance (~noise magnitude)
• Sample size (as a measure of potential bias)
P(H0) = f(SS_factor, SS_error, df)
To estimate any effect, all three components
must be known for it! Because ANOVA designs are
more complex, this is harder to ensure than in t-tests.
• The core ANOVA test:
F = MS_factor / MS_error
Under the null hypothesis the F value follows
the F distribution, which provides a p-value for
the null hypothesis (σ²(effect) = 0) given
df_factor and df_error
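The core test can be sketched without any statistics library. The code below is one possible implementation, not a reference one: it obtains the upper tail of the F distribution from the regularized incomplete beta function, evaluated with a standard continued fraction. The function names (`f_test`, `reg_inc_beta`, `_beta_cf`) and the input values are assumptions made for the example.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the recurrence.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_test(ms_factor, ms_error, df_factor, df_error):
    """Return (F, p) where p = P(F_random >= F) under the null hypothesis."""
    f_value = ms_factor / ms_error
    x = df_error / (df_error + df_factor * f_value)
    p_value = reg_inc_beta(df_error / 2.0, df_factor / 2.0, x)
    return f_value, p_value

# Made-up mean squares: MS_factor = 21 on 2 df, MS_error = 1 on 6 df.
f_value, p_value = f_test(21.0, 1.0, 2, 6)
print(f"F = {f_value:.3f}, p = {p_value:.6f}")
```

In practice a statistics package would normally supply the F distribution; the hand-written tail function is used here only to keep the sketch self-contained.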
A factor effect is easier to prove if:
• The mean difference is bigger
• The residual variance is smaller
• The sample is larger
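These three conditions can be read directly off the F ratio. The sketch below assumes a balanced design summarized by its group means, a common within-group variance (standing in for MS_error) and n observations per group; the function name `balanced_f` and all numbers are hypothetical.

```python
from statistics import mean

def balanced_f(group_means, within_var, n_per_group):
    """F ratio for a balanced one-way design described by summary values."""
    grand = mean(group_means)
    k = len(group_means)
    ms_factor = n_per_group * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    return ms_factor / within_var

baseline = balanced_f([10.0, 12.0], within_var=4.0, n_per_group=5)
bigger_difference = balanced_f([10.0, 14.0], within_var=4.0, n_per_group=5)
smaller_variance = balanced_f([10.0, 12.0], within_var=1.0, n_per_group=5)
larger_sample = balanced_f([10.0, 12.0], within_var=4.0, n_per_group=20)

print(baseline, bigger_difference, smaller_variance, larger_sample)
```

Each of the three changes raises F relative to the baseline, which is why each makes the same effect easier to demonstrate.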
• Fixed effect factors: levels are deliberately
arranged by the experimenter rather than
randomly sampled from an infinite
population of possible levels, in order to study the
effects of EXACTLY THESE levels of
specific research interest.
• Random effect factors: levels are instead sampled
from a population of “possible levels”,
in order to study the effect of the factor in
general.
A simple criterion for deciding whether an effect in
an experiment is random or fixed is to determine
how you would select (or arrange) the levels for
the respective factor in a replication of the study.
For example, if you want to replicate a school
study, you would choose (take a sample of)
different schools from the population of schools.
Thus, the factor "school" in this study would be a
random factor. In contrast, if you want to
compare the academic performance of boys to
girls in an experiment with a fixed factor Gender,
you would always arrange two groups: boys and
girls. Hence the same levels of the factor Gender,
and only those, would be chosen whenever
the study is replicated.
Variance components
[Chart: variance components for Factor 1, Factor 3, the 1*3 interaction, and Error]
• The estimates of σ²(a factor) derived from the ANOVA
results (MSs, Ns, etc.) allow one not only to prove an
effect of the factor but also to show its strength.
They are especially useful for comparing multiple
ANOVA results with each other.
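As a rough illustration of how such estimates can be derived from MSs and Ns, the sketch below uses the simplest case only: a balanced one-way design with a random factor, where the method-of-moments estimate is σ²(factor) = (MS_factor − MS_error) / n. This is a swapped-in simplification, not the two-factor layout of the chart; the function name `variance_components` and the input values are assumptions.

```python
def variance_components(ms_factor, ms_error, n_per_level):
    """Method-of-moments variance components for a balanced one-way random-effects design."""
    var_error = ms_error
    # A negative estimate is truncated at zero, a common convention.
    var_factor = max((ms_factor - ms_error) / n_per_level, 0.0)
    total = var_factor + var_error
    share = var_factor / total if total > 0 else 0.0
    return {"factor": var_factor, "error": var_error, "factor_share": share}

# Made-up ANOVA results: MS_factor = 21, MS_error = 1, 3 observations per level.
vc = variance_components(ms_factor=21.0, ms_error=1.0, n_per_level=3)
print(vc)
```

The share of the total variance attributed to the factor is on the same scale in every analysis, which is what makes it convenient for comparing results across several ANOVAs.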