META-ANALYSIS - Researcher Education Programme

Meta-analysis
Definition
“Meta-analysis refers to the analysis of analyses...
the statistical analysis of a large collection of
analysis results from individual studies for the
purpose of integrating the findings”
(Glass 1976)
When is meta-analysis useful?
Recurrent issues:
• Small effect sizes: e.g. average r in ecology and
evolution is often lower than 0.20 (Møller &
Jennions 2002)
• Studies with small and/or highly variable sample sizes
• Debated theories
Møller AP, Jennions MD. 2002. How much variation can be explained by ecologists and
evolutionary biologists? Oecologia 132: 492-500
Benefits of Meta-Analysis – An Example using
a funnel plot
Benefits of meta-analysis
• Gives the general effect of a given phenomenon (e.g. the effect
of group size on fitness in primates; Majolo et al 2008)
• It controls for variance in the data by using effect size
• Effect size: a statistical measure portraying the degree to which
a given event is present in a sample (Cohen, 1969). The type of
measure is called the effect, and its magnitude is considered an
effect size
• The statistical power of a meta-analysis is greater than that of the
single studies on which it’s based (the pooled effect size is a weighted
average of the study effect sizes, not a larger value)
Majolo B, de Bortoli Vizioli A, Schino G (2008) Costs and benefits of group living in
primates: group size effects on behaviour and demography. Anim Behav 76:1235-1247
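As a minimal sketch of what an effect size looks like in practice, the snippet below computes one common choice, Cohen's d (a standardised mean difference), from group means and standard deviations. The function name and the summary statistics are hypothetical and chosen only for illustration; the slides do not prescribe a particular effect size.

```python
import math

def cohens_d(mean_exp, mean_ctrl, sd_exp, sd_ctrl, n_exp, n_ctrl):
    """Standardised mean difference (Cohen's d) using the pooled SD."""
    pooled_var = ((n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / (
        n_exp + n_ctrl - 2
    )
    return (mean_exp - mean_ctrl) / math.sqrt(pooled_var)

# Hypothetical summary statistics for a single study (illustrative only)
d = cohens_d(mean_exp=12.0, mean_ctrl=10.0, sd_exp=4.0, sd_ctrl=4.0,
             n_exp=20, n_ctrl=20)
print(round(d, 2))  # 0.5
```

Because d is expressed in standard-deviation units, studies that measured the outcome on different scales can be compared on a common footing.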
Steps to run a meta-analysis
• Select research question: highly studied but debated topic?
• Select criteria for data to be included in your dataset (very
important to avoid biases!)
• Collect data from previous studies (published or not)
• Calculate effect size (chosen based on type of data available, e.g.
means, standard deviations, correlation coefficients, and so on)
– Statistics necessary for chosen effect size can be obtained
from various sources, e.g. p value, F, t, chi-square…
• Calculate the variance of each effect size (used to weight the studies)
• Run analysis with dedicated software (e.g. Stata)
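The calculation steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses correlation coefficients as the effect size, converts a t statistic to r where r is not reported, and pools with a fixed-effect, inverse-variance weighted mean on Fisher's z scale. The function names and the three studies are hypothetical, and dedicated software would normally also offer random-effects models.

```python
import math

def r_from_t(t, df):
    """Convert a t statistic (with df degrees of freedom) to a correlation r."""
    return math.copysign(math.sqrt(t ** 2 / (t ** 2 + df)), t)

def pool_correlations(studies):
    """Fixed-effect pooling of (r, n) pairs via Fisher's z transform.

    Returns the pooled r and an approximate 95% confidence interval.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        z = math.atanh(r)   # Fisher's z transform of r
        w = n - 3           # inverse of var(z) = 1 / (n - 3)
        weighted_sum += w * z
        total_weight += w
    z_bar = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    ci = (math.tanh(z_bar - 1.96 * se), math.tanh(z_bar + 1.96 * se))
    return math.tanh(z_bar), ci

# Hypothetical studies as (r, sample size); the third reports only t = 2.0, df = 38
studies = [(0.10, 30), (0.25, 50), (r_from_t(2.0, 38), 40)]
pooled_r, ci = pool_correlations(studies)
print(f"pooled r = {pooled_r:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Larger studies receive more weight, so the pooled estimate is pulled towards the most precise results.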
Problems of meta-analysis
• Meta-analysis is usually run on published studies, so the researcher
has limited control over data availability or experimental design
• A minimum of 25 studies (sample points) is usually recommended, but
meta-analyses have often been published with smaller sample sizes
• Meta-analysis is run at the within-study level: effect size is calculated
for each study (so each study has to have, e.g., data on a control and an
experimental group)
• Publication bias: the tendency to publish only studies with significant
results may bias the data used in a meta-analysis
• Tests for publication bias need to be performed to make sure this factor
does not affect the results (e.g. Begg’s or Egger’s test)
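A rough sketch of the idea behind Egger's test: regress each study's standardised effect (effect / standard error) on its precision (1 / standard error). An intercept far from zero suggests funnel-plot asymmetry, which may indicate publication bias. The function name and the data are hypothetical. The code returns the t statistic and degrees of freedom only; turning these into a p value needs a t distribution, which is left to statistical software.

```python
import math

def egger_regression(effects, std_errors):
    """Regress standardised effect (effect / SE) on precision (1 / SE).

    Returns (intercept, se_intercept, t_statistic, degrees_of_freedom).
    """
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, n - 2

# Hypothetical data in which small studies (large SE) report larger effects
effects = [0.80, 0.60, 0.50, 0.30, 0.25, 0.20]
std_errors = [0.40, 0.30, 0.25, 0.12, 0.10, 0.08]
intercept, se_int, t_stat, df = egger_regression(effects, std_errors)
print(f"intercept = {intercept:.2f}, t = {t_stat:.2f}, df = {df}")
```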
Reading material (available in the library)
Hedges L.V. & Olkin I. (1985). Statistical methods for
meta-analysis. Academic Press
Stangl D.K. & Berry D.A. (2000). Meta-analysis in
medicine and health policy. E-Book
Generalised Linear Mixed Models
(GLMMs)
Some recurrent problems
Data are often clustered or hierarchically structured,
e.g.:
– Children are nested within schools
– Subjects come from different populations / study
sites / cultures
– Several (repeated) observations are collected on
the same individual
We need to take these clusters into account…
An example of the relationship between
exercise and blood pressure – Missing
important information?
[Figure: scatter plot of relative blood pressure (y-axis) against hours of
exercise per week (x-axis) with a single fitted line; R² (linear) = 0.401]
Same example as previous slide (relationship
between exercise and blood pressure) but this
time we look at individual scores
[Figure: the same scatter plot with points grouped by individual (legend:
individuals 1 to 9) and a separate fitted line per individual; the
per-individual fits shown report R² (linear) = 1. Axes: relative blood
pressure (y) against hours of exercise per week (x)]
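The contrast between the two plots can be reproduced with a small made-up dataset. The numbers below are hypothetical, not the data behind the slides: each of three individuals follows a perfectly linear relationship, but because individuals differ in their baseline, a single regression over all points explains less of the variance.

```python
def r_squared(xs, ys):
    """R squared of a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: individual -> (hours of exercise, relative blood pressure)
data = {
    1: ([1, 2, 3], [2, 3, 4]),
    2: ([4, 5, 6], [3, 4, 5]),
    3: ([7, 8, 9], [4, 5, 6]),
}

# Pooled fit that ignores which individual each point belongs to
all_x = [x for xs, _ in data.values() for x in xs]
all_y = [y for _, ys in data.values() for y in ys]
pooled_r2 = r_squared(all_x, all_y)

# Separate fit for each individual
per_individual_r2 = {ind: r_squared(xs, ys) for ind, (xs, ys) in data.items()}

print(round(pooled_r2, 3))   # 0.8
print(per_individual_r2)     # {1: 1.0, 2: 1.0, 3: 1.0}
```

Ignoring the clustering hides how consistent the relationship is within each individual, which is the information a model with individual-level terms can recover.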
Some problems with (RM) ANOVAs - 1
• Missing values: a subject is excluded from the
analysis if one datum is missing
• Not possible to include covariates on each
time/condition measurement: this is a problem as
often various factors change across conditions (e.g.
age)
• Needs equal spacing among conditions (e.g.: time 1,
time 2, time 3)
• Developmental trajectories difficult to model (e.g.
growth curves)
Some problems with (RM) ANOVAs - 2
• Differences in individual behaviour not detectable,
so we may miss important information
• Not easy to analyse more complex designs:
– individuals nested within families or groups
– students nested in class, classes nested in
schools, schools nested in countries...
• Only available for continuous and normally distributed
data
For example:
Factors affecting reconciliation in macaques
(Majolo et al 2009)
Aggressor_ID  Victim_ID  Aggression_type  Context  Reconciliation
A             B          Threat           Social   Yes
A             C          Bite             Feeding  No
C             A          Bite             Feeding  Yes
D             C          Threat           Social   No
A             B          Bite             Feeding  No
B             A          Threat           Feeding  Yes
Majolo B., Ventura R. & Koyama N.F. (2009a). A statistical modelling approach to the
occurrence and timing of reconciliation in wild Japanese macaques. Ethology, 115: 152-166.
GLMMs - 1
Solve most (all) of the problems encountered with
ANOVAs:
– DV can be continuous or dichotomous
– Individual ID can be incorporated (as a random
factor) and controlled for (thus we can have
multiple observations on the same subject without
the risk of sample inflation)
– Different fixed factors and covariates can be added
for each condition or observation time
– Missing data do not result in sample reduction
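To see why individual ID matters, the six example rows from the macaque reconciliation table (Majolo et al 2009) can be tallied by aggressor. This is only a sketch of the data structure: the tuple layout and variable names are assumptions made for illustration.

```python
from collections import Counter

# Rows copied from the example table:
# (aggressor_id, victim_id, aggression_type, context, reconciliation)
rows = [
    ("A", "B", "Threat", "Social", "Yes"),
    ("A", "C", "Bite", "Feeding", "No"),
    ("C", "A", "Bite", "Feeding", "Yes"),
    ("D", "C", "Threat", "Social", "No"),
    ("A", "B", "Bite", "Feeding", "No"),
    ("B", "A", "Threat", "Feeding", "Yes"),
]

obs_per_aggressor = Counter(row[0] for row in rows)
print(len(rows), len(obs_per_aggressor))  # 6 4
print(obs_per_aggressor["A"])             # 3
```

Six observations come from only four aggressors, and half of them involve individual A. Treating the rows as six independent data points would inflate the sample, which is what a random factor for individual ID is meant to control for.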
GLMMs - 2
• Random factors: variables whose levels are a sample from a
larger population to which you want to generalise your results
– E.g. You have to control for your subject IDs but
you want to generalise your finding to the whole
study population
• Fixed factors: variables for which you are interested
in their specific effect on the DV
– E.g. Gender (male vs female) or treatment
conditions are fixed factors (you cannot generalise
their effects on the DV to other treatments or sexes)
GLMMs - 3
• Model selection may be used to choose the model
with the best fit
• One measure frequently used is the Akaike
Information Criterion (AIC)
• A lower AIC corresponds to a better model: it rewards fit but
penalises the number of parameters
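As a hedged illustration of how AIC trades fit against complexity, the sketch below uses AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximised likelihood. It compares two deliberately simplified models on the six example rows of the macaque reconciliation table (Majolo et al 2009): one with a single reconciliation probability, and one with a separate probability per aggression type. Both models ignore the random factors a GLMM would include, and six rows are far too few for real inference; the function names are hypothetical.

```python
import math

def bernoulli_log_likelihood(outcomes, probs):
    """Log-likelihood of 0/1 outcomes given per-observation probabilities."""
    return sum(math.log(p if y else 1.0 - p) for y, p in zip(outcomes, probs))

def aic(log_lik, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_lik

# (aggression_type, reconciled) for the six example rows, in table order
rows = [("Threat", 1), ("Bite", 0), ("Bite", 1),
        ("Threat", 0), ("Bite", 0), ("Threat", 1)]
outcomes = [y for _, y in rows]

# Model 1: one overall probability of reconciliation (k = 1)
p_all = sum(outcomes) / len(outcomes)
aic_null = aic(bernoulli_log_likelihood(outcomes, [p_all] * len(rows)), k=1)

# Model 2: one probability per aggression type (k = 2)
p_by_type = {}
for kind in ("Threat", "Bite"):
    ys = [y for t, y in rows if t == kind]
    p_by_type[kind] = sum(ys) / len(ys)
aic_type = aic(
    bernoulli_log_likelihood(outcomes, [p_by_type[t] for t, _ in rows]), k=2
)

print(round(aic_null, 2), round(aic_type, 2))  # 10.32 11.64
```

The second model fits these rows slightly better, but not by enough to offset the penalty for its extra parameter, so the simpler model has the lower AIC here.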
Same example as before:
Factors affecting reconciliation in macaques
(Majolo et al 2009)
Aggressor_ID  Victim_ID  Aggression_type  Context  Reconciliation
A             B          Threat           Social   Yes
A             C          Bite             Feeding  No
C             A          Bite             Feeding  Yes
D             C          Threat           Social   No
A             B          Bite             Feeding  No
B             A          Threat           Feeding  Yes
Majolo B., Ventura R. & Koyama N.F. (2009a). A statistical modelling approach to the
occurrence and timing of reconciliation in wild Japanese macaques. Ethology, 115: 152-166.
Reading material (available in the library)
• Ho R. (2006). Handbook of Univariate and Multivariate
Data Analysis and Interpretation with SPSS.
Chapman & Hall.
• Tabachnick B.G. & Fidell L.S. (2001). Using
multivariate statistics. Allyn & Bacon.
• West B., Welch K.B. & Galecki A.T. (2006). Linear
Mixed Models: A Practical Guide Using Statistical
Software. Chapman & Hall.