Document 7847805


mean/meðaltal
standard deviation/staðalfrávik
sample size/úrtaksstærð
standard error/staðalvilla
effect size/stærð áhrifa
power/styrkur
meta-analysis/eftirgreining
replication/endurtekning
p-value is < 0,01
Is the effect clinically significant?
ARA0103
Research Methodology (Aðferðafræði Rannsókna)
Lectures 16 and 17
Power, Effect Sizes, Meta-Analysis, and Replication
Standard error/staðalvilla: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, where n is the sample size/úrtaksstærð
27/05/2016
Dr Andy Brooks
blood pressure/blóðþrýstingur
measurement error/mælingarvilla
Diastolic Blood Pressure Data
What is the measurement error? ±1, ±0,01?
Null hypothesis
Núlltilgáta
• It is suspected that staff working in an old hospital have different blood pressures from staff working in a new hospital.
– Working conditions in both hospitals can be stressful, for different reasons.
• The null hypothesis is that, on average, there is no difference between the blood pressures.
– There is no difference.
• We start by taking a random sample/slembiúrtak of 10 from the old hospital (Group 1) and 10 from the new hospital (Group 2).
t-test/t-próf
Diastolic blood pressure data (n = 10 per group):
Group 1 (old hospital): 76; 76; 74; 70; 80; 68; 90; 70; 90; 72
Group 2 (new hospital): 76; 70; 82; 90; 68; 60; 62; 68; 80; 74
Excel output, t-Test: Two-Sample Assuming Unequal Variances
                               Variable 1   Variable 2
Mean                           76,6         73
Variance                       62,2667      86,4444
Observations                   10           10
Hypothesized Mean Difference   0
df                             18
t Stat                         0,9335
P(T<=t) one-tail               0,1814
t Critical one-tail            1,7341
P(T<=t) two-tail               0,3629
t Critical two-tail            2,1009
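The Excel output can be cross-checked with a minimal Python sketch. It assumes that Excel's "t-Test: Two-Sample Assuming Unequal Variances" is Welch's t-test, which computes t from the two sample variances and takes its df from the Welch-Satterthwaite formula. The function name `welch_t` is illustrative and not part of any library. Python writes decimals with a point, so 76,6 becomes 76.6.

```python
import math
import statistics

# Diastolic blood pressure readings transcribed from the slide (n = 10 per group).
group1 = [76, 76, 74, 70, 80, 68, 90, 70, 90, 72]  # old hospital
group2 = [76, 70, 82, 90, 68, 60, 62, 68, 80, 74]  # new hospital


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = statistics.variance(a) / len(a)  # squared standard error, group a
    se2_b = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


t, df = welch_t(group1, group2)
print(f"t = {t:.4f}, df = {df:.2f}")  # t = 0.9335, df = 17.54
```

The unrounded df is about 17,54, and Excel appears to report it rounded to 18. If SciPy is installed, `scipy.stats.ttest_ind(group1, group2, equal_var=False)` runs the same Welch test and also returns the p-value.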
number in each cell of the design/fjöldi í hverju hólfi sniðsins
Conclusion (n=10)
• Although there is an average difference of 3,6, we cannot reject the null hypothesis.
– The p-value is 0,36, much larger than 0,05.
– We cannot reject the null hypothesis.
– We say "the null hypothesis is true".
– The standard deviations are large compared to the average difference of 3,6.
• Group 1: standard deviation = 7,9
• Group 2: standard deviation = 9,3
– The standard errors of the means are only slightly less than the average difference of 3,6.
• Group 1: standard error = 7,9/√10 = 2,5
• Group 2: standard error = 9,3/√10 = 2,9
• "Say 2,7" (the average of the two standard errors)
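The standard deviations and standard errors quoted above can be reproduced from the raw data with a short sketch. `standard_error` is an illustrative helper name. It uses the sample standard deviation (divisor n − 1), which is the version that matches the slide values.

```python
import math
import statistics

group1 = [76, 76, 74, 70, 80, 68, 90, 70, 90, 72]  # old hospital
group2 = [76, 70, 82, 90, 68, 60, 62, 68, 80, 74]  # new hospital


def standard_error(data):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))


for name, data in (("Group 1", group1), ("Group 2", group2)):
    print(f"{name}: sd = {statistics.stdev(data):.1f}, se = {standard_error(data):.1f}")
# Group 1: sd = 7.9, se = 2.5
# Group 2: sd = 9.3, se = 2.9
```

The two standard errors average to about 2,7, which is the value used for the error bars.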
standard error bar/staðalvillusúla
[Figure: Diastolic Blood Pressure, graph showing standard error bars for hospitals 1 and 2 (x-axis: Hospital); the overlap is large/skörun er mikil]
• Standard error bars are approximate (± 2,7).
• (Standard error bars, not standard deviation bars, are shown.)
Confidence Interval (CI)/Öryggisbil
• The 1-α confidence interval for the
population mean/þýðismeðaltal μ is:
xt
s
( df ,  / 2 )
n
til x  t
s
( df ,  / 2 )
n
df  n -1
degrees of freedom/frígráður
27/05/2016
The critical values of t can be read from tables
in statistical books or calculated using
statistical software (t.d. TINV in Excel).
Dr Andy Brooks
7
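Python's standard library has no t-distribution. This sketch therefore finds the two-tailed critical value (the quantity Excel's TINV returns) in two steps: it integrates the t density numerically with Simpson's rule, then bisects. The helper names are hypothetical. The search assumes the critical value lies below 50, which holds for df ≥ 2 and α ≥ 0,001.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Two-tailed critical value (alpha/2 in each tail), found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0  # assumes the answer is below 50 (true for df >= 2, alpha >= 0.001)
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 9), 2))  # 2.26, as in the table for df = 9
```

With SciPy available, `scipy.stats.t.ppf(1 - alpha / 2, df)` gives the same value directly.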
95% confidence interval/Öryggisbil
• n = 10
• df (degrees of freedom) = n − 1 = 9
• 5% in the tails
– 2,5% left tail, 2,5% right tail
• From a table of the t-distribution, the multiplier is 2,26.
• 2,26 × 2,7 (standard error) ≈ 6,1
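The arithmetic on this slide can be sketched as follows. It uses the table multiplier 2,26 and the rounded standard errors 2,5 and 2,9. `ci95` is a hypothetical helper, and the interval is computed separately for each group mean.

```python
T_MULTIPLIER = 2.26  # table value: df = 9, two-tailed alpha = 0.05


def ci95(mean, se, t_multiplier=T_MULTIPLIER):
    """95% confidence interval: mean ± t multiplier × standard error."""
    margin = t_multiplier * se
    return mean - margin, mean + margin


print(round(T_MULTIPLIER * 2.7, 1))  # 6.1, the half-width used on the slide
g1 = ci95(76.6, 2.5)  # Group 1
g2 = ci95(73.0, 2.9)  # Group 2
print(g1, g2)
print("intervals overlap:", g1[0] < g2[1] and g2[0] < g1[1])  # True
```

The two intervals share a wide region, which is the overlap visible in the confidence-interval graph.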
Critical Values of Student's t-Distribution
           One tail:  0,05   0,025   0,01   0,005
           Two tail:  0,1    0,05    0,02   0,01
df
 3                    2,35   3,18    4,54   5,84
 4                    2,13   2,78    3,75   4,60
 5                    2,02   2,57    3,36   4,03
 6                    1,94   2,45    3,14   3,71
 7                    1,89   2,36    3,00   3,50
 8                    1,86   2,31    2,90   3,36
 9                    1,83   2,26    2,82   3,25
10                    1,81   2,23    2,76   3,17
11                    1,80   2,20    2,72   3,11
12                    1,78   2,18    2,68   3,05
13                    1,77   2,16    2,65   3,01
14                    1,76   2,14    2,62   2,98
15                    1,75   2,13    2,60   2,95
16                    1,75   2,12    2,58   2,92
17                    1,74   2,11    2,57   2,90
18                    1,73   2,10    2,55   2,88
[Figure: Diastolic Blood Pressure, graph showing 95% confidence intervals for hospitals 1 and 2 (x-axis: Hospital); the overlap is large/skörun er mikil]
In research papers, it is sometimes not clear whether standard error bars, standard deviation bars, or 95% confidence intervals are being shown.
possible error/hugsanleg villa
real difference/raunverulegur mismunur
Possible error in conclusion
• If there is a real difference, on average, of 3,6 in
diastolic blood pressures,
– then it is an error to accept the null hypothesis that
there is no difference.
• so it is a mistake to say that the null hypothesis is true.
• If there is a real difference, on average, of 3,6 in
diastolic blood pressures,
– then our samples (n=10) were not big enough:
• the standard errors of the means are too big
• our statistical test did not have enough power to detect a
difference in means as small as 3,6
Type I error/mistök af tegund I
Type I and II errors
• A Type I error is rejecting the null hypothesis
when it is true.
– Rejecting a true null hypothesis.
– The probability of a Type I error is α.
• α is usually 0,05 or 0,01
– It is better if you can report a p-value below 0,01 rather than just below 0,05.
• Samples from two groups which have the same population mean can produce what appears to be a statistically significant difference 1 time in 20 (α = 0,05) or 1 time in 100 (α = 0,01).
• A Type II error is accepting the null hypothesis
when it is false.
– Failing to reject a false null hypothesis.
– The probability of a Type II error is β.
• The power of a statistical test/styrkur tölfræðiprófs is 1 − β.
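A small simulation sketch can make the meaning of α concrete. Both groups are drawn from the same normal population, so every rejection is a Type I error. The population mean 75 and SD 8 are hypothetical values. It uses Student's pooled-variance t-test, so that df is exactly 18 with n = 10 per group, together with the df = 18 two-tailed critical value 2,1009 from the Excel output. Roughly 5% of the simulated experiments should then reject the null hypothesis.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def type_i_error_rate(n=10, reps=5000, t_crit=2.1009, seed=1):
    """Fraction of simulated experiments that reject a TRUE null hypothesis.

    t_crit must match df = 2n - 2; 2.1009 is the alpha = 0.05 value for n = 10.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both samples come from the same hypothetical population (mean 75, SD 8).
        a = [rng.gauss(75, 8) for _ in range(n)]
        b = [rng.gauss(75, 8) for _ in range(n)]
        if abs(pooled_t(a, b)) > t_crit:  # two-tailed test at alpha = 0.05
            rejections += 1
    return rejections / reps


print(type_i_error_rate())  # close to 0.05, i.e. about 1 time in 20
```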
Warning/Viðvörun
• If α = 0,05
– When the null hypothesis is true, there is a 1 in 20 chance of committing a Type I error.
• If α = 0,01
– When the null hypothesis is true, there is a 1 in 100 chance of committing a Type I error.
• If α = 0,001
– When the null hypothesis is true, there is a 1 in 1000 chance of committing a Type I error.
• If your sample size is small:
– Statistical power/styrkur may be very low.
– And you may easily commit a Type II error.
– β can be calculated for a test if you know the size of the effect/stærð áhrifa you are looking for, the sample size/úrtaksstærð, and the α level/alfastig.
We measure another 40 workers at each hospital...
number in each cell of the design/fjöldi í hverju hólfi sniðsins
t-test/t-próf (n=50)
Excel output, t-Test: Two-Sample Assuming Unequal Variances
                               Variable 1   Variable 2
Mean                           76,6         73
Variance                       57,1837      79,3878
Observations                   50           50
Hypothesized Mean Difference   0
df                             95
t Stat                         2,1782
P(T<=t) one-tail               0,0159
t Critical one-tail            1,6611
P(T<=t) two-tail               0,0319
t Critical two-tail            1,9853
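The n = 50 Excel output can be checked from its summary statistics alone, because Welch's t needs only the means, variances, and sample sizes. This sketch assumes that Excel's unequal-variance test is Welch's t-test. `welch_from_summary` is a hypothetical name.

```python
import math


def welch_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and df computed from summary statistics only."""
    se2_1, se2_2 = var1 / n1, var2 / n2  # squared standard errors
    t = (mean1 - mean2) / math.sqrt(se2_1 + se2_2)
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    return t, df


# Means, variances, and sample sizes from the Excel output (n = 50 per group).
t, df = welch_from_summary(76.6, 57.1837, 50, 73.0, 79.3878, 50)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = 2.178, df = 95.5
# Standard error of each mean at n = 50:
print(round(math.sqrt(57.1837 / 50), 2), round(math.sqrt(79.3878 / 50), 2))  # 1.07 1.26
```

The standard errors have shrunk from about 2,5 and 2,9 at n = 10 to about 1,07 and 1,26. This is roughly in line with the approximate ± 1,1 error bars for n = 50.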
standard error bar/staðalvillusúla
[Figure: Diastolic Blood Pressure (n=50), graph showing standard error bars for hospitals 1 and 2 (x-axis: Hospital)]
• Standard error bars are approximate (± 1,1).
[Figure: Diastolic Blood Pressure (n=50), graph showing 95% confidence intervals for hospitals 1 and 2 (x-axis: Hospital)]
2,02 × 1,1 (standard error) ≈ 2,22
point estimate/punktspá
effect size/stærð áhrifa
number in each cell of the design/fjöldi í hverju hólfi sniðsins
Conclusion (n=50)
• We reject the null hypothesis. An increased sample size has given us the power to detect a difference.
– The null hypothesis is false; the alternative hypothesis is true.
– Statistical significance: p = 0,03 (< 0,05)
• The point estimate for the effect size is 3,6.
• But is the effect clinically significant?
– No?
• Standard deviations are large at both hospitals.
– Maybe we should be seeking explanations/útskýringar for these large standard deviations.
• Which staff smoke?
• Which staff work overtime?
• Which staff work night shifts?
– Maybe we should test to see if the standard deviations are statistically different?
effect size/áhrifsstærð
Effect sizes: two-sample case
• The size of the effect is usually normalised with respect to the standard deviation.
• The effect size, assuming a common variance, is given by:
$$d = \frac{\mu_1 - \mu_2}{\sigma}$$
• Cohen proposed:
– 0,2 is a small effect size
– 0,5 is a medium effect size
– 0,8 is a large effect size
estimate/spágildi
Estimate of effect size: diastolic blood pressure experiment
• Point estimate/punktspá of difference between means = 3,6
• Estimate of variance = 70
– For simplicity, we assume a single common variance in the diastolic blood pressure experiment.
• Estimate of standard deviation = √70 = 8,3666
• Estimate of effect size = 3,6/8,3666 = 0,4303
– a small to medium effect
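The effect-size estimate on this slide can be reproduced in a few lines. `cohen_band` is a hypothetical helper that only reports where d falls between Cohen's benchmarks. It is not a standard function.

```python
import math


def cohens_d(mean_difference, common_variance):
    """Effect size d = (mu1 - mu2) / sigma, assuming one common variance."""
    return mean_difference / math.sqrt(common_variance)


def cohen_band(d):
    """Place |d| relative to Cohen's benchmarks 0.2 (small), 0.5 (medium), 0.8 (large)."""
    d = abs(d)
    if d < 0.2:
        return "below small"
    if d < 0.5:
        return "small to medium"
    if d < 0.8:
        return "medium to large"
    return "large"


d = cohens_d(3.6, 70)
print(f"sd = {math.sqrt(70):.4f}, d = {d:.4f}, {cohen_band(d)}")
# sd = 8.3666, d = 0.4303, small to medium
```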
http://www.stat.uiowa.edu/~rlenth/Power/
Java applets for power and sample size by Russ Lenth
[Applet screenshots: power = 0,04; power = 0,15; power = 0,32; power = 0,57]
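The applet settings behind these screenshots are not shown here. As a rough, self-contained stand-in, this sketch uses the normal approximation to the power of a two-sided, two-sample test with n per group: power ≈ Φ(d·√(n/2) − z₁₋α/₂), plus a negligible opposite-tail term. It tends to overstate power slightly for small samples, and it is not claimed to reproduce the applet's numbers. `approx_power` is a hypothetical name.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means."""
    z = NormalDist()  # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * (n_per_group / 2) ** 0.5  # d divided by sqrt(2/n)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (10, 50):
    for alpha in (0.05, 0.01):
        print(f"n = {n}, alpha = {alpha}: power ~ {approx_power(0.4303, n, alpha):.2f}")
```

With d = 0,4303 this gives roughly 0,16 (α = 0,05) and 0,05 (α = 0,01) for n = 10, and 0,58 and 0,34 for n = 50. Raising n increases power, while tightening α from 0,05 to 0,01 lowers it.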
unacceptable/óaðgengilegur
α and β
• As the α level gets stricter (0,05 -> 0,01), you have less power (1 − β).
– There is less chance of a Type I error.
– But more chance of a Type II error (β).
• As the α level gets less strict (0,01 -> 0,05), you have more power (1 − β).
– There is more chance of a Type I error.
– But less chance of a Type II error (β).
• Some researchers use an α level of 0,10, but this means a 1 in 10 chance of making a Type I error when the null hypothesis is true.
– Many researchers find an α level of 0,10 unacceptable.
More power
• A power of 0,5 means there is a 50%
chance your experiment will fail to detect a
difference that is real.
– If an experiment costs $10 million to run, you
want a power of 0,99 and not 0,5.
• There may be no way of estimating power
until you have performed the experiment.
– Previous results by other researchers can
sometimes be used to estimate the effect size.
Power calculations/Styrksútreikningar
• Power calculations get more complicated with more
complicated experimental designs.
• Power calculations get more complicated when group
sample sizes and/or group variances are unequal.
• Professional software exists to support calculations of
power for many types of statistical tests.
– www.power-analysis.com
• Power calculations are impossible unless you have an
estimate of the effect size.
• In research papers, a power analysis is often not reported because one was never done. It is becoming more common to insist that a power analysis is done before a research paper is accepted for publication.
Missing effect size?
• In the absence of previous results, group
sample sizes should be at least 10.
– Have at least 20 participants if you plan to
randomize patients into two groups of 10 and
use an independent two-sample t-test.
• Try if possible to have large numbers in
each group (20, 30, 40, or 50...).
– The more the better.
[Applet screenshots, Java applets for power and sample size by Russ Lenth: power = 0,39; power = 0,86]
descriptive statistics/lýsandi tölfræði
outlier/einfari
Failure to reject the null hypothesis
Cannot reject the null hypothesis?
• If you cannot reject the null hypothesis, use descriptive
statistics (average, standard deviation, standard error,
minimum, maximum), histograms, boxplots and line graphs
to present, compare, and interpret the data.
• What happens if you use an α of 0,10?
– This may allow you to interpret the experimental results statistically, but you need to emphasise the need to repeat the experiment with bigger samples.
• Try to estimate the power of the experiment retrospectively.
– This can help future researchers.
• Find explanations for any outliers.
– Sometimes this is where the real results of an experiment are.
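Here is a minimal sketch of the descriptive summary suggested above. It adds one common rule for flagging outliers worth explaining: Tukey's fences, which mark values more than 1,5 × IQR beyond the quartiles and are the convention many boxplots use. Quartile conventions differ between packages; this sketch uses the default of Python's `statistics.quantiles`. The function names are hypothetical.

```python
import math
import statistics

group1 = [76, 76, 74, 70, 80, 68, 90, 70, 90, 72]  # old hospital


def summarize(data):
    """Descriptive statistics to report when the null hypothesis is not rejected."""
    sd = statistics.stdev(data)
    return {
        "mean": statistics.mean(data),
        "sd": sd,
        "se": sd / math.sqrt(len(data)),
        "min": min(data),
        "max": max(data),
    }


def tukey_outliers(data, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


print(summarize(group1))
print(tukey_outliers(group1))  # [] : no outliers in Group 1
```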
meta-analysis/eftirgreining
Meta-analysis
[Figure: Experiments 1, 2, 3, 4, and 5]
• Meta-analysis involves examining the results of
experiments with the same null hypothesis.
• A meta-analysis can simply involve counting the
number of research papers that conclude the
effect was present against the number of papers
that conclude there was no effect.
– Counts are based on the best quality experiments (e.g. Randomized Controlled Trial/Hrein Tilraun).
– Simple counting of research papers is viewed by
many researchers as insufficient.
• The data has to be combined statistically.
Meta-analysis
• Another form of meta-analysis involves pooling together raw data/óunnin gögn from several experiments.
• Pooling the data in this way effectively increases group sample sizes and so increases the power of any statistical tests applied.
– If we have data from 5 experiments with group sample sizes of 10, the group sample sizes in the meta-analysis become 50.
number in each cell of the design/fjöldi í hverju hólfi sniðsins
special software/sérstakur hugbúnaður
Meta-analysis
• Another form of meta-analysis involves pooling together effect size estimates/áhrifastærðir from several experiments.
• Special software exists to support meta-analytic procedures.
– e.g. RevMan from the Cochrane Collaboration.
• Case study, Lecture 10
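Pooling effect size estimates can be sketched with inverse-variance (fixed-effect) weighting, one standard method. Each study's estimate is weighted by 1/SE², so more precise studies count more. The five d values below are hypothetical. The standard error of d uses the common large-sample approximation √((n₁+n₂)/(n₁n₂) + d²/(2(n₁+n₂))). Dedicated tools such as RevMan also offer other models, such as random effects, which this sketch does not cover.

```python
import math


def se_of_d(d, n1, n2):
    """Large-sample approximation to the standard error of Cohen's d."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))


def fixed_effect_pool(effects, standard_errors):
    """Inverse-variance weighted mean effect, its standard error, and a 95% CI."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)


# Hypothetical effect sizes from five experiments, each with 10 per group.
ds = [0.43, 0.20, 0.65, 0.38, 0.51]
ses = [se_of_d(d, 10, 10) for d in ds]
pooled, pooled_se, ci = fixed_effect_pool(ds, ses)
print(f"pooled d = {pooled:.2f}, SE = {pooled_se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

As with pooling raw data, the pooled standard error is smaller than that of any single experiment. This is where the gain in precision comes from.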
Case study, Lecture 10
[Figure: Fig 3, Relative risk for mortality (forest plot), (c) BMJ]
other explanations/aðrar útskýringar
dispute/rökræða
Replication/Endurtekning
• The results from an RCT may be wrong.
– A cause-and-effect relationship/orsakatengsl does not exist.
• There may be other explanations.
• People only start believing the result when the RCT is successfully replicated by other research teams.
• The results from several RCTs can be combined in a meta-analysis.
– Even the results of a meta-analysis can be disputed...
improve/bæta við
reliability/áreiðanleiki
validity/réttmæti
Replication/Endurtekning
• Often, when you decide to replicate an
experiment, you also improve the experiment:
– Measure O1 to check the groups are equal.
– Change the questionnaire.
• Add questions, remove questions, change the wording, ...
– Use a different questionnaire, one which has been validated and shown to be reliable.
• But if you make too many improvements, you might be running a different experiment.
– Then it is no longer possible to pool the raw data, etc.