DEPARTMENT FOR HEALTH
Sport, Health and Exercise Science
MAGNITUDE-BASED INFERENCES
An alternative to hypothesis testing
CONTACT
Dr Sean Williams | [email protected]

Magnitude-based inferences
Lecture outline
• Limitations of null hypothesis significance testing (NHST)
• Confidence intervals
• Magnitude-based inferences (MBI)
• Smallest worthwhile effects
• Limitations of magnitude-based inferences
Null hypothesis significance testing
• A major aim of research is to make an inference about an effect in a population based on study of a sample.
• Null-hypothesis testing via the P-value and statistical significance is the traditional approach to making an inference.
• The P-value is the probability of obtaining the observed result, or more extreme results, if the null hypothesis is true.
[Figure: probability distribution of the set of possible results (under the null hypothesis), with 0 marked on the axis; labels: most likely observation, observed data point, one-sided p-value. y-axis: probability.]
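As a rough illustration of the P-value definition, the sketch below computes a one-sided P-value for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis. The function name `one_sided_p` and the observed value are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def one_sided_p(z_observed):
    """Probability of a result at least as large as z_observed,
    assuming the statistic is standard normal under the null hypothesis."""
    return 1 - NormalDist().cdf(z_observed)


# Hypothetical observed statistic, 1.645 standard errors above the null value of 0
p = one_sided_p(1.645)
print(f"one-sided p = {p:.3f}")
```

An observed statistic exactly at the null value gives a one-sided P-value of 0.5, and the value shrinks as the observation moves further into the tail.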
Limitations of NHST
✓ P-values are difficult to understand!
➢ P ≤ 0.05 is arbitrary
✓ “Surely God loves the 0.06 nearly as much as 0.05.” (Rosnow and Rosenthal, 1989)
✓ Some useful effects aren’t statistically significant
✓ Some statistically significant effects aren’t useful
➢ P > 0.05 often interpreted as unpublishable
✓ So good data don’t get published
✓ Ignores ‘judgement’, leads to dichotomised thinking
✓ Statistical Hypothesis Inference Testing?
Dance of the p-values
Marker | P-value | Interpretation
*** | <0.001 | Very highly significant!!!
** | <0.01 | Highly significant!!
* | <0.05 | Significant (phew)
? | 0.05 – 0.10 | “Approaching significance”
(none) | >0.10 | Non-significant
https://www.youtube.com/watch?v=ez4DgdurRPg
Confidence intervals
• A range within which we infer the true, population or large sample value is likely to fall.
• Likely is usually a probability of 0.95 (for 95% limits).
[Figure: probability distribution of true value, given the observed value; the area between the lower likely limit and the upper likely limit = 0.95. x-axis: value of effect statistic (negative, 0, positive); y-axis: probability.]
• Representation of the limits as a confidence interval:
[Figure: likely range of true value drawn as a bar along the value of effect statistic (negative, 0, positive).]
Confidence intervals (CI)
To calculate a confidence interval you need:
• Confidence level (usually 95%, but can use 90% or 99%)
• Statistic (e.g. group mean)
• Margin of error (MOE) = Critical value x Standard error (SE) of the statistic
✓ Critical value: z-score or t-score
✓ SE = SD/√n
CI = [M – MOE, M + MOE]
CI = [M – Critical value x SE, M + Critical value x SE]
EXAMPLE:
You test the vertical jump heights of 100 athletes. The mean and standard deviation of the sample were 60 ± 20 cm. The 95% CI for this mean would be:
Lower CI = 60 – 1.96 x 20/√100 = 60 – 3.92
Upper CI = 60 + 1.96 x 20/√100 = 60 + 3.92
95% CI = 56 to 64 cm
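The calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes a z critical value (appropriate for large samples); the function name `confidence_interval` is hypothetical, and a t critical value would be needed for small samples.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) confidence limits for a mean, using a z critical value."""
    critical = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    moe = critical * sd / sqrt(n)                     # margin of error = critical value x SE
    return mean - moe, mean + moe


# Vertical jump example: mean 60 cm, SD 20 cm, 100 athletes
lower, upper = confidence_interval(mean=60, sd=20, n=100)
print(f"95% CI = {lower:.1f} to {upper:.1f} cm")
```

Changing `level` to 0.90 or 0.99 gives the other commonly used confidence levels.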
Confidence intervals (CI)
Confidence intervals also convey the precision of an estimate
• Wider confidence interval = less precision
In the previous example, a smaller sample size (e.g. n=20) would have given less precision for our estimate…
Lower CI = 60 – 1.96 x 20/√20 = 60 – 8.77
Upper CI = 60 + 1.96 x 20/√20 = 60 + 8.77
95% CI = 51 to 69 cm
[Figure: the 95% CIs for n = 20 and n = 100 plotted against Vertical Jump Height [cm], axis from 40 to 80.]
Recap
• P-value = The probability of obtaining the observed result, or more extreme results, if the null hypothesis is true.
• ‘NHST’ has several limitations, namely, it leads to dichotomised thinking and does not tell us if the effect is important/worthwhile.
• Confidence intervals tell us the likely range of the true (population) value.
• It could be red!
• Confidence intervals also convey precision of our estimate
• Larger sample size and/or more consistent response = Smaller confidence interval & more precision.
Magnitude-based inferences
• For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
• These are usually equal and opposite in sign.
• Harm is the opposite of benefit, not side effects.
• They define regions of beneficial, trivial, and harmful values:
[Figure: value of effect statistic (negative, 0, positive) divided into harmful, trivial and beneficial regions by the smallest clinically harmful effect and the smallest clinically beneficial effect.]
All you need is these two things: the confidence interval and a sense of what is important (e.g., beneficial and harmful).
Magnitude-based inferences
Put the confidence interval and these regions together to make a decision about clinically significant, clear or decisive effects.
[Figure: ten confidence intervals plotted against the harmful / trivial / beneficial regions of the value of effect statistic (negative, 0, positive). Each interval is listed below, top to bottom, with its MBI decision and whether it is statistically significant.]
MBI | Statistically significant?
Use it. | Yes
Use it. | Yes
Use it. | No
Depends | No
Don’t use it. | Yes
Don’t use it. | No
Don’t use it. | No
Don’t use it. | Yes
Don’t use it. | Yes
Unclear: need more research. | No
Why hypothesis testing can be unethical and impractical!
Magnitude-based inferences
We calculate probabilities that the true effect could be clinically beneficial, trivial, or harmful (Pbeneficial, Ptrivial, Pharmful).
[Figure: probability distribution of true value around the observed value, split at the smallest harmful value and the smallest beneficial value: Pharmful = 0.05, Ptrivial = 0.15, Pbeneficial = 0.80. x-axis: value of effect statistic (negative, 0, positive); y-axis: probability.]
• The Ps allow a more detailed call on magnitude, as follows…
Spreadsheets available at: sportsci.org
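To make the three probabilities concrete, here is a minimal sketch that is not the spreadsheet's own implementation. It assumes the uncertainty in the true value is normally distributed around the observed value with a known standard error, and that positive values are beneficial. The function name `chances` and all input values are hypothetical.

```python
from statistics import NormalDist


def chances(observed, se, smallest_beneficial, smallest_harmful):
    """Return (p_harmful, p_trivial, p_beneficial) under a normal approximation.

    Assumes positive effects are beneficial, so smallest_harmful is the
    negative threshold and smallest_beneficial the positive one.
    """
    dist = NormalDist(mu=observed, sigma=se)
    p_harmful = dist.cdf(smallest_harmful)            # area below the harm threshold
    p_beneficial = 1 - dist.cdf(smallest_beneficial)  # area above the benefit threshold
    p_trivial = 1 - p_harmful - p_beneficial          # whatever lies in between
    return p_harmful, p_trivial, p_beneficial


# Hypothetical effect of 1.5 units, standard error 0.8, thresholds of +1 and -1
p_h, p_t, p_b = chances(1.5, 0.8, smallest_beneficial=1.0, smallest_harmful=-1.0)
print(f"harmful {p_h:.1%} / trivial {p_t:.1%} / beneficial {p_b:.1%}")
```

With small samples a t-distribution would be the more usual choice, so the normal approximation here should be read as a simplification.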
Magnitude-based inferences
Making a more detailed call on magnitudes using chances of benefit and harm.
[Figure: ten confidence intervals plotted against the harmful / trivial / beneficial regions of the value of effect statistic (negative, 0, positive). Each interval is listed below with its chances and qualitative outcome.]
Chances (%) that the effect is harmful / trivial / beneficial | Qualitative outcome
0/0/100 | Most likely beneficial
0/7/93 | Likely beneficial
2/33/65 | Possibly beneficial
1/59/40 | Mechanistic: possibly trivial; Clinical: unclear
0/97/3 | Very likely trivial
2/94/4 | Likely trivial
28/70/2 | Possibly harmful
74/26/0 | Possibly harmful
97/3/0 | Very likely harmful
9/60/31 | Unclear
Risk of harm >0.5% is unacceptable, unless chance of benefit is high enough.
Magnitude-based inferences
Use this table for the plain-language version of chances:
Probability | Chances | Odds | The effect… beneficial/trivial/harmful
<0.005 | <0.5% | <1:199 | is almost certainly not…
0.005–0.05 | 0.5–5% | 1:199–1:19 | is very unlikely to be…
0.05–0.25 | 5–25% | 1:19–1:3 | is unlikely to be…, is probably not…
0.25–0.75 | 25–75% | 1:3–3:1 | is possibly (not)…, may (not) be…
0.75–0.95 | 75–95% | 3:1–19:1 | is likely to be…, is probably…
0.95–0.995 | 95–99.5% | 19:1–199:1 | is very likely to be…
>0.995 | >99.5% | >199:1 | is almost certainly…
• An effect should be almost certainly not harmful (<0.5%) and at least possibly beneficial (>25%) before you decide to use it.
• But you can tolerate higher chances of harm if chances of benefit are much higher: e.g., 3% harm and 76% benefit = clearly useful.
• Use an odds ratio of benefit/harm of >66 in such situations.
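The table and the decision rule can be sketched as two small functions. This is an illustrative reading of the slide, not an official implementation: the function names are hypothetical, and the handling of values that fall exactly on a boundary is an assumption.

```python
def describe(p):
    """Plain-language descriptor for a probability between 0 and 1."""
    bands = [
        (0.005, "almost certainly not"),
        (0.05, "very unlikely"),
        (0.25, "unlikely"),
        (0.75, "possibly"),
        (0.95, "likely"),
        (0.995, "very likely"),
    ]
    for upper, label in bands:
        if p < upper:
            return label
    return "almost certainly"


def odds(p):
    """Convert a probability to odds; a probability of 1 gives infinite odds."""
    return p / (1 - p) if p < 1 else float("inf")


def worth_using(p_harm, p_benefit):
    """Apply the rule: at least possibly beneficial, and either almost certainly
    not harmful or a benefit/harm odds ratio above 66."""
    if p_benefit <= 0.25:
        return False
    if p_harm < 0.005:
        return True
    return odds(p_benefit) / odds(p_harm) > 66


# Example from the slide: 3% harm and 76% benefit
print(describe(0.76), describe(0.03), worth_using(p_harm=0.03, p_benefit=0.76))
```

The threshold of 66 is roughly the odds ratio implied by the borderline case of 25% benefit and 0.5% harm.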
Two examples of use of the spreadsheet for clinical chances:
P value | value of statistic | Conf. level (%) | deg. of freedom | Confidence limits (lower, upper) | threshold values for clinical chances (positive, negative)
0.03 | 1.5 | 90 | 18 | 0.4, 2.6 | 1, -1
0.20 | 2.4 | 90 | 18 | -0.7, 5.5 | 1, -1
Chances (% or odds) that the true value of the statistic is:
P value | clinically positive: prob (%), odds | clinically trivial: prob (%), odds | clinically negative: prob (%), odds
0.03 | 78, 3:1, likely, probable | 22, 1:3, unlikely, probably not | 0, 1:2071, almost certainly not
0.20 | 78, 3:1, likely, probable | 19, 1:4, unlikely, probably not | 3, 1:30, very unlikely
Both these effects are clinically decisive, clear, or significant.
How to Publish Clinical Chances
Example of a table from a randomized controlled trial:
TABLE 1–Differences in improvements in kayaking sprint speed between slow, explosive and control training groups.
Compared groups | Mean improvement (%) and 90% confidence limits | Qualitative outcome(a)
Slow - control | 3.1; ±1.6 | Almost certainly beneficial
Explosive - control | 2.6; ±1.2 | Very likely beneficial
Slow - explosive | 0.5; ±1.4 | Unclear
(a) with reference to a smallest worthwhile change of 0.5%.
Recap
• P-value = The probability of obtaining the observed result, or more extreme results, if the null hypothesis is true.
• NHST has several limitations, namely, it does not tell us if the effect is important/worthwhile.
• Confidence intervals tell us the likely range of the true (population) value.
• It could be red!
• Confidence intervals also convey precision of our estimate
• Larger sample size and/or more consistent response = Smaller confidence interval & more precision.
• For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
• Spreadsheets at sportsci.org provide the % likelihood that an effect is harmful | trivial | beneficial.
• Effects that cross thresholds for benefit and harm are classed as unclear.
• An effect should be almost certainly not harmful (<0.5%) and at least possibly beneficial (>25%) before you decide to use it.
• But you can tolerate higher chances of harm if chances of benefit are much higher: e.g., 3% harm and 76% benefit = clearly useful.
• Use an odds ratio of benefit/harm of >66 in such situations.
Smallest worthwhile difference?
• Problem: what's the smallest clinically important effect?
• “If you can't answer this question, quit the field”.
• This problem applies also with hypothesis testing, because it determines the sample size you need to test the null properly.
✓ 0.3 of a CV (coefficient of variation of performance) gives a top athlete one extra medal every 10 races.
✓ This is the smallest important change in performance to aim for in research on, or intended for, elite athletes.
✓ 0.9, 1.6, 2.5, 4.0 of a CV gives an extra 3, 5, 7, 9 medals per 10 races (thresholds for moderate, large, very large, extremely large effects).
✓ References: Hopkins et al. MSSE 31, 472-485, 1999 and MSSE 41, 3-12, 2009.
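The CV multiples quoted above translate directly into performance thresholds once the CV for a sport is known. The sketch below is only arithmetic on those multiples; the function name `performance_thresholds` and the CV of 1.5% are hypothetical.

```python
# Multiples of the CV quoted above, keyed by the magnitude each one marks
CV_FACTORS = {
    "smallest important": 0.3,
    "moderate": 0.9,
    "large": 1.6,
    "very large": 2.5,
    "extremely large": 4.0,
}


def performance_thresholds(cv_percent):
    """Return the threshold change in performance (%) for each magnitude."""
    return {label: round(factor * cv_percent, 2) for label, factor in CV_FACTORS.items()}


# Hypothetical sport in which an elite athlete's performance varies by 1.5% between races
for label, value in performance_thresholds(1.5).items():
    print(f"{label}: {value}%")
```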
Smallest worthwhile difference?
• The default for most other populations and effects is Cohen's set of smallest values.
• You express the difference or change in the mean as a fraction of the between-subject standard deviation (difference in means/SD).
• It's like a z score or a t statistic.
• In a controlled trial, it's the SD of all subjects in the pre-test, not the SD of the change scores.
• The smallest worthwhile difference or change is 0.20.
• 0.20 is equivalent to moving from the 50th to the 58th percentile.
Example: The effect of a treatment on strength
[Figure: pre and post strength distributions for a trivial effect (0.1x SD) and a very large effect (3x SD).]
✓ Interpretation of standardised difference or change in means:
Magnitude | Cohen | Hopkins
trivial | <0.2 | <0.2
small | 0.2-0.5 | 0.2-0.6
moderate | 0.5-0.8 | 0.6-1.2
large | >0.8 | 1.2-2.0
very large | ? | 2.0-4.0
extremely large | ? | >4.0
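As a sketch of how a standardised change in means might be computed and then labelled with the Hopkins scale, the example below divides the change in the mean by the pre-test SD. The function names and the strength values are hypothetical, and the handling of values exactly on a boundary is an assumption.

```python
def standardised_effect(mean_pre, mean_post, sd_pre):
    """Change in the mean as a fraction of the pre-test between-subject SD."""
    return (mean_post - mean_pre) / sd_pre


# Upper bound of each band on the Hopkins scale
HOPKINS_SCALE = [
    (0.2, "trivial"),
    (0.6, "small"),
    (1.2, "moderate"),
    (2.0, "large"),
    (4.0, "very large"),
]


def hopkins_magnitude(effect):
    """Qualitative label for a standardised effect, ignoring its sign."""
    size = abs(effect)
    for upper, label in HOPKINS_SCALE:
        if size < upper:
            return label
    return "extremely large"


# Hypothetical strength data: 100 kg before, 110 kg after, pre-test SD of 20 kg
effect = standardised_effect(mean_pre=100, mean_post=110, sd_pre=20)
print(effect, hopkins_magnitude(effect))
```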
Smallest worthwhile difference?
Relationship of standardised effect to difference or change in percentile:
[Figure: strength distributions showing that a standardised effect of 0.20 moves an athlete on the 50th percentile (area = 50%) to the 58th percentile (area = 58%).]
Standardised effect | Percentile change
0.20 | 50 → 58
0.20 | 80 → 85
0.20 | 95 → 97
0.25 | 50 → 60
1.00 | 50 → 84
2.00 | 50 → 98
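The percentile changes in this table can be reproduced by shifting a position on a normal distribution by the standardised effect. The sketch below assumes the characteristic (e.g. strength) is normally distributed in the population; the function name `percentile_after` is hypothetical.

```python
from statistics import NormalDist


def percentile_after(start_percentile, standardised_effect):
    """Percentile reached after a shift of standardised_effect SDs.

    start_percentile must lie strictly between 0 and 100.
    """
    z_start = NormalDist().inv_cdf(start_percentile / 100)  # starting position in SD units
    return 100 * NormalDist().cdf(z_start + standardised_effect)


for start, effect in [(50, 0.20), (80, 0.20), (95, 0.20), (50, 0.25), (50, 1.00), (50, 2.00)]:
    print(f"{effect:.2f}: {start} -> {percentile_after(start, effect):.0f}")
```

The same standardised effect produces a smaller percentile change for athletes who already sit far from the middle of the distribution.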
Smallest worthwhile difference?
Effect statistic | Trivial | Small | Moderate | Large | Very large | Nearly perfect | Perfect
Correlation | 0.0 | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1
Diff. in means | 0.0 | 0.2 | 0.6 | 1.2 | 2.0 | 4.0 | Infinite
Freq. diff | 0 | 10 | 30 | 50 | 70 | 90 | 100
Rel. risk | 1.0 | 1.2 | 1.9 | 3.0 | 5.7 | 19 | Infinite
Odds ratio | 1.0 | 1.5 | 3.5 | 9.0 | 32 | 360 | Infinite
Limitations of magnitude-based inferences
• Problem: these new approaches are not yet mainstream.
• Confidence limits at least are coming in, so look for and interpret the importance of the lower and upper limits.
• You can use a spreadsheet to convert a published P value into a more meaningful magnitude-based inference.
• If the authors state “P<0.05” you can’t do it properly.
• If they state “P>0.05” or “NS”, you can’t do it at all.
• More difficult to present and discuss results?
• ‘Magnitude-based inferences under attack’
• http://sportsci.org/2014/inbrief.htm#MBI
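A rough sketch of how an exact published P value might be turned into chances of harm, triviality and benefit is given below. It is not the spreadsheet's method: it assumes a two-sided P value, a normal sampling distribution (the spreadsheet inputs include degrees of freedom, which suggests a t-distribution) and positive values being beneficial. The function name `chances_from_p` and the inputs are hypothetical.

```python
from statistics import NormalDist


def chances_from_p(effect, p_value, smallest_beneficial, smallest_harmful):
    """Return (p_harmful, p_trivial, p_beneficial) from an effect and its exact two-sided P value.

    Requires a non-zero effect and 0 < p_value < 1.
    """
    z = NormalDist().inv_cdf(1 - p_value / 2)  # z statistic implied by the P value
    se = abs(effect) / z                       # standard error recovered from effect / z
    dist = NormalDist(mu=effect, sigma=se)
    p_harmful = dist.cdf(smallest_harmful)
    p_beneficial = 1 - dist.cdf(smallest_beneficial)
    return p_harmful, 1 - p_harmful - p_beneficial, p_beneficial


# Hypothetical published result: effect of 1.5 with P = 0.03, thresholds of +1 and -1
p_h, p_t, p_b = chances_from_p(1.5, 0.03, smallest_beneficial=1.0, smallest_harmful=-1.0)
print(f"harmful {p_h:.1%} / trivial {p_t:.1%} / beneficial {p_b:.1%}")
```

The standard error can only be recovered from an exact P value, which is why an inequality such as “P<0.05” or “NS” is not enough.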
Summary
• MBIs are an alternative to traditional NHST.
• For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
• Smallest worthwhile effects may be based on variability of performance (e.g. 0.3 of CV).
• Or standardised effects may be used (e.g. Cohen’s d).
• Spreadsheets are available at sportsci.org to carry out MBIs.
• MBIs are growing in popularity, but are still not understood/accepted by many journals and academics.
• Confidence intervals convey far more information than a P-value alone, and should be presented where possible.
Recommended Reading
• Simulation software: http://www.latrobe.edu.au/psy/research/cognitive-anddevelopmental-psychology/esci
• Hopkins, W., Marshall, S., Batterham, A. & Hanin, J. (2009) Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise, 41, 3-12.
• Batterham, A. M. & Hopkins, W. G. (2006) Making meaningful inferences about magnitudes. International Journal of Sports Physiology and Performance, 1, 50-57.
• http://sportsci.org/