DEPARTMENT FOR HEALTH Sport, Health and Exercise Science MAGNITUDE-BASED INFERENCES An alternative to hypothesis testing CONTACT Dr Sean Williams | [email protected] –Sean Williams.
Lecture outline
• Limitations of null hypothesis significance testing (NHST)
• Confidence intervals
• Magnitude-based inferences (MBI)
• Smallest worthwhile effects
• Limitations of magnitude-based inferences

Null hypothesis significance testing
• A major aim of research is to make an inference about an effect in a population based on study of a sample.
• Null-hypothesis testing via the P-value and statistical significance is the traditional approach to making an inference.
• The P-value is the probability of obtaining the observed result, or more extreme results, if the null hypothesis is true.
[Figure: probability distribution of the set of possible results under the null hypothesis, centred on 0 (the most likely observation); the one-sided p-value is the area beyond the observed data point.]

Limitations of NHST
• P-values are difficult to understand!
• P ≤ 0.05 is arbitrary
• Some useful effects aren’t statistically significant
• Some statistically significant effects aren’t useful
• P > 0.05 often interpreted as unpublishable
• So good data don’t get published
• Ignores ‘judgement’, leads to dichotomised thinking
• Statistical Hypothesis Inference Testing?

“Surely God loves the 0.06 nearly as much as 0.05.” (Rosnow and Rosenthal, 1989)

Dance of the p-values
P-value        Symbol   Interpretation
<0.001         ***      Very highly significant!!!
<0.01          **       Highly significant!!
<0.05          *        Significant (phew)
0.05 – 0.10    ?        “Approaching significance”
>0.10                   Non-significant
https://www.youtube.com/watch?v=ez4DgdurRPg

Confidence intervals
• A range within which we infer the true, population or large sample value is likely to fall.
• Likely is usually a probability of 0.95 (for 95% limits).
[Figure: probability distribution of the true value, given the observed value, plotted against the value of the effect statistic (negative, 0, positive); the area between the lower likely limit and the upper likely limit is 0.95.]
• Representation of the limits as a confidence interval:
[Figure: horizontal bar showing the likely range of the true value on the same axis (negative, 0, positive).]

Confidence intervals (CI)
To calculate a confidence interval you need:
• Confidence level (usually 95%, but can use 90% or 99%)
• Statistic (e.g. group mean)
• Margin of error (MOE) = Critical value (z-score or t-score) x Standard error of the statistic, where SE = SD/√n

CI = [M – MOE, M + MOE]
CI = [M – Critical value x SE, M + Critical value x SE]

EXAMPLE: You test the vertical jump heights of 100 athletes. The mean and standard deviation of the sample were 60 ± 20 cm. The 95% CI for this mean would be:
Lower CI = 60 – 1.96 x 20/√100 = 60 – 3.92
Upper CI = 60 + 1.96 x 20/√100 = 60 + 3.92
95% CI = 56 to 64 cm

Confidence intervals also convey the precision of an estimate
• Wider confidence interval = less precision
In the previous example, a smaller sample size (e.g. n=20) would have given less precision for our estimate…
Lower CI = 60 – 1.96 x 20/√20 = 60 – 8.77
Upper CI = 60 + 1.96 x 20/√20 = 60 + 8.77
95% CI = 51 to 69 cm
[Figure: 95% confidence intervals for n = 20 and n = 100 plotted against Vertical Jump Height [cm], axis from 40 to 80.]

Recap
• P-value = The probability of obtaining the observed result, or more extreme results, if the null hypothesis is true.
• ‘NHST’ has several limitations, namely, it leads to dichotomised thinking and does not tell us if the effect is important/worthwhile.
• Confidence intervals tell us the likely range of the true (population) value.
• It could be red!
• Confidence intervals also convey precision of our estimate
• Larger sample size and/or more consistent response = Smaller confidence interval & more precision.

Magnitude-based inferences
• For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
• These are usually equal and opposite in sign.
• Harm is the opposite of benefit, not side effects.
• They define regions of beneficial, trivial, and harmful values:
[Figure: axis of the value of the effect statistic (negative, 0, positive) divided into harmful, trivial and beneficial regions by the smallest clinically harmful effect and the smallest clinically beneficial effect.]
All you need is these two things: the confidence interval and a sense of what is important (e.g., beneficial and harmful).

Put the confidence interval and these regions together to make a decision about clinically significant, clear or decisive effects.
[Figure: ten confidence intervals drawn against the harmful, trivial and beneficial regions of the value of the effect statistic (negative, 0, positive), one per row of the table below.]
MBI                            Statistically significant?
Use it.                        Yes
Use it.                        Yes
Use it.                        No
Depends                        No
Don’t use it.                  Yes
Don’t use it.                  No
Don’t use it.                  No
Don’t use it.                  Yes
Don’t use it.                  Yes
Unclear: need more research.   No
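Returning to the vertical jump example from the confidence interval slides, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `mean_confidence_interval` is made up for this sketch, and it follows the slides in using the z critical value (1.96 for 95%) for both sample sizes, whereas a t critical value would give a slightly wider interval at n = 20.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) limits for a sample mean.

    Hypothetical helper for illustration; uses a z critical value,
    as the worked example on the slides does.
    """
    critical = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    standard_error = sd / math.sqrt(n)                # SE = SD / sqrt(n)
    margin = critical * standard_error                # margin of error
    return mean - margin, mean + margin


# Vertical jump example: mean 60 cm, SD 20 cm, with n = 100 and n = 20
for n in (100, 20):
    lower, upper = mean_confidence_interval(60, 20, n)
    print(f"n = {n}: 95% CI = {lower:.0f} to {upper:.0f} cm")
# n = 100: 95% CI = 56 to 64 cm
# n = 20: 95% CI = 51 to 69 cm
```

The smaller sample gives the wider interval, which is the loss of precision described on the slide.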
Why hypothesis testing can be unethical and impractical! (The figure places each confidence interval against the harmful, trivial and beneficial regions and notes whether it is statistically significant.)

We calculate probabilities that the true effect could be clinically beneficial, trivial, or harmful (Pbeneficial, Ptrivial, Pharmful).
[Figure: probability distribution of the true value around the observed value, plotted against the value of the effect statistic (negative, 0, positive) and split at the smallest harmful value and the smallest beneficial value: Pharmful = 0.05, Ptrivial = 0.15, Pbeneficial = 0.80.]
• The Ps allow a more detailed call on magnitude, as follows…
Spreadsheets available at: sportsci.org

Making a more detailed call on magnitudes using chances of benefit and harm.
[Figure: ten confidence intervals against the harmful, trivial and beneficial regions of the value of the effect statistic (negative, 0, positive), one per row of the table below.]
Chances (%) that the effect is
harmful / trivial / beneficial    Interpretation
0/0/100                           Most likely beneficial
0/7/93                            Likely beneficial
2/33/65                           Possibly beneficial
1/59/40                           Mechanistic: possibly trivial; Clinical: unclear
0/97/3                            Very likely trivial
2/94/4                            Likely trivial
28/70/2                           Possibly harmful
74/26/0                           Possibly harmful
97/3/0                            Very likely harmful
9/60/31                           Unclear
Risk of harm >0.5% is unacceptable, unless chance of benefit is high enough.

Probability    Chances     Odds          The effect… beneficial/trivial/harmful
<0.005         <0.5%       <1:199        is almost certainly not…
0.005–0.05     0.5–5%      1:199–1:19    is very unlikely to be…
0.05–0.25      5–25%       1:19–1:3      is unlikely to be…, is probably not…
0.25–0.75      25–75%      1:3–3:1       is possibly (not)…, may (not) be…
0.75–0.95      75–95%      3:1–19:1      is likely to be…, is probably…
0.95–0.995     95–99.5%    19:1–199:1    is very likely to be…
>0.995         >99.5%      >199:1        is almost certainly…

• An effect should be almost certainly not harmful (<0.5%) and at least possibly beneficial (>25%) before you decide to use it.
• But you can tolerate higher chances of harm if chances of benefit are much higher: e.g., 3% harm and 76% benefit = clearly useful.
• Use an odds ratio of benefit/harm of >66 in such situations.
Use the preceding table for the plain-language version of chances.

Two examples of use of the spreadsheet for clinical chances:
Inputs
             P value   Value of    Conf.       Deg. of    Confidence limits    Threshold values for clinical chances
                       statistic   level (%)   freedom    lower     upper      positive    negative
Example 1    0.03      1.5         90          18         0.4       2.6        1           -1
Example 2    0.20      2.4         90          18         -0.7      5.5        1           -1

Chances (% or odds) that the true value of the statistic is
             clinically positive            clinically trivial                     clinically negative
             prob (%)   odds                prob (%)   odds                        prob (%)   odds
Example 1    78         3:1                 22         1:3                         0          1:2071
             likely, probable               unlikely, probably not                 almost certainly not
Example 2    78         3:1                 19         1:4                         3          1:30
             likely, probable               unlikely, probably not                 very unlikely
Both these effects are clinically decisive, clear, or significant.

How to Publish Clinical Chances
Example of a table from a randomized controlled trial:
TABLE 1–Differences in improvements in kayaking sprint speed between slow, explosive and control training groups.
Compared groups        Mean improvement (%) and 90% confidence limits    Qualitative outcome (a)
Slow - control         3.1; ±1.6                                         Almost certainly beneficial
Explosive - control    2.6; ±1.2                                         Very likely beneficial
Slow - explosive       0.5; ±1.4                                         Unclear
(a) with reference to a smallest worthwhile change of 0.5%.

Recap
• Confidence intervals also convey precision of our estimate
• Larger sample size and/or more consistent response = Smaller confidence interval & more precision.
• P-value = The probability of obtaining the observed result, or more extreme results, if the null hypothesis is true.
• NHST has several limitations, namely, it does not tell us if the effect is important/worthwhile.
• Confidence intervals tell us the likely range of the true (population) value.
• It could be red!

Recap
• For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
• Spreadsheets at sportsci.org provide the % likelihood that an effect is harmful | trivial | beneficial.
• Effects that cross thresholds for benefit and harm are classed as unclear.
• An effect should be almost certainly not harmful (<0.5%) and at least possibly beneficial (>25%) before you decide to use it.
• But you can tolerate higher chances of harm if chances of benefit are much higher: e.g., 3% harm and 76% benefit = clearly useful.
• Use an odds ratio of benefit/harm of >66 in such situations.

Smallest worthwhile difference?
• Problem: what's the smallest clinically important effect?
• “If you can't answer this question, quit the field”.
• This problem applies also with hypothesis testing, because it determines the sample size you need to test the null properly.
• 0.3 of a CV (coefficient of variation of performance) gives a top athlete one extra medal every 10 races. This is the smallest important change in performance to aim for in research on, or intended for, elite athletes.
• 0.9, 1.6, 2.5, 4.0 of a CV gives an extra 3, 5, 7, 9 medals per 10 races (thresholds for moderate, large, very large, extremely large effects).
• References: Hopkins et al. MSSE 31, 472-485, 1999 and MSSE 41, 312, 2009.

• The default for most other populations and effects is Cohen's set of smallest values.
• You express the difference or change in the mean as a fraction of the between-subject standard deviation (difference in means/SD).
• It's like a z score or a t statistic.
• In a controlled trial, it's the SD of all subjects in the pre-test, not the SD of the change scores.
• The smallest worthwhile difference or change is 0.20.
• 0.20 is equivalent to moving from the 50th to the 58th percentile.

Example: The effect of a treatment on strength
[Figure: pre and post distributions of strength for a trivial effect (0.1x SD) and for a very large effect (3x SD).]
Interpretation of standardised difference or change in means:
           trivial   small     moderate   large     very large   extremely large
Cohen      <0.2      0.2-0.5   0.5-0.8    >0.8      ?            ?
Hopkins    <0.2      0.2-0.6   0.6-1.2    1.2-2.0   2.0-4.0      >4.0

Smallest worthwhile difference?
Relationship of standardised effect to difference or change in percentile:
[Figure: two strength distributions for a standardised effect of 0.20; an athlete on the 50th percentile (area = 50%) moves to the 58th percentile (area = 58%).]
Standardised effect   Percentile change
0.20                  50 to 58
0.20                  80 to 85
0.20                  95 to 97
0.25                  50 to 60
1.00                  50 to 84
2.00                  50 to 98

Smallest worthwhile difference?
                 Trivial   Small   Moderate   Large   Very large   Nearly perfect   Perfect
Correlation      0.0       0.1     0.3        0.5     0.7          0.9              1
Diff. in means   0.0       0.2     0.6        1.2     2.0          4.0              Infinite
Freq. diff       0         10      30         50      70           90               100
Rel. risk        1.0       1.2     1.9        3.0     5.7          19               Infinite
Odds ratio       1.0       1.5     3.5        9.0     32           360              Infinite

Limitations of magnitude-based inferences
• Problem: these new approaches are not yet mainstream.
• Confidence limits at least are coming in, so look for and interpret the importance of the lower and upper limits.
• You can use a spreadsheet to convert a published P value into a more meaningful magnitude-based inference.
• More difficult to present and discuss results?
• ‘Magnitude-based inferences under attack’: http://sportsci.org/2014/inbrief.htm#MBI
• When converting a published P value: if the authors state only “P<0.05” you can’t do it properly, and if they state “P>0.05” or “NS”, you can’t do it at all.

Summary
• MBIs are an alternative to traditional NHST.
• For magnitude-based inferences, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects.
• Smallest worthwhile effects may be based on variability of performance (e.g. 0.3 of CV).
• Or standardised effects may be used (e.g. Cohen’s d).
• Spreadsheets are available at sportsci.org to carry out MBIs.
• Growing in popularity, but still not understood/accepted by many journals and academics.
• Confidence intervals convey far more information than a P-value alone, and should be presented where possible.

Recommended Reading
• Batterham, A. M. & Hopkins, W. G. (2006) Making meaningful inferences about magnitudes. International Journal of Sports Physiology and Performance, 1, 50-57.
• Hopkins, W., Marshall, S., Batterham, A. & Hanin, J. (2009) Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise, 41, 3-12.
• Simulation software: http://www.latrobe.edu.au/psy/research/cognitive-anddevelopmental-psychology/esci
• http://sportsci.org/
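As a closing illustration of how a published P value can be turned into clinical chances, here is a minimal Python sketch. It is not the sportsci.org spreadsheet: the function names (`clinical_chances`, `describe`, `odds_ratio`) are hypothetical, the t distribution is assumed because the spreadsheet example takes degrees of freedom as an input, the t CDF is approximated by simple numerical integration to stay within the standard library, and a positive effect is assumed to be the beneficial direction.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Inverse of t_cdf, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clinical_chances(p_value, effect, df, beneficial, harmful):
    """Return (P_harmful, P_trivial, P_beneficial) for the true effect.

    Assumption: the standard error is recovered from a two-sided P value.
    """
    se = abs(effect) / t_quantile(1 - p_value / 2, df)
    p_beneficial = 1 - t_cdf((beneficial - effect) / se, df)
    p_harmful = t_cdf((harmful - effect) / se, df)
    return p_harmful, 1 - p_harmful - p_beneficial, p_beneficial


def describe(p):
    """Plain-language term for a probability, following the slide's table."""
    scale = [(0.005, "almost certainly not"), (0.05, "very unlikely"),
             (0.25, "unlikely"), (0.75, "possibly"),
             (0.95, "likely"), (0.995, "very likely")]
    for upper, label in scale:
        if p < upper:
            return label
    return "almost certainly"


def odds_ratio(p_beneficial, p_harmful):
    """Odds of benefit divided by odds of harm (the slides use >66 as the cut-off)."""
    return (p_beneficial / (1 - p_beneficial)) / (p_harmful / (1 - p_harmful))


# First spreadsheet example: P = 0.03, value = 1.5, df = 18, thresholds +1 and -1
harm, trivial, benefit = clinical_chances(0.03, 1.5, 18, 1.0, -1.0)
print(f"harmful {harm:.1%} ({describe(harm)}), trivial {trivial:.1%}, "
      f"beneficial {benefit:.1%} ({describe(benefit)})")
print(f"odds ratio for 76% benefit vs 3% harm: {odds_ratio(0.76, 0.03):.0f}")
```

Under these assumptions the first spreadsheet example should come out close to the 0 / 22 / 78 split shown on the slide, and the 3% harm with 76% benefit case gives an odds ratio of roughly 100, above the cut-off of 66.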