Transcript 10 mistakes
Statistics: the ten main mistakes
Didier Concordet, Ecole Nationale Vétérinaire de Toulouse, July 2005
Statistical mistakes are frequent
• Many surveys of statistical errors in the medical literature report error rates ranging from 30% to 90% (Altman, 1991; Gore et al., 1976; Pocock et al., 1987; MacArthur, 1984)
• Reviews of the biomedical literature have consistently found that about half of the articles use incorrect statistical methods (Glantz, 1980)
When do they occur?
• When designing the experiment
• When collecting data
• When analysing data
• When interpreting results
Design
• Lack of proper randomisation
  - the inference space is not defined
  - poor balance of the groups to be compared
  - lack of a control group (maybe less frequent now)
  - there are confounding factors
• Lack of power
  - the sample size is not large enough to answer the question
  - the statistical unit is not well defined
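Proper randomisation means that chance, not the investigator, decides which animal receives which treatment. The sketch below shows one simple way to do it for a two-group design: shuffle the animals, then deal them to the groups in turn. The function name `randomise`, the group labels and the seed are illustrative assumptions, not a prescribed procedure.

```python
import random

def randomise(animals, groups=("Ref", "Test"), seed=None):
    """Randomly allocate animals to groups of (nearly) equal size.

    The group labels and the seed argument are illustrative choices.
    """
    rng = random.Random(seed)          # seeded generator so the allocation can be audited
    shuffled = list(animals)
    rng.shuffle(shuffled)
    # Deal the shuffled animals to the groups in turn, like cards
    allocation = {g: [] for g in groups}
    for i, animal in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(animal)
    return allocation

dogs = [f"dog{i:02d}" for i in range(1, 11)]
print(randomise(dogs, seed=2005))
```

Recording the seed makes the allocation reproducible, which helps when the randomisation has to be documented.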
Inference space definition (M1)
An experiment in 2-year-old beagles showed that the temperature of dogs treated with the antipyretic drug A decreased by 2 °C.
Does this result still hold for:
• all 2-year-old beagles?
• 3-year-old beagles?
• beagles?
• dogs?
• man?
Poor balance (M2)
Clinical trial: comparison of 2 antipyretics; response = rectal temperature after treatment

              New TRT    REFERENCE
X (mean)      39         37
N             100        100
SD            1          1

Reference < New TRT (P < 0.001)
Poor balance
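Clinical trial: comparison of 2 antipyretics; response = rectal temperature after treatment

Clinical trial 1    New TRT    REFERENCE
X (mean)            40         42
N                   90         50
SD                  1          1
New TRT < Ref (P < 0.001)

Clinical trial 2    New TRT    REFERENCE
X (mean)            30         32
N                   10         50
SD                  1          1
New TRT < Ref (P < 0.001)

Conclusion: Reference > New TRT. Within each trial the new treatment gives the lower temperature; pooling two trials with unbalanced groups (90 vs 50 and 10 vs 50 animals) reverses the comparison.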
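This reversal (known as Simpson's paradox) can be checked by pooling the trial-level means with the group sizes as weights. The trial figures below are one reading of the garbled slide layout, chosen because it reproduces the pooled values of the previous slide (New TRT 39, Reference 37, N = 100 each): trial 1, New TRT 40 (N = 90) vs Reference 42 (N = 50); trial 2, New TRT 30 (N = 10) vs Reference 32 (N = 50). The names `trials` and `pooled_mean` are hypothetical.

```python
# (mean rectal temperature, group size) per arm and per trial, as read from the slide
trials = {
    "trial 1": {"New TRT": (40, 90), "Reference": (42, 50)},
    "trial 2": {"New TRT": (30, 10), "Reference": (32, 50)},
}

def pooled_mean(arm):
    """Weighted mean of one arm over all trials (weights = group sizes)."""
    total = sum(t[arm][0] * t[arm][1] for t in trials.values())
    n = sum(t[arm][1] for t in trials.values())
    return total / n

for name, t in trials.items():
    # Within each trial the new treatment gives the lower temperature
    print(name, "New TRT - Reference =", t["New TRT"][0] - t["Reference"][0])
print("pooled New TRT:", pooled_mean("New TRT"))      # 39.0
print("pooled Reference:", pooled_mean("Reference"))  # 37.0
```

Each trial shows a difference of -2 in favour of the new treatment, yet the pooled means favour the reference, only because most New TRT animals come from the "warm" trial 1.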
Power (M3)
A clinical study to compare the efficacy of two treatments (Ref and Test)
For the efficacy variable: expected difference between the treatments = 4; SD = 2.
A parallel two-group design is planned with 5 dogs in each group.
What should we think of this study?
Power is 35% for a type I risk of 5%: even if the expected difference exists, only 35% of the samples of dogs (5 per group) will actually show it as significant!
Power
Efficacy variable in two groups of dogs

        N    Mean    SD
Ref     5    15.4    2.4
Test    5    20.0    2.6

Student t-test: P = 0.18
Actually, no conclusion can be drawn.
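Power calculations like the one behind the 35% figure can be sketched with the normal approximation. This is an assumption on our side: exact calculations for a t-test use the noncentral t distribution and give slightly lower power for groups this small, and the helper names `approx_power` and `n_for_power` are hypothetical. The effect is expressed as a difference `delta` together with the SD, so only their ratio matters.

```python
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison of means.

    Ignores the (negligible) chance of rejecting in the wrong direction and
    uses z instead of t quantiles, so it is optimistic for very small groups.
    """
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5   # SE of the difference between two means
    z_crit = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    return z.cdf(abs(delta) / se - z_crit)

def n_for_power(delta, sd, target=0.80, alpha=0.05):
    """Smallest group size reaching the target power (same approximation)."""
    n = 2
    while approx_power(delta, sd, n, alpha) < target:
        n += 1
    return n

# Hypothetical effect of one SD: how power grows with the group size
for n in (5, 10, 20):
    print(n, "dogs per group -> power", round(approx_power(delta=1, sd=1, n_per_group=n), 2))
print("dogs per group for 80% power:", n_for_power(delta=1, sd=1))
```

Running such a calculation before the study, rather than after a non-significant result, is what tells you whether 5 dogs per group can answer the question.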
A real story
A study was performed to assess the effect of a diet on several biochemical compounds (about 20).
To this end, a dog was fed a "normal" diet for 3 months and then the new diet for 3 months.
Every two days, a blood sample was taken and the biochemical compounds were assayed.
At the end of the experiment, 90 values were available for each biochemical compound.
There was a significant difference between the effects of the two diets for 10 biochemical compounds (P < 0.001).
This result was obtained with a sample size of 90.
Statistical unit (M4)
The statistical unit (an individual) is a statistical object that cannot be divided.
We want to generalise results obtained on a finite collection of units (a sample) to a population of units.
Despite the apparent wealth of data, the sample size was 1, not 90.
At the end of the experiment, the only dog in the experiment was well known, but what about the other dogs in the population?
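One practical safeguard is to count (and, for a simple analysis, summarise) the data per statistical unit before testing anything. The sketch below, with a hypothetical `unit_means` helper and placeholder values, groups repeated measurements by animal: 90 blood samples from one dog collapse to a single unit.

```python
from collections import defaultdict
from statistics import mean

def unit_means(records):
    """Average repeated measurements within each statistical unit.

    `records` is a list of (unit_id, value) pairs; the unit is the animal,
    not the blood sample, so each animal contributes a single value.
    """
    by_unit = defaultdict(list)
    for unit, value in records:
        by_unit[unit].append(value)
    return {unit: mean(values) for unit, values in by_unit.items()}

# Hypothetical data: one dog sampled 90 times (values are placeholders)
records = [("dog1", 10.0 + 0.1 * (i % 7)) for i in range(90)]
units = unit_means(records)
print(len(records), "measurements but", len(units), "statistical unit(s)")
```

Averaging is only the simplest option; repeated measures can also be modelled explicitly, but the sample size for generalising to other dogs stays the number of dogs.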
Experiment
• Missing data not adequately reported
• Extreme values excluded
• Data ignored because they did not support the hypothesis?
Analysis
• Failure to check the assumptions of the statistical methods (M5)
  - homoscedasticity (for a t-test, a linear regression, …)
  - linearity: using a linear regression or a correlation coefficient without first establishing linearity
• Ignoring informative "missing" data
  - death and its consequences
  - data below the LOQ (limit of quantification)
• Choosing the question to get an answer
• Multiple comparisons
Homoscedasticity (M5)
Figure: "What the t-test can see", data for treatments 1 and 2
t-test: P-value = 0.56
After log-transformation: P-value = 0.026
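When a treatment acts multiplicatively, the spread grows with the level, which violates the equal-variance assumption of the t-test; a log-transformation can restore it. The sketch below uses hypothetical data (not the slide's data) in which every value of treatment 2 is twice the matching value of treatment 1, and compares the ratio of standard deviations on both scales. The helper `sd_ratio` is hypothetical.

```python
import math
from statistics import stdev

# Hypothetical data with a multiplicative treatment effect:
# every value in treatment 2 is twice the matching value in treatment 1
treatment_1 = [1.0, 2.0, 4.0, 8.0, 16.0]
treatment_2 = [2.0 * x for x in treatment_1]

def sd_ratio(a, b):
    """Ratio of the standard deviations of two samples."""
    return stdev(b) / stdev(a)

print("SD ratio, raw scale:", round(sd_ratio(treatment_1, treatment_2), 6))  # 2.0
log_1 = [math.log(x) for x in treatment_1]
log_2 = [math.log(x) for x in treatment_2]
print("SD ratio, log scale:", round(sd_ratio(log_1, log_2), 6))              # 1.0
```

On the log scale the doubling becomes a constant shift, so both groups have the same spread and the t-test assumption holds.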
Linearity/Correlation (M5)
Figure (two panels, a linear regression fitted in each):
Panel 1: correlation R = -0.93
Panel 2: correlation R = -0.002
Linearity/Correlation
Figure, left panel: linear regression on all the data, correlation R = 0.84
Figure, right panel: a linear model with 3 groups, within-group correlation R = -0.92
Ignoring data (M6)
Ignoring data
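One form of ignoring data listed earlier is dropping results below the LOQ (limit of quantification). Such missing values are informative: they are missing precisely because they are low, so averaging only the quantified values biases the mean upwards. The concentrations and the LOQ below are hypothetical.

```python
from statistics import mean

LOQ = 1.0  # hypothetical limit of quantification
# Hypothetical true concentrations; the assay reports "< LOQ" below 1.0
true_values = [0.5, 0.8, 1.2, 2.0, 3.5]
reported = [v if v >= LOQ else None for v in true_values]

# Keeping only the quantified values discards the two lowest ones
complete_cases = [v for v in reported if v is not None]
print("true mean:", round(mean(true_values), 2))                       # 1.6
print("mean ignoring '< LOQ' data:", round(mean(complete_cases), 2))   # 2.23
```

Methods designed for censored data avoid this bias better than dropping the values; the first step is to report how many values were below the LOQ.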
Choosing the question to get an answer (M7)
This occurs frequently in the presentation of clinical trial results. The question becomes random: it changes with the sample of animals. The question is chosen with the answer in hand…
Think of a coin-flip game where you win 1€ on heads or on tails, because you choose the winning side once you know the result of the flip!
Such an approach increases the number of false discoveries.
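The inflation of false discoveries can be sketched by simulation. Assume a trial with no real effect at all and several candidate endpoints, analysed independently; under the null hypothesis each P value is uniform between 0 and 1. If the endpoint reported is chosen after looking at the results, the trial "finds" an effect whenever any P value falls below 5%. The function name and the independence of endpoints are assumptions made for the illustration.

```python
import random

def chosen_after_the_fact(n_endpoints, alpha=0.05, n_trials=50_000, seed=1):
    """Share of trials with NO real effect that still 'find' one.

    Under the null hypothesis each endpoint's P value is uniform on [0, 1];
    endpoints are assumed independent. The analyst looks at all of them and
    reports whichever one happens to be significant.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        p_values = [rng.random() for _ in range(n_endpoints)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_trials

for k in (1, 3, 5):
    print(k, "candidate endpoint(s) -> false discovery rate", chosen_after_the_fact(k))
```

With one pre-specified endpoint the rate stays near 5%; with five endpoints to choose from it approaches 1 - 0.95^5, about 23%.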
Multiple comparisons (M8)
One wants to compare the ADG (average daily gain) obtained with 5 different diets in pigs.

Diet    Mean    SD
1       700     48
2       880     50
3       730     55
4       790     44
5       930     60

Ten t-tests (all pairs of diets; diets ranked by mean: 1, 3, 4, 2, 5)
A risk of 5% for each comparison: the global risk can be very large.
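How large is "very large"? The arithmetic below counts the pairwise comparisons of 5 diets and gives two figures for the global (family-wise) risk: the value it would take if the 10 tests were independent, which is only approximate here because pairwise tests share groups, and the Bonferroni upper bound, which holds whatever the dependence. It also gives the Bonferroni-corrected level per comparison. Procedures such as Tukey's HSD are common alternatives for all pairwise comparisons.

```python
from itertools import combinations

diets = [1, 2, 3, 4, 5]
pairs = list(combinations(diets, 2))
m = len(pairs)            # 10 pairwise t-tests
alpha = 0.05

# If the 10 tests were independent: P(at least one false positive)
fwer_independent = 1 - (1 - alpha) ** m
# Bonferroni bound, valid whatever the dependence between tests
fwer_bound = min(1.0, m * alpha)
# Bonferroni-corrected level per comparison, keeping the global risk <= 5%
alpha_per_test = alpha / m

print("comparisons:", m)
print("global risk if independent:", round(fwer_independent, 3))
print("Bonferroni upper bound:", fwer_bound)
print("per-comparison level:", alpha_per_test)
```

Even under the independence approximation, the chance of at least one spurious "significant" difference among the 10 comparisons is about 40%.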
Interpretation/presentation
• Standard error and standard deviation
• P values: non-significant effects
• False causality
Standard error / standard deviation (M9)
The clearance of the drug was equal to 68 ± 5 mL/min. There are two possible meanings, depending on what 5 stands for:
If 5 is the standard error of the mean (SE): the population mean clearance lies, with about 95% confidence, within [68 - 2×5; 68 + 2×5] = [58; 78].
If 5 is the standard deviation (SD): about 95% of animals have their clearance within [68 - 2×5; 68 + 2×5] = [58; 78] (for roughly normally distributed clearances).
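The two readings differ by a factor of the square root of the sample size, since SE = SD / √n. The sketch below assumes a hypothetical sample of 16 animals (the example does not give n) and computes, for each reading, both the approximate 95% confidence interval for the mean and the range expected to contain about 95% of animals. The helper `interval` is hypothetical.

```python
import math

mean_clearance = 68.0   # mL/min
reported = 5.0          # the ambiguous "± 5"
n = 16                  # hypothetical sample size; the example does not give it

def interval(centre, unit):
    """Approximate 95% interval: centre ± 2 units."""
    return (centre - 2 * unit, centre + 2 * unit)

# Case 1: 5 is the SE, so the implied SD is SE * sqrt(n)
sd_implied = reported * math.sqrt(n)
print("SE reading: CI for the mean", interval(mean_clearance, reported),
      "| range of animals", interval(mean_clearance, sd_implied))

# Case 2: 5 is the SD, so the implied SE is SD / sqrt(n)
se_implied = reported / math.sqrt(n)
print("SD reading: CI for the mean", interval(mean_clearance, se_implied),
      "| range of animals", interval(mean_clearance, reported))
```

With n = 16, the SE reading implies animals spread over [28; 108] mL/min, while the SD reading implies a mean known to within [65.5; 70.5]: very different messages from the same "68 ± 5".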
P values (M10)
The difference between the effects of drugs A and B is not significant (P = 0.56); therefore drug A can be substituted by drug B.
NO
The only conclusion that can be drawn from such a P value is that you did not detect any difference between the effects of drugs A and B. That does not mean that such a difference does not exist.
Absence of evidence is not evidence of absence
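A confidence interval makes the point concrete: a non-significant difference can still be compatible with large differences. The sketch below uses a hypothetical difference between A and B and its standard error, with the normal approximation (an assumption; a t-based interval would be slightly wider). Claiming that A can replace B requires an equivalence approach instead, for instance checking that the whole interval lies within a pre-set margin (as in the TOST procedure).

```python
from statistics import NormalDist

def z_test_summary(diff, se, alpha=0.05):
    """Two-sided P value and (1 - alpha) CI for a difference, normal approximation."""
    z = NormalDist()
    p = 2 * (1 - z.cdf(abs(diff) / se))
    half_width = z.inv_cdf(1 - alpha / 2) * se
    return p, (diff - half_width, diff + half_width)

# Hypothetical A - B difference and its standard error (not from the slides)
p, (low, high) = z_test_summary(diff=2.0, se=3.4)
print(f"P = {p:.2f}, 95% CI = [{low:.1f}; {high:.1f}]")
```

Here P is about 0.56, yet the interval runs from about -4.7 to 8.7: the data cannot exclude a difference more than four times the observed one.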
P values (M10)
Drug A has a higher efficacy than drug B (P = 0.001).
Drug C has a higher efficacy than drug B (P = 0.04).
The only conclusion that can be drawn from these P values is that you are more confident that A > B than that C > B.
This says nothing about the size of the differences.
Significant does not mean important
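A P value mixes the size of an effect with the amount of data. In the hypothetical comparison below (normal approximation, an assumption that is crude for 8 animals per group), a tiny difference measured on many animals is "more significant" than a difference ten times larger measured on a few. The helper `two_sample_p` is hypothetical.

```python
from statistics import NormalDist

def two_sample_p(diff, sd, n_per_group):
    """Two-sided P value for a difference in means, normal approximation."""
    se = sd * (2 / n_per_group) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(diff) / se))

# Hypothetical numbers (not from the slides)
p_ab = two_sample_p(diff=0.5, sd=5, n_per_group=2000)   # tiny effect, many animals
p_cb = two_sample_p(diff=5.0, sd=5, n_per_group=8)      # large effect, few animals
print(f"A vs B: difference 0.5, P = {p_ab:.4f}")
print(f"C vs B: difference 5.0, P = {p_cb:.3f}")
```

The smaller P value belongs to the clinically trivial difference; judging importance needs the estimated difference and its confidence interval.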
False causality : lying with statistics
There is a strong positive correlation between the number of firefighters present at a fire and the amount of fire damage.
Thus, the firefighters present at a fire cause more fire damage!
The correlation coefficient is nothing more than a measure of the strength of the linear relationship between 2 variables.
Correlation cannot establish causality.
A strong correlation between X and Y can occur when:
• X causes Y
• Y causes X
• Z causes both X and Y (Z = fire size in the previous example)
• by chance, with small sample sizes, when X and Y are independent
How to avoid these mistakes?
• Consult your preferred statistician for help in designing complicated experiments
• Use basic descriptive statistics first (graphics, summary statistics, …)
• Use common sense
• Consider learning more statistics