
Ecole Nationale Vétérinaire de Toulouse
Statistics: the ten main mistakes
July 2005
Didier Concordet


Statistical mistakes are frequent

• Many surveys of statistical errors in the medical literature report error rates ranging from 30% to 90% (Altman, 1991; Gore et al., 1976; Pocock et al., 1987; MacArthur, 1984)
• Reviews of the biomedical literature have consistently found that about half the articles use incorrect statistical methods (Glantz, 1980)

When do they occur?

• When designing the experiment
• When collecting data
• When analysing data
• When interpreting results

Design

• Lack of a proper randomisation
    - the inference space is not defined
    - poor balance of the groups to be compared
    - lack of control group (maybe less frequent now)
    - there exist confounding factors
• Lack of power
    - the sample size is not large enough to answer the question
    - the statistical unit is not well defined

Inference space definition (M1)

An experiment in 2-year-old beagles showed that the temperature of dogs treated with the antipyretic drug A decreased by 2 °C.

Does this result still hold for
    - all 2-year-old beagles?
    - 3-year-old beagles?
    - beagles?
    - dogs?
    - man?

Poor balance (M2)

Clinical trial: comparison of 2 antipyretics (rectal temperature after treatment)

             Mean   N     SD
New TRT      39     100   1
Reference    37     100   1

Reference < New TRT (P < 0.001)

Poor balance

Clinical trial: comparison of 2 antipyretics (rectal temperature after treatment)

Clinical trial 1
             Mean   N     SD
New TRT      40     90    1
Reference    42     50    1
New TRT < Ref (P < 0.001)

Clinical trial 2
             Mean   N     SD
New TRT      30     10    1
Reference    32     50    1
New TRT < Ref (P < 0.001)

Conclusion: Reference > New TRT
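The reversal between the pooled comparison and the per-trial comparisons can be checked with a few lines of arithmetic. The sketch below assumes the per-trial figures belong to the arms as follows: New TRT (mean 40, N = 90 and mean 30, N = 10), Reference (mean 42, N = 50 and mean 32, N = 50). This is the only assignment that reproduces the pooled means 39 and 37 of the previous slide. The dictionary layout and the function name are illustrative choices.

```python
# Per-trial summaries (mean, n) for each arm; the arm assignment is an assumption
# chosen because it reproduces the pooled means shown on the previous slide.
trials = {
    "trial 1": {"New TRT": (40.0, 90), "Reference": (42.0, 50)},
    "trial 2": {"New TRT": (30.0, 10), "Reference": (32.0, 50)},
}

def pooled_mean(arm):
    """Size-weighted mean of one arm across all trials."""
    total = sum(t[arm][0] * t[arm][1] for t in trials.values())
    size = sum(t[arm][1] for t in trials.values())
    return total / size

# Within each trial, New TRT gives the lower temperature.
for name, arms in trials.items():
    diff = arms["New TRT"][0] - arms["Reference"][0]
    print(f"{name}: New TRT - Reference = {diff:+.1f}")

# Pooled over trials, the sign flips because the arms are unbalanced across trials.
print("pooled New TRT  :", pooled_mean("New TRT"))
print("pooled Reference:", pooled_mean("Reference"))
```

New TRT wins inside each trial, yet loses in the pooled data, because 90 of its 100 animals come from the trial where all temperatures are high.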

Power (M3)

A clinical study to compare efficacy of two treatments (Ref. and Test)

For the efficacy variable Expected difference between the treatments = 4 SD

2.

A parallel two-group design is planned with 5 dogs in each group.

What should we think of this study?

Power is 35% for a type I risk of 5%. Even if the expected difference exists, only 35% of the samples (of size 5) of dogs will actually exhibit it!
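Power for a design this small can be estimated by simulation. The sketch below is a minimal Monte Carlo version under stated assumptions: normally distributed data, equal SD in both groups, a two-sided Student t-test at 5%, and a true difference `d` expressed in SD units. The values of `d` are hypothetical inputs, not the figures of the study above, and the function name is illustrative.

```python
import random

N_PER_GROUP = 5
T_CRIT_8DF = 2.306  # two-sided 5% critical value of Student's t with 5 + 5 - 2 = 8 df

def simulated_power(d, n_sim=10000, seed=0):
    """Share of simulated two-group studies in which the t-test rejects at 5%.

    d is the true Test - Ref difference in SD units (a hypothetical input).
    """
    rng = random.Random(seed)
    n = N_PER_GROUP
    rejections = 0
    for _ in range(n_sim):
        ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
        test = [rng.gauss(d, 1.0) for _ in range(n)]
        m_ref, m_test = sum(ref) / n, sum(test) / n
        v_ref = sum((x - m_ref) ** 2 for x in ref) / (n - 1)
        v_test = sum((x - m_test) ** 2 for x in test) / (n - 1)
        se = ((v_ref + v_test) / n) ** 0.5  # pooled standard error, equal group sizes
        if abs((m_test - m_ref) / se) > T_CRIT_8DF:
            rejections += 1
    return rejections / n_sim

for d in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"true difference = {d:.1f} SD -> estimated power = {simulated_power(d):.2f}")
```

With `d = 0` the rejection rate estimates the type I risk, which gives a quick sanity check on the simulation before reading the other rows.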

Power

Efficacy variable on two groups of dogs

        N    Mean    SD
Ref     5    15.4    2.4
Test    5    20.0    2.6

Student t-test: P = 0.18

Actually no conclusion

A real story

A study was performed to assess the effect of diet on several biochemical compounds (about 20).

To this end, a dog was fed a "normal" diet for 3 months and then the new diet for 3 months.

Every two days, a blood sample was taken and the biochemical compounds were assayed.

At the end of the experiment, 90 measurements were available for each biochemical compound.

There was a significant difference between the effects of the two diets for 10 biochemical compounds (P < 0.001).

This result was obtained with a sample size of 90.

Statistical unit (M4)

The statistical unit (an individual) is a statistical object that cannot be divided.

We want to generalise results obtained on a finite collection of units (a sample) to a population of units.

Despite the appearance of "wealth", the sample size was equal to 1, not 90.

At the end of the experiment, the only dog in the experiment was well known, but what about the other dogs of the population?
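The cost of confusing measurements with statistical units can be illustrated by simulation. The sketch below uses a hypothetical setup, not the single-dog study above: two diets with no real difference, 3 dogs per diet, 30 measurements per dog, and a dog-to-dog SD equal to the measurement SD. It compares the false-positive rate when each measurement is counted as a unit (90 per diet) with the rate when each dog is the unit (3 per diet). All names and parameter values are assumptions.

```python
import random

T_CRIT_178DF = 1.973  # approx. two-sided 5% critical t for 90 + 90 - 2 df
T_CRIT_4DF = 2.776    # two-sided 5% critical t for 3 + 3 - 2 df

def t_stat(a, b):
    """Two-sample t statistic with pooled variance, for equal group sizes."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    return (ma - mb) / ((va + vb) / n) ** 0.5

def one_group(rng, n_dogs=3, n_meas=30):
    """Repeated measurements per dog: dog level + noise, no diet effect."""
    dogs = []
    for _ in range(n_dogs):
        dog_level = rng.gauss(0.0, 1.0)
        dogs.append([dog_level + rng.gauss(0.0, 1.0) for _ in range(n_meas)])
    return dogs

def false_positive_rates(n_sim=2000, seed=1):
    rng = random.Random(seed)
    naive = by_dog = 0
    for _ in range(n_sim):
        g1, g2 = one_group(rng), one_group(rng)
        # Naive analysis: every measurement counted as a unit (n = 90 per diet).
        flat1 = [x for dog in g1 for x in dog]
        flat2 = [x for dog in g2 for x in dog]
        naive += abs(t_stat(flat1, flat2)) > T_CRIT_178DF
        # Dog as the unit: one mean per dog (n = 3 per diet).
        means1 = [sum(dog) / len(dog) for dog in g1]
        means2 = [sum(dog) / len(dog) for dog in g2]
        by_dog += abs(t_stat(means1, means2)) > T_CRIT_4DF
    return naive / n_sim, by_dog / n_sim

naive_rate, dog_rate = false_positive_rates()
print(f"measurement as unit: {naive_rate:.2f} false positives")
print(f"dog as unit        : {dog_rate:.2f} false positives")
```

Only the dog-level analysis keeps the type I risk near the nominal 5%; the measurement-level analysis mistakes differences between individual dogs for a diet effect.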

Experiment

• Missing data not adequately reported
• Extreme values excluded
• Data ignored because they did not support the hypothesis?

Analysis

• Failure to check assumptions of the statistical methods (M5)
    - homoscedasticity (for a t-test, a linear regression, …)
    - using a linear regression without first establishing linearity…
    - correlation
• Ignoring informative "missing" data
    - death and its consequences
    - data below LOQ
• Choosing the question to get an answer
• Multiple comparisons

Homoscedasticity (M5)

[Figure: responses under treatments 1 and 2. What the t-test can see: t-test P-value = 0.56. After log-transformation: P-value = 0.026.]

Linearity/Correlation (M5)

[Figure: two scatter plots, each with a fitted linear regression. First plot: correlation R = -0.93. Second plot: correlation R = -0.002.]

Linearity/Correlation

[Figure: two fits. Linear regression: correlation R = 0.84. A linear model with 3 groups: within-group correlation R = -0.92.]
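A pooled correlation and a within-group correlation can have opposite signs. The sketch below builds a small synthetic data set (hypothetical values, not the data behind the slides) in which y decreases with x inside each of 3 groups while groups with larger x sit higher. The helper `pearson_r` is written out so the block needs no external library.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Synthetic data: within each group y falls as x rises,
# but the group levels rise together with x.
groups = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [9, 8, 7]),
    "C": ([7, 8, 9], [13, 12, 11]),
}

all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]

pooled_r = pearson_r(all_x, all_y)
within_r = {name: pearson_r(xs, ys) for name, (xs, ys) in groups.items()}

print(f"pooled r = {pooled_r:.2f}")  # positive, about 0.84
for name, r in within_r.items():
    print(f"group {name}: r = {r:.2f}")  # negative in every group
```

Plotting the points by group before fitting anything is usually enough to reveal this kind of structure.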

Ignoring data (M6)


Choosing the question to get an answer (M7)

Occurs frequently in the presentation of clinical trial results.

The question becomes random: it changes with the sample of animals. The question is chosen with its answer in hand…

Think of a coin-flip game where you win 1 € when tails or heads occurs, and you choose which one only once you know the result of the flip!

Such an approach increases the number of false discoveries.

Multiple comparisons (M8)

One wants to compare the ADG (average daily gain) obtained with 5 different diets in pigs.

Diet   Mean   SD
1      700    48
2      880    50
3      730    55
4      790    44
5      930    60

Ten t-tests (diets ordered by increasing mean: 1, 3, 4, 2, 5)

A risk of 5% for each comparison: the global risk can be very large.
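How large the global risk can get is easy to quantify. The sketch below counts the pairwise comparisons among 5 diets and computes the chance of at least one false positive. The independence formula is an approximation, since pairwise tests that share a group are not independent. The Bonferroni threshold is shown as one common correction, not as the method used in the slides.

```python
from math import comb

n_diets = 5
alpha = 0.05

n_tests = comb(n_diets, 2)  # number of pairwise t-tests among 5 diets

# Chance of at least one false positive if the tests were independent
# (an approximation: tests sharing a diet are correlated).
fwer_if_independent = 1 - (1 - alpha) ** n_tests

# Union bound on the global risk, valid whatever the dependence.
fwer_upper_bound = min(1.0, n_tests * alpha)

# Bonferroni: per-test level that keeps the global risk at or below alpha.
bonferroni_alpha = alpha / n_tests

print(f"pairwise tests         : {n_tests}")
print(f"global risk (indep.)   : {fwer_if_independent:.2f}")
print(f"global risk upper bound: {fwer_upper_bound:.2f}")
print(f"Bonferroni per-test    : {bonferroni_alpha:.3f}")
```

An analysis of variance followed by a multiple-comparison procedure is the usual alternative to running ten separate t-tests.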

Interpretation/presentation

• Standard error and standard deviation
• P values: non-significant effects
• False causality

Standard error / standard deviation (M9)

The clearance of the drug was equal to 68 ± 5 mL/min.

Two possible meanings, depending on the meaning of 5:

If 5 is the standard error of the mean (se), there is about a 95% chance that the population mean clearance belongs to [68 - 2 × 5 ; 68 + 2 × 5].

If 5 is the standard deviation (SD), about 95% of animals have their clearance within [68 - 2 × 5 ; 68 + 2 × 5].
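The two readings differ by a factor of √n, since se = SD / √n. The sketch below works through both readings of "68 ± 5" for a hypothetical sample size n = 25 (the sample size is not given in the slides), assuming approximately normal data so that "± 2" covers about 95%.

```python
from math import sqrt

def two_sigma_interval(center, spread):
    """Interval center +/- 2 * spread (about 95% under approximate normality)."""
    return (center - 2 * spread, center + 2 * spread)

mean_clearance = 68.0  # mL/min, from the slide
reported = 5.0         # the ambiguous "± 5"
n = 25                 # HYPOTHETICAL sample size, chosen for illustration

# Reading 1: 5 is the standard error of the mean.
ci_for_mean = two_sigma_interval(mean_clearance, reported)
implied_sd = reported * sqrt(n)  # the animals would then be very variable

# Reading 2: 5 is the standard deviation.
range_of_animals = two_sigma_interval(mean_clearance, reported)
implied_se = reported / sqrt(n)
ci_for_mean_if_sd = two_sigma_interval(mean_clearance, implied_se)

print("if 5 is the se: mean within", ci_for_mean, "and SD =", implied_sd)
print("if 5 is the SD: animals within", range_of_animals,
      "and mean within", ci_for_mean_if_sd)
```

The same interval [58 ; 78] describes the population mean under one reading and individual animals under the other, which is why a report should always say which quantity follows the "±".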

P values (M10)

The difference between the effect of the drugs A and B is not significant (P = 0.56) therefore drug A can be substituted by drug B.

NO

The only conclusion that can be drawn from such a P value is that you didn't see any difference between the effect of the drugs A and B. That does not mean that such a difference does not exist.

Absence of evidence is not evidence of absence


P values (M10)

Drug A has a higher efficacy than drug B (P = 0.001).

Drug C has a higher efficacy than drug B (P = 0.04).

The only conclusion that can be drawn from such P values is that you are sure that A > B and less sure that C > B.

This does not presume anything about the amplitude of the differences.

Significant does not mean important


False causality : lying with statistics

There is a strong positive correlation between the number of firefighters present at a fire and the amount of fire damage.

Thus, the firefighters present at a fire cause greater fire damage!

The correlation coefficient is nothing else than a measure of the strength of a linear relationship between 2 variables.

Correlation cannot establish causality.

A strong correlation between X and Y can occur when
    - "X" causes "Y"
    - "Y" causes "X"
    - "Z" causes "X" and "Y" (Z = fire size in the previous example)
    - by chance, with small sample sizes, even when X and Y are independent
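The "Z causes X and Y" case can be reproduced with simulated data. In the sketch below, fire size drives both the number of firefighters and the damage, and the two have no direct link. All variables are in arbitrary standardised units and every numeric value is a hypothetical choice. Removing the part explained by fire size (least-squares residuals) makes the correlation vanish.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """Residuals of the least-squares line of ys on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for z, y in zip(zs, ys)]

rng = random.Random(42)
n_fires = 500
fire_size = [rng.gauss(0.0, 1.0) for _ in range(n_fires)]     # Z
firefighters = [z + rng.gauss(0.0, 0.5) for z in fire_size]   # X depends on Z only
damage = [z + rng.gauss(0.0, 0.5) for z in fire_size]         # Y depends on Z only

r_raw = pearson_r(firefighters, damage)
r_given_size = pearson_r(residuals(firefighters, fire_size),
                         residuals(damage, fire_size))

print(f"correlation firefighters / damage         : {r_raw:.2f}")
print(f"same, after accounting for fire size      : {r_given_size:.2f}")
```

The raw correlation is strong although firefighters have no effect on damage in this simulation; it is entirely produced by the common cause.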

How to avoid these mistakes ?

• Consult your preferred statistician for help in the design of complicated experiments
• Use basic descriptive statistics first (graphics, summary statistics, …)
• Use common sense
• Consider learning more statistics