Lecture 5 Using Baselines

On the interpretation of responder
analyses and NNTs
Stephen Senn
IMMPACT
(c) Stephen Senn 2011
An apology
• I will talk mainly about responder analysis
• I have little to say about numbers needed
to treat
Genes, Means and Screens
It will soon be possible for patients in clinical trials to undergo genetic tests
to identify those individuals who will respond favourably to the drug
candidate, based on their genotype…. This will translate into smaller, more
effective clinical trials with corresponding cost savings and ultimately better
treatment in general practice. … individual patients will be targeted with
specific treatment and personalised dosing regimens to maximise efficacy
and minimise pharmacokinetic problems and other side-effects.
Sir Richard Sykes, FRS, 1997
Soon?
Articles on pharmacogenetics by publication year
[Figure: number of articles (vertical axis, 0 to 1750) against year (1960 to 2010). Source: Web of Science, 9 June 2011]
Articles on pharmacogenetics by publication year
[Figure: number of articles (vertical axis, 0 to 17500) against year (1960 to 2010). Source: Web of Science, 9 June 2011]
The Pharmacogenomic
Revolution?
• Clinical trials
– Cleaner signal
– Non-responders eliminated
• Treatment strategies
– “Theranostics”
• Markets
– Lower volume
– Higher price per patient day
Implicit Assumptions
• Most variability seen in clinical trials is genetic
– Furthermore it is not revealed in obvious phenotypes
• Example: height and forced expiratory volume in one second (FEV1)
• Height predicts FEV1 and height is partly genetically determined but
you don’t need pharmacogenetics to measure height
• We are going to be able to find it
– Small number of genes responsible
– Low (or no) interactive effects (genes act singly)
– We will know where to look
• In fact we simply don’t know if most variation in clinical
trials is due to individual response let alone genetic
variability
My Opinion
• Most of the hype is due to a failure to
understand response
• Responder analysis is to blame
• And related to this is an obsession with
Numbers Needed to Treat
– Which is increasing the pressure to use
dichotomies
A Thought Experiment
• Imagine a cross-over trial in hypertension
• Patients randomised to receive ACE II
inhibitor or placebo in random order
• Then we do it again
• Each patient does the cross-over twice
• We can compare each patient’s response
under ACE II to placebo twice
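A minimal simulation sketch of this thought experiment, in Python. All numerical settings are illustrative assumptions, not values from the slides: a mean effect of -15 mmHg, a 'response' defined as a reduction of at least 10 mmHg relative to placebo, and the two ways of splitting the variance. The function name `conditional_response_rates` is hypothetical. The sketch contrasts a world in which patients genuinely differ in their response with one in which every patient has the same true effect and all variation is within-patient noise.

```python
import random


def conditional_response_rates(n, mean_effect, sd_between, sd_within,
                               threshold=-10.0, seed=1):
    """Simulate n patients who each do the cross-over twice.

    Returns (P(respond 2nd | responded 1st), P(respond 2nd | did not respond 1st)).
    A 'response' is a difference to placebo at or below `threshold` (assumed cut-off).
    """
    rng = random.Random(seed)
    resp_first = resp_first_and_second = 0
    nonresp_first = nonresp_first_resp_second = 0
    for _ in range(n):
        # Patient's true (persistent) effect; constant if sd_between == 0.
        true_effect = rng.gauss(mean_effect, sd_between)
        # Observed difference to placebo in each cross-over.
        d1 = true_effect + rng.gauss(0.0, sd_within)
        d2 = true_effect + rng.gauss(0.0, sd_within)
        if d1 <= threshold:
            resp_first += 1
            resp_first_and_second += d2 <= threshold
        else:
            nonresp_first += 1
            nonresp_first_resp_second += d2 <= threshold
    return (resp_first_and_second / resp_first,
            nonresp_first_resp_second / nonresp_first)


if __name__ == "__main__":
    total_sd = (5.0 ** 2 + 1.5 ** 2) ** 0.5  # same overall spread in both worlds
    # World A: most variation is a persistent patient-by-treatment effect.
    print("individual response:", conditional_response_rates(1000, -15.0, 5.0, 1.5))
    # World B: identical true effect for everyone, all variation is noise.
    print("no individual response:", conditional_response_rates(1000, -15.0, 0.0, total_sd))
```

In the first setting, responding on the first occasion strongly predicts responding on the second. In the second, the two conditional proportions should be close to each other, even though a single cross-over has the same overall spread of differences in both settings.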
Difference to placebo in DBP mmHg
[Figure: scatter plot of each patient's difference to placebo, second cross-over (vertical axis) against first cross-over (horizontal axis), both axes from -30 to 10]
Response on the second occasion given response on the first: 795/832 ≈ 0.95
Response on the second occasion given no response on the first: 46/168 ≈ 0.27
NB These are conditional probabilities of response on the second occasion. They are not conditional probabilities of being a ‘true’ responder.
Difference to placebo in DBP mmHg
[Figure: scatter plot of each patient's difference to placebo, second cross-over (vertical axis) against first cross-over (horizontal axis), both axes from -30 to 10]
Response on the second occasion given response on the first: 678/826 ≈ 0.82
Response on the second occasion given no response on the first: 140/174 ≈ 0.81
?
[Figure: plot with an axis from -30 to 10]
NOTE FOR GUIDANCE ON
CLINICAL INVESTIGATION OF MEDICINAL PRODUCTS
IN THE TREATMENT OF HYPERTENSION
1998
P2
Arbitrarily, response criteria for antihypertensive therapy include the
percentage of patients with a normalisation of blood pressure (reduction
SBP < 140 mmHg and DBP < 90 mmHg) and/or reduction of SBP ≥ 20
mmHg and/or DBP ≥ 10 mmHg. Results obtained should be discussed in
terms of statistical significance and in relation to their clinical relevance.
The first word in this paragraph is the most important
Dichotomania
• Continuous measurements taken and referred to
baseline
• Patients dichotomised as responder/nonresponder
– Inefficient
– Arbitrary
• Sheep versus goats
– Ignores geep and shoats
• Analysis on risk difference scale to calculate NNT
(C) Stephen Senn 2006
Figure 1 Illustration of response region
[Figure: outcome DBP (vertical axis, 80 to 100) against baseline DBP X (horizontal axis, 90 to 110), with the response region marked Region(X)]
Y = outcome, X = baseline
If (Y < 90 and X > 95) or (Y < 0.9X) patient ‘responds’
Why I mistrust the NNT
• It has very poor properties as a scale
– Reciprocal of risk difference
• It is theoretically unlikely to be stable from study
to study
– And this theoretical instability has been demonstrated
by empirical research
• It is an impatient measure
– It tries to shortcut the steps from study to practice
• It is an illusion that this can be done
• Those who advocate it are preferring an easy lie
to a difficult truth
Pharmacogenetics: A cutting-edge
science that will start delivering miracle
cures the year after next.
Moerman and Placebos
• Paper of 1984
• Investigated 31 placebo-controlled trials of
cimetidine in ulcer
• Found considerable variation in response
• Considered placebo response rate was an
important factor
• Has been cited by others as proof of
variation in treatment effect from trial to
trial
Lessons from Moerman
• There is no evidence of variation in the treatment
effect from trial to trial
• We should be wary about concluding that apparent
variation signals true variation
• We need to be cautious and think carefully about
analysis
• Of course…it is always possible that there was
exactly the same genetic mix in each trial
– in which case gene by treatment would not manifest itself
as trial by treatment interaction
• We need to understand components of variation
Pharmacogenomics:
A subject with great promise.
What you learn in your first
ANOVA course
• Completely randomised design
– One way ANOVA
• Randomised blocks design
– Two way ANOVA
• Randomised blocks design with replication
– Two way ANOVA with interaction
• No replication, no interaction
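The last bullet can be spelled out with the standard two-way model. The notation here is an assumption for illustration, not taken from the slides: block (patient) i, treatment j, replicate k.

```latex
y_{ijk} = \mu + \beta_i + \tau_j + (\beta\tau)_{ij} + \varepsilon_{ijk},
\qquad i = 1,\dots,n,\quad j = 1,2,\quad k = 1,\dots,r .
```

With r = 1 (no replication) there is only one observation per block-by-treatment cell, so the interaction term and the error term appear only as a sum and cannot be separated. The block-by-treatment interaction becomes estimable only when r is at least 2, which is what a repeated cross-over provides.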
1.
Senn SJ. Individual Therapy: New Dawn or False Dawn. Drug
Information Journal 2001;35(4):1479-1494.
A Word of Caution
• What is additive on one scale is not additive on
another
• The Moerman example suggests a constant
effect on the log-odds ratio scale
• If the background risk varies this translates into
a varying effect on the risk-difference scale
• The biological interpretation of this is then moot
• However the practical implication of this is to
summarise on the additive scale
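A small numerical sketch of this point, in Python. The odds ratio of 4 and the background risks are made-up illustrative values, not the Moerman data, and the function names are hypothetical. Holding the odds ratio fixed, the code converts each background (control) event probability into a treated probability, a risk difference, and the corresponding NNT, taken as the reciprocal of the risk difference.

```python
def treated_risk(control_risk, odds_ratio):
    """Event probability under treatment implied by a fixed odds ratio."""
    control_odds = control_risk / (1.0 - control_risk)
    treated_odds = odds_ratio * control_odds
    return treated_odds / (1.0 + treated_odds)


def risk_difference(control_risk, odds_ratio):
    """Treated minus control event probability."""
    return treated_risk(control_risk, odds_ratio) - control_risk


def nnt(control_risk, odds_ratio):
    """Number needed to treat: reciprocal of the risk difference."""
    return 1.0 / risk_difference(control_risk, odds_ratio)


if __name__ == "__main__":
    odds_ratio = 4.0  # assumed constant effect on the odds scale
    for control_risk in (0.1, 0.2, 0.5, 0.8):
        rd = risk_difference(control_risk, odds_ratio)
        print(f"control {control_risk:.1f}: "
              f"risk difference {rd:.3f}, NNT {nnt(control_risk, odds_ratio):.2f}")
```

The effect is the same in every row on the odds-ratio scale, yet the risk difference and the NNT change with the background risk. This is one way to see why an NNT from one trial need not carry over to a population with a different background rate.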
The Mottos
• Additive at the point of study
• Relevant at the point of application
• If NNTs have their place it is in decision making for
individual patients
• Not in reporting results from individual trials
• The additive scale has to be transformed into the
relevant scale at the point of treatment
• The fact that NNTs might be relevant when making an
individual decision is not an excuse for summarising
results this way
Tiotropium v Placebo
in Chronic Obstructive Pulmonary Disease
From the UPLIFT Study, NEJM, 2008
Significant differences in favor of tiotropium were observed at all time points for
the mean absolute change in the SGRQ total score (ranging from 2.3
to 3.3 units, P<0.001), although the differences on average were below what is
considered to have clinical significance (Fig. 2D). The overall mean
between-group difference in the SGRQ total score at any time point was
2.7 (95% confidence interval [CI], 2.0 to 3.3) in favor of tiotropium
(P<0.001). A higher proportion of patients in the tiotropium group than in
the placebo group had an improvement of 4 units or more in the SGRQ
total scores from baseline at 1 year (49% vs. 41%), 2 years (48% vs. 39%), 3
years (46% vs. 37%), and 4 years (45% vs. 36%) (P<0.001 for all comparisons).
(My emphasis)
Two Normal distributions with the same spread, but the
active treatment has a mean 2.7 higher.
If this applies, every patient under active can be matched
to a corresponding patient under placebo who is 2.7 worse off.
A cumulative plot
corresponding to
the previous
diagram.
If 4 is the threshold,
placebo response
probability is 0.36,
active response
probability is 0.45.
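The figures on this slide can be checked with a short Python sketch using the standard library's `statistics.NormalDist`. It assumes the pure-shift Normal model described above (equal spread, active mean 2.7 higher, improvement counted as positive) and backs out the standard deviation and placebo mean that would give response probabilities of 0.36 and 0.45 at a threshold of 4. The function names are hypothetical.

```python
from statistics import NormalDist


def implied_sd_and_placebo_mean(shift, threshold, p_placebo, p_active):
    """Solve the two-Normal shift model for the common SD and the placebo mean.

    P(improvement >= threshold) = p_placebo under placebo and p_active under
    active, where the active distribution is the placebo one moved up by `shift`.
    """
    std_normal = NormalDist()
    z_placebo = std_normal.inv_cdf(1.0 - p_placebo)  # (threshold - mean) / sd
    z_active = std_normal.inv_cdf(1.0 - p_active)    # (threshold - mean - shift) / sd
    sd = shift / (z_placebo - z_active)
    placebo_mean = threshold - z_placebo * sd
    return sd, placebo_mean


def response_probability(mean, sd, threshold):
    """Probability that a Normal(mean, sd) improvement reaches the threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


if __name__ == "__main__":
    sd, placebo_mean = implied_sd_and_placebo_mean(2.7, 4.0, 0.36, 0.45)
    print(f"implied SD {sd:.1f}, implied placebo mean {placebo_mean:.2f}")
    print(f"placebo response {response_probability(placebo_mean, sd, 4.0):.2f}")
    print(f"active response {response_probability(placebo_mean + 2.7, sd, 4.0):.2f}")
```

Under these assumptions the implied standard deviation is roughly 11.6 units, so a pure shift of 2.7 with no individual variation in benefit reproduces the quoted proportions.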
In summary…this is rather silly
• If there is sufficient measurement error, then
even if the true improvement is identically
2.7, some patients will show an ‘improvement’ of 4
• The conclusion that there is a higher
proportion of true responders by the
standard of 4 points under treatment than
under placebo is quite unwarranted
• So what is the point of analysing
‘responders’?
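A sketch of the first bullet in Python, using `statistics.NormalDist`. It assumes every treated patient has a true improvement of exactly 2.7 and every placebo patient a true improvement of exactly 0, with Normal measurement error added. The zero placebo improvement, the error standard deviations and the function name are illustrative assumptions, not from the slides. Nobody in either arm is a 'true' responder by the 4-point standard, yet the observed responder rates can be well above zero and differ between arms.

```python
from statistics import NormalDist


def observed_responder_rate(true_improvement, error_sd, threshold=4.0):
    """P(observed improvement >= threshold) when observed = true + Normal error."""
    return 1.0 - NormalDist(true_improvement, error_sd).cdf(threshold)


if __name__ == "__main__":
    for error_sd in (0.5, 3.0, 12.0):  # assumed measurement-error SDs
        active = observed_responder_rate(2.7, error_sd)
        placebo = observed_responder_rate(0.0, error_sd)
        print(f"error SD {error_sd:>4}: observed 'responders' "
              f"active {active:.2f}, placebo {placebo:.2f}")
```

With little measurement error almost nobody crosses the threshold. As the error grows, both arms show sizeable 'responder' proportions, with more in the active arm, purely because its distribution sits 2.7 higher.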
Who are the authors?
1.
Tashkin, DP, Celli, B, Senn, S, Burkhart, D, Kesten, S, Menjoge, S,
Decramer, M. A 4-Year Trial of Tiotropium in Chronic Obstructive Pulmonary
Disease, N Engl J Med 2008.
Personal note. I am proud to have been involved in this important study and
have nothing but respect for my collaborators. That we ended up publishing
something like this, even though two of us are statisticians, shows how deeply
ingrained the practice of responder analysis is in medical research. We must
do something to change this.
In conclusion
• Responder analysis is the source of much confusion
• It is leading trialists to overestimate the individual
element of response to treatment
• The key to understanding response is replication and
careful analysis
• Stupid dichotomies do not help this understanding
• NNTs may be relevant at the point of application but they
are not relevant at the point of study
• Personalised medicine may be about to happen ‘soon’
for quite a few years to come yet