The Application of Propensity Score Analysis to Non-randomized Medical Device Clinical Studies: A Regulatory Perspective Lilly Yue, Ph.D.* CDRH, FDA, Rockville MD 20850 *No official support.


The Application of Propensity Score
Analysis to Non-randomized Medical
Device Clinical Studies:
A Regulatory Perspective
Lilly Yue, Ph.D.*
CDRH, FDA, Rockville MD 20850
*No official support or endorsement by the Food and Drug
Administration of this presentation is intended or should be
inferred.
Outline
1. Randomized clinical trials
2. Non-randomized studies and a potential
problem
3. Propensity score methods for bias reduction
4. Practical issues with the application of
propensity score methodology
5. Limitations of propensity score methods
6. Conclusions
2
Randomized Trials
• All patients have a specified chance of receiving
each treatment.
• Treatments are concurrent.
• Data collection is concurrent, uniform, and high
quality.
• Expect that all patient covariates, measured or
unmeasured, e.g., age, gender, duration of
disease, …, are balanced between the two
treatment groups.
3
Randomized Trials
• Assumptions underlying statistical comparison
tests are met.
• So, the two trt groups are comparable and the
observed treatment difference is an unbiased
estimate of the true treatment difference.
• But, the above advantages are not guaranteed
for small, poorly designed or poorly conducted
randomized trials.
4
Nonrandomized Studies and a Potential Problem
• None of the advantages provided by randomized
trials is available in non-randomized studies.
• A potential problem:
Two treatment groups were not comparable
before the start of treatment.
i.e., not comparable due to imbalanced
covariates between two treatment groups.
• So, direct treatment comparisons are invalid.
5
Adjustments for Covariates
• Three common methods of adjusting for
confounding covariates:
– Matching
– Subclassification (stratification)
– Regression (Covariate) adjustment
6
• Question: What if there are many confounding
covariates to adjust for, e.g., age, gender, …?
– Matching based on many covariates is not practical.
– Subclassification is difficult: as the number of
covariates increases, the number of subclasses grows
exponentially. With 2 categories per covariate,
5 covariates already give 2^5 = 32 subclasses.
– Regression adjustment may not be possible.
Potential problem: over-fitting
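The growth in the number of subclasses can be checked with a short calculation. This is a minimal sketch that assumes every covariate is categorical; the helper name `count_subclasses` is illustrative and does not come from the slides.

```python
from math import prod

def count_subclasses(levels_per_covariate):
    """Number of cells when every combination of covariate levels is a subclass."""
    return prod(levels_per_covariate)

# Five binary covariates, as on the slide: 2**5 cells.
print(count_subclasses([2] * 5))    # 32
# Twenty binary covariates already give more than a million cells.
print(count_subclasses([2] * 20))   # 1048576
```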
7
Propensity Score Methodology
• Replace the collection of confounding
covariates with one scalar function of these
covariates: the propensity score.
Age, Gender, Duration, … → 1 composite covariate: the propensity score (a balancing score)
8
Propensity Score Methodology (cont.)
• Propensity score (PS): conditional prob. of
receiving Trt A rather than Trt B, given a
collection of observed covariates.
• Purpose: simultaneously balance many
covariates in the two trt groups and thus reduce
the bias.
9
• Propensity scores construction
– Statistical modeling of relationship between
treatment membership and covariates
– Statistical methods: multiple logistic regression or
others
– Outcome: event -- actual trt membership: A or B
– Predictor variables: all measured covariates, some
interaction terms or squared terms, e.g.,
age, gender, duration of disease,…, age*duration,…
10
• Propensity scores construction
– Clinical outcome variable, e.g., major complication
event, is NOT involved in the modeling
– No concern of over-fitting
– Obtain a propensity score model: a math equation
PS = f (age, gender, …)
– Calculate estimated propensity scores for all
patients
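The construction steps can be sketched in a few lines. The example below is only an illustration under stated assumptions: it fits a plain logistic regression by batch gradient ascent using the standard library, the function names are hypothetical, and the eight-patient data set is invented. Note that the response is treatment membership, not the clinical outcome.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_propensity_model(X, treated, lr=0.1, n_iter=5000):
    """Logistic regression of treatment membership (1 = Trt A, 0 = Trt B)
    on covariates, fitted by batch gradient ascent on the log-likelihood.
    Returns [intercept, coef_1, ..., coef_p]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, treated):
            err = ti - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated probability of receiving Trt A for each patient."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]

# Invented toy data: columns are standardized age and a gender indicator.
X = [[-1.2, 0], [-0.5, 1], [0.0, 0], [0.3, 1], [0.9, 0], [1.4, 1], [-0.8, 1], [1.1, 0]]
treated = [0, 1, 0, 0, 1, 1, 0, 1]   # actual treatment membership
beta = fit_propensity_model(X, treated)
ps = propensity_scores(X, beta)
print([round(p, 2) for p in ps])
```

In practice the predictor list would also carry the interaction and squared terms mentioned above, for example an extra column holding age*duration.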
11
• Properties of propensity scores
– A group of patients with the same propensity
score are equally likely to have been assigned to
trt A.
– Within a group of patients with the same
propensity score, e.g., 0.7, some patients actually
got trt A and some got trt B, just as if they had
been randomly allocated to whichever trt they
actually received.
12
“Randomized After the Fact”
[Figure: patients sharing the same propensity score (PS = 0.7), some of whom received Trt A and some Trt B]
13
– When the propensity scores are balanced across
two treatment groups, the distributions of all the
covariates are balanced in expectation across the
two groups.
– Use the propensity scores as a diagnostic tool to
measure treatment group comparability.
– If the two treatment groups overlap well enough
in terms of the propensity scores, we compare
the two treatment groups adjusting for the PS.
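One simple way to quantify how well the two groups overlap is to look at the range of scores that both groups cover. The sketch below is an assumption about how such a check might be coded, not the diagnostic used in the talk; the function names and the scores are hypothetical.

```python
def common_support(ps_a, ps_b):
    """Range of propensity scores covered by both treatment groups, or None."""
    low = max(min(ps_a), min(ps_b))
    high = min(max(ps_a), max(ps_b))
    return (low, high) if low <= high else None

def fraction_in_support(ps, support):
    """Share of a group's patients whose score lies inside the common range."""
    if support is None:
        return 0.0
    low, high = support
    return sum(low <= p <= high for p in ps) / len(ps)

# Hypothetical estimated scores for the two groups.
ps_trt_a = [0.35, 0.50, 0.62, 0.71, 0.80, 0.93]
ps_trt_b = [0.05, 0.12, 0.30, 0.41, 0.55, 0.66]
support = common_support(ps_trt_a, ps_trt_b)
print(support)                                  # (0.35, 0.66)
print(fraction_in_support(ps_trt_a, support))   # 0.5
print(fraction_in_support(ps_trt_b, support))   # 0.5
```

A range check like this is coarse; side-by-side plots of the two score distributions show more.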
14
• Compare treatments adjusting for
propensity score
– Matching
– Subclassification (stratification)
– Regression (Covariate) adjustment
15
• Matching based on propensity scores (PS)
[Figure: Trt A and Trt B patients paired at matching propensity scores PS1, PS2, …, PSm]
• Compare treatments based on matched pairs
• Problem: may exclude unmatched patients
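A minimal sketch of one matching scheme follows: greedy 1:1 nearest-neighbor matching with a caliper. The slides do not say which matching algorithm was intended, so the algorithm, the caliper of 0.05, the function name, and the scores are all assumptions made for illustration.

```python
def greedy_match(ps_a, ps_b, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Returns (pairs, unmatched_a): pairs are (index_in_a, index_in_b) tuples;
    unmatched_a lists Trt A patients with no unused Trt B partner within the caliper.
    """
    available = set(range(len(ps_b)))
    pairs, unmatched_a = [], []
    for i, p in enumerate(ps_a):
        # Closest still-unused Trt B patient, or None if none are left.
        best = min(available, key=lambda j: abs(ps_b[j] - p), default=None)
        if best is not None and abs(ps_b[best] - p) <= caliper:
            pairs.append((i, best))
            available.remove(best)
        else:
            unmatched_a.append(i)
    return pairs, unmatched_a

# Hypothetical estimated scores.
ps_a = [0.30, 0.52, 0.75, 0.95]
ps_b = [0.28, 0.50, 0.55, 0.72]
pairs, unmatched = greedy_match(ps_a, ps_b)
print(pairs)      # [(0, 0), (1, 1), (2, 3)]
print(unmatched)  # [3]
```

The last Trt A patient (score 0.95) has no partner within the caliper and drops out, which is the exclusion problem noted above.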
• Stratification
– All patients are sorted by propensity scores.
– Divide into equal-sized subclasses.
[Figure: patients ordered along the PS axis and divided into subclasses 1, 2, …, 5]
– Compare two trts within each subclass, as in a
randomized trial; then estimate overall trt effect as
weighted average.
– It is intended to use all patients.
– But, if trial size is small, some subclass may contain
patients from only one treatment group.
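The stratification steps can be sketched as follows. This is an illustration under assumptions: subclass weights are taken to be subclass sizes, the function names are hypothetical, and the ten-patient data set is invented.

```python
def stratify(ps, n_strata=5):
    """Sort patients by propensity score and cut into (near) equal-sized
    subclasses. Returns labels 0..n_strata-1 aligned with `ps`."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    labels = [0] * len(ps)
    for rank, i in enumerate(order):
        labels[i] = rank * n_strata // len(ps)
    return labels

def stratified_difference(labels, treated, outcome):
    """Size-weighted average of within-subclass differences in mean outcome
    (Trt A minus Trt B). Raises if a subclass holds only one treatment group."""
    total, n = 0.0, len(labels)
    for s in sorted(set(labels)):
        a = [y for lab, t, y in zip(labels, treated, outcome) if lab == s and t == 1]
        b = [y for lab, t, y in zip(labels, treated, outcome) if lab == s and t == 0]
        if not a or not b:
            raise ValueError(f"subclass {s} contains only one treatment group")
        total += (len(a) + len(b)) / n * (sum(a) / len(a) - sum(b) / len(b))
    return total

# Invented data: ten patients, alternating treatment, binary outcome.
ps = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
treated = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
outcome = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
labels = stratify(ps)
effect = stratified_difference(labels, treated, outcome)
print(labels)   # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(effect)
```

The `ValueError` branch corresponds to the small-trial problem in the last bullet: a subclass with patients from only one group has no within-subclass comparison.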
17
• Regression (covariate) adjustment
– Treatment effect estimation model fitting:
the relationship of clinical outcome and treatment
– Outcome: clinical outcome, e.g., adverse events
– Predictor variables: trt received, propensity score, a
subset of important covariates
– Statistical method: e.g., regression or logistic
regression
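For a binary clinical outcome, one way to write such an outcome model is sketched below. The notation is an assumption introduced here for illustration and does not appear in the slides: Y_i is the outcome indicator for patient i, T_i the treatment indicator, \hat{e}(x_i) the estimated propensity score, and z_i the chosen subset of important covariates.

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_1 T_i + \beta_2\,\hat{e}(x_i) + \gamma^{\top} z_i
```

Under this sketch, \beta_1 is the log odds ratio for treatment after adjusting for the propensity score and the selected covariates.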
18
Propensity Score Methods
• Summary
1. Fit propensity score (PS) model using all measured covariates
2. Estimate PS for all patients using PS model
3. Compare treatments adjusting for propensity scores
19
Practical Issues
• Issues in propensity score estimation
– How to handle missing baseline covariate values
– What terms of covariates should be included
– Evaluation of treatment group comparability
– Assessment of the resulting balance of the distributions
of covariates
• Issues in treatment comparison:
– Which method: matching, stratification, regression
• Issues in study design with PS analysis
– Pre-specified vs. post hoc PS analysis
– Pre-specify the covariates to be collected in the study
and then included in PS estimation
– Sample size estimation adjusting for the propensity
scores
20
Example – Device A
• Non-concurrent, two-arm, multi-center study
• Control: Medical treatment without device,
N=65, hospital record collection
• Treatment: Device A, N = 130
• Primary effectiveness endpoint: Treatment success
• Hypothesis testing: superiority in success rate
• 20 imbalanced clinically important baseline
covariates, e.g., prior cardiac surgery
• 22% of patients with missing baseline covariate values
21
[Figure: Enrollment Time, Ctl vs. Trt, by year from 1992 to 2002; vertical axis 0 to 25]
22
• Two treatment groups are not comparable
– Imbalance in multiple baseline covariates
– Imbalance in the time of enrollment
• So, any direct treatment comparisons on the effectiveness
endpoint are inappropriate.
• And, p-values from direct treatment comparisons are uninterpretable.
• What about treatment comparisons adjusting for the
imbalanced covariates?
– Traditional covariate analysis
– Propensity score analysis
23
• Performed propensity score (PS) analysis
• Handled missing values
– MI: generate multiple data sets for PS analysis
– Generate one data set: generalized PS analysis
– Others
• Included all statistically significant and/or clinically
important baseline covariates in PS modeling.
• Checked comparability of two trt groups through
estimated propensity score distributions.
• Found that the two trt groups did not overlap well.
24
[Figure: Estimated Propensity Scores (with time), Ctl vs. Trt; vertical axis: estimated propensity score, 0.0 to 1.0]
25
[Figure: Estimated Propensity Scores (w/o time), Ctl vs. Trt; vertical axis: estimated propensity score, 0.0 to 1.0]
26
Patients in Propensity Score Quintile
                     1      2      3      4      5    Total
Ctl (w/time)        38     18      8      1      0       65
                   58%    28%    12%     2%     0%
Trt (w/time)         1     21     31     38     39      130
                    1%    16%    24%    29%    30%
Ctl (w/o time)      29     24      8      4      0       65
                   45%    37%    12%     6%     0%
Trt (w/o time)      10     14     32     35     39      130
                    8%    11%    24%    27%    30%
27
Treatment Success: successes / patients (S / N) by propensity score quintile
           1        2        3        4        5       Total
Ctl     16/38     8/18     1/8      0/1       0        25/65
Trt      0/1     14/21    25/31    24/38    23/39     86/130
• Tried Cochran-Mantel-Haenszel test controlling for
PS quintile, and logistic regression using PS as a
continuous covariate
• However, the significant p-values are uninterpretable
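The per-quintile success counts make the problem visible. The sketch below tabulates success rates by quintile and flags quintiles where either arm is nearly empty. The counts are those reported in the Treatment Success table (quintiles formed with time in the PS model), with quintile 5 of the control arm entered as 0 successes out of 0 patients; the threshold `min_n=5` and the function names are arbitrary illustrative choices, not part of the original analysis.

```python
# (successes, patients) per propensity score quintile, from the table above.
ctl = {1: (16, 38), 2: (8, 18), 3: (1, 8), 4: (0, 1), 5: (0, 0)}
trt = {1: (0, 1), 2: (14, 21), 3: (25, 31), 4: (24, 38), 5: (23, 39)}

def rate(s, n):
    """Success rate, or None when the cell is empty."""
    return s / n if n else None

def sparse_quintiles(ctl, trt, min_n=5):
    """Quintiles where either arm has fewer than `min_n` patients."""
    return [q for q in ctl if min(ctl[q][1], trt[q][1]) < min_n]

for q in ctl:
    print(q, rate(*ctl[q]), rate(*trt[q]))
print(sparse_quintiles(ctl, trt))   # [1, 4, 5]
```

With this threshold only quintiles 2 and 3 contain a usable number of patients from both arms, which is why the adjusted tests above cannot be interpreted.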
28
• Conclusion:
– The two treatment groups did not
overlap enough to allow a sensible
treatment comparison.
– So, any treatment comparisons adjusting
for imbalanced covariates are
problematic.
29
Example: Device B
• New vs. control in a non-randomized study
• Primary endpoint: MACE incidence rate at 6 months after treatment
• Non-inferiority margin: 7%, in this study
• Sample size: new: 290, control: 560
• 14 covariates were considered.
30
Covariate balance checking before and after
propensity score stratification adjustment
                  Mean                  p-value
Covariate     New       Control     Before      After
--------------------------------------------------------------------------------------
Mi            0.25      0.40        <.0001      0.4645
Diab          0.28      0.21        0.0421      0.8608
CCS           2.41      2.75        0.0003      0.3096
Lesleng       11.02     12.16       <.0001      0.5008
Preref        3.00      3.08        0.0202      0.2556
Presten       62.75     66.81       <.0001      0.4053
31
Model Building
• The PS is the conditional prob. that a patient would
have been assigned to the new device, based on his
or her baseline covariates.
• A hierarchical logistic regression model with a
stepwise selection process was used to build the
propensity score model.
• The final propensity score model includes all
covariates as well as a quadratic term.
32
Table 2. Distribution of patients at five strata
Subclass    Control    New    Total
1           142        28     170
2           127        43     170
3           122        48     170
4           119        51     170
5           50         120    170
Total       560        290    850
33
Estimated Propensity Scores
N(new)=290, N(control)=560
[Figure: distributions of the estimated propensity score (0.0 to 1.0) for Control and New]
34
35
• After adj. balance check: prior MI rate
• Overall:
Group      % patients with prior MI
New        25
Control    40
Diff       15
• After:
           Quintile
Group      1       2       3       4       5
New        70.4    32.6    25.0    17.6    15.0
Control    75.2    32.8    30.0    24.8    10.4
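The within-quintile rates can be combined into a single adjusted difference and set against the unadjusted one. The sketch below weights each quintile by its size (170 patients each, per Table 2); that weighting and the function name are assumptions for illustration, and the slides do not report this summary figure.

```python
# Percentage of patients with prior MI per quintile, from the slide.
new_pct = [70.4, 32.6, 25.0, 17.6, 15.0]
control_pct = [75.2, 32.8, 30.0, 24.8, 10.4]
# Patients per quintile (Table 2): 170 in each, so the weights are equal.
stratum_n = [170, 170, 170, 170, 170]

def weighted_difference(a, b, n):
    """Stratum-size-weighted average of the per-stratum differences a - b."""
    return sum(w * (x - y) for x, y, w in zip(a, b, n)) / sum(n)

adjusted = weighted_difference(new_pct, control_pct, stratum_n)
print(round(adjusted, 2))   # -2.52 percentage points (New minus Control)
print(25 - 40)              # -15: the unadjusted difference, New minus Control
```

Under these assumptions the imbalance in prior MI shrinks from 15 percentage points overall to about 2.5 within strata.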
36
[Figure: Percentage of patients with prior MI, New vs. Ctl, before adjustment and within the 1st to 5th subclasses; vertical axis 0 to 80]
37
• Adjusted difference, New - Control:
• Point estimate: -1.5%
• 2-sided 95% C.I.: (-6.6%, 3.6%)
• Non-inferiority margin: 7%
• Claim: non-inferiority w.r.t. 6-month MACE, since the upper
confidence limit (3.6%) is below the 7% margin
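The non-inferiority decision reduces to comparing the upper confidence limit with the margin. A minimal sketch using the numbers on the slide follows; the function name is hypothetical, and the implied standard error assumes a symmetric normal-theory interval, which the slides do not state.

```python
def non_inferior(ci_upper, margin):
    """True when the upper confidence limit of (new - control) in an
    adverse-event rate lies below the non-inferiority margin."""
    return ci_upper < margin

# Values from the slide, in percentage points.
point, ci_low, ci_high, margin = -1.5, -6.6, 3.6, 7.0
print(non_inferior(ci_high, margin))   # True

# Standard error implied by a symmetric 95% interval (an assumption).
implied_se = (ci_high - ci_low) / (2 * 1.96)
print(round(implied_se, 2))            # 2.6
```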
38
Study Design
• Plan in advance
• Pre-specify clinically relevant baseline
covariates: as many as possible
• Sample size estimation:
– Ignore the propensity score adjustment?
– Could be inappropriate
39
Limitations
• Propensity score methods can only adjust for
observed confounding covariates and not for
unobserved ones.
• Propensity score is seriously degraded when
important variables influencing selection have
not been collected.
• Propensity score may not eliminate all
selection bias.
40
Limitations
• Propensity score methods work better in larger
samples.
• Propensity score is not the only way of adjusting for
covariates. And, it may or may not be helpful in
a particular comparison study.
• Randomized trials are considered the highest
level of evidence for trt comparison. Propensity
score methods lack the discipline and rigor of
randomized trials, and are not as definitive as
randomized trials.
41
Conclusions
• Propensity score methods generalize the technique of
adjusting for one confounding covariate to allow simultaneous
adjustment for many covariates and thus reduce bias.
• Propensity score methodology is an addition to, not a
substitute for, traditional covariate adjustment methods.
• Plan ahead and carefully consider the practical issues
discussed above.
• Randomized studies are still preferred and strongly
encouraged whenever possible!
42
References
• Rubin DB. Estimating causal effects from large data
sets using propensity scores. Ann Intern Med 1997;
127:757-763
• Rosenbaum PR, Rubin DB. Reducing bias in
observational studies using subclassification on the
propensity score. JASA 1984; 79:516-524
• D'Agostino RB Jr. Propensity score methods for
bias reduction in the comparison of a treatment to a
non-randomized control group. Statistics in Medicine
1998; 17:2265-2281
43
References
• Blackstone EH. Comparing apples and oranges. J.
Thoracic and Cardiovascular Surgery, January 2002;
1:8-15
• Grunkemeier GL, et al. Propensity score analysis
of stroke after off-pump coronary artery bypass
grafting. Ann Thorac Surg 2002; 74:301-305
• Winkelmayer WC, et al. Comparing mortality of elderly
patients on hemodialysis versus peritoneal dialysis: A
propensity score approach. J Am Soc Nephrol 2002;
13:2353-2362
44
Thanks!
45