KCASUG PRESENTATION
Propensity Score Analyses: A good-looking cousin of an RCT
KCASUG Q1: March 4, 2010
Kevin Kennedy, MS
Saint Luke’s Hospital, Kansas City, MO
John House, MS
Saint Luke’s Hospital, Kansas City, MO
Phil Jones, MS
Saint Luke’s Hospital, Kansas City, MO
Motivation
• Estimating Treatment effect is important!
– Is Drug “A” more effective than placebo?
– Do same-sex classes increase academic performance?
– Do Titanium golf clubs increase distance of drives?
• Designing ways to answer these questions should
be:
– Ethical
– Practical
– Cost Effective
The Gold Standard
• Randomized Controlled Trials
– Randomization of subjects to treatment groups (essentially, a coin
flip determines the group)
– On average, all subject characteristics will be balanced between
groups
                         Treatment   Control    P-value
                         (n=100)     (n=100)
Age                      57±3.2      57±3.1     .78
Male                     57%         58%        .65
History Diabetes         22%         22%        .99
History Heart Failure    8%          9%         .75
Benefits of an RCT
• A pure link between Treatment and Outcome
– Random allocation of subjects removes the possibility of
a third factor being associated with treatment and
outcome
• Can blind subjects and researchers to treatment
allocation
Potential Caveats with an RCT
• Ethical Issues:
– Withholding a treatment that is generally thought to improve
outcomes is often considered unethical
• Practical Issues:
– Problems with recruitment of subjects
• Consenting to “alternatives”, and substantial ‘drop out’
• Cost and Time Issues:
– Enrolling subjects, training staff, designing trial, treatment
• May be “too” controlled
– Specific subject criteria and treatment use
– Population may not represent the “real world” experience
Spaar A, Frey M, Turk A, Karrer W, Puhan MA. Recruitment barriers in a randomized controlled trial from the physicians'
perspective: a postal survey. BMC Med Res Methodol. 2009 Mar 2;9:14
So…what now?
• Observational data is popular
– Treatment is not assigned by randomization; it is only observed
– Unfortunately…subject characteristics will likely not be
balanced
                         Treatment   Control    P-value
                         (n=100)     (n=100)
Age                      57±3.2      62±5       .031
Male                     57%         42%        .047
History Diabetes         22%         30%        <.001
History Heart Failure    8%          15%        .035
So…what now?
• Need to account for the differences between
treatment and control
– Common in modeling to “adjust” away differences
between groups
• However, sample size constraints restrict the # of
variables to adjust for
• Solution: Propensity Scores
Propensity Score Outline
I. Introduction
II. How to use the score
  i. Matching
  ii. Stratifying
III. Assessing Balance
  i. Standardized Difference
IV. Propensity Scores Using SAS
V. Concluding remarks
  i. Other uses
  ii. Issues with publications
Introduction
• Definition:
– Propensity score (PS): the conditional probability of being
treated given the individual’s covariates
– Notation:
$$e(x_i) = P(Z_i = 1 \mid X_i = x_i)$$
where $Z_i = 1$ if treatment and $0$ if control,
and $x_i$ are the observed covariates
– Estimating the Propensity Score can be done with the common
logistic regression model, predicting treatment from the selected
covariates that need to be balanced
– Will be used to balance characteristics between groups
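Written out, the logistic regression model mentioned above takes the following standard form. The coefficient symbols $\beta_0, \dots, \beta_p$ and the covariate count $p$ are notation introduced here for illustration only; they do not appear on the original slides.

```latex
\log\frac{e(x_i)}{1 - e(x_i)} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
\qquad
\hat{e}(x_i) = \frac{1}{1 + \exp\{-(\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip})\}}
```

The fitted value $\hat{e}(x_i)$ is the estimated propensity score for subject $i$.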
Introduction
                         Treatment   Control    P-value
                         (n=100)     (n=100)
Age                      57±3.2      62±5       .031
Male                     57%         42%        .047
History Diabetes         22%         30%        <.001
History Heart Failure    8%          15%        .035
Here we would develop a PS for being in the treatment group
conditioned on: age, gender, diabetes history, and heart failure history.
Introduction-why important?
• Important: For a specific value of the PS the
difference between treatment and control is an
unbiased estimate of the average treatment effect
at that PS (Rosenbaum & Rubin, 1983; Theorem 4)
• “Quasi-Randomized” experiment
– Take 2 subjects (one from treatment and the other from control)
with the same PS; you could then “imagine” these 2
subjects were “randomly” assigned to each group (since
they are equally likely to be treated).
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal
effects. Biometrika. 1983;70:41–55.
Introduction
• It’s not just a side analysis anymore……
[Figure: # of publications with PS in a PubMed search, by year, 2000-2009; y-axis: 0 to 500]
Ways to use the PS
• Common strategies include:
– Matching
• Match treatment and controls on PS
– Stratification
• Keep all subjects but analyze in Strata (usually quintiles of
PS)
– Regression adjustment
Matching
• Most common use of PS analyses.
– Since the PS is a single scalar quantity, matching is comparatively
easier (as opposed to matching on: age, gender, history, etc…)
– Matching 1 Control to 1 Treatment makes for an easily understood
analysis
– Common to match on the Logit of the PS since it is approximately
normal:
$$L(x) = \log\left(\frac{1 - e(x)}{e(x)}\right)$$
Matching
• Nearest Neighbor matching (w/o replacement)
– Randomly Order Treated and Control Subjects
– Take the first treated subject and find the Control with the
closest Propensity Score. Remove both from list
– Move to the second Treated subject and find control with
closest PS……continue until you run out of treated
patients
• This will create a 1:1 match of treated and control
patients
– Note: methods exist for 1:many matches also
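The steps above can be sketched in a DATA step. This is only an illustration under stated assumptions, not the %gmatch macro referenced in this talk. It assumes a dataset pred with a numeric id, a 0/1 indicator group1 (1 = treated), and the logit of the PS in logit. The macro variable maxctl (an assumed upper bound on the number of controls), the seed, and the output names pairs, treated_id, control_id, and distance are all hypothetical.

```sas
%let maxctl = 10000;   /* assumed upper bound on the number of controls */

/* split treated and control subjects; give each row a random sort key */
data treated controls;
  set pred;
  if _n_ = 1 then call streaminit(2010);   /* arbitrary seed */
  sortkey = rand('uniform');
  if group1 = 1 then output treated;
  else output controls;
run;

/* randomly order the treated subjects */
proc sort data=treated;
  by sortkey;
run;

data pairs(keep=treated_id control_id distance);
  array cid[&maxctl]  _temporary_;   /* control ids            */
  array clog[&maxctl] _temporary_;   /* control logits         */
  array used[&maxctl] _temporary_;   /* 1 once already matched */
  /* load all controls into memory once */
  if _n_ = 1 then do j = 1 to nctl;
    set controls(keep=id logit rename=(id=c_id logit=c_logit)) point=j nobs=nctl;
    cid[j]  = c_id;
    clog[j] = c_logit;
    used[j] = 0;
  end;
  /* read the next treated subject */
  set treated(keep=id logit);
  best = .;
  distance = .;
  /* scan the unused controls for the closest logit */
  do j = 1 to nctl;
    if used[j] = 0 and clog[j] ne . and logit ne . then do;
      d = abs(logit - clog[j]);
      if best = . or d < distance then do;
        best = j;
        distance = d;
      end;
    end;
  end;
  /* record the pair and remove the control from the list */
  if best ne . then do;
    used[best] = 1;
    treated_id = id;
    control_id = cid[best];
    output;
  end;
run;
```

Each row of pairs holds one treated subject, its matched control, and the absolute difference in logits. The step errors out if the number of controls exceeds maxctl, so that bound would need to be set for the data at hand.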
Matching
• Problem: The “Nearest” neighbor may not be that
“Near”
• May want to enforce a caliper width for acceptable
matches
– E.g. if there is no control within the ‘caliper’ of a case
then no match occurs and case will be removed
• Common in Literature to use:
.2*stddev[L(x)] as the caliper
• For a matching macro see:
mayoresearch.mayo.edu/biostat/upload/gmatch.sas
Matching: Ideal Scenario
• Before Match
                         Treatment   Control    P-value
                         (n=543)     (n=1598)
Age                      57±3.2      62±5       .031
Male                     57%         42%        .047
History Diabetes         22%         30%        <.001
History Heart Failure    8%          15%        .035
• After Match:
                         Treatment   Control    P-value
                         (n=500)     (n=500)
Age                      57±3.2      57.3±3     .45
Male                     57%         57%        .88
History Diabetes         22%         23%        .48
History Heart Failure    8%          7%         .77
Stratification
• Matching will inevitably result in a smaller dataset
• Stratifying analyses on PS will keep all data.
– Create the PS
– Cut the PS into equal-sized groups (quartiles, quintiles)
• Rosenbaum & Rubin (1983) claim that quintile strata will remove
90% of bias
– Conduct the analyses within these strata
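A minimal sketch of these steps, assuming the PS is stored in the variable pred of dataset pred (as in the SAS example later in the talk) and that outcome is a hypothetical 0/1 outcome variable. The names strata and ps_quintile are also made up for illustration.

```sas
/* cut the PS into quintiles; PROC RANK numbers the groups 0-4 */
proc rank data=pred groups=5 out=strata;
  var pred;
  ranks ps_quintile;
run;

/* treatment-by-outcome table within each quintile, plus a
   Cochran-Mantel-Haenszel estimate pooled across the strata */
proc freq data=strata;
  tables ps_quintile*group1*outcome / cmh relrisk;
run;
```

The RELRISK option prints an odds ratio for each stratum, and CMH adds the common odds ratio across strata, which is one way (not the only one) to combine the stratum-specific results.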
Example
• Comparison of Angiography (vs not) in elderly patients with
Chronic Kidney Disease (CKD)
– Propensity score for receiving an Angio
• Based on Demographics, History, and Hospital Characteristics
Propensity Quintile   Group      # of patients   1-year Mortality   OR (95%CI)
1 (0-.06)             Angio      46              56.5%              1.02 (.56-1.84)
                      No Angio   1307            56.2%
2 (.06-.16)           Angio      133             36.8%              .57 (.39-.82)
                      No Angio   1221            50.7%
3 (.16-.30)           Angio      303             34.7%              .66 (.50-.86)
                      No Angio   1051            44.7%
4 (.30-.54)           Angio      557             30.7%              .72 (.57-.90)
                      No Angio   797             38.3%
5 (.54-1)             Angio      967             18.9%              .45 (.35-.59)
                      No Angio   387             34.1%
Overall               Angio      2014            26.7%              .62 (.54-.70)
                      No Angio   4780            47.4%
Chertow GM, Normand SL, McNeil BJ. "Renalism": inappropriately low rates of coronary angiography in elderly
individuals with renal insufficiency. J Am Soc Nephrol. 2004 Sep;15(9):2462-8
Covariate Adjustment
• This is the least recommended use.
• Fit a model for the PS, then include that PS as an
adjustment covariate in the model evaluating the
association between treatment and outcome
• Advantage over normal covariate adjustment
– Simpler final model
– Can have many more covariates in the PS model
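As a rough sketch of this approach, assuming the estimated PS is stored in the variable pred of dataset pred and that outcome is a hypothetical 0/1 outcome variable:

```sas
/* outcome model with the treatment indicator and the estimated PS
   as the only adjustment covariate; outcome is a hypothetical name */
proc logistic data=pred descending;
  model outcome = group1 pred;
run;
```

The coefficient for group1 is then the PS-adjusted treatment effect on the log-odds scale.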
Assessing Balance
• Remember: the main purpose of a PS is to
balance characteristics between treated and
controls…so how do we show success?
• P-values
– Function of Sample Size
– May be misleading for Stratification or 1:many match
• Standardized Differences
– Not a function of Sample Size
– Can be used for Stratification and 1:many matches
Standardized Differences
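• Formula: Continuous Variables
$$d = 100 \cdot \frac{\bar{x}_{treatment} - \bar{x}_{control}}{\sqrt{\dfrac{s^2_{treatment} + s^2_{control}}{2}}}$$
• Formula: Dichotomous Variables
$$d = 100 \cdot \frac{\hat{p}_{treatment} - \hat{p}_{control}}{\sqrt{\dfrac{\hat{p}_t(1 - \hat{p}_t) + \hat{p}_c(1 - \hat{p}_c)}{2}}}$$
• For Stratified analyses: compute d in each stratum and take the
average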
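The two formulas can be computed directly with PROC MEANS and a DATA step. The sketch below is an illustration only and is not the %std_diff macro used later in the talk. It assumes a dataset fulldata with a 0/1 group indicator group1, a continuous variable age, and a 0/1 variable male; the output names stats, stddiff, d_age, and d_male are hypothetical.

```sas
/* group-specific means and variances; male is assumed coded 0/1,
   so its mean is the proportion */
proc means data=fulldata noprint nway;
  class group1;
  var age male;
  output out=stats mean(age male)=age_mean male_p var(age)=age_var;
run;

/* combine the two group rows into one row of standardized differences */
data stddiff(keep=d_age d_male);
  set stats end=last;
  retain m1 m0 v1 v0 p1 p0;
  if group1 = 1 then do; m1 = age_mean; v1 = age_var; p1 = male_p; end;
  else do; m0 = age_mean; v0 = age_var; p0 = male_p; end;
  if last then do;
    d_age  = 100 * (m1 - m0) / sqrt((v1 + v0) / 2);
    d_male = 100 * (p1 - p0) / sqrt((p1*(1 - p1) + p0*(1 - p0)) / 2);
    output;
  end;
run;
```

The sign of d depends on which group is subtracted from which; balance is usually judged on the magnitude.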
Standardized Differences
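• Sample Calculations for a 1:1 match:
• Before Match
        Treatment   Control    P-value
        (n=543)     (n=1598)
Age     57±3.2      62±5       .031
$$d = 100 \cdot \frac{57 - 62}{\sqrt{\dfrac{3.2^2 + 5^2}{2}}} \approx -119$$
so the magnitude of the standardized difference is about 119
• After Match
        Treatment   Control    P-value
        (n=500)     (n=500)
Age     57±3.2      57.3±3     .45
$$d = 100 \cdot \frac{57 - 57.3}{\sqrt{\dfrac{3.2^2 + 3^2}{2}}} \approx -9.7$$
so the magnitude of the standardized difference is about 9.7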
Standardized Differences
• What value constitutes balance?
– Peter Austin commonly states that values less than 10
constitute balance between groups
– The closer to 0, the more balanced the groups
Propensity Analysis (Matching) Using SAS
• Simulated Data
• Data specifics
– N=5000 (~1000 Group1, ~4000 Group2)
                      Group1            Group2            P-value
                      N=1011            N=3989
Age                   59.4 ± 4.0        63.5 ± 4.0        < 0.001
Male_Gender           560 ( 55.4% )     2009 ( 50.4% )    0.004
History of Diabetes   689 ( 16.9% )     516 ( 21.4% )     < 0.001
Example: Create PS
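proc logistic data=dataset descending;
  /* {+others}: placeholder for the remaining covariates to balance */
  model group1= age gender diabetes {+others};
  /* p=pred:      predicted probabilities of being in group 1
     xbeta=logit: the same predictions on the logit scale */
  output out=pred p=pred xbeta=logit;
run;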
Example: Define Caliper
/* standard deviation of the logit of the PS */
proc means data=pred stddev;
  var logit;
  output out=lstd;
run;
/* create the "caliper" of .2*stddev(logit) in macro variable &std */
data _null_;
  set lstd;
  if _stat_='STD' then do;
    call symputx('std',logit/5);
  end;
run;
Example: Perform Match
%gmatch(data=pred, group=group1, id=id,
mvars=logit, wts=1 , dmaxk=&std, ncontls=1,
seedca=987896, seedco=425632, out=match);
                      Group1            Group2            P-value
                      N=858             N=858
Age                   60.1 ± 3.6        60.17 ± 3.62      .678
Male_Gender           469 ( 54.66% )    478 ( 55.71% )    .662
History of Diabetes   261 ( 30.42% )    256 ( 29.84% )    .792
mayoresearch.mayo.edu/biostat/upload/gmatch.sas
Example: Assess Balance
• Original Data
– %std_diff(data=fulldata, group=group1, continuous=age {+others},
binary=male diabetes {+others}, out=before)
• Matched Data
– %std_diff(data=matched_data, group=group1, continuous=age
{+others}, binary=male diabetes {+others}, out=after)
• Combine
data after;
  set after(rename=(stddiff=after_stddiff));
run;
proc sql;
  /* one row per variable with both the before- and after-match values */
  create table both as
  select a.*, b.after_stddiff
  from before as a join after as b on a.variable=b.variable
  ;
quit;
Example: Assess Balance
Variable   label      STD DIFF Before   STD DIFF AFTER
V1         Age        99.65             .3
V2         Gender     9.22              .45
V3         Diabetes   15.9              3.3
…          …          …                 …
proc gplot data=both;
title 'Standardized difference plot';
plot label*StdDiff=1 label*after_stddiff=2/overlay vaxis=axis1
haxis=axis2
href=10 legend=legend1 AUTOVREF chref=black lhref=3;
run;
quit;
[Figure: Standardized difference plot, Before Match vs. After Match, with variables listed in unsorted order; x-axis: Standardized Difference, 0 to 120]
Hmmm…a bit ugly
Format macro
/* sort by stddiff before match */
proc sort data=both;by stddiff;run;
/* attach formats to variables */
%macro doformat(data=);
%local i numvar label;
/* counter variable: 1, 2, ... in sorted order */
data &data;
set &data;
count+1;
run;
/* read label names into &label, separated by '*' */
proc sql noprint;
select label into :label separated by '*' from &data;
quit;
/* count # of variables (rows returned by the query above) */
%let numvar=&sqlobs;
/* format counter value (i) with label (i) */
proc format;
value fmt
%do i=1 %to &numvar ;
&i="%qscan(&label,&i,*)"
%end;;
run;
data &data;
set &data;
format count fmt.;
run; %mend;
%doformat(data=both);
Assessing Balance
Variable   label      STD DIFF Before   STD DIFF AFTER   Count
V1         Age        99.65             .3               Age
V3         Diabetes   15.9              3.3              Diabetes
V2         Gender     9.22              .45              Gender
…          …          …                 …                …
proc gplot data=both;
title 'Standardized difference plot';
plot count*StdDiff=1 count*after_stddiff=2/overlay vaxis=axis1 haxis=axis2
href=10 legend=legend1 AUTOVREF chref=black lhref=3;
run;
quit;
[Figure: Standardized difference plot, Before Match vs. After Match, with variables labeled by name and sorted by before-match standardized difference; x-axis: Standardized Difference, 0 to 120]
[Figure: Standardized difference plot for a clinical example with 35 covariates (stemi, emergency, elective, age, currentsmoke, and others), Before Match vs. After Match; x-axis: Standardized Difference, 0 to 70]
Now What?
• Variable Standardized differences are <10,
indicating balance
• Now we can see if group membership has an
impact on our outcome
– Caution: this is matched data so statistically we need to
account for this
• Paired t-tests, McNemar’s Test, Conditional Logistic
Regression, Stratified Proportional Hazards Regression
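Two of these options might be sketched as follows. The dataset and variable names are hypothetical and do not describe the layout that %gmatch produces: matched_long is assumed to hold one row per subject with a pair identifier pair_id, and matched_wide one row per pair with the 0/1 outcomes of the treated and control members.

```sas
/* conditional logistic regression: the STRATA statement conditions
   on the matched pair (hypothetical dataset and variable names) */
proc logistic data=matched_long descending;
  strata pair_id;
  model outcome = group1;
run;

/* McNemar's test on a one-row-per-pair layout: the AGREE option
   requests it for a 2x2 table */
proc freq data=matched_wide;
  tables outcome_treated*outcome_control / agree;
run;
```

Both analyses use only the within-pair contrast, which is what accounting for the match means here.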
Other Uses…
• A way to show just how different 2 groups are…
[Figure: Distribution of Propensity Scores for Group 1 vs. Group 2; y-axis: Probability of Group 2, 0 to 1.0]
[Figure: Distribution of Propensity Scores for CEA vs. CAS; y-axis: Probability of CAS, 0 to 1.0]
Concluding Remarks
• If you want more information: Search for Ralph
D’Agostino Jr. (Wake Forest) and Peter Austin
(Univ of Toronto)
• Introductory Read:
– D’Agostino RB Jr. Tutorial in Biostatistics: Propensity score methods for
bias reduction in the comparison of a treatment to a non-randomized
control group. Statist. Med. 1998;17:2265-2281
• 1:Many Matching
– Austin P. Assessing balance in measured baseline covariates when using
many-to-one matching on the propensity score. Pharmacoepidemiology
and drug safety (2008) 17: 1218-1225
Concluding Remarks…things to avoid
• Austin (2008) performed a literature review and found
many propensity score matching papers were done
incorrectly
– 47 Articles reviewed from medical literature which did
Propensity Score Matching
• Only 2 studies used Standardized Differences to assess
the match (most relied on p-values)
• Only 13 used correct statistical methods for matched data
• See paper for the common errors
– Only 2 studies assessed balance correctly and used
correct statistical methods
Austin PC. A critical appraisal of propensity-score matching in the medical
literature between 1996 and 2003. Stat Med. 2008 May 30;27(12):2037-49
Concluding Remarks…things to avoid
• Austin’s Recommendations
1. Strategy for creating pairings should be specifically
stated with appropriate statistical citation
2. The distribution of baseline characteristics between
treated and control should be described
3. Differences in distributions should be assessed with
methods not influenced by sample size
4. Use appropriate statistical methods to account for
the match
i. McNemar’s Test for binary data
ii. Use of the strata statement in proc logistic or phreg
What have we learned…if anything
1. RCTs may be the gold standard, but Propensity
Scores are their attractive cousin
2. Using PS can remove a lot of bias in determining
treatment effect
3. You can: Match, stratify, or adjust for the PS
4. Use the standardized difference to determine
balance (unaffected by sample size)
Name:
Kevin Kennedy
Company: Mid America Heart Institute: St. Luke’s Hospital
Address: 4401 Wornall Rd, Kansas City, MO
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.