Propensity Score Matching

Research Question
• Are nursing homes dangerous for seniors? Does admittance
to a nursing home increase risk of death in adults over 65
years of age when controlling for age, gender, race, and
number of emergency room visits?
Propensity Score Matching, or: Do nursing homes kill you?
ANNMARIA DE MARS, PH.D.
&
CHELSEA HEAVEN
THE JULIA GROUP
WHY YOU NEED IT
TWO NONEQUIVALENT GROUPS
• Patients in specialized units
• People who attend a fundraising event
Any time you can ask the question:
Is there a difference in OUTCOME between levels of “treatment” A, controlling for X, Y and Z?
Examples

OUTCOME     “TREATMENT” LEVELS                  COVARIATES
DROP OUT    PUBLIC, PRIVATE                     INCOME, PARENT EDUCATION, GR. 8 ACHIEVEMENT
BMI         DAILY SOFT DRINKS, NO SOFT DRINKS   GENDER, AGE, RACE, EXERCISE FREQ.
DEATH       LIVES AT HOME, NURSING HOME         AGE, GENDER, TOTAL ER VISITS
1. Make sure there are pre-existing differences
(Thank you, Captain Obvious)
2a. Decide on covariates
• Are the differences pre-existing, or could they possibly be due to the different “treatment” levels?
• Race and gender are good choices for covariates. If more students at private vs. public schools are black or female, the schooling probably didn’t cause that.
• Differences in grade 10 math scores may be a result of the type of school.
2b. Decide on covariates
Don’t use your outcome variable as one of your covariates.
3. Run logistic regression to generate propensity scores
PROC LOGISTIC DATA= datasetname ;
  CLASS categorical-variables ;
  MODEL dependent = list-of-covariates ;
  OUTPUT OUT= newdataset PREDICTED= propensity-score ;
RUN ;
Here “dependent” is the treatment-group indicator, not the outcome.
4. Select matching method
   1. Quintiles
   2. Nearest neighbors
   3. Calipers
ALL OF THE ABOVE CAN BE DONE EITHER WITH OR WITHOUT REPLACEMENT
5. Run matching program & test its effectiveness
6. Run your analysis using the matched data set
An actual example
Do nursing homes kill you?
Our data
Kaiser Permanente Study of the Oldest Old, 1971-1979 and 1980-1988: [California]
DEPENDENT VARIABLE:
Dthflag = 1 if died during study period
        = 0 if alive at end of study period
TREATMENT VARIABLE:
athome = 1 if lived at home continuously
       = 0 if admitted to nursing home at any time during study period
Before matching

Frequency (Column %)
            AT HOME
DIED        NO              YES             TOTAL
NO          184 (14.6)      2,486 (52.6)    2,670 (44.6)
YES         1,077 (85.4)    2,239 (47.4)    3,316 (55.4)
TOTAL       1,261           4,725           5,986
Covariates *
• AGE
• RACE
• GENDER
• TOTAL Emergency Room VISITS **
* Three out of four were DEFINITELY pre-existing differences
** Proxy for health
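Create propensity scores
PROC LOGISTIC DATA= saslib.old ;
  CLASS athome race sex ;
  MODEL athome = race sex age_comp vissum1 ;
  OUTPUT OUT= study.allpropen PREDICTED= prob ;
RUN ;
NOTE: No DESCENDING option, so PROC LOGISTIC models the lower-ordered response value (athome = 0) and prob is the predicted probability of nursing home admission.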
ODDS Ratios
Yes, pre-existing differences

TYPE 3 ANALYSIS OF EFFECTS
Effect      DF    Wald Chi-Square    Pr > ChiSq
RACE        4     18.7017            0.0009
SEX         1     12.5424            0.0004
age_comp    1     412.8103           <.0001
VISSUM1     1     212.9695           <.0001
QUINTILE MATCHING
EXAMPLE ONE
Part on creating quintiles blatantly copied (almost) from
http://www.pauldickman.com/teaching/sas/quintiles.php
Calculate Quintile Cutpoints
Remember the dataset we created with the predicted probabilities saved in it?
PROC UNIVARIATE DATA= study.allpropen ;
  VAR prob ;                 /* predicted probability as the analysis variable */
  OUTPUT OUT= quintile       /* output to a dataset named quintile */
    PCTLPTS= 20 40 60 80     /* create four variables at these percentiles */
    PCTLPRE= pct ;           /* with the prefix pct: pct20, pct40, pct60, pct80 */
RUN ;
/* write the quintiles to macro variables */
data _null_ ;
  set quintile ;
  call symput('q1', pct20) ;
  call symput('q2', pct40) ;
  call symput('q3', pct60) ;
  call symput('q4', pct80) ;
run ;
Just because I am too lazy to write down the percentiles
Create quintiles
data STUDY.AllPropen ;
  set STUDY.AllPropen ;
  if prob = . then quintile = . ;
  else if prob le &q1 then quintile = 1 ;
  else if prob le &q2 then quintile = 2 ;
  else if prob le &q3 then quintile = 3 ;
  else if prob le &q4 then quintile = 4 ;
  else quintile = 5 ;
run ;
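As an aside, the cutpoint, macro-variable, and assignment steps could probably be collapsed into a single PROC RANK call. This is a minimal sketch, not the authors' method: the dataset name ranked and the variables quintile_rank and quintile_alt are hypothetical, and because PROC RANK treats ties at the group boundaries differently from percentile cutpoints, the group counts may not match exactly.

```sas
/* Sketch only: assign quintiles of the propensity score with PROC RANK. */
/* GROUPS=5 numbers the groups 0 through 4.                              */
PROC RANK DATA= study.allpropen GROUPS= 5 OUT= ranked ;  /* 'ranked' is a hypothetical name */
  VAR prob ;
  RANKS quintile_rank ;                                  /* hypothetical variable name */
RUN ;

/* Shift to 1 through 5 so the coding matches the quintile variable above. */
DATA ranked ;
  SET ranked ;
  IF quintile_rank NE . THEN quintile_alt = quintile_rank + 1 ;
RUN ;
```

Observations with a missing prob get a missing rank, so they stay unassigned just as in the IF/ELSE version.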
Quintiles

Quintile    Frequency    Percent    Cumulative Frequency    Cumulative Percent
1           1075         19.76      1075                    19.76
2           1101         20.24      2176                    40.00
3           1088         20.00      3264                    60.00
4           1088         20.00      4352                    80.00
5           1088         20.00      5440                    100.00
The matching part
Try to control your excitement
Create case & control data sets
DATA small large ;
  SET study.allpropen ;
  IF athome = 0 THEN OUTPUT small ;
  ELSE IF athome = 1 THEN OUTPUT large ;
RUN ;
Create data set of sampling percentages
PROC FREQ DATA = small ;
  TABLES quintile / OUT = samp_pct ;
RUN ;
Quintiles in smaller data set

Quintile    Frequency    Percent    Cumulative Frequency    Cumulative Percent
1           50           4.06       50                      4.06
2           115          9.33       165                     13.39
3           208          16.88      373                     30.28
4           338          27.44      711                     57.71
5           521          42.29      1232                    100.00
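Create sampling data set
DATA samp_pct ;
  SET samp_pct ;
  _NSIZE_ = 1 ;                  /* multiplier: observations to sample per case */
  _NSIZE_ = _NSIZE_ * COUNT ;    /* COUNT = number of cases in the quintile */
  DROP PERCENT ;
RUN ;
The multiplier of 1 is just here to make it easy to modify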
PROC SURVEYSELECT
• The SAMPSIZE= input data set can provide stratum sample sizes in the _NSIZE_ variable.
• STRATA groups should appear in the same order in the secondary data set as in the DATA= data set.
SELECT RANDOM SAMPLE
PROC SORT DATA = large ;
  BY quintile ;
RUN ;
PROC SURVEYSELECT DATA= large
  SAMPSIZE = samp_pct OUT = largesamp ;
  STRATA quintile ;
RUN ;
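Because the draw is random, rerunning this step gives a different matched sample each time. A hedged variant, assuming a reproducible sample is wanted: SEED= fixes the random number stream, and METHOD= SRS spells out the default of simple random sampling without replacement. The seed value is arbitrary and is not from the original analysis.

```sas
/* Sketch only: same stratified draw, made reproducible. */
PROC SURVEYSELECT DATA= large
  METHOD= SRS          /* simple random sampling, without replacement */
  SEED= 20240          /* arbitrary seed, an assumption for illustration */
  SAMPSIZE= samp_pct   /* stratum sample sizes come from _NSIZE_ */
  OUT= largesamp ;
  STRATA quintile ;
RUN ;
```

Sampling without replacement needs at least _NSIZE_ observations in each quintile of large. METHOD= URS would sample with replacement instead.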
Concatenate data sets
DATA study.psm_sample ;
  SET largesamp small ;
RUN ;
Did it work?

             Before                            After
Variable     At home    Not home    Prob       At home     Not home    Prob
Age          75.0       79.3        .0001      79.2        79.3        .60
ER visits    4.5        2.4         .0001      4.5 ****    3.8 ****    .0001
Female       49%        54%         .01        52%         54%         .36
Race                                .0001                              .97
** P < .01   **** P < .0001
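The slides do not show how the Prob columns were produced. One plausible way, sketched here as an assumption rather than the authors' code, is a t-test on the continuous covariates and a chi-square test on the categorical ones, run once on study.allpropen (before) and once on study.psm_sample (after).

```sas
/* Sketch only: compare covariates between groups in the matched sample. */
PROC TTEST DATA= study.psm_sample ;
  CLASS athome ;
  VAR age_comp vissum1 ;               /* age and total ER visits */
RUN ;

PROC FREQ DATA= study.psm_sample ;
  TABLES (sex race) * athome / CHISQ ; /* chi-square tests for gender and race */
RUN ;
```

Swapping DATA= to study.allpropen gives the corresponding before-matching comparison.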
Before odds ratio 6.5 : 1

Effect           Point Estimate    95% Wald Confidence Limits
athome 0 vs 1    0.154             0.130    0.182
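The 6.5 : 1 figure is the reciprocal of the point estimate, and it can be checked against the frequencies in the before-matching table. A worked sketch, assuming the modeled event is survival (Dthflag = 0), which is what an estimate below 1 for nursing home vs. at home implies:

```latex
\[
\mathrm{OR}_{\text{alive}}
  = \frac{184 / 1077}{2486 / 2239}
  \approx \frac{0.171}{1.110}
  \approx 0.154,
\qquad
\mathrm{OR}_{\text{death}}
  = \frac{1}{0.154}
  \approx 6.5
\]
```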
AFTER ODDS RATIO = 3.7 : 1

Effect           Point Estimate    95% Wald Confidence Limits
quintile         0.661             0.610    0.716
athome 0 vs 1    0.273             0.223    0.334
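The output above is consistent with a logistic regression of the outcome on the treatment indicator plus quintile entered as a single numeric term. A sketch under that assumption; the statement the authors actually ran is not shown in the slides.

```sas
/* Sketch only: outcome model on the matched sample. */
PROC LOGISTIC DATA= study.psm_sample ;
  CLASS athome ;                      /* reported as "athome 0 vs 1" */
  MODEL dthflag = athome quintile ;   /* quintile left numeric: one odds ratio */
RUN ;
```

With no DESCENDING option the modeled event is Dthflag = 0 (alive), so the athome 0 vs 1 estimate of 0.273 inverts to 1 / 0.273 ≈ 3.7 for the odds of death.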