Transcript Document

POPULATION RESEARCH SEMINAR SERIES
Sponsored by the Statistics and Survey Methods Core of the U54 Partnership
Using Propensity Score Analysis in
Behavioral Studies
Jie Chen, Ph.D.
University of Massachusetts Boston
Outline:
2


Introduction/Motivation
 Randomized Clinical Trial (RCT)
Propensity Scores Estimation and Matching


Post matching analysis:




Balance check before and after matching
Estimation of the average treatment effect
Propensity score stratification and weighting
Statistical package for Propensity Scores Matching
 R, Matching, Matchit
 Stata, psmatch2
An illustration example
7/17/2015
Randomized Controlled Trial (RCT)
3


In a randomized controlled trial, participants are
assigned to treatment conditions at random (i.e., they
have an equal probability of being assigned to any
group).
The allocation doesn’t depends on participants’
background (covariates).
WX

RCT is the gold standard in for a clinical trial

With RCT, the outcome difference is due to the
intervention, not participants’ background.
7/17/2015
Selection bias in observational studies
4

In some health research, RCT are not practical




Covariates which determine the treatment decision also
determine the outcome variable of interest.
The difference of outcome between treatment and control
groups may not due to the treatment, but due to the selection,
the covariates
It is important to have balanced covariates in testing
treatment effect.
Propensity scores summarize covariate information about
treatment selection into a scalar value
7/17/2015
7/17/2015
4
Propensity scores
Rosenbaum and Rubin (1983)
5


Propensity score is a balancing score
Treatment assignment and the observed covariates
are conditionally independent given the Propensity
Score (PS):
W  X | e( X )


With the same values of propensity scores, it is
assumed to have the same distribution of covariates
Propensity scores reduce the dimension of covariates
7/17/2015
When propensity score analysis is needed?
6


To remove selection bias
 Rosenbaum and Rubin (1983) proposed Propensity
Score Matching (PSM) as a method to reduce the
bias in the estimation of treatment effects with
observation data sets.
o Selection bias: a bias that occurs as a result of
using samples from a non-randomly selected data.
To analyze causal effects in non-RCT
7/17/2015
Why propensity score analysis is needed?
7



If the data is not generated by RCT
The assumption of ordinary least square (OLS)
regression model is violated
The violation of OLS assumption will cause an inflated
and asymptotically biased estimate of treatment effect.



OLS regression model using dichotomous indicator of
treatment doesn’t work
The error term is correlated with explanatory variables
PS is a non-parametric (semi-parametric) approach
7/17/2015
A biased and inconsistent estimation
8

Assuming a centered simple linear regression:
Y  1 X  

The OLS estimate is:
n
ˆ1  ( X ' X ) 1 X ' Y 
 x y  x ( x   )
i 1
n
i
i
x
i 1

n
2
i

i 1
i
i
n
2
x
 i
i 1
i
n

x
i 1
n
i i
2
x
 i
i 1
When x and ε are correlated, E[ X '  ]  0 the
estimate will be biased and inconsistent.
7/17/2015
The omitted variable formula
9


Suppose that a correctly specified regression model
would be
Y  W  Xβ  
If we regress Y on W without using X then
ˆ  (W 'W ) 1W 'Y  (W 'W ) 1W ' W  βX  ε 
   (W 'W ) 1W ' Xβ  (W 'W ) 1W 'ε

If W is correlated with X or W is correlated with ε,
then the estimate is biased.
7/17/2015
The assumption of Strongly Ignorable
Treatment Assignment (SITA)
10

SITA assumption (Rosenbaum & Rubin 1983)
(Y0 , Y1 )  W | X
where Y0 and Y1 are the outcome measurements for
control and treatment groups respectively.
Unconfoundedness: assignment to treatment is
independent of the outcomes, conditional on the
covariates.
Overlap: common support condition:
0  P(W  1 | X )  1
7/17/2015
The assumption of SITA
11

The assumption of strongly ignorable treatment
assignment (SITA), embedded in OLS regression, then
the treatment effect can be estimated using OLS
regression:
Yi    Wi  1 X i   i
error term  i and Wi are assumed to be
independent.
If SITA is not hold, there is a bias in the estimation of
treatment due to the endogeneity.
the

7/17/2015
Propensity scores (PS)
12

Definition: propensity scores are defined as the
conditional probability of assignment to a particular
treatment (Wi = 1) versus control (Wi = 0) given a
vector of observed covariates, xi:
e(xi )  PrWi  1 | Xi  x.


It reduces the dimensions in matching or
stratification.
Propensity scores summarize covariate information
about treatment selection into a scalar value
7/17/2015
Propensity score estimation
13


Propensity score can be estimated as the predicted
probability that saved from an estimated logistic
regression
In practice, one can use the logit of the predicted
probability as a propensity score:

1  eˆ( x) 
qˆ ( xi )  log
.
 eˆ( x) 
The distribution of qˆ ( xi ) approximates to normal.
7/17/2015
Propensity Score
14

If SITA holds, the treatment assignment and the
observed covariates are conditionally independent
given the propensity score (PS):
W  X | e( x)


For a given propensity score, exposure to treatment
and control as RCT.
With given PS scores, the expected difference in the
outcome measurements is due to the treatment effect
that is equal to the Average Treatment Effect (ATE).
7/17/2015
Propensity score models
15




A correct specification of PS model (logistic
regression) is important in PSM process
To include all important conditioning variables in the
model
Specify correct function forms, linear, interration or
polynomial function of covariates
Rosenbaum and Rubin (1984) suggested to use
high-order polynomial terms and/or cross-product
interaction terms in logistic regression
7/17/2015
Propensity score properties:
(Rosenbaum and Robin 1983)
16

Average treatment effect (ATE)
ATE    E(Y1 | W  1)  E(Y0 | W  0)

The expected difference in responses of treatment
and control with the same value of PS e(x) equals
to ATE
ATE :  E(Y1 | W  1)  E(Y0 | W  0)  E(Y1  Y0 | e( x))

Subclassification of propensity scores:
 All units within a stratum have the same e(x)
7/17/2015
The average treatment effect at treated
(ATT)
17

Heckman (1997), suggest to use the average
treatment effect (ATT)
ATT    E(Y1  Y0 | W  1)  ATE  bias
where the selection bias is
bias  E(Y0 | W  1)  E(Y0 | W  0)

ATT is identified without selection bias
ATT  ATE .
7/17/2015
Propensity score matching
18




To find a units in control group that similar to the
treated subjects in all relevant pre-treatment
covariates
Subjects with same propensity score can imagine that
they were “randomly” assigned to each group
In this way, differences in outcomes of this well
selected and thus adequate control group can be
attributed to the treatment.
Matching procedures based on PS are known as
Propensity Score Matching (PSM)
7/17/2015
Matching algorithms
19

Greedy matching



Mahalanobis metric matching


“Nearest available”: for each treated subject, to find a
closest control subject based on PS
It depends on the overlap of treatment and control
groups
Uses PS and individual covariance values
Optimal matching

To minimize the average of distances among matched
units
7/17/2015
Greedy Matching
20

Greedy matching
 Nearest
neighbors: a closest propensity score pj from
control group is selected for i subject from treatment
C( Pi )  min j | Pi  Pj |, j  I 0
 With
caliper: imposing a tolerance on the maximum
distance,
| Pi  Pj |  , j  I 0
 Where
δ is a pre-specified tolerance,
 Recommended
caliper size: .25σp
7/17/2015
Mahalanobis metric matching
21

Without propensity scores:
 The Mahalanobis distance on covariates x between
units I and j is:
dij  ( xi  x j )T 1 ( xi  x j )
where ∑ is the sample covariance matrix.
with propensity scores: PS is added as one of the
covariates
Mahalanobis matching can be used to find out
matched pairs when the units number is small.



7/17/2015
Optimal matching
22


OPSM is the process of developing matched sets such
that the total sample distance of propensity scores is
minimized
Post Pair matching (1-to -1 treated-to-control subjects)
 Paired test
 Regression the difference scores on the outcome
variable between treated and control participants on
difference scores of covariates
d (Y )  0  1d ( X1 )  2d ( X 2 )   

The intercept ˆ0 is the estimated ATE
7/17/2015
Balance check before and after matching
23



Before matching: bivariate test
 Continuous covariates: Wilcoxon rank sum test or
independent sample t-test
 Categorical covariates: Chi-square test
After matching: bivariate test with some adjustments
 Wilcoxon matched pairs signed-rank test
 McNemar’s test
If the postmatching bivariate tests are nonsignificant,
then PS has removed group differences on the observed
covariates.
7/17/2015
Balance check
24

Haviland, Nagin and Rosenbaum (2007) developed
absolute standardized difference (Cohen’s d) in
covariate means
dX 

| M Xt  M Xc |
SX
, S X  (S X2 t  S X2 c ) / 2
The absolute standardized difference after
matching is d X
It is expected to have d X  d X
m

m
7/17/2015
Post-matching analysis
25


To estimate the average treatment effect (ATE)
To perform multivariate analysis after matching:

Multiple regression





Propensity scores weighting
Generalized linear model such as logistic or Poisson
regression
Survival analysis
Hierarchical linear modeling (HLM)
Structural equation modeling (SEM)
7/17/2015
LaLonde (1986) experimental data
26




Data used by Lalonde (1986)
It has been used by many aothors (Abadie et at.
2004, Dehejia and Wahba, 1999)
We use a sub data set in in R Package ‘CBPS’.
To test the effect of participation in a job training
program on individuals earnings in 1978


re78: the outcome of interest is real earnings in 1978
dollars.
treat: treatment variable, participation in job training
program
7/17/2015
LaLonde (1986) experimental data
27

There are eight pre-treatment covariates:





age (age),
years of education (educ),
real earnings in 1974, (re74)
Real earnings in 1975, (re75)
Indicator variables:
• black (black),
• Hispanic (hisp),
• married (married)
• lack of a high school diploma (nodegr).
7/17/2015
Group comparisons prior to matching
Descriptive statistics and P-values
28
Treatment (n = 297)
Control (n = 2915)
P-value
Effect Size
Covariates
Age
Mean /%
SD/N
Mean /%
SD/N
(t-test)
(d)
24.63
6.69
33.33
10.63
.000
13.84
Education
10.38
1.82
11.84
2.99
.000
8.22
Black
80.1%
238
33.1%
964
.000
16.64
Hispanic
9.4%
28
4.4%
129
.001
3.82
Married
16.8%
50
76.3%
2224
.000
23.19
Nodegr
73.1%
217
37.9%
1106
.000
11.97
re74
3571.00
5772
17133.32
13807.4
.000
16.78
re75
3066.10
4874.89
16725.08
13927.2
.000
16.80
re78
5976.35
6923.80
19152.79
15657
.000
14.36
7/17/2015
Group comparisons after matching
(from R output based on NN)
29
Treatment (n = 297)
Control (n = 297)
P-value
Effect Size
Mean /%
SD/N
Mean /%
SD/N
(t-test)
(d)
Age
24.63
6.69
25.05
7.44
.465
.659
Education
10.38
1.82
10.39
2.03
.932
.730
Black
80.1%
238
84.2%
250
.198
.953
Hispanic
9.4%
28
8.1%
24
.561
.431
Married
16.8%
50
15.8%
47
.739
.110
Nodegr
73.1%
217
74.1%
220
.780
.01
re74
3571.00
5772
3937.20
5365.39
.420
.955
re75
3066.10
4874.89
3327.86
4603.25
.501
.136
re78
5976.35
6923.80
6513.40
7130.44
.352
.415
Covariates
7/17/2015
Simple linear regression model:
30

Let outcome variable be, Y = re78
E (Y | t )    t

Where the treatment t is a dummy variable

The average outcome for the treated is:
E (Y | t  1)    

The average outcome for the control is:
E (Y | t  0)  
So the average treatment effect (ATE) is estimated by
the slope of t:
  E (Y | t  1)  E (Y | t  0)

7/17/2015
Stata commands and results
31
reg re78 treat
The expected average earning for the treated is lower than
7/17/2015
for control. It this a reliable result?
Multiple regression, controlling for
covariates
32
reg re78 treat educ
reg re78 treat educ age black hisp married nodegr re74 re75
7/17/2015
PS matching with STATA psmatch2
33

Install psmatch2



ssc install psmatch2
There are many subjects with re74 = 0 and re75 =
0. Those are unemployed.
In order to balance the proportion of unemployed
in the treatment and control groups, we create two
dummy indicators or unemployment:


gen un74 = (re74==0)
gen un75 = (re75==0)
7/17/2015
PS matching with STATA psmatch2
34

Estimate a logit model for the PS



logistic treat age educ black hisp married nodegr re74
re75 un74 un75
predict pscore,pr
Us psmatch2 for matching:


A simple NN matching without replacement
psmatch2 treat, pscore(pscore) outcome(re78)
noreplacement

The noreplacement option perform 1-to-1 matching without
replacement.
7/17/2015
psmatch2 output
35


Estimated treatment effect ATT
The average treatment effect is estimated as 202.95 with SE of 559.80 that is not significant.
7/17/2015
Matching with cliper
36

with a caliper, to impose a maximum tolerance of
.25SD.
| Pi  Pj | .25 p , j  I 0

The average treatment effect is estimated as 72.89
with SE of 556.71 that is not significant.
7/17/2015
Check balance after matching
37
Covariates are well balanced, %bias < 5%
7/17/2015
R syntax
38

To load R libraries and data set
library(MatchIt)
 library(Matching)
 data("LaLonde")


library(optmatch)
library(CBPS)
attach(LaLonde)
Balance check before matching using bivariate
analysis
t.test(LaLonde$age ~ LaLonde$treat,data = LaLonde)
 tb1<-table(LaLonde$treat, LaLonde$black)
 chisq.test(tb1)

7/17/2015
R syntax using “matchit” with
Nearest Neighbor (NN)
39




m.out<-matchit(treat~age + educ + black + hisp +
married + nodegr + re74 + re75+un74+un75,
data = LaLonde, method = "nearest",ratio = 1)
summary(m.out)
outdata<-match.data(m.out)
write.csv(outdata,file =
"F:/Workshops/PSM/lalondematchif.csv")
7/17/2015
plot(m.out,type = "jitter")
40
7/17/2015
plot(m.out,type = "hist")
41
7/17/2015
R syntax using “matchit” with 1-1
optimal matching
42



mopt.out<-matchit(treat~age + educ + black +
hisp + married+nodegr+re74+ re75+un74+un75,
data = LaLonde, method = "optimal", ratio = 1)
outdataopt<-match.data(mopt.out)
write.csv(outdataopt,file =
"F:/Workshops/PSM/lalondeopt.csv")
7/17/2015
Data file with Optimal 1-1 matching
43
7/17/2015
R syntax using “Matching”
44
# to estimated PS using logistic regression
glm1<-glm(Treat~ age + educ + black + hisp + married + nodegr + re74
+ re75+un74+un75, family = binomial, data = LaLonde)

boxplot(glm1$fitted ~ LaLonde$treat,xlab = "Treatment",ylab = " Propensity
Scores", main="LaLonde Data")

m1<-Match(Y = Y, Tr = Tr, X = glm1$fitted, estimand="ATT", M=1,
replace=FALSE)

sub<-rbind(m1$index.treated,m1$index.control)

subdata<-LaLonde[sub,] # to select sub data set with matched subjects

# check balance before and after matching
MatchBalance(Treat ~ age +educ+black+hisp+married + nodegr+ re74 +
re75+un74+un75, match.out = m1, nboots = 1000, data = LaLonde)

7/17/2015
Boxplot of propensity scores for
LaLonde data
45
boxplot(glm1$fitted ~ LaLonde$treat,xlab = "Treatment",ylab = " Propensity Scores",
main="LaLonde Data")
7/17/2015
Propensity Score Subclassifiation
46
1.
2.
•
3.
4.
5.
Sort the sample by estimated propensity scores
Divide the sample into K strata using quantiles of the
estimated propensity scores
A rule of thumb: discard all observations with PS
outside the range [.1,.9]
Evaluate treatment effect within each stratum
Estimate mean difference (ATE) for the whole sample
(all K strata combined) through aggregating
Test whether the sample difference on outcome is
statistically significant
7/17/2015
Stratification
47

Sorted data on PS into s stratums.
For each stratum s, to calculate the differences of
mean outcome between treated and control:

ˆs  E( yˆ s1 | ws  1)  E( yˆ s 0 | ws  0)
To calculate the average of treatment effect using
arithmetic mean:

n
ˆ   s Y1s  Y0 s ,
s 1 N
S

2
n 
Var(ˆ)    s  varY1s  Y0 s .
s 1  N 
S
To perform a statistical significance test
7/17/2015
Subclassification
48
Means
Standard Errors
No. of
subjects
Stratum
Control
Treat
Differences
Control
Treat
Diff.
Subclass1
5180.7
6708.9
1528.20
528.4
981.3
1050.48
150
Subclass2
6653.3
4836.6
-1816.72
726.3
854.1
1153.44
133
Subclass3
5702.0
7945.8
2243.78
789.8
1519.8
1572.18
125
Subclass4
3907.3
5001.1
1093.79
468.0
720.1
836.05
148
Subclass5
4140.2
4481.7
341.49
509.7
724.1
860.35
127
495
683
ATE
ˆs
ˆ  693
7/17/2015
Calculation of ATE based on
subclassification
49

To calculate the average of treatment effect using arithmetic
mean:S
ns

Y1s  Y0 s 
s 1 N
150
1528  1331817  125 2243  148 1094  127 341  693

683
683
683
683
683
ˆ  

To calculate 2weighted variance of ATE,
SE(ˆ)  Var(ˆ)  495
n 
var(ˆ)    s  varY1s  Y0 s 
s 1  N 
S
2
2
2
2
2
 150 
 150 
 133 
 125 
 127 
2
2
2
2
2

 1051  
 1153  
 2572  
 836  
 860
 683
 683
 683
 683
 683
 245025

Significance test:
z*   / SE(ˆ)  1.4
7/17/2015
Boxplot of PS for subclasses
50
7/17/2015
Boxplot of PS for subclasses
0.1<PS < .95
51
7/17/2015
Propensity Score Weighting
52

The estimated PS can be treated as a sampling
weight in multivariate analysis. (McCaffrey et al., 2004)
 To

use all subjects in a study
Calculation of weight for ATE
Wi
1  Wi
i  


e( X i ) 1 e( X i )
 For
treatment case, wi = 1,
 For control case, wi = 0, i 
1
e( Xi )
i  
1

1 e( X i )
7/17/2015
Comparisons of the Average Treatment
Effect (ATE)
53
Matching
Estimate ATE
ES (d) Analysis methods
Software
Nearest Neighbor
-202.95
.36
Independent sample t
Nearest Neighbor
-241.57
.415
Independent sample t
Stata
psmatch2
R matchit
Optimal matching
-85.56
.16
Paired t-test
R matchit
Optimal matching
49.42
.09
Regression of the
differences
R matchit
Nearest Neighbor
-537.05
.931
Independent sample t
R Matching
Sub classification
693.02
1.4
Sub classification
formula
R matchit
.10< PS < .90
PS weighting
Before matching
-10837
22.54 Weighted regression
SAS
-13176.44
14.36 Independent sample t
SPSS
7/17/2015
References
54




Guo, S. & Fraser, W.M. (2010). Propensity Score
Analysis Statistical Methods and Applications. Thousand
Oaks, CA: Sage Publications.
Heckman, J. J. (1979). Sample selection bias as a
specification error. Econometrica, 47, 153 – 161.
Rosenbaum, P.R. (2002) Observational Studies (2nd
ed.). New York: Springer.
Wooldridge, J. M. (2002). Econometric analysis of
cross section and panel data. Cambridge: MIT Press.
7/17/2015
References
55





Connors Jr AF, Speroff T, Dawson NV, et al. The effectiveness of right heart
catheterization in the initial care of critically ill patients. JAMA 1996; 276:
889-897.
D'Agostino Jr RB, Tutorial in biostatistics: propensity score methods for bias
reduction in the comparison of a treatment to a non-randomized control
group. Stat Med 1998; 17: 2265-2281.
Dehejia, R., & Wahba, S. (1999). Causal effects in nonexperimental studies:
Reevaluating the evaluation of training programs. Journal of the American
Statistical Association, 94, 1053-1062.
Freedman, D.A., & Berk, r. A. (2008). Weighting regressions by propensity
scores. Evaluation Review, 32, 392 – 409.
Guo S. and Fraser W. M. (2009). Propensity Score Analysis: Statistical
Methods and Applications. Thousand Oaks, CA: Sage publications.
7/17/2015
References
56





Heckman, J. J. (1979). Sample selection bias as a specification error.
Econometrica, 47, 153 – 161.
Heckman, J. J. (1997). Instrumental variables: A study of implicit behavioral
assumptions used in making program evaluations. Journal of Human
Resources, 32, 441 – 4632.
Haviland, A., Nagin, D. S. & Rosenbaum, P. R. (2007). Combining propensity
score matching and group-based trajectory analysis in an observational
study. Psychological Methods, 12, 247 – 267.
LaLonde, R.J. (1986). Evaluating the econometric evaluations of training
programs with experimental data. American Economic Review 76, 4, 604620.
McCaffrey et al., (2004). Propensity score estimation with boosted
regression for evaluating causal effects in observational studies.
Psychological Methods, 9, 403 – 425.
7/17/2015
References
57





Rosenbaum PR. Observational Studies. New York, NY: Springer-Verlag,
2002.
Rosenbaum P.R & Rubin D.B. (1983). The central role of the propensity
score in observational studies for causal effects. Biometrika 1983; 70:4155.
Rosenbaum P.R & Rubin D.B. (1984). Reducing bias in observational studies
using subclassification on the propensity score. Journal of the American
Statistical Association, 79, 516-524.
Rubin DB. Estimating causal effects from large data sets using propensity
scores. Annal of Internal Medicine 1997; 127: 757-763.
Sekhon, J. S. (2011). Multivariate and propensity score matching software
with automated balance optimization: The matching package for R. Journal
of Statistical Software. 42(7).
7/17/2015
POPULATION RESEARCH SEMINAR SERIES
Sponsored by the Statistics and Survey Methods Core of the U54 Partnership
Questions? Comments?
Send us an email!
[email protected]