A workshop introducing doubly robust estimation of treatment effects Michele Jonsson Funk, PhD UNC/GSK Center for Excellence in Pharmacoepidemiology University of North Carolina at Chapel.
Download ReportTranscript A workshop introducing doubly robust estimation of treatment effects Michele Jonsson Funk, PhD UNC/GSK Center for Excellence in Pharmacoepidemiology University of North Carolina at Chapel.
A workshop introducing doubly robust estimation of treatment effects Michele Jonsson Funk, PhD UNC/GSK Center for Excellence in Pharmacoepidemiology University of North Carolina at Chapel Hill Conflict of Interest Statement Macro development funded by the Agency for Healthcare Research and Quality via a supplemental award to the UNC CERTs (U18 HS10397-07S1) Additional support from the UNC/GSK Center for Excellence in Pharmacoepidemiology and Public Health. No potential conflicts of interest with respect to this work. 2 Regression models assume that… The parametric form is correct. We have included correct predictors. Should we use logistic regression, or logbinomial? Should we really include age in this model? Those predictors have been specified correctly. Should age be coded continuously or in 10 year categories? Is there an interaction with race? What about higher order terms? Etc… 3 What if the model is wrong? Lunceford & Davidian, Stat Med, 2004 Omit a true confounder (extreme example) True relationships known (simulated data) Vary associations between factor – outcome Confounder – exposure Risk 4 ML outcome regression: false model %bias 0 -10 -20 -11 -18 -15 -21 -28 -35 -30 -40 Str Mod None Risk factor – outcome assn Lunceford & Davidian, Stat Med, 2004 5 Doubly robust (DR) estimator: false model for outcome regression -1 %bias 0 -1 -1 -1 -1 -1 -10 -20 -30 -40 Str Mod None Risk factor – outcome assn Lunceford & Davidian, Stat Med, 2004 6 ML outcome regression: true model CI Coverage 100 95 90 94.7 95 Str Mod 94.8 96.4 85 80 None Risk factor – outcome assn Lunceford & Davidian, Stat Med, 2004 7 DR: true models for propensity score & outcome regression CI Coverage 100 95 90 95.9 95.6 94.5 94.3 94.9 95.6 Str Mod None 85 80 Risk factor – outcome assn Lunceford & Davidian, Stat Med, 2004 8 ML outcome regression: false model CI Coverage 100 95 90 85 80 0 0 Str 0 0 Mod 0 0 None Risk factor – outcome assn Lunceford & Davidian, Stat Med, 2004 9 DR: true model for propensity score & false model for outcome regression CI Coverage 100 95 90 95.7 95.2 93.2 94.4 93.9 Str Mod None 85 95 80 Risk factor – outcome assn Lunceford & Davidian, Stat Med, 2004 10 Doubly robust (DR) estimation from 30,000 feet Robins & colleagues recognized the doubly robust property in mid-90’s Combines standardization (or reweighting) with regression Part of the family of methods that includes propensity scores and inverse probability weighting 11 Conceptual description Doubly robust (DR) estimation uses two models: Propensity score model for the confounder - exposure (or treatment) relationship Outcome regression model for the confounder – outcome relationship, under each exposure condition These two stages can use: different subsets of covariates, and different parametric forms. If either model is correct, then the DR estimate of treatment effect is unbiased. 12 Two stages Risk factors (potential confounders) Exposure (Treatment) Outcome 13 Causal effect of interest Comparing counterfactual scenarios E(Y1): Whole population treated (exposed) vs. E(Y0): Whole population untreated (unexposed) Average causal effect of treatment – E(Y0) : difference E(Y1) / E(Y0) : ratio E(Y1) In non-randomizes studies, the unexposed may not fairly reflect what would have happened to the exposed had they been unexposed (confounding) 14 Doubly robust estimator Y: outcome Z: binary treatment (exposure) X: baseline covariates (confounders plus other prognostic factors) e(X,β): model for the true propensity score m0(X,α0) and m1(X,α1): regression models for true relationship between covariates and the outcome within each strata of treatment Causal effect of interest (deltaDR): difference in mean response if everyone in the population received treatment versus everyone receiving no treatment; E(Y1)-E(Y0). ΔDR = E(Y1) - E(Y0) Adapted from Davidian M, DR Presentation, 2007 15 Doubly robust estimator E(Y1): average popn response with treatment / exposure Adapted from Davidian M, DR Presentation, 2007 16 Average population response with treatment (μ1,DR) IPTW Estimator Adapted from Davidian M, DR Presentation, 2007 “Augmentation” 17 True PS model; false regression model (I) Propensity score model Adapted from Davidian M, DR Presentation, 2007 Regression model 18 True PS model; false regression model (II) Assuming no unmeasured confounders ! Adapted from Davidian M, DR Presentation, 2007 19 False PS model; true regression model (I) Propensity score model Adapted from Davidian M, DR Presentation, 2007 Regression model 20 False PS model; true regression model (II) Assuming no unmeasured confounders ! Adapted from Davidian M, DR Presentation, 2007 21 Overly simplified statistics ΔDR = [E(Y1) + junk] - [E(Y0) + junk] Where junk = 0 if either the propensity score or the regression model is true… ΔDR = E(Y1) - E(Y0) Adapted from Davidian M, DR Presentation, 2007 22 Standard errors Option 1: Sandwich estimator Option 2: Bootstrap Adapted from Davidian M, DR Presentation, 2007 23 Simulation findings Bang & Robins 2005 N=500, 1000 iterations False propensity score model 1 of 4 true predictors of tx 1 ‘noise’ variable, independent of tx False outcome regression model Omit one risk factor, an higher order term and an interaction term 24 Bias under false models Analysis True Method Model(s) PS False Model PS OR Both -0.01 0.86 OR 0.00 DR 0.00 0.00 -0.09 0.92 H Bang & JM Robins, Biometrics (2005). -1.56 25 Variance under false models Analysis True Model False Model PS OR PS 0.21 0.15 OR 0.07 DR 0.09 0.08 0.28 H Bang & JM Robins, Biometrics (2005). Both 0.07 0.15 26 Recapping L&D simulations Compare performance of propensity score analysis, IPW, outcome regression (OR) and DR Omit a true confounder (extreme example) True relationships known (simulated data) Vary associations between factor – outcome Confounder – exposure Risk Vary sample size 27 If all models are true… Bias <3% for all methods except for PS analysis using strata (due to residual confounding) Variance similar in general VarOR < VarDR (slightly) if confounder-exp relationship is strong VarDR < VarIPW If OR model is right, most efficient. But we have no way of knowing whether or not it’s right. Lunceford & Davidian, Stat Med, 2004 28 If outcome regression model is false… Bias o Efficiency o o o DR nearly as efficient as correct model except when conf-exp relationship strong DR always more efficient than IPW Confidence interval coverage o DR always <1%; OR biased by 10-20% in most scenarios DR coverage nominal ML coverage poor Adding risk factors to PS model improves precision If both are nearly right (only a little wrong), bias is small Lunceford & Davidian, Stat Med, 2004 29 Discussion If method offers some protection against model misspecification, why isn’t it being used by pharmacoepidemiologists? 30 SAS macro for DR estimation Objectives Facilitate wider use of DR estimation Improve performance by implementing sandwich estimator for SEs Enhance usability by following SAS conventions Provide user with relevant diagnostic details 31 http://www.harryguess.unc.edu SAS macro for doubly robust estimation including documentation Dataset for sample analyses (1.7MB, optional) 32 Running the DR macro By design, the DR macro uses common SAS® syntax for specifying the source dataset, variables for modeling, and additional options: %dr(%str(options data=SAS-data-set descending; wtmodel exposure = x y z / method=dr dist=bin showcurves; model outcome = x y z / dist=n ; ) ); 33 Running the DR macro %dr(%str(options data=SAS-data-set descending; wtmodel exposure = x y z / method=dr dist=bin showcurves; model outcome = x y z / dist=n; ) ); 34 Running the DR macro %dr(%str(options data=SAS-data-set descending; wtmodel exposure = x y z / method=dr dist=bin showcurves; model outcome = x y z / dist=n; ) ); 35 Running the DR macro %dr(%str(options data=SAS-data-set descending; wtmodel exposure = x y z / method=dr dist=bin showcurves; model outcome = x y z / dist=n; ) ); 36 DR macro: output Propensity score (wtmodel) results Descriptive statistics for weights Graph of propensity score curves by exposure status Reweighted regression model among the unexposed (dr0) Reweighted regression model among the exposed (dr1) Doubly robust estimate and standard error 37 DR macro: output average response had all been unexposed, adjusted for risk factors n used in the analysis. usedobs<totalobs due to missing data or use of common support option average response had all been exposed, adjusted for risk factors dr1 – dr0; difference in mean response for continuous outcome; risk difference for dichotomous outcome n in dataset Obs 1 totalobs 100000 SE of deltaDR usedobs dr0 79292 .005546853 dr1 0.034117 deltadr se 0.028570 .002026204 38 Example analysis CVD Outcomes Continuous: CVD score (i.e. LDL) Binary: acute MI Exposure (treatment): statin use (yes/no) 50% of population exposed 10 covariates (5 continuous, 5 binary) Data are simulated, so true relationships among exposure, covariates & outcome are known 39 Example analysis %dr(%str(options data=final descending; wtmodel statin=hs smk hxcvd black age bmi exer chol income / method=dr dist=bin showcurves common_support=.99; model cvdscore=hs female smk hxcvd age age2 bmi bmi2 exer chol / dist=n; )); 40 Propensity scores from ‘showcurves’ option Unexposed 4. 0 0 3. 5 3. 0 P e r c e n t 2. 5 2. 0 1. 5 1. 0 0. 5 0 4. 0 Exposed 3. 5 1 3. 0 P e r c e n t 2. 5 2. 0 1. 5 1. 0 0. 5 0 - 0. 09 - 0. 01 0. 07 0. 15 0. 23 0. 31 0. 39 0. 47 0. 55 0. 63 0. 71 0. 79 0. 87 0. 95 1. 03 E s t i ma t e d P r o b a b i l i t y Propensity Score 41 Results from sample analysis Effect Estimates True Crude Maximum likelihood Doubly robust PS model Outcome model Correct Correct Correct Incorrect Incorrect Correct Incorrect Incorrect Result -1.099 1.869 -1.089 %bias SE 270.0% 0.089 0.9% 0.023 -1.102 -1.117 -0.3% 0.025 -1.7% 0.089 -1.093 0.397 0.5% 0.022 136.1% 0.049 42 Validation: simulation methods Draw random sample (n) from simulated population Vary n from 100 to 5000 Estimate doubly robust effect of treatment and standard error Repeat 1000 times 43 Continuous outcome n D R Estim a te %b ia s S D (D R ) m e a n S E S D (d r)/m e a n S E C I co ve ra g e T ru e R D = 0 rm i1a 5000 0.01 0.8% 0.036 0.036 1.00 94.6 1000 0.01 0.9% 0.083 0.081 1.03 94.5 500 0.00 0.2% 0.119 0.113 1.05 93.7 100 0.02 2.1% 0.312 0.245 1.27 87.3 5000 -0.40 0.3% 0.035 0.036 0.98 95.4 1000 -0.40 0.9% 0.078 0.079 0.98 95.9 500 -0.41 0.0% 0.118 0.113 1.04 94.4 100 -0.41 -0.6% 0.299 0.246 1.21 91.2 5000 -1.10 -0.3% 0.035 0.035 1.01 94.6 1000 -1.10 -0.1% 0.081 0.078 1.03 94 500 -1.10 -0.3% 0.113 0.111 1.01 94.6 100 -1.09 0.9% 0.297 0.239 1.24 89.5 5000 -1.61 0.0% 0.034 0.035 0.98 95.6 1000 -1.61 -0.1% 0.082 0.078 1.05 93.6 500 -1.61 -0.1% 0.114 0.109 1.05 93.1 100 -1.62 -0.5% 0.318 0.246 1.30 87 T ru e R D = -0.41 rm i2a T ru e R D = -1.10 rm i3a T ru e R D = -1.61 rm i4a 44 Continuous outcome n D R Estim a te %b ia s S D (D R ) m e a n S E S D (d r)/m e a n S E C I co ve ra g e T ru e R D = 0 rm i1a 5000 0.01 0.8% 0.036 0.036 1.00 94.6 1000 0.01 0.9% 0.083 0.081 1.03 94.5 500 0.00 0.2% 0.119 0.113 1.05 93.7 100 0.02 2.1% 0.312 0.245 1.27 87.3 5000 -0.40 0.3% 0.035 0.036 0.98 95.4 1000 -0.40 0.9% 0.078 0.079 0.98 95.9 500 -0.41 0.0% 0.118 0.113 1.04 94.4 100 -0.41 -0.6% 0.299 0.246 1.21 91.2 5000 -1.10 -0.3% 0.035 0.035 1.01 94.6 1000 -1.10 -0.1% 0.081 0.078 1.03 94 500 -1.10 -0.3% 0.113 0.111 1.01 94.6 100 -1.09 0.9% 0.297 0.239 1.24 89.5 5000 -1.61 0.0% 0.034 0.035 0.98 95.6 1000 -1.61 -0.1% 0.082 0.078 1.05 93.6 500 -1.61 -0.1% 0.114 0.109 1.05 93.1 100 -1.62 -0.5% 0.318 0.246 1.30 87 T ru e R D = -0.41 rm i2a T ru e R D = -1.10 rm i3a T ru e R D = -1.61 rm i4a 45 Continuous outcome n D R Estim a te %b ia s S D (D R ) m e a n S E S D (d r)/m e a n S E C I co ve ra g e T ru e R D = 0 rm i1a 5000 0.01 0.8% 0.036 0.036 1.00 94.6 1000 0.01 0.9% 0.083 0.081 1.03 94.5 500 0.00 0.2% 0.119 0.113 1.05 93.7 100 0.02 2.1% 0.312 0.245 1.27 87.3 5000 -0.40 0.3% 0.035 0.036 0.98 95.4 1000 -0.40 0.9% 0.078 0.079 0.98 95.9 500 -0.41 0.0% 0.118 0.113 1.04 94.4 100 -0.41 -0.6% 0.299 0.246 1.21 91.2 5000 -1.10 -0.3% 0.035 0.035 1.01 94.6 1000 -1.10 -0.1% 0.081 0.078 1.03 94 500 -1.10 -0.3% 0.113 0.111 1.01 94.6 100 -1.09 0.9% 0.297 0.239 1.24 89.5 5000 -1.61 0.0% 0.034 0.035 0.98 95.6 1000 -1.61 -0.1% 0.082 0.078 1.05 93.6 500 -1.61 -0.1% 0.114 0.109 1.05 93.1 100 -1.62 -0.5% 0.318 0.246 1.30 87 T ru e R D = -0.41 rm i2a T ru e R D = -1.10 rm i3a T ru e R D = -1.61 rm i4a 46 Dichotomous outcome n D R Estim a te %b ia s S D (D R ) m e a n S E S D (d r)/m e a n S E C I co ve ra g e T ru e R D = 0.0000 m i1 5000 -0.002 -0.2% 0.008 0.008 0.98 96.0 1000 -0.001 -0.1% 0.019 0.018 1.10 92.4 500 0.000 0.0% 0.028 0.024 1.16 91.2 100 0.001 0.1% 0.078 0.040 1.93 68.5 5000 -0.025 -9.3% 0.009 0.008 1.03 93.7 1000 -0.024 -4.2% 0.020 0.018 1.08 92.9 500 -0.023 -2.6% 0.030 0.025 1.18 90.8 100 -0.016 31.4% 0.075 0.041 1.85 69.5 5000 -0.068 -2.0% 0.008 0.008 1.01 94.6 1000 -0.069 -2.7% 0.019 0.018 1.05 93.0 500 -0.069 -2.9% 0.027 0.025 1.06 92.6 100 -0.061 8.8% 0.074 0.043 1.74 72.1 5000 -0.098 -1.0% 0.009 0.008 1.04 93.6 1000 -0.099 -2.3% 0.018 0.018 1.00 94.0 500 -0.097 0.2% 0.029 0.025 1.13 91.6 100 -0.088 9.0% 0.074 0.043 1.70 73.7 T ru e R D = -0.0228 m i2 T ru e R D = -0.0670 m i3 T ru e R D = -0.0970 m i4 47 Dichotomous outcome n D R Estim a te %b ia s S D (D R ) m e a n S E S D (d r)/m e a n S E C I co ve ra g e T ru e R D = 0.0000 m i1 5000 -0.002 -0.2% 0.008 0.008 0.98 96.0 1000 -0.001 -0.1% 0.019 0.018 1.10 92.4 500 0.000 0.0% 0.028 0.024 1.16 91.2 100 0.001 0.1% 0.078 0.040 1.93 68.5 5000 -0.025 -9.3% 0.009 0.008 1.03 93.7 1000 -0.024 -4.2% 0.020 0.018 1.08 92.9 500 -0.023 -2.6% 0.030 0.025 1.18 90.8 100 -0.016 31.4% 0.075 0.041 1.85 69.5 5000 -0.068 -2.0% 0.008 0.008 1.01 94.6 1000 -0.069 -2.7% 0.019 0.018 1.05 93.0 500 -0.069 -2.9% 0.027 0.025 1.06 92.6 100 -0.061 8.8% 0.074 0.043 1.74 72.1 5000 -0.098 -1.0% 0.009 0.008 1.04 93.6 1000 -0.099 -2.3% 0.018 0.018 1.00 94.0 500 -0.097 0.2% 0.029 0.025 1.13 91.6 100 -0.088 9.0% 0.074 0.043 1.70 73.7 T ru e R D = -0.0228 m i2 T ru e R D = -0.0670 m i3 T ru e R D = -0.0970 m i4 48 Dichotomous outcome n D R Estim a te %b ia s S D (D R ) m e a n S E S D (d r)/m e a n S E C I co ve ra g e T ru e R D = 0.0000 m i1 5000 -0.002 -0.2% 0.008 0.008 0.98 96.0 1000 -0.001 -0.1% 0.019 0.018 1.10 92.4 500 0.000 0.0% 0.028 0.024 1.16 91.2 100 0.001 0.1% 0.078 0.040 1.93 68.5 5000 -0.025 -9.3% 0.009 0.008 1.03 93.7 1000 -0.024 -4.2% 0.020 0.018 1.08 92.9 500 -0.023 -2.6% 0.030 0.025 1.18 90.8 100 -0.016 31.4% 0.075 0.041 1.85 69.5 5000 -0.068 -2.0% 0.008 0.008 1.01 94.6 1000 -0.069 -2.7% 0.019 0.018 1.05 93.0 500 -0.069 -2.9% 0.027 0.025 1.06 92.6 100 -0.061 8.8% 0.074 0.043 1.74 72.1 5000 -0.098 -1.0% 0.009 0.008 1.04 93.6 1000 -0.099 -2.3% 0.018 0.018 1.00 94.0 500 -0.097 0.2% 0.029 0.025 1.13 91.6 100 -0.088 9.0% 0.074 0.043 1.70 73.7 T ru e R D = -0.0228 m i2 T ru e R D = -0.0670 m i3 T ru e R D = -0.0970 m i4 49 Caveats SEs conservative when sample size is small; bootstrapping may be used in this case to get more appropriate SEs Macro only provides difference estimates (not RR or OR) for now Exposure must be dichotomous; outcome must be continuous or dichotomous (time-to-event analysis not supported) Some SAS conventions not recognized within the macro code where and class statements not recognized interaction terms and higher order polynomials must be created in a prior data step 50 Practical considerations How to choose which covariates to include? Good question. Based on simulations from PS literature Include all risk factors for outcome May omit predictors of tx that do not affect outcome 51 Practical considerations What to do with estimates from various models that differ? Effect Estimates Crude Maximum likelihood Propensity score Doubly robust III. SAS Macro Result 1.90 -1.09 -1.50 -1.12 %bias ? ? ? ? SE 0.089 0.023 0.050 0.024 52 Practical considerations What sort of diagnostics should be checked? Potentially influential obs with extreme PS values ‘common_support’ option in SAS macro Distribution of PS scores stratified by treatment / exposure group ‘showcurves’ option in SAS macro 53 Checking PS distribution Strata 1 2 3 4 5 6 Tx=0 Tx=1 0 0.5 Propensity score 1 54 Checking PS distribution Strata 1 2 3 4 5 6 Tx=0 Tx=1 0 0.5 Propensity score 1 55 Checking PS distribution Strata 1 2 3 4 5 6 Tx=0 Tx=1 0 0.5 Propensity score 1 56 Limitations DR estimation is not a panacea for unmeasured confounding. Recall- ‘junk’ only reduces to 0 with assumption of no unmeasured confounders One of the models must be correct for the estimator to be unbiased Bang & Robins suggest that it will be minimally biased if both models are nearly right… Standard errors tend to be slightly larger compared to a single correctly specified regression model Explaining DR estimation in your methods section could be interesting… 57 Applications DR estimation potentially valuable for comparative effectiveness studies, and in particular for head-to-head comparisons of treatment effectiveness or adverse events from observational data when RCTs can’t or won’t be done... for ethical reasons, for economic reasons, for reasons of rare or late-effect outcomes, or for reasons of the need to conduct faster analyses of possible sentinel events 58 Extensions Missing data Incomplete follow-up in RCTs Longitudinal marginal structural models Goodness of fit test? 59 Summary Observational studies of treatment effects depend on statistical models to disentangle causal effects from confounding We can never be certain that the statistical model we have chosen is correct DR estimate unbiased if at least one of the two component models is right and therefore provides some protection against model misspecification The ‘price’ of double robustness is slightly larger standard errors than a single correctly specified regression model Assumption of no unmeasured confounders required 60 References Bang, H. & J.M. Robins: Doubly-robust estimation in missing data and causal inference models. Biometrics 2005, 61, 962–973. Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine 23, 2937–2960. Robins, J. M. (2000). Robust estimation in sequentially ignorable missing data and causal inference models. Proceedings of the American Statistical Association Section on Bayesian Statistical Science, 6–10. Robins, J. M., Rotnitzky, A., and Zhao L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846–866. Rotnitzky, A., Robins, J. M., and Scharfstein, D. O. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321–1339. Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association 94, 1096–1120 (with Rejoinder, 1135–1146). Van der Laan, M. J. and Robins, J. M. (2003). Unified Methods for Censored Longitudinal Data and Causality. New York: Springer-Verlag. 61 Acknowledgements Collaborators on the development of the SAS macro: Chris Wiesen, PhD, Odum Institute for Research in Social Science, University of North Carolina, Chapel Hill, NC Daniel Westreich, MSPH, Department of Epidemiology, University of North Carolina, Chapel Hill, NC Marie Davidian, PhD, Department of Statistics, North Carolina State University, Raleigh, NC 62 Acknowledgements (II) Agency for Healthcare Research and Quality Supplemental Award to the UNC CERTs (U18 HS10397-07S1) UNC/GSK Center for Excellence in Pharmacoepidemiology and Public Health Kevin Anstrom, Lesley Curtis, Brad Hammill, and Rex Edwards from the Duke CERTs team for valuable feedback on the alpha version. Thanks to students from UNC’s EPID 369/730, a causal modeling course, for valuable feedback on the beta version. Presented in memory of Harry Guess, MD, PhD, 1940-2006, who co-authored the initial proposal to develop a SAS macro for doubly robust estimation. 63 Contact Information Michele Jonsson Funk, PhD Research Assistant Professor Department of Epidemiology University of North Carolina Chapel Hill NC 27599-7521 [email protected] 919-966-8431 (ph) 919-843-3120 (fax) http://www.harryguess.unc.edu 64 Questions & Discussion 65