Using SAS® to Perform Individual Matching in Design of Case-Control Studies
061-2010
Greg Grandits, Jacqueline Neuhaus
Division of Biostatistics, University of Minnesota, Minneapolis, MN

Case-Control Study
• Cases are subjects with the disease
• Controls are subjects without the disease
• Assess a (new) risk factor (RF) for the disease
• The new RF is available only upon further measurement
• Control for standard RFs
• Match each case with 1 or more controls on other risk factors (e.g. age, gender)

[Figure: a control is picked from the pool of controls to match each case; cases and controls both carry the variables Age, Gender, Race, Previous disease, and New RF.]

MRFIT PSA Case-Control Study

Matched Case-Control Program Algorithm
1. Sort the control dataset randomly.
2. Select the first case from the case dataset.
3. Read through the sorted control dataset to find a match.
   • Match found: output (append) the case and the matched control to the match dataset, then remove the matched control from the control dataset. If another control is needed for the case, repeat step 3; if no more controls are needed, select the next case and repeat step 3.
   • Match not found: output (append) the case to the no-match dataset, then select the next case and repeat step 3.
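The algorithm above can be sketched as a single DATA step for 1:1 matching. This is a minimal sketch under stated assumptions, not the authors' %match_cc macro: the datasets CASES and CONTROLS and their values are hypothetical toy data, the matching rules are assumed to be those of the PSA study (same clinic, age within 1 year, control lived at least as long as the case), and instead of rewriting the control dataset after each match it flags used controls in a temporary array, assumed here to need at most 10,000 slots.

```sas
/* Hypothetical toy data: cases and a pool of candidate controls */
data cases;
  input ptid $ nclinic age fopdays;
  datalines;
A1 2 46 4000
A2 2 52 8565
A3 5 60 1000
;
run;

data controls;
  input ptid $ nclinic age fopdays;
  datalines;
C1 2 45 9000
C2 2 51 9352
C3 2 47 3000
C4 7 60 5000
;
run;

/* Step 1: put the controls in random order, prefixing their variables with c_ */
proc sql;
  create table random_controls as
  select ptid as c_ptid, nclinic as c_nclinic, age as c_age,
         fopdays as c_fopdays, ranuni(12345) as random
  from controls
  order by random;
quit;

/* Steps 2-3: take each case in turn and scan the controls for the first match */
data match(keep=setnumber ptid nclinic age fopdays ccstat)
     nomatch(keep=ptid nclinic age fopdays);
  array used[10000] _temporary_;   /* 1 = control already matched (assumed max 10,000 controls) */
  set cases;
  found = 0;
  do i = 1 to totobs until (found);
    set random_controls point=i nobs=totobs;
    if not used[i] and nclinic = c_nclinic and abs(age - c_age) <= 1
       and c_fopdays >= fopdays then do;
      found = 1;
      used[i] = 1;                 /* "remove" this control from later searches */
      setnumber + 1;               /* sum statement: retained matched-set counter */
      ccstat = 1; output match;    /* the case */
      ptid = c_ptid; nclinic = c_nclinic; age = c_age; fopdays = c_fopdays;
      ccstat = 2; output match;    /* its matched control */
    end;
  end;
  if not found then output nomatch;
run;
```

With these toy rows the result does not depend on the random order: A1 can only pair with C1 (C3 has too little follow-up), A2 pairs with C2, and A3 goes to NOMATCH because no control shares its clinic. Because used controls are flagged by row number, the control dataset keeps its random order and is never re-sorted between cases.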
Matching Criteria for PSA Study
• Age at specimen draw within ± 1 year
• Same clinical center (1-22)
• Control lived at least as long as the case
• 63 cases and 63 matched controls

Call to Macro
%match_cc ( casedata    = cases,
            controldata = controls,
            matchvar    = nclinic age,
            matchval    = 0 1,
            fopvar      = fopdays)
Match exactly on clinic and within one year of age (matchval gives the allowed difference for each matchvar: 0 for nclinic, 1 for age).

Sort Controls Randomly
* Sort control dataset by a random number;
proc sql;
  create table random_controls as
  select *, ranuni(12345) as random
  from controls
  order by random;
quit;

Select Case, Read Through Controls Looking for a Match, Output Case and Matched Control
data match nomatch used;
  set active;                          * Reads in case;
  do i = 1 to totobs;                  * Loops through controls;
    set random_controls point=i nobs=totobs;
    if abs(nclinic - c_nclinic) <= 0 and abs(age - c_age) <= 1 then do;
      if c_fopdays >= fopdays then do;
        * We have a match!;
        ccstat = 1;
        output match;                  * This is the case data, adding var ccstat = 1;
        * Store control values in variables with same name as case;
        * Then output again to same dataset;
        nclinic = c_nclinic;
        age     = c_age;
        fopdays = c_fopdays;
        ptid    = c_ptid;
        ccstat  = 2;
        output match;
        * Output control to USED dataset;
        output used;
        stop;                          * End DATA step since we have a match;
      end;
    end;
  end;                                 * Ends I loop;
  output nomatch;                      * If I loop is exhausted then no match;
run;

Remove Used Control
* Remove used control from control dataset;
* Look up c_ptid so the random order is kept and no sort is needed;
data random_controls;
  if _n_ = 1 then do;
    declare hash u(dataset: "used(keep=c_ptid)");
    u.defineKey("c_ptid");
    u.defineDone();
  end;
  set random_controls;
  if u.check() ne 0;                   * Keep only controls not in USED;
run;

Matching Results
List of Matched Cases and Controls (First 6; ccstat 1 = case, 2 = control)
Obs  setnumber  PTID     AGE  fopdays  nclinic  ccstat
1    1          B049403  46   4329     2        1
2    1          B190124  45   8942     2        2
3    2          B111989  52   8565     2        1
4    2          B015594  51   9352     2        2
5    3          B150888  50   8894     2        1
6    3          B003897  49   9450     2        2

Study Results
PSA Quartile (ng/ml)   RR     95% CI
0.2 – 0.6              1.00   (reference)
0.7 – 1.0              1.51   0.44 – 5.22
1.1 – 1.8              3.16   0.99 – 10.1
1.9 – 20.7             7.26   1.95 – 27.0

Summary
• Program used regularly at the Division of Biostatistics
• Can be easily modified for more complex matching algorithms
• Macro found at: http://www.biostat.umn.edu/~greg-g
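The summary notes that the program can be modified for more complex designs, and the algorithm's "If another control needed for case" branch already covers 1:K matching. The following is a minimal sketch of that variant, assuming hypothetical toy data, a hypothetical macro variable K for the number of controls per case, the PSA study's matching rules, and at most 10,000 controls. A case that finds fewer than K controls keeps the ones it found and is also written to NOMATCH; that is one reading of the "Match not found" branch, not necessarily what %match_cc does.

```sas
%let k = 2;   /* hypothetical: number of controls wanted per case */

/* Hypothetical toy data */
data cases;
  input ptid $ nclinic age fopdays;
  datalines;
A1 2 46 4000
A2 3 60 1000
;
run;

data controls;
  input ptid $ nclinic age fopdays;
  datalines;
C1 2 45 9000
C2 2 47 5000
C3 2 46 3000
C4 3 61 2000
;
run;

/* Random order, with control variables prefixed by c_ */
proc sql;
  create table random_controls as
  select ptid as c_ptid, nclinic as c_nclinic, age as c_age,
         fopdays as c_fopdays, ranuni(12345) as random
  from controls
  order by random;
quit;

data match(keep=setnumber ptid nclinic age fopdays ccstat)
     nomatch(keep=ptid nclinic age fopdays nfound);
  array used[10000] _temporary_;   /* 1 = control already matched (assumed max 10,000 controls) */
  array pick[&k] _temporary_;      /* row numbers of the controls picked for this case */
  set cases;
  nfound = 0;
  /* Scan the randomly ordered controls until K matches are found */
  do i = 1 to totobs while (nfound < &k);
    set random_controls point=i nobs=totobs;
    if not used[i] and nclinic = c_nclinic and abs(age - c_age) <= 1
       and c_fopdays >= fopdays then do;
      nfound = nfound + 1;
      used[i] = 1;
      pick[nfound] = i;
    end;
  end;
  /* Write short cases to NOMATCH before the case values are overwritten */
  if nfound < &k then output nomatch;
  if nfound > 0 then do;
    setnumber + 1;
    ccstat = 1; output match;      /* the case */
    do j = 1 to nfound;            /* each picked control, re-read by row number */
      p = pick[j];
      set random_controls point=p;
      ptid = c_ptid; nclinic = c_nclinic; age = c_age; fopdays = c_fopdays;
      ccstat = 2; output match;
    end;
  end;
run;
```

With the toy data, A1 gets two controls (C1 and C2) and A2 gets only C4, so A2 appears in both MATCH and NOMATCH. Setting %let k = 1; gives 1:1 matching. Like the flowchart, this is greedy: earlier cases get first pick of the controls.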