Using SAS® to Perform Individual Matching in Design of Case-Control Studies
061-2010
Greg Grandits
Jacqueline Neuhaus
Division of Biostatistics, University of Minnesota, Minneapolis, MN
Case-Control Study
• Cases are subjects with disease
• Controls are subjects without disease
• Assess (new) risk factor (RF) for disease
• New RF available upon further measurement
• Control for standard RF
• Match each case with 1 or more controls on other risk factors (e.g., age, gender)
Cases
Variables: Age, Gender, Race, Previous disease, New RF
Pool of Controls
Variables: Age, Gender, Race, Previous disease, New RF
Pick control to match case
MRFIT PSA Case-Control Study
Matched Case-Control Program Algorithm
• Sort control dataset randomly
• Select first case from case dataset
• Read through sorted control dataset to find match
• Match not found: output (append) case to no match dataset, then select next case
• Match found: output (append) case and matched control to match dataset, then remove matched control from control dataset
• If another control needed for case: read through control dataset again
• No more controls needed for case: select next case
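The flowchart can be sketched as a short macro loop. This is a minimal sketch under stated assumptions, not the authors' %match_cc macro: the macro name greedy_match, the toy datasets, and the work datasets one_match, one_nomatch, one_used and remaining are all hypothetical. It takes only one control per case and hard-codes the PSA study criteria (same clinic, age within 1 year, control follow-up at least as long as the case).

```sas
* Toy data, invented for illustration only;
data cases;
  input ptid $ nclinic age fopdays;
  datalines;
A1 2 46 4000
A2 2 52 8000
A3 5 60 3000
;
run;
data controls;
  input c_ptid $ c_nclinic c_age c_fopdays;
  datalines;
B1 2 45 9000
B2 2 51 9500
B3 2 47 1000
;
run;

* Sort control dataset randomly;
proc sql;
  create table random_controls as
    select *, ranuni(12345) as random
    from controls
    order by random;
quit;

* Start from empty result datasets;
proc datasets lib=work nolist nowarn;
  delete match nomatch;
quit;

%macro greedy_match; /* hypothetical name */
  %local i ncases;
  proc sql noprint;
    select count(*) into :ncases from cases;
  quit;

  %do i = 1 %to &ncases;
    /* Select the i-th case */
    data active;
      set cases (firstobs=&i obs=&i);
    run;

    /* Read through the controls and stop at the first match */
    data one_match
         one_nomatch (keep=ptid nclinic age fopdays)
         one_used (keep=c_ptid);
      set active;
      do i = 1 to totobs;
        set random_controls point=i nobs=totobs;
        if nclinic = c_nclinic and abs(age - c_age) <= 1
           and c_fopdays >= fopdays then do;
          output one_match one_used;
          stop;
        end;
      end;
      output one_nomatch; /* reached only when no control matched */
    run;

    /* Append this case's result to the running datasets */
    proc append base=match data=one_match; run;
    proc append base=nomatch data=one_nomatch; run;

    /* Remove the matched control, keeping the random order */
    proc sql;
      create table remaining as
        select * from random_controls
        where c_ptid not in (select c_ptid from one_used)
        order by random;
    quit;
    data random_controls;
      set remaining;
    run;
  %end;
%mend greedy_match;

%greedy_match
```

With this toy data the outcome does not depend on the random order: A1 can only pair with B1, A2 can only pair with B2, and A3 goes to nomatch because no control is in clinic 5. Unlike the stacked layout used later in the talk, this sketch stores each case and its control side by side in one row.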
Matching Criteria for PSA Study
• Age at specimen draw ± 1 year
• Same clinical center (1-22)
• Control lived as long as or longer than case
• 63 cases and 63 matched controls
Call to Macro
%match_cc (
casedata = cases,
controldata= controls,
matchvar= nclinic age ,
matchval= 0 1 ,
fopvar = fopdays)
matchval sets the tolerance for each matchvar: 0 = match exactly on clinic, 1 = within one year of age
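One plausible way to turn the paired matchvar and matchval lists into a matching condition is sketched here. The helper name build_condition is hypothetical and is not part of %match_cc as presented. The sketch assumes that each control variable carries a c_ prefix and that both lists are blank-delimited and of equal length.

```sas
%macro build_condition(matchvar=, matchval=); /* hypothetical helper */
  %local k var tol cond;
  %let cond = ;
  %do k = 1 %to %sysfunc(countw(&matchvar, %str( )));
    %let var = %scan(&matchvar, &k, %str( ));
    %let tol = %scan(&matchval, &k, %str( ));
    /* Join successive comparisons with AND */
    %if &k > 1 %then %let cond = &cond and;
    %let cond = &cond abs(&var - c_&var) <= &tol;
  %end;
  &cond
%mend build_condition;

/* Write the generated condition to the log */
%put %build_condition(matchvar=nclinic age, matchval=0 1);
```

The generated text is meant to be dropped into an IF statement, for example `if %build_condition(matchvar=nclinic age, matchval=0 1) then do;`. Adding a matching variable then means extending the two lists rather than editing the DATA step.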
Sort Controls Randomly
* Sort control dataset by a random number;
proc sql;
create table random_controls as select *,
ranuni(12345) as random from controls
order by random;
quit;
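The fixed seed (12345) makes the random order, and therefore the matching, reproducible from run to run. The same step can also be written with the newer RAND function, which SAS documentation recommends over RANUNI. This is a sketch of that alternative, not the code from the talk, and the toy controls dataset is invented for illustration.

```sas
* Toy control pool, invented for illustration only;
data controls;
  input ptid $ nclinic age fopdays;
  datalines;
B1 2 45 9000
B2 2 51 9500
B3 2 47 1000
;
run;

* Attach a uniform random number to every control;
data random_controls;
  set controls;
  if _n_ = 1 then call streaminit(12345); * fixed seed for reproducibility;
  random = rand('uniform');
run;

* Sort control dataset by the random number;
proc sort data=random_controls;
  by random;
run;
```

Changing the seed gives a different but equally valid random ordering, which can be useful for checking how sensitive the matched sets are to the order of the controls.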
Select Case
Read Through Controls
Looking for a Match
data match nomatch used ;
  set active ; * Reads in case;
  do i = 1 to totobs; * Loops through controls;
    set random_controls point=i nobs=totobs;
    if abs (nclinic - c_nclinic) <= 0 and
       abs (age - c_age) <= 1 then do ;
      if c_fopdays >= fopdays then do;
        * We have a match!;
Output Case and Matched Control
        * We have a match!;
        ccstat= 1;
        output match; * This is the case data, adding var ccstat = 1;
        * Store control values in variables with same name as case;
        * Then output again to same dataset;
        nclinic = c_nclinic;
        age = c_age;
        fopdays = c_fopdays;
        ptid = c_ptid;
        ccstat= 2;
        output match;
        * Output control to USED dataset;
        output used;
        stop; * End DATA step since we have a match;
      end;
    end;
  end; * Ends I loop;
  output nomatch; * If I loop is exhausted then no match;
run;
Remove used control
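* Remove used control from control dataset;
* MERGE with BY needs both datasets sorted by ptid;
proc sort data=random_controls; by ptid; run;
data random_controls;
merge random_controls used (keep=ptid in=used); by ptid;
if used ne 1;
run;
* Restore the random order before the next case;
proc sort data=random_controls; by random; run;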
Matching Results
List of Matched Cases and Controls (First 6)
Obs  setnumber  PTID     AGE  fopdays  nclinic  ccstat
  1          1  B049403   46     4329        2       1
  2          1  B190124   45     8942        2       2
  3          2  B111989   52     8565        2       1
  4          2  B015594   51     9352        2       2
  5          3  B150888   50     8894        2       1
  6          3  B003897   49     9450        2       2
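A matched dataset in this layout can be checked against the matching criteria by joining each case (ccstat = 1) to the control (ccstat = 2) in the same set. The sketch below uses the six listed observations. The dataset name matched and the derived column names are hypothetical.

```sas
* The six observations listed in the talk;
data matched;
  input setnumber ptid $ age fopdays nclinic ccstat;
  datalines;
1 B049403 46 4329 2 1
1 B190124 45 8942 2 2
2 B111989 52 8565 2 1
2 B015594 51 9352 2 2
3 B150888 50 8894 2 1
3 B003897 49 9450 2 2
;
run;

* Pair each case with its control and evaluate the three criteria;
proc sql;
  select ca.setnumber,
         abs(ca.age - co.age)       as age_diff,
         (ca.nclinic = co.nclinic)  as same_clinic,
         (co.fopdays >= ca.fopdays) as control_outlived
  from matched as ca
       inner join matched as co
       on ca.setnumber = co.setnumber
  where ca.ccstat = 1 and co.ccstat = 2;
quit;
```

For these three sets every row shows an age difference of 1 year, with same_clinic and control_outlived both equal to 1 (true), so all three criteria hold.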
Study Results
PSA Quartile (ng/ml)    RR      95% CI
0.2 – 0.6               1.00
0.7 – 1.0               1.51    0.44 – 5.22
1.1 – 1.8               3.16    0.99 – 10.1
1.9 – 20.7              7.26    1.95 – 27.0
Summary
• Program used regularly at Division of Biostatistics
• Can be easily modified for more complex matching algorithms
• Macro found at: http://www.biostat.umn.edu/~greg-g