SAS Macro 101 - University of British Columbia

SAS Macro:
Some Tips for Debugging
Stat Talk @ St. Paul’s Hospital
April 2, 2007

When is SAS Macro usually written?
- the same analyses are repeated for a number of variables
- the same logic search is to be performed on a number of variables
- reports are to be generated on a regular basis
- passing value(s) from one data step to another
…

How to find the error(s)?
- Log file
- Results obtained different from expected
- Via "MPRINT" (i.e. options mprint;)
- Any other ways?
Some extra SAS system options:
- SYMBOLGEN: shows the value that each macro variable resolves to
- MLOGIC: keeps track of the parameter values and the logic that drives %DO loops and %IF logic checks
- MFILE: similar to MPRINT, this option is used to write out the resolved macro code (proper SAS code) to a file
Example 1
data stattalk;
input id:$2. age wt ht;
cards;
1 60 75 180
2 25 55 165
3 45 80 170
;
run;
This macro computes the mean of each continuous variable using PROC MEANS:
* varlist: list of continuous variables for computation;
* nvar: total number of variables for computation;
%macro getave(varlist,nvar);
%do i=1 %to &nvar;
%let var=%scan(&varlist,&i,' ');
proc means data=stattalk noprint;
var &var;
output out=&var.out mean=mean;
run;
%end;
%mend getave;
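A possible variation, not part of the original slides: the number of variables can be derived from the list itself, so the caller cannot pass an nvar that disagrees with varlist. This is only a minimal sketch. The macro name getave2 is hypothetical, and it assumes a SAS release in which the COUNTW function is available.

```sas
%macro getave2(varlist);
  %local i var nvar;
  /* Count the blank-delimited words in the list */
  %let nvar=%sysfunc(countw(&varlist,%str( )));
  %do i=1 %to &nvar;
    /* Pick the i-th variable name, using a blank as the only delimiter */
    %let var=%scan(&varlist,&i,%str( ));
    proc means data=stattalk noprint;
      var &var;
      output out=&var.out mean=mean;
    run;
  %end;
%mend getave2;

%getave2(age wt ht);
```

Declaring i, var and nvar as %LOCAL keeps the macro from overwriting macro variables of the same name that exist outside it.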
Log output from SYMBOLGEN
SAS Code:
options symbolgen;
%getave(age wt ht,3);
SAS Log File:
:
:
SYMBOLGEN: Macro variable NVAR resolves to 3
SYMBOLGEN: Macro variable VARLIST resolves to age wt ht
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable VAR resolves to age
SYMBOLGEN: Macro variable VAR resolves to age
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.AGEOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.07 seconds
      cpu time  0.07 seconds
SYMBOLGEN: Macro variable VARLIST resolves to age wt ht
SYMBOLGEN: Macro variable I resolves to 2
SYMBOLGEN: Macro variable VAR resolves to wt
SYMBOLGEN: Macro variable VAR resolves to wt
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.WTOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.07 seconds
      cpu time  0.06 seconds
SYMBOLGEN: Macro variable VARLIST resolves to age wt ht
SYMBOLGEN: Macro variable I resolves to 3
SYMBOLGEN: Macro variable VAR resolves to ht
SYMBOLGEN: Macro variable VAR resolves to ht
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.HTOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.06 seconds
      cpu time  0.05 seconds
Log output from MLOGIC
SAS Code:
options mlogic;
%getave(age wt ht,3);
SAS Log File:
MLOGIC(GETAVE): Beginning execution.
MLOGIC(GETAVE): Parameter VARLIST has value age wt ht
MLOGIC(GETAVE): Parameter NVAR has value 3
MLOGIC(GETAVE): %DO loop beginning; index variable I; start value is 1; stop value is 3; by value is 1.
MLOGIC(GETAVE): %LET (variable name is VAR)
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.AGEOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.06 seconds
      cpu time  0.06 seconds
MLOGIC(GETAVE): %DO loop index variable I is now 2; loop will iterate again.
MLOGIC(GETAVE): %LET (variable name is VAR)
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.WTOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.06 seconds
      cpu time  0.06 seconds
MLOGIC(GETAVE): %DO loop index variable I is now 3; loop will iterate again.
MLOGIC(GETAVE): %LET (variable name is VAR)
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.HTOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.06 seconds
      cpu time  0.05 seconds
MLOGIC(GETAVE): %DO loop index variable I is now 4; loop will not iterate again.
MLOGIC(GETAVE): Ending execution.
Log output from MPRINT
SAS Code:
options mprint;
%getave(age wt ht,3);
SAS Log File:
MPRINT(GETAVE): proc means data=stattalk noprint;
MPRINT(GETAVE): var age;
MPRINT(GETAVE): output out=ageout mean=mean;
MPRINT(GETAVE): run;
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.AGEOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.06 seconds
      cpu time  0.06 seconds
MPRINT(GETAVE): proc means data=stattalk noprint;
MPRINT(GETAVE): var wt;
MPRINT(GETAVE): output out=wtout mean=mean;
MPRINT(GETAVE): run;
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.WTOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.06 seconds
      cpu time  0.05 seconds
MPRINT(GETAVE): proc means data=stattalk noprint;
MPRINT(GETAVE): var ht;
MPRINT(GETAVE): output out=htout mean=mean;
MPRINT(GETAVE): run;
NOTE: There were 3 observations read from the data set WORK.STATTALK.
NOTE: The data set WORK.HTOUT has 1 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time 0.06 seconds
      cpu time  0.06 seconds
What does MFILE look like?
SAS Code:
filename mprint "~/mfileoutput.sas";
options mprint mfile;
%getave(age wt ht,3);
SAS Log:
…same log as in MPRINT…
But you will find a file "mfileoutput.sas" in your (main) directory!
And it looks like this:
proc means data=stattalk noprint;
var age;
output out=ageout mean=mean;
run;
proc means data=stattalk noprint;
var wt;
output out=wtout mean=mean;
run;
proc means data=stattalk noprint;
var ht;
output out=htout mean=mean;
run;
Example 2 – MLOGIC output for %IF logic check
SAS Code:
%macro sillyeg(catvar);
%if &catvar=1 %then %do;
%put This is a categorical variable;
%end;
%else %if &catvar=0 %then %do;
%put This is not a categorical variable;
%end;
%mend sillyeg;
%sillyeg(0);
SAS Log:
46 %sillyeg(0);
MLOGIC(SILLYEG): Beginning execution.
MLOGIC(SILLYEG): Parameter CATVAR has value 0
MLOGIC(SILLYEG): %IF condition &catvar=1 is FALSE
MLOGIC(SILLYEG): %IF condition &catvar=0 is TRUE
MLOGIC(SILLYEG): %PUT This is not a categorical variable
This is not a categorical variable
MLOGIC(SILLYEG): Ending execution.
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 0.35 seconds
      cpu time  0.26 seconds
Example 3 – combination of different options can be helpful!
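The slide for this example did not survive in the transcript. As a minimal sketch of the idea in its title, and not the original example, the three tracing options can be switched on together for one call of %getave from Example 1 and switched off again afterwards. The log then interleaves the SYMBOLGEN, MLOGIC and MPRINT lines shown separately above.

```sas
/* Turn on all three tracing options for one macro call */
options symbolgen mlogic mprint;
%getave(age wt ht,3);
/* Switch them off again so that later log output stays readable */
options nosymbolgen nomlogic nomprint;
```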
Common Mistakes
- IF or %IF; DO or %DO
- MISSING vs. NULL VALUE
- SCAN, %SCAN, %QSCAN
- SUBSTR, %SUBSTR, %QSUBSTR
- %STR, %NRSTR, %BQUOTE, %NRBQUOTE
- Doing math in MACRO environment
- Range comparison
IF or %IF; DO or %DO
- %IF (and %DO) can only be used within a MACRO declaration, to control what code is written or how the logic is evaluated within the MACRO.
- IF (and DO) statements can be used in a MACRO, but will be executed as part of the DATA step code within the MACRO.
Example 4
Dataset:
data stattalk;
input id:$2. age wt ht;
cards;
1 60 75 180
2 25 55 165
3 45 80 170
;
run;
SAS code:
%macro whatif(condition=gt 50);
data subset;
set stattalk;
%if age &condition %then output;;
run;
proc print data=subset;
run;
%mend whatif;
%whatif;
SAS output:
Obs  id  age  wt  ht
1    1   60   75  180
2    2   25   55  165
3    3   45   80  170
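Why did we get such incorrect output??
Macro code is ALWAYS executed before the DATA step is even compiled.
AGE in the %IF is not seen as a DATA step variable, but rather as the letters a-g-e.
The %IF therefore compares the text "age" with the text "50" character by character. Digits sort before letters (in the ASCII collating sequence), so the letter 'a' comes after '5'. The condition is always TRUE, and OUTPUT is generated for every observation.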
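The comparison can be checked in isolation with %EVAL, which is the function the macro processor applies to a %IF condition. This is a minimal sketch, assuming an ASCII system: the first %PUT should write 1 (TRUE) because the operands are compared as text, while the other two are ordinary integer comparisons.

```sas
/* Character comparison: the text "age" against the text "50" */
%put %eval(age gt 50);
/* Integer comparisons for contrast */
%put %eval(60 gt 50);
%put %eval(25 gt 50);
```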
So, an example where both IF and %IF are used in a MACRO….
SAS code:
%macro ifagain(condition=gt 30, print=1);
data subset;
set stattalk;
if age &condition then output;
run;
proc means data=subset %if &print^=1 %then noprint;;
var age;
output out=subset_out mean=mean std=sd;
run;
%if &print>=1 %then %do;
proc print data=subset_out;
run;
%end;
%mend ifagain;
%ifagain;
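With the default parameters (condition=gt 30, print=1), the macro should generate roughly the SAS code below, which is what MPRINT would list line by line. This is a hand-resolved sketch, not a captured log. The %IF inside the PROC MEANS statement contributes no text because &print equals 1, so NOPRINT is left out.

```sas
data subset;
  set stattalk;
  /* Plain IF: evaluated for every observation when the DATA step runs */
  if age gt 30 then output;
run;
proc means data=subset ;
  var age;
  output out=subset_out mean=mean std=sd;
run;
/* Generated only because the %IF on &print was TRUE */
proc print data=subset_out;
run;
```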
Missing vs. NULL
- In the DATA step, there is no such thing as a truly NULL value.
- A character or numeric variable has a "value" for missing: a single blank space or a period, respectively.
  E.g.) if sex=' ' then delete; if age=. then delete;
- In the MACRO language, there are no characters used to represent a missing value. So when a MACRO variable is NULL, it truly has no value.
  E.g.) %if &age=. %then %do; – WRONG!!
        %if &gender=" " %then %do; – WRONG!!
3 ways to specify NULL in the logic check:
Method 1:
%macro sillycheck(age=);
%if &age= %then %do;
%put It works;
%end;
%else %do;
%put It did not work;
%end;
%mend sillycheck;
Method 2:
%macro sillycheck(age=);
%if "&age"="" %then %do;
%put It works;
%end;
%else %do;
%put It did not work;
%end;
%mend sillycheck;
Method 3:
%macro sillycheck(age=);
%if &age=%str() %then %do;
%put It works;
%end;
%else %do;
%put It did not work;
%end;
%mend sillycheck;
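A fourth option, which is not one of the three methods on the slide, is to test the length of the value with %LENGTH, which returns 0 for a null macro variable. A minimal sketch follows; the macro name sillycheck4 and the message texts are hypothetical.

```sas
%macro sillycheck4(age=);
  /* %LENGTH returns 0 when the macro variable has no value */
  %if %length(&age)=0 %then %do;
    %put AGE is null;
  %end;
  %else %do;
    %put AGE has the value &age;
  %end;
%mend sillycheck4;

%sillycheck4();
%sillycheck4(age=50);
```

The first call should write "AGE is null" to the log and the second "AGE has the value 50".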
A side remark:
In MACRO language, everything is TEXT!
SAS code:
%macro sillyeg(age=50,sex=F);
%if &age=50 %then %do;
%put Patient is 50 years old;
%end;
%if &sex="F" %then %do;
%put Female patient;
%end;
%mend sillyeg;
%sillyeg;
SAS LOG:
SYMBOLGEN: Macro variable AGE resolves to 50
Patient is 50 years old
SYMBOLGEN: Macro variable SEX resolves to F
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 0.52 seconds
      cpu time  0.41 seconds
SAS code:
%macro sillyeg(age=50,sex=F);
%if &age=50 %then %do;
%put Patient is 50 years old;
%end;
%if &sex=F %then %do;
%put Female patient;
%end;
%mend sillyeg;
%sillyeg;
SAS LOG:
SYMBOLGEN: Macro variable AGE resolves to 50
Patient is 50 years old
SYMBOLGEN: Macro variable SEX resolves to F
Female patient
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 0.52 seconds
      cpu time  0.41 seconds
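Because quotation marks are ordinary characters in a macro comparison, the first version compares the text F with the three-character text "F" and fails. If quotes are wanted for readability, they must appear on both sides. A minimal sketch, with the hypothetical macro name sillyeg2:

```sas
%macro sillyeg2(sex=F);
  /* Quotes are plain text here, so use them on both sides or on neither */
  %if "&sex"="F" %then %do;
    %put Female patient;
  %end;
%mend sillyeg2;
%sillyeg2;
```

With the default sex=F, both sides resolve to the same three characters and "Female patient" should be written to the log.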
SCAN, %SCAN, %QSCAN
In DATA step:
data example;
string="XYZ,A*BC&HOS";
word1=scan(string,2);
word2=scan(string,2,',');
run;
SAS output:
Obs  string        word1  word2
1    XYZ,A*BC&HOS  A      A*BC&HOS
In MACRO:
%let hos=SPH;
%let string=%nrstr(XYZ,A*BC&HOS);
%let word1=%scan(&string,2);
%let word2=%scan(&string,2,%str(,));
%let word3=%qscan(&string,2,%str(,));
%put word1=&word1;
%put word2=&word2;
%put word3=&word3;
SAS Log:
word1=A
word2=A*BCSPH
word3=A*BC&HOS
- %scan DOES NOT mask & (and %) as regular text, so &HOS in the result is resolved
- %qscan masks & (and %) as regular text
SUBSTR, %SUBSTR, %QSUBSTR
SAS Code:
%let stuff = clinics;
%let string=%nrstr(*&stuff*&dsn*&morestuff);
%let word1=%substr(&string,2,7);
%let word2=%qsubstr(&string,2,7);
%put word1=&word1;
%put word2=&word2;
SAS Log:
word1=clinics*
word2=&stuff*
- Syntax for %SUBSTR and %QSUBSTR is exactly the same as for SUBSTR in the data step
- The difference between %SUBSTR and %QSUBSTR:
  %SUBSTR does not mask & (and %) as part of the text
  %QSUBSTR treats & (and %) as part of the text
Macro Quoting Functions
- The macro language is a character-based language, and it is composed of some special characters (e.g. % & ;) and mnemonics (e.g. GE AND LE OR)
- Macro quoting functions tell the macro processor to interpret special characters/mnemonics simply as text
- The special characters/mnemonics that might require masking are: blank ; ^ ~ , ' " ) ( + - * / < > = | AND OR NOT EQ NE LE LT GE GT IN % & #
- The most commonly used macro quoting functions are: %STR, %NRSTR, %BQUOTE, %NRBQUOTE, %SUPERQ
- Two types of macro quoting functions:
a) Compilation functions – the processor masks the special characters as text in open code or while compiling a macro. E.g. %STR, %NRSTR
b) Execution functions – the processor first resolves a macro expression and then masks the special characters in the result as text. E.g. %QUOTE, %NRQUOTE, %BQUOTE, %NRBQUOTE
Example 5
%macro fileit(infile);
%if %bquote(&infile) NE %then %do;
%let char1 = %bquote(%substr(&infile,1,1));
%if %bquote(&char1) = %str(%') or %bquote(&char1) = %str(%") %then
%let command=FILE &infile;
%else %let command=FILE "&infile";
%end;
%put &command;
%mend fileit;
%fileit('stattalk.sas')
- %bquote is used to quote the resolved value of a macro variable or expression
- %str is used to quote a constant value (i.e. the right side of the logic check)
- An unmatched single or double quotation mark, or an unmatched parenthesis, must always be preceded by % in %str, but there is no need to add % in %bquote (B = by itself)
Example 5 (continued)
data test;
store="Susan's Office Supplies";
call symput('s',store);
run;
%macro readit;
%if %bquote(&s) ne %then %put *** valid ***;
%else %put *** null value ***;
%mend readit;
%readit;
- If you change %BQUOTE to %STR, you will get an error message! Try it…
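Another way to make this check safe, shown here as a sketch rather than as part of the original example, is %SUPERQ. It returns the value of the macro variable with all special characters masked and without attempting any further resolution. The macro name readit2 is hypothetical. The DATA step is repeated so that the block runs on its own.

```sas
data test;
  store="Susan's Office Supplies";
  call symput('s',store);
run;

%macro readit2;
  /* %SUPERQ takes the macro variable name without the ampersand */
  %if %superq(s) ne %then %put *** valid ***;
  %else %put *** null value ***;
%mend readit2;
%readit2;
```

Since the unmatched apostrophe in the value is masked, the comparison with the null value should evaluate cleanly and write *** valid *** to the log.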
Example 6
SAS Code:
options ps=36 ls=69 nocenter;
data _null_;
call symput('authors','Smith&Jones');
call symput('macroname','%macro test;');
run;
%let aa=SPH;
%let jones=%nrstr(&aa);
title1 "Authors1: %SUPERQ(authors)";
title2 "Authors2: %NRSTR(&authors)";
title3 "Authors3: %NRBQUOTE(&authors)";
title4 "Authors4: %UNQUOTE(%NRBQUOTE(&authors))";
footnote1 "Name of Macro: %SUPERQ(macroname)";
SAS Output:
Authors1: Smith&Jones
Authors2: &authors
Authors3: Smith&aa
Authors4: SmithSPH
Name of Macro: %macro test;
- %NRSTR – masks & as part of the text during compilation
- %NRBQUOTE – resolves the macro variable during execution; if the result contains &, it is treated as part of the text
- NR = Not Resolved
Doing Math in the Macro Language
- %EVAL and %SYSEVALF allow the language to handle arithmetic operations
- %EVAL: only for integer arithmetic
- %SYSEVALF: for non-integer arithmetic (e.g. 1.0, .3, 2.)
- Error message if %SYSEVALF should be used instead of %EVAL:
Example 7
%let x=5;
%let y=&x+1;
%let z=%EVAL(&x+1);
%let w=%SYSEVALF(&x+1.8);
%put &x &y &z &w;
The %PUT writes the following to the LOG:
5 5+1 6 6.8
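Division is where the difference between the two functions is easiest to overlook, because %EVAL does not raise an error for it: it simply truncates. A minimal sketch, with arbitrary illustrative numbers:

```sas
%put %eval(7/2);            /* integer division truncates: 3 */
%put %sysevalf(7/2);        /* floating-point result: 3.5 */
%put %sysevalf(7/2,ceil);   /* optional conversion type rounds up: 4 */
```

The optional second argument of %SYSEVALF (BOOLEAN, CEIL, FLOOR or INTEGER) converts the result, which is useful when the value feeds a %DO loop bound that must be an integer.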
Range Comparisons
SAS Code (DATA step):
data _null_;
do val=-10,-2,2,10;
if -5 le val le 0 then do;
put val " is in the negative range (-5 to 0)";
end;
else if 1 le val le 5 then do;
put val " is in the positive range (1 to 5)";
end;
else put val " is WAY out of range";
end;
run;
SAS Log:
-10 is WAY out of range
-2 is in the negative range (-5 to 0)
2 is in the positive range (1 to 5)
10 is WAY out of range
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time  0.00 seconds
SAS Code (macro):
%macro checkit(val=);
%if -5 le &val le 0 %then %put &val is in the negative range (-5 to 0);
%else %if 1 le &val le 5 %then %put &val is in the positive range (1 to 5);
%else %put &val is WAY out of range;
%mend checkit;
%checkit(val=-10);
%checkit(val=-2);
%checkit(val=2);
%checkit(val=10);
SAS Log:
182 %checkit(val=-10);
-10 is in the negative range (-5 to 0)
183 %checkit(val=-2);
-2 is in the positive range (1 to 5)
184 %checkit(val=2);
2 is in the positive range (1 to 5)
185 %checkit(val=10);
10 is in the positive range (1 to 5)
????

In DATA step:
if -5 le val le 0 then do;
is interpreted as
if -5 le val and val le 0 then do;
In Macro Language:
%if -5 le &val le 0 %then %put &val is in the negative range (-5 to 0);
is interpreted as
%if (-5 le &val) le 0 %then %put &val is in the negative range (-5 to 0);
So, if &val=-10, the %if becomes
%if (-5 le -10) le 0 %then …
The comparison first checks whether -5 is less than or equal to -10. Because that is FALSE, a zero is returned, and the expression becomes
%if 0 le 0 %then …;
This comparison is TRUE, and hence "-10 is in the negative range (-5 to 0)" is printed in the LOG file.

In summary, for range comparison in Macro Language, always use a compound expression (e.g. -5 le &val AND &val le 0)
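Applying that advice to the macro above gives the sketch below. The name checkit2 is hypothetical, and the parentheses around each comparison are optional and added only for readability. Like the original, it assumes integer values, since %IF evaluates its condition with %EVAL.

```sas
%macro checkit2(val=);
  /* Each bound is tested separately and the two results are combined with AND */
  %if (-5 le &val) and (&val le 0) %then
    %put &val is in the negative range (-5 to 0);
  %else %if (1 le &val) and (&val le 5) %then
    %put &val is in the positive range (1 to 5);
  %else %put &val is WAY out of range;
%mend checkit2;

%checkit2(val=-10);
%checkit2(val=-2);
%checkit2(val=2);
%checkit2(val=10);
```

These four calls should now agree with the DATA step log: -10 and 10 are reported as out of range, -2 as negative and 2 as positive.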