SAS Tips & Tricks

Download Report

Transcript SAS Tips & Tricks

SAS Tips & Tricks
Brought to you by your fellow
MASUG members
Jun Tang, Data Analyst, MS
 UT Health Science center
 Missing Data in Multivariate Analysis
Missing Data in Multivariate Analysis
In multivariate analysis, we
encountered this: 76 of 306
observations in data set
LIB.FILENAME omitted due to
missing values. It’s 25% of the
sample. Sample size is drastically
reduced.
Missing Data in Multivariate Analysis
To solve this problem, we consider two situations:
1. Do not use Multivariate Analysis, in stead, use Univariate Test. This will not
affect the sample size of those non-missing value variables.
2. Example: MANOVA change to TTEST.
proc anova;
class AAMC;
model ratevar1 ratevar2 ratevar3 …=AAMC;
manova h=AAMC/printe printh;
run;
Change to:
proc ttest data=lib.filename;
class AAMC;
VAR ratevar1 ratevar2 ratevar3 …;
run;
Missing Data in Multivariate Analysis
To replace the missing data :
Mean Estimation;
Hot-Deck imputation;
Predict missing values from Regression;
Multiple Imputation (which is still in experimental form in Release
8.1 of the SAS system).
SAS procedures for doing MI:
a. PROC MI creates Multiple Imputed data sets;
b. PROC MIANALYZE combines results after analysis.
Example:
proc MI data=lib.filename <seed=37851> <mu0=50 10 180> out=outmi;
VAR ratevar1 ratevar2 ratevar3…;
run;
Refs: 1. Kim Chantala and C.Suchindran, Multiple Imputation for missing data;
2. www.ats.ucla.edu/stat/sas/v8/miv802.pdf, chapter 9, the MI procedure;
Shelly Lensing, Sr. Biostatistician II
Deqing Pei, Statistical Analyst II
 St. Jude Children's Research Hospital
 Missing Readings Without a Missing
Record
Missing Readings Without a Missing Record
 This is a problem that just came up:
 We needed to group each set of readings
(3h, 7h, 23h, 44h) for each ID to indicate
group 1, 2, etc. over time.
 Some readings were missing, but there was
no missing record
Missing Readings Without a Missing Record
data one;
input id sample $ 10-18 mtx_con @37 hdmtx_date mmddyy8.;
sampnum=compress(sample,'MTXHR')+0;
format hdmtx_date mmddyy8.;
datalines; *** example data;
11111
11111
11111
11111
11111
11111
11111
11111
22222
22222
22222
22222
22222
22222
22222
22222
22222
22222
22222
22222
33333
33333
33333
33333
33333
;
run;
MTX 3HR
MTX 7HR
MTX 23HR
MTX 44HR
MTX 3HR
MTX 44HR
MTX 7HR
MTX 23HR
MTX 3HR
MTX 7HR
MTX 23HR
MTX 7HR
MTX 23HR
MTX 3HR
MTX 7HR
MTX 23HR
MTX 44HR
MTX 7HR
MTX 44HR
MTX 44HR
MTX 3HR
MTX 23HR
MTX 23HR
MTX 3HR
MTX 44HR
81.29
22.07
2.24
1.49
92.16
0.18
24.27
3.34
127.22
29.73
0.65
45.00
1.88
132.48
37.96
1.53
0.13
85.71
30.01
35.01
65.02
34.13
24.13
74.23
24.06
11/20/95
11/20/95
11/21/95
11/22/95
01/19/96
01/21/96
02/20/96
02/21/96
12/17/94
12/17/94
12/18/94
12/28/94
12/29/94
02/06/95
02/06/95
02/07/95
02/08/95
02/09/95
02/11/95
02/13/95
02/13/95
02/14/95
02/15/95
02/16/95
02/17/95
Missing Readings Without a Missing Record
proc sort data=one; *** make sure data are in order;
by id hdmtx_date sampnum;
run;
data two;
set one;
by id;
lagsamp=lag(sampnum);
* Uses logic of when hour-based measurements should occur relative to each other;
* to find first record of each set;
if sampnum=3 then first=1;
if sampnum=7 and lagsamp ne 3 then first=1;
if sampnum=23 and lagsamp not in(7,3) then first=1;
if sampnum=44 and lagsamp not in(23,7,3) then first=1;
if first.id then do;
* restarts group number with each new id;
group=0;
end;
retain group;
if first=1 then group+1; * increments group number for each grouping;
run;
proc print;
run;
Vaibhav Gardé , Senior Marketing
Analyst
 FedEx Services - Marketing, Planning and
Analysis Group
 Interactive Mode
– Beats the Batch Mode in most cases
I like to use the SAS Interactive Mode on UNIX. It is almost as convenient to use
as Windows, and my code gets neatly color coded! This makes it easy to catch
minor error in syntax – like forgetting a quote, semi-colon, etc.
In the interactive mode, it is quite easy to see the exact location, size, modification
date, etc. of all your tables.
You can even right click to open (view) Oracle tables, SAS datasets to see the
records contained therein. Much easier than the usual UNIX SQL options! (Can
also delete, copy, save, etc.)
You can see a continuously visible log file (again color coded) to monitor the
progress/success of your program. In case there is an error, it shows up in RED.
There are other benefits of using the Interactive mode, but I have highlighted just
these few. Hope you find it worthwhile to check it out!
Alan Teal, Chief, Technical
Support Branch
 Defense Contract Audit Agency
 Macro to Document Date/Time of
Processing
Macro to Document Date/Time
of Processing
 Purpose: It is a good practice to document the
date and time when a SAS job was run, especially
if the same job is often run but with different data
sets.
 The following macro writes the date and time to
the SAS log.
 It can also be used to determine how long each
Data step or PROC processed, in order to evaluate
which steps might be made more efficient.
Macro to Document Date/Time
of Processing
SAS Log Shown Below:
1
* TheTimeIs.SAS
2
3
4
5
6
7
8
9
10
11
12
Writes the date and time to the SAS Log to
document processing.;
* Created 03-25-2003, Alan Teal;
%MACRO Tim;
DATA _Null_;
TheTimeIs=Datetime();
PUT TheTimeIs= DateTime20.;
%MEND Tim;
%Tim
RUN;
TheTimeIs=25MAR2003:12:36:47
NOTE: DATA statement used:
real time
0.32 seconds
When the %Tim macro is called TheTimeIs (date/time) is written to
the SAS Log, as shown above.
The Unknown SAS Instructor
 I just have one comment to make…
I just have one comment to make…
/* Here’s an easy way to comment out a chunk of code: */
%macro skip;
data new;
set old;
.
.
.
run;
%mend skip;
/* And the best part is, you can use it as many times as you
want within the same program! */
Stephanie Thompson, Sr.
Merchandising Analyst
 AutoZone
 Extra rsubmits
Extra rsubmits
 Ever forget to remove an extra rsubmit
from your code and it crashes?
 Keep them there and use them at will
with…...
Extra rsubmits
rsubmit;
proc sql;
create table payprelim_vp as
(SELECT VP, sum(purchase_cost) as p_cost,
sum(inventory_cost) as i_cost
FROM payprelim_t
group by vp);
*rsubmit;
data q.payables_vp;
set payprelim_vp;
payables = p_cost / i_cost;
run;
proc print data = q.payables_vp;
format purchase_cost dollar12. inventory_cost
dollar12. payables percent6.2;
run;
Higlight from after the ‘*’ to the end and submit.
Walter Smith, Marketing Project
Manager
 WorldWide Forecasting, FedEx
 Dates and Fixing Capitalization
Creating SAS Dates From Character
Data
 Often date data is stored in files or tables as
character data and is more useful when
converted to SAS date values (number of
days since Jan 1, 1960)
Sample Code
*-- read in some dates as character --;
data chardates;
infile cards;
input yyyymm $6. ;
cards;
200206
200207
200208
200303
run;
*---------- create SAS dates ----------;
data sasdates;
set chardates;
yy = input( substr( yyyymm , 1 , 4 ) , 4. );
mm = input( substr( yyyymm , 5 ) , 2. );
sasdate = mdy( mm , 1 , yy );
date9 = sasdate;
mmddyy = sasdate;
attrib sasdate label='Numeric Date'
format=5.;
attrib date9 label='Displayed As Date9'
format=date9.;
attrib mmddyy label='Displayed As
mmddyy' format=mmddyy10.;
keep yyyymm sasdate date9 mmddyy;
tt = 1;
put 'tt=' tt 'tt=' tt date9.;
run;
title 'Character & SAS Dates';
proc print; run;
The Result
Character & SAS Dates
Obs
yyyymm
sasdate
1
2
3
4
200206
200207
200208
200303
15492
15522
15553
15765
date9
mmddyy
01JUN2002
01JUL2002
01AUG2002
01MAR2003
06/01/2002
07/01/2002
08/01/2002
03/01/2003
SAS Dates in Macro Variables
 Sometimes it is convenient to have date
values in the macro environment
 Additional manipulations can be done – the
macro function %sysfunc() gives access to
nearly all the datastep functions!
Sample Code & Results
*--- put min & max into macro vars ----;
proc sql noprint;
select put(min(sasdate),5.), max(sasdate)
into :mindate, :maxdate
from sasdates;
quit;
%put >> mindate="&mindate" maxdate="&maxdate";
%let nextmonth = %sysfunc( intnx( month , &maxdate , 1 ) , date9. );
%let sas_date_const = %unquote(%bquote(')&nextmonth%bquote('d));
%put >> nextmonth=&nextmonth sas_date_const=&sas_date_const;
>> mindate="15492" maxdate=" 15765“
>> nextmonth=01APR2003 sas_date_const='01APR2003'd
Capitalizing Proper Names
 Often names of people, places and things in
data is in mixed or case, or worse, IN ALL
CAPITAL LETTERS.
 Names are easier to read when each word of
the name has only the initial letter
capitalized.
Sample Code
*--------- read in some data ---------;
data names;
infile cards;
input fullname $30. ;
cards;
Chris Geith
FaRoUK HiJaZi unusual case
SOUTH KOREA
TARIQ AZIZ
Walter j SMITH
los angeles CA state abbrev
run;
*---- change names to initial caps ----;
data newnames;
set names;
oldname = fullname;
length newname $30;
fullname = lowcase( fullname );
n_wrds = length( compbl( fullname ) )
- length( compress( fullname , ' ' ) ) +1;
do i=1 to n_wrds;
wrd = scan( fullname , i , ' ' );
newwrd =
upcase(substr(wrd,1,1))||substr(wrd,2);
newname = trim(newname)||' '||newwrd;
end;
keep oldname newname;
run;
title 'Names With Initial Capitals';
proc print; run;
Results
Names With Initial Capitals
Obs
1
2
3
4
5
6
oldname
newname
Chris Geith
FaRoUK HiJaZi unusual case
SOUTH KOREA
TARIQ AZIZ
Walter j SMITH
los angeles CA state abbrev
Chris Geith
Farouk Hijazi Unusual Case
South Korea
Tariq Aziz
Walter J Smith
Los Angeles Ca State Abbrev