Transcript More Tales from the Help Desk: Solutions for Simple SAS
This is the Modern World: Simple, Overlooked SAS® Enhancements
Bruce Gilsen Federal Reserve Board June 2009
Introduction
Smaller, less dramatic enhancements fall thru cracks at • Federal Reserve (SAS consultant for 24 years) • SAS-L • Conferences Users still use more cumbersome methods Enhancements from V92, V9, V8, V7, even V6!
Impetus: yymmddn8. in earlier paper
Underutilized enhancements
Write date values in form yyyymmdd Increment date values w/INTNX function Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine Count # times character or substring occurs in a character string Concatenate character strings Sort by numeric part of character values Retrieve DB2 data on z/OS mainframes
1. Write date values in form yyyymmdd
Lots of informats, formats to read/write date values • 9.1.3: 22 date informats, 46 date formats Most common input form: yyyymmdd • YYMMDD8. informat reads dates in form yyyymmdd Most common output form (overwhelmingly): yyyymmdd • YYMMDD8. format writes dates as yy-mm-dd • YYMMDD10. format writes dates as yyyy-mm-dd
Writing dates as yyyymmdd, pre-V8
Some date output
data one; date1 = "18oct2008"d; * 10/18/2008 date value; put date1 yymmdd8.; * writes 08-10-18 ; put date1 yymmdd10.; * writes 2008-10-18 ;
Before Version 8 • PUT: character string of form yyyy-mm-dd • COMPRESS: remove dashes
datetemp = compress(put(date1,yymmdd10.),'-'); put datetemp $8. ; * writes 20081018 ;
Writing dates as yyyymmdd, V8
New format, YYMMDDxw • x: optional separator • w: number of bytes x, one of: B blank
N none
C colon P period D dash S slash
date1 = "18oct2008"d; * 10/18/2008 date value; put date1 yymmddn8.; * writes 20081018 ;
YYMMDDxw format, examples
data _null_ ; date1 = "18oct2008"d; * 10/18/08 date value ; put date1 yymmddb8.; * 08 10 18 ; put date1 yymmddc8.; * 08:10:18 ; put date1 yymmddd10.; * 2008-10-18 ; put date1 yymmddn8.; * 20081018 ; put date1 yymmddp8.; * 08.10.18 ; put date1 yymmdds10.; * 2008/10/18 ; put date1 yymmddn6.; * 081018 ;
2. Increment date values with INTNX
INTNX returns SAS
®
date value incremented by number of intervals (days, months, quarters, years,…) SAS date value: integer, number of days before or after January 1, 1960
December 31, 1959 -1 January 1, 1960 0 January 2, 1960 1 ...
October 18, 2008 16,727
INTNX Syntax (informal, incomplete)
INTNX(
interval
,
start-from
,
increment
<,
alignment
>) interval: unit (years, months, days,...)
start-from
is incremented by start-from: SAS date value to increment increment: # of intervals
start-from
is incremented by alignment: where
start-from
aligned after incrementing. Values: BEGINNING, MIDDLE, END, SAMEDAY (v9). Default: BEGINNING.
Before Version 9: 2 INTNX issues
Alignment BEGINNING (start of period) Leap year handling
INTNX issue 1: alignment to start of period
Increment by 2 days, months, years Unexpected result for months, years
data one; date1 = ‘18oct2008'd; dateplus2day = intnx(‘day',date1,2); dateplus2month = intnx(‘month',date1,2); dateplus2year = intnx(‘year',date1,2); run; Variable Description Expect Actual dateplus2day 10/18/08+2 days 10/20/08 10/20/08 dateplus2month 10/18/08+2 months 12/18/08 12/1/08 dateplus2year 10/18/08+2 years 10/18/10 1/1/10
dateplus2day = intnx(‘day',date1,2);
date1 = ‘18oct2008'd;
• = 17,823, date value for 10/18/2008 Increment by 2 days • First, increment by 2 days • Then, align to start of 10/20/2008 (beginning of interval, DAY), no effect DATEPLUS2DAY = 17,825, date value for 10/20/2008 as expected
dateplus2month = intnx(‘month',date1,2);
date1 = ‘18oct2008'd;
• = 17,823, date value for 10/18/2008 Increment by 2 months • First, increment by 2 months • Then, align to start of 12/2008 (beginning of interval, MONTH) DATEPLUS2MONTH = 17,867, date value for 12/1/2008
dateplus2year = intnx(‘year',date1,2);
date1 = ‘18oct2008'd;
• = 17,823, date value for 10/18/2008 Increment by 2 years • First, increment by 2 years • Then, align to start of 2010 (beginning of interval, YEAR) DATEPLUS2MONTH = 18,263, date value for 1/1/2010
INTNX issue 2: leap years
Example of shifting by years
offset = 2; * number of years to increment; if day(date1) = 29 and month(date1) = 2 and mod(offset,4) ne 0 and mod(offset, 400) ne 0 then date1 = mdy(2, 28, year(date1)+offset); else date1 = mdy(month(date1), day(date1), year(date1)+offset) ;
Need more generalized code for all frequencies
Solution: SAMEDAY
New alignment value in V9 Preserves date value’s alignment in interval after it’s incremented No effect when interval = DAY Use for all intervals except DAY
Solution: SAMEDAY
dateplus2day = intnx(‘day’, date1, 2, “sameday”) • 2 days after 10/18/2008 • Result: 17,825, SAS date value for 10/20/2008 dateplus2month = intnx(‘month’, date1, 2, “sameday”) • 2 months after 10/18/2008 • Result: 17,884, SAS date value for 12/18/2008 dateplus2year = intnx(‘year’, date1, 2, “sameday”) • 2 years after 10/18/2008 • Result: 18,553, SAS date value for 10/18/2010
Some interesting dates
date2 = intnx(‘year’, ’29feb2000’d, 1, “sameday”) • 1 year after 2/29/2000 • Result: 15,034, SAS date value for 2/28/2001 date3 = intnx(‘year’, ’29feb2000’d, 4, “sameday”) • 4 years after 2/29/2000 • Result: 16,130, SAS date value for 2/29/2004 2000, 2004 = leap years, 2001 not leap year
Some interesting dates
date4 = intnx(‘month’, ’31mar2003’d, -1, “sameday”) • 1 month before 3/31/2003 • Result: 15,764, SAS date value for 2/28/2003 date5 = intnx(‘month’, ’31mar2004’d, -1, “sameday”) • 1 month before 3/31/2004 • Result: 16,130, SAS date value for 2/29/2004 2000, 2004 = leap years, 2003 not leap year
SAMEDAY until 9.2: single, non-shifted date intervals only
Following intervals might return wrong answer, no error or warning • multiple (e.g., month2 = two-month interval) • shifted (e.g., month.4 = month interval starting on April 1) • time • datetime Interval = DAY, WEEK, WEEKDAY, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR okay Fixed in 9.2
http://support.sas.com/techsup/unotes/SN/016/016184.html
3. Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine
Transport file • File organized in machine-independent format • SAS can read on any operating system • Use to copy SAS data sets from one operating system to another Starting w/ Version 6, two types of transport files • XPORT-style transport files • CPORT-style transport files
XPORT-style transport files: 3 steps
1. On original host, create transport file with data set(s)
libname inn 'directory_with_data_set(s)' ; libname trans xport 'transport_file_to_create'; proc copy in=inn out=trans ; run ;
2. Copy transport file to destination host with ftp command or another method 3. On destination host, transport file ---> SAS data set(s)
libname trans xport 'transport_file' ; libname sasdata1 ‘directory_for_data_sets‘; proc copy in=trans out=sasdata1 ; run ;
CPORT-style transport files: 3 steps
1. On original host, create transport file with data set(s)
libname sasdata1 'directory_with_SAS_data_set(s)'; filename trans 'transport_file_to_create'; proc cport library=sasdata1 file=trans ; run ;
2. Copy transport file to destination host with ftp command or another method 3. On destination host, transport file ---> SAS data set(s)
filename trans 'transport_file' ; libname sasdata1 ‘directory_for_data_sets' ; proc cimport infile=trans library=sasdata1 ; run ;
Version 7 changes and transport files
Version 7 changes • Variable name maximum length: 8 32 • Character variable maximum length: 200 32768 • Variable names display based on “first reference” Transport files • XPORT-style created in Version 6 format • CPORT-style understands Version 7 changes • Always use CPORT-style
XPORT-style transport files, Version 7 and beyond
options validvarname=v6; • Documented in V8, undocumented but works in V9 • Empty data set if any character variable length more than 200, error in log • Variable names longer than 8 characters truncated, unique • Variable names converted to upper case Otherwise • Empty data set if any character variable length more than 200 or variable name longer than 8 characters, error in log • Variable names converted to upper case
4. Count number of times a character or substring occurs in a character string
Typical objective, count number of times • "ab" occurs in " ab 1 ab a344444 ab b555 ab 4“ (4 times) • "a” or “b" occur in " ab 1 aba 344444 abb 555 ab 4“ (10 times) Pre-Version 9: iterative methods (loops) Version 9 • COUNT: # times substring occurs in a character string • COUNTC: # times 1 or more characters occur in a character string
Count occurrences of substring "ab" in "ab1aba344444abb555ab4" data one; char1 = "ab1aba344444abb555ab4"; * character variable to check; * Before Version 9 ; num_ab = 0; * counter for number of occurrences ; loc=index(char1,"ab"); * find 1st occurrence ; do while (loc ne 0); num_ab+1; * count occurrences of "ab" ; char1 = substr(char1,loc+2); * resume search after "ab" ; loc = index(char1, "ab"); * find next occurrence ; end; * Or, a single statement in Version 9 ; num_ab = count(char1,"ab"); * Version 9 ; run;
Count occurrences of characters "a" or "b" in "ab1aba344444abb555ab4" data one; char1 = "ab1aba344444abb555ab4"; * character variable to check; * Before Version 9 ; num_a_or_b = 0; * counter for number do while (loc ne 0); of occurrences ; loc=indexc(char1,"ab"); * find 1st occurrence ; num_a_or_b+1; * count occurrences of "a” or “b“ ; char1 = substr(char1,loc+1); * resume search end; after "a” or “b" ; loc = indexc(char1, "ab"); * find next occurrence ; * Or, a single statement in Version 9 ; num_a_or_b = countc(char1,"ab"); * Version 9 ; run;
Countw: Version 9.2
Counts number of words in a character string Can also replace iterative programming
5. Concatenate character strings
Pre-Version 9 • || to concatenate • TRIM removes trailing blanks • TRIM(LEFT) removes leading and trailing blanks Version 9, new functions concatenate character strings • CAT does not remove leading or trailing blanks • CATT removes trailing blanks • CATS removes leading and trailing blanks • CATX removes leading and trailing blanks, inserts delimiter (separator) between each argument • Note difference in length of result
data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; * Leading and trailing blanks included; cat1 = char1 || char2 || char3; cat1 = cat(char1, char2, char3); * V9 ; * Remove trailing blanks ; cat2 = trim(char1) || trim(char2) || trim(char3); cat2 = catt(char1, char2, char3); * V9 ; cat1=xxxxxxxx yyy zzzz cat2=xxxxxxxx yyyzzzz
data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; * Remove leading and trailing blanks ; cat3 = trim(left(char1)) || trim(left(char2)) || trim(left(char3)); cat3 = cats(char1, char2, char3); cat3=xxxxxxxxyyyzzzz
Important difference: length of result
data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; cat1 = char1 || char2 || char3; cat1 = cat(char1, char2, char3); etc.
If length of result (e.g., CAT1) not defined • ||: sum of length of contributing variables: 8+8+8=24 • CAT, CATT, CATS, CATX: 200
data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; aa = '123'; run; * Delimiter: value of AA, can vary by observation ; char_catx1 = catx(aa, char1, char2); * Delimiter: comma ; char_catx2 = catx(',', char1, char2, char3); * Delimiter: '123' ; char_catx3 = catx('123', char1, char2, char3); * Delimiter: single blank ; char_catx4 = catx(' ', char1, char2, char3); char_catx1=xxxxxxxx123yyy char_catx2=xxxxxxxx,yyy,zzzz char_catx3=xxxxxxxx123yyy123zzzz char_catx4=xxxxxxxx yyy zzzz
6. Sort by the numeric part of character values
Variable BANK:
bank2 bank100 bank1 bank10
Conventional PROC SORT • left to right character by character comparison • "1" in "bank100" less than "2" in "bank2“ Wanted: sort integer values w/in text by numeric value
Conventional sort order Desired sort order bank1 bank1 bank10 bank2 bank100 bank10 bank2 bank100
Before Version 9.2: one method
Make new data set with variable w/numeric part of BANK Sort by new variable This example assumes numeric part of value • always starts at same location (5th character) • at most three digits More general case slightly more complicated data two; set one; banktemp = input(substr(bank,5),3.); run; proc sort data=two out=three; by banktemp; run;
Version 9.2: linguistic collation
SORTSEQ=LINGUISTIC + NUMERIC_COLLATION sorts integer values w/in text by numeric part of value Don’t create extra data set
proc sort data=one out=two sortseq=linguistic (numeric_collation=on); by bank; run;
Sort order for BANK:
bank1 bank2 bank10 bank100
Numeric collation not just at end of character values
ADDRESS values
East 14th Street East 2nd Street
Conventional sort: “1” less than “2” Sortseq=linguistic (numeric_collation=on): 2 less than 14
Conventional sort order Linguistic collation order East 14th Street East 2nd Street East 2nd Street East 14th Street
7. Retrieve DB2 data on z/OS mainframes
Old methods • PROC DB2EXT • PROC ACCESS Modern methods • SAS/ACCESS LIBNAME statement • Pass-Through Facility
PROC DB2EXT
Since Version 6 (1990), undocumented but works In Version 9, message
NOTE: Beginning with SAS version 10, DB2EXT will no longer be available.
“Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names
PROC ACCESS
Introduced in Version 6 (1990) Use access and view descriptors Documented and supported, not recommended PROC CV2VIEW: convert access and view descriptors to SQL views “Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names
SAS/ACCESS LIBNAME statement
Introduced in Version 7 Assign libref directly to DB2 database Easiest method Long DB2 column names processed directly
Pass-through facility
Introduced in Version 6 Retrieve DB2 data by sending DBMS-specific SQL statements to DB2 • DBMS = Database Management System Long DB2 column names • Processed directly unless options validvarname=v6; • options validvarname=v6;
Long name truncated to 8 characters, with unique names as per PROC DB2EXT, PROC ACCESS Documented in V8, undocumented but works in V9
Retrieving data: 4 methods
options db2ssid=dsn; /* 1. PROC DB2EXT */ run; proc db2ext out=new1; select cntry_nm from nicua.cuv_country_nm; run; proc access dbms=db2; /* 2. PROC ACCESS */ create work.new1.access; table = nicua.cuv_country_nm; ssid = dsn; create work.new1.view; select cntry_nm; run;
Retrieving data: 4 methods
/* 3. SAS/ACCESS to DB2 LIBNAME statement */ libname in1 db2 ssid=dsn authid=nicua; data new1; set in1.cuv_country_nm (keep = cntry_nm); run; proc sql; /* 4. Pass_Through Facility */ connect to db2 (ssid=dsn); create table new1 as select * from connection to db2 (select cntry_nm from nicua.cuv_country_nm); disconnect from db2; quit;
Notes
SAS/ACCESS LIBNAME statement vs Pass-Through Facility • Both recommended • SAS/ACCESS LIBNAME statement easiest • Pass-Through Facility can be more efficient, especially for complex queries • Compared in SAS/ACCESS manual If convert from PROC DB2EXT or PROC ACCESS • DB2 column names could be longer • In subsequent code: RENAME or change code
For more information, please contact Bruce Gilsen Federal Reserve Board, mail stop 157 Washington, DC 20551 phone: 202-452-2494 e-mail: [email protected]
SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.