More Tales from the Help Desk: Solutions for Simple SAS

Download Report

Transcript More Tales from the Help Desk: Solutions for Simple SAS

This is the Modern World: Simple, Overlooked SAS® Enhancements

Bruce Gilsen Federal Reserve Board June 2009

Introduction

 Smaller, less dramatic enhancements fall thru cracks at • Federal Reserve (SAS consultant for 24 years) • SAS-L • Conferences  Users still use more cumbersome methods  Enhancements from V92, V9, V8, V7, even V6!

 Impetus: yymmddn8. in earlier paper

Underutilized enhancements

 Write date values in form yyyymmdd  Increment date values w/INTNX function  Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine  Count # times character or substring occurs in a character string  Concatenate character strings  Sort by numeric part of character values  Retrieve DB2 data on z/OS mainframes

1. Write date values in form yyyymmdd

 Lots of informats, formats to read/write date values • 9.1.3: 22 date informats, 46 date formats  Most common input form: yyyymmdd • YYMMDD8. informat reads dates in form yyyymmdd  Most common output form (overwhelmingly): yyyymmdd • YYMMDD8. format writes dates as yy-mm-dd • YYMMDD10. format writes dates as yyyy-mm-dd

Writing dates as yyyymmdd, pre-V8

 Some date output

data one; date1 = "18oct2008"d; * 10/18/2008 date value; put date1 yymmdd8.; * writes 08-10-18 ; put date1 yymmdd10.; * writes 2008-10-18 ;

 Before Version 8 • PUT: character string of form yyyy-mm-dd • COMPRESS: remove dashes

datetemp = compress(put(date1,yymmdd10.),'-'); put datetemp $8. ; * writes 20081018 ;

Writing dates as yyyymmdd, V8

 New format, YYMMDDxw • x: optional separator • w: number of bytes  x, one of: B blank

N none

C colon P period D dash S slash

date1 = "18oct2008"d; * 10/18/2008 date value; put date1 yymmddn8.; * writes 20081018 ;

YYMMDDxw format, examples

data _null_ ; date1 = "18oct2008"d; * 10/18/08 date value ; put date1 yymmddb8.; * 08 10 18 ; put date1 yymmddc8.; * 08:10:18 ; put date1 yymmddd10.; * 2008-10-18 ; put date1 yymmddn8.; * 20081018 ; put date1 yymmddp8.; * 08.10.18 ; put date1 yymmdds10.; * 2008/10/18 ; put date1 yymmddn6.; * 081018 ;

2. Increment date values with INTNX

 INTNX returns SAS

®

date value incremented by number of intervals (days, months, quarters, years,…)  SAS date value: integer, number of days before or after January 1, 1960

December 31, 1959 -1 January 1, 1960 0 January 2, 1960 1 ...

October 18, 2008 16,727

INTNX Syntax (informal, incomplete)

INTNX(

interval

,

start-from

,

increment

<,

alignment

>)  interval: unit (years, months, days,...)

start-from

is incremented by  start-from: SAS date value to increment  increment: # of intervals

start-from

is incremented by  alignment: where

start-from

aligned after incrementing. Values: BEGINNING, MIDDLE, END, SAMEDAY (v9). Default: BEGINNING.

Before Version 9: 2 INTNX issues

 Alignment BEGINNING (start of period)  Leap year handling

INTNX issue 1: alignment to start of period

 

Increment by 2 days, months, years Unexpected result for months, years

data one; date1 = ‘18oct2008'd; dateplus2day = intnx(‘day',date1,2); dateplus2month = intnx(‘month',date1,2); dateplus2year = intnx(‘year',date1,2); run; Variable Description Expect Actual dateplus2day 10/18/08+2 days 10/20/08 10/20/08 dateplus2month 10/18/08+2 months 12/18/08 12/1/08 dateplus2year 10/18/08+2 years 10/18/10 1/1/10

dateplus2day = intnx(‘day',date1,2);

date1 = ‘18oct2008'd;

• = 17,823, date value for 10/18/2008  Increment by 2 days • First, increment by 2 days • Then, align to start of 10/20/2008 (beginning of interval, DAY), no effect  DATEPLUS2DAY = 17,825, date value for 10/20/2008 as expected

dateplus2month = intnx(‘month',date1,2);

date1 = ‘18oct2008'd;

• = 17,823, date value for 10/18/2008  Increment by 2 months • First, increment by 2 months • Then, align to start of 12/2008 (beginning of interval, MONTH)  DATEPLUS2MONTH = 17,867, date value for 12/1/2008

dateplus2year = intnx(‘year',date1,2);

date1 = ‘18oct2008'd;

• = 17,823, date value for 10/18/2008  Increment by 2 years • First, increment by 2 years • Then, align to start of 2010 (beginning of interval, YEAR)  DATEPLUS2MONTH = 18,263, date value for 1/1/2010

INTNX issue 2: leap years

Example of shifting by years

offset = 2; * number of years to increment; if day(date1) = 29 and month(date1) = 2 and mod(offset,4) ne 0 and mod(offset, 400) ne 0 then date1 = mdy(2, 28, year(date1)+offset); else date1 = mdy(month(date1), day(date1), year(date1)+offset) ; 

Need more generalized code for all frequencies

Solution: SAMEDAY

 New alignment value in V9  Preserves date value’s alignment in interval after it’s incremented  No effect when interval = DAY  Use for all intervals except DAY

Solution: SAMEDAY

 dateplus2day = intnx(‘day’, date1, 2, “sameday”) • 2 days after 10/18/2008 • Result: 17,825, SAS date value for 10/20/2008  dateplus2month = intnx(‘month’, date1, 2, “sameday”) • 2 months after 10/18/2008 • Result: 17,884, SAS date value for 12/18/2008  dateplus2year = intnx(‘year’, date1, 2, “sameday”) • 2 years after 10/18/2008 • Result: 18,553, SAS date value for 10/18/2010

Some interesting dates

 date2 = intnx(‘year’, ’29feb2000’d, 1, “sameday”) • 1 year after 2/29/2000 • Result: 15,034, SAS date value for 2/28/2001  date3 = intnx(‘year’, ’29feb2000’d, 4, “sameday”) • 4 years after 2/29/2000 • Result: 16,130, SAS date value for 2/29/2004  2000, 2004 = leap years, 2001 not leap year

Some interesting dates

 date4 = intnx(‘month’, ’31mar2003’d, -1, “sameday”) • 1 month before 3/31/2003 • Result: 15,764, SAS date value for 2/28/2003  date5 = intnx(‘month’, ’31mar2004’d, -1, “sameday”) • 1 month before 3/31/2004 • Result: 16,130, SAS date value for 2/29/2004  2000, 2004 = leap years, 2003 not leap year

SAMEDAY until 9.2: single, non-shifted date intervals only

  Following intervals might return wrong answer, no error or warning • multiple (e.g., month2 = two-month interval) • shifted (e.g., month.4 = month interval starting on April 1) • time • datetime Interval = DAY, WEEK, WEEKDAY, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR okay  Fixed in 9.2

 http://support.sas.com/techsup/unotes/SN/016/016184.html

3. Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine

 Transport file • File organized in machine-independent format • SAS can read on any operating system • Use to copy SAS data sets from one operating system to another  Starting w/ Version 6, two types of transport files • XPORT-style transport files • CPORT-style transport files

XPORT-style transport files: 3 steps

 1. On original host, create transport file with data set(s)

libname inn 'directory_with_data_set(s)' ; libname trans xport 'transport_file_to_create'; proc copy in=inn out=trans ; run ;

 2. Copy transport file to destination host with ftp command or another method  3. On destination host, transport file ---> SAS data set(s)

libname trans xport 'transport_file' ; libname sasdata1 ‘directory_for_data_sets‘; proc copy in=trans out=sasdata1 ; run ;

CPORT-style transport files: 3 steps

 1. On original host, create transport file with data set(s)

libname sasdata1 'directory_with_SAS_data_set(s)'; filename trans 'transport_file_to_create'; proc cport library=sasdata1 file=trans ; run ;

 2. Copy transport file to destination host with ftp command or another method  3. On destination host, transport file ---> SAS data set(s)

filename trans 'transport_file' ; libname sasdata1 ‘directory_for_data_sets' ; proc cimport infile=trans library=sasdata1 ; run ;

Version 7 changes and transport files

 Version 7 changes • Variable name maximum length: 8  32 • Character variable maximum length: 200  32768 • Variable names display based on “first reference”  Transport files • XPORT-style created in Version 6 format • CPORT-style understands Version 7 changes • Always use CPORT-style

XPORT-style transport files, Version 7 and beyond

 options validvarname=v6; • Documented in V8, undocumented but works in V9 • Empty data set if any character variable length more than 200, error in log • Variable names longer than 8 characters truncated, unique • Variable names converted to upper case  Otherwise • Empty data set if any character variable length more than 200 or variable name longer than 8 characters, error in log • Variable names converted to upper case

4. Count number of times a character or substring occurs in a character string

 Typical objective, count number of times • "ab" occurs in " ab 1 ab a344444 ab b555 ab 4“ (4 times) • "a” or “b" occur in " ab 1 aba 344444 abb 555 ab 4“ (10 times)  Pre-Version 9: iterative methods (loops)  Version 9 • COUNT: # times substring occurs in a character string • COUNTC: # times 1 or more characters occur in a character string

Count occurrences of substring "ab" in "ab1aba344444abb555ab4" data one; char1 = "ab1aba344444abb555ab4"; * character variable to check; * Before Version 9 ; num_ab = 0; * counter for number of occurrences ; loc=index(char1,"ab"); * find 1st occurrence ; do while (loc ne 0); num_ab+1; * count occurrences of "ab" ; char1 = substr(char1,loc+2); * resume search after "ab" ; loc = index(char1, "ab"); * find next occurrence ; end; * Or, a single statement in Version 9 ; num_ab = count(char1,"ab"); * Version 9 ; run;

Count occurrences of characters "a" or "b" in "ab1aba344444abb555ab4" data one; char1 = "ab1aba344444abb555ab4"; * character variable to check; * Before Version 9 ; num_a_or_b = 0; * counter for number do while (loc ne 0); of occurrences ; loc=indexc(char1,"ab"); * find 1st occurrence ; num_a_or_b+1; * count occurrences of "a” or “b“ ; char1 = substr(char1,loc+1); * resume search end; after "a” or “b" ; loc = indexc(char1, "ab"); * find next occurrence ; * Or, a single statement in Version 9 ; num_a_or_b = countc(char1,"ab"); * Version 9 ; run;

Countw: Version 9.2

 Counts number of words in a character string  Can also replace iterative programming

5. Concatenate character strings

 Pre-Version 9 • || to concatenate • TRIM removes trailing blanks • TRIM(LEFT) removes leading and trailing blanks  Version 9, new functions concatenate character strings • CAT does not remove leading or trailing blanks • CATT removes trailing blanks • CATS removes leading and trailing blanks • CATX removes leading and trailing blanks, inserts delimiter (separator) between each argument • Note difference in length of result

data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; * Leading and trailing blanks included; cat1 = char1 || char2 || char3; cat1 = cat(char1, char2, char3); * V9 ; * Remove trailing blanks ; cat2 = trim(char1) || trim(char2) || trim(char3); cat2 = catt(char1, char2, char3); * V9 ; cat1=xxxxxxxx yyy zzzz cat2=xxxxxxxx yyyzzzz

data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; * Remove leading and trailing blanks ; cat3 = trim(left(char1)) || trim(left(char2)) || trim(left(char3)); cat3 = cats(char1, char2, char3); cat3=xxxxxxxxyyyzzzz

Important difference: length of result

data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; cat1 = char1 || char2 || char3; cat1 = cat(char1, char2, char3); etc.

 If length of result (e.g., CAT1) not defined • ||: sum of length of contributing variables: 8+8+8=24 • CAT, CATT, CATS, CATX: 200

data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; aa = '123'; run; * Delimiter: value of AA, can vary by observation ; char_catx1 = catx(aa, char1, char2); * Delimiter: comma ; char_catx2 = catx(',', char1, char2, char3); * Delimiter: '123' ; char_catx3 = catx('123', char1, char2, char3); * Delimiter: single blank ; char_catx4 = catx(' ', char1, char2, char3); char_catx1=xxxxxxxx123yyy char_catx2=xxxxxxxx,yyy,zzzz char_catx3=xxxxxxxx123yyy123zzzz char_catx4=xxxxxxxx yyy zzzz

6. Sort by the numeric part of character values

 Variable BANK:

bank2 bank100 bank1 bank10

 Conventional PROC SORT • left to right character by character comparison • "1" in "bank100" less than "2" in "bank2“  Wanted: sort integer values w/in text by numeric value

Conventional sort order Desired sort order bank1 bank1 bank10 bank2 bank100 bank10 bank2 bank100

Before Version 9.2: one method

 Make new data set with variable w/numeric part of BANK  Sort by new variable   This example assumes numeric part of value • always starts at same location (5th character) • at most three digits More general case slightly more complicated data two; set one; banktemp = input(substr(bank,5),3.); run; proc sort data=two out=three; by banktemp; run;

Version 9.2: linguistic collation

 SORTSEQ=LINGUISTIC + NUMERIC_COLLATION sorts integer values w/in text by numeric part of value  Don’t create extra data set

proc sort data=one out=two sortseq=linguistic (numeric_collation=on); by bank; run;

 Sort order for BANK:

bank1 bank2 bank10 bank100

Numeric collation not just at end of character values

 ADDRESS values

East 14th Street East 2nd Street

 Conventional sort: “1” less than “2”  Sortseq=linguistic (numeric_collation=on): 2 less than 14

Conventional sort order Linguistic collation order East 14th Street East 2nd Street East 2nd Street East 14th Street

7. Retrieve DB2 data on z/OS mainframes

 Old methods • PROC DB2EXT • PROC ACCESS  Modern methods • SAS/ACCESS LIBNAME statement • Pass-Through Facility

PROC DB2EXT

 Since Version 6 (1990), undocumented but works  In Version 9, message

NOTE: Beginning with SAS version 10, DB2EXT will no longer be available.

 “Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names

PROC ACCESS

 Introduced in Version 6 (1990)  Use access and view descriptors  Documented and supported, not recommended  PROC CV2VIEW: convert access and view descriptors to SQL views  “Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names

SAS/ACCESS LIBNAME statement

 Introduced in Version 7  Assign libref directly to DB2 database  Easiest method  Long DB2 column names processed directly

Pass-through facility

 Introduced in Version 6   Retrieve DB2 data by sending DBMS-specific SQL statements to DB2 • DBMS = Database Management System Long DB2 column names • Processed directly unless options validvarname=v6; • options validvarname=v6;

Long name truncated to 8 characters, with unique names as per PROC DB2EXT, PROC ACCESS Documented in V8, undocumented but works in V9

Retrieving data: 4 methods

options db2ssid=dsn; /* 1. PROC DB2EXT */ run; proc db2ext out=new1; select cntry_nm from nicua.cuv_country_nm; run; proc access dbms=db2; /* 2. PROC ACCESS */ create work.new1.access; table = nicua.cuv_country_nm; ssid = dsn; create work.new1.view; select cntry_nm; run;

Retrieving data: 4 methods

/* 3. SAS/ACCESS to DB2 LIBNAME statement */ libname in1 db2 ssid=dsn authid=nicua; data new1; set in1.cuv_country_nm (keep = cntry_nm); run; proc sql; /* 4. Pass_Through Facility */ connect to db2 (ssid=dsn); create table new1 as select * from connection to db2 (select cntry_nm from nicua.cuv_country_nm); disconnect from db2; quit;

Notes

 SAS/ACCESS LIBNAME statement vs Pass-Through Facility • Both recommended • SAS/ACCESS LIBNAME statement easiest • Pass-Through Facility can be more efficient, especially for complex queries • Compared in SAS/ACCESS manual  If convert from PROC DB2EXT or PROC ACCESS • DB2 column names could be longer • In subsequent code: RENAME or change code

For more information, please contact Bruce Gilsen Federal Reserve Board, mail stop 157 Washington, DC 20551 phone: 202-452-2494 e-mail: [email protected]

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.