NOBS for Noobs
David B. Horvath, CCP, MS
PhilaSUG Winter 2015 Meeting
Copyright © 2015, David B. Horvath, CCP — All Rights Reserved
The Author can be contacted at:
504 Longbotham Drive, Aston PA 19014-2502, USA
Phone: 1-610-859-8826
Email: [email protected]
Web: http://www.cobs.com/
All trademarks and service marks are the property of their respective owners.
Abstract
This mini-session will be a short discussion of the NOBS= (number of observations) option on the SET statement.
This includes one "gotcha" that I've run into with WHERE clauses: NOBS= is set before WHERE processing. If you need to know the number of observations remaining after the WHERE clause, another DATA step is needed.
My Background
• Base SAS on Mainframe, UNIX, and PC Platforms
• SAS is primarily an ETL tool or programming language for me
• My background is IT; I am not a modeler
• Not my first user group presentation: I have presented sessions and seminars in Australia, France, the US, and Canada
• Undergraduate: Computer and Information Sciences, Temple Univ.
• Graduate: Organizational Dynamics, UPenn
• Most of my career was in consulting
• Have written several books (none SAS-related)
• Adjunct Instructor covering IT topics
Basic NOBS
• The NOBS= option on the SET statement is a handy way of discovering how many observations are in your SAS data set:
data simple;
  a = 42;
  output;
run;
data _null_;
  put nobs=;
  stop;
  set simple nobs=nobs;
run;
• Prints
NOBS=1
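The PUT comes before the SET for a reason. NOBS= is filled in at compile time, but a SET that reads past the end of an empty data set ends the step immediately. In the sketch below, the data set name empty is just for illustration. The first DATA _NULL_ step should write a count of 0 to the log. The second, with the SET moved first, should write nothing at all.

```sas
data simple;
  a = 42;
  output;
run;

/* Zero-observation copy: STOP ends the step before any row is written */
data empty;
  set simple;
  stop;
run;

/* PUT runs before SET executes, so the compile-time count is shown */
data _null_;
  put nobs=;
  stop;
  set empty nobs=nobs;
run;

/* SET first: the read hits end-of-file and the step ends before PUT */
data _null_;
  set empty nobs=nobs;
  put nobs=;
run;
```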
Macro NOBS
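• Great information if you need it!
• Is available before the first observation is read, because NOBS= is set at compile time
• Can be stored in a macro variable for global use:
data _null_;
  call symput('ALLOBS', nobs);
  stop;
  set simple nobs=nobs;
run;
data _null_;
  put "number of obs are &ALLOBS.";
  stop;
run;
• Prints
number of obs are            1
• The gap comes from CALL SYMPUT converting the numeric NOBS= value to character with the BEST12. format, which right-aligns it with leading blanks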
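One way to avoid the padding uses CALL SYMPUTX, which is available from SAS 9 on. It converts the number and strips leading and trailing blanks before storing it. The %PUT below should then write number of obs are 1 on a single line.

```sas
data simple;
  a = 42;
  output;
run;

/* CALL SYMPUTX converts the number and strips leading/trailing blanks */
data _null_;
  call symputx('ALLOBS', nobs);
  stop;
  set simple nobs=nobs;
run;

%put number of obs are &ALLOBS.;
```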
NOBS – the catch: where
• NOBS= is filled in from the data set header at compile time, before any WHERE clause is applied:
data _null_;
  put nobs=;
  stop;
  set simple nobs=nobs;
run;
• And
data _null_;
  put nobs=;
  stop;
  set simple nobs=nobs;
  where a = 10;
run;
• Both print the same result, even though no observation satisfies a = 10:
NOBS=1
NOBS – the catch: where
• I found out the hard way
• I had a process that used RSUBMIT to launch N jobs to process the objects within an XML file
• Each of the N jobs processed 1/Nth of the objects to spread the load
• The process worked fine until the user said "Don't bother with THESE tables".
• I figured "Oh, this is SAS, this is an easy change: 'where TABLE not in (THESE1, THESE2, ... THESEn)'".
• The process still worked, but runtimes went up: N processes were no longer running, and the last 2 never started up, because the split was based on the pre-WHERE count
• The solution was to add another DATA step in front to execute the WHERE
• The input to the rsubmit process now had the correct nobs
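The fix above can be sketched as two steps, leaving the RSUBMIT part out. The data set names tables and tables_kept and the macro variable n are hypothetical. The range arithmetic is only a guess at how the 1/Nth split might be computed. With three kept rows and n=2, job 1 should get rows 1 to 2 and job 2 should get row 3.

```sas
/* Hypothetical list of table names to be shared out across N jobs */
data tables;
  length table $8;
  input table $;
  datalines;
KEEP1
KEEP2
THESE1
KEEP3
THESE2
;
run;

/* Step 1: apply the WHERE in its own step so the result has a true count */
data tables_kept;
  set tables;
  where table not in ('THESE1', 'THESE2');
run;

/* Step 2: NOBS= now counts only kept rows; split them into N ranges.  */
/* If nobs < n, some later jobs get an empty range (first > last).     */
%let n = 2;
data _null_;
  chunk = ceil(nobs / &n);
  do job = 1 to &n;
    first = (job - 1) * chunk + 1;
    last  = min(job * chunk, nobs);
    put job= first= last=;
  end;
  stop;
  set tables_kept nobs=nobs;
run;
```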
NOBS – the catch: not a new variable
• Your NOBS= variable is special: it is a temporary variable and will not appear in the output data set
data new;
  set simple nobs=nobs;
run;
proc print data=new; run;
• Prints
Obs     a
  1    42
• Listing the nobs variable in a KEEP= data set option is not a fix:
data new (keep=a nobs);
  set simple nobs=nobs;
run;
WARNING: The variable nobs in the DROP, KEEP, or RENAME list has never been referenced.
• The only solution is to assign it to an ordinary variable (even RETAIN does not help):
nnobs = nobs;
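Here is the assignment in a complete step. The name nnobs follows the slide. PROC PRINT should now show an nnobs column holding 1 next to a.

```sas
data simple;
  a = 42;
  output;
run;

/* NOBS= itself is never written out; copy it into an ordinary variable */
data new;
  set simple nobs=nobs;
  nnobs = nobs;
run;

proc print data=new;
run;
```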
NOBS – the catch: options obs=
• NOBS= ignores both the OBS= system option and the OBS= data set option:
options obs=2;
data _null_;
  put nobs=;
  stop;
  set large nobs=nobs;
run;
• And with the OBS= data set option:
data _null_;
  put nobs=;
  stop;
  set large(obs=2) nobs=nobs;
run;
• Both print
nobs=915803
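If you need the count that OBS= (or a WHERE) would actually let through, one alternative is the ATTRN function. This is my suggestion, not part of the original approach. As I read its documentation, ATTRN with 'NLOBS' gives the header count, like NOBS=. With 'NLOBSF' it reads every observation and takes FIRSTOBS=, OBS= and WHERE into account. The 10-row data set demo below is a hypothetical stand-in for large. With options obs=2 in effect, I would expect nobs_all=10 and nobs_eff=2.

```sas
/* Hypothetical stand-in for the slide's large data set: 10 rows */
data demo;
  do i = 1 to 10;
    output;
  end;
run;

options obs=2;

/* NLOBSF forces a read that honors OBS=, FIRSTOBS= and any WHERE; */
/* for a WHERE, open e.g. 'demo(where=(i > 5))' instead.           */
data _null_;
  dsid = open('demo');
  if dsid then do;
    nobs_all = attrn(dsid, 'NLOBS');   /* header count, like NOBS=   */
    nobs_eff = attrn(dsid, 'NLOBSF');  /* rows actually readable     */
    rc = close(dsid);
    put nobs_all= nobs_eff=;
  end;
run;

options obs=max;
```

Because NLOBSF reads the data, it costs as much as a pass through the file. The header-based NOBS= does not.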
NOBS – the catch: not every engine
• The XML engine does not properly implement NOBS=:
filename SXLEMAP "OUR_MAP_FILE.map";
filename test2 "OUR_INPUT_FILE.xml";
libname test2 xml xmlmap=SXLEMAP access=READONLY;
NOTE: Libref TEST2 was successfully assigned as follows:
      Engine:        XML
      Physical Name: TEST2
data _null_;
  put nobs=;
  stop;
  set test2.application nobs=nobs;
run;
• Printing
nobs=9.0071993E15
• When the file only contained 17,383,357 bytes
• 9.0071993E15 is 2**53 (9,007,199,254,740,992), the largest integer a SAS numeric stores exactly: a sign that the engine could not supply a count, not a real number of observations
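A defensive sketch follows. The macro name safe_nobs and its parameters are hypothetical. It trusts NOBS= unless the value is at or above CONSTANT('EXACTINT'), which returns 2**53 on most platforms (that threshold is my assumption for spotting the "unknown" value). In that case it counts the rows by reading them. On an ordinary data set such as simple it should simply store 1.

```sas
/* Hypothetical helper: store NOBS= in &mvar, or count rows by reading */
/* them when NOBS= reports the "unknown" value (>= EXACTINT).          */
%macro safe_nobs(ds, mvar);
  %global &mvar;
  %let &mvar = 0;  /* stays 0 if the fallback read finds no rows */
  data _null_;
    if nobs < constant('exactint') then do;
      call symputx("&mvar", nobs);
      stop;
    end;
    set &ds nobs=nobs end=eof;
    if eof then call symputx("&mvar", _n_);
  run;
%mend safe_nobs;

data simple;
  a = 42;
  output;
run;

%safe_nobs(simple, simpleobs)
%put simpleobs=&simpleobs;
```

Against the slide's libref, %safe_nobs(test2.application, appobs) should take the fallback path. That means a full read of the XML file, which may be slow for large files.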
Wrap Up
Questions and Answers