Transcript Document 7510756
“The DATA step is your most powerful programming tool.
So understand and use it well.”
Socrates
Objectives
Understand the DATA step:
processes
internals
defaults
processes internals defaults
compilation of DATA step source code
execution of resultant machine code
compile and execute phases of:
INPUT (reading non-SAS data)
SET (reading SAS data)
Compile Time Activities
syntax scan
source code translation to machine language
definition of input and output files
Compile Time Activities (continued)
creation of the input buffer (only when reading raw data with INPUT)
creation of the LPDV (logical program data vector)
creation of the data set descriptor information
Creation of LPDV
variables are added in the order the compiler first sees them
during parsing and interpretation of source statements
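The first-reference rule can be sketched as a small Python model. Python is used only for illustration, and `Ref` and `build_lpdv` are hypothetical names, not SAS features. The sketch assumes the variable references arrive in the order the compiler parses the INPUT example, and shows that a later reference never adds a second slot or changes an existing one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """One variable reference as the compiler meets it (hypothetical model)."""
    name: str
    vtype: str = "num"   # "num" or "char"
    length: int = 8      # bytes; numerics are always 8 in the LPDV


def build_lpdv(refs):
    """Return [(name, type, length), ...] in first-reference order.
    A later reference to the same variable (names are case-insensitive)
    does not move it or change its type or length."""
    slots = {}
    for ref in refs:
        key = ref.name.lower()
        if key not in slots:
            slots[key] = (ref.name, ref.vtype, ref.length)
    return list(slots.values())


# Variable references from the INPUT example, in source order.
lpdv = build_lpdv([
    Ref("idnum"),
    Ref("diagdate"),
    Ref("sex", "char", 8),        # sex $       : default character length 8
    Ref("rx_grp", "char", 10),    # rx_grp $ 10.: length from the informat width
    Ref("time"),                  # time = intck(...): numeric
    Ref("DIAGDATE"),              # reused inside INTCK: no new slot
])
print(lpdv)
```

The dict keeps insertion order, which is what makes it a convenient stand-in for LPDV slot order.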
Compile Time Statements
location critical:
BY
WHERE
ARRAY
ATTRIB
FORMAT
INFORMAT
LENGTH
location irrelevant:
DROP
KEEP
LABEL
RENAME
RETAIN
Retained Variables
all SAS automatic variables (_N_, _ERROR_)
all variables named in a RETAIN statement
all variables read with SET, MERGE, or UPDATE
accumulator variables in SUM statement(s)
Variables Not Retained
variables read with an INPUT statement
user-defined variables (other than SUM statement accumulators)
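A minimal Python sketch of the retained versus not-retained split, assuming the LPDV is modeled as a dict and Python's `None` stands in for a SAS missing value. `initialize_to_missing` is a hypothetical helper, not SAS code.

```python
MISSING = None  # stand-in for SAS missing: "." for numerics, blank for character


def initialize_to_missing(pdv, retained):
    """Model of the initialize-to-missing (ITM) step at the top of each iteration.
    `pdv` maps variable name -> value and is updated in place; names listed in
    `retained` keep their values from the previous iteration."""
    for name in pdv:
        if name not in retained:
            pdv[name] = MISSING
    return pdv


# End of one iteration: idnum came from INPUT, tot_rec is a SUM statement accumulator.
pdv = {"idnum": 1, "time": 48, "tot_rec": 1}
initialize_to_missing(pdv, retained={"tot_rec"})
print(pdv)
```

Only the accumulator survives into the next iteration; the INPUT and assignment variables start over as missing.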
Type and Length of Variables
determined at compile time
by the first reference to the variable that the compiler encounters in the DATA step
Numerics:
length is always 8 bytes during DATA step processing (in the LPDV)
length is an output property: a shorter LENGTH affects only how the value is stored in the output data set
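To see why a numeric LENGTH is only an output property, this Python sketch models how a shortened numeric is stored. It assumes an IEEE floating-point host (for example Windows or UNIX), where SAS keeps the leading bytes of the 8-byte value and drops low-order mantissa bytes on output; `stored_numeric` is a hypothetical helper. With LENGTH 3, 8192 survives exactly but 8193 reads back as 8192, while the in-memory value in the LPDV was always the full 8 bytes.

```python
import struct


def stored_numeric(value: float, length: int) -> float:
    """Model writing a numeric with LENGTH 3..8 and reading it back.
    Assumption: the first `length` bytes of the big-endian IEEE 754 double
    are kept and the dropped low-order bytes read back as zeros."""
    if not 3 <= length <= 8:
        raise ValueError("numeric length must be between 3 and 8")
    raw = struct.pack(">d", value)                 # full 8-byte value, as in the LPDV
    kept = raw[:length] + b"\x00" * (8 - length)   # truncated mantissa
    return struct.unpack(">d", kept)[0]


print(stored_numeric(8192.0, 3))
print(stored_numeric(8193.0, 3))
print(stored_numeric(1 / 3, 8) == 1 / 3)
```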
INPUT statement
reading non-SAS data
Compile Loop and LPDV
data a ;
   put _all_ ;   *write LPDV to LOG;
   input idnum
         diagdate: mmddyy8.
         sex $
         rx_grp $ 10. ;
   time = intck ('year', diagdate, today() ) ;
   put _all_ ;   *write LPDV to LOG;
cards ;
1 09-09-52 F placebo
2 11-15-64 M 300 mg.
3 04-07-48 F 600 mg.
;
run;
input buffer
logical program data vector
Variable   idnum    diagdate  sex   rx_grp  time
Type       numeric  numeric   char  char    numeric
Length     8        8         8     10      8
Building descriptor portion of SAS data set
logical program data vector
Variable   idnum    diagdate  sex   rx_grp  time     _N_   _ERROR_
Type       numeric  numeric   char  char    numeric
Length     8        8         8     10      8
DKR*       keep     keep      keep  keep    keep     drop  drop
*DKR = drop/keep/rename status
Execution of a DATA Step
initialization of LPDV
read input file
end of file?
   Y: step termination, go to next step
   N: process statements in step
implied output
_N_ + 1, return to initialization of LPDV
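The flowchart can be modeled as a loop. This Python sketch is hypothetical (`run_data_step` is not a SAS API) and assumes a step that reads one record per iteration, with `None` standing in for missing values. It records the LPDV at the top of every iteration, which is what the first PUT _ALL_ in the examples writes. Because the read that detects end of file happens after initialization, the step stops with _N_=4 and nothing is output for that pass.

```python
MISSING = None  # stand-in for a SAS missing value


def run_data_step(records, variables, retained=frozenset(), process=None):
    """Model of the implied DATA step loop. Returns (output observations,
    LPDV snapshots taken at the top of each iteration)."""
    source = iter(records)
    pdv = {name: MISSING for name in variables}
    output, tops = [], []
    n = 1
    while True:
        # Initialization: non-retained variables -> missing, _N_ = loop count, _ERROR_ = 0.
        for name in variables:
            if name not in retained:
                pdv[name] = MISSING
        pdv["_N_"], pdv["_ERROR_"] = n, 0
        tops.append(dict(pdv))
        try:
            record = next(source)      # read input file
        except StopIteration:
            break                      # end of file: step termination
        pdv.update(record)
        if process is not None:
            process(pdv)               # user statements
        # Implied output: automatic variables are not written.
        output.append({name: pdv[name] for name in variables})
        n += 1
    return output, tops


raw = [
    {"idnum": 1, "diagdate": -2670, "sex": "F", "rx_grp": "placebo"},
    {"idnum": 2, "diagdate": 1780, "sex": "M", "rx_grp": "300 mg."},
    {"idnum": 3, "diagdate": -4286, "sex": "F", "rx_grp": "600 mg."},
]
out, tops = run_data_step(raw, ["idnum", "diagdate", "sex", "rx_grp"])
for snapshot in tops:
    print(snapshot)
```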
DATA Step Execution
Implied read/write loop, stopped by:
no more data to read (end of file on INPUT or SET)
an explicit STOP statement
no input data source (the step executes only once)
some execution-time errors
Execution Time Activities
execute initialize-to-missing (ITM)
read from input source
modify data using user-controlled statements
supply values of variables to LPDV
output observation to SAS data set
Initialization
_N_ set to loop count
_ERROR_ set to 0
user variables set to missing (except retained variables)
Execution Loop - raw data
data a ;
   put _all_ ;   *write LPDV to LOG;
   input idnum
         diagdate: mmddyy8.
         sex $
         rx_grp $ 10. ;
   time = intck ('year', diagdate, today() ) ;
   put _all_ ;   *write LPDV to LOG;
cards ;
1 09-09-52 F placebo
2 11-15-64 M 300 mg.
3 04-07-48 F 600 mg.
;
run;
proc contents; run;
proc print; run;
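LPDV (over all executions of DATA step)
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  _N_
.      .                       .     1
1      -2670     F    placebo  48    1
.      .                       .     2
2      1780      M    300 mg.  36    2
.      .                       .     3
3      -4286     F    600 mg.  52    3
.      .                       .     4
TIME depends on TODAY(), so its values change with the run date (the LOG for the same records shows TIME=49, 37, 53).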
LOG
2    data a ;
3       put _all_ ; *write LPDV to LOG;
4       input idnum
5             diagdate: mmddyy8.
6             sex $
7             rx_grp $ 10. ;
8       time = intck ('year', diagdate, today() ) ;
9       put _all_; *write LPDV to LOG;
10   cards ;
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=1
IDNUM=1 DIAGDATE=-2670 SEX=F RX_GRP=placebo TIME=49 _ERROR_=0 _N_=1
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=2
IDNUM=2 DIAGDATE=1780 SEX=M RX_GRP=300 mg. TIME=37 _ERROR_=0 _N_=2
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=3
IDNUM=3 DIAGDATE=-4286 SEX=F RX_GRP=600 mg. TIME=53 _ERROR_=0 _N_=3
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=4
NOTE: The data set WORK.A has 3 observations and 5 variables.
NOTE: The DATA statement used 0.59 seconds.
14   run;
15
16   proc contents; run;
NOTE: The PROCEDURE CONTENTS used 0.39 seconds.
PROC CONTENTS
Data Set Name: WORK.A                              Observations: 3
Member Type: DATA                                  Variables: 5
Engine: V612                                       Indexes: 0
Created: 11:18 Saturday, January 20, 2001          Observation Length: 42
Last Modified: 11:18 Saturday, January 20, 2001    Deleted Observations: 0
Protection:                                        Compressed: NO
Data Set Type:                                     Sorted: NO
Label:
-----Engine/Host Dependent Information-----
Data Set Page Size: 8192
Number of Data Set Pages: 1
File Format: 607
First Data Page: 1
Max Obs per Page: 194
Obs in First Data Page: 3
-----Alphabetic List of Variables and Attributes-----
 #  Variable  Type  Len  Pos
-----------------------------
 2  DIAGDATE  Num     8    8
 1  IDNUM     Num     8    0
 4  RX_GRP    Char   10   24
 3  SEX       Char    8   16
 5  TIME      Num     8   34
PROC PRINT
IDNUM  DIAGDATE  SEX  RX_GRP   TIME
1      -2670     F    placebo  48
2      1780      M    300 mg.  36
3      -4286     F    600 mg.  52
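The DIAGDATE and TIME values can be checked outside SAS. This Python sketch assumes SAS date values count days from 1 January 1960 and that the two-digit years here resolve to 19yy (what the default YEARCUTOFF gives for these years); `read_mmddyy` and `intck_year` are hypothetical stand-ins for the MMDDYY8. informat and INTCK('year', ...). Using 20 January 2001, the creation date PROC CONTENTS reports, as TODAY() reproduces DIAGDATE=-2670, 1780, -4286 and TIME=49, 37, 53 from the LOG; a run on a different date can give different TIME values, such as 48, 36, 52.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)   # SAS date values count days from 1 January 1960


def read_mmddyy(text):
    """Hypothetical stand-in for MMDDYY8. on 'mm-dd-yy' text.
    Assumption: two-digit years map to 19yy."""
    month, day, yy = (int(part) for part in text.split("-"))
    return (date(1900 + yy, month, day) - SAS_EPOCH).days


def intck_year(start, end):
    """Number of 1 January boundaries crossed between SAS date `start` and the
    calendar date `end`: the difference in calendar years, not age in years."""
    return end.year - (SAS_EPOCH + timedelta(days=start)).year


run_date = date(2001, 1, 20)   # assumed TODAY() for the LOG shown
for text in ("09-09-52", "11-15-64", "04-07-48"):
    sas_date = read_mmddyy(text)
    print(text, sas_date, intck_year(sas_date, run_date))
```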
SET statement
reading existing SAS data
DATA Step Compile
no input buffer
compiler reads the descriptor portion of the input SAS data set to build the LPDV
the LPDV gets the same variables and attributes as the input data set, plus any new variables created in the step
SET
determine which SAS data set to be read
identify next observation to be read
copy variable values to LPDV
Execution Loop - SAS data
data sas_a ;
   put _all_ ;
   set a ;
   tot_rec + 1 ;
   put _all_ ;
run;
Building LPDV from descriptor portion of old SAS data set
logical program data vector
Variable   idnum    diagdate  sex   rx_grp  time     tot_rec
Type       numeric  numeric   char  char    numeric  numeric
Length     8        8         8     10      8        8
Building descriptor portion of new SAS data set
LPDV (over all executions of DATA step)
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC  _N_
.      .                       .     0        1
1      -2670     F    placebo  48    1        1
1      -2670     F    placebo  48    1        2
2      1780      M    300 mg.  36    2        2
2      1780      M    300 mg.  36    2        3
3      -4286     F    600 mg.  52    3        3
3      -4286     F    600 mg.  52    3        4
LOG
idnum=. diagdate=. sex= rx_grp= time=. tot_rec=0 _ERROR_=0 _N_=1
idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=1
idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=2
idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=2
idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=3
idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=3
idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=4
PROC PRINT
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC
1      -2670     F    placebo  48    1
2      1780      M    300 mg.  36    2
3      -4286     F    600 mg.  52    3
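The SET example differs from the raw-data example only in what gets reset. In this hypothetical Python model (`set_with_sum` is not SAS code, and `None` stands in for missing), every variable comes either from SET or from the SUM statement, so none is reset to missing: the top-of-loop snapshot at _N_=2 still holds the first observation, and the accumulator starts at 0, matching the LOG.

```python
MISSING = None  # stand-in for a SAS missing value


def set_with_sum(observations):
    """Model of: data sas_a; set a; tot_rec + 1; run;
    Variables read by SET and the SUM accumulator are retained, so the
    initialize-to-missing step changes nothing. Returns the LPDV snapshots
    taken at the top of each iteration and the output observations."""
    names = list(observations[0]) + ["tot_rec"]
    pdv = {name: MISSING for name in names}
    pdv["tot_rec"] = 0              # SUM statement accumulator starts at 0
    tops, output = [], []
    reader = iter(observations)
    n = 1
    while True:
        pdv["_N_"] = n
        tops.append(dict(pdv))      # what the first PUT _ALL_ would show
        obs = next(reader, None)
        if obs is None:
            break                   # SET finds no more observations: step stops
        pdv.update(obs)             # SET copies values into the LPDV
        pdv["tot_rec"] += 1         # tot_rec + 1 ;
        output.append({name: pdv[name] for name in names})
        n += 1
    return tops, output


a = [
    {"idnum": 1, "rx_grp": "placebo"},
    {"idnum": 2, "rx_grp": "300 mg."},
    {"idnum": 3, "rx_grp": "600 mg."},
]
tops, sas_a = set_with_sum(a)
for snapshot in tops:
    print(snapshot)
```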
Logic of a MERGE
compile
execute
data left;
   input ID X Y ;
   cards;
1 88 99
2 66 77
3 44 55
;
run;
data right;
   input ID A $ B $ ;
   cards;
1 A14 B32
3 A53 B11
;
run;
proc sort data=left; by ID; run;
proc sort data=right; by ID; run;
data both;
   merge left (in=inleft) right (in=inright) ;
   by ID ;
run;
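A rough Python model of this match-merge, assuming (as in this example) both inputs are sorted by ID and each ID occurs at most once per data set; `match_merge` and the `in_left`/`in_right` keys are hypothetical stand-ins for MERGE ... BY and the IN= variables. It reproduces the three LPDV states: ID 2 has no partner in RIGHT, so A and B stay missing and INRIGHT is 0. BY groups with repeated values follow additional rules not modeled here.

```python
MISSING = None  # stand-in for a SAS missing value


def match_merge(left, right, by="ID"):
    """Model of MERGE left right; BY ID; when each BY value occurs at most
    once per data set. Rows are dicts; in_left/in_right mimic IN= flags."""
    left_rows = {row[by]: row for row in left}
    right_rows = {row[by]: row for row in right}
    left_vars = [v for v in left[0] if v != by]
    right_vars = [v for v in right[0] if v != by]
    merged = []
    for key in sorted(left_rows.keys() | right_rows.keys()):
        lrow = left_rows.get(key, {})
        rrow = right_rows.get(key, {})
        row = {by: key}
        # A data set with no observation in this BY group contributes missing values.
        row.update({v: lrow.get(v, MISSING) for v in left_vars})
        row.update({v: rrow.get(v, MISSING) for v in right_vars})
        row["in_left"] = int(key in left_rows)
        row["in_right"] = int(key in right_rows)
        merged.append(row)
    return merged


left = [{"ID": 1, "X": 88, "Y": 99}, {"ID": 2, "X": 66, "Y": 77}, {"ID": 3, "X": 44, "Y": 55}]
right = [{"ID": 1, "A": "A14", "B": "B32"}, {"ID": 3, "A": "A53", "B": "B11"}]
both = match_merge(left, right)
for row in both:
    print(row)
```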
logical program data vector
first iteration: MATCH
ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
1   88  99  A14  B32  1       1        1    0

logical program data vector
second iteration: NO MATCH
ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
2   66  77            1       0        2    0

logical program data vector
third iteration: MATCH
ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
3   44  55  A53  B11  1       1        3    0
Let’s try this again (same LEFT and RIGHT data sets):
data left;
   input ID X Y ;
   cards;
1 88 99
2 66 77
3 44 55
;
run;
data right;
   input ID A $ B $ ;
   cards;
1 A14 B32
3 A53 B11
;
run;
proc sort data=left; by ID; run;
proc sort data=right; by ID; run;
data both;
   merge left (in=inleft) right (in=inright) ;
   ***** by ID (one-to-one merge);
run;
logical program data vector
first iteration: 1:1 “MATCH”
ID  X   Y   A    B    _N_  _ERROR_
1   88  99  A14  B32  1    0
ID: 1 from “left” OVERWRITTEN by 1, the value from data set “right”

logical program data vector
second iteration: 1:1 “MATCH”
ID  X   Y   A    B    _N_  _ERROR_
3   66  77  A53  B11  2    0
ID: 2 from “left” OVERWRITTEN by 3, the value from data set “right”

logical program data vector
third iteration: 1:1 “NO MATCH”
ID  X   Y   A    B    _N_  _ERROR_
3   44  55            3    0
A, B: MISSING (no values left in “right”)
Output SAS data set
ID  X   Y   A    B
1   88  99  A14  B32
3   66  77  A53  B11
3   44  55
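Without BY, the merge pairs observations by position. This is a hypothetical Python sketch (`one_to_one_merge` is not SAS code), assuming rows are dicts, `None` stands in for missing, and the data set listed last in MERGE wins for shared variables such as ID. It reproduces the output data set: the second ID becomes 3 because RIGHT's value overwrites LEFT's, and the third row has no RIGHT values.

```python
from itertools import zip_longest

MISSING = None  # stand-in for a SAS missing value


def one_to_one_merge(left, right):
    """Model of MERGE left right; with no BY statement: observation i of each
    data set forms output observation i. For a variable in both data sets,
    the later data set (right) overwrites the earlier one. Once the shorter
    data set runs out, its variables are missing."""
    names = list(dict.fromkeys(list(left[0]) + list(right[0])))
    merged = []
    for lrow, rrow in zip_longest(left, right):
        row = dict.fromkeys(names, MISSING)
        if lrow is not None:
            row.update(lrow)
        if rrow is not None:
            row.update(rrow)    # later data set wins for shared variables
        merged.append(row)
    return merged


left = [{"ID": 1, "X": 88, "Y": 99}, {"ID": 2, "X": 66, "Y": 77}, {"ID": 3, "X": 44, "Y": 55}]
right = [{"ID": 1, "A": "A14", "B": "B32"}, {"ID": 3, "A": "A53", "B": "B11"}]
both = one_to_one_merge(left, right)
for row in both:
    print(row)
```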
DATA Step Conclusions
Understanding internals and default activities allows you to:
make informed coding decisions
write flexible and efficient code
debug and test effectively
interpret results readily
Remember
We have discussed DEFAULTS
As soon as you add options, statements, features, etc., the default actions change; TEST them!
You can use these same tools (PUT _ALL_, PROC CONTENTS, PROC PRINT) to track what’s happening.