Transcript Chapter 7
Chapter 7
DATA Step
Programming
Section 7.1
Reading SAS Data Sets
and Creating Variables
Objectives
3
Create a SAS data set using another SAS data set as
input.
Create SAS variables.
Use operators and SAS functions to manipulate data
values.
Control which variables are included in a SAS data
set.
SAS Vocabulary
DATA Statement
SET Statement
Assignment Statement
Operators
Operands
SAS Function
SUM
4
TODAY()
MDY(month,day,year)
YEAR(SAS-date)
QTR(SAS-date)
MONTH(SAS-date)
WEEKDAY(SAS-date)
DROP/KEEP
Reading a SAS Data Set
Create a temporary SAS data set named onboard from
the permanent SAS data named ia.dfwlax and create
a variable that represents the total passengers on board.
Sum FirstClass and Economy values to compute
the new variable, Total.
New
SAS date values
Variable
ia.dfwlax
5
Flight Date
Dest
439
921
114
LAX
DFW
LAX
14955
14955
14956
FirstClass Economy
20
20
15
137
131
170
Total
157
151
185
Reading a SAS Data Set
To create a SAS data set using a SAS data set as input,
you must use a
1. DATA statement -
2. SET statement -
6
...
Reading a SAS Data Set
To create a SAS data set using a SAS data set as input,
you must use a
1. DATA statement –
starts the DATA step and tells SAS the name of the
data set you want to CREATE. (output data set:
onboard)
2. SET statement tells SAS the data set to USE to create the data set
named on the DATA statement. (input data set:
ia.dfwlax)
7
Reading a SAS Data Set
General form of a DATA step:
DATA output-SAS-data-set;
SET input-SAS-data-set;
additional SAS statements
RUN;
By default, the SET statement reads all
observations from the input SAS data set
variables from the input SAS data set.
8
The DATA Step – Example
libname ia 'SAS-data-library';
data onboard;
set ia.dfwlax;
run;
Use this data set…
… to create
this data set.
The DATA step creates a data set.
The DATA statement tells SAS the name
of the data set being created.
9
The DATA Step – Example
libname ia 'SAS-data-library';
data onboard;
set ia.dfwlax;
run;
At compile time,
SAS reads the
descriptor portion
of ia.dfwlax
and creates a spot
in the PDV for each
of the variables in
the order they
appear in the data
set.
PDV
Flight Date Dest FirstClass Economy
$ 3 N 8 $ 3
N 8
N 8
10
Reading a SAS Data Set
To create a variable, you must use an
Assignment statement
to assign the new variable, Total, as the sum of
the values of the variables FirstClass and
Economy.
Example:
Total = FClass + EClass;
New variable
11
Assignment Statements
An assignment statement
evaluates an expression
assigns the resulting value to a variable.
General form of an assignment statement:
variable=expression;
12
SAS Expressions
An expression contains operands and operators that
form a set of instructions that produce a value.
Operands are
variable names
constants
Operators are
symbols that request
arithmetic
calculations
SAS functions
We look at functions later in this section.
13
Using Operators
Selected operators for basic arithmetic calculations
in an assignment statement:
Operator Action
Example
Priority
+
Addition
Sum=x+y;
III
-
Subtraction
Diff=x-y;
III
*
Multiplication
Mult=x*y;
II
/
Division
Divide=x/y;
II
**
Exponentiation
Raise=x**y;
I
-
Negative prefix
Negative=-x;
I
(Remember “Please Excuse My Dear Aunt Sally…”)
You can also use parentheses to enforce a different order.
14
Example
libname ia 'SAS-data-library';
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
15
Compiling the DATA Step – Example
libname ia 'SAS-data-library';
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
The + tells SAS that
Total is numeric
PDV
Flight Date Dest FirstClass Economy
$ 3 N 8 $ 3
N 8
N 8
16
Compiling the DATA Step – Example
libname ia 'SAS-data-library';
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date Dest FirstClass Economy Total
$ 3
N 8 $ 3
N 8
N 8
N 8
17
SAS creates the variables in the order that it sees them.
Total is created at the end of the step, so Total is last in the
PDV.
Executing the DATA Step – Example
First, SAS initializes the PDV to missing. Character
variables get set to a blank, and the numeric variables get
set to a period.
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
.
Dest FirstClass Economy Total
.
.
.
onboard
Flight Date
18
Dest FirstClass Economy Total
Executing the DATA Step – Example
There is a pointer to the first observation in the
ia.dfwlax data set. The SET statement tells SAS to
load the observation into the PDV.
ia.dfwlax
Flight
439
921
114
Date
Dest FirstClass Economy
12/11/00 LAX
20
137
12/11/00 DFW
20
131
12/12/00 LAX
15
170
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
.
19
Executing the DATA Step – Example
The pointer then moves down to the next observation in the
ia.dfwlax data set.
ia.dfwlax
Flight
439
921
114
Date
Dest FirstClass Economy
12/11/00 LAX
20
137
12/11/00 DFW
20
131
12/12/00 LAX
15
170
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
.
20
Executing the DATA Step – Example
ia.dfwlax
Flight
439
921
114
Date
Dest FirstClass Economy
12/11/00 LAX
20
137
12/11/00 DFW
20
131
12/12/00 LAX
15
170
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
157
onboard
Flight Date
21
Dest FirstClass Economy Total
Executing the DATA Step – Example
ia.dfwlax
Flight Date
Dest FirstClass Economy
439
12/11/00 LAX
20
137
921
12/11/00 DFW
20
131
114
12/12/00
LAX
15
170
Automatic
return
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
157
onboard Automatic output
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
157
22
Executing the DATA Step – Example
Every variable in a SAS data set has to have a
value.
Remember
Variables that are not read from the data set in
the SET statement are reinitialized in the PDV.
Variables that were read from the SAS data set
will always have a value.
23
Executing the DATA Step – Example
ia.dfwlax
Flight
439
921
114
Date
Dest FirstClass Economy
12/11/00 LAX
20
137
12/11/00 DFW
20
131Total
Reinitialize
12/12/00 LAX
15
170
to missing
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
.
onboard
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
157
24
Executing the DATA Step – Example
ia.dfwlax
Flight
439
921
114
Date
Dest FirstClass Economy
12/11/00 LAX
20
137
12/11/00 DFW
20
131
12/12/00 LAX
15
170
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
921
12/11/00 DFW
20
131
.
onboard
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
157
25
Executing the DATA Step – Example
ia.dfwlax
Flight
439
921
114
Date
Dest FirstClass Economy
12/11/00 LAX
20
137
12/11/00 DFW
20
131
12/12/00 LAX
15
170
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
921
12/11/00 DFW
20
131
151
onboard
Flight Date
Dest FirstClass Economy Total
439
12/11/00 LAX
20
137
157
26
Executing the DATA Step – Example
ia.dfwlax
Flight Date
Dest FirstClass Economy
439
12/11/00 LAX
20
137
921
12/11/00 DFW
20
131
114
12/12/00
LAX
15
170
Automatic
return
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
921
12/11/00 DFW
20
131
151
onboard Automatic output
Flight
Flight
439
439
921
27
Date
Date
12/11/00
12/11/00
12/11/00
Dest
Dest FirstClass
FirstClass Economy
Economy Total
Total
LAX
20
137
157
LAX
20
137
157
DFW
20
131
151
Executing the DATA Step – Example
Flight
439
921
114
ia.dfwlax
Date
Dest FirstClass Economy
12/11/00 LAX
20
137
12/11/00 DFW
20
131
12/12/00 LAX
15
170
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
PDV
Flight Date
Dest FirstClass Economy Total
114
12/12/00 LAX
15
170
185
.
onboard
28
Flight
439
921
114
Date
12/11/00
12/11/00
12/12/00
Dest FirstClass Economy Total
LAX
20
137
157
DFW
20
131
151
LAX
15
170
185
Assignment Statements – Example
proc print data=onboard;
format Date date9.;
run;
The SAS System
Obs
1
2
3
4
5
6
7
8
9
10
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
First
Class
Economy
Total
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
.
45
Why is Total missing in observation 9?
29
Assignment Statements – Example
If any of the variables have missing values in an
expression involving the +, -, * or / operators, the result
will be a missing value.
It could be useful to see a missing value for Total, as it
might indicate that you need to check the data.
However, if you prefer to ignore missing values and see
187 for Total, you will need to use a SAS function.
30
Using SAS Functions
A SAS function is a routine that returns a value that is
determined from specified arguments.
General form of a SAS function:
function-name(argument1,argument2, . . .)
Total=sum(FirstClass,Economy);
31
Using SAS Functions
You must include the parentheses. They are part of the
function syntax. If you forget them, the function name will
be mistaken for a variable name.
function-name(argument1,argument2, . . .)
If there are multiple arguments inside the parentheses,
they must be separated by a comma.
32
Using SAS Functions
SAS functions
perform arithmetic operations
compute sample statistics
– for example: sum, mean, and standard deviation
manipulate SAS dates and process character values
perform many other tasks.
Sample statistics functions
ignore missing values.
33
Using the SUM Function – Example
data onboard;
set ia.dfwlax;
Total=FirstClass+Economy;
run;
proc print data=onboard;
format Date date9.;
run;
Replace the equation
that using + operator
with the SUM function.
Note:
There is no limit on
the number of
variables you can
list inside the SUM
function.
34
data onboard;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
run;
proc print data=onboard;
format Date date9.;
run;
Using the SUM Function – Example
proc print data=onboard;
format Date date9.;
run;
The SAS System
35
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
First
Class
20
20
15
5
14
15
17
7
.
14
Economy
Total
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
Using SAS Functions
Example
What if you wanted the difference between the two
values?
You could use the – operator. There is not a function to
subtract values. However, you could use the SUM
function by adding a negative sign in front of the variable
you want to subtract.
Total=sum(FirstClass,-Economy);
36
Exercise
This exercise reinforces the concepts discussed
previously.
37
Exercises
Use the depression data set to create a new data set called
DepressionData and a list report about economic factors during
the Great Depression. Add two columns to the above data set.
The first new variable called Change should calculate the
difference between the number of banks and the number of banks
suspending operations that year. Format the variable Change with
commas and no decimal places.
The second variable called PercentChange should calculate the
percentage of banks that suspended operations during that year.
Use the PERCENT5. format for the PercentChange column.
The list report should contain the variables:
Year, UnemploymentRate, NumberOfBanks, BankSuspensions,
Change, and PercentChange.
38
Exercises
data Depression;
set depressiondata;
Change=NumberOfBanks-BankSuspensions;
PercentChange=(BankSuspensions/NumberOfBanks);
format change comma7. percentchange percent5.;
run;
proc print data=Depression noobs;
var Year UnemploymentRate NumberOfBanks
BankSuspensions Change PercentChange;
run;
Partial Output
39
Year
Unemployment
Rate
Number
OfBanks
1929
1930
1931
1932
1933
1934
1935
3.3
8.9
15.9
23.6
24.9
21.7
20.1
24,633
22,773
19,970
18,397
15,015
16,096
.
Bank
Suspensions
659
1,350
2,293
1,453
4,000
57
.
Change
Percent
Change
23,974
21,423
17,677
16,944
11,015
16,039
.
3%
6%
11%
8%
27%
0%
.
Exercises
Use the purchases data set to create a new data
set called totalpurchases and a list report.
• The first variable is called BudgetedCost,
which is the Total Items Needed times
the Cost per Item (including the
discount).
• The second variable is called BackOrdered
and is the difference between TotNeeded
and TotPurchased, ignoring missing
values.
Format Cost and BudgetedCost with a dollar
sign and two decimal places.
40
Exercises
data totalpurchases;
set purchases;
BudgetedCost=totneeded*(Cost-Cost*PercentDiscount/100);
BackOrdered=sum(TotNeeded, - TotPurchased);
format cost dollar6.2 budgetedcost dollar9.2;
run;
proc print data=totalpurchases;
run;
Obs
ItemNo
ItemName
1
2
3
4
5
6
7
8
9
636589
167643
891674
197646
324346
354664
994643
764646
546546
Mugs 12 oz
Mugs 20 oz
Wine Glass Fluted
Wine Glass Clear
Cutting Board Maple
Carving Knife 10 inch
Paring Knife
Drink Carafe
Fluted Pitchers
41
Tot
Needed
128
24
100
100
15
30
30
48
36
Tot
Purchased
64
12
85
.
15
15
4
48
40
Cost
$6.80
$7.20
$3.50
$3.50
$11.50
$43.60
$23.50
$34.25
$34.99
Percent
Discount
5
0
0
0
10
15
15
25
20
Budgeted
Back
Cost Ordered
$826.88
$172.80
$350.00
$350.00
$155.25
$1,111.80
$599.25
$1,233.00
$1,007.71
64
12
15
100
0
15
26
0
-4
Using Date Functions
You can use SAS date functions to
create SAS date values
extract information from SAS date values.
42
Date Functions: Create SAS Dates
TODAY( )
obtains the date value from the
system clock and returns the
number of days the date is from
January 1, 1960.
Notice there are no arguments inside the ( ). You will get
an error if you add an argument.
SAS uses the system clock on the computer to return the
date. If you set an incorrect date, the incorrect date will be
returned.
43
Date Functions: Create SAS Dates
MDY(month,day,year) uses numeric month, day, and year
values to return the corresponding
SAS date value.
If you want to supply SAS with a month, day, and year, you can
create a SAS date. This can be a numeric literal or a variable
name that represents a number.
Example: The Declaration of Independence was signed
July 4, 1776. To see this as a SAS date, use the MDY
function.
Date = mdy(7, 4, 1776);
44
-67019
Exercise
This exercise reinforces the concepts discussed
previously.
45
Exercises
What is your birthday as a SAS date value? Use
PROC PRINT to display the value. Verify the date
was calculated successfully by formatting the value
using the DATE9. format.
data bday;
myBday =
run;
46
;
Exercises
Solution:
Use Martin Luther King, Jr.’s birthday for the myBday
value, which is January 15th, 1929.
data bday;
myBday = mdy(1,15,1929);
format mybday date9.;
run;
proc print data=bday;
run;
47
Exercises
Modify the program.
What is today as a SAS date? (format permanently as
DATE9.)
How many days old are you?
If you divide the number of days old you are by 365,
how old are you by SAS calculations?
data bday;
myBday =
today_is =
num_days =
Age =
run;
48
;
;
;
;
Exercises
Solution:
Use Martin Luther King, Jr.’s Birthday for the myBday
value, which is January 15th, 1929.
data bday;
myBday = mdy(1,15,1929);
today_is = today();
num_days = today_is - myBday;
Age = Num_days / 365;
format mybday today_is date9.;
run;
proc print data=bday;
run;
49
Date Functions: Extracting Information
YEAR(SAS-date)
extracts the year from a SAS date
and returns a four-digit value for
year.
Example: The Declaration of Independence was signed
July 4, 1776, which is -67019 as a SAS date. To extract
the year, use the YEAR function.
IndepYear = year(-67019);
50
1776
Date Functions: Extracting Information
QTR(SAS-date)
•
•
•
•
51
extracts the quarter from a SAS
date and returns a number from 1
to 4.
January to March gets a value of 1
April to June gets a value of 2
July to September gets a value of 3
October to December gets a value of 4.
Date Functions: Extracting Information
MONTH(SAS-date)
extracts the month from a SAS
date and returns a number from 1
to 12.
Example: The Declaration of Independence was signed July
4, 1776, which is -67019 as a SAS date. To extract the
month, use the MONTH function.
IndepMonth = month(-67019);
52
7
Date Functions: Extracting Information
WEEKDAY(SAS-date)
extracts the day of the week from
a SAS date and returns a number
from 1 to 7, where 1 represents
Sunday, and so on.
Example: The Declaration of Independence was signed July 4,
1776, which is -67019 as a SAS date. To extract the day of the
week, use the WEEKDAY function.
Freedom = weekday(-67019);
5
(Thursday)
53
Functions
You can also call a function within another function.
Day = weekday(today() );
SAS resolves what is inside the parentheses and works
its way out.
The TODAY function would be executed first and then
SAS would apply the WEEKDAY function to what is
returned from the TODAY function.
54
Review: Date Functions That Create SAS
Dates
55
TODAY( )
obtains the date value from the
system clock.
MDY(mon,day,year)
uses numeric month, day, and
year values to return the
corresponding SAS date value.
Review: Date Functions That Extract
Information from Dates
YEAR(SAS-date)
extracts the year from a SAS date and
returns a four-digit value for year.
QTR(SAS-date)
extracts the quarter from a SAS date and
returns a number from 1 to 4.
MONTH(SAS-date)
extracts the month from a SAS date and
returns a number from 1 to 12.
WEEKDAY(SAS-date)
extracts the day of the week from a SAS
date and returns a number from 1 to 7,
where 1 represents Sunday, and so on.
56
Using the WEEKDAY Function
Add an assignment statement to the DATA step to create
a variable that shows the day of the week that the flight
occurred.
data onboard;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
DayOfWeek=weekday(Date);
run;
Print the data set, but do not display the variables
FirstClass and Economy.
57
Using the WEEKDAY Function – Example
proc print data=onboard;
var Flight Dest Total DayOfWeek Date;
format Date weekdate.;
run;
The SAS System
58
Obs
Flight
Dest
Total
Day
Of
Week
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
157
151
185
90
210
131
183
95
187
45
2
2
3
3
4
4
5
5
6
6
Date
Monday,
Monday,
Tuesday,
Tuesday,
Wednesday,
Wednesday,
Thursday,
Thursday,
Friday,
Friday,
December
December
December
December
December
December
December
December
December
December
11,
11,
12,
12,
13,
13,
14,
14,
15,
15,
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
What if you do not want the variables FirstClass
and Economy in the data set?
Selecting Variables
You can use a DROP or KEEP statement in a DATA step
to control which variables are written to the new SAS data
set.
General form of DROP and KEEP statements:
DROP variables;
KEEP variables;
Use one or the other in the DATA step, not both.
59
Selecting Variables
These two statements are complete opposites, but can
yield the same results.
The DROP statement tells SAS to exclude the variables in
the new data set.
The KEEP statement tells SAS these are the only
variables you want written to the new data set.
Note: The original data set is not changed; only the data
set you are creating in the DATA step is changed.
60
Selecting Variables
If you have a data set with 100 variables
and you want to work with two of them,
which statement would you want to use?
61
...
Selecting Variables
If you have a data set with 100 variables
and you want to work with two of them,
which statement would you want to use?
The KEEP statement.
You could use the DROP statement, but would need to list
98 variables in it.
62
Selecting Variables
If you have a data set with 100 variables
and you do not want two of them,
which statement would you want to use?
63
...
Selecting Variables
If you have a data set with 100 variables
and you do not want two of them,
which statement would you want to use?
The DROP statement.
You could use the KEEP statement, but would need to list
98 variables in it.
64
Selecting Variables
Do not store the variables FirstClass and Economy
in the data set.
data onboard;
set ia.dfwlax;
drop FirstClass Economy;
Total=FirstClass+Economy;
run;
keep Flight Date Dest Total;
D
D
PDV
Flight Date Dest FirstClass Economy Total
.
.
.
.
65
Selecting Variables
data onboard;
set ia.dfwlax;
drop FirstClass Economy;
Total=FirstClass+Economy;
run;
The variables FirstClass and Economy are in the
PDV. You can use them in the DATA step to calculate
Total, but they will not be written to the new data set.
The original data set is unchanged.
D
D
PDV
Flight Date Dest FirstClass Economy Total
.
.
.
.
66
Selecting Variables – Example
proc print data=onboard;
format Date date9.;
run;
The SAS System
Obs
1
2
3
4
5
6
7
8
9
10
67
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
Total
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
157
151
185
90
210
131
183
95
.
45
Creating Variables
Remember
SAS processes the DATA step one statement
at a time. Therefore, the order of the
statements in the DATA step is very important.
Let’s look at this in more detail, using the
example where you created Total.
68
Creating Variables
What happens if you put the statement that
creates Total above the SET statement?
data onboard;
Total=FirstClass+Economy;
set ia.dfwlax;
drop FirstClass Economy;
run;
69
Creating Variables
What happens if we put the statement that
creates Total above the SET statement?
The SAS System
Why is Total listed
first in the output?
Why are some
values missing?
70
Total
Flight
.
157
151
185
90
210
131
183
95
.
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
Creating Variables
Why is Total first in the output?
data onboard;
Total=FirstClass+Economy;
set ia.dfwlax;
drop FirstClass Economy;
run;
SAS creates a spot in the PDV at compile time in the
order that it sees the variables. The first variable it
sees is Total, which is numeric, followed by all the
columns from ia.dfwlax.
71
Creating Variables
Compare the output. Why is there a difference in the
values of Total?
The SAS System
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
The SAS System
Dest
Total
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
157
151
185
90
210
131
183
95
.
45
After SET statement
72
Total
Flight
.
157
151
185
90
210
131
183
95
.
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
Before SET statement
Creating Variables
SAS initializes the PDV at the top of the DATA step. The
SET statement loads the observation from ia.dfwlax
into the PDV.
data onboard;
Total=FirstClass+Economy;
set ia.dfwlax;
drop FirstClass Economy;
run;
SAS initialized the PDV to missing. FirstClass and
Economy both have missing values. Total is
assigned a missing value. SAS then loads the
observation and outputs at the RUN statement.
73
Creating Variables
For the second observation, SAS initializes the PDV at the
top of the DATA step.
data onboard;
Total=FirstClass+Economy;
set ia.dfwlax;
drop FirstClass Economy;
run;
FirstClass and Economy both have values from the
previous row in them. SAS uses the values to calculate
Total. SAS then loads the observation and outputs at
the RUN statement.
74
Creating Variables
Compare the output. Why is there a difference in the
values of Total?
The SAS System
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
The SAS System
Dest
Total
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
157
151
185
90
210
131
183
95
.
45
After SET statement
75
Total
Flight
.
157
151
185
90
210
131
183
95
.
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
Before SET statement
Creating Variables
Remember
SAS processes the DATA step one statement
at a time. Therefore, the order of the
statements in the DATA step is very important.
The KEEP and DROP statements are an
exception to this rule.
They are marked at compile time, so SAS will
always do the KEEP or DROP statements
last, no matter where they are in the step.
76
Exercise – Section 7.1
This exercise reinforces the concepts discussed
previously.
77
Section 7.2
Conditional Processing
Objectives
79
Execute statements conditionally using IF-THEN logic.
Control the length of character variables explicitly with
the LENGTH statement.
Select rows to include in a SAS data set.
Use SAS date constants.
SAS Vocabulary
IF-THEN statement
ELSE statement
UPCASE
DO and END statement
LENGTH
WHERE statement
DELETE statement
Subsetting IF statement
'ddMMMyyyy'd
80
Conditional Execution
International Airlines wants to compute revenue for Los
Angeles and Dallas flights based on the prices in the table
below.
DESTINATION CLASS
AIRFARE
LAX
First
2000
Economy
1200
DFW
First
1500
Economy
900
Revenue = Cost * number First Class + Cost * number of
Economy
Cost is different depending on destination.
81
Conditional Execution
General form of IF-THEN and ELSE statements:
IF expression THEN statement;
ELSE statement;
82
expression contains operands and operators that form
a set of instructions that produce a value.
Only one executable statement is allowed in an IFTHEN or ELSE statement.
Conditional Execution
Remember
Operands are
variable names
constants.
83
Operators are
symbols that request
– a comparison
– a logical operation
– an arithmetic calculation
SAS functions.
Conditional Execution – Example
Compute revenue figures based on flight destination.
DESTINATION CLASS
LAX
First
Economy
DFW
First
Economy
AIRFARE
2000
1200
1500
900
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
84
Conditional Execution – Example
data flightrev; TRUE
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (First Observation)
Flight Date
439
85
Dest First Economy Total Revenue
Class
14955 LAX
20
137
157
.
...
Conditional Execution – Example
data flightrev; TRUE
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (First Observation)
Flight Date
439
86
Dest First Economy Total Revenue
Class
14955 LAX
20
137
157 204400
...
Conditional Execution – Example
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (First Observation)
Flight Date
439
87
Dest First Economy Total Revenue
Class
14955 LAX
20
137
157 204400
...
Conditional Execution – Example
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
Flight Date
982
88
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
...
Conditional Execution – Example
data flightrev; FALSE
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
Flight Date
982
89
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
...
Conditional Execution – Example
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
Flight Date
982
90
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
...
Conditional Execution – Example
data flightrev; FALSE
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
Flight Date
982
91
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
...
Conditional Execution – Example
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
Flight Date
982
92
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
Conditional Execution – Example
proc print data=flightrev;
format Date date9.;
run;
The SAS System
Obs
1
2
3
4
5
6
7
8
9
10
93
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
First
Class
Economy
Total
Revenue
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
204400
147900
234000
.
263200
126900
.
89700
224400
48900
Why are two Revenue values missing?
Conditional Execution – Example
SAS initializes the PDV at the top of the DATA step so that
data from the previous row is not carried forward to the next
row.
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if Dest='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
FALSE
else if Dest='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
Flight Date
982
94
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
Exercise
This exercise reinforces the concepts discussed
previously.
95
Exercises
Use the SeniorTrip3 data set to create
a data set called TripCosts. Calculate
the following costs by creating a Cost
variable where if traveling to the following:
• Washington at a cost of 725 per night
• the Beach at a cost of 475 per night.
96
Exercises
data TripCosts;
set SeniorTrip3;
if Destination='Washington'
then Cost=725*Length;
else if Destination='Beach'
then Cost=475*Length;
run;
proc print data=TripCosts noobs;
run;
97
The SAS System
Destination
Washington
Beach
Washington
Beach
Washington
Beach
Beach
Beach
Washington
Washington
Washington
Washington
Beach
Beach
Beach
Length
Cost
2
2
3
4
2
3
2
3
3
2
4
3
2
3
2
1450
950
2175
1900
1450
1425
950
1425
2175
1450
2900
2175
950
1425
950
The UPCASE Function
You can use the UPCASE function to convert letters from
lowercase to uppercase.
General form of the UPCASE function:
UPCASE (argument)
UPCASE does not change the stored values.
98
Conditional Execution – Example
Use the UPCASE function to convert the Dest values to
uppercase for the comparison.
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if upcase(Dest)='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
99
Conditional Execution – Example
FALSE
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if upcase(Dest)='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
upcase('dfw')='DFW'
Flight Date
982
100
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
...
Conditional Execution – Example
TRUE
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if upcase(Dest)='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
upcase('dfw')='DFW'
Flight Date
982
101
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
.
...
Conditional Execution – Example
TRUE
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if upcase(Dest)='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
PDV (Fourth Observation)
Flight Date
982
102
upcase('dfw')='DFW'
Dest First Economy Total Revenue
Class
14956 dfw
5
85
90
84000
Conditional Execution – Example
proc print data=flightrev;
format Date date9.;
run;
The SAS System
Obs
1
2
3
4
5
6
7
8
9
10
103
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
First
Class
Economy
Total
Revenue
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
Conditional Execution – Example
The values of Dest are not all in uppercase. Why?
The SAS System
Obs
1
2
3
4
5
6
7
8
9
10
104
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
First
Class
Economy
Total
Revenue
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
The UPCASE function converts the data values to
uppercase for search purposes only.
Conditional Execution – Example
Use the UPCASE function to convert the Dest values to
uppercase in an assignment statement.
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
Dest = upcase(Dest);
if Dest ='LAX' then
Revenue=sum(2000*FirstClass,1200*Economy);
else if Dest ='DFW' then
Revenue=sum(1500*FirstClass,900*Economy);
run;
105
Conditional Execution – Example
The values of Dest are now stored in uppercase.
Obs
1
2
3
4
5
6
7
8
9
10
106
Flight
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
First
Class
Economy
Total
Revenue
LAX
DFW
LAX
DFW
LAX
DFW
LAX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
Conditional Execution
If you need more than one statement in your IF/THEN
blocks of code, use the DO and END statements to
execute a group of statements based on a condition.
General form of the DO and END statements:
IF expression THEN DO;
executable statements
END;
ELSE DO;
executable statements
END;
107
Conditional Execution – Example
Use DO and END statements to execute two statements
based on a condition.
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
108
Conditional Execution – Example
proc print data=flightrev;
var Dest City Flight Date Revenue;
format Date date9.;
run;
The SAS System
Obs
Dest
City
Flight
1
2
3
4
5
6
7
8
9
10
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
Los An
Dallas
Los An
Dallas
Los An
Dallas
Los An
Dallas
Los An
Dallas
439
921
114
982
439
982
431
982
114
982
Why are City values truncated?
109
Date
Revenue
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
Conditional Execution – Example
Where is City first defined?
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
Remember
110
SAS creates City in the PDV when it
reads the first variable definition and sets
the name, type, and length.
Variable Lengths – Example
At compile time, the length of a variable is determined the
first time the variable is encountered.
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
6 characters between
end;
the quotes: Length=6
run;
111
Review: The LENGTH Statement
You can use the LENGTH statement to define the length
of a variable explicitly.
General form of the LENGTH statement:
LENGTH variable(s) $ length;
Example:
length City $ 11;
112
The LENGTH Statement – Example
The LENGTH statement must be above the first definition for
the variable City.
data flightrev;
set ia.dfwlax;
length City $ 11;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
113
The LENGTH Statement – Example
proc print data=flightrev;
var Dest City Flight Date Revenue;
format Date date9.;
run;
The SAS System
Obs
1
2
3
4
5
6
7
8
9
10
114
Dest
City
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Flight
439
921
114
982
439
982
431
982
114
982
Date
Revenue
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
The LENGTH Statement – Example
What is the result if the LENGTH statement is above the
SET statement?
data flightrev;
length City $ 11;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
115
The LENGTH Statement – Example
City will be the first variable created in the PDV and the
first variable in the data set.
The SAS System
Obs City
1
2
3
4
5
6
7
8
9
10
116
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Flight
439
921
114
982
439
982
431
982
114
982
First
Date Dest Class Economy Total Revenue
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
The LENGTH Statement – Example
What is the result if the LENGTH statement appears after
the Total= assignment statement?
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
length City $ 11;
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
117
The LENGTH Statement – Example
City now appears after Total in the data set.
The SAS System
Obs Flight
1
2
3
4
5
6
7
8
9
10
118
439
921
114
982
439
982
431
982
114
982
First
Date Dest Class Economy Total City
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Los Angeles
Dallas
Revenue
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
The LENGTH Statement – Example
What is the result if the LENGTH statement appeared after
the conditional logic?
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
length City $ 11;
run;
119
The LENGTH Statement – Example
The first definition of City is Dallas, so the variable
length is 6. The LENGTH statement has no affect.
The SAS System
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
120
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
First
Class
20
20
15
5
14
15
17
7
.
14
Economy
Total
Revenue
City
137
131
170
85
196
116
166
88
187
31
157
151
185
90
210
131
183
95
187
45
204400
147900
234000
84000
263200
126900
233200
89700
224400
48900
Los An
Dallas
Los An
Dallas
Los An
Dallas
Los An
Dallas
Los An
Dallas
The LENGTH Statement – Example
You will see a note in the SAS log that the length of City
was already set and to use the LENGTH statement as the
first statement in the DATA step to declare the length of
City.
data flightrev;
set ia.dfwlax;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
length City $ 11;
run;
121
Subsetting Rows
In a DATA step, you can subset the rows (observations) in
a SAS data set with a
WHERE statement (from Chapter 4)
DELETE statement
subsetting IF statement.
The WHERE statement in a DATA step is the same as the
WHERE statement you saw in a PROC step.
122
Deleting Rows
You can use a DELETE statement to control which rows
are not written to the SAS data set you are creating.
General form of the DELETE statement:
IF expression THEN DELETE;
The expression can be any SAS expression.
The original data set is not changed.
123
Deleting Rows – Example
Modify the DATA step:
Delete rows in the new data set that have a Total value
that is less than or equal to 175.
124
Deleting Rows – Example
Delete rows that have a Total value that is less than or
equal to 175.
data over175;
set ia.dfwlax;
length City $ 11;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
if Total le 175 then delete;
run;
125
Deleting Rows – Example
proc print data=over175;
var Dest City Flight Date Total Revenue;
format Date date9.;
run;
Output
The SAS System
Obs
1
2
3
4
126
Dest
LAX
LAX
LaX
LAX
City
Los
Los
Los
Los
Angeles
Angeles
Angeles
Angeles
Flight
114
439
431
114
Date
12DEC2000
13DEC2000
14DEC2000
15DEC2000
Total
185
210
183
187
Revenue
234000
263200
233200
224400
Deleting Rows – Example
What is the result if the conditional logic is placed after the
assignment statement?
data over175;
set ia.dfwlax;
length City $ 11;
Total=sum(FirstClass,Economy);
if Total le 175 then delete;
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
127
Deleting Rows – Example
proc print data=over175;
var Dest City Flight Date Total Revenue;
format Date date9.;
run;
The SAS System
Obs
1
2
3
4
Dest
LAX
LAX
LaX
LAX
City
Los
Los
Los
Los
Angeles
Angeles
Angeles
Angeles
Flight
114
439
431
114
Date
12DEC2000
13DEC2000
14DEC2000
15DEC2000
Total
185
210
183
187
Revenue
234000
263200
233200
224400
There is no need to calculate Revenue and assign City
if the row does not meet the subsetting criterion. Moving
the statement up makes the program more efficient and
the results are the same.
128
Deleting Rows – Example
What is the result if the conditional logic is above the
assignment statement?
data over175;
set ia.dfwlax;
length City $ 11;
if Total le 175 then delete;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
129
Deleting Rows – Example
There are no
errors or
warnings in the
SAS log.
There are zero
observations in
the data set.
PROC PRINT
did not display
results.
130
Deleting Rows – Example
The code tests
the value of
Total before
it is calculated.
The initial and
reinitialized
value for
Total is a
missing value.
Missing is not
greater than
175, so all rows
will be deleted.
131
Exercise
This exercise reinforces the concepts discussed
previously.
132
Exercise
A representative from Johns Hopkins University is
coming to school to visit with students. You want to
notify students who are interested in attending.
• Using the schoolchoices data set, create
a data set called HOPKINS containing only
the students who are interested in attending
Johns Hopkins University as either their first,
second, or third choice.
• The new data set should only contain the
student’s ID, first and last names, as well as
the first through third school choice.
133
Exercises
Students interested in Johns Hopkins
Obs
1
2
3
4
5
6
7
8
9
Obs
1
2
3
4
5
6
7
8
9
134
Student
ID
4567
5052
5100
5127
5678
5687
8522
9864
9997
FirstName
Last
Name
firstchoice
Gabriella
Alicia
Paula
Clarissa
Sheila
Terrell
Stephan
Ismael
Tiffany
Crowley
Noboa
Perez
Daly
Dawson
Stires
O'Flynn
Miranda
Pierre
Notre Dame
Tulane University
University of North Carolina - Chapel Hill
Gonzaga University
Columbia University
Johns Hopkins University
Johns Hopkins University
Johns Hopkins University
Johns Hopkins University
secondchoice
thirdchoice
Johns Hopkins University
Johns Hopkins University
Baylor University
University of Massachusetts System
East Carolina University
Syracuse University
University of Cincinnati
Notre Dame
Gonzaga University
University of Central Florida
Elon University
Johns Hopkins University
Johns Hopkins University
Johns Hopkins University
University of Pennsylvania
Purdue
University of Texas - Austin
Georgia Tech
Exercises
Modify the DATA step to create a new column called
Choice.
135
The value of Choice should be First,
Second, or Third.
Be sure to include Choice in your new data set.
After you have verified that your output is correct,
modify the DATA step so that the variables
FirstChoice, SecondChoice, and
ThirdChoice are not stored in the new data
set.
Exercises
The school wants to see students who have a High
School GPA 3.5 and above.
Modify your program so that the data set contains only
students that meet that criterion.
Add the GPA column to the data set, so you can verify it
is correct.
• Replace the title on the second line with:
with at least a 3.5 GPA
• Save your program as
Hopkins_students2.sas to use later.
136
Exercises
Students interested in Johns Hopkins
with at least a 3.5 GPA
Obs
1
2
3
Obs
1
2
3
137
Student
ID
First
Name
Last
Name
HS_GPA
5678
5687
9864
Sheila
Terrell
Ismael
Dawson
Stires
Miranda
3.80
4.00
4.00
secondchoice
East Carolina University
Syracuse University
Notre Dame
firstchoice
Columbia University
Johns Hopkins University
Johns Hopkins University
thirdchoice
Johns Hopkins University
University of Pennsylvania
University of Texas - Austin
Selecting Rows
You can use a subsetting IF statement to control which
rows are written to the SAS data set.
General form of the subsetting IF statement:
IF expression;
The expression can be any SAS expression.
The subsetting IF statement is valid only in a DATA step.
138
Selecting Rows
A subsetting IF controls which rows are processed. If the
expression is TRUE, the observation continues to
process through the DATA step.
At the end of the step is a RUN statement and the
automatic output to the data set. SAS then returns to the
top of the DATA step (automatic return).
If the expression is FALSE, then the observation stops
processing and SAS returns to the top of the DATA step.
139
Process Flow of a Subsetting IF
Subsetting IF:
DATA Statement
Read Observation
or Record
IF Expression
True
Continue Processing
Observation
Output Observation to SAS
Data Set
140
Process Flow of a Subsetting IF
Subsetting IF:
DATA Statement
Read Observation
or Record
IF Expression False
141
...
Review: Process Flow of a Subsetting IF
Subsetting IF:
DATA Statement
Read Observation
or Record
IF Expression False
True
Continue Processing
Observation
Output Observation to SAS
Data Set
142
Selecting Rows – Example
Select rows that have a Total value that is greater than
175.
data over175;
set ia.dfwlax;
length City $ 11;
Total=sum(FirstClass,Economy);
if Total gt 175;
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
143
Selecting Rows – Example
proc print data=over175;
var Dest City Flight Date Total Revenue;
format Date date9.;
run;
The SAS System
Obs
1
2
3
4
144
Dest
LAX
LAX
LaX
LAX
City
Los
Los
Los
Los
Angeles
Angeles
Angeles
Angeles
Flight
114
439
431
114
Date
12DEC2000
13DEC2000
14DEC2000
15DEC2000
Total
185
210
183
187
Revenue
234000
263200
233200
224400
Selecting Rows
The variable Date in the ia.dfwlax data set contains
SAS date values (numeric values).
01JAN1960
01JAN1961
14DEC2000
store
0
366
???
01/01/1960
12/14/2000
display
01/01/1959
What if you only wanted flights that were before a specific
date, such as 14DEC2000?
145
Selecting Rows – Example
Use the MDY function.
data over175;
set ia.dfwlax;
length City $ 11;
Total=sum(FirstClass,Economy);
if Total gt 175 and Date < mdy(12,14,2000);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
146
Using SAS Dates – Example
proc print data=over175;
var Dest City Flight Date Total Revenue;
format Date date9.;
run;
The SAS System
Obs
1
2
Dest
LAX
LAX
City
Flight
Los Angeles
Los Angeles
114
439
Date
12DEC2000
13DEC2000
There is another method to use dates.
147
Total
185
210
Revenue
234000
263200
Using SAS Date Constants
The date constant 'ddMMMyyyy'd (example:
'14dec2000'd) creates a SAS date value from the date
enclosed in quotes.
148
dd
is a one- or two-digit value for the day.
MMM
is a three-letter abbreviation for the
month.
yyyy
is a two- or four-digit value for the year.
d
is required to convert the quoted string to
a SAS date.
Using SAS Date Constants – Example
data over175;
set ia.dfwlax;
length City $ 11;
Total=sum(FirstClass,Economy);
if Total gt 175 and Date lt '14dec2000'd;
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
Remember the d after the quotes; otherwise, SAS will
think it is a character string.
149
Using SAS Date Constants – Example
proc print data=over175;
var Dest City Flight Date Total Revenue;
format Date date9.;
run;
The SAS System
Obs
1
2
150
Dest
LAX
LAX
City
Flight
Los Angeles
Los Angeles
114
439
Date
12DEC2000
13DEC2000
Total
Revenue
185
210
234000
263200
Subsetting Data – Example
If the data were in a raw data file, use INFILE and INPUT
statements.
data over175;
infile 'raw-data-file';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
length City $ 11;
Total=sum(FirstClass,Economy);
if Total gt 175 and Date lt '14dec2000'd;
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
151
Subsetting Data – Example
proc print data=over175;
var Dest City Flight Date Total Revenue;
format Date date9.;
run;
The SAS System
Obs
1
2
152
Dest
LAX
LAX
City
Flight
Los Angeles
Los Angeles
114
439
Date
12DEC2000
13DEC2000
Total
Revenue
185
210
234000
263200
Reference: WHERE or Subsetting IF?
Step and Usage
PROC step
WHERE
IF
Yes
No
No
No
Yes
Yes
Yes
Yes
DATA step (source of variable)
INPUT statement
Assignment statement
SET statement (single data set)
Use WHERE in PROC steps.
Use IF in DATA step.
You can use the WHERE if all the variables in the WHERE
statement exist in the SAS data set. Total could not be
used because you created it in the DATA step.
153
WHERE or Subsetting IF – Example
You can use a WHERE statement and a subsetting IF
statement in the same step.
data over175;
set ia.dfwlax;
where Date lt '14dec2000'd;
length City $ 11;
Total=sum(FirstClass,Economy);
if Total gt 175;
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
154
WHERE or Subsetting IF – Example
proc print data=over175;
var Dest City Flight Date Total Revenue;
format Date date9.;
run;
The SAS System
Obs
1
2
155
Dest
LAX
LAX
City
Flight
Los Angeles
Los Angeles
114
439
Date
12DEC2000
13DEC2000
Total
Revenue
185
210
234000
263200
WHERE or Subsetting IF – Example
The IF statement read
10 observations from
the original data set.
The WHERE and IF
statements read 6
observations from the
original data set.
156
WHERE or Subsetting IF – Example
The WHERE statement is processed before the row is
loaded into the PDV. SAS will not read the data into the
PDV unless the expression(s) in the WHERE statement
is/are true.
Therefore, all variables in the WHERE statement must be
in the data set that SAS is reading (input data set).
157
WHERE or Subsetting IF – Example
Will this WHERE statement execute?
data over175;
set ia.dfwlax;
where Date lt '14dec2000'd and Total gt 175;
length City $ 11;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
158
WHERE or Subsetting IF – Example
Will this WHERE statement execute?
data over175;
set ia.dfwlax;
where Date lt '14dec2000'd and Total gt 175;
length City $ 11;
Total=sum(FirstClass,Economy);
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
159
No, because Total is created in this DATA step. The WHERE
statement can process only variables that exist in the input data set,
ia.dfwlax.
WHERE or Subsetting IF – Example
SAS Log
160
WHERE or Subsetting IF – Example
Move the WHERE statement after the assignment
statement.
data over175;
set ia.dfwlax;
length City $ 11;
Total=sum(FirstClass,Economy);
where Date lt '14dec2000'd and Total gt 175;
if upcase(Dest)='DFW' then do;
Revenue=sum(1500*FirstClass,900*Economy);
City='Dallas';
end;
else if upcase(Dest)='LAX' then do;
Revenue=sum(2000*FirstClass,1200*Economy);
City='Los Angeles';
end;
run;
161
WHERE or Subsetting IF – Example
Log
Total is created in this DATA step. The WHERE statement
can process only variables that exist in the input data set
ia.dfwlax.
162
Exercise – Section 7.2
This exercise reinforces the concepts discussed
previously.
163