Chapter 6 Part I - Spruce Creek High School

Download Report

Transcript Chapter 6 Part I - Spruce Creek High School

Chapter 6 Part I
Creating SAS® Data Sets
Section 6.1
Reading Raw Data Files:
Column Input
Objectives




3
Create a temporary SAS data set from a raw data file.
Create a permanent SAS data set from a raw data file.
Explain how the DATA step processes data.
Read standard data using column input.
SAS Code/Vocabulary






4
DATA
INFILE
INPUT
PAD
Compilation
Execution
Reading Raw Data Files
Data for flights from New York to Dallas (DFW) and Los
Angeles (LAX) are stored in a raw data file.
Create a SAS data set from the raw data.
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
Description Column
92112/11/00DFW 20131
Flight Number s 1- 3
11412/12/00LAX 15170
Date
4-11
98212/12/00dfw 5 85
Destination
12-14
43912/13/00LAX 14196
First Class
15-17
98212/13/00DFW 15116
Passengers
43112/14/00LaX 17166
Economy
18-20
98212/14/00DFW 7 88
Passengers
11412/15/00LAX
187
98212/15/00DFW 14 31
5
Accessing Data Sources
Raw
Data
File
Data
Entry
129986542,
Fred,Barne
s,54,65454
3681,Alici
a,Burnett,
23,2414533
84,Connor,
Coats
Other
Software
File
Conversion Process
DATA
Step
FSEDIT
FSVIEW
SAS
Data
Set
6
SAS/ACCESS
Software
LastName
FirstName
Age
TORRES
LANGKAMM
SMITH
WAGSCHAL
TOERMOEN
JAN
SARAH
MICHAEL
NADJA
JOCHEN
23
46
71
37
16
Creating a SAS Data Set
In order to create a SAS data set
from a raw data file, you must do
the following:
1. Start a DATA step and name
the SAS data set being created
(DATA statement).
Raw Data File
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
DATA Step
data SAS-data-set-name;
7
...
Creating a SAS Data Set
In order to create a SAS data set
from a raw data file, you must do
the following:
1. Start a DATA step and name
the SAS data set being created
(DATA statement).
2. Identify the location of the
raw data file to read
(INFILE statement)
8
Raw Data File
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
DATA Step
data SAS-data-set-name;
infile 'raw-data-filename';
...
Creating a SAS Data Set
In order to create a SAS data set
from a raw data file, you must do
the following:
1. Start a DATA step and name
the SAS data set being created
(DATA statement).
2. Identify the location of the
raw data file to read
(INFILE statement).
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
DATA Step
data SAS-data-set-name;
infile 'raw-data-filename';
input input-specifications;
run;
3. Describe how to read the data
fields from the raw data file
(INPUT statement).
9
Raw Data File
Creating a SAS Data Set – Step 1
General form of the DATA statement:
DATA libref.SAS-data-set(s);
Remember
10
The DATA data step creates a SAS data set.
You provide the name of the data set that you
want to create in this statement.
Creating a SAS Data Set – Step 1
Example:
This DATA statement creates a temporary SAS data
set named dfwlax:
data work.dfwlax;
11
Creating a SAS Data Set – Step 1
Example:
This DATA statement creates a permanent SAS data set
named dfwlax in the ia library:
libname ia 'SAS-data-library';
data ia.dfwlax;
12
Pointing to a Raw Data File – Step 2
General form of the INFILE statement:
INFILE 'filename' <options>;
The INFILE statement points to the raw data file,
or input file, in order to create the new data set.
Remember
13
Pointing to a Raw Data File – Step 2
Examples:
infile 'c:\workshop\winsas\prog1\dfwlax.dat';

14
Options in the INFILE statement are discussed later
in this chapter.
Reading Data Fields – Step 3
The DATA statement tells SAS what data set you are
creating.
The INFILE statement tells SAS what file to read.
You must then define the following:
 variable names
 variable type (character or numeric)
 physical location for each field
15
Reading Data Fields – Step 3
General form of the INPUT statement:
INPUT input-specifications;
You will learn several methods of input. Each uses a
different mode to read variables from the raw data file.
The DATA and INFILE statements remain the same.
16
Reading Data Fields – Step 3
Input-specifications
 name the SAS variables
 identify the variables as character or numeric
 specify the locations of the fields in the raw data
 can be specified as
– column
– formatted
– list (delimited) input.
17
Reading Data Fields
Column and
formatted input are
similar in that the
values are in fixed
fields.
For example, the date
always starts in column
4.

18
Delimited input, also
known as list input,
indicates that values are
separated by a delimiter.
The values are not in fixed
fields. A comma-separated
file (CSV) is an example of
a delimited file.
1
1
2
1---5----0----5----0
1
1
2
1---5----0----5----0---
43912/11/00LAX
92112/11/00DFW
11412/12/00LAX
98212/12/00dfw
43912/13/00LAX
439,12/11/00,LAX,20,137
921,12/11/00,DFW,20,131
114,12/12/00,LAX,15,170
982,12/12/00,dfw,5,85
439,12/13/00,LAX,14,196
20137
20131
15170
5 85
14196
Column input is the simplest of these methods.
Reading Data Using Column Input
Column input is appropriate for reading
 data in fixed columns
 standard character and numeric data.
Standard numerics are
• positive and negative numbers
• numbers with decimals
• exponential notation.
Standard numerics are not
• dates
• values that contain special characters
such as commas or dollar signs.
Examples of standard numeric data:
15
19
-15
15.4
+1.23
1.23E3
-1.23E-3
Reading Data Using Column Input
General form of an INPUT statement using column input:
INPUT variable <$> startcol-endcol . . ;
The $ indicates the
variable is a
character variable.
Column
number
where the
variable
begins
Column
number
where the
variable
ends
This pattern is repeated for every variable that you want
to read.
20
Reading Data Using Column Input
Raw Data File
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
Read the raw data file
using column input.
DATA Step
data SAS-data-set-name;
infile 'raw-data-filename'
input variable <$> startcol-endcol ...;
run;
SAS Data Set
21
Flight Date
Dest
439
921
114
LAX
DFW
LAX
12/11/00
12/11/00
12/12/00
FirstClass Economy
20
20
15
137
131
170
Reading Raw Data Files
Column Numbers
Data for flights from
New York to Dallas
(DFW) and Los Angeles
(LAX) is stored in a raw
data file.
Data
22
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 14 31
The Raw Data
Use column input to read data
that is in fixed columns.
Flight
Description Column
Flight Number s 1- 3
4-11
Date
12-14
Destination
15-17
First Class
Passengers
18-20
Economy
Passengers
23
Date
Dest First Econ
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 14 31
The Raw Data
How would you write the input statement to read
the fields?
24
Reading Data Using Column Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
Start with the keyword INPUT, followed by the first
variable that you want to read.
What date type is Flight?
25
...
Reading Data Using Column Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
Read Date next.
Why are you creating Date as a character variable?
26
Reading Data Using Column Input
Why are you creating Date as a character variable?
Column input is appropriate for reading
 data in fixed columns
 standard character and numeric data.
Standard numerics are
• positive and negative numbers
• numbers with decimals
• exponential notation.
27
Standard numerics are not
• dates
• values that contain special characters
such as commas or dollar signs.
Reading Data Using Column Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
The leading blank will be ignored.
What data type is FirstClass?
28
...
Reading Data Using Column Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
The leading blank will be ignored.
What data type is FirstClass?
It is numeric; there is no $ to indicate character.
29
Reading Data Using Column Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
30
Reading Data Using Column Input
Column input differs from delimited input in that
 you can read variables in any order
 you do not have to read all of the variables.
How can you read Date first, and not read
FirstClass at all?
31
Reading Data Using Column Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input Date $ 4-11 Flight $ 1-3
Dest $ 12-14
Economy 18-20;
List Date first in the INPUT statement and exclude the
reading instruction for FirstClass.
32
Create Temporary SAS Data Sets
Store the dfwlax data set in the Work library.
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Partial SAS log
NOTE: The data set WORK.DFWLAX has
10 observations and 5 variables.
33
Create Permanent SAS Data Sets
Add a LIBNAME statement to the previous DATA step to
permanently store the dfwlax data set.
libname ia 'SAS-data-library';
data ia.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Partial SAS log
NOTE: The data set WORK.DFWLAX has
10 observations and 5 variables.
34
Looking Behind the Scenes
The DATA step is processed in two phases:
 Compilation
 Execution
Compilation sets up the framework of the data set matrix
and creates the descriptor portion of the SAS data set.
Execution fills in the values from the raw data and creates
the data in the SAS data set.
35
Looking Behind the Scenes
At compile time, SAS creates
1. an input buffer to hold the current raw data file record
that is being processed
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
36
...
Looking Behind the Scenes
At compile time, SAS creates
1. an input buffer to hold the current raw data file record
that is being processed
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
2. a program data vector (PDV) to hold the current SAS
observation
Flight Date
$ 3
$ 8
Dest
$ 3
FirstClass Economy
N 8
N 8
Every variable is represented in the PDV.
37
...
Looking Behind the Scenes
At compile time, SAS creates
1. an input buffer to hold the current raw data file record
that is being processed
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
2. a program data vector (PDV) to hold the current SAS
observation
Flight Date
$ 3
$ 8
Dest
$ 3
FirstClass Economy
N 8
N 8
3. the descriptor portion of the output data set.
Flight Date
$ 3
$ 8
38
Dest
$ 3
FirstClass Economy
N 8
N 8
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
The DATA statement creates an output data set.
39
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
The INFILE statement creates the input buffer to hold one
record at a time from the raw data file.
The default length of the buffer is 256 bytes for Windows.
40
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
SAS then reads the INPUT statement, creating a spot in
memory for each variable.
This is called the program data vector (PDV).
41
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
The PDV stores the Name, Type, and Length of
each variable.
42
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
PDV
Flight
$ 3
43
...
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
PDV
Flight Date
$ 3
$ 8
44
...
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
PDV
Flight Date
$ 3
$ 8
45
Dest
$ 3
...
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
PDV
Flight Date
$ 3
$ 8
46
Dest
$ 3
FirstClass
N 8
...
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
PDV
Flight Date
$ 3
$ 8
47
Dest
$ 3
FirstClass Economy
N 8
N 8
Looking Behind the Scenes
SAS reaches the end of the DATA step at the RUN
statement.
 There are no more variables to create at the end of the
DATA step.
 SAS will create the complete descriptor portion of the
SAS data set.
48
Compiling the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
PDV
Flight Date
Dest
FirstClass Economy
dfwlax descriptor portion
49
Flight Date
$ 3
$ 8
Dest
$ 3
FirstClass Economy
N 8
N 8
Looking Behind the Scenes
After compilation is complete, SAS enters execution.
 The DATA step is a loop.
 SAS processes one record at a time from the raw data
file.
 Each time through the DATA step loop, one SAS
observation is processed.
50
Looking Behind the Scenes
51

SAS first initializes the values in the PDV to missing.

Character variables are initialized to a blank and
numeric variables are initialized to a dot or period.

Everything in a data set will have a value. Missing is a
valid SAS value.
Executing the DATA Step
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
run;
Input Buffer
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
PDV
Flight Date
Dest
FirstClass Economy
Dest
FirstClass Economy
dfwlax
Flight Date
52
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
dfwlax
PDV
Input Buffer
53
Flight Date
Dest
FirstClass Economy
.
.
Flight Date
Dest
FirstClass Economy
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
Looking Behind the Scenes
The INPUT statement tells SAS to load the record into the
input buffer.
The pointer in the raw data file then moves down to the
next line in the raw data file.
54
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
55
Flight Date
Dest
FirstClass Economy
.
.
Flight Date
Dest
FirstClass Economy
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
56
Flight Date
439
Dest
FirstClass Economy
.
.
Flight Date
Dest
FirstClass Economy
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
57
Flight Date
439
12/11/00
Dest
FirstClass Economy
.
.
Flight Date
Dest
FirstClass Economy
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
58
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
.
.
Flight Date
Dest
FirstClass Economy
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
59
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
.
Flight Date
Dest
FirstClass Economy
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
60
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
137
Flight Date
Dest
FirstClass Economy
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
Looking Behind the Scenes
SAS reaches the end of the DATA step at the RUN
statement.
At the RUN statement, SAS outputs what is in the PDV to
the data set. This is called automatic output.
61
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
62
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
137
Automatic
output
Flight
Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
137
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
AutomaticDest
return$ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
Looking Behind the Scenes
SAS then loops back to the top of the DATA step.
SAS then reinitializes the values in the PDV to missing.
63
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
64
Flight Date
Dest
FirstClass Economy
.
.
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
137
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
Reinitialize variables
11412/12/00LAX 15170
to missing
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
4 3 9 1 2 / 1 1 / 0 0 L A X
2 0 1 3 7
dfwlax
PDV
Input Buffer
65
Flight Date
Dest
FirstClass Economy
.
.
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
137
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
9 2 1 1 2 / 1 1 / 0 0 D F W
2 0 1 3 1
dfwlax
PDV
Input Buffer
66
Flight Date
Dest
FirstClass Economy
.
.
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
137
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
9 2 1 1 2 / 1 1 / 0 0 D F W
2 0 1 3 1
dfwlax
PDV
Input Buffer
67
Flight Date
921
12/11/00
Dest
DFW
FirstClass Economy
20
131
Flight Date
439
12/11/00
Dest
LAX
FirstClass Economy
20
137
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
...
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
9 2 1 1 2 / 1 1 / 0 0 D F W
2 0 1 3 1
dfwlax
PDV
Input Buffer
68
Flight Date
921
12/11/00
Dest
DFW
FirstClass Economy
20
131
Automatic
output
Flight
Date
439
12/11/00
921
12/11/00
Dest
LAX
DFW
FirstClass Economy
20
137
20
131
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
AutomaticDest
return$ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
Executing the DATA Step
1
2
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
1 1 4 1 2 / 1 2 / 0 0 L A X
1 5 1 7 0
dfwlax
PDV
Input Buffer
69
Flight Date
Dest
FirstClass Economy
.
.
Flight
439
921
114
Dest
LAX
DFW
LAX
FirstClass Economy
20
137
20
131
15
170
Date
12/11/00
12/11/00
12/12/00
Raw Data
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 Date $ 4-11
Dest $ 12-14 FirstClass 15-17
Economy 18-20;
43912/11/00LAX 20137
run;
92112/11/00DFW 20131
11412/12/00LAX 15170
DATA Step Execution: Summary
Compile Program
Initialize Variables
to Missing (PDV)
Execute INPUT
Statement
Execute Other
Statements
Output to
SAS Data Set
70
End of
File?
No
Automatic Output
Yes
Next
Step
Access Temporary SAS Data Sets
proc print data=work.dfwlax;
run;
The SAS System
71
Obs
Flight
Date
Dest
First
Class
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
12/11/00
12/11/00
12/12/00
12/12/00
12/13/00
12/13/00
12/14/00
12/14/00
12/15/00
12/15/00
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
Economy
137
131
170
85
196
116
166
88
187
31
Access Permanent SAS Data Sets
To access a permanently stored SAS data set
 submit a LIBNAME statement to assign a libref to the
SAS data library
 use the libref as the first-level name of the SAS data
set.
The LIBNAME statement needs to be submitted
once per SAS session.
Remember
72
Access Permanent SAS Data Sets
libname ia 'SAS-data-library';
proc print data=ia.dfwlax;
run;
The SAS System
73
Obs
Flight
Date
Dest
First
Class
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
12/11/00
12/11/00
12/12/00
12/12/00
12/13/00
12/13/00
12/14/00
12/14/00
12/15/00
12/15/00
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
Economy
137
131
170
85
196
116
166
88
187
31
Exercise
This exercise reinforces the concepts discussed
previously.
74
Exercises
The wake_nc schools.txt raw data file
contains data about high schools in Wake County,
North Carolina, USA.
The town, median home value for homes in the area,
name of the school, number of students, and ratio of
teachers to students are in this file.
You want to read the data into SAS.
75
Exercises
Open the wake_nc schools.txt raw data file.
Write the names of the variables and their associated
column numbers.
Variable
Town
MHV
HighSchool
NumStudents
TSRatio
76
Column Ranges
Exercises
Using the wake_nc schools.txt raw data file
and column input, create a temporary data set named
wakenc.
Produce a listing report:
1. Label:
• MHV as Median Housing Value
• TSRatio as Teacher/Student Ratio
• Numstudents as Number of Students
2. Give titles on the first two lines of your output:
Wake County
High School Data
3. No date, no observations, no page number
77
Exercises
Wake County
High School Data
Town
Apex
Apex
Apex
Cary
Cary
Fuquay-varina
Garner
Holly Springs
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Wake Forest
Wake Forest
Wendell
Zebulon
78
Median
Housing
Value
$182,000.00
$182,000.00
$182,000.00
$181,690.00
$181,690.00
$131,000.00
$117,600.00
$153,100.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$149,600.00
$149,600.00
$96,500.00
$97,700.00
HighSchool
Apex High School
Athens Drive High @ Middle Ck
Middle Creek High School
Cary High School
Green Hope High School
Fuquay-varina High School
Garner High School
Community Partners Charter High
Athens Drive High School
Baker Charter High School
Gov Morehead School
Leesville Road High School
Longview
Millbrook High School
Needham Broughton High School
Phillips High School
Raleigh Charter High School
Sanderson High School
Southeast Raleigh High School
Wakefield High School
William G Enloe High School
Franklin Academy
Wake Forest-rolesville High
East Wake High School
East Wake Academy
Num
Students
2,047
1,747
1,220
2,209
2,005
1,518
1,867
121
1,769
40
68
2,085
93
2,040
2,171
217
475
1,770
2,088
1,818
2,344
791
1,507
1,975
508
Teacher/
Student
Ratio
1:17
1:14
1:23
1:19
1:18
1:15
1:15
1:10
1:14
1:13
1:2
1:17
1:4
1:16
1:17
1:8
1:12
1:16
1:16
1:19
1:16
1:12
1:21
1:16
1:14
Exercises
data work.wakenc;
infile 'wake_nc schools.txt';
input Town $1-13 MHV $21-32 HighSchool $34-65
NumStudents $69-77 TSRatio $78-83;
run;
options nodate nonumber;
proc print data=work.wakenc split=' ' noobs;
label MHV='Median Housing Value'
NumStudents='Number of Students'
HighSchool= 'High School'
TSRatio='Teacher/Student Ratio';
title1 'Wake County';
title3 'High School Data';
run;
79
Exercise – Section 6.1
This exercise reinforces the concepts discussed
previously.
80
Section 6.2
Reading Raw Data Files:
Formatted Input
Objectives


82
Read standard and nonstandard character and
numeric data using formatted input.
Read date values and convert them to SAS date
values.
SAS Code/Vocabulary



83
@n
+n
Informat
Reading Data Using Formatted Input
The Date variable that you created in the last section
was a character type.
To read Date as a date value, you need to use formatted
input because the date is stored as mm/dd/yy.
84
Reading Data Using Formatted Input
Formatted input is appropriate for reading the following:
 data in fixed columns
 standard and nonstandard character and numeric data
 calendar values to be converted to SAS date values
85
Reading Data Using Formatted Input
General form of the INPUT statement with formatted
input:
INPUT pointer-control variable informat ...;
Formatted input is used to read data values by doing the
following:
 moving the input pointer to the starting position of the
field
 specifying a variable name
 specifying an informat
86
Reading Data Using Formatted Input
Pointer controls:
@n moves the pointer to column n.
absolute pointer (recommended)
+n
moves the pointer n positions to the right.
relative pointer
An informat specifies
 the width of the input field
 how to read the data values that are stored in the field.
87
What Is a SAS Informat?
An informat is an instruction that SAS uses to read data
values.
SAS informats have the following form:
Indicates
a character
informat
<$>informat-namew.<d>
Number of
decimal places
Informat
name
Total width
of the field
to read
Required
delimiter
Do not forget the dot at the end of the informat name.
88
What Is a SAS Informat?
An informat is an instruction that SAS uses to read data
values.
The date was stored as mm/dd/yy. You need to remove
the slashes to read the date portion.
89
Selected Informats
8. or 8.0
reads 8 columns of numeric data.
Raw Data Value
Informat
1 2 3 4 5 6 7
8.0
1 2 3 4 . 5 6 7
8.0
8.2
1 2 3 4 5 6 7
2
1 2 3 4 . 5 6 7
2
reads 8 columns of numeric data and may insert
a decimal point in the value if one is not present.
Raw Data Value
90
SAS Data Value
Informat
1 2 3 4 5 6 7
8.2
1 2 3 4 . 5 6 7
8.2
SAS Data Value
1 2 3 4 5 . 6 7
2
1 2 3 4 . 5 6 7
2
What Is a SAS Informat?
Rules for the w.d informat:
 The w tells SAS to read in w columns of numeric data
from the raw data, where w is a number.
 If there is a decimal in the number, SAS retains the
decimal.
 If there is not a decimal in the number, SAS inserts a
decimal d positions from the right.
91
Selected Informats
$8.
reads 8 columns of character data
and removes leading blanks.
Raw Data Value
J A M E S
Informat
$8.
SAS Data Value
J A M E S
A
J
$CHAR8. reads 8 columns of character data and
preserves leading blanks.
Raw Data Value
J A M E S
92
Informat
$CHAR8.
SAS Data Value
J A M E S
What Is a SAS Informat?
!
Using the $CHAR8. informat stores this
in the data:
J A M E S
The statement where name='JAMES'; will not
select the row above.
J A M E S
93
=
J A M E S
A
J
Selected Informats
COMMA7. reads 7 columns of numeric data and removes
selected nonnumeric characters such as dollar
signs and commas.
Raw Data Value
$ 1 2 , 5 6 7
Informat
COMMA7.0
SAS Data Value
2
1 2 5
3
6 7
MMDDYY8. reads dates of the form 10/29/01.
Raw Data Value
1 0 / 2 9 / 0 1
94
Informat
MMDDYY8.
SAS Data Value
1 5 2 7 7
Working with Date Values
Date values that are stored as SAS dates are special
numeric values.
A SAS date value is interpreted as the number of days
between January 1, 1960, and a specific date.
01JAN1959
01JAN1960
01JAN1961
informat
-365
0
366
01/01/1960
01/01/1961
format
01/01/1959
95
Convert Dates to SAS Date Values
SAS uses date informats to read and convert dates to
SAS date values.
Examples:
Raw Data
Value
10/29/2001
10/29/01
29OCT2001
29/10/2001
Informat
MMDDYY10.
MMDDYY8.
DATE9.
DDMMYY10.
Converted
Value
15277
15277
15277
15277
Number of days between
01JAN1960 and 29OCT2001
96
Convert Dates to SAS Date Values
Your informat type must match your data type. If not, you
will get an error.
Examples:
Raw Data
Value
10/29/2001
10/29/01
29OCT2001
29/10/2001
Informat
MMDDYY10.
MMDDYY8.
DATE9.
DDMMYY10.
Converted
Value
.
MMDDYY10. is not the informat to use for 29OCT2001.
97
Convert Dates to SAS Date Values
What is the result if you use formatted input with the
following informat to read in this data?
Raw Data
Value
10/29/2001
98
Informat
MMDDYY8.
Converted
Value
Convert Dates to SAS Date Values
What is the result if you use formatted input with the
following informat to read in this data?
Raw Data
Value
10/29/2001
Informat
Converted
Value
MMDDYY8.
SAS reads 8 columns, which is in the MMDDYY form.
SAS considers this to be 1920.

99
See the YEARCUTOFF handout for more
information on why this is 1920 and not 2020.
Using Formatted Input
Raw Data File
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
Read the raw data file
using formatted input.
DATA Step
data SAS-data-set-name;
infile 'raw-data-filename';
input pointer-control variable informat-name;
run;
SAS Data Set
100
SAS date values
Flight Date
Dest
439
921
114
LAX
DFW
LAX
14955
14955
14956
FirstClass Economy
20
20
15
137
131
170
Reading Data: Formatted Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
101
...
Reading Data: Formatted Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
102
...
Reading Data: Formatted Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
103
...
Reading Data: Formatted Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
104
...
Reading Data: Formatted Input
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
105
Reading Data: Formatted Input
1
1
2
1---5----0----5----0
Raw Data File 43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
data work.dfwlax;
infile 'raw-data-file';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
run;
106
Reading Data: Formatted Input
proc print data=work.dfwlax;
run;
SAS date values
The SAS System
107
Obs
Flight
Date
Dest
First
Class
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
14955
14955
14956
14956
14957
14957
14958
14958
14959
14959
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
Economy
137
131
170
85
196
116
166
88
187
31
Reading Data: Formatted Input
proc print data=work.dfwlax;
format Date date9.;
Formatted SAS
run;
date values
The SAS System
108
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Dest
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
First
Class
20
20
15
5
14
15
17
7
.
14
Economy
137
131
170
85
196
116
166
88
187
31
Reading Raw Data
You learned about column and formatted input.
You can use both at the same time on one input
statement. This is called mixed input.
data work.dfwlax;
infile 'raw-data-file';
input Flight $ 1-3 @4 Date mmddyy8.
Dest $ 12-14 @15 FirstClass 3.
Economy 18-20;
run;
109
Reading Raw Data
What if you forget the required dot delimiter on the
informat? What happens?
data work.dfwlax;
infile 'raw-data-file';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3
@18 Economy 3.;
run;
110
Reading Raw Data
There are no errors or warnings although the dot was
required. Why?
111
Reading Raw Data
Compare the output with the dot delimiter and without it.
Do you notice any difference?
With the dot delimiter
Without the dot delimiter
112
Convert Dates to SAS Date Values
data work.dfwlax;
infile 'raw-data-file';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3
@18 Economy 3.;
run;
The log is clean. There are no odd notes, warnings or
errors.
What happened? Why?
113
Reading Raw Data
Look at the FirstClass variable.
With the dot delimiter
Without the dot delimiter
114
Reading Raw Data
Read down the column of the FirstClass variable.
Do you see those numbers in another column?
Without the dot delimiter
115
Reading Raw Data
Read down the column of the FirstClass variable.
Do you see those numbers in another column?
Without the dot delimiter
Column 3
116
Convert Dates to SAS Date Values
data work.dfwlax;
infile 'raw-data-file';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3
@18 Economy 3.;
run;
SAS read @15 and moved the pointer to column
15 in the buffer.
Then SAS read FirstClass 3, which is
column input.
The dot delimiter is what distinguishes an informat
from a column number.
117
Exercise
This exercise reinforces the concepts discussed
previously.
118
Exercises
Using the wake_nc schools.txt raw data file and
formatted input, create a temporary data set named
wakenc.
Produce two listing reports:
1. One with the data unformatted
2. In the second listing report:
 Label MHV as Median Housing Value, TSRatio as
Teacher/Student Ratio, and NumStudents as
Number of Students.
 Format the NumStudents to have commas in the
value and the MHV to have dollar signs, commas, and
two decimal places.
119
Exercises
Output from the first unformatted report:
Obs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
120
Town
Apex
Apex
Apex
Cary
Cary
Fuquay-varina
Garner
Holly Springs
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Wake Forest
Wake Forest
Wendell
Zebulon
MHV
182000
182000
182000
181690
181690
131000
117600
153100
159598
159598
159598
159598
159598
159598
159598
159598
159598
159598
159598
159598
159598
149600
149600
96500
97700
HighSchool
Apex High School
Athens Drive High @ Middle Ck
Middle Creek High School
Cary High School
Green Hope High School
Fuquay-varina High School
Garner High School
Community Partners Charter High
Athens Drive High School
Baker Charter High School
Gov Morehead School
Leesville Road High School
Longview
Millbrook High School
Needham Broughton High School
Phillips High School
Raleigh Charter High School
Sanderson High School
Southeast Raleigh High School
Wakefield High School
William G Enloe High School
Franklin Academy
Wake Forest-rolesville High
East Wake High School
East Wake Academy
Num
Students
TSRatio
2047
1747
1220
2209
2005
1518
1867
121
1769
40
68
2085
93
2040
2171
217
475
1770
2088
1818
2344
791
1507
1975
508
1:17
1:14
1:23
1:19
1:18
1:15
1:15
1:10
1:14
1:13
1:2
1:17
1:4
1:16
1:17
1:8
1:12
1:16
1:16
1:19
1:16
1:12
1:21
1:16
1:14
Exercises
Output from the second, formatted report:
Obs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
121
Town
Apex
Apex
Apex
Cary
Cary
Fuquay-varina
Garner
Holly Springs
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Raleigh
Wake Forest
Wake Forest
Wendell
Zebulon
Median
Housing
Value
$182,000.00
$182,000.00
$182,000.00
$181,690.00
$181,690.00
$131,000.00
$117,600.00
$153,100.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$159,598.00
$149,600.00
$149,600.00
$96,500.00
$97,700.00
HighSchool
Apex High School
Athens Drive High @ Middle Ck
Middle Creek High School
Cary High School
Green Hope High School
Fuquay-varina High School
Garner High School
Community Partners Charter High
Athens Drive High School
Baker Charter High School
Gov Morehead School
Leesville Road High School
Longview
Millbrook High School
Needham Broughton High School
Phillips High School
Raleigh Charter High School
Sanderson High School
Southeast Raleigh High School
Wakefield High School
William G Enloe High School
Franklin Academy
Wake Forest-rolesville High
East Wake High School
East Wake Academy
Num
Students
2,047
1,747
1,220
2,209
2,005
1,518
1,867
121
1,769
40
68
2,085
93
2,040
2,171
217
475
1,770
2,088
1,818
2,344
791
1,507
1,975
508
Teacher/
Student
Ratio
1:17
1:14
1:23
1:19
1:18
1:15
1:15
1:10
1:14
1:13
1:2
1:17
1:4
1:16
1:17
1:8
1:12
1:16
1:16
1:19
1:16
1:12
1:21
1:16
1:14
Exercise – Section 6.2
This exercise reinforces the concepts discussed
previously.
122
Section 6.3
Examining Data Errors
Objectives


124
Define types of data errors.
Identify data errors.
SAS Code/Vocabulary



125
Data Error
_ERROR_
_N_
What Are Data Errors?
In Chapter 2, you looked at syntax errors.
Specifically you discussed the following:
 what a syntax error is
 what SAS does when it encounters an error
 how to fix the syntax error by reading the SAS log
126
What Are Data Errors?
There is another type of error called a data error.



127
Data errors occur when SAS encounters data that it is
not expecting.
This might be a problem with the data or with your
program.
You need to look in the SAS log to decide.
What Are Data Errors?
SAS detects data errors when the following occur:



128
The INPUT statement encounters invalid data in a
field.
Illegal arguments are used in functions. (We will
discuss this in the next chapter.)
Impossible mathematical operations are requested.
Examining Data Errors
When SAS encounters a data error, these events occur:
 A note that describes the error is printed in the SAS
log.
 The input record being read is displayed in the SAS
log (contents of the input buffer).
 The values in the SAS observation being created are
displayed in the SAS log (contents of the PDV).
 A missing value is assigned to the appropriate SAS
variable.
 Execution continues.
129
Demonstration
Examining Data Errors
Open the program c06s3d1.sas.
Read the program, and then submit it.
 What data set is created in the program?
 What raw data file is read?
 How many variables are going to be in the
new data set? What are their names, types
and lengths?
130
Examining Data Errors
Did you get output in the Output window?
How does it look?
Is there anything unusual in the log window?
131
Examining Data Errors
SAS Output
Obs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
132
Emp
ID
LastName
0031
0040
0071
0082
0091
0106
0275
0286
0309
0334
0347
0355
0366
0730
0739
GOLDENBERG
WILLIAMS
PERRY
MCGWIER-WATTS
SCOTT
THACKER
GRAHAM
DREWRY
HORTON
DOWN
CHERVENY
BELL
GLENN
BELL
SAYRE
FirstName
Job
Code
Salary
DESIREE
ARLENE M.
ROBERT A.
CHRISTINA
HARVEY F.
DAVID S.
DEBORAH S.
SUSAN
THOMAS L.
EDWARD
BRENDA B.
THOMAS B.
MARTHA S.
CARLA
MARCO
PILOT1
FLTAT1
FLTAT1
PILOT3
FLTAT2
FLTAT1
FLTAT2
PILOT1
FLTAT1
PILOT1
FLTAT2
PILOT1
PILOT3
PILOT1
PILOT1
50221.62
23666.12
21957.71
96387.39
32278.40
24161.14
32024.93
55377.00
23705.12
.
38563.45
59803.16
120202.38
37397.93
59268.61
Examining Data Errors
SAS log
133
Examining Data Errors
Did you notice the following?
1.There are no errors.
 One odd note was written to the SAS log.
2.The data set was still created with 15 rows.
 This matches the 15 lines read from the raw data
file.
3.There was invalid data in observation 10.
134
Examining Data Errors
SAS encountered a problem with line 10.
1. SAS wrote a note in the log regarding invalid data in
columns 37-45.
135
Examining Data Errors
SAS encountered a problem with line 10.
1. SAS wrote a note in the log regarding invalid data in
columns 37-45.
2. SAS printed a ruler to help you see the record.
136
Examining Data Errors
SAS encountered a problem with line 10.
1. SAS wrote a note in the log regarding invalid data in
columns 37-45.
2. SAS printed a ruler to help you see the record.
3. Under the ruler, SAS wrote the contents of the input
buffer (the raw data record).
137
Examining Data Errors
SAS encountered a problem with line 10.
1. SAS wrote a note in the log regarding invalid data in
columns 37-45.
2. SAS printed a ruler to help you see the record.
3. Under the ruler, SAS wrote the contents of the input
buffer (the raw data record).
4. SAS then wrote the contents of the PDV to the log to
help you see what it output to the data set. This is easy
to identify, because the variable name appears followed
by an equal sign and the value SAS assigned it.
138
Examining Data Errors
_ERROR_ and _N_ are internal SAS variables.
They are not written to the data set.
_N_ is the number of times SAS looped through the DATA step.
In this example, _N_ = 10, which means that this is the tenth
time through the DATA step.
_ERROR_ can have a value of 0 or 1.
It indicates whether or not an error occurred.
0 = false, 1 = true.
You can use these internal variables in a later DATA step.
139
Examining Data Errors
SAS encountered a problem with line 10.
1. SAS wrote a note in the log regarding invalid data in
columns 37-45.
2. SAS printed a ruler to help you see the record.
3. Under the ruler, SAS wrote the contents of the input
buffer (the raw data record).
4. SAS then wrote the contents of the PDV to the log to
help you see what it output to the data set. This is easy
to identify, because the variable name appears followed
by an equal sign and the value SAS assigned it.
5. SAS wrote the observations to the data set and
continued processing.
140
Examining Data Errors
Is the problem with the data or the program?


141
Look at the SAS log where the note begins, not the
program.
Look for the variable name that was flagged in the
note.
Examining Data Errors
Look in the PDV in the SAS log:
1. What was the resulting value for that variable?
2. What date type did SAS assign to that variable?
142
Examining Data Errors
1. What was the resulting value for that variable?
SAS assigned a missing value to Salary.
2. What data type did SAS assign to that variable?
SAS assigned the data type of numeric, because it
assigned Salary a missing value of a period.
143
Examining Data Errors
In the SAS log, you see a note that lists the name of the
variable, the line number, and the columns in which it
read from the raw data file.
The note indicates that the data can
be found in columns 37 through 45
in the input buffer.
144
Examining Data Errors
Under the ruler, locate columns 37 through 45.
What do you notice about this value?
37
145
45
Examining Data Errors
Under the ruler, locate columns 37 through 45.
What do you notice about this value?
The leading blank in column 37 is not an issue.
However, the % is not a standard numeric. This is a data
error.
37
45
146
Examining Data Errors
This is why Obs 10 had a missing value for Salary.
Obs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
147
Emp
ID
LastName
0031
0040
0071
0082
0091
0106
0275
0286
0309
0334
0347
0355
0366
0730
0739
GOLDENBERG
WILLIAMS
PERRY
MCGWIER-WATTS
SCOTT
THACKER
GRAHAM
DREWRY
HORTON
DOWN
CHERVENY
BELL
GLENN
BELL
SAYRE
FirstName
Job
Code
Salary
DESIREE
ARLENE M.
ROBERT A.
CHRISTINA
HARVEY F.
DAVID S.
DEBORAH S.
SUSAN
THOMAS L.
EDWARD
BRENDA B.
THOMAS B.
MARTHA S.
CARLA
MARCO
PILOT1
FLTAT1
FLTAT1
PILOT3
FLTAT2
FLTAT1
FLTAT2
PILOT1
FLTAT1
PILOT1
FLTAT2
PILOT1
PILOT3
PILOT1
PILOT1
50221.62
23666.12
21957.71
96387.39
32278.40
24161.14
32024.93
55377.00
23705.12
.
38563.45
59803.16
120202.38
37397.93
59268.61
Examining Data Errors
Was the problem with the data or the program?
In this example, the problem was with the data.
If you knew the value of Edward Down’s salary and had
Write access to the text file, you could modify the data.
If you do not have access to modify the data, you must
contact the person in the proper group to help you.
148
Examining Data Errors (Review)
Strategy to find the problem:
1. Do not look at your program. Look at the log where
the NOTE begins.
2. What variable offended SAS? It is the one named in
the NOTE.
3. Look in the PDV in the log. What type does SAS think
that variable is?
4. In the note, SAS tells you where to look in the buffer.
Under the ruler, locate the columns specified.
5. Do you notice any issue with the value?
149
Demonstration
Examining Data Errors
Open the program c06s3d2.sas.
Read the program, and then submit it.
1. What data set is created in the program?
2. What raw data file is read?
3. How many variables are in the new data
set?
4. What are their names, types, and lengths?
150
Examining Data Errors
Examine your output. Is it what you expected?
Are there any unusual notes in the log window?
151
Examining Data Errors
Output: Is this what you expected?
152
Obs
Emp
ID
LastName
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0031
0040
0071
0082
0091
0106
0275
0286
0309
0334
0347
0355
0366
0730
0739
GOLDENBERG
WILLIAMS
PERRY
MCGWIER-WATTS
SCOTT
THACKER
GRAHAM
DREWRY
HORTON
DOWN
CHERVENY
BELL
GLENN
BELL
SAYRE
FirstName
Job
Code
Salary
DESIREE
ARLENE M.
ROBERT A.
CHRISTINA
HARVEY F.
DAVID S.
DEBORAH S.
SUSAN
THOMAS L.
EDWARD
BRENDA B.
THOMAS B.
MARTHA S.
CARLA
MARCO
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
50221.62
23666.12
21957.71
96387.39
32278.40
24161.14
32024.93
55377.00
23705.12
.
38563.45
59803.16
120202.38
37397.93
59268.61
Examining Data Errors
All the values for JobCode are missing.
This might be how the data is stored.
To help you decide, look at the messages in the SAS log.
If the log is clean, check the data values in the raw data
file.
153
Examining Data Errors
Partial log
154
Examining Data Errors
For each code submission, SAS displays only a certain
number of errors in the log.
The ERROR= system option controls how many notes of
the same type are displayed in the log. The default value
is 20. The rest of the data is processed, but only 20 errors
will be written to the log.
You can reset this value in an OPTIONS statement.
155
Examining Data Errors
Use the strategies to find the problem:
1. Look at the log where the note begins, not the
program.
2. What variable was flagged by SAS? It is the one
named in the note.
3. Look in the PDV in the SAS log. What data type was
assigned to the variable?
4. In the note, SAS tells you where to look in the buffer.
Under the ruler, locate the columns specified.
5. Do you notice any issue with the value?
156
Examining Data Errors
Answers:
2. What variable was flagged by SAS? It is the one
named in the note.
Jobcode
3. Look in the PDV in the SAS log. What data type was
assigned to the variable?
Numeric
5. Do you notice any issue with the value?
They are character strings.
157
Examining Data Errors
Look at the DATA step code in the SAS log.
 Does the data type in the INPUT statement match the
data?
 What is the data type used in the INPUT statement?
158
Examining Data Errors
Partial Log
In the INPUT buffer, JobCode is a character string.
However, SAS assigned it as a numeric. (The missing
value of a period was assigned.)
This is a problem with the program. The $ was not
included in the INPUT statement for JobCode.
159
Examining Data Errors
Use the strategies to find the problem:
1. Look at the log where the note begins, not the
program.
2. What variable was flagged by SAS? It is the one
named in the note.
3. Look in the PDV in the SAS log. What data type was
assigned to the variable?
4. In the note, SAS will tell you where to look in the
buffer. Under the ruler, locate the columns specified.
5. Do you notice any issue with the value?
6. If the answer to question 5 above is No, then look at
the DATA step code in the log. Does the data type in
the INPUT statement match the data?
160
Exercise
This exercise reinforces the concepts discussed
previously.
161
Exercises
1. Please open the StudyAbroad.txt file. Using this raw data
file, create a temporary SAS data set called
AbroadOptions where to following conditions are met:
The average cost is formatted in U.S. dollars, and no
decimals.
 Dates are formatted as MM/DD/YY10.
 The number of students is formatted with commas.
 The title is "2008 Study Abroad Options."
2. Create a listing report where labels are Country, Number of
Students, Average Cost, and Length of Stay.
3. Review the log. Identify the errors that you received as program
errors, data errors, or both.

162
Exercise – Section 6.3
This exercise reinforces the concepts discussed
previously.
163
Section 6.4
Assigning Variable
Attributes
Objectives


165
Assign permanent attributes to SAS variables.
Override permanent variable attributes.
SAS Code/Vocabulary



166
Descriptor Portion
LABEL
FORMAT
Review: Default Variable Attributes
When a variable is created in a DATA step, the following
occurs:
 The name, type, and length of the variable are
automatically assigned.
 The remaining attributes, such as label and format, are
not automatically assigned.
When the variable is used in a later step, these events
occur:
 The name is displayed for identification purposes.
 Its value is displayed using a system-determined
format.
167
Default Variable Attributes
Previously, you created the ia.dfwlax data set.
libname ia 'SAS-data-library';
data ia.dfwlax;
infile 'raw-data-file';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
run;
168
Default Variable Attributes
Examine the descriptor portion of the ia.dfwlax data
set.
proc contents data=ia.dfwlax;
run;
Partial Output
Alphabetic List of Variables and Attributes
#
Variable
Type
Len
2
3
5
4
1
Date
Dest
Economy
FirstClass
Flight
Num
Char
Num
Num
Char
8
3
8
8
3
You can see the attributes: Name, Type, and Length.
169
Temporary Variable Attributes
You previously used the LABEL and FORMAT statements
in a PROC step to temporarily assign attributes.
proc print data=ia.dfwlax label;
format Date mmddyy10.;
label Dest='Destination'
FirstClass='First Class Passengers'
Economy='Economy Passengers';
run;
170
Temporary Variable Attributes
Output
The SAS System
Obs
1
2
3
4
5
6
7
8
9
10
171
Flight
439
921
114
982
439
982
431
982
114
982
Date
12/11/2000
12/11/2000
12/12/2000
12/12/2000
12/13/2000
12/13/2000
12/14/2000
12/14/2000
12/15/2000
12/15/2000
Destination
First
Class
Passengers
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
Economy
Passengers
137
131
170
85
196
116
166
88
187
31
Specifying Variable Attributes
Using LABEL and FORMAT statements in the PROC
PRINT step only temporarily assigned the attributes to the
data.
If you submit another PROC PRINT step without a
FORMAT or LABEL statement, the labels and formats will
not be displayed.
If you move the LABEL and FORMAT statements to the
DATA step, they will be permanently assigned.
172
Permanent Variable Attributes
Assign labels and formats in the DATA step.
libname ia 'SAS-data-library';
data ia.dfwlax;
infile 'raw-data-file';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
format Date mmddyy10.;
label Dest='Destination'
FirstClass='First Class Passengers'
Economy='Economy Passengers';
run;
173
Permanent Variable Attributes
Examine the descriptor portion of the ia.dfwlax data
set.
proc contents data=ia.dfwlax;
run;
Partial Output
Alphabetic List of Variables and Attributes
#
Variable
Type
Len
2
3
5
4
1
Date
Dest
Economy
FirstClass
Flight
Num
Char
Num
Num
Char
8
3
8
8
3
Format
MMDDYY10.
Label
Destination
Economy Passengers
First Class Passengers
You also have the attributes for Format and Label.
174
Specifying Variable Attributes
Use LABEL and FORMAT statements in the
 PROC step to temporarily assign the attributes (for the
duration of the step only)
 DATA step to permanently assign the attributes (stored
in the data set descriptor portion).
175
Specifying Variable Attributes
!
176
Even if the labels are permanently assigned in
the descriptor portion of the data set, PROC
PRINT still requires either the LABEL or SPLIT=
option to display the labels.
Permanent Variable Attributes
proc print data=ia.dfwlax label;
run;
Output
The SAS System
The labels are
stored in the
descriptor
portion of the
data set. PROC
PRINT will
apply the labels
with the LABEL
option.
177
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
Date
12/11/2000
12/11/2000
12/12/2000
12/12/2000
12/13/2000
12/13/2000
12/14/2000
12/14/2000
12/15/2000
12/15/2000
Destination
First
Class
Passengers
Economy
Passengers
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
Permanent Variable Attributes
proc print data=ia.dfwlax label;
run;
Output
The SAS System
The formats
are stored in
the descriptor
portion of the
data set and
are used
automatically.
178
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
Date
12/11/2000
12/11/2000
12/12/2000
12/12/2000
12/13/2000
12/13/2000
12/14/2000
12/14/2000
12/15/2000
12/15/2000
Destination
First
Class
Passengers
Economy
Passengers
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
Variable Attributes
You might not want to always see the date using the
MMDDYY10. format, which is stored in the descriptor
portion of the data set.
What if you want to use the DATE9. format for a report?
179
Override Permanent Attributes
Use a FORMAT statement in a PROC PRINT step to
temporarily override the format stored in the data set
descriptor.
proc print data=ia.dfwlax label;
format Date date9.;
run;
180
Override Permanent Attributes
Output
The SAS System
181
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
Date
11DEC2000
11DEC2000
12DEC2000
12DEC2000
13DEC2000
13DEC2000
14DEC2000
14DEC2000
15DEC2000
15DEC2000
Destination
First
Class
Passengers
Economy
Passengers
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
Exercise – Section 6.4
This exercise reinforces the concepts discussed
previously.
182
Section 6.5
Reading Delimited
Raw Data Files
Objectives




184
Read a space-delimited raw data file.
Read a comma-delimited raw data file.
Read a raw data file with missing data at the end of a
row.
Read a raw data file with missing data represented by
consecutive delimiters.
SAS Vocabulary






Free Format
INPUT
Standard Data
Nonstandard Data
INFORMAT
COMMAw.







185
MMDDYYw.
DATEw.
LENGTH
INFILE
DLM=
MISSOVER
DSD
List Input with the Default Delimiter
50001
50002
50003
50004
50005
50006



186
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 72 550
24jun1993 170 510
20dec1994 180 520
The data is not in fixed columns.
The fields are separated by spaces.
There is one nonstandard field.
The first two
fields are in fixed
columns, but the
last two columns
are not.
List Input
Free format can be defined as raw data with fields that
are not in fixed columns.
 Use list input, also known as delimited input, to read
free-format data.
 The list input style signals to SAS that fields are
separated by delimiters.
 SAS then reads from non-delimiter to delimiter instead
of from a specific location in the raw data record.
187
Format Comparison
Column/Formatted Data
Flight
Date
Dest First Econ
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 14 31
188
Free Format/Delimited Data
50001
50002
50003
50004
50005
50006
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 72 550
24jun1993 170 510
20dec1994 180 520
Delimiters
Common delimiters are shown below:
blanks
50001 4feb1989 132 530
50002 11nov1989 152 540
50003 22oct1991 90 530
A space (blank) is
the default delimiter.
50001,4feb1989,132,530
commas
50002,11nov1989,152,540
50003,22oct1991,90,530
tab characters
189
50001 4feb1989
50002 11nov1989
50003 22oct1991
132
152
90
530
540
530
Reading Delimited Data
The DATA and INFILE statements remain
unchanged.
Remember
The syntax of the INPUT statement changes depending
on what type of input you use.
 Column
 Formatted
 List
190
List Input
General form of the INPUT statement for list input:
INPUT var-1 $ var-2 . . .var-n;
List the variables in the INPUT statement.
 If the variable is a character, you must put a $ after
the name to indicate character data.
 No symbol after the variable name indicates a numeric
variable.
input AccountNum Name $ Balance Type $;
191
List Input
General form of the INPUT statement for list input:
INPUT var-1 $ var-2 . . .var-n;


192
You must specify the variables in the order that they
appear in the raw data file.
Even if you want only the first and last variable in the
raw data file, you must read all of the data.
Reading a Delimited Raw Data File
data airplanes;
infile 'raw-data-file';
input ID $ InService $ PassCap CargoCap;
run;
How does SAS read numbers with special
characters such as dollar signs and commas?
How does SAS read dates?
193
Input Data
The second field is a date.
How does SAS store dates?
50001
50002
50003
50004
50005
50006
194
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Review: Standard Data
The term standard data refers to character and numeric
data that SAS recognizes automatically.
Examples of standard numeric data:
15
195
-15
15.4
+1.23
1.23E3
-1.23E-3
Review: Nonstandard Data
The term nonstandard data refers to character and
numeric data that SAS does not recognize automatically.
Examples of nonstandard numeric data:
12/12/2012
196
29FEB2000
4,242
$89,000
Review: Informats
To read in nonstandard data, you must apply an informat.
An informat is an instruction that specifies how SAS reads
raw data.
General form of an informat:
<$>INFORMAT-NAME<w>.<d>
197
Specifying an Informat
To specify an informat, use the colon (:) format modifier in
the INPUT statement between the variable name and the
informat.
General form of a format modifier in an INPUT statement:
Colon format modifier
INPUT variable : informat;
198
Reading a Delimited Raw Data File
data airplanes;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
The colon format modifier indicates list input, not
formatted input.
199
Reading a Delimited Raw Data File
data airplanes;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
How does SAS determine the lengths of these
variables?
200
Lengths of Variables
When you use list input, the default length for both
character and numeric variables is eight (8) bytes.
Some values might require more than 8 bytes of storage.
201
Lengths of Variables
You can set the length of character variables with a
LENGTH statement or with an informat.
General form of a LENGTH statement:
LENGTH variable-name <$> length-specification ...;
Use the dollar sign to denote character variables.
Remember
202
Lengths of Variables
!
203
You can change the length of a numeric variable,
but if you do, you might lose precision.
It is good practice not to change the length of
numeric variables.
Lengths of Variables
Remember
204
When SAS creates a variable, it needs to know
three important pieces of information to create
the program data vector (PDV):
 Name
 Type
 Length
Setting the Length of a Variable
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
In this example, what is the type and length
of the variable ID?
205
Setting the Length of a Variable
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
In this example, what is the type and length
of the variable ID?
Name= ID
Type = Character
Length = 5
206
SAS has all the information that it
needs to create a spot in the PDV
from the LENGTH statement.
Submitting the step
What happens at compile time and execution
time?
Understanding this will help you debug your
programs quickly.
207
Raw Data File
50001
50002
50003
50004
50005
50006
Compile
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
PDV
ID
$
5
208
INSERVICE
N
8
PASSCAP
N
8
CARGOCAP
N
8
...
Raw Data File
50001
50002
50003
50004
50005
50006
Compile
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
PDV
ID
$
5
209
INSERVICE
N
8
PASSCAP
N
8
CARGOCAP
N
8
...
Raw Data File
50001
50002
50003
50004
50005
50006
Execute
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
PDV
ID
$
5
INSERVICE
N
8
.
PASSCAP
N
8
CARGOCAP
N
8
.
.
SAS initializes the values in the PDV to missing.
210
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 1
4 f e b 1 9 8 9
1 3 2
5 3 0
PDV
ID
$
5
INSERVICE
N
8
.
PASSCAP
N
8
CARGOCAP
N
8
.
.
The INPUT statement tells SAS to load the record into the input
buffer. The pointer then moves down to the next record.
211
Execution of the INPUT Statement
SAS continues through the INPUT statement to populate
every variable in the PDV.
SAS starts at the first non-delimiter and reads until it
reads a delimiter. Leading delimiters are ignored.
The default delimiter is a blank.
212
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
5 3 0
SAS stops at the first delimiter.
PDV
ID
213
INSERVICE
PASSCAP
CARGOCAP
$
N
N
N
5
8
8
8
50001
.
.
.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
5 3 0
SAS stops at the next delimiter.
PDV
ID
214
INSERVICE
PASSCAP
CARGOCAP
$
N
N
N
5
8
8
8
50001
.
10627
.
.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
SAS stops
at the delimiter.
PDV
ID
215
INSERVICE
5 3 0
PASSCAP
CARGOCAP
$
N
N
N
5
8
8
8
50001
.
10627
.
132
.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
5 3 0
SAS stops at the delimiter.
PDV
ID
216
INSERVICE
PASSCAP
CARGOCAP
$
N
N
N
5
8
8
8
50001
.
10627
.
132
530
.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Automatic
output
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
PDV
ID
217
INSERVICE
PASSCAP
5 3 0
CARGOCAP
$
N
N
N
5
8
8
8
50001
.
10627
.
530
132
.
Write out observation to airplanes.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 1
Automatic return
4 f eb 1 9 8 9
1 3 2
PDV
ID
218
INSERVICE
PASSCAP
5 3 0
CARGOCAP
$
N
N
N
5
8
8
8
50001
.
10627
.
132
530
.
Execution of the DATA Step
SAS returns to the top of the DATA step.
SAS reinitializes the values in the PDV to missing. This
prevents data corruption.
219
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
PDV
ID
220
INSERVICE
PASSCAP
5 3 0
CARGOCAP
$
N
N
N
5
8
8
8
.
.
.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 2
1 1 n o v 1 98 9
1 5 2
PDV
ID
221
INSERVICE
PASSCAP
5 4 0
CARGOCAP
$
N
N
N
5
8
8
8
.
.
.
The INPUT statement loads the record into the input
buffer. The pointer then moves down to the next record.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 2
ID
222
1 1 n o v 1 98 9
INSERVICE
PDV
1 5 2
PASSCAP
5 4 0
CARGOCAP
$
N
N
N
5
8
8
8
50002
.
10907
.
152
SAS reads from non-delimiter to delimiter.
540
.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Automatic
output
Input Buffer
5 0 0 0 2
1 1 n o v 1 98 9
1 5 2
PDV
ID
INSERVICE
PASSCAP
5 4 0
CARGOCAP
$
N
N
N
5
8
8
8
50002
.
10907
.
152
540
.
Write out the observation to airplanes.
223
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 2
Automatic return
1 1 n o v 1 98 9
1 5 2
PDV
ID
224
INSERVICE
PASSCAP
5 4 0
CARGOCAP
$
N
N
N
5
8
8
8
50002
.
10907
.
152
540
.
...
Raw Data File
50001
50002
50003
50004
50005
50006
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $
InService : date9.
PassCap CargoCap;
run;
4feb1989 132 530
11nov1989 152 540
22oct1991 90 530
4feb1993 172 550
24jun1993 170 510
20dec1994 180 520
Input Buffer
5 0 0 0 2
Continue processing
1 1 n o until
v 1 the
9 8 end
9 of1the
5 raw
2 data
5 4 file.
0
PDV
ID
225
INSERVICE
PASSCAP
CARGOCAP
$
N
N
N
5
8
8
8
.
.
.
Reading a Raw Data File with List Input
proc print data=airplanes noobs;
run;
PROC PRINT Output
In
ID
Service
50001
50002
50003
50004
50005
50006
226
10627
10907
11617
12088
12228
12772
Pass
Cap
132
152
90
172
170
180
Cargo
Cap
530
540
530
550
510
520
Exercise
This exercise reinforces the concepts discussed
previously.
227
Exercises
Convert the raw data file TripChoices.txt to
a SAS data set named work.trip.
The variables in the text file are StudentID,
Destination, HotelCost, and
DepartureDate.
Produce a list report to view the new data set.
228
Exercises
data work.trip;
length StudentID $ 4 Destination $ 9;
infile 'TripChoices.txt';
input StudentID $ Destination $ DepartureDate : MMDDYY8.
HotelCost dollar8.0;
run;
proc print data=work.trip noobs;
format DepartureDate MMDDYY8. HotelCost dollar8.2;
run;
229
Exercises
The SAS System
Student
ID
3250
3258
3259
3266
3456
3459
3489
3511
230
Destination
Beach
Mountains
Beach
Mountains
Beach
Mountains
Beach
Mountains
Departure
Date
05/07/06
05/06/06
05/06/06
05/07/06
05/06/06
05/05/06
05/06/06
05/05/06
Hotel
Cost
$99.00
$125.00
$125.00
$99.00
$125.00
$99.00
$99.00
$99.00
Exercises
Challenge
In this section, you looked at using the LENGTH
statement to set the length of ID to 5.
Without using a LENGTH statement, how can you
modify the DATA step and still have an ID with a length
of 5?
data airplanes;
length ID $ 5;
infile 'raw-data-file';
input ID $ InService : date9. PassCap CargoCap;
run;
231
Exercises
Challenge Answer
Without using a LENGTH statement, how can you modify
the data step and still have an ID with a length of 5?
data airplanes;
infile 'raw-data-file';
input ID :$5. InService : date9.
PassCap CargoCap;
run;
You can use an informat to set the length of a variable.
232
Using List Input
Text or raw data files can have other delimiters.
Commas are a common delimiter.
SAS enables you to specify different delimiters
when you read a text file.
233
Non-Default Delimiter
These fields are separated by commas.
50001 , 4feb1989,132, 530
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
234
Using the DLM= Option
The DLM= option sets a character or characters that SAS
recognizes as a delimiter in the raw data file.
General form of the INFILE statement with the
DLM= option:
INFILE 'raw-data-file' DLM='delimiter(s)';
Any character that you can type on your keyboard can be
a delimiter.
You can also use hexadecimal characters (used for TAB
delimited data).
235
Specifying a Delimiter
data airplanes2;
length ID $ 5;
infile 'raw-data-file' dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
236
Using the DLM= Option
Log
1
2
3
4
5
data airplanes2;
length ID $ 5;
infile 'aircraft2.dat' dlm=',';
input ID $ InService :date9. PassCap CargoCap;
run;
NOTE: The infile 'aircraft2.dat' is:
File Name=C:\Workshop\winsas\prog2\aircraft2.dat,
RECFM=V,LRECL=256
NOTE: 6 records were read from the infile 'aircraft2.dat'.
The minimum record length was 24.
The maximum record length was 26.
NOTE: The data set WORK.AIRPLANES2 has 6 observations and 4
variables.
Data
237
Forgetting the Colon
You have the following data and program:
Bree,11/15/1989,F
Ben,7/17/1988,M
Sam,04/17/1992,F
data friends;
length
Name
10;into SAS
We want
to read
this$data
infile 'raw-data-file' dlm=',';
input Name $ BirthDate :mmddyy10.
Gender $;
format Birthdate date9.;
run;
238
Forgetting the Colon
Output
The SAS System
Obs
Name
BirthDate
1
Bree
15NOV1989
17JUL1988
We want2to readBen
this data into
SAS
3
Sam
17APR1992
239
Gender
F
M
F
Forgetting the Colon
What happens if you remove the colon before the
MMDDYY10. format?
data friends;
length Name $ 10;
infile 'raw-data-file' dlm=',';
input Name $ BirthDate mmddyy10.
Gender $;
format Birthdate date9.;
run;
240
Forgetting the Colon
Output
The SAS System
Obs
Name
BirthDate
1
Bree
15NOV1989
We want2to readBen
this data into SAS .
3
Sam
17APR1992
241
Gender
F
M
F
Forgetting the Colon
SAS Log
98
99
100
101
102
103
data friends;
length Name $ 10;
infile ‘rawdatafile’ dlm=',';
input Name $ BirthDate mmddyy10.
Gender $;
run;
NOTE: Invalid data for BirthDate in line 106 5-14.
RULE:
----+----1----+----2----+----3----+----4----+
106
Ben,7/17/1988,M
We
want
to
read this data into SAS
Name=Ben BirthDate=. Gender=M _ERROR_=1 _N_=2
NOTE: The data set WORK.FRIENDS has 3 observations and 3
variables.

242
Only the second observation is missing.
With the Colon
The colon signals that SAS should read
from non-delimiter to delimiter.
B e n , 7 /1 7 / 1 9 8 9 , M
With the colon format
modifier, SAS starts
reading one column
after the delimiter
243
SAS stops at the next
delimiter and does not
store the delimiter as
part of the data value.
Using List Input – Forgetting the Colon (:)
If the colon is omitted, SAS begins reading one column to
the right of the pointer. It reads the length of the informat,
which might cause it to read past the end of the field, or
only part of the data.
B e n , 7 /1 7 / 1 9 8 9 , M
Without the colon,
SAS starts reading
where the pointer is.
MMDDYY10. tells
SAS to read exactly
10 columns.
SAS reads 10
columns and stops
one column after
the read.
Without the colon, SAS uses formatted input without the
@ pointer.
244
Using List Input – Forgetting the Colon (:)
B e n , 7 /1 7 / 1 9 8 9 , M
7/17/1989, is not following
the MMDDYY10. informat.
The comma becomes part of
the data value.
No error messages are printed in the SAS log, but you did
see invalid data messages.
245
Exercise
This exercise reinforces the concepts discussed
previously.
246
Exercises
Create a SAS data set called work.rank by reading
the comma-delimited raw data file
SchoolRank.csv.
The variables are Name, AvgSAT, Rank, AvgGPA,
and StartDate.
Produce a list report to view the new data set.
247
Exercises
data work.rank;
length Name $ 9;
infile 'SchoolRank.csv' dlm=',';
input Name $ AvgSAT Rank AvgGPA
StartDate:MMDDYY10.;
run;
proc print data=work.rank noobs;
format StartDate MMDDYY10.;
run;
248
Exercises
The SAS System
Name
Sunnyside
Brighton
Riverside
Hillside
East
SouthWest
Magnet
249
Avg
SAT
1270
1220
1160
950
1070
1110
1220
Rank
Avg
GPA
StartDate
6
5
3
4
1
2
7
3.40
3.49
3.76
3.75
4.00
3.87
2.80
09/01/2007
08/20/2007
09/01/2007
08/20/2007
09/01/2007
08/20/2007
09/01/2007
Using the DLM= Option
What if you use more than one character in the
DLM= option?
250
Using the DLM= Option
Use DLM=",* " in the INFILE statement to read the
following data:
Bree,*11/15/1989,*F,*8,999,*6
What would be the data values for Name, DOB,
Gender, and Scholarship?
251
Using the DLM= Option
SAS treats the comma (,) as a delimiter, the asterisk (*) as
a delimiter, and any combination of the * and , together as
a delimiter.
Bree,*11/15/1989,*F,*8,999,*6
SAS stops at the comma, because it is a delimiter.
Name
The * and the , do not have to appear together to be used
as a delimiter.
252
Using the DLM= Option
SAS treats the , or the * as a delimiter.
Bree,*11/15/1989,*F,*8,999,*6
SAS starts reading at the first non-delimiter.
The comma or the asterisk is considered a delimiter.
253
Using the DLM= Option
SAS treats the , or the * as a delimiter.
Bree,*11/15/1989,*F,*8,999,*6
DOB
254
SAS stops at the comma,
because it is a delimiter.
Using the DLM= Option
SAS treats the , or the * as a delimiter.
Bree,*11/15/1989,*F,*8,999,*6
Gender
SAS stops at the comma,
because it is a delimiter.
255
Using the DLM= Option
SAS treats the , or the * as a delimiter.
Bree,*11/15/1989,*F,*8,999,*6
Scholarship
SAS stops at the comma,
because it is a delimiter.
256
Exercise – Section 6.5
This exercise reinforces the concepts discussed
previously.
257
Section 6.6
Reading Raw Data
with Missing Values
Objectives


259
Work with missing data at the end of the record.
Work with missing data in the middle of the record.
SAS Code/Vocabulary







260
MISSOVER
LOST CARD
PAD
TRUNCOVER
DSD
CARDS
DATALINES
Working With Missing Data
In this section, you see what happens when data is
missing
 at the end of the record
 in the middle of the record.
261
Missing Data at the End of a Row
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
262
Missing Data at the End of a Row
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
Remember
263
SAS expects data values for every variable
in the PDV.
Missing Data at the End of a Row
By default, when there is missing data at the end of a row,
the following occurs:
1. SAS loads the next record to finish the observation.
2. A note is written to the log.
3. SAS loads a new record at the top of the DATA step
and continues processing.
264
Raw Data File
Execute
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
ID
$
5
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
265
.
.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
ID
$
5
,
4 f e b 1 9 89, 1 3 2
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
266
.
.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
ID
$
5
50001
267

,
4 f e b 1 9 89, 1 3 2
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
.
.
SAS read six columns, but only stored the first five.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
ID
$
5
50001
268
,
4 f e b 1 9 89, 1 3 2
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
,
4 f e b 1 9 89, 1 3 2
SAS stops at the
end of the record.
ID
$
5
50001
269
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
132
.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
ID
$
5
50001
270
,
4 f e b 1 9 89, 1 3 2
No data
You still need to
read CargoCap.
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
132
.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 00 2 ,
ID
$
5
50001
271
1 1 n ov 1 9 8 9, 1 5 2 ,
5 40
SAS reads from
the next record.
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
10627
.
.
132
50002
.
 The pointer is now on the third line in the raw data file. ...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Automatic
output
Input Buffer
5 0 00 2 , 1 1 n o v1 9 8 9, 1 5 2 , 5 4 0
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
ID
$
5
50001
272
.
10627
.
132
50002
.
Write out the observation to airplanes3.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Automatic return
Input Buffer
5 0 00 2 , 1 1 n o v1 9 8 9, 1 5 2 , 5 4 0
ID
$
5
50001
273
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
132
50002
.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 3 ,
ID
$
5
2 2 o c t1 9 9 1 , 9 0 ,
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
274
5 3 0
.
.
...
Raw Data File
50001 , 4feb1989,132
50002, 11nov1989,152, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172
50005, 24jun1993, 170, 510
50006, 20dec1994, 180, 520
data airplanes3;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 3 ,
ID
$
5
2 2 o c t 1 9 9Continue
1 , 9 0 processing
,
5 3 0
until the end of the raw data file.
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
275
.
.
Partial Log
NOTE: 6 records were read from the infile
'aircraft3.dat'.
The minimum record length was 19.
The maximum record length was 26.
NOTE: SAS went to a new line when INPUT
statement reached past the end of
a line.
NOTE: The data set WORK.AIRPLANES3 has 4
observations and 4 variables.
Notes can be good or bad; you decide.
How many records were read from the raw data?
How many were written to the raw data file?
276
Missing Data at the End of the Row
proc print data=airplanes3 noobs;
run;
PROC PRINT Output
ID
50001
50003
50004
50006
In
Service
Pass
Cap
Cargo
Cap
10627
11617
12088
12772
132
90
172
180
50002
530
50005
520
Are these
valid
values for
cargo?
There were six rows of data in the raw data file, but you
have only five observations in the data set.
277
The MISSOVER Option
The MISSOVER option prevents SAS from loading a new
record when the end of the current record is reached.
General form of the INFILE statement with the
MISSOVER option:
INFILE 'raw-data-file' MISSOVER;
If SAS reaches the end of the row without finding values
for all fields, variables without values are set to missing.
278
Using the MISSOVER Option
data airplanes3;
length ID $ 5;
infile 'raw-data-file' dlm=',' missover;
input ID $
InService : date9.
PassCap CargoCap;
run;
279
Using the MISSOVER Option
Partial SAS Log
NOTE: 6 records were read from the infile
'aircraft3.dat'.
The minimum record length was 19.
The maximum record length was 26.
NOTE: The data set WORK.AIRPLANES3 has 6
observations and 4 variables.
The numbers match:
 6 records read
 6 observations written
280
Using the MISSOVER Option
proc print data=airplanes3 noobs;
run;
PROC PRINT Output
ID
50001
50002
50003
50004
50005
50006
281
In
Service
10627
10907
11617
12088
12228
12772
Pass
Cap
132
152
90
172
170
180
Cargo
Cap
.
540
530
.
510
520
You have
missing values
in rows1 and
4.
Using the MISSOVER Option
SAS reinitializes the PDV at the top of the DATA Step.
Otherwise, previous values would remain in the PDV.
ID
50001
50002
50003
50004
50005
50006
282
In
Service
10627
10907
11617
12088
12228
12772
Pass
Cap
132
152
90
172
170
180
Cargo
Cap
.
540
530
.
510
520
This row would
have 530 rather
than missing,
and the data
would be
incorrect.
Other Options for Reading Raw Data
There are other options that are useful when you read
from a raw data file. You will examine two.
These options are for use with formatted input and have
no affect on reading delimited data.
283
Reading Raw Data Files
Data for flights from New York to Dallas (DFW) and Los
Angeles (LAX) is stored in a raw data file called
dfwlax_short.dat.
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
Description Column
92112/11/00DFW 20131
Flight Number s 1- 3
11412/12/00LAX 15170
Date
4-11
98212/12/00dfw 5 85
Destination
12-14
43912/13/00LAX 14196
First Class
15-17
98212/13/00DFW 15116
Passengers
43112/14/00LaX 17166
Economy
18-20
98212/14/00DFW 7 88
Passengers
11412/15/00LAX
187
98212/15/00DFW 1431
284
Reading Raw Data Files
This file is different than the one you worked with earlier.
1
1
2
1---5----0----5----0
Every row in a raw data file
has an end-of-record marker 43912/11/00LAX 20137
that indicates the end of data. 92112/11/00DFW 20131
11412/12/00LAX 15170
In this file, some of the rows 98212/12/00dfw 5 85
43912/13/00LAX 14196
have the end-of-record
marker inside the last range. 98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 1431
285
Demonstration
Open short_data.sas and examine the program:
data work.dfwlax;
infile 'dfwlax_short.dat';
input @1 Flight $3.
@4 Date mmddyy8.
@12 Dest $3.
@15 FirstClass 3.
@18 Economy 3.;
run;
proc print data=work.dfwlax;
run;
You expect to read 10 rows from the data file and write 10
observations to the data set.
286
Demonstration
The log indicates two issues.
25
26
27
28
29
30
data work.dfwlax;
infile 'dfwlax_short.dat';
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
run;
NOTE: LOST CARD.
Flight=982 Date=14959 Dest=DFW FirstClass=14 Economy=. _ERROR_=1
_N_=8
NOTE: 10 records were read from the infile 'dfwlax_short.dat'.
NOTE: SAS went to a new line when INPUT statement reached past
the end of a line.
NOTE: The data set WORK.DFWLAX has 7 observations and 5
variables.
287
Demonstration
Output
The SAS System
Obs
Flight
Date
Dest
First
Class
1
2
3
4
5
6
7
439
921
114
982
982
431
982
14955
14955
14956
14956
14957
14958
14958
LAX
DFW
LAX
dfw
DFW
LaX
DFW
20
20
15
5
15
17
7
Economy
137
131
170
439
116
166
114
You are expect 10 observations, but there are only 7.
288
Other Options for Reading Raw Data
Compare the output to the data.
The SAS System
Obs
Flight
Date
Dest
First
Class
1
2
3
4
5
6
7
439
921
114
982
982
431
982
14955
14955
14956
14956
14957
14958
14958
LAX
DFW
LAX
dfw
DFW
LaX
DFW
20
20
15
5
15
17
7
Economy
137
131
170
439
116
166
114
Where is this row?
289
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 1431
Other Options for Reading Raw Data
You encountered a problem at row number 4. SAS loaded
the next record to find data values for the Economy
variable.
Examine the notes:
Lost card.
SAS went to a new line when INPUT statement reached
past the end of a line.
Two rows had this behavior.
This was not an error and SAS continued processing.
290
Other Options for Reading Raw Data
In the program, data values for Economy start in column
18 and are three columns long. For rows 4, 8, and 10,
SAS could only read two columns before encountering
the end-of-record marker.
When SAS read the end-of-record marker and did not
have data for all variables, it went to the next row to read
the first three columns.
data work.dfwlax;
infile 'dfwlax_short.dat';
input @1 Flight $3. @4 Date mmddyy8. @12 Dest $3.
@15 FirstClass 3. @18 Economy 3.;
run;
proc print data=work.dfwlax;
run;
291
Other Options for Reading Raw Data
Compare the output to the data:
The SAS System
292
Obs
Flight
Date
Dest
First
Class
1
2
3
4
5
6
7
439
921
114
982
982
431
982
14955
14955
14956
14956
14957
14958
14958
LAX
DFW
LAX
dfw
DFW
LaX
DFW
20
20
15
5
15
17
7
Economy
137
131
170
439
116
166
114
1
1
2
1---5----0----5----0
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 1431
Other Options for Reading Raw Data
One method to fix this is to use the PAD option in the
INFILE statement.
The PAD option tells SAS to fill the input buffer with
blanks at the end of the record, so that the end-of-record
marker is moved to the end of the logical record length
(LRECL). SAS will not read beyond the end of record.
data work.dfwlax;
infile 'dfwlax_short.dat' PAD;
input @1 Flight $3. @4 Date mmddyy8. @12 Dest $3.
@15 FirstClass 3. @18 Economy 3.;
run;
proc print data=work.dfwlax;
run;
293
Other Options for Reading Raw Data
The SAS System
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
Date
14955
14955
14956
14956
14957
14957
14958
14958
14959
14959
33
34
35
36
37
38
Dest
Class
First
Economy
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
data work.dfwlax;
infile 'dfwlax_short.dat' pad;
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
run;
NOTE: 10 records were read from the infile 'dfwlax_short.dat'.
The minimum record length was 19.
The maximum record length was 20.
NOTE: The data set WORK.DFWLAX has 10 observations and 5
variables.
294
Other Options for Reading Raw Data
Another method is to use the TRUNCOVER option on the
INFILE statement.
The TRUNCOVER option enables you to read records
that are shorter than the INPUT statement expects.
Variables without any values assigned are set to missing.
SAS will not load the next record to read all the columns.
data work.dfwlax;
infile 'dfwlax_short.dat' TRUNCOVER;
input @1 Flight $3. @4 Date mmddyy8. @12 Dest $3.
@15 FirstClass 3. @18 Economy 3.;
run;
proc print data=work.dfwlax;
run;
295
Other Options for Reading Raw Data
The SAS System
Obs
Flight
1
2
3
4
5
6
7
8
9
10
439
921
114
982
439
982
431
982
114
982
Date
14955
14955
14956
14956
14957
14957
14958
14958
14959
14959
33
34
35
36
37
38
Dest
Class
First
Economy
LAX
DFW
LAX
dfw
LAX
DFW
LaX
DFW
LAX
DFW
20
20
15
5
14
15
17
7
.
14
137
131
170
85
196
116
166
88
187
31
data work.dfwlax;
infile 'dfwlax_short.dat' truncover;
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3. @15 FirstClass 3.
@18 Economy 3.;
run;
NOTE: 10 records were read from the infile 'dfwlax_short.dat'.
The minimum record length was 19.
The maximum record length was 20.
NOTE: The data set WORK.DFWLAX has 10 observations and 5
variables.
296
Exercise
This exercise reinforces the concepts discussed
previously.
297
Exercises
The raw data file TripChoices2.txt has missing
values. Create a SAS data set called work.choices
from this file and produce a listing report to display the
results.
The variables are StudentID, Destination,
DepartureDate, and HotelCost.
298
Exercises
data work.choices;
length StudentId $ 4 Destination $ 9;
infile 'TripChoices2.txt' dlm=',' missover;
input StudentID $ Destination $ DepartureDate : MMDDYY8.
HotelCost :dollar10.0;
run;
proc print data=work.choices noobs;
format DepartureDate MMDDYY8. HotelCost dollar8.2;
run;
299
Exercises
The SAS System
300
Student
Id
Destination
Departure
Date
3250
3258
2667
5487
3266
3456
3459
5487
3249
Beach
Mountains
Mountains
Beach
Mountains
Beach
Mountains
Beach
Mountains
05/07/06
05/06/06
05/07/07
05/06/06
05/07/06
05/06/06
05/05/06
05/06/06
05/05/06
Hotel
Cost
$99.00
$125.00
$99.00
$125.00
$99.00
$125.00
.
.
$99.00
Missing Data
You saw what SAS does when data is missing at the end
of a record.
What does SAS do when data is missing inside the
record?
301
Missing Values without Placeholders
There is missing data represented by two consecutive
delimiters.
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
302
Missing Values without Placeholders
By default, SAS treats two consecutive delimiters as one.
SAS reads from non-delimiter to delimiter, so two
delimiters in a row are considered one delimiter.
303
Missing Values without Placeholders
data airplanes4;
length ID $ 5;
infile 'raw-data-file' dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
304
Raw Data File
Execute
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
ID
$
5
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
305
.
.
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
ID
$
5
,
4 f e b 1 9 89 , ,
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
306
5 3 0
.
.
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
,
4 f e b 1 9 89 , ,
5 3 0
No data
ID
$
5
50001
307
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
530
.
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 1
ID
$
5
50001
308
,
4 f e b 1 9 89 , ,
5 3 0
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
530
.
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 2 ,
ID
$
5
50001
309
1 1 n o v 1 9 8 9, 1 3 2 ,
5 4 0
SAS loads
PDV
next record.
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
530
.
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 2,
ID
$
5
50001
310
1 1 n o v1 9 8 9, 1 3 2 ,
5 4 0
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
530
.
50002
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Automatic
Input Buffer
output
5 0 0 0 2
, 1 1n o v 1 9 8 9, 1 3 2 ,
5 4 0
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
ID
$
5
.
530
.
530
50002
Write out observation to airplanes4.
50001
311
.
10627
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
Input Buffer
5 0 0 0 2
ID
$
5
50001
312
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Automatic return
, 1 1n o v 1 9 8 9, 1 3 2 ,
5 4 0
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
10627
.
530
530
.
50002
...
Raw Data File
50001 , 4feb1989,, 530
50002, 11nov1989,132, 540
50003, 22oct1991,90, 530
50004, 4feb1993,172, 550
50005, 24jun1993,, 510
50006, 20dec1994, 180, 520
data airplanes4;
length ID $ 5;
infile 'raw-data-file'
dlm=',';
input ID $
InService : date9.
PassCap CargoCap;
run;
Input Buffer
5 0 0 0 3 ,
ID
$
5
2 2 oc t 1 9 91 , 9 0,
PDV
INSERVICE PASSCAP CARGOCAP
N
N
N
8
8
8
.
313
5 3 0
.
.
Missing Values Without Placeholders
Partial Log
NOTE: 6 records were read from the infile
'aircraft4.dat'.
The minimum record length was 21.
The maximum record length was 26.
NOTE: SAS went to a new line when INPUT
statement reached past the end of a
line.
NOTE: The data set WORK.AIRPLANES4 has 4
observations and 4 variables.
314
Missing Values Without Placeholders
proc print data=airplanes4 noobs;
run;
PROC PRINT Output
ID
50001
50003
50004
50005
In
Service
Pass
Cap
Cargo
Cap
10627
11617
12088
12228
530
90
172
510
50002
530
550
50006
The MISSOVER option did not fix this problem.
PassCap is still incorrect.
315
Are the
values
for cargo
valid?
Missing Values Without Placeholders
Missing data should be represented by a placeholder.
However, you might not have Update access to the data
and it would be a tedious process.
5 0 0 0 1 , 4feb1989
316
, . ,
5 3 0
Missing Values Without Placeholders
If your data does not have placeholders, use the DSD
option.
5 0 0 0 1 , 4feb1989
,,
5 3 0
General form of the DSD option in the INFILE statement:
INFILE 'file-name' DSD;
317
The DSD Option
The DSD option
 sets the default delimiter to a comma
 treats consecutive delimiters as missing values
 enables SAS to read values with embedded delimiters
if the value is surrounded by double quotation marks.
– Examples:
 "$1,456,300" if the delimiter is a comma
 "North Carolina" if the delimiter is a space
318
Using the DSD Option
data airplanes4;
length ID $ 5;
infile 'raw-data-file' dsd;
input ID $
InService : date9.
PassCap CargoCap;
run;
319
Missing Values Without Placeholders
Partial Log
NOTE: 6 records were read from the infile
'aircraft4.dat'.
The minimum record length was 22.
The maximum record length was 25.
NOTE: The data set WORK.AIRPLANES4 has 6
observations and 4 variables.
The numbers match:
 6 records read
 6 observations written
320
Using the DSD Option
proc print data=airplanes4 noobs;
run;
PROC PRINT Output
In
ID
Service
50001
50002
50003
50004
50005
50006
321
10627
10907
11617
12088
12228
12772
Pass
Cap
Cargo
Cap
.
132
90
172
.
180
530
540
530
550
510
520
The DSD Option
The DSD option changes the default delimiter to
a comma.
Remember
If you have a space-delimited file (or any
delimiter other than a comma), use the DSD
option and the DLM= option.
INFILE 'file-name' DSD DLM=' ';
322
Review: INFILE Statement Options
Problem
Setting delimiters
Missing data at end of row
Option
DLM='delimiter(s)'
MISSOVER
End of Record Marker inside a fixed
column data value
PAD
End of Record Marker inside a fixed
column value
TRUNCOVER
Missing data represented by
consecutive delimiters
and/or
Embedded delimiters where values
are surrounded by double quotes
DSD
These options can be used separately or together in the
INFILE statement.
323
Exercise
This exercise reinforces the concepts discussed
previously.
324
Exercises
The following raw data file TripChoices1.txt
has missing values. Create a SAS data set called
work.choices1 from this file and produce a
listing report to display the results.
The variables are StudentID, Destination,
DepartureDate, and HotelCost.
325
Exercises
data work.choices;
length Destination $ 9;
infile 'TripChoices1.txt' dsd dlm='*';
input StudentID
Destination $
DepartureDate : MMDDYY8.
HotelCost $;
run;
proc print data=work.choices noobs;
format DepartureDate MMDDYY8.;
run;
326
Exercises
The SAS System
Destination
Beach
Mountains
Beach
Mountains
Beach
Mountains
Beach
Mountains
327
Student
ID
Departure
Date
Hotel
Cost
3250
3258
2667
3259
3266
3456
3459
5487
.
05/07/06
05/06/06
05/07/07
05/06/06
05/07/06
05/06/06
05/05/06
05/06/06
05/05/06
$99.00
$125.00
$99.00
$125.00
$99.00
$125.00
$99.00
$99.00
Reading In-Stream Data
SAS can read in-stream, or embedded data, instead of an
external file. This is a good method to test code and/or
data.
Use either a CARDS statement or a DATALINES
statement within your DATA step.
328
Reading In-Stream Data
CARDS
Statement
In-stream
Data
329
data work.dfwlax;
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3.
@15 FirstClass 3. @18 Economy 3.;
cards;
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 14 31
run;
proc print data=work.dfwlax;
run;
Reading In-Stream Data
DATALINES
Statement
In-stream
Data
330
data work.dfwlax;
input @1 Flight $3. @4 Date mmddyy8.
@12 Dest $3.
@15 FirstClass 3. @18 Economy 3.;
datalines;
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 14 31
run;
proc print data=work.dfwlax;
run;
Reading In-Stream Data
You can read in-stream data with all of the input methods
discussed: Column, Formatted, and List.
When using column or formatted Input, be sure that you
start the data in the correct column.
If your INPUT statement begins with @1 or column 1, be
certain that the data starts all the way to the left in the
Editor window in column 1.
331
Reading In-Stream Data
SAS will
look for
Flight in
columns
1-3, not the
first column
data that
appears.
332
data work.dfwlax;
input @1 Flight $3. @4 Date mmddyy8. @12 Dest $3.
@15 FirstClass 3. @18 Economy 3.;
cards;
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 14 31
run;
proc print data=work.dfwlax;
run;
Reading In-Stream Data
If you want to use an INFILE statement option such as
DLM= or MISSOVER, replace the raw data file name with
a CARDS or DATALINES option.
INFILE CARDS DLM= ' , ' ;
INFILE DATALINES DSD;
333
Reading Raw Data Not from an External File
DATALINES
option
334
data work.dfwlax;
infile datalines dlm=',';
input Flight :$3. Date :mmddyy8. Dest :$3.
FirstClass :3. Economy :3.;
datalines;
43912/11/00LAX 20137
92112/11/00DFW 20131
11412/12/00LAX 15170
98212/12/00dfw 5 85
43912/13/00LAX 14196
98212/13/00DFW 15116
43112/14/00LaX 17166
98212/14/00DFW 7 88
11412/15/00LAX
187
98212/15/00DFW 14 31
run;
proc print data=work.dfwlax;
run;
Exercise
This exercise reinforces the concepts discussed
previously.
335
Exercises
Copy the data from TripChoices1.txt into
your Editor window. Read the data from inside the
DATA step, instead of from the raw data file.
The variables are StudentID, Destination,
DepartureDate, and HotelCost.
336
Exercises
data work.choices;
length StudentID $ 4 Destination $ 9;
infile datalines dsd dlm=‘*’;
input StudentID $ Destination $
DepartureDate : MMDDYY8. HotelCost : dollar10.;
Datalines;
3250*Beach*05/07/06*$99.00*
3258*Mountains*05/06/06*$125.00*
2667**05/07/07*$99.00*
3259*Beach*05/06/06*$125.00*
3266*Mountains*05/07/06*$99.00*
3456*Beach*05/06/06*$125.00*
3459*Mountains*05/05/06**
5487*Beach*05/06/06*$99.00*
*Mountains*05/05/06*$99.00*
run;
proc print data=work.choices noobs;
format DepartureDate MMDDYY10. HotelCost :dollar8.2;
run;
337
Exercises
The SAS System
Student
ID
3250
3258
2667
3259
3266
3456
3459
5487
338
Destination
Beach
Mountains
Beach
Mountains
Beach
Mountains
Beach
Mountains
Departure
Date
Hotel
Cost
05/07/2006
05/06/2006
05/07/2007
05/06/2006
05/07/2006
05/06/2006
05/05/2006
05/06/2006
05/05/2006
$99.00
$125.00
$99.00
$125.00
$99.00
$125.00
.
$99.00
$99.00
Exercise – Section 6.6
This exercise reinforces the concepts discussed
previously.
339
Section 6.7
Reading Raw Data with
the Import Wizard
Demonstration
Read raw data with the Import Wizard.
341
The IMPORT Wizard
You can bring data into SAS
using a point-and-click
interface called the Import
Wizard.
Select File  Import Data.
342
The IMPORT Wizard
The Import Wizard prompts you for what it needs to know.
Read in TripChoices.txt, which is a spacedelimited file.
343
The IMPORT Wizard
Navigate to your data by selecting Browse.
Click the OPTIONS button, which opens this window.
344
The IMPORT Wizard
What is the delimiter?
If the delimiter is not in the list, select
character and type the delimiter in the
dialog box.
If the names of the variables are in the
first row, be sure that this option is
checked. Otherwise, SAS will use
Var1, Var2, Var3, ... as the names.
What row does the data begin on?
This option tells SAS how many rows
to read to determine the data type of
the variables.
345
Using the IMPORT Wizard
Tell SAS the name of the data set that you want to create,
and what library to put it in.
The last window
enables you to
save the code
that is generated
by the Import
Wizard.
346
Using the Import Wizard
When you finish importing the data, you see a note in the
log:
NOTE: WORK.TEMP was successfully created.
If you want to copy the DATA step to save the code, or to
make changes to the DATA step, select the following:
Run  Recall Last Submit
347