No Slide Title

Download Report

Transcript No Slide Title

Chapter 2: Getting Data into
SAS


Data is stored in many different formats
Here are four categories of methods to
read in data
1. Entering data directly through the keyboard
(small data sets)
2. Creating SAS data sets from raw data lines
© Fall
2011 John Grego and the University of South Carolina
1
Getting Data into SAS

Here are four categories of methods to
read in data
3. Converting other software’s data files (e.g.,
Excel) into SAS data sets (my favorite!)
4. Reading other software’s data files directly
(often requires additional SAS/ACCESS
products)
© Fall
2011 John Grego and the University of South Carolina
2
Import Window



Allows you to import various types of
data files (Microsoft Excel formats)
The default is for the first row to be
variable names. Use Options… to
change.
Options… button also selects worksheet
from workbook
© Fall
2011 John Grego and the University of South Carolina
3
Import Window



Work library--the data set is deleted after
exiting SAS
Other libraries—data set saved (but not
necessarily library location) after exiting
SAS
Can save PROC IMPORT statement
used to import data
© Fall
2011 John Grego and the University of South Carolina
4
Reading in Raw Data

Type data directly into
a SAS program using
the statements that
appear to the right
© Fall
cards;
datalines;
lines;
2011 John Grego and the University of South Carolina
5
Reading in Raw Data



If your raw data is an external text file,
you could use an INFILE statement to
tell SAS where it is.
Specify the full path name
If your lines are longer than 256
charcters, use LRECL=..
6
Data Separated by Spaces



This style is called “free format” since the
number of spaces in between variables is
flexible
Use the INPUT statement to name
variables
Include a $ after names of character
variables
7
Data Arranged in Columns



Knowledge of this approach is particularly
critical for large data sets
Each value of a variable is found at the
same spot on the data line
Advantages:
1. Don’t need spacing between values
2. Missing values don’t need a special symbol
(can be blank)
8
Data Arranged in Columns

Advantages:
Example:
3. Character data can
have blanks
4. You can skip
variables you don’t
need to read into
SAS
INPUT var1 1-10
var2 11-15 var3
$ 16-30;
9
Data Not in Standard Format

Types of non-standard data:
1. Numbers with commas or dollar signs
2. Dates and times of day


We can read nonstandard data using
codes known as informats
Most informats end in . (so SAS won’t
confuse them with a variable)
10
Data Not in Standard Format


Import from Excel often assigns informats
automatically
Pages 44-45 list many SAS informats
(Note that date informats are converted to
a numerical value—Julian date)
11
Other Inputting Issues

You can mix input
styles: read in some
variables list-style;
others column-style;
others using
informats; even the
order can be shuffled;
E.g, you can explicity
move the pointer to a
specific column
number:
@50 moves the pointer
to the 50th column

12
Messy Data
colon modifier: Tells
SAS exactly how
many columns a
variable occupies, but
stops when it reaches
a space
 This method is not
appropriate for
characters with
embedded spaces


Example: Deptname
:$15. tells SAS to
read Deptanme for
15 characters or until
it reaches a space.
13
Multiple Lines of Data per
Observation



Sometimes each observation will be on
several lines in the raw data file (census
data, standardized test scores, etc)
Use / to tell SAS when to go to the next
line
Or use #2, for example, to tell SAS to go
to the 2nd line of the observation
14
Multiple Records per Line of
Raw Data



Sometimes several observations will be on
one line of data
This is common for textbook exercises
Use @@ to tell SAS to keep the pointer on
the raw data line and prepare to read the
next observation
15
Reading Part of a Data File


Sometimes we want to modify data input
based on values of one variable
We can read just the first variable(s) using
the @ sign
16
Reading Delimited Files


These instructions have been almost
completely subsumed by the Import
Wizard and the widespread use of Excel
DLM= specifies alternative delimiters
– Comma: DLM=‘,’
– Tab: DLM=‘09’X
– #: DLM=‘#’
17
Reading Delimited Files


Two delimiters in a row are read as a
single delimiter
What if two commas in a row indicate a
missing value? Or data values contain
commas?
– Can use DSD option
– Data values with commas must be in quotes
– Use options other than the default
(DLM=‘,’)
18
SAS data sets: Temporary and
Permanent


Data sets stored in Work library are
temporary (removed upon exiting SAS)
Data sets stored in other libraries are
permanent (will be saved upon exiting
SAS)
19
SAS data sets: Temporary and
Permanent

You can specify the
library when creating
a data set in the
DATA step
Example: Suppose you
have a library called
sportlib (this is a
“libref”)
DATA
sportlib.baseball;
creates a data set
baseball to be stored in
the permanent sportlib
library
20
SAS data sets: Temporary and
Permanent
DATA
work.baseball;
stores baseball in the
temporary work
library
DATA baseball;
stores baseball in the
default temporary
work library
21