Chapter 2 in the Little SAS Book

Download Report

Transcript Chapter 2 in the Little SAS Book

Modifying and Combing SAS
Data Sets
(Chapter in the 6 Little SAS Book)
Animal Science 500
Lecture No. 6
September 16, 2010
IOWA STATE UNIVERSITY
Department of Animal Science
Modifying Variable Lengths

The code below uses only 3 character spaces for the
county by specifying the length of the location variable

You can also use this with numeric variables
data new;
length location $3;
input location $ 1-15 date $ rainfall;
Cards; /*remember you could use the datalines statement to do
the same thing);*/
Story county
6/10/10 4.5
Polk county
6/10/10 6.5
Story county
7/10/10 4.9
Polk county
7/10/10 2.4
;
Run;
Quit;
IOWA STATE UNIVERSITY
Department of Animal Science
Creating new variables
 You
can create variables in the initial data step
where you are inputting the data or you can
use a new data step to create the variables.

As we did when we created
1.
2.
3.

Adjusted backfat variables for barrows and gilts
Total weight gain during the test period
Average daily gain during the test period
You do not have to make calculations when
making new variables.

You could divide backfat, loin muscle are, gain, etc
into categories to evaluate if model effects differ by the
classes you developed
IOWA STATE UNIVERSITY
Department of Animal Science
Creating new variables
you choose to create a new data step – you
will need to either create a new file in SAS or
modify the existing file.
 If
Example what we have done in Lab by the following
statements
Data Pig13 Set Pig12;


Alternatively you could create a new file in SAS


If you create a new file you may need to merge it with the
original data file;
This is a place where students often have difficulty
IOWA STATE UNIVERSITY
Department of Animal Science
Creating New Variables in SAS

Both of these options will
result in a data set named
“New” with all the variables
that have been defined

This option creates the
variables lograinfall and
sqrtrainfall in the initial data
step

In the second set of code
you are creating a new file in
SAS and naming it “New”
the set statement tells SAS
to Assign the data from the
first “New” to this file “New”
data new;
input @1 location $ 1-15 date mmddyy8.
rainfall;
log_rain = log(rainfall);
sqrt_rain= sqrt(rain);
datalines;
Story county 6/10/10 4.5
Polk county 6/10/10 6.5
Story county 7/10/10 4.9
Polk county 7/10/10 2.4
;
run;
data new; set new;
log_rain = log(rainfall);
sqrt_rain = sqrt(rain);
run;
IOWA STATE UNIVERSITY
Department of Animal Science
Creating New Variables in SAS

SAS has many other functions to perform various calculations for
trigonometry, finance, and other applications.
 Some examples assuming x is the variable you want to modify:
 Log = log(x)
 Sin = sin(x)
 Cos = cos(x)
 As we will see in the next lab, you may find the distribution does not
meet the assumptions of the analysis of variance
 Will need to transform data often using some of the

Use SAS help to find the correct notation

Some helpful search hints are:
 Search under SAS Functions, Arithmetic Functions, Numeric
Variables, Logical Operators
IOWA STATE UNIVERSITY
Department of Animal Science
Operators

Recall from previous discussions

Addition, subtraction, multiplication, and division are
specified by +, -, *, and /, respectively.

For exponentiation, a double asterisk ** is used.


exprainfall = exprainfall**2
Parentheses can be used to group expressions, and
these expressions can be nested within several levels.
SAS follows the standard PEMDAS order, () ** * / + - ,
for evaluating functions.
IOWA STATE UNIVERSITY
Department of Animal Science
Logical Operators

SAS can also evaluate
logical expressions
 < ,>, =, if, then, else,
else if, and (&), or(|),
not (^)…
 Search Logical
Operators in SAS
Help
data new;
input @1 location $ 1-15 date mmddyy8. rainfall;
*creating a new variable based on location;
if location = “Story county" then x = rainfall +5;
else x = 5;
*creating a new variable based on level of
rainfall;
if rainfall < 3 then y=1;
else if rainfall < 4.9 then y=2;
else y = 3;
datalines;
Story county 6/10/10 4.5
Polk county 6/10/10 6.5
Story county 7/10/10 4.9
Polk county 7/10/10 2.4
;
Run;
Quit;
data new; set new;
log_rain = log(rainfall);
sqrt_rain = sqrt(rain);
run;
IOWA STATE UNIVERSITY
Department of Animal Science
Other ways to Modify Variables - Do Loops

DO loops can be used to create an ordered sequence of numbers.

Below is an example of a do loop in SAS
The program "loops" through the values of Q from 1 to 5 and
performs the calculations requested for the current value of Q. The
OUTPUT statement tells SAS to export Q and the new variables to
the dataset EXAMPLE. The END statement signifies the end of the
loop. An END statement is necessary for each DO statement! Notice
that neither INPUT nor DATALINES statements are used.

data do; set new;
do q=1 to 5;
q_rain=q*rainfall;
q_rainsquared=q_rain**2;
output;
end;
proc print data = do;
run;
Quit;
IOWA STATE UNIVERSITY
Department of Animal Science
Modifying your data using PROC Transpose
 Sometimes
you need to reshape your data
which is in a long format (shown below)
FamID
Year
FamInc
1
2007
40000
1
2008
40500
1
2009
41000
2
2007
45000
2
2008
45400
2
2009
45800
3
2007
75000
3
2008
76000
3
2009
77000
IOWA STATE UNIVERSITY
Department of Animal Science
Modifying your data using PROC Transpose
 into
a wide format (shown below).
FamID
FamInc07
FamInc08
FamInc0
1
40000
40500
41000
2
45000
45400
45800
3
75000
76000
77000
IOWA STATE UNIVERSITY
Department of Animal Science
Modifying your data using PROC Transpose
 How
do we accomplish this?
 SAS
proc transpose to reshape the data from a
long to a wide format.
IOWA STATE UNIVERSITY
Department of Animal Science
Modifying your data using PROC Transpose
data long1 ;
input famid year faminc ;
cards ;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
run;
quit;
proc transpose data=long1 out=wide1
prefix=faminc; by famid ; id year; var famin;
run;
quit;
proc print data = wide1;
run;
quit;
Notice that the option prefix=
faminc specifies a prefix to use in
constructing names for
transposed variables in the output
data set. SAS automatic variable
_NAME_ contains the name of the
variable being transposed
IOWA STATE UNIVERSITY
Department of Animal Science
Modifying your data using PROC Transpose
 What
 SAS
does this get you?
output that looks like the following
Obs famid _Name_ faminc96
faminc97
1
1
faminc
40000
40500
41000
2
2
faminc
45000
45400
45800
3
3
faminc
75000
76000
77000
IOWA STATE UNIVERSITY
Department of Animal Science
faminc98
Removing Observations
 When
performing statistical calculations, SAS,
by default, uses all of the observations that are
in the dataset.
 You
can selectively delete observations that
you do not wish to use with:


IF statements - specify which observations to keep,
IF and THEN DELETE will delete observations.
IOWA STATE UNIVERSITY
Department of Animal Science
Removing Observations
 Both
statements result in a file with only data
from Story County
data new; set new;
if location = “Story county";
Run;
Quit;
data new; set new;
if location = “Polk county"
then delete;
run;
IOWA STATE UNIVERSITY
Department of Animal Science
Removing Variables
 You
may only need to use a few variables from
a larger dataset. This can be done with KEEP
or DROP statements.
 For
many datasets, you can keep unneeded
variables in the dataset, and SAS can handle
them with ease.


This will be the case for most of you dealing with
several hundred or even several thousand observations
Those dealing with very large data sets might benefit
from using the Keep or Drop statements

Caution be sure you are thinking ahead so that you do not drop
a variable needed later in some calculation or function
IOWA STATE UNIVERSITY
Department of Animal Science
Removing Variables
 Let’s
say we wanted to keep only the location,
x and y variables from the “new” file.
 Both
task.
data steps below will accomplish the
data subset; set new;
drop date rainfall;
run;
data subset; set new;
keep location x y;
run;
IOWA STATE UNIVERSITY
Department of Animal Science
Combining Datasets

There are two different ways to combine datasets
using the SET and MERGE statements in a Data Step.

The SET statement is used to add observations to an
existing dataset.



This is what we have done in lab
We have developed new variables using calculations with existing
data and have adjusted current data in the data set.
Data Pig12 Set Pig12;


ADG = (OFFWT – ONWT) / DOT;
Consider the following example, using monthly rainfall
totals in Gainesville in 1995 and 1996. When used to
combine observations into one dataset, the SET
command works as follows:
IOWA STATE UNIVERSITY
Department of Animal Science
Combining Datasets – Set
data rain95;
input month rainfall @@; year=1995;
datalines;
1 3.08 2 1.07 3 6.14 4 5.18 5 2.47 6 7.55 7 7.66 8 7.20 9 2.10 10 4.33 11 3.15 12 1.29
;
run;
quit;
data rain96;
input month rainfall @@; year=1996;
datalines; 1 0.97 2 0.66 3 10.52 4 1.72 5 2.01 6 6.05 7 11.00 8 4.90 9 2.23 10 6.18 11
1.73 12 6.63
;
run;
quit;
data rain9596; set rain95 rain96;
run;
quit;
IOWA STATE UNIVERSITY
Department of Animal Science
Combining Datasets
 The
MERGE statement adds the variables in
one dataset to another dataset.

Consider the following example using
Southern teams in the National Basketball
Association:
IOWA STATE UNIVERSITY
Department of Animal Science
Combining Datasets - Merge
data nba1;
input @1 city $11. @11 division $;
Cards;
Orlando
Atlantic
Miami
Atlantic
Atlanta
Central
Charlotte Central
;
run; quit;
data nba2;
input mascot $ @@;
Cards;
Magic
Heat
Hawks
Hornets
;
run; quit;
data nba3;
merge nba1 nba2;
run; quit;
Proc Print;
run; quit;
IOWA STATE UNIVERSITY
Department of Animal Science
Combining Datasets – Merge Output Results
Obs
city
division
mascot
1
Orlando
Atlantic
Magic
2
Miami
Atlantic
Heat
3
Atlanta
4
Charlotte
Central
Central
IOWA STATE UNIVERSITY
Department of Animal Science
Hawks
Hornets
Combining Datasets
 In
the last example, the observations were
ordered so that each dataset corresponds to
the other. You will often need to put datasets
together based on values of variables which
are included in both datasets. To do this,
both datasets must first be sorted in order by
the common variable or variables.
IOWA STATE UNIVERSITY
Department of Animal Science
Combining Datasets
Proc sort sorts the dat
file using the variable
chosen in the by state
we will discuss Proc S
more detail later on
IOWA STATE UNIVERSITY
Department of Animal Science
Combining Datasets with Merge -Output
The SAS System 10:16 Saturday, September 11, 2010 14
Obs
city
division
mascot
1
Atlanta
Central
Hawks
2
Charlotte
Central
Hornets
3
Miami
Atlantic
Heat
4
Orlando
Atlantic
Magic
IOWA STATE UNIVERSITY
Department of Animal Science