SAS class - julius.csscr.washington.edu

How to start using SAS
Tina Tian
The topics

An overview of the SAS system

Reading raw data / creating SAS data sets

Combining SAS data sets and match-merging
SAS data sets

Formatting data

Introduce some simple statistical
analysis procedures
Basic Screen Navigation

Main:
- Editor: contains the SAS program to be submitted.
- Log: contains information about the processing of the SAS program, including any warning and error messages.
- Output: contains reports generated by SAS procedures and DATA steps.
Side:
- Explorer: navigates to other objects, such as libraries.
- Results: navigates your Output window.
SAS programs
A SAS program is a sequence of steps that the user
submits for execution.
DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data
sets (that is, to generate reports and graphs, sort
data, and analyze data).
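As a minimal sketch of the two kinds of step, the program below has one DATA step followed by one PROC step. The data set name work.scores and its values are made up for illustration.

```sas
/* DATA step: create a small temporary data set (illustrative values) */
DATA work.scores;
    INPUT Name $ Score;
    DATALINES;
Ann 90
Bob 85
;
RUN;

/* PROC step: process the data set, here by printing it */
PROC PRINT DATA=work.scores;
RUN;
```

After submitting the program, the Log window should report both steps, and the listing from PROC PRINT appears in the Output window.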
SAS Data Libraries

A SAS data library is a collection of SAS files that are
recognized as a unit by SAS

A SAS data set is one type of SAS file stored in a data
library

The Work library is a temporary library: when SAS is closed, all
the data sets in the Work library are deleted. To create a
permanent SAS data set, store it in your own library.
SAS Data Libraries

Identify/create SAS data libraries by assigning each a
library reference name (libref) with the LIBNAME statement:
LIBNAME libref 'file-folder-location';
E.g.: LIBNAME readData 'C:\temp\sas class\readData';

Rules for naming a library reference name:



The name must be 8 characters or less
The name must begin with a letter or underscore
The remaining characters must be letters, numbers or
underscores.
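A short sketch of how a libref separates temporary from permanent data sets. The libref mylib and the data set names are hypothetical, and the folder is assumed to exist on your machine.

```sas
/* Assumed folder: replace with a folder that exists on your machine */
LIBNAME mylib 'C:\temp\sas class';

/* One-level name: stored in the temporary Work library */
DATA scratch;
    x = 1;
RUN;

/* Two-level name (libref.data-set): stored permanently in the folder above */
DATA mylib.saved;
    x = 1;
RUN;
```

When the SAS session ends, work.scratch is deleted, while mylib.saved remains in the folder and can be reached again by reissuing the LIBNAME statement.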
Reading internal raw data in SAS
system

To put small amounts of raw data directly in the
SAS program and create a SAS data set, you must

Start a DATA step and name the SAS data
set being created with the DATA statement

Describe how to read the data fields from the
raw data with the INPUT statement

Use the DATALINES statement to indicate
internal data

The RUN statement marks the end of a step
Reading internal raw data in SAS
system

Example:
DATA dog1;
INPUT ID Age Gender $ Income;
DATALINES;
1 10 m 2300
2 13 f 1500
3 12 f 1700
4 9 m 100
5 13 m 1000
;
RUN;
Reading external raw data files into
SAS system

In order to create a SAS data set from a raw
data file, you must

Start a DATA step and name the SAS data set
being created (DATA statement)

Identify the location of the raw data file to read
(INFILE statement)

Describe how to read the data fields from the raw
data file (INPUT statement)

The RUN statement marks the end of a step
Reading external raw data file into
SAS system
LIBNAME readData "C:\temp\sas class";
DATA readData.dog1;
INFILE "C:\temp\sas class\dog.txt";
INPUT ID Age Gender $ Income;
RUN;




The LIBNAME statement assigns the libref 'readData' to a data library.
The DATA statement creates a permanent SAS data set named 'dog1' in that library.
The INFILE statement points to a raw data file.
The INPUT statement
- names the SAS variables
- identifies the variables as character or numeric ($ indicates character data)
- specifies the locations of the fields in the raw data
- can be specified as column, formatted, list, or named input

The RUN statement marks the end of a step
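The examples so far use list input, where fields are separated by blanks. As a rough sketch of column input, the step below reads the same kind of records from fixed column positions. The data set name dog_col and the column positions are assumptions chosen to match the in-stream data shown here.

```sas
DATA dog_col;
    /* Column input: each field is read from fixed column positions */
    INPUT ID 1 Age 3-4 Gender $ 6 Income 8-11;
    DATALINES;
1 10 m 2300
2 13 f 1500
4  9 m  100
;
RUN;

PROC PRINT DATA=dog_col;
RUN;
```

With column input the values do not need to be separated by blanks, but every record must place each field in the same columns.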
Reading Delimited or PC Database
Files with the IMPORT Procedure

If your data file has the proper extension, use the simplest form of
the IMPORT procedure:
PROC IMPORT DATAFILE = 'filename' OUT = data-set
DBMS = identifier;
RUN;
Type of File                           Extension          DBMS Identifier
Comma-delimited                        .csv               CSV
Tab-delimited                          .txt               TAB
Excel                                  .xls               EXCEL
Lotus Files                            .wk1, .wk3, .wk4   WK1, WK3, WK4
Delimiters other than commas or tabs                      DLM
Examples:
PROC IMPORT DATAFILE='c:\temp\sale.xls' OUT=readData.import1
DBMS = EXCEL;
RUN;
Reading Delimited or PC Database
Files with the IMPORT Procedure

If your file does not have the proper extension, or your file
uses delimiters other than commas or tabs, then you
must use the DBMS= option and the DELIMITER= statement
PROC IMPORT DATAFILE = 'filename' OUT = data-set
DBMS = identifier;
DELIMITER = 'delimiter-character';
RUN;

Examples:
PROC IMPORT DATAFILE='c:\temp\sale.txt' OUT=readData.import2
DBMS = DLM;
DELIMITER = '&';
RUN;
Reading Files with the IMPORT
Procedure

If your file does not have the proper extension, or your file
uses delimiters other than commas or tabs, then
you must use the DBMS= option and the DELIMITER= statement
PROC IMPORT DATAFILE = 'filename' OUT = data-set
DBMS = identifier;
DELIMITER = 'delimiter-character';
RUN;

Example:
PROC IMPORT DATAFILE = 'C:\sas class\readData\import2.txt'
OUT = readData.sasfile DBMS = DLM;
DELIMITER = '&';
RUN;
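For comparison, here is a sketch of importing a comma-delimited file. The file path and the output data set name are hypothetical. REPLACE and GETNAMES= are standard PROC IMPORT features, but whether the first row really holds column names depends on your file.

```sas
PROC IMPORT DATAFILE='C:\temp\sale.csv'   /* hypothetical file */
            OUT=work.sale_csv             /* hypothetical output data set */
            DBMS=CSV
            REPLACE;                      /* overwrite work.sale_csv if it exists */
    GETNAMES=YES;                         /* assume the first row holds variable names */
RUN;
```

Running PROC CONTENTS on the result is a quick way to confirm that each column was read as the type you expect.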
Format in SAS data set

Standard Formats (selected):
- Character: $w.
- Date, Time and Datetime: DATEw., MMDDYYw., TIMEw.d, ...
- Numeric: COMMAw.d, DOLLARw.d, ...
Use the FORMAT statement:
PROC PRINT DATA=sales;
VAR Name DateReturned CandyType Profit;
FORMAT DateReturned DATE9. Profit DOLLAR6.2;
RUN;
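The sales data set used above is not shown in these notes, so the following self-contained sketch builds a small stand-in. The data set name sales_demo and its values are invented for illustration. The date is read with the MMDDYY10. informat and displayed with the formats named above.

```sas
/* Made-up data: DateReturned is read as a SAS date value */
DATA sales_demo;
    INPUT Name $ DateReturned : MMDDYY10. Profit;
    DATALINES;
Ann 03/15/2020 1234.5
Bob 11/02/2020 87.25
;
RUN;

/* Formats change only how values are displayed, not what is stored */
PROC PRINT DATA=sales_demo;
    FORMAT DateReturned DATE9. Profit DOLLAR10.2;
RUN;
```

Without the FORMAT statement, DateReturned would print as a raw number (the count of days since January 1, 1960), which is why dates are almost always given a format.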
Format in SAS data set

Create your own custom formats in two steps:
- Create the format using PROC FORMAT and the VALUE statement.
- Assign the format to the variable using the FORMAT statement.
General form of a simple PROC FORMAT step:
PROC FORMAT;
VALUE name range-1='formatted-text-1'
range-2='formatted-text-2' ...;
RUN;

The name in the VALUE statement is the name of the format you are
creating. It can't be longer than eight characters and must not start or
end with a number. If the format is for character data, the name must start
with a $.
Format in SAS data set
Example:
/* Step 1: Create the formats for certain variables */
PROC FORMAT;
VALUE $genFmt 'm' = 'Male'
'f' = 'Female';
VALUE polFmt 1='likes'
2='dont care'
3='dislikes'
9='no answer';
RUN;
/* Step 2: Assign the formats to the variables */
DATA Mydata.dog123(replace=yes);
SET Mydata.dog123;
FORMAT Gender $genFmt. Policy polFmt.;
RUN;
Format in SAS data set

Permanently store formats in a SAS catalog by
- Creating a format catalog file with the LIB= option in the PROC FORMAT statement
- Setting the format search options
Example:
LIBNAME Mydata 'C:\sas class\Format';
OPTIONS FMTSEARCH=(Mydata.dogfmt);
PROC FORMAT LIB=Mydata.dogfmt;
VALUE $genFmt 'm' = 'Male' 'f' = 'Female';
RUN;

Read formats
OPTIONS nofmterr;
OPTIONS FMTSEARCH=(Mydata.dogfmt);
Combining SAS Data Sets:
Concatenating and Interleaving

Use the SET statement in a DATA step to
concatenate SAS data sets.

Use the SET and BY statements in a DATA
step to interleave SAS data sets.
Combining SAS Data Sets:
Concatenating and Interleaving

General form of a DATA step concatenation:

DATA new-SAS-data-set;
SET SAS-data-set1 SAS-data-set2 …;
RUN;

Example:
DATA mydata.dog12;
SET dog1 mydata.dog2;
RUN;
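A self-contained sketch of concatenation, using two tiny made-up data sets (a and b are hypothetical names). The SET statement reads all observations from the first data set, then all observations from the second.

```sas
DATA a;
    INPUT ID Age;
    DATALINES;
1 10
2 13
;
RUN;

DATA b;
    INPUT ID Age;
    DATALINES;
3 12
4 9
;
RUN;

/* Concatenate: the observations of a come first, followed by those of b */
DATA both;
    SET a b;
RUN;

PROC PRINT DATA=both;
RUN;
```

If a variable exists in only one of the input data sets, its value is missing for the observations that come from the other data set.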
Combining SAS Data Sets:
Concatenating and Interleaving

General form of a DATA step interleave:

DATA new-data-set;
SET SAS-data-set1 SAS-data-set2 …;
BY BY-variable;
RUN;

Sort all input SAS data sets by the BY variable(s) first, using PROC SORT

Example:
PROC SORT data=dog1 OUT=dog1_sorted; BY ID; RUN;
/* mydata.dog2 is assumed to be sorted by ID already */
DATA mydata.dog12;
SET dog1_sorted mydata.dog2;
BY ID;
RUN;
Match-Merging SAS Data Sets

One-to-one match merge
One-to-many match merge
Many-to-many match merge

The SAS statements for all three types of match
merge are identical in the following form:
DATA new-data-set;
MERGE SAS-data-set-1 SAS-data-set-2 SAS-data-set-3 …;
BY by-variable(s); /* indicates the variable(s) that control
which observations to match */
RUN;
Merging SAS Data Sets: A More
Complex Example

Example: Merge two data sets to obtain the names of the group
team members scheduled to fly next week.

combData.employee
EmpID    LastName
E00632   Strauss
E01483   Lee
E01996   Nick
E04064   Waschk

combData.groupsched
EmpID    FlightNum
E04064   5105
E00632   5250
E01996   5501
/* To match-merge the data sets by common variables - EmpID, the
data sets must be ordered by EmpID */
PROC SORT data=combData.Groupsched;
BY EmpID;
RUN;
Merging SAS Data Sets: A More
Complex Example
/* simply merge two data sets */
DATA combData.nextweek;
MERGE combData.employee combData.groupsched;
BY EmpID;
RUN;
EmpID    LastName   FlightNum
E00632   Strauss    5250
E01483   Lee
E01996   Nick       5501
E04064   Waschk     5105
Merging SAS Data Sets: A More
Complex Example


Eliminating Nonmatches
Use the IN= data set option to determine which dataset(s)
contributed to the current observation.
General form of the IN= data set option:
SAS-data-set (IN=variable)

variable is a temporary numeric variable that has two
possible values:
- 0 indicates that the data set did not contribute to the
current observation.
- 1 indicates that the data set did contribute to the
current observation.
Merging SAS Data Sets: A More
Complex Example
/* Exclude from the data set employees who are not scheduled to
fly next week. */
LIBNAME combData "K:\sas class\merge";
DATA combData.nextweek;
MERGE combData.employee
combData.groupsched (in=InSched);
BY EmpID;
IF InSched=1; /* keep the observation only when this condition is true */
RUN;
EmpID    LastName   FlightNum
E00632   Strauss    5250
E01996   Nick       5501
E04064   Waschk     5105
Merging SAS Data Sets: A More
Complex Example
/* Find employees who are not in the flight scheduled group. */
LIBNAME combData "K:\sas class\merge";
DATA combData.nextweek;
MERGE combData.employee (in=InEmp)
combData.groupsched (in=InSched);
BY EmpID;
IF InEmp=1;   /* true: the employee data set contributed */
IF InSched=0; /* InSched is false: groupsched did not contribute */
RUN;
EmpID    LastName   FlightNum
E01483   Lee
Different Types of Merges in SAS

One-to-Many Merging
Work.one
X   Y
1   A
2   B
3   C

Work.two
X   Z
1   A1
1   A2
2   B1
3   C1
3   C2

DATA work.three;
MERGE work.one work.two;
BY X;
RUN;

Work.three
X   Y   Z
1   A   A1
1   A   A2
2   B   B1
3   C   C1
3   C   C2
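The one-to-many merge above can be reproduced with in-stream data. This is a sketch that assumes Work.one holds X and Y, and Work.two holds X and Z, which is the layout implied by the merged result.

```sas
DATA work.one;
    INPUT X Y $;
    DATALINES;
1 A
2 B
3 C
;
RUN;

DATA work.two;
    INPUT X Z $;
    DATALINES;
1 A1
1 A2
2 B1
3 C1
3 C2
;
RUN;

/* Both inputs are already in X order, so no PROC SORT is needed here */
DATA work.three;
    MERGE work.one work.two;
    BY X;
RUN;

PROC PRINT DATA=work.three NOOBS;
RUN;
```

Each value of Y from work.one is carried across every matching observation in work.two, which is why A and C each appear twice in the result.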
Different Types of Merges in SAS

Many-to-Many Merging
Work.one
X   Y
1   A1
1   A2
2   B1
2   B2

Work.two
X   Z
1   AA1
1   AA2
1   AA3
2   BB1
2   BB2

DATA work.three;
MERGE work.one work.two;
BY X;
RUN;

Work.three
X   Y    Z
1   A1   AA1
1   A2   AA2
1   A2   AA3
2   B1   BB1
2   B2   BB2
Some simple analysis procedures

The PRINT Procedure

The CONTENTS Procedure

The FREQ Procedure

The SORT Procedure

The MEANS Procedure

The CORR Procedure

The TTEST Procedure

The ANOVA Procedure
The PRINT Procedure


The PRINT procedure prints the observations in a SAS data
set.
General form of a simple PROC PRINT step:
PROC PRINT DATA = SAS-data-set;
VAR variable(s) <option>;
SUM variable(s) <option>;
RUN;

The VAR statement specifies which variables to print and in what order
The SUM statement prints totals for the specified numeric variables
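A small sketch of the general form, assuming the SASHELP.CLASS sample data set that ships with SAS is available (it has the variables Name, Sex, Age, Height, and Weight).

```sas
/* Print selected variables in the listed order, with column totals */
PROC PRINT DATA=sashelp.class;
    VAR Name Age Height Weight;
    SUM Height Weight;
RUN;
```

Variables named in SUM are totaled at the bottom of the report. The other columns are printed without totals.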
The Contents Procedure

The CONTENTS procedure shows the contents of a
SAS data set and prints the directory of the SAS data
library

General form of a simple PROC CONTENTS step:
PROC CONTENTS DATA = SAS-data-set;
RUN;
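Two typical calls, sketched with the SASHELP library that ships with SAS (an assumption about your installation). The first describes one data set. The second lists the directory of a whole library.

```sas
/* Describe one data set: variables, types, lengths, formats */
PROC CONTENTS DATA=sashelp.class;
RUN;

/* List the members of a library; NODS suppresses the per-data-set details */
PROC CONTENTS DATA=sashelp._ALL_ NODS;
RUN;
```

Replacing sashelp with your own libref gives the same listing for a permanent library.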
The SORT Procedure

The SORT procedure orders SAS data set observations by
the values of one or more character or numeric variables.

General form of a simple PROC SORT step:
PROC SORT DATA = SAS-data-set;
BY <DESCENDING> variable-1 <...<DESCENDING> variable-n>;
RUN;
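As an illustration, the step below sorts the SASHELP.CLASS sample data set (assumed to be available) by Sex in ascending order and, within each value of Sex, by Age in descending order. The output name work.class_sorted is hypothetical.

```sas
/* OUT= writes the sorted copy to a new data set and leaves the input unchanged */
PROC SORT DATA=sashelp.class OUT=work.class_sorted;
    BY Sex DESCENDING Age;
RUN;
```

Without OUT=, PROC SORT replaces the input data set with the sorted version, which is not possible for a read-only library.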
The MEANS Procedure

The MEANS procedure provides descriptive
statistics for variables across all observations

General form of a simple PROC MEANS step:
PROC MEANS DATA = SAS-data-set;
CLASS variable(s) </ option(s)>;
VAR variable(s);
RUN;
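A brief sketch, again assuming the SASHELP.CLASS sample data set. The statistic keywords on the PROC MEANS statement choose which statistics are reported.

```sas
/* Descriptive statistics for Height and Weight, separately for each Sex */
PROC MEANS DATA=sashelp.class N MEAN STD MIN MAX;
    CLASS Sex;
    VAR Height Weight;
RUN;
```

Omitting the CLASS statement gives one set of statistics across all observations.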
The FREQ Procedure

The FREQ procedure produces one-way to n-way
frequency and crosstabulation (contingency) tables

General form of a simple PROC FREQ step:
PROC FREQ DATA = SAS-data-set;
TABLES requests < / options > ;
RUN;

The TABLES statement requests one-way to n-way frequency
and crosstabulation tables and statistics for those tables
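For example, with the SASHELP.CLASS sample data set (assumed available), one request can produce a one-way table and another a two-way crosstabulation.

```sas
PROC FREQ DATA=sashelp.class;
    TABLES Sex;                   /* one-way frequency table */
    TABLES Sex*Age / NOPERCENT;   /* two-way table without overall percentages */
RUN;
```

In a request such as Sex*Age, the first variable defines the rows and the second defines the columns.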
The TTEST Procedure


The TTEST procedure performs t tests for one sample, two
samples, and paired observations.
General form of a simple PROC TTEST step:
PROC TTEST DATA = SAS-data-set H0=m;
VAR variable(s);
RUN;
PROC TTEST DATA = SAS-data-set;
VAR variable(s);
CLASS variable;
RUN;
• use the H0= option to specify the hypothesized mean in the one-sample t test
• use the CLASS statement to define the two groups in the two-sample t test
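Both forms can be tried on the SASHELP.CLASS sample data set (assumed available). The hypothesized mean of 60 is an arbitrary value chosen only for illustration.

```sas
/* One-sample t test: is the mean Height equal to 60? */
PROC TTEST DATA=sashelp.class H0=60;
    VAR Height;
RUN;

/* Two-sample t test: compare mean Height between the two values of Sex */
PROC TTEST DATA=sashelp.class;
    CLASS Sex;
    VAR Height;
RUN;
```

The CLASS variable must have exactly two levels for the two-sample test.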
The ANOVA Procedure

The ANOVA procedure performs one-way analysis of
variance (ANOVA) for balanced data

General form of a simple PROC ANOVA step:
PROC ANOVA DATA = SAS-data-set;
CLASS variable(s) </options>;
MODEL dependents = effects <options>;
RUN;
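A minimal one-way example on balanced, made-up data: three fertilizer groups with three observations each. The data set work.yield and its values are invented purely to show the syntax.

```sas
/* Invented balanced data; @@ lets one line hold several observations */
DATA work.yield;
    INPUT Fertilizer $ Yield @@;
    DATALINES;
A 20 A 22 A 19 B 25 B 27 B 26 C 30 C 29 C 31
;
RUN;

PROC ANOVA DATA=work.yield;
    CLASS Fertilizer;            /* grouping variable */
    MODEL Yield = Fertilizer;    /* one-way model */
    MEANS Fertilizer;            /* group means */
RUN;
QUIT;
```

PROC ANOVA is an interactive procedure, so QUIT ends it explicitly after the RUN statement.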
Some simple analysis procedures

The UNIVARIATE Procedure

The REG Procedure

The LOGISTIC Procedure
The UNIVARIATE Procedure


The UNIVARIATE procedure provides descriptive
statistics, histograms, quantile-quantile plots (Q-Q plots),
and probability plots
General form of a simple PROC UNIVARIATE step:
PROC UNIVARIATE DATA = SAS-data-set;
VAR variables;
HISTOGRAM;
QQPLOT;
RUN;
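A sketch using the SASHELP.CLASS sample data set (assumed available). The NORMAL options are one possible choice, added here to compare the data with a fitted normal distribution.

```sas
PROC UNIVARIATE DATA=sashelp.class;
    VAR Height;
    HISTOGRAM Height / NORMAL;                   /* histogram with a fitted normal curve */
    QQPLOT Height / NORMAL(MU=EST SIGMA=EST);    /* Q-Q plot with a reference line from estimated parameters */
RUN;
```

Points that fall close to the reference line in the Q-Q plot suggest that the variable is roughly normally distributed.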
The REG procedure



The REG procedure is one of many regression
procedures in the SAS System.
The REG procedure allows several MODEL
statements and gives additional regression
diagnostics, especially for detection of collinearity. It
also creates plots of model summary statistics and
regression diagnostics.
PROC REG <options>;
MODEL dependents=independents </options>;
PLOT <yvariable*xvariable>;
RUN;
An example


PROC REG DATA=water;
MODEL Water = Temperature Days Persons / VIF;
MODEL Water = Temperature Production Days / VIF;
RUN;
PROC REG DATA=water;
MODEL Water = Temperature Production Days;
PLOT STUDENT.*PREDICTED.; /* studentized residuals against predicted values */
PLOT STUDENT.*NPP.; /* studentized residuals against the normal cumulative distribution */
PLOT R.*NQQ.; /* normal Q-Q plot of the residuals */
RUN;
The LOGISTIC procedure

For binary or ordinal responses with continuous
independent variables:
PROC LOGISTIC < options > ;
MODEL dependents=independents < / options > ;
RUN;

For binary or ordinal responses with categorical
independent variables:
PROC LOGISTIC < options > ;
CLASS categorical-variables < / options > ;
MODEL dependents=independents < / options > ;
RUN;
Example
PROC LOGISTIC data=Mydata2.pain;
CLASS Treatment Sex;
MODEL Pain= Treatment Sex Treatment*Sex Age Duration;
RUN;