Biostat 605 Week 1 Notes - Biostatistics and Risk

Download Report

Transcript Biostat 605 Week 1 Notes - Biostatistics and Risk

EPIB 698C Lecture2
Instructor: Raul Cruz-Cano
[email protected]
7/9/2012
1
Creating Data in SAS, an overview



Creating datasets by hand entry (Viewtable
window, CARDS, DATALINES statement)
Reading dataset from external files (not SAS
data, INFILE statement)
Using Import/Export facility (a point-and-click
approach)
2
Entering Data with Viewtable window


To open viewtable window, select “Table Editor”
from the Tools menu. An empty viewtable window
will appear
The letters at the tops of columns are default
variable names. Right click on the letter and open
the Column attributes window. You can replace
variables name, type, etc
3
Entering Data with Viewtable window




Entering you data once you have defined your columns.
To save your table, select “Save as” from the File menu,
then select a library and specify the name of your table
(SAS dataset)
To open an existing table, go to ToolsTable editor, the
view table window will be opened. Then go the File menu,
click Openchoose libraryselect Table name. To switch
from browse mode (default) to edit mode, select “Edit
Mode” from the Edit menu
Using Viewtable you can easily create a data table by
setting the columns, you can add rows to the table.
However, you can not add columns once you finish
defining your columns. This is a considerable
disadvantage.
4
Reading Data Inline, CARDS Statement


You enter the actual data points inside the PROGRAM
EDITOR
Example: CARDS statement
data instructor;
input name $ gender $ age;
cards;
Jane F 30
Mary F 29
Mike M 28
;
run;
5
Examples

Reading multiple observations in each line of data by
adding the @@ symbol at the end of the input statement
data aaa;
input x y @@;
datalines;
1 12.4 1 11.3 1 1.4 2 2.1 2 19.4 1 10.0
2 13.2 1 11.6
;
run;
6
Reading Dataset from External Files: INFILE
Statement



Identifies the external file that contains the data and
has options that control how the records in file are
read into SAS
Must be used before the input statement because it
locates the data file to be read
Syntax
data data_set_name;
Infile directory_and_file_name;
input variable_list;
run;
7
Reading raw data separated by spaces—list
input


List input (also called free formatted input) can read data
separated by at least one space. By default, SAS assumes
data values are separated by one or more blanks.
Will read all the data in a record, no skipping unwanted
values

Any missing data must by indicated with a period

Character data must be simple, no embedded spaces
8
Reading raw data separated by spaces—list
input
Example:
data demographics;
infile ’F:\teaching\SAS\lecture2\mydata.txt';
input Gender $ Age Height Weight;
run;
 $ sign after gender means that gender is a character
variable
9
Specify missing values with list input

We use a period to represent missing values
M 50 168 155
M 50 168 155
F 23 160 101
M 65 172 220
F 35 165 133
M 15 171 166
F
M
F
M
.
65
35
15
160 101
172 220
165 133
171 166
10
Reading raw data separated by commas

Comma separated values file (csv file) use commas as
data delimiters

They may or may not enclose character values in quotes.

Example: mydata.csv
"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166
11
Reading raw data separated by commas
data demographics;
infile 'c:\books\learning\mydata.csv' dsd;
input Gender $ Age Height Weight;
run;
Dsd: means delimiter sensitive data. It has several functions.
(1)change the default delimiter from a blank to a comma.
(2)If there are two delimiters in a row, it assumes there is a
missing value in between
(3)If character values are placed in quotes, the quotes are
stripped from the value
12
INFILE statement, useful options
DSD
 It recognizes two consecutive delimiters as a missing
value.
Example: 20,30,,50, SAS will treat this as 20 30
50 . but with the the dsd option SAS will treat it as
20 30 . 50 .

It allows you to include the delimiter within quoted
strings.
Example: a comma separated file and your data
included values like "George Bush, Jr.”
With the dsd option, SAS will recognize that the
comma in "George Bush, Jr." is part of the name, and
not a separator indicating a new variable.
13
INFILE statement (cont.)
DLM=
The dlm= option can be used to specify the delimiter that
separates the variables in your raw data file.
dlm=‘,’ indicates a comma is the delimiter (e.g., a
comma separated file, .csv file).
dlm='09'x indicates that tabs are used to separate
your variables
dlm=‘:’ indicates a colon is the delimiter
14
INFILE statement (cont.)

We can use dsd and dlm at the same time:
infile ‘file-description’ dsd dlm=‘:’ ;
This combination of options performs all the actions
requested by the DSD option, but overrides the default
delimiter (comma) with a delimiter of your choice
15
INFILE statement (cont.)

useful options continued…



missover: if number of variables in file does not match
number of variables in input file all remaining variables
are set to missing.
obs: specifies the last record to be read into the data
set
firstobs: specifies the first line of data to be read into
data set. Useful if there is a header row in the dataset.
16
Filename statement

Filename statement identifies the file and associate it with
a reference name then use this reference in your INFILE
statement instead of the actual file name
filename mydata_file
'F:\teaching\SAS\lecture2\mydata.csv';
data demographics;
infile mydata_file dsd;
input Gender $ Age Height Weight;
run;
17
Specifying INFILE options with the
DATALINES statement
data demographics;
infile datalines dsd;
input gender $ Age Height Weight;
datalines;
"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166
;
run;
18
Reading data from fixed columns


Many raw data files store specific information in fixed
columns
The advantage of fixed column files:
(1) don’t need to worry about missing values
(2) you can choose which variables to read and in what
order to read them
19
Bank data
123456789012345678901
00110/ 21/1955M 1145
00211/ 18/2001F 18722
Column
Column
Column
Column
columns
1-3 : subject ID
4-13: Date of birth
14-14: gender
15-21: Account balance
20
Column Input
data financial;
infile ‘f:\SAS\learning\bank.txt';
input Subj
$
1-3
DOB
$ 4-13
Gender
$
14
Balance
15-21;
run;
21
Formatted Input



Formatted input can read both character and standard
numerical values as well as nonstandard numerical values,
such as numbers with dollar signs and commas, and dates.
Formatted input is the most common and powerful of all
input methods
SAS Formats and Informats: An informat is a
specification for how raw data should be read. A format is
a layout specification for how a variable should be printed
or displayed.
22
Formatted Input
The bank data:
data financial;
infile ‘F:\teaching\SAS\bank.txt';
input
@1 Subj
$3.
@4 DOB
mmddyy10.
@14 Gender
$1.
@15 Balance
7.;
run;

23
Formatted Input


The @ sign the INPUT statement are called
column pointers. @4 tells SAS to go to column 4.
Following variable names are SAS informats.
Informats are built-in instructions that tell SAS
how to read a data value
24
Formatted Input



Two of the most basic informats are w.d and $w.
The w.d informat reads standard numeric values. The w
tells SAS how many columns to read; the optional d tells
SAS that there is an implied decimal point in the value.
For examples: data value is 123,
With informat 3.0, SAS will save it as 123;
With informat 3.1, SAS will save it as 12.3;
If the data value already has a decimal in it, then SAS
ignores the d option.
For examples: data value is 1.23,
With informat 4.1, SAS will it as 1.23;
25
Formatted Input



The $w. Informat tells SAS to read w columns of character
data
The MMDDYY10. informat tells SAS that the date you are
reading is in the mm/dd/yyyy form. SAS reads the date and
converts the value into a SAS date.
SAS stores dates as numeric values equal to the number of
days from January 1, 1960. Eg, if you read 01/01/1960,
SAS stores a value of 0. The data 01/02/1960 is stored as a
value of 1.
26
The format statement


The format statements are built-in SAS command that
allow you to display data in easily readable ways.
All SAS formats command ends either in a period or in a
period follows by a number.
title "Listing of FINANCIAL";
proc print data=financial;
format
DOB
mmddyy10.
Balance dollar11.2;
The dollar11.2 tells SAS to put a $ sign in
front of the number, and allow up to 11
columns to print the balance values, the 2
tells SAS to include two decimal places
after the decimal points.
27
Using a format/informat statement in a DATA
step


It is usually more useful to place your format
statement with a SAS data step. There is a
permanent association of the formats and
variables in the data set.
You can override any permanent format by
placing a FORMAT statement in a particular
procedure.
28
A informat statement with list input

Following the key word informat, you list each variable and
information you want to use to read each variable
data list_example;
informat Subj
$3.
Name
$20.
DOB
mmddyy10.
Salary dollar8.;
infile 'c:\books\learning\list.csv' dsd;
input Subj Name DOB Salary;
format DOB date9. Salary dollar8.;
29
data list_example;
infile 'c:\books\learning\list.csv ' dsd;
input Subj :
$3.
Name :
$20.
DOB : mmddyy10.
Salary : dollar8.;
format DOB date9. Salary dollar8.;
(1)
there is a colon (called an informat modifier) preceding of each informat.
It tells SAS to use informat supplied but to stop reading the values for
this variable when a delimiter is met.
(2) Without it, SAS may read past a delimiter to satisfy the width specified in
the informat.
30
What are the Differences




Informats used at input
Usually reading external data
Formats used during output cycle
Write formatted value to output
Selected date informats
Informats
Data form
Input data
Input statement
Datew.
ddmmmyyyy
ddmmmyy
1Feb1961
Input date Date9.
1Feb61
Input date Date7.
ddmmyy
01-02-61
Input date ddmmyy8.
ddmmyyyy
01/02/1961 Input date ddmmyy10.
mmddyy
02-01-61
mmddyy
02-01-1961 Input date mmddyy10.
DDMMYYw.
MMDDYYw.
Input date mmddyy8.
Selected date formats
formats
Data
displayed
Input data
format
statement
Results
Datew.
ddmmmyyyy
366
format date
Date9.
01Jan1961
ddmmmyy
366
format date
Date7.
01Jan61
366
format date
mmddyy8.
01/01/61
366
format date
mmddyy10.
01/01/1961
MMDDYYw. mmddyy
mmddyyyy
More examples of informat for numeric
data
Informat
Definition
Input
data
INPUT
statement
Results
COMMAw.d
Removes embedded
commas and $,
converts left
parentheses to
minus sign
$1,000
Input
income
comma6.0;
1000
(1,234) Input
income
comma7.0;
-1234
Percentw.
Converts percent to
numbers
(20%)
Input value
Percent5.;
-0.2
w.d
Reads standard
numeric data
-12.3
Input value
5.1;
-12.3
More examples of informat for character
data
Informat
Definition
Input data
INPUT
statement
Results
$CHARw.
Reading character
data, do not trim
leading or trailing
blanks
my cat
Input
animal
$char10.;
my cat
$w.
my cat
Read character data, my cat
trim blanks
my cat
Input
animal
$char10.;
my cat
Input
animal
$char10.;
my cat
Input
animal
$char10.;
my cat
Import/Export Data


To Export SAS datasets
 Go to the File menu and select “Export Data”
 Choose the data file ( from the library Work)
 Locate and select file type using the browse button
 Save the data set and finish
 Check the log to make sure the data set was created
 This method does not require a data step, but any
modification may require a data step
 Convenient for Excel file
Import a SAS data set follows similar step
36
Read data in Excel file


(1) Use Import procedure
FileImport dataChoose Microsoft ExcelClick
“Next”Select work sheet using browseselect “Table”
you want to importClick “Next”Select Library and
assign a file nameClick “Next” if you want Proc import
SAS code generated, otherwise click “Finish” Check the
log window to make sure the data was created
successfully.
(2) Save Excel file as a csv file, then read in using the
infile statement with dsd option.
37
Read existing SAS data:Libname and set
statement




libname epib
"D:\SAS_summer10\lecture2_3\lecture_data";
data new;
set epib.list_example2;
run;
38