Public Health 5415 Biostatistical Methods II Spring 2004


Public Health 5415
Biostatistical Methods II
Spring 2005
Greg Grandits
[email protected].
612-626-9033
Class Times
Monday 10:10am-12:05pm
Wednesday 10:10am-11:00am
Course objectives:
• Write and run simple SAS programs to
perform common analyses.
• Analyze health science data using basic
statistical and inferential techniques.
• Understand statistical methods as
commonly presented in public health
literature
Topics Covered
• Linear regression
• Logistic regression
• Life-table analyses
• Cox regression
• Relative risk, odds ratio, and hazard ratio estimation
• SAS programming to do the above analyses
SAS Usage
• SAS is the world's largest privately held
software company
• 40,000 customer sites worldwide
• 3.5 million users worldwide
• 90% of Fortune 500 companies use SAS
• Nearly all analyses in published medical
research use SAS
• SAS invests extensive resources in R&D.
Why SAS?
• It is widely used
– Industry, government, and academia
• It is very powerful
– programming language
– sophisticated analyses (better than Excel)
JAMA January 12, 2005
Meat Consumption and Risk of Colorectal Cancer, Chao
Colon and rectal cancer incidence rate ratios (RRs) and 95% CIs by meat intake
were estimated using Cox proportional hazards regression modeling. P values for
linear trend were estimated by modeling meat intake (g/wk) using the median
value within quintiles; these results were similar when modeled as continuous
variables.
All P values were 2-sided and considered significant at P<.05. All analyses were
conducted using SAS version 9.0 (SAS Institute Inc, Cary, NC).
Consumption of Veg/Fruits and Risk of Breast Cancer
All analyses were performed using SAS version 8 (SAS Institute Inc, Cary, NC). All
tests were 2-sided with an α of .05.
Fasting Serum Glucose Level and Cancer Risk in Korean Men and Women
Age-adjusted death and cancer incidence rates were calculated for each
category of fasting serum glucose level and directly standardized to the
age distribution of the 1995 Korean national population. All analyses were
stratified by sex.
All analyses were conducted using SAS statistical software, version 8.0
(SAS Institute Inc, Cary, NC).
Details
http://www.biostat.umn.edu/~greg-g/ph5415.html
– Homework, readings, programs, data files
– Class slides
Lab/Office hours
• 4 hours per week (TA or instructor)
Textbooks:
Applied Statistics and the SAS Programming Language, RP Cody and JK Smith
(Read Chapter 1 for next week)
Introductory Biostatistics, CT Le
The Little SAS Book, LD Delwiche and SJ Slaughter
(Chapter 1 available on website)
Grading
Homework - 30%
(late homework receives half credit and must be turned in no later than
2 weeks after the due date)
Two tests - 30% each
Short project - 10%
No final exam
Using SAS
SAS is available in several ways:
• In the Mayo A-269 (TRC) lab
• On other PCs with SAS installed
• From the biostatistics UNIX computer via telnet
• Purchase from the University: 152 Shepherd Labs (ADCS),
612-625-1300, $150 per year
What is SAS?
• SAS is a programming language that reads,
processes, and performs statistical analyses of data.
• A SAS program is made up of programming
statements which SAS interprets to do the above
functions.
[Figure: flow of a SAS program. Raw data → DATA step (read in data; process data, creating new variables; output data as a SAS dataset) → PROCs (analyze data using statistical procedures)]
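A minimal sketch of the two parts in the flow above, using a made-up dataset (the name `bp`, the variables `id`, `sbp`, `high`, and the values are hypothetical, not from the course materials): the DATA step reads raw data, creates a new variable, and outputs a SAS dataset, and a PROC then analyzes that dataset.

```sas
* DATA step: read raw data, create a new variable, output a SAS dataset;
DATA bp;                         /* hypothetical dataset name */
   INPUT id sbp;                 /* read two numeric variables */
   high = (sbp >= 140);          /* new variable: 1 if sbp >= 140, else 0 */
   DATALINES;
1 128
2 145
3 139
4 152
;
RUN;

* PROC step: analyze the SAS dataset created above;
PROC MEANS DATA=bp N MEAN;
   VAR sbp high;
RUN;
```

Because `high` is coded 0/1, its mean from PROC MEANS is the proportion of observations with sbp of 140 or more.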
Structure of Data
• Made up of rows and columns
• Rows in SAS are called observations
• Columns in SAS are called variables
An observation is all the information for one
entity (patient, patient visit, clinical center,
county)
SAS processes data one observation at a time
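The one-observation-at-a-time processing can be seen directly in the log. In this sketch (the dataset `visits`, its variables, and its values are made up for illustration), the second DATA step reads one observation per pass, keeps a running total with a sum statement, and writes one PUT line per observation to the log.

```sas
DATA visits;                     /* hypothetical: one row per patient visit */
   INPUT patient $ visit sbp;
   DATALINES;
A01 1 130
A01 2 126
B02 1 142
;
RUN;

DATA running;
   SET visits;                   /* reads one observation per iteration */
   total_sbp + sbp;              /* sum statement: value carried to the next iteration */
   PUT _N_= patient= visit= sbp= total_sbp=;   /* one log line per observation */
RUN;
```

Newly computed variables are reset to missing at the start of each iteration unless they appear in a sum statement or a RETAIN statement, which is why `total_sbp` accumulates across observations.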
Example of Data
12 observations and 5 variables: gender, age, marital status,
number of credits, state of residence

Gender  Age  Marital status  Credits  State
F       23   S               15       MN
F       21   S               15       WI
F       22   S               09       MN
F       35   M               02       MN
F       22   M               13       MN
F       25   S               13       WI
M       20   S               13       MN
M       26   M               15       WI
M       27   S               05       MN
M       23   S               14       IA
M       21   S               14       MN
M       29   M               15       MN
* This is a short example program to demonstrate what a
SAS program looks like. This is a comment statement because
it begins with a * and ends with a semi-colon ;
DATA demo;
INPUT gender $ age marstat $ credits state $ ;
if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
if state = 'MN' then resid = 'Y'; else resid = 'N';
DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;
TITLE 'Running the Example Program';
PROC PRINT DATA=DEMO ;
VAR gender age marstat credits fulltime state ;
RUN;
Rules for SAS Statements and Variables
• SAS statements end with a semicolon (;)
• SAS statements can be entered in lowercase or
uppercase
• Multiple SAS statements can appear on one line
• A SAS statement can span multiple lines
• Variable names can be 1 to 32 characters long, must
begin with a letter (A-Z) or an underscore (_), and may
contain only letters, digits, and underscores
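A small sketch showing each rule in action; the dataset `rules` and its variables are hypothetical names chosen only for illustration.

```sas
* SAS statements end with a semicolon and are not case sensitive;
data rules;  x = 1;  Y = 2;     * several statements on one line;
   z = x
       + y;                      * one statement spread over two lines;
   _temp = 3;                    * a name may begin with an underscore;
   age_at_visit_2 = 40;          * later characters: letters, digits, underscores;
run;

PROC PRINT DATA=RULES;           * RULES and rules refer to the same dataset;
RUN;
```

Because names are not case sensitive, `y` in the assignment to `z` refers to the variable created as `Y`, so `z` is 3.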
1 DATA demo;            Create a SAS dataset called demo
2 INPUT gender $ age marstat $ credits state $ ;
                        Name the variables to read ($ marks a character variable)
3 if credits > 12 then fulltime = 'Y';
  else fulltime = 'N';
4 if state = 'MN' then resid = 'Y';
  else resid = 'N';
                        Statements 3 and 4 create 2 new variables
5 DATALINES;            Tells SAS that the data lines follow
  F 23 S 15 MN
  F 21 S 15 WI
  F 22 S 09 MN
  F 35 M 02 MN
  F 22 M 13 MN
  F 25 S 13 WI
  M 20 S 13 MN
  M 26 M 15 WI
  M 27 S 05 MN
  M 23 S 14 IA
  M 21 S 14 MN
  M 29 M 15 MN
  ;                     Tells SAS that the data lines have ended
6 RUN;                  Tells SAS to run the statements
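One way to confirm that statements 3 and 4 added two variables is to list the dataset's contents. This sketch uses only the first three data lines from the example to stay short; PROC CONTENTS should list seven variables (the five read by INPUT plus `fulltime` and `resid`), matching the count reported in the log.

```sas
DATA demo;
   INPUT gender $ age marstat $ credits state $ ;
   if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
   if state = 'MN' then resid = 'Y'; else resid = 'N';
   DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
;
RUN;

* List the variables (name, type, length) stored in the dataset;
PROC CONTENTS DATA=demo;
RUN;

* Print the new variables next to the ones they were built from;
PROC PRINT DATA=demo;
   VAR gender credits fulltime state resid;
RUN;
```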
Types of Data
• Numeric (e.g. age, blood pressure)
• Character (patient name, ID, diagnosis)
SAS treats each type differently (in the INPUT statement, a $ after a variable name marks it as character)
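A sketch of the difference, assuming a hypothetical dataset `types` that reads the same two columns twice with column input: once as numeric and once as character. The numeric version stores 09 as the number 9 (which is why the listing output shows 9, 2, and 5 for credits), while the character version keeps the text '09'; only the numeric version can be used in arithmetic.

```sas
* Read columns 1-2 twice: once as numeric, once as character;
DATA types;
   INPUT credits_num 1-2 credits_chr $ 1-2;
   next_term = credits_num + 3;  /* arithmetic uses the numeric variable */
   DATALINES;
09
15
;
RUN;

PROC PRINT DATA=types;
RUN;
```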
TITLE 'Running the Example Program';
PROC PRINT DATA=demo ;
VAR gender age marstat credits fulltime
state ;
RUN;
* You can run additional procedures;
PROC MEANS DATA=demo ;
VAR age credits ;
RUN;
PROC FREQ DATA=demo ;
TABLES gender ;
RUN;
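Procedures accept options that change what they report. A sketch, using four rows taken from the example data: statistic keywords on the PROC MEANS statement choose which statistics are printed, and an asterisk in the TABLES statement of PROC FREQ requests a two-way table.

```sas
DATA demo;
   INPUT gender $ age marstat $ credits state $ ;
   DATALINES;
F 23 S 15 MN
F 35 M 02 MN
M 26 M 15 WI
M 23 S 14 IA
;
RUN;

* List the statistics wanted on the PROC MEANS statement;
PROC MEANS DATA=demo N SUM MEAN STD MIN MAX;
   VAR age credits;
RUN;

* Two-way table: gender by marital status;
PROC FREQ DATA=demo;
   TABLES gender*marstat;
RUN;
```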
Files Generated When a SAS Program Is
Submitted
• Log file – a text file listing the program statements
processed, along with notes, warnings, and errors
(in UNIX the file will be named fname.log)
Always look at the log file! It tells you how SAS
understood your program.
• Output file – a text file containing the output generated
by the PROCs
(in UNIX the file will be named fname.lst)
Messages in SAS Log
• Notes – messages that may or may not be
important
• Warnings – messages that are usually
important
• Errors – fatal, in that the program will abort
(notes and warnings will not abort your
program)
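A sketch of a program that runs to completion but leaves notes worth reading; the datasets `logdemo` and `check` and the variables `load` and `half` are hypothetical. The misspelled name `credit` should produce a NOTE that the variable is uninitialized, and dividing a missing value should produce a NOTE that missing values were generated. Neither stops the program, yet both signal results you probably did not intend.

```sas
DATA logdemo;
   INPUT age credits;
   DATALINES;
23 15
35 .
;
RUN;

DATA check;
   SET logdemo;
   load = credit / 15;    /* typo: credit instead of credits, so load is always missing */
   half = credits / 2;    /* credits is missing in row 2, so half is missing there */
RUN;
```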
LOG FILE
NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
      Licensed to UNIVERSITY OF MINNESOTA, Site 0009012001.
NOTE: This session is executing on the WIN_NT platform.
NOTE: SAS initialization used:
      real time           7.51 seconds
      cpu time            0.89 seconds

* This is a short example program to demonstrate what a
SAS program looks like. This is a comment statement because
it begins with a * and ends with a semi-colon ;
DATA demo;
INFILE DATALINES;
INPUT gender $ age marstat $ credits state $ ;
if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
if state = 'MN' then resid = 'Y'; else resid = 'N';
DATALINES;

NOTE: The data set WORK.DEMO has 12 observations and 7 variables.
NOTE: DATA statement used:
      real time           0.38 seconds
      cpu time            0.06 seconds

RUN;
TITLE 'Running the Example Program';
PROC PRINT DATA=demo ;
VAR gender age marstat credits fulltime state ;
RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE PRINT used:
      real time           0.19 seconds
      cpu time            0.02 seconds

PROC MEANS DATA=demo N SUM MEAN;
VAR age credits ;
RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE MEANS used:
      real time           0.25 seconds
      cpu time            0.03 seconds

PROC FREQ DATA=demo; TABLES gender;
RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE FREQ used:
      real time           0.15 seconds
      cpu time            0.03 seconds
LST FILE
Running the Example Program

Obs  gender  age  marstat  credits  fulltime  state
  1    F      23     S        15        Y       MN
  2    F      21     S        15        Y       WI
  3    F      22     S         9        N       MN
  4    F      35     M         2        N       MN
  5    F      22     M        13        Y       MN
  6    F      25     S        13        Y       WI
  7    M      20     S        13        Y       MN
  8    M      26     M        15        Y       WI
  9    M      27     S         5        N       MN
 10    M      23     S        14        Y       IA
 11    M      21     S        14        Y       MN
 12    M      29     M        15        Y       MN
The MEANS Procedure

Variable     N            Sum           Mean
---------------------------------------------
age         12    294.0000000     24.5000000
credits     12    143.0000000     11.9166667
---------------------------------------------

The FREQ Procedure

                                  Cumulative    Cumulative
gender    Frequency    Percent     Frequency       Percent
----------------------------------------------------------
F                 6      50.00             6         50.00
M                 6      50.00            12         100.0
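The PROC MEANS numbers for age can be reproduced by hand, one observation at a time. This sketch (the dataset name `agecheck` and the variables `n`, `sum_age`, and `mean_age` are hypothetical) reads the 12 ages from the example and keeps a running count, sum, and mean; the last observation should show n = 12, sum_age = 294, and mean_age = 24.5, matching the MEANS output above.

```sas
* Recompute the PROC MEANS results for age with a DATA step;
DATA agecheck;
   INPUT age @@;                 /* @@ reads several values from one data line */
   n + 1;                        /* running count of observations */
   sum_age + age;                /* running sum; the sum statement retains it */
   mean_age = sum_age / n;       /* running mean */
   DATALINES;
23 21 22 35 22 25 20 26 27 23 21 29
;
RUN;

PROC PRINT DATA=agecheck;
RUN;
```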