The GzLM and SAS
Or why it’s a necessary evil to learn code!
Keith Lewis
Department of Biology
Memorial University, St. John’s, Canada
4-Oct-07
GzLM Presentation, BIOL 7932
Variables, Links, and Models
(Introduction to Categorical Data Analysis, A. Agresti 1996)
R.V. = response variable; E.V. = explanatory variable
R.V.       E.V.          Error      Link       Model
Ratio      Ratio         Normal     Identity   Linear Reg.
Ratio      Categorical   Normal     Identity   ANOVA
Ratio      Mixed         Normal     Identity   ANCOVA
Poisson    Mixed         Poisson    Log (ln)   Log-linear
Poisson    Ratio         Poisson    Identity   Poisson Reg.
Binomial   Mixed         Binomial   Logit      Logistic Reg.
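The Link column names the function that connects the mean of the response to the linear predictor. The Python sketch below is an illustration added to these notes, not part of the original slides (the `LINKS` dictionary is a hypothetical name). It writes out the three links in the table together with their inverses, which turn a fitted linear predictor back into a mean on the response scale.

```python
import math

# Each entry holds (link g, inverse link g^-1).
# g maps the response mean (mu) onto the linear-predictor scale (eta);
# g^-1 maps a fitted eta back to the response scale.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),  # requires mu > 0 (Poisson means)
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),      # requires 0 < mu < 1
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # back to a probability
    ),
}

for name, (link, inverse) in LINKS.items():
    eta = link(0.25)
    print(f"{name:8s} g(0.25) = {eta:.4f}, back-transformed: {inverse(eta):.4f}")
```

Each round trip returns the starting mean; only the scale on which the model is linear changes.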
SAS PROCs: the basics
data [dataset];
  infile '[filename]';
  input [variables];
proc [glm or genmod];
  model [model];
run;
SAS PROC GLM - Linear Regression
data nest97;
  infile 'e:\testdata\97exp1.prn';
  input lake treat type pred n;
proc glm;
  model pred = lake treat type;  /* no CLASS statement: every predictor is continuous */
run;
SAS PROC GLM - ANOVA
data nest97;
  infile 'e:\testdata\97exp1.prn';
  input lake treat type pred n;
proc glm;
  class lake treat type;  /* CLASS treats these numeric codes as categories */
  model pred = lake treat type;
run;
SAS PROC GLM - ANOVA
data nest97;
  infile 'e:\testdata\97exp1.prn';
  input lake $ treat $ type $ pred n;  /* $ reads lake, treat, type as character */
proc glm;
  class lake treat type;
  model pred = lake treat type;
run;
SAS PROC GLM - ANCOVA
data nest97;
  infile 'e:\testdata\97exp1.prn';
  input lake treat type pred n;
proc glm;
  class treat type;  /* lake is not listed, so it enters as a continuous covariate */
  model pred = lake treat type;
run;
SAS PROC GENMOD - Log-Linear
data nest97;
  infile 'e:\testdata\97exp1.prn';
  input lake treat type pred n;
proc genmod;
  class lake treat type;
  /* Poisson errors, log link; TYPE1 and TYPE3 request likelihood-ratio tests */
  model pred = lake treat type / dist=poisson link=log type1 type3;
run;
SAS PROC GENMOD - Logistic Regression
data nest97;
  infile 'e:\testdata\97exp1.prn';
  input lake treat type pred n;
proc genmod;
  class lake treat type;
  /* events/trials syntax: pred successes out of n trials */
  model pred/n = lake treat type / dist=binomial link=logit type1 type3;
run;
A full example
data an_01;
infile 'C:\Documents and Settings\Micro-Tech Customer\My Documents\MyWork\thesis\SAS\ch4\An_2000a.csv' firstobs=2 delimiter = ',';
input park $ site $ grid $ nest $ dp vt;
proc genmod;
class park site grid nest;
model dp = park|grid|nest / dist=bin link=logit type1 type3;
/*make obstats out=keith noprint;*/
title 'Schmidts model, 2000 with contrasts';
lsmeans park grid nest;
contrast 'bird v control' nest 1 -1 0;
contrast 'contrl v large' nest 0 1 -1;
estimate 'contrl v large' nest 0 1 -1;
estimate 'bird v control' nest 1 -1 0;
estimate 'bF v bS' park 1 -1;
estimate 'con v food' grid 1 -1;
run;
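Each CONTRAST or ESTIMATE line above weights the levels of one class effect, in level order, and sums the result. As a rough illustration only, the Python sketch below applies such coefficient lists to made-up logit-scale level means. `nest_lsmeans`, `apply_coefficients`, and `is_comparison` are hypothetical names, and this is not how GENMOD computes its estimates internally (it works from the model's parameter estimates). Coefficients that sum to zero compare levels rather than add them up.

```python
import math

# Hypothetical logit-scale means for the three nest levels
# (illustrative numbers only, not results from the slides), in level order.
nest_lsmeans = [0.40, -0.10, -0.60]

def apply_coefficients(coefs, means):
    """Weighted sum of level means, one coefficient per level."""
    if len(coefs) != len(means):
        raise ValueError("need exactly one coefficient per level")
    return sum(c * m for c, m in zip(coefs, means))

def is_comparison(coefs, tol=1e-12):
    """Coefficients summing to zero compare levels instead of adding them."""
    return abs(sum(coefs)) < tol

for label, coefs in [("bird v control", [1, -1, 0]),
                     ("contrl v large", [0, 1, -1]),
                     ("sum, not a comparison", [1, 1, 0])]:
    value = apply_coefficients(coefs, nest_lsmeans)
    # For a simple difference of two levels on the logit scale,
    # exp(value) is an odds ratio.
    odds_ratio = round(math.exp(value), 3) if is_comparison(coefs) else None
    print(label, is_comparison(coefs), round(value, 3), odds_ratio)
```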
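Deviance and G-tests
• GzLMs are fitted by maximum likelihood estimation (MLE)
• Deviance: $D = -2\ln\left[\frac{L(\text{current model})}{L(\text{saturated model})}\right]$, where $L$ is the maximized likelihood
• $G = D(\text{model without the variable}) - D(\text{model with the variable})$
• G plays the role the F-test plays in GLM: it is compared with a chi-square distribution whose df equals the number of parameters the variable adds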
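To make the deviance and G definitions concrete, the Python sketch below uses hypothetical toy data (not the thesis data). For ungrouped 0/1 outcomes the saturated model fits every observation exactly, so its log-likelihood is 0 and D reduces to -2 times the log-likelihood of the current model. The sketch then forms G for adding a two-level factor. It assumes the closed-form maximum likelihood fits for these two simple models (the overall proportion, and the per-group proportions) instead of calling a fitting routine, and uses the exact df = 1 chi-square tail, erfc(sqrt(G/2)).

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y under fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def deviance(y, p):
    """D = -2 ln[L(current) / L(saturated)]; saturated log-likelihood is 0 here."""
    saturated_loglik = 0.0
    return -2.0 * (bernoulli_loglik(y, p) - saturated_loglik)

def chisq_sf_df1(x):
    """Upper-tail chi-square probability with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical toy data: binary outcome y and a two-level factor.
y = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
group = ["a"] * 6 + ["b"] * 6

# Model without the factor: one common probability (MLE = overall mean).
p0 = sum(y) / len(y)
d_without = deviance(y, [p0] * len(y))

# Model with the factor: MLE fitted values are the per-group proportions.
props = {g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
         for g in set(group)}
d_with = deviance(y, [props[g] for g in group])

g_stat = d_without - d_with      # adding a term cannot raise the deviance
p_value = chisq_sf_df1(g_stat)   # a two-level factor adds one parameter
print(f"D without = {d_without:.3f}, D with = {d_with:.3f}, "
      f"G = {g_stat:.3f}, p = {p_value:.4f}")
```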
GENMOD output
LR Statistics For Type 1 Analysis
Source            Deviance   DF   ChiSquare   Pr > ChiSq
Intercept         321.4338
park              319.7385    1        1.70       0.1929
grid              314.1447    1        5.59       0.0180
park*grid         313.5346    1        0.61       0.4348
nest              310.1887    2        3.35       0.1877
park*nest         310.1033    2        0.09       0.9582
grid*nest         306.9164    2        3.19       0.2032
park*grid*nest    306.3648    2        0.55       0.7590
Example (park): G = 321.4338 - 319.7385 = 1.70; df = 1, p = 0.1929
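Because each Type 1 chi-square is the drop in deviance from the row above, the table can be checked by hand. The Python sketch below is an illustration, not part of the original slides: it copies the printed deviances, takes successive differences, and converts each to a p-value. To stay dependency-free it uses the exact closed-form chi-square tail for 1 and 2 df only, which are the only df in this table; results should agree with the printed values up to rounding.

```python
import math

def chisq_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")

# (source, deviance after the term is added, df), copied from the Type 1 table.
type1 = [
    ("Intercept", 321.4338, None),
    ("park", 319.7385, 1),
    ("grid", 314.1447, 1),
    ("park*grid", 313.5346, 1),
    ("nest", 310.1887, 2),
    ("park*nest", 310.1033, 2),
    ("grid*nest", 306.9164, 2),
    ("park*grid*nest", 306.3648, 2),
]

rows = []
for (_, dev_before, _), (source, dev_after, df) in zip(type1, type1[1:]):
    g = dev_before - dev_after  # drop in deviance from adding this term
    rows.append((source, df, g, chisq_sf(g, df)))

for source, df, g, p in rows:
    print(f"{source:<15} df={df}  G={g:.2f}  p={p:.4f}")
```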
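GENMOD output
LR Statistics For Type 3 Analysis
Source            DF   ChiSquare   Pr > ChiSq
park               1        2.62       0.1052
grid               1        7.45       0.0064
park*grid          1        0.81       0.3672
nest               2        3.45       0.1783
park*nest          2        0.13       0.9391
grid*nest          2        3.37       0.1853
park*grid*nest     2        0.55       0.7590
Type 1 tests are sequential (terms enter in the order of the MODEL statement); Type 3 tests assess each term after all the others, which is why only the last term, park*grid*nest, has the same statistic in both tables.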
Why we use GzLM: Same Data, Same Distribution
Data     PROC     Error / link                   Source   P-value
limpet   GLM      normal error, identity link    Sp       .1942
                                                 Se       .0004
                                                 Sp*Se    .2966
limpet   GENMOD   dist=normal, link=identity     Sp       .1627
                                                 Se       .0001
                                                 Sp*Se    .2493
From Sokal and Rohlf 1995, Box 11.2
Why we use GzLM: Same Data, Different Distribution
Data      PROC     Error / link                    Source   P-value
Anest97   GLM      normal errors, identity link    Lake     .0505
                                                   Treat    .3632
                                                   Type     .4915
                                                   TT*TY    .8619
Anest97   GENMOD   dist=binom., link=logit         Lake     .0001
                                                   Treat    .1229
                                                   Type     .2435
                                                   TT*TY    .8098
(K. Lewis, M.Sc. data)
See Lewis 2005, Oikos
SAS v. R
• SAS
  – Powerful
  – Widely used
  – Learning curve
  – Expensive
• R
  – Powerful
  – “limited” use
  – Learning curve
  – Free
• Resources
  – Peter Earle
  – The web!!!!
References
• Criteria:
  – Readability
  – Examples with the software code!
• Agresti, A. 1996. Introduction to Categorical Data Analysis. Wiley & Sons, New York.
• Littell, R.C., et al. 2002. SAS for Linear Models, 4th ed. SAS Institute Inc., Cary, NC.