PROC GLIMMIX: AN OVERVIEW

By William E. Jackman
• A new SAS/STAT product
• Experimental in SAS 9.1
• Production in SAS 9.2
• Successor to the %GLIMMIX macro
• Combines and extends statistical features found
in other SAS procedures
• Part of a succession of SAS procedures that
have extended the General Linear Model (GLM)
Regression Analysis Basics
• Y = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ + e
• y = Xβ + ε (matrix notation)
• ε ~ N(0, σ²Iₙ)
• Estimation by ordinary least squares (OLS)
• This is the essence of the General Linear Model (GLM)
• The Y's and the X's go by several names; the X's,
for example, are often called covariates
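As a short worked note on the estimation step, the OLS estimate has a closed form under the model y = Xβ + ε. The sketch below assumes X has full column rank; the hat notation is introduced here for illustration and is not on the slide.

```latex
\hat{\beta}
  = \arg\min_{\beta}\,(y - X\beta)^{\top}(y - X\beta)
  = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\hat{\sigma}^{2}
  = \frac{(y - X\hat{\beta})^{\top}(y - X\hat{\beta})}{n - \operatorname{rank}(X)} .
```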
• The GLM underlies PROC REG and PROC GLM
• Both procedures use OLS to fit the GLM to
data with a continuous response variable
• Both make the same assumptions about residuals
• PROC REG has advantages for continuous
effects (regressors).
• PROC GLM has advantages for discrete effects
(regressors).
• Indicator (dummy) variables and interactions
* PROC REG: must be created in a DATA step
* PROC GLM: use the CLASS and MODEL statements
• Which procedure to use?
* Interested primarily in the effect of continuous
variables (covariates)? PROC REG has the advantage.
* Interested primarily in the effect of grouping
variables? PROC GLM has the advantage.
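The difference in how the two procedures handle indicator variables can be sketched as follows. This is only an illustration: the data set TRIAL and its variables Y, DOSE, and GROUP are hypothetical and do not come from the slides.

```sas
/* Hypothetical data set TRIAL: response Y, continuous DOSE,
   and a two-level character variable GROUP ('A' or 'B'). */

/* PROC REG: build the indicator and the interaction in a DATA step. */
data trial2;
  set trial;
  groupB      = (group = 'B');   /* 1 if group B, 0 otherwise */
  dose_groupB = dose * groupB;   /* dose-by-group interaction */
run;

proc reg data=trial2;
  model y = dose groupB dose_groupB;
run;
quit;

/* PROC GLM: the CLASS statement generates the indicators,
   and the interaction is written directly in the MODEL statement. */
proc glm data=trial;
  class group;
  model y = dose group dose*group;
run;
quit;
```

Both steps fit the same mean model; they differ in who does the coding of the grouping variable.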
• The generalized linear model (GzLM) extends
(or generalizes) the GLM.
• Introduced by Nelder and Wedderburn in 1972;
expanded by McCullagh and Nelder in 1989.
• Handles non-normal data from the exponential
family of distributions.
• Linearity is achieved through the link function.
• Implemented, for example, in PROC GENMOD.
• PROC GENMOD can also handle correlated
residuals (through its REPEATED statement).
General form of the GENMOD procedure:
  PROC GENMOD options ;
    CLASS variables ;
    MODEL response = effects / DIST= LINK= options ;
    REPEATED SUBJECT=subject-effect / options ;
  RUN ;
Example of the GENMOD procedure for Poisson regression:
  proc genmod data=skin ;
    class city age ;
    model cases = city age / offset=log_pop
          dist=poi link=log ;
  run ;
where log_pop is the log of the population.
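The offset variable has to exist in the data set before PROC GENMOD runs. A minimal sketch of how it might be created is shown below; the input data set SKIN_RAW and the variable name POP are assumptions made for illustration.

```sas
/* Hypothetical input: SKIN_RAW holds CASES, CITY, AGE, and the
   population at risk in a variable assumed to be named POP. */
data skin;
  set skin_raw;
  log_pop = log(pop);   /* natural log, used as the offset */
run;
```

With the log link, including log_pop as an offset means the model describes the rate of cases per unit of population: log(μ / pop) = Xβ.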
The generalized linear model (GzLM)
• Canonical link functions are the most common.
• They are obtained from the probability density function.
• They are the default in PROC GENMOD.
• For the Poisson distribution the default link
function is the log of the mean of the response variable.
• Link function: log(μ) = Xβ
• Inverse link function: μ = e^η, where η = Xβ
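To see where the canonical link comes from, the Poisson probability function can be written in exponential-family form. The symbol θ for the natural parameter is introduced here for illustration.

```latex
f(y;\mu) = \frac{\mu^{y}e^{-\mu}}{y!}
         = \exp\{\, y\log\mu - \mu - \log y! \,\},
\qquad y = 0, 1, 2, \dots

\theta = \log\mu
\;\;\Longrightarrow\;\;
g(\mu) = \log\mu = \eta = X\beta,
\qquad
\mu = g^{-1}(\eta) = e^{\eta}.
```

The coefficient of y in the exponent is the natural parameter, and setting it equal to the linear predictor gives the log link.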
Logistic Regression: A special case of the
generalized linear model (GzLM)
• The response variable follows a binomial distribution.
• The binomial is part of the exponential family, so the GzLM applies.
• The link function is the logit.
• logit(pᵢ) = ln(pᵢ / (1 - pᵢ))
• Can be fit with PROC GENMOD
• Input from David Schlotzhauer of SAS Institute
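A logistic regression in PROC GENMOD might be written as in the sketch below. The data set EXAMPLE and the variables Y (coded 0/1) and TRT are assumed here for illustration.

```sas
/* Hypothetical data set EXAMPLE with a 0/1 response Y and a
   classification variable TRT. DESCENDING makes the procedure
   model the probability that Y = 1 rather than Y = 0. */
proc genmod data=example descending;
  class trt;
  model y = trt / dist=binomial link=logit;
run;
```

The estimates are on the logit scale; applying the inverse link, p = 1 / (1 + exp(-η)), returns them to the probability scale.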
FURTHER EXTENSIONS OF THE GLM
• The GLM and GzLM cannot handle random effects.
• Fixed effects: interest lies only in the levels specified.
• Random effects: inference extends to other levels.
• PROC GENMOD and PROC LOGISTIC cannot
handle random effects.
PROC MIXED: An extension of the GLM
• Can handle random effects and correlated
errors
• Fixed-effects-only model: y = Xβ + ε
• Mixed model: y = Xβ + Zγ + ε
Mixed models distinguish between G-side
random effects and R-side random effects.
• G-side random effects correspond to
covariates (regressors) in the model which are
random.
• R-side random effects correspond to the
residuals in the model.
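The names come from the two covariance matrices in the standard linear mixed model formulation. The sketch below uses the usual assumptions (normal, mutually independent γ and ε); G and R are the conventional symbols and are not defined on the slide itself.

```latex
y = X\beta + Z\gamma + \varepsilon,
\qquad
\gamma \sim N(0, G),
\quad
\varepsilon \sim N(0, R),
\quad
\operatorname{Cov}(\gamma, \varepsilon) = 0

\operatorname{Var}(y) = Z G Z^{\top} + R
```

G-side effects contribute to the variance of y through ZGZ', while R-side effects enter through R directly.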
Example of PROC MIXED syntax:
  proc mixed ;
    class id time gender ;
    model z = gender age gender*age ;
    random intercept / subject=id ;
    *** G-side effects go here. ;
    repeated time / subject=id type=ar(1) ;
    *** R-side effects go here. ;
  run ;
PROC MIXED: fits the linear mixed model (LMM)
• PROC MIXED allows for random intercepts for
each subject.
• It models the correlation in the repeated measures
within each subject.
• It has a rich variety of covariance structures for
dealing with correlated residuals.
• Unlike GzLMs, LMMs require a normally
distributed response variable.
PROC GLIMMIX: PUTTING IT ALL TOGETHER
• Fits the generalized linear mixed model (GzLMM)
• Combines and extends features of GzLMs and
LMMs
• Enables modeling random effects and
correlated errors for non-normal data
The Generalized Linear Mixed Model (GzLMM)
• A linear predictor can contain random effects:
η = Xβ + Z γ
• The random effects are normally distributed
• The conditional mean, μ|γ, relates to the linear predictor through a
link function:
g(μ|γ) = η
• The conditional distribution (given γ) of the data belongs to the
exponential family of distributions.
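As one concrete instance of these three pieces, consider a logistic model with a random intercept for each cluster. The indices i (cluster) and j (observation within cluster) and the variance symbol are introduced here for illustration only.

```latex
y_{ij} \mid \gamma_i \sim \operatorname{Bernoulli}(\mu_{ij}),
\qquad
g(\mu_{ij}) = \log\frac{\mu_{ij}}{1-\mu_{ij}}
            = \eta_{ij}
            = x_{ij}^{\top}\beta + \gamma_i,
\qquad
\gamma_i \sim N(0, \sigma_{\gamma}^{2}).
```

Here the conditional distribution is Bernoulli, the link g is the logit, and the single random effect γᵢ shifts the linear predictor for every observation in cluster i.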
Other new features of PROC GLIMMIX include:
• low-rank smoothing based on mixed models
• new features for LS-means comparisons and
display
• SAS programming statements allowed within the
procedure
• fitting models to multivariate data with different
distributions or links
General form of the GLIMMIX procedure:
  PROC GLIMMIX options ;
    programming statements ;
    CLASS variables ;
    MODEL response = fixed-effects / DIST= LINK= options ;
    RANDOM random-effects / options ;
    RANDOM _RESIDUAL_ / options ;
  RUN ;
Like other mixed models, PROC GLIMMIX
distinguishes between G-side random
effects and R-side random effects.
• G-side random effects correspond to
covariates in the model which are random.
• R-side random effects correspond to the
residuals in the model.
Example of a GzLMM using PROC GLIMMIX
for Logistic Regression with Random Effects:
  proc glimmix data=example ;
    class trt clinic ;
    model y = trt / dist=binomial link=logit ;
    random clinic trt*clinic ;
    *** random intercept trt / subject=clinic ;
  run ;
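The commented-out RANDOM statement in the example is an alternative way of writing the same random effects. A sketch of the procedure using that form is shown below; the SOLUTION option is added here only to print the fixed-effect estimates and is not part of the original example.

```sas
proc glimmix data=example;
  class trt clinic;
  model y = trt / dist=binomial link=logit solution;
  /* Same random effects as "random clinic trt*clinic;",
     written with SUBJECT= so the data are processed by clinic. */
  random intercept trt / subject=clinic;
run;
```

Writing the effects with SUBJECT= tells the procedure that clinics are independent blocks, which is often more efficient for larger data sets.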
• This example cannot be handled by PROC
LOGISTIC since clinic is a random effect.
• For logistic regression with fixed effects only,
either PROC GLIMMIX or PROC LOGISTIC can be
used. Which should you use?
• More input from David Schlotzhauer of
SAS Institute.
Parameter Estimation Methods in PROC GLIMMIX
• The GLIMMIX procedure has two basic modes of
parameter estimation: GLM-mode and GLMM-mode.
• In GLM-mode, the data are never correlated and there
can be no G-side random effects.
• In GLMM-mode, there might be random effects,
correlated data, or both.
Parameter Estimation for generalized linear models
• Normal distribution: restricted maximum
likelihood
• All other known distributions: maximum
likelihood
• Unknown distributions: quasi-likelihood
Parameter Estimation for generalized linear
models with overdispersion
• Parameters are estimated using maximum
likelihood
• An overdispersion parameter can be
estimated from the Pearson statistic
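In PROC GLIMMIX, an overdispersion parameter is requested with a RANDOM _RESIDUAL_ statement. The sketch below reuses the SKIN data set and variables from the earlier PROC GENMOD example, on the assumption that the same Poisson model is wanted.

```sas
proc glimmix data=skin;
  class city age;
  model cases = city age / offset=log_pop dist=poisson link=log;
  random _residual_;   /* adds a multiplicative overdispersion parameter */
run;
```

With no G-side effects and no R-side covariance structure beyond this scale parameter, the model stays a generalized linear model with overdispersion rather than a mixed model.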
Parameter Estimation for generalized linear mixed models
• Pseudo-likelihood (restricted pseudo-likelihood,
METHOD=RSPL, is the default)
Using PROC GLIMMIX for Linear Mixed Models
• In this example, the response variable is normally distributed.
  proc glimmix data=grass ;
    class method variety ;
    model yield = method / dist=normal ;
    random variety method*variety ;
  run ;
• For this model, PROC GLIMMIX uses residual (restricted) maximum
likelihood, as does PROC MIXED.
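For comparison, the same linear mixed model might be written in PROC MIXED as in the sketch below, assuming the same GRASS data set and variables.

```sas
proc mixed data=grass;
  class method variety;
  model yield = method;            /* normal response is implied */
  random variety method*variety;   /* same G-side random effects */
run;
```

PROC MIXED has no DIST= option because it fits only normal-response models, and REML is its default estimation method.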
• PROC GLIMMIX can do much of what PROC
LOGISTIC, PROC MIXED, PROC REG, and PROC
GLM can do.
• Could be viewed as a “super PROC”
• Input from Jill Tao of the SAS Institute
PROC GLIMMIX versus PROC MIXED
Closely related but important differences
• PROC GLIMMIX is not PROC MIXED with a LINK= and a DIST=
option.
• PROC GLIMMIX models non-normal data. PROC MIXED does not.
• PROC GLIMMIX allows programming statements. PROC MIXED
does not.
• PROC GLIMMIX uses the RANDOM statement to model R-side
random effects. PROC MIXED uses the REPEATED statement to
model R-side random effects.
• PROC GLIMMIX does not support the Kronecker and heterogeneous
covariance structures as supported by PROC MIXED.
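The difference in how R-side effects are specified can be sketched by rewriting the earlier PROC MIXED example. The variable names are taken from that example, the G-side RANDOM statement is left out to keep the comparison short, and the PROC GLIMMIX version is an illustration rather than code from the slides.

```sas
/* PROC MIXED: R-side AR(1) structure through REPEATED. */
proc mixed;
  class id time gender;
  model z = gender age gender*age;
  repeated time / subject=id type=ar(1);
run;

/* PROC GLIMMIX: the same R-side structure through RANDOM
   with the RESIDUAL option. */
proc glimmix;
  class id time gender;
  model z = gender age gender*age;
  random time / subject=id type=ar(1) residual;
run;
```

In PROC GLIMMIX the RESIDUAL option (or the _RESIDUAL_ keyword) is what marks a RANDOM statement as R-side; without it the statement would define G-side effects.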
PROC GLIMMIX versus PROC GENMOD
PROC GLIMMIX
• fits unit-specific models with the G-side random effects
• fits population-average models without the G-side
effects. (Without the G-side effects, there is no way to
condition the response and make the estimates unit-specific.)
• provides sandwich estimators of covariance of fixed
effects through the EMPIRICAL option when the model
is processed by subjects.
• computes the parameter estimates by a pseudo-likelihood method.
PROC GLIMMIX versus PROC GENMOD
PROC GENMOD
• cannot accommodate random effects
• fits only population-average models
• computes the parameter estimates by a
moment-based method.
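A population-average model for clustered binary data might be requested in each procedure as sketched below. The data set VISITS and the variables Y, TRT, and ID are hypothetical, and because the two procedures use different estimation methods the results are not expected to be identical.

```sas
/* Hypothetical data set VISITS: 0/1 response Y, treatment TRT,
   and subject identifier ID. */

/* PROC GENMOD: population-average model through the REPEATED statement. */
proc genmod data=visits descending;
  class id trt;
  model y = trt / dist=binomial link=logit;
  repeated subject=id / type=exch;
run;

/* PROC GLIMMIX: R-side compound-symmetry structure, no G-side
   effects, and sandwich standard errors via EMPIRICAL. */
proc glimmix data=visits empirical;
  class id trt;
  model y(event='1') = trt / dist=binary link=logit;
  random _residual_ / subject=id type=cs;
run;
```

Adding a G-side statement such as "random intercept / subject=id;" to the PROC GLIMMIX step would turn it into a unit-specific model, which PROC GENMOD cannot fit.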
Applications Using the GLIMMIX Procedure
(from "Statistical Analysis with the GLIMMIX Procedure")
• Poisson Regression with Random Effects
• An example of Beta Regression
• Repeated Measures Data with Discrete
Response
• Introduction to Radial Smoothing
Applications are explained in detail in the SAS course.
Fitting Models to Multivariate Data in Which
Observations Do Not All Have the Same
Distribution or Link
• EXAMPLE: JOINT MODELS FOR BINARY AND
POISSON DATA
(from a paper by Oliver Schabenberger of the
SAS Institute)
data joint;
  length dist $7;
  input d $ patient age OKstatus response @@;
  if d = 'B' then dist = 'Binary';
  else dist = 'Poisson';
  /* only 3 data lines are shown */
  datalines;
B 1 78 1 0  P 1 78 1 9  B 2 60 1 0  P 2 60 1 4
B 3 68 1 1  P 3 68 1 7  B 4 62 0 1  P 4 62 0 35
B 5 76 0 0  P 5 76 0 9  B 6 76 1 1  P 6 76 1 7
;
proc glimmix data=joint;
  class patient dist;
  model response(event='1') =
        dist dist*age dist*OKstatus /
        noint s dist=byobs(dist);
  random int / subject=patient;
run;
• The previous slide showed modeling
correlations through G-side random effects. It
could also be done through R-side random
effects. This is presented in the SAS course
“Statistical Analysis with the GLIMMIX
Procedure” which expands upon this example.
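One way the R-side alternative might look is sketched below. It keeps the MODEL statement from the joint-model example and replaces the G-side random intercept with an R-side covariance structure for each patient; the choice of TYPE=CHOL is an assumption made for illustration, and the course may present it differently.

```sas
proc glimmix data=joint;
  class patient dist;
  model response(event='1') =
        dist dist*age dist*OKstatus /
        noint s dist=byobs(dist);
  /* R-side: the two responses from a patient are correlated through
     an unstructured covariance matrix in Cholesky parameterization. */
  random _residual_ / subject=patient type=chol;
run;
```

Without G-side effects this is a marginal model, so the fixed-effect estimates have a population-average rather than a patient-specific interpretation.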