Generalized Linear Mixed Models

Download Report

Transcript Generalized Linear Mixed Models

Data: Challenger
(Binomial with random effects)
Data: Crab mating patterns
(Poisson Regression,
ZIP model, Negative Binomial)
Flu Data:
(Binomial with random effects)
Data: Typists
(Poisson with random effects)
D.A.D.
Mixed (not generalized) Models:
Fixed Effects
and
Random Effects
D.A.D.
SAS Global Forum 2010
“Generalized”  non normal distribution
Binary for probabilities: Y=0 or 1
Mean p
Pr{Y=j}= pj(1-p)(1-j)
Link: L=ln(p/(1-p)) = “Logit”
Range (over all L): 0<p<1
Poisson for counts: Y in {0,1,2,3,4, ….}
Mean count l
Pr{Y=j} = exp(- l )(lj)/(j!)
Link: L = log(l)
Range (over all L): l >0
D.A.D.
SAS Global Forum 2010
Generalized (not mixed) linear models.
Use link L = g(E{Y}), e.g. ln(p/(1-p)) = ln(E{Y}/(1-E{Y})
Assume L is linear model in the inputs with fixed effects.
Estimate model for L, e.g. L=g(E{Y})=bo + b1 X
Use maximum likelihood
Example:
L = -1 + .18*dose
Dose = 10, L=0.8,
p=exp(0.8)/(1+exp(0.8))=
inverse link = 0.86
D.A.D.
SAS Global Forum 2010
Challenger was mission 24
From 23 previous launches we have:
6 O-rings per mission
Y=0 no damage, Y=1 erosion or blowby
p = Pr {Y=1} = f{mission, launch temperature)
Features: Random mission effects
Logistic link for p
proc glimmix data=O_ring;
class mission;
model fail = temp/dist=binomial s;
random mission;
run;
Generalized
Mixed
Demo
O_rings.sas
D.A.D.
SAS Global Forum 2010
Estimated G matrix is not positive definite.
Covariance Parameter Estimates
Cov
Parm
mission
Estimate
2.25E-18
Standard
Error
.
Solutions for Fixed Effects
Effect
Intercept
temp
Estimate
Error
DF
t Value
Pr > |t|
5.0850
-0.1156
3.0525
0.04702
21
115
1.67
-2.46
0.1106
0.0154
Just logistic regression – no mission variance component
D.A.D.
SAS Global Forum 2010
Flu Data CDC
Active Flu Virus
Weekly Data % positive
data FLU;
input fluseasn year t week pos specimens;
pct_pos=100*pos/specimens; logit=log(pct_pos/100/(1+(pct_pos/100)));
label pos = "# positive specimens";
label pct_pos="% positive specimens";
label t = "Week into flu season (first = week 40)";
label week = "Actual week of year";
label fluseasn = "Year flu season started";
logit
pct. pos.
Demo
Get_Flu.sas
“Sinusoids”
S(j) = sin(2pjt/52)
(1) GLM all effects fixed
(harmonic main effects
insignificant)
Logit scale
C(j)=cos(2pjt/52)
PROC GLM DATA=FLU;
class fluseasn;
model logit = s1 c1
fluseasn*s1 fluseasn*c1 fluseasn*s2 fluseasn*c2
fluseasn*s3 fluseasn*c3 fluseasn*s4 fluseasn*c4;
output out=out1 p=p;
data out1; set out1; P_hat = exp(p)/(1+exp(p));
label P_hat = "Pr{pos. sample} (est.)"; run;
Demo
Flu_GLM.sas
(2) MIXED analysis
on logits
Random harmonics.
Normality assumed
Logit scale
PROC MIXED DATA=FLU method=ml; ** reduced model;
class fluseasn;
model logit = s1 c1 /outp=outp outpm=outpm ddfm=kr;
random intercept/subject=fluseasn;
random s1 c1/subject=fluseasn type=toep(1);
random s2 c2/subject=fluseasn type=toep(1);
random s3 c3/subject=fluseasn type=toep(1);
random s4 c4/subject=fluseasn type=toep(1); run;
Probability scale
Demo
Flu_MIXED.sas
(3) GLIMMIX
analysis
Random harmonics.
Binomial assumed
(overdispersed –
lab effects?)
PROC GLIMMIX DATA=FLU; title2 "GLIMMIX Analysis";
class fluseasn;
model pos/specimens = s1 c1 ; * s2 c2 s3 c3 s4 c4;
random intercept/subject=fluseasn;
random s1 c1/subject=fluseasn type=toep(1);
random s2 c2/subject=fluseasn; ** Toep(1) - no converge;
random s3 c3/subject=fluseasn type=toep(1);
random s4 c4/subject=fluseasn type=toep(1);
random _residual_;
output out=out2 pred(ilink blup)=pblup
pred(ilink noblup) overallpearson = p_resid; run;
Mean – no BLUPs
Demo
Flu_GLIMMIX.sas
D.A.D.
SAS Global Forum 2010
Flu data
Binomial
random _residual_ does not affect
the fit (just standard errors)
Could try Beta distribution instead:
PROC GLIMMIX DATA=FLU; title2 "GLIMMIX Analysis";
class fluseasn;
model f = s1 c1 /dist=beta link=logit s;
random intercept/subject=fluseasn;
random s1 c1/subject=fluseasn type=toep(1);
random s2 c2/subject=fluseasn type=toep(1);
random s3 c3/subject=fluseasn type=toep(1);
random s4 c4/subject=fluseasn type=toep(1);
output out=out3 pred(ilink blup)=pblup
pred(ilink noblup)=overall
pearson=p_residbeta; run;
D.A.D.
SAS Global Forum 2010
Horseshoe Crab study
(reference: SAS GLIMMIX course notes):
Female nests have “satellite” males
Count data – Poisson?
Generalized Linear
Features (predictors):
Carapace Width, Weight, Color, Spine condition
Random Effect: Site
Mixed Model
Demo
Get_Horseshoe.sas
proc glimmix data=crab;
class site;
model satellites = weight width / dist=poi solution ddfm=kr;
random int / subject=site;
output out=overdisp pearson=pearson; run;
proc means data=overdisp n mean var;
var pearson; run;
proc univariate data=crab normal plot;
var satellites; run;
Fit Statistics
Gener. Chi-Square / DF
2.77
Cov Parm
Intercept
Subject Estimate
site
0.1625
Effect
Intercept
weight
width
Estimate
-1.1019
0.5042
0.0318
Pr > |t|
0.2527
0.0035
0.5229
N
Mean
Variance
--------------------------173 -0.0258264
2.6737114
--------------------------Histogram
# Boxplot
15.5+*
.*
.
12.5+*
.*
.**
9.5+**
.***
.**
6.5+*******
.********
.**********
3.5+**********
.*****
.********
0.5+*******************************
----+----+----+----+----+----+* may represent up to 2 counts
1
1
0
0
1
1
3
3
6
4
13
15
19
19
9
16
62
|
|
|
|
|
|
|
+-----+
|
|
|
|
*--+--*
|
|
+-----+
Zero Inflated ?
Demo
Crabs_OVERDISP.sas
D.A.D.
SAS Global Forum 2010
Zero Inflated Poisson (ZIP)
 p0 + (1  p0 )e l l0 / 0! p0 + (1  p0 )e  l for j  0
Pr{Y  j}  
l j
(
1

p
)
e
l / j!
for j  0
0

E{Y }  (1  p0 )l

E{Y (Y  1)}  (1  p0 ) j ( j  1)e l l j / j!

j 1
(1  p0 )l2  e l l j 2 /( j  2)!  (1  p0 )l2
j 2
2
  E{Y }  ( EY ) 2  (1  p0 )(l2 + l )  l2 (1  p0 ) 2 
2
(1  p0 )l + l2 p0 (1  p0 )   (1 + lp0 )  
 2   (1 + lp0 )  
D.A.D.
SAS Global Forum 2010
Zero Inflated Poisson (ZIP)
proc nlmixed data=crab;
parms b0=0 bwidth=0 bweight=0 c0=-2 c1=0 s2u1=1 s2u2=1;
x=c0+c1*width+u1; p0 = exp(x)/(1+exp(x));
eta= b0+bwidth*width +bweight*weight +u2;
lambda=exp(eta);
if satellites=0 then
loglike = log(p0 +(1-p0)*exp(-lambda));
else loglike =
log(1-p0)+satellites*log(lambda)-lambda-lgamma(satellites+1);
expected=(1-p0)*lambda; id p0 expected lambda;
model satellites~general(loglike);
Random U1 U2~N([0,0],[s2u1,0,s2u2]) subject=site;
predict p0+(1-p0)*exp(-lambda) out=out1; run;
D.A.D.
SAS Global Forum 2010
Zero Inflated Poisson (ZIP)
Parameter Estimates
Parameter
b0
bwidth
bweight
c0
c1
s2u1
s2u2
Estimate
2.7897
-0.0944
0.4649
13.3739
-0.5447
0.5114
0.1054
t
2.55
-1.65
2.38
4.42
-4.61
1.12
1.67
Pr>|t|
0.0268
0.1267
0.0366
0.0010
0.0008
0.2852
0.1239
Lower
Upper
0.3853 5.1942
-0.2202 0.0314
0.0347 0.8952
6.7078 20.0401
-0.8049 -0.2844
-0.4905 1.5133
-0.0339 0.2447
Demo
Crabs_ZIP.sas
D.A.D.
SAS Global Forum 2010
From fixed part of model, compute
Pr{count=j} and plot (3D) versus
Weight, Carapace width
D.A.D.
SAS Global Forum 2010
Another possibility: Negative binomial
Number of failures until kth success ( p=Prob{success} )
 k + j  1 k 1
 k + j  1 k
j
 p (1  p) p  
 p (1  p) j
Pr{Y  j}  
 k 1 
 k 1 
k (1  p)
k

, p
p
k+
(k +  )
  

p
k
2

l    :  2  l +  1 l2  l
k
as 1 / k  0
D.A.D.
SAS Global Forum 2010
3 trials before first success
Negative Binomial
Crab
beer
Crab
beer
Satellites
D.A.D.
SAS Global Forum 2010
Negative binomial:
In SAS, k is our 1/k
proc glimmix data=crab; class site;
model satellites = weight width / dist=nb solution ddfm=kr;
random int / subject=site; run;
Fit Statistics
-2 Res Log Pseudo-Likelihood
Generalized Chi-Square
Gener. Chi-Square / DF
539.06
174.83
1.03
Covariance Parameter Estimates
Cov Parm
Intercept
Scale
Effect
Intercept
weight
width
Estimate
-1.2022
0.6759
0.01905
Effect
weight
width
Subject
site
Estimate
0.09527
0.7659
Standard
Error
1.6883
0.3239
0.08943
Num
DF
1
1
Den
DF
156.6
166.2
DF
168.5
156.6
166.2
Std. Error
0.07979
0.1349
t Value
-0.71
2.09
0.21
F Value
4.35
0.05
Pr > F
0.0386
0.8316
Pr > |t|
0.4774
0.0386
0.8316
Demo
Crabs_NEGBIN.sas
Population average model vs. Individual Specific Model
8 typists
Y=Error counts (Poisson distributed)
ln(li)= ln(mean of Poisson) = +Ui for typist i.
conditionally (individual specific)
Distributions for Y, U~N(0,1) and =1
l=e=e1=2.7183
= mean for
“typical”
typist
D.A.D.
SAS Global Forum 2010
Population average model
Expectation ||||| | | of individual distributions averaged
across population of all typists.
Run same simulation for 8000 typists, compute mean
of conditional population means, exp(+U).
The MEANS Procedure
Variable
N
Mean
Std Dev Std Error
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
lambda
8000 4.4280478 6.0083831 0.067175
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
Z=(4.428-2.7183)/0.06718 = 25.46 !!
Population mean is not e
Conditional means, +U, are lognormal.
Log(Y)~N(1,1)  E{Y}=exp(+0.52) = e1.5 = 4.4817
Demo
Typists.sas
D.A.D.
SAS Global Forum 2010
Main points:
1. Generalized linear models with random effects are
subject specific models.
2. Subject specific models have fixed effects that
represent an individual with random effects 0
(individual at the random effect distributional means).
3. Subject specific models when averaged over the
subjects do not give the model fixed effects.
4. Models with only fixed effects do give the fixed effect
part of the model when averaged over subjects and
are thus called population average models.