PROC MIXED - Research Triangle SAS Users Group

Download Report

Transcript PROC MIXED - Research Triangle SAS Users Group

PROC MIXED
Underlying Ideas
with Examples
D. A. Dickey
NCSU
Random or Fixed ?
RANDOM
FIXED
Levels:
Selected at random
Finite number of
from infinite population possibilities
Another Experiment
Different selections
from same population
Same Levels
Goal
Estimate variance
components
Compare means
Inference
All levels in population Only levels used in the
experiment.
Poll:
5 Clinics
4 Doctors/Clinic
3 Patients/Doctor
3 Drugs
(1 per patient)
Yijk = m + tk + Ci + Dij + Pijk
F
F
F
F
R
R
R
R
Twins: One gets SAS training method 1, the other gets method 2
Response Y = programming times
PROC MIXED Model
Model
 38.25 
 33.00 


 28.75 


30.00


 46.50 


55.25


Y
1
1

1

1
1

1
=
;
Random
0 1
1 0 
m 
0 1  
  M1 
1 0
 M 2 

1 0

0 1
Xb
1 0 0
Variance of g is G = 
 2
0
1
0

 F
0 0 1


1
1

0

0
0

0

;
Repeated ;
0 0
 e1 
e 
0 0 
 2
 F1 
 e3 
1 0  
F

 
 2
1 0  
e4 
 F3 
 e5 
0 1
 

0 1
 e6 
Zg
,Variance of e is R =
+
e
1

0
0

0
0

0
0 0 0 0 0

1 0 0 0 0
0 1 0 0 0 2

0 0 1 0 0
0 0 0 1 0

0 0 0 0 1 
PROC MIXED DATA=TWINS;
CLASS FAMILY METHOD;
MODEL TIME = METHOD;
RANDOM FAMILY;
Covariance Parameter Estimates
Cov Parm
family
Residual
Estimate
21.2184
40.8338
Type 3 Tests of Fixed Effects
Effect
method
Num
DF
1
Den
DF F Value
19
9.60
Pr > F
0.0059
Intraclass correlation (related to heritability)
2F /(2F + 2)
Estimated as 21.2/62 or about 1/3.
Q: Why not usual (Pearson) correlation?
BLUP
Di = Family mean – m  Fi + ei.
best estimate of Fi = ?
Variance of (Fi – b Di) is (1-b)22F + b2 2/2
Use b = 2F /(2F + 2/2)
Estimate: b = 21.2/(21.2 + 40.8/2) = 0.510
Overall mean + 0.510(Family i mean – Overall mean)
PROC GLM DATA=TWINS;
CLASS FAMILY METHOD;
MODEL TIME = FAMILY METHOD;
LSMEANS FAMILY;
PROC MIXED DATA=TWINS;
CLASS FAMILY METHOD;
MODEL TIME = METHOD;
RANDOM FAMILY;
ESTIMATE "1 " intercept 1 | family 1;
ESTIMATE "2 " intercept 1 | family 0 1;
MEANS
(GLM)
and
BLUPs
(MIXED)
REML Estimation
Y: {10, 11, 12, and 15 } mean 12, deviations -2, -1, 0, 3
SSq = 4+1+0+9 = 14
Estimate 2
MLE: 14/4 = 3.5
Unbiased: 14/3 = 4.67
( MLE would be unbiased IF we knew and used m )
 0 

 Z1 
1 1 1 1  Y1 
 

 

 
Z
Y
1

1

1
1
0
1
 2  
  2  ~ N    , I 2 
 0 

 Z 3  2 1 1 1 1  Y3 
  

 

 
Z
Y
1
1
1
1
2
m

 4 
 4
 

Use MLE of {Z1, Z2, Z3 } = {-3,1,-2}  SSq = 14, “n”=3, MLE is 14/3
MLE of {Z1, Z2, Z3 } is REML ! Known mean 0.
MLE
(too narrow)
REML
Just Right!
off center
too wide
Y: {10, 11, 12, and 15 }
REML vs. ML
2
m
Tests for Variance Components
(1) Wald test: very approximate - not recommended 
(2) Likelihood ratio test – next best method 
-2 REML log likelihood for full, reduced model
Difference ~ Chi-Square (df = omitted parameter count)
(Self & Liang: divide p-value by 2 for single variance
component test)
(3) When available use GLM F test
Same as method=type3 in MIXED 
Available for simple variance component structures.
A small Monte Carlo: RCB with 10 blocks, 9 trts.
Y = p-value from Wald test (division by 2)
X = p-value from PROC GLM (exact F test  )
Error variance 1
Block variance
1/8, 1/4, 1/2, 1, 2, 4
1000 reps
A small Monte Carlo: RCB with 10 blocks, 9 trts.
Y = p-value from Wald test (division by 2)
X = p-value from PROC GLM (exact F test  )
“Jittered p-values for F
Error variance 1
Block variance
0, 1/8, 1/4, 1/2, 1, 2, 4
1000 reps
Unbalanced Data
SUBJ 
Ear plug
A
I
25 (L) 19 (L)
C
22 (R)
D
E
29 (R)
8 (R) 7 (L)
II
III
B
7 (R)
F
G
16 (R) 25 (L)
23 (L) 16 (R)
14 (L) 12 (L)
I vs. III free of subject effects for red data.
Misses info in other data.
24 (R)
proc glm; class plug worker;
model loss = worker plug; Random Worker;
Estimate "I vs III - GLM" Plug -1 0 1; run;
proc mixed; class plug worker;
model Loss=Plug; Random Worker;
Estimate "I vs III - Mixed" Plug -1 0 1; run;
Covariance Parameter
Estimates
Cov Parm
Estimate
worker
Residual
37.578
6.1674
GLM
Source
worker
plug
DF
6
2
Parameter
I vs III - GLM
Type III SS
451.9062500
62.6562500
Estimate
-4.8125
F Value Pr > F
12.21 0.0074
5.08 0.0625
Standard
Error t Value Pr > |t|
1.9635
-2.45
0.0579
Type 3 Tests of Fixed Effects
Num
Den
Effect
DF
DF
F Value
Pr > F
plug
2
5
5.79
0.0499
Estimates
Standard
Label
Estimate
Error
DF
t Value Pr > |t|
I vs III - Mixed
-5.2448
1.9347
5
-2.71
0.0422
SPLIT PLOT
six dishes / aquarium
one plant / dish
soil x variety combinations
4 Aquariums,
2 aerated
2 not
ANOVA
Soil
1
1
2
2
3
3
Variety
1
2
1
2
1
2
Source
Air
Error A
V
S
VA
VS
AS
AVS
Error B
Variety j (j=1..b)
Aeration i (i=1..a)
Soil k (k=1…c)
Yijkl  m  Ai  dil
V j  Sk
 eijkl
+ interactions
Soil 1 aerated
Y11  m A1  d1  V  S1  ... e11
Error terms:
Y21  m  A2  d2  V  S1 ...e21
Soil 1 not aerated
Var{d1  d2  e11
2 2 d 2 2
 e21 }  r  br
Var{d1  d2  e11  e21 }  2 b 2 d   2 
br
b=2 varieties
Expected mean squares:
error A: tank(air)
error B
 2 c[b d ]

2
2
r=2 reps
MS(B) + [MS(A)-MS(B)]/c = (1/3)[MS(A)+2 MS(B)]
(1/3)[20.83333 + 2(7.7333)] = 12.10
standard error:
2(12.10) / 4  2.45968
(c=3)
(1/3)[20.83333 + 2(7.7333)] = 12.10
MS(B) + [MS(A)-MS(B)]/c = (1/3)[MS(A)+2 MS(B)]
df=?
(c=3)
Satterthwaite !
1
2
MS ( A) MS ( B )  12.10
3
3
1
2
2
1
2
2
2
[ MS ( A)] [ MS ( B )]2
[
20
.
833
]
[
7
.
733
]
3
3
 3
 3

 26.77
df ( A)
df ( B )
2
10
df  12.12 /26.775.4692
Automatic in MIXED!  (ddfm=satterthwaite)
OK regardless of summing to 0 assumptions!
PROC MIXED;
CLASS VAR AQUARIUM SOIL AIR;
MODEL YIELD = AIR SOIL VAR SOIL*VAR
AIR*SOIL AIR*VAR AIR*SOIL*VAR /
DDFM=SATTERTHWAITE;
RANDOM AQUARIUM(AIR);
ESTIMATE "SOIL 1: AIR EFFECT"
AIR -1 1 AIR*SOIL -1 1 0 0 0 0;
RUN;
Covariance Parameter Estimates
Cov Parm
Estimate
AQUARIUM(AIR)
2.1833
Residual
7.7333
Type 3 Tests of Fixed Effects
Num
Den
DF
DF
F Value
Pr > F
AIR
1
2
16.20
0.0565
SOIL
2
10
7.87
0.0088
VAR
1
10
24.91
0.0005
VAR*SOIL
2
10
0.04
0.9631
SOIL*AIR
2
10
1.08
0.3752
VAR*AIR
1
10
4.22
0.0669
VAR*SOIL*AIR
2
10
0.23
0.7973
Effect
Standard
Label
SOIL 1: AIR EFFECT


Estimate
Error
DF
t Value
Pr > |t|
5.2500
2.4597
5.47
2.13
0.0812
Random Coefficient Models
the basic idea
mistakes
Dave
Average programmer
a0 + b0 t
Joy
Program writing time
Line for individual j: (a0 + aj) + ( b0 + bj )t
  0    A2  AB  
aj 
  ~ N   , 

2 
b 


 j
  0    AB  B  
Random Coefficient Models
Collars measure activity levels, several bears.
Random coefficient model:
Average daily activity pattern all bears
Individual patterns, one per bear
Assume sinusoidal function
Smooth across midnights, periodic
Fun with trigonometry:
a sin(wt + d) =
A sin(wt) + B cos(wt)
where
A = a cos(d)
B = a sin(d)
Moral: put in both to calibrate
amplitude and phase to data.
Random Coefficient Models
When encountering active bears, remember:
You don’t have to outrun the bear,
you just have to outrun your slowest
friend.
Put in fundmental pair (w = 2p/24)
and harmonics (wj = 2pj/24), j =2,3,…
coefficients Aj, Bj
Assume fixed A0’s and B0’s for the average bear.
(model statement)
Assume random (Ak,Bk) ~ N(0, I2 Trig2) for
individual bear k’s random adjustments.
(random statement) ……
ods output solutionR=BLUPbear;
proc mixed data=bears covtest;
class bear;
model Y = s1--c4/solution outpm=means outp=pbear;
random bear;
random s1*bear c1*bear s2*bear c2*bear s3*bear
c3*bear s4*bear c4*bear
/type=toep(1) solution; run;
quit;
Note: s1, c1 are fundamental sine, cosine, others are harmonics.
Toeplitz:
(toep)
a b c d


b
a
b
c


c b a b


d c b a


a

0
toep(1): 
0

0

0
a
0
0
0
0
a
0
0

0
0

a 
Fixed Diurnal Pattern for
Average Bear
Y = probability of activity
X = time of day
Adding in Random Bear Effects
All Individual Bears
Recall: BLUPs were output to a dataset
proc print data=blupbear;
where Probt<0.001; run;
Obs
Effect
bear
Estimate
StdErr
Pred
1
22
24
40
42
60
65
91
94
101
bear
s2*bear
s3*bear
s2*bear
s3*bear
s3*bear
s1*bear
bear
s2*bear
s1*bear
192
396
396
414
414
432
437
465
465
491
-0.1189
0.08687
-0.1047
-0.1601
0.1308
-0.1094
-0.09941
-0.1702
0.1461
0.09137
0.02872
0.02339
0.02339
0.03248
0.03248
0.03247
0.02134
0.03234
0.03246
0.02340
tValue
Probt
-4.14
3.71
-4.47
-4.93
4.03
-3.37
-4.66
-5.26
4.50
3.91
<.0001
0.0002
<.0001
<.0001
<.0001
0.0008
<.0001
<.0001
<.0001
<.0001
The End
(I bearly finished on time)
Appendix: Random & Repeated?
(4 visits per patient)
Model Y=drug ; Random patient ;
Repeated visit / type=AR(1);
Y =
1

1
1
Y  Xb  
1
0



0
0
0
0
1

 e11 
 
0

 e12 
0
 P1   e13 

0    
 P2    e14 
0    
P3
e
0    21 
 e22 

 

  
Xb
 Zg +
e
1

1
1
  P1 

  
2
2

Var Z  P2   ZI 3 x 3 Z  P   P  1
0
  P 
  3 

0


 1


AR(1) :  2

 3


1

2
1 1 1 0
0
1 1 1 0
1 1 1 0
0
0
1 1 1 0
0 0 0 1
0
1
0 0 0 1
1

2

1

3 

2
  2
 Visit


1 











PROC MIXED Model
Model
 38.25 
 33.00 


 28.75 


30.00


 46.50 


55.25


Y
1
1

1

1
1

1
=
;
Random
0 1
1 0 
m 
0 1  
  M1 
1 0
 M 2 

1 0

0 1
Xb
1 0 0
Variance of g is G = 
 2
0
1
0

 F
0 0 1


1
1

0

0
0

0

;
Repeated ;
0 0
 e1 
e 
0 0 
 2
 F1 
 e3 
1 0  
F

 
 2
1 0  
e4 
 F3 
 e5 
0 1
 

0 1
 e6 
Zg
,Variance of e is R =
+
e
1

0
0

0
0

0
0 0 0 0 0

1 0 0 0 0
0 1 0 0 0 2

0 0 1 0 0
0 0 0 1 0

0 0 0 0 1 
V12x12 = Var(Y) = Var( Zg + e )
Block Diagonal, three 4x4 blocks
1

2 1
One block of V :  P 
1

1

1 1 1  1
 
1 1 1  
 2

1 1 1



1 1 1   3

1

2
This is NOT just another AR(1)
CAN use both random and repeated statements!
2

1

3 

2
  2
 Visit


1 
Repeated /Type = CS;
*(compound symmetry);
V12x12 = Var(Y) = Var( Zg + e )
Block Diagonal, three 4x4 blocks
A

B
Com poundSym m etryim plies 
B

B

1

1
One block of V :  P2 
1

1

1 1 1  A B
 
1 1 1  B A

1 1 1  B B
 
1 1 1  B B
B
A
B
B
B
B
A
B
B B   A   P2
 
B B   B   P2

A B   B   P2
 
B A   B   P2
B

B
B

A 
B   P2
B   P2
A   P2
B   P2
B   P2
A   P2
B   P2
B   P2
B   P2 

B   P2 
B   P2 
A   P2 
This is just another CS
Model Y=drug ; Random patient ;
Repeated visit / type=CS;
2

Can’t converge – can’t separate
P from A, B
Repeated /Type = CS;
*(compound symmetry);
V12x12 = Var(Y) = Var( Zg + e )
Block Diagonal, three 4x4 blocks
18 5 5 5   A   P2

 
2
 5 18 5 5   B   P
V: 

2

5 5 `8 5
B


P

 
 5 5 5 18   B   2

 
P
B   P2
B   P2
A   P2
B   P2
B   P2
A   P2
B   P2
B   P2
A = 3, B = 16, P22
A = 1, B = 14, P24
A = 5, B = 18, P20
No unique solution!
B   P2 

2
B  P 
B   P2 
A   P2 
18

14
12

 11

14 12 11 
1


18 14 12 
2 1
 P 

14 18 14
1



1
12 14 18 

1 1 1  1
 
1 1 1  
 2

1 1 1



1 1 1   3

1
2


2
Fun with algebra:
P2Visit218,
P2 Visit2  14,
P2 2Visit2  12
(1) Visit2  18144
(1) Visit2 14122, 2/4  0.5
0.5 Visit2 4, Visit2 8,
P210,
P2Visit218
Visit210
Visit28, =0.5 (no choice – unique solution)
1

3 

2
  2
 Visit


1 