Statistical Analysis of Repeated Measures Data Using SAS

Download Report

Transcript Statistical Analysis of Repeated Measures Data Using SAS

Lecture 3
Empirical Bayes and
Proc Mixed
Ziad Taib
Biostatistics, AZ
April 24, 2009
Name, department
1
Date
Outline of the lecture
1. Reminder
2. Inference for the random effects
3. Proc Mixed
Name, department
2
Date
1. Reminder
Vi
Name, department
3
Date
2. Inference for the Random Effects Empirical Bayes Inference
Name, department
4
Date
f ( yi , bi )
f (bi / yi ) 
f ( yi )
f ( yi / bi ) f (bi )

f ( yi )
 f ( yi / bi ) f (bi )
Name, department
5
Date
Name, department
6
Date
Name, department
7
Date
Comments
 The above EB estimate of the random effect can be obtained using a
set of equations
 It can be shown that using the EB estimate lead to Best Linear
Unbiased Prediction of linear combination of the form:
u   ' β  b ' bi
 When trying to predict the response of an individual, we can use:


Yˆi  X i ˆ  Z i bˆi  ...  iVi 1 X i ˆ  I ni  iVi 1 yi

 


Population average
Observed data
and we see that the observed data are shrunken towards the prior
average profile.
Name, department
8
Date
3. Statistical software
Name, department
9
Date
Software (cont’d)
 SAS – SPSS – BMDP/5v – ML3 – HLM – Splus – R can handle
correlated data but some are more restricted than others.
 Most packages offer a choice between ML and REML and optimisation
is often based on Newton-Raphson, the EM algorithm or Fisher
scoring.
 The user has to specify a model for the mean response that is linear in
the fixed effects and to specify a covariance structure. The user can
select a full parameterisation of the covariance structure (unstructured)
or choose among given covariance structures.
 The covariance structure is also influenced by the inclusion of random
effects and their covariance structure.
Name, department
10 Date
Software (cont’d)
 Output often includes:





history of optimisation iterations
estimates of fixed effects
covariance parameters with standard errors
estimates of user specified contrasts
Graphics is often limited but can be done in another software
Name, department
11 Date
SAS PROC MIXED and Repeated
Measures
 PROC MIXED of SAS offers greater flexibility for the modelling of
repeated measures data than PROC GLM. (Firstly, the procedure
provides a mechanism for modelling the covariance structure
associated with the repeated measures. Secondly, it can handle some
forms of missing data without discarding an entire subject’s-worth of
data. Thirdly, it has some capability to handle the situation when each
subject may be measured at different times and time intervals.)
 In PROC GLM, repeated measures are handled in a multivariate
framework and it requires a multivariate view of the data. PROC
MIXED, on the other hand, requires a univariate or stacked-data view
of the data. In other words, there is only a single response variable.
The repeated information, including all of the information about the
subjects, is contained in other variables. Proc GLM assumes that the
covariance matrix meets a sphericity assumption compound symmetry.
Name, department
12 Date
SAS PROC MIXED
 Proc mixed was designed to handle mixed models. It has a large






choice of covariance structures (unstructured, random effects,
autoregressive, Diggle etc)
PROC MIXED can be used not only to estimate the fixed parameters,
but also the covariance parameters.
By default, PROC MIXED estimates the covariance parameters using
the method of restricted maximum likelihood (REML).
PROC MIXED provides empirical Bayes estimates.
Separate analyses for separate groups can be run using the BY
statement.
Approximate F tests for class variables are obtained using Wald’s test.
All components of the output can be saved as a SAS data set for
further manipulation using other internal (SAS) or external procedures.
Name, department
13 Date
PROC MIXED: Syntax
Proc
PROC MIXED < options > ;
BY variables ;
CLASS variables ;
ID variables ;
MODEL dependent = < fixed-effects > < / options > ;
RANDOM random-effects < / options > ;
REPEATED < repeated-effect > < / options > ;
PARMS (value-list) ... < / options > ;
CONTRAST 'label' < fixed-effect values ... >
< | random-effect values ... > , ... < / options > ;
ESTIMATE 'label' < fixed-effect values ... >
< | random-effect values ... >< / options > ;
LSMEANS fixed-effects < / options > ;
MAKE 'table' OUT=SAS-data-set ;
Name, department
14 Date
Data structure of Proc Mixed
 Consider the example where arm strength is measured on 8 patients at
3 different times and where patients have been randomized to one of 2
treatment groups. The multivariate view associated with e.g. PROC
GLM code: would look like below
Name, department
15 Date
 For analysis of this data set using PROC MIXED, the univariate or
stacked-data view will be required. The univariate view below was
obtained by Proc Transpose:
Name, department
16 Date
Name, department
17 Date
Name, department
18 Date
Name, department
19 Date
Name, department
20 Date
Name, department
21 Date
Name, department
22 Date
Name, department
23 Date
PROC MIXED data=prostate method=REML asycov asycorr covtest ic;
CLASS id group timeclss ;
Proc
MODEL lnpsa = group age group*time age*time group*time2 age*time2
/ noint solution ddfm=satterth covb chisq;
ID id time ;
RANDOM intercept time time2 /type=un subject=id g gcorr v vcorr solution;
REPEATED timeclss / type=simple subject=id r rcorr ;
CONTRAST ‘Final model'
age*time 1,
age*time2 1,
group*time2 1 0 0 0,
group*time2 0 1 0 0,
group*time2 0 0 1 –1
/chisq;
ESTIMATE ‘Diff L/R-BPH, t=5yr’
group 0 –4 4 0
group*time 0 -2 2 0,
group*time2 0 0 1 0,
/ cl alpha=0.05 divisor=4chisq;
MAKE ‘solutionR’ out=randeff;
RUN;
Name, department
24 Date
Proc mixed
 Proc mixed invokes the mixed procedure.
 Method= specifies the estimation methods (ML. REML. MIVQUE0)
 Asycov and Asycorr can be used for printing the asymptotic
covariance and correlation matrices for the marginal model
 Covtest prints the asymptotic standard errors and associated
Wald tests for the variance components
 ic calculates some information criteria
 Class specifies which variables are considered as factors.
Name, department
25 Date
Proc mixed
 Model specifies the model (i.e response and fixed effects
Xi). Intercept included by default.
 Solution prints estimates of the fixed effects in the model together
with standard errors, t-statistics and p-values for significance.
 Covb gives the whole covariance matrix for the estimates.
 ddfm= specifies the number of degrees of freedomin the t- and Fapproximations. One of many options is Satterthwaite.
 Chisq is used to make SAS include Wld tests next to the default tand F-tests for all effects specified in the model.
Name, department
26 Date
Proc mixed
 Id In general, SAS uses the same order as the original
data but it does not hurt to have an extra column helping
identifying the records and the subjects. This is nice to
have e.g. when using predmeans or predicted to get
predicted values.
 Random specifies the random effects (Zi). Notice that
random intercept is not default.
 Solution needed to calculate empirical Bayes estimates
 Subject= id
 G, gcorr, v, vcorr print correlation matrices D and Vi. Default is first
subject but number of subjects can be specified.Name, department
27 Date
Proc mixed
 Repeated used to specify the i. The repeated effects must be
classification variables.
 Type= specifies the structure of i. Simple meabns independence.
 r and rcorr print residual covariance, i , and correlation matrices.
 Contrast eeallows testing hypothesis of the form
H 0 : L  0, versus H A : L  0
 Several contrasts can be specified and thereby we can run several
tests at the same time. A label is needed in single quotes as well as
the linear combinations (the rows in L). F-test is default but the Wald
test can be run using the chisq option.
Name, department
28 Date
Proc mixed
 Estimate permits the estimation of one or several linear
combinations of the fixed effects. A label is needed in
single quotes as well. Very similar to contrast but output
also includes confidence intervals.
 Use the option cl alpha = 0.05 when you require an t-type test with
a = 0.05
 Make is used to convert parts of the output into a sas data
set. In later versions Make is replaced by ODS.
Name, department
29 Date
Modelling the Covariance Structure Using the
RANDOM and REPEATED Statements in PROC
MIXED
Measures on different individuals are independent, so covariance needs
attention only with measures on the same individuals. The covariance
structure refers to variances at individual times and to correlation between
measures at different times on the same individual. There are basically two
aspects of the correlation.
 First, two measures on the same individual are correlated simply because they
share common contributions from that individual. This is due to variation
between indivduals.
 Second, measures on the same individual close in time are often more highly
correlated than measures far apart in time. This is covariation within indivduals.
.
Usually, when using PROC MIXED, the variation between indivduals is
specified by the RANDOM statement, and covariation within indivduals is
specified by the REPEATED statement
Name, department
30 Date
 PROC MIXED fits
many different
structures (some
are listed here).
Note also that a
particular
structure may be
fit using more
than one “TYPE”
designation, and
with combinations
of the RANDOM
and REPEATED
statements.
Name, department
31 Date
Confusing Proc mixed?
 The simple answer to why SAS's PROC MIXED can seem so confusing
is that it's so powerful, but there's more to it than that. Early on, many
guides to PROC MIXED presented an example of fitting a compound
symmetry model to a repeated measures study in which subjects (ID) are
randomized to one of many treatments (TREAT) and then measured at
multiple time points (PERIOD). The command language to analyze these
data can be written as
proc mixed;
class id treat period;
model y=treat period treat*period;
repeat period/sub=id(
treat)type=cs;
 or
proc mixed;
class id treat period;
model y=treat period treat*period;
random id(
treat)
;
Name, department
32 Date
 Because both sets of command language produce the
correct analysis, this immediately raises confusion over the
roles of the repeated and random statements. In order to
sort this out, the underlying mathematics must be
reviewed. Once the reason for the equivalence is
understood, the purposes of the repeated and random
statements will be clear, cf. the following:
http://www.jerrydallal.com/LHSP/mixedq.htm
Name, department
33 Date
Summary:
In Proc Mixed, the mixed model is specified by means of a
number of statements like CLASS, MODEL, RANDOM and
REPEATED.
 The CLASS statement identifies the classification variables (for
example, gender, person, age, etc.).
 The MODEL statement specifies the model’s fixed effects
equation, Xiβ. Thus, the design matrix Xi is defined and the
model’s intercept is included by default.
 The RANDOM statement is used to specify random effects and the
form of covariance matrix D. (Useful options: SOLUTION: print random effects solution).
 The REPEATED statement models the intra-individual variation
and includes the structure of i=Cov(ei), where i is a block
diagonal matrix for each subject. (If the REPEATED statement is not included it is
assumed that i=σ2I).
 LSMEANS Calculates least squares mean estimates of specified
fixed effects.
Name, department
34 Date
The rat data
proc mixed data=rat method=reml;
class id group;
model y = t group*t / solution;
random intercept t / type=un subject=id ;
run;
Name, department
35 Date
?
Name, department
36 Date
Name, department
37 Date
Results
Name, department
38 Date
Using the option nobound
 Non convergence or non positive definiteness can be
indications of negative variance components. Usually Proc
mixed would not allow that to happen. But using the option
nobound in Proc mixed will result in a new set of estimates
where d22 is negative. Consider the fitted variance function:
Hence, the negative variance component suggests
a negative curvature in the variance function.
Name, department
39 Date
Name, department
40 Date
Name, department
41 Date
The prostate data
Age could not be matched
Name, department
42 Date
SAS code
Name, department
43 Date
ML and REML estimates:
Name, department
44 Date
ML and REML estimates (cont’d)
Name, department
45 Date
Fitted average profiles
Name, department
46 Date
Name, department
47 Date
In practice, histograms and scatter plots of certain
components of the estimate of bi can be used to detect
model deviations or subjects with exceptional evolution
over time.
Name, department
48 Date
Histograms and scatter plots
Correlations between components of estimate of b
Name, department
49 Date
Name, department
50 Date
Name, department
51 Date
Mixed Models using R vs. SAS:
Do they give the same answers?
 When you perform a mixed model analysis using R, the
question arises are we using the ”right model”?
 The syntax for lme() –is not as transparent as its SAS
counterpart and its documentation is not as good.
 Therefore it is interesting to compare the two approaches.
 R and SAS give different answers! One of the reasons is
that they apply different restrictions to achieve uniqueness of
estimates.
 It is possible to ”force” R to get the same answer as PROC
MIXED in SAS.
Name, department
52 Date
R commands for linear mixed models
 Commands for linear mixed models are in the library
nlme.
data <- read.table(file.choose(), header = T)
attach(data)
Time = factor(Time)
Group = factor(Group)
Subj = factor(Subj)
library(nlme)
model <- lme(y ~ Time + Group + Time*Group, random = ~1 | Subj)
summary(model)
anova(model)
# This model is very close to the one produced by SAS using compound symmetry,
# when it comes to F values, and the log likelihood is the same. But the AIC
# and BIC are quite different. The StDev for the Random Effects are the same
# when squared. The coefficients are different because R uses the first level
# as the base, whereas SAS uses the last.
Name, department
53 Date
PROC MIXED: Applied to Clustered
Data (SAS/STAT User’s Guide )
 In the following example, the response variable height
measures the heights (in inches) of 18 individuals who are
classified according to their respective family and gender. We
will peform two analyses:
1. A traditional two-way analysis of variance (ANOVA) of these
unbalanced data (which produces output similar to what you get with
PROC GLM) The assumptions concerning the residuals from an
analysis of variance include:
1. normally distributed
2. independent
3. have constant variance
Name, department
54 Date
2. A mixed model which includes both random and fixed effects. The
random effects, family and family*gender, are now listed only on the
RANDOM statement (notice the two terms must not appear on the
MODEL statement with PROC MIXED). The type=vc option specifies the
variance components model for both family and family*gender. The
residual matrix is assumed to equal σ2*I18 where I is an 18x18 identity
matrix. Declaring family as a random effect in this model sets up a
common correlation among all heights measured from the same family.
The interaction of family*gender as a second random effect also
accounts for the correlation between all observations that have the same
level of both family and gender. One interpretation is that females will
have a higher (or lower) correlation with other females in the same family
than males will have with other males in the same family. With the height
data, this random effects model seems reasonable.
Name, department
55 Date
PROC MIXED: Applied to Clustered
Data (SAS/STAT User’s Guide )
DATA heights;
INPUT family obs gender
height @@;
CARDS;
1 1 F 67 1 2 F 66 1 3 F
1 1 M 71 1 2 M 72
2 1 F 63 2 2 F 63 2 3 F
2 1 M 69 2 2 M 68 2 3 M
3 1 F 63 3 1 M 64
4 1 F 67 4 2 F 66
4 1 M 67 4 2 M 67 4 3 M
;
Name, department
56 Date
PROC TABULATE NOseps;
$
CLASS family gender obs;
VAR height;
64
TABLE family, gender*obs='
'*height='
'*sum=' '*f=6.0
67
/ rts=10 BOX=' Heights'
70
misstext=' ';
RUN;
69
Name, department
57 Date
PROC MIXED DATA=heights NOitprint;
CLASS gender family;
MODEL height = gender family family*gender;
RUN;
Name, department
58 Date
PROC MIXED DATA=heights NOitprint;
CLASS gender family;
MODEL height = gender;
RANDOM family family*gender / subject=family type=vc
vcorr ;
RUN;
Name, department
59 Date
Conclusions
 In the second analysis we perform a significance test for the fixed
effect, gender. Note that its p-value (p=0.0667) is larger than the one
observed in the first statistical model that assumed all fixed effects
(p=0.0139). The contrast in these two results illustrates the importance
of modeling family as a random, rather than a fixed, effect. In fact, if
0.05 is applied as the cutoff point for significance, the fixed effects
model shows a significant effect, whereas the model with random
effects does not.
 An additional benefit of a random effects analysis is that it enables you
to make inferences about gender that apply to a population of families,
whereas the inferences about gender from the analysis where family
and family*gender are treated as fixed effects apply only to the
particular families present in the dataset.
 This simple example was designed to show you how PROC MIXED
lets you model correlation in your data directly and make inferences
about fixed effects that apply to entire populations of random effects.
Name, department
60 Date
References
1. 1. Littell, R.C., Milliken, G.A., Stroup, W.W., and
Wolfinger, R.D. (1996). SAS System for Mixed Models,
SAS Institute, Cary, NC.
2. 2. SAS/STAT User’s Guide: The Mixed Procedure.
Chapter 41, Section 6 “Clustered Data Example,”
3. http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/
stat/chap41/sect6.htm
4. http://sas.uoregon.edu/sashtml/stat/chap41/sect6.htm
Name, department
61 Date
?
Any Questions
Name, department
62 Date