MCMC Estimation for Random Effect Modelling – The

Download Report

Transcript MCMC Estimation for Random Effect Modelling – The

MCMC Estimation for Random
Effect Modelling – The MLwiN
experience
Dr William J. Browne
School of Math Sciences
University of Nottingham
Contents
• Random effect modelling and MLwiN.
• Methods comparison – Guatemalan child
health example.
• Extendibility of MCMC algorithms:
• Cross classified and multiple membership
models.
• Artificial insemination and Danish chicken
examples.
• Further Extensions.
Random effect models
• Models that account for the underlying structure in the
dataset.
• Originally developed for nested structures (multilevel
models), for example in education, pupils nested within
schools.
• An extension of linear modelling with the inclusion of
random effects.
A typical 2-level model is
yij   0  xij 1  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
Here i indexes pupils and j indexes schools.
MLwiN
• Software package designed specifically for fitting
multilevel models.
• Developed by a team led by Harvey Goldstein and Jon
Rasbash at the Institute of Education in London over
past 15 years or so. Earlier incarnations ML2, ML3,
MLN.
• Originally contained ‘classical’ IGLS estimation methods
for fitting models.
• MLwiN launched in 1998 also included MCMC
estimation.
• My role in team was as developer of MCMC functionality
in MLwiN during 4.5 years at the IOE.
Estimation Methods for Multilevel
Models
Due to additional random effects no simple matrix
formulae exist for finding estimates in
multilevel models.
Two alternative approaches exist:
1. Iterative algorithms e.g. IGLS, RIGLS, EM in
HLM that alternate between estimating fixed
and random effects until convergence. Can
produce ML and REML estimates.
2. Simulation-based Bayesian methods e.g.
MCMC that attempt to draw samples from the
posterior distribution of the model.
Methods for non-normal responses
• When the response variable is Binomial or
Poisson then different algorithms are required.
• IGLS/RIGLS methods give quasilikelihood
estimates e.g. MQL, PQL.
• MCMC algorithms including Metropolis Hastings
sampling and Adaptive Rejection sampling are
possible.
• Numerical Quadrature can give ML estimates
but is not without problems.
So why use MCMC?
• Often gives better estimates for non-normal
responses.
• Gives full posterior distribution so interval
estimates for derived quantities are easy to
produce.
• Can easily be extended to more complex
problems.
• Potential downside 1: Prior distributions required
for all unknown parameters.
• Potential downside 2: MCMC estimation is much
slower than the IGLS algorithm.
The Guatemalan Child Health
dataset.
This consists of a subsample of 2,449 respondents from the 1987
National Survey of Maternal and Child Helath, with a 3-level
structure of births within mothers within communities.
The subsample consists of all women from the chosen communities
who had some form of prenatal care during pregnancy. The
response variable is whether this prenatal care was modern
(physician or trained nurse) or not.
Rodriguez and Goldman (1995) use the structure of this dataset to
consider how well quasi-likelihood methods compare with
considering the dataset without the multilevel structure and fitting a
standard logistic regression.
They perform this by constructing simulated datasets based on the
original structure but with known true values for the fixed effects
and variance parameters.
They consider the MQL method and show that the estimates of the
fixed effects produced by MQL are worse than the estimates
produced by standard logistic regression disregarding the multilevel
structure!
The Guatemalan Child Health
dataset.
Goldstein and Rasbash (1996) consider the same problem but use
the PQL method. They show that the results produced by PQL
2nd order estimation are far better than for MQL but still biased.
The model in this situation is
yijk ~ Bernouilli( pijk ) with
logit( p ijk )   0  1 x1ijk   2 x2 jk   3 x3k  u jk  vk
where u jk ~ N (0,  u2 ) and vk ~ N (0,  v2 ).
In this formulation i,j and k index the level 1, 2 and 3 units
respectively.
The variables x1,x2 and x3 are composite scales at each level
because the original model contained many covariates at each
level.
Browne and Draper (2004) considered the hybrid Metropolis-Gibbs
method in MLwiN and two possible variance priors (Gamma-1(ε,ε)
and Uniform.
Simulation Results
The following gives point estimates (MCSE) for 4 methods and 500 simulated
datasets.
Parameter
(True)
MQL1
PQL2
Gamma
Uniform
β0 (0.65) 0.474 (0.01) 0.612 (0.01) 0.638 (0.01) 0.655 (0.01)
β1 (1.00) 0.741 (0.01) 0.945 (0.01) 0.991 (0.01) 1.015 (0.01)
β2 (1.00) 0.753 (0.01) 0.958 (0.01) 1.006 (0.01) 1.031 (0.01)
β3 (1.00) 0.727 (0.01) 0.942 (0.01) 0.982 (0.01) 1.007 (0.01)
σ2v (1.00) 0.550 (0.01) 0.888 (0.01) 1.023 (0.01) 1.108 (0.01)
σ2u (1.00) 0.026 (0.01) 0.568 (0.01) 0.964 (0.02) 1.130 (0.02)
Simulation Results
The following gives interval coverage probabilities (90%/95%) for 4 methods
and 500 simulated datasets.
Parameter
(True)
MQL1
PQL2
Gamma
Uniform
β0 (0.65)
67.6/76.8
86.2/92.0
86.8/93.2
88.6/93.6
β1 (1.00)
56.2/68.6
90.4/96.2
92.8/96.4
92.2/96.4
β2 (1.00)
13.2/17.6
84.6/90.8
88.4/92.6
88.6/92.8
β3 (1.00)
59.0/69.6
85.2/89.8
86.2/92.2
88.6/93.6
σ2v (1.00)
0.6/2.4
70.2/77.6
89.4/94.4
87.8/92.2
σ2u (1.00)
0.0/0.0
21.2/26.8
84.2/88.6
88.0/93.0
Summary of simulations
The Bayesian approach yields excellent bias and
coverage results.
For the fixed effects, MQL performs badly but the
other 3 methods all do well.
For the random effects, MQL and PQL both
perform badly but MCMC with both priors is
much better.
Note that this is an extreme scenario with small
levels 1 in level 2 yet high level 2 variance and
in other examples MQL/PQL will not be so bad.
Extension 1: Cross-classified models
For example, schools by neighbourhoods. Schools will draw pupils from
many different neighbourhoods and the pupils of a neighbourhood will go
to several schools. No pure hierarchy can be found and pupils are said to
be contained within a cross-classification of schools by neighbourhoods :
nbhd 1
nbhd 2
School 1
xx
x
School 2
x
x
Nbhd 3
School 3
xx
x
School 4
x
xxx
School
Pupil
Nbhd
S1
S2
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
N1
N2
N3
Notation
With hierarchical models we use a subscript notation that has one
subscript per level and nesting is implied reading from the left. For
example, subscript pattern ijk denotes the i’th level 1 unit within the
j’th level 2 unit within the k’th level 3 unit.
If models become cross-classified we use the term classification
instead of level. With notation that has one subscript per
classification, that captures the relationship between classifications,
notation can become very cumbersome. We propose an alternative
notation introduced in Browne et al. (2001) that only has a single
subscript no matter how many classifications are in the model.
Single subscript notation
School
Pupil
S1
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
Nbhd
i
1
2
3
4
5
6
7
8
9
10
11
12
S2
N1
nbhd(i)
1
2
1
2
1
2
2
3
3
2
3
3
sch(i)
1
1
1
2
2
2
3
3
4
4
4
4
N2
N3
We write the model as
( 2)
(3)
yi  0  unbhd
(i )  usch (i )  ei
(1)
Where classification 2 is neighbourhood and classification 3
is school. Classification 1 always corresponds to the
classification at which the response measurements are
made, in this case patients. For pupils 1 and 11 equation (1)
becomes:
y1   0  u1( 2)  u1(3)  e1
y11   0  u3( 2)  u 4(3)  e1
Classification diagrams
In the single subscript notation we lose information about the
relationship(crossed or nested) between classifications. A useful way of
conveying this information is with the classification diagram. Which has one
node per classification and nodes linked by arrows have a nested relationship
and unlinked nodes have a crossed relationship.
Neighbourhood
School
Neighbourhood
School
Pupil
Nested structure where
schools are contained within
neighbourhoods
Pupil
Cross-classified structure where
pupils from a school come from
many neighbourhoods and pupils
from a neighbourhood attend several
schools.
Example : Artificial insemination by donor
1901 women
279 donors
1328 donations
12100 ovulatory cycles
response is whether conception occurs in a given cycle
In terms of a unit diagram:
Women
Cycles
w1
c1 c2 c3 c4…
Or a classification diagram:
w2
w3
c1 c2 c3 c4…
Donor
c1 c2 c3 c4…
Donation
Donations
Donors
d1
m1
d2
d1 d2 d3
d1 d2
m2
m3
Woman
Cycle
Model for artificial insemination data
We can write the model as
yi ~ Binomial (1, i )
Results:
Parameter
Description
Estimate(se)
( 2)
(3)
( 4)
logit(  i )  ( X ) i  u woman
(i )  u donation(i )  u donor(i )
0
intercept
( 2)
2
u woman
(i ) ~ N (0, u ( 2) )
1
azoospermia *
0.22(0.11)
(3)
2
u donation
(i ) ~ N (0, u (3) )
2
semen quality
0.19(0.03)
( 4)
2
u donor
(i ) ~ N (0, u ( 4) )
3
womens age>35
4
sperm count
0.20(0.07)
5
sperm motility
0.02(0.06)
6
insemination to early
-0.72(0.19)
7
insemination to late
-0.27(0.10)
Note cross-classified models can
be fitted in IGLS but are far easier
to fit using MCMC estimation.
-4.04(2.30)
-0.30(0.14)
 u2( 2)
women variance
1.02(0.21)
 u2(3)
donation variance
0.644(0.21)
 u2( 4)
donor variance
0.338(0.07)
Extension 2: Multiple membership models
When level 1 units are members of more than one higher
level unit we describe a model for such data as a multiple
membership model.
For example,
• Pupils change schools/classes and each school/class has
an effect on pupil outcomes.
• Patients are seen by more than one nurse during the
course of their treatment.
Notation
yi  ( XB)i 
w
( 2)
i, j
jnurse ( i )
u
( 2)
j
 ei
u (j2) ~ N (0,  u2( 2) )
(2)
ei ~ N (0,  e2 )
n1(j=1)
n2(j=2)
n3(j=3)
p1(i=1)
0.5
0
0.5
p2(i=2)
1
0
0
p3(i=3)
0
0.5
0.5
p4(i=4)
0.5
0.5
0
y1  ( XB)1  0.5u1( 2)  0.5u3( 2)  e1
y2  ( XB) 2  1u1( 2)  e2
y3  ( XB)3  0.5u2( 2)  0.5u3( 2)  e3
y4  ( XB) 4  0.5u1( 2)  0.5u2( 2)  e4
Note that nurse(i) now indexes the set of
nurses that treat patient i and w(2)i,j is a
weighting factor relating patient i to nurse j.
For example, with four patients and three
nurses, we may have the following weights:
Here patient 1 was seen by nurse 1
and 3 but not nurse 2 and so on. If
we substitute the values of w(2)i,j , i
and j. from the table into (2) we get
the series of equations :
Classification diagrams for multiple
membership relationships
Double arrows indicate a multiple membership relationship between
classifications.
nurse
We can mix multiple membership, crossed and
hierarchical structures in a single model.
hospital
patient
nurse
GP practice
patient
Here patients are multiple members of nurses,
nurses are nested within hospitals and GP
practice is crossed with both nurse and hospital.
Example involving nesting, crossing and
multiple membership – Danish chickens
Production hierarchy
10,127 child flocks
725 houses
304 farms
Breeding hierarchy
10,127 child flocks
200 parent flocks
As a unit diagram:
farm
As a classification diagram:
f1
f2…
Farm
Houses
Child flocks
Parent flock
h1
c1 c2 c3…
p1
h2
h1
c1 c2 c3….
p2
h2
c1 c2 c3…. c1 c2 c3….
p3
p4
p5….
House
Parent flock
Child flock
Model and results
yi ~ Binomial (1,  i )
logit(  i )  ( XB)i 
( 3)
( 4)
 wi(,2j)u (j2)  uhouse
( i )  u farm( i )  ei
j p . flock( i )
( 3)
2
u (j2 ) ~ N (0, u2( 2 ) ) uhouse
( i ) ~ N (0,  u ( 3) )
4)
2
u (farm
( i ) ~ N (0,  u ( 4 ) )
Note multiple membership
models can be fitted in IGLS
and this model/dataset
represents roughly the most
complex model that the
method can handle.
Such models are far easier to
fit using MCMC estimation.
Results:
Parameter
Description
Estimate(se)
0
intercept
-2.322(0.213)
1
1996
-1.239(0.162)
2
1997
-1.165(0.187)
3
hatchery 2
-1.733(0.255)
4
hatchery 3
-0.211(0.252)
5
hatchery 4
-1.062(0.388)
 u2( 2 )
parent flock variance
0.895(0.179)
 u2( 3)
house variance
0.208(0.108)
 u2( 4)
farm variance
0.927(0.197)
Further Extensions / Work in
progress
1.
2.
3.
4.
Multilevel factor models
Response variables at different levels
Missing data and multiple imputation
Sample size calculations
Multilevel factor analysis modelling
• In survey analysis often there are many responses for
each individual.
• Techniques like factor analysis are often used to identify
underlying latent traits amongst the responses.
• Multilevel factor analysis allows factor analysis modelling
to identify factors at various classifications in the dataset.
• Due to the nature of MCMC algorithms by adding a step
to allow for multilevel factor models in MLwiN, crossclassified models can also be fitted without any
additional programming!
• See Goldstein and Browne (2002,2005) for more detail.
Responses at different levels
• In a household survey some responses refer to
individuals in a household while others refer to
the household itself.
• Models that combine these responses can be
fitted using the IGLS algorithm in MLwiN and
shouldn’t pose any problems to MCMC.
• The Centre for Multilevel modelling in Bristol are
investigating such models.
• See also Nicky Best’s talk on combining
datasets which is a similar issue.
Missing data and multiple
imputation
• Missing data is proliferate in survey research.
• One popular approach to dealing with missing data is
multiple imputation (Rubin 1987) where several imputed
datasets are created and then the model of interest is
fitted to each dataset and the estimates combined.
• Using a multivariate normal response multilevel model to
generate the imputations using MCMC in MLwiN is
described in chapter 17 of Browne (2003)
• James Carpenter (LSHTM) has begun work on macros
in MLwiN that automate the multiple imputation
procedure.
Sample size calculations
• Another issue in data collection is how big a sample do
we need to collect?
• Such sample size calculations have simple formulae if
we can assume that an independent sample can be
generated.
• If however we wish to account for the data structure in
the calculation then things are more complex.
• One possibility is a simulation-based approach similar to
that used in the model comparisons described earlier
where many datasets are simulated to look at the power
for a fixed sample size.
• Bonne Zijlstra will be joining me in Nottingham in
January on an ESRC grant looking at such an approach.
Conclusions
• In this talk we have attempted to show the flexibility of
MCMC methods in attempting to match the complexity of
data structures found in real problems.
• We have also contrasted the methods with the IGLS
algorithm.
• Although we have used MLwiN, all the models
considered here could also be fitted in WinBUGS.
• WinBUGS offers even more flexibility in model choice for
complex data structures however it is typically slower in
fitting models that can also be fitted in MLwiN.
• Genstat and the GLLAMM package in Stata also
deserve mention as alternatives in the ML/REML world
for the models considered.
References
•
•
•
•
•
•
•
•
Browne, W.J. (2003). MCMC Estimation in MLwiN. London: Institute of Education,
University of London.
Browne, W.J. and Draper, D. (2004). A Comparison of Bayesian and Likelihood-based
methods for fitting multilevel models. University of Nottingham Research Report 0401
Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple membership multiple
classification (MMMC) models. Statistical Modelling 1: 103-124.
Goldstein, H. and Browne, W. J. (2002). Multilevel factor analysis modelling using
Markov Chain Monte Carlo (MCMC) estimation. In Marcoulides and Moustaki (Eds.),
Latent Variable and Latent Structure Models. p 225-243. Lawrence Erlbaum, New
Jersey.
Goldstein, H. and Browne, W.J. (2005). Multilevel Factor Analysis Models for
Continuous and Discrete Data. In Maydeu-Olivares, A and McArdle, J.J. (Eds.),
Contemporary psychometrics: a festschrift for Roderick P. McDonald, p 453-475.
Lawrence Erlbaum, New Jersey.
Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models
with binary responses. Journal of the Royal Statistical Society, A. 159: 505-13.
Rodriguez, G. and Goldman, N. (1995). An assessment of estimation procedures for
multilevel models with binary responses. Journal of the Royal Statistical Society,
Series A, 158, 73-89.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: J.
Wiley and Sons.