MCMC Estimation for Random Effect Modelling – The MLwiN

Download Report

Transcript MCMC Estimation for Random Effect Modelling – The MLwiN

MCMC Estimation for Random
Effect Modelling – The MLwiN
experience
Dr William J. Browne
School of Mathematical Sciences
University of Nottingham
Contents
• Random effect modelling, MCMC and MLwiN.
• Methods comparison – Guatemalan child health
example.
• Extendibility of MCMC algorithms:
• Cross classified and multiple membership
models.
• Artificial insemination and Danish chicken
examples.
• Further Extensions.
Random effect models
• Models that account for the underlying structure in the
dataset.
• Originally developed for nested structures (multilevel
models), for example in education, pupils nested within
schools.
• An extension of linear modelling with the inclusion of
random effects.
A typical 2-level model is
yij   0  xij 1  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
Here i indexes pupils and j indexes schools.
MLwiN
• Software package designed specifically for fitting
multilevel models.
• Developed by a team led by Harvey Goldstein and Jon
Rasbash at the Institute of Education in London over
past 15 years or so. Earlier incarnations ML2, ML3,
MLN.
• Originally contained ‘classical’ IGLS estimation methods
for fitting models.
• MLwiN launched in 1998 also included MCMC
estimation.
• My role in team was as developer of MCMC functionality
in MLwiN during 4.5 years at the IOE.
Estimation Methods for Multilevel
Models
Due to additional random effects no simple matrix
formulae exist for finding estimates in
multilevel models.
Two alternative approaches exist:
1. Iterative algorithms e.g. IGLS, RIGLS, EM in
HLM that alternate between estimating fixed
and random effects until convergence. Can
produce ML and REML estimates.
2. Simulation-based Bayesian methods e.g.
MCMC that attempt to draw samples from the
posterior distribution of the model.
MCMC Algorithm
• Consider the 2-level model
yij   0  xij 1  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
• MCMC algorithms work in a Bayesian framework
and so we need to add prior distributions for the
unknown parameters.
• Here there are 4 sets of unknown parameters:
 , u, u2 , e2
• We will add prior distributions
p( ), p( u2 ), p( e2 )
MCMC Algorithm (2)
The algorithm for this model then involves simulating in
turn from the 4 sets of conditional distributions. Such
an algorithm is known as Gibbs Sampling. MLwiN
uses Gibbs sampling for all normal response models.
Firstly we set starting values for each group of unknown
2
2
parameters,  (0) , u(0) ,  u (0) ,  e (0)
Then sample from the following conditional distributions,
firstly
2
2
p( | y, u(0) , u (0) , e(0) )
To get  (1) .
MCMC Algorithm (3)
We next sample from
p(u | y, (1) , u2(0) , e2(0) )
to get u(1), then
p( u2 | y, (1) , u(1) , e2(0) )
to get  u (1) , then finally
p( e2 | y, (1) , u(1) , u2(1) )
2
To get  e2(1) . We have then updated all of the unknowns
in the model. The process is then simply repeated
many times, each time using the previously
generated parameter values to generate the next set
Burn-in and estimates
Burn-in: It is general practice to throw away the
first n values to allow the Markov chain to
approach its equilibrium distribution namely the
joint posterior distribution of interest. These
iterations are known as the burn-in.
Finding Estimates: We continue generating
values at the end of the burn-in for another m
iterations. These m values are then average to
give point estimates of the parameter of interest.
Posterior standard deviations and other
summary measures can also be obtained from
the chains.
Methods for non-normal responses
• When the response variable is Binomial or
Poisson then different algorithms are required.
• IGLS/RIGLS methods give quasilikelihood
estimates e.g. MQL, PQL.
• MCMC algorithms including Metropolis Hastings
sampling and Adaptive Rejection sampling are
possible.
• Numerical Quadrature can give ML estimates
but is not without problems.
So why use MCMC?
• Often gives better estimates for non-normal
responses.
• Gives full posterior distribution so interval
estimates for derived quantities are easy to
produce.
• Can easily be extended to more complex
problems.
• Potential downside 1: Prior distributions required
for all unknown parameters.
• Potential downside 2: MCMC estimation is much
slower than the IGLS algorithm.
The Guatemalan Child Health
dataset.
This consists of a subsample of 2,449 respondents from the 1987
National Survey of Maternal and Child Helath, with a 3-level
structure of births within mothers within communities.
The subsample consists of all women from the chosen communities
who had some form of prenatal care during pregnancy. The
response variable is whether this prenatal care was modern
(physician or trained nurse) or not.
Rodriguez and Goldman (1995) use the structure of this dataset to
consider how well quasi-likelihood methods compare with
considering the dataset without the multilevel structure and fitting a
standard logistic regression.
They perform this by constructing simulated datasets based on the
original structure but with known true values for the fixed effects
and variance parameters.
They consider the MQL method and show that the estimates of the
fixed effects produced by MQL are worse than the estimates
produced by standard logistic regression disregarding the multilevel
structure!
The Guatemalan Child Health
dataset.
Goldstein and Rasbash (1996) consider the same problem but use
the PQL method. They show that the results produced by PQL
2nd order estimation are far better than for MQL but still biased.
The model in this situation is
yijk ~ Bernouilli( pijk ) with
logit( p ijk )   0  1 x1ijk   2 x2 jk   3 x3k  u jk  vk
where u jk ~ N (0,  u2 ) and vk ~ N (0,  v2 ).
In this formulation i,j and k index the level 1, 2 and 3 units
respectively.
The variables x1,x2 and x3 are composite scales at each level
because the original model contained many covariates at each
level.
Browne and Draper (2004) considered the hybrid Metropolis-Gibbs
method in MLwiN and two possible variance priors (Gamma-1(ε,ε)
and Uniform.
Simulation Results
The following gives point estimates (MCSE) for 4 methods and 500 simulated
datasets.
Parameter
(True)
MQL1
PQL2
Gamma
Uniform
β0 (0.65) 0.474 (0.01) 0.612 (0.01) 0.638 (0.01) 0.655 (0.01)
β1 (1.00) 0.741 (0.01) 0.945 (0.01) 0.991 (0.01) 1.015 (0.01)
β2 (1.00) 0.753 (0.01) 0.958 (0.01) 1.006 (0.01) 1.031 (0.01)
β3 (1.00) 0.727 (0.01) 0.942 (0.01) 0.982 (0.01) 1.007 (0.01)
σ2v (1.00) 0.550 (0.01) 0.888 (0.01) 1.023 (0.01) 1.108 (0.01)
σ2u (1.00) 0.026 (0.01) 0.568 (0.01) 0.964 (0.02) 1.130 (0.02)
Simulation Results
The following gives interval coverage probabilities (90%/95%) for 4 methods
and 500 simulated datasets.
Parameter
(True)
MQL1
PQL2
Gamma
Uniform
β0 (0.65)
67.6/76.8
86.2/92.0
86.8/93.2
88.6/93.6
β1 (1.00)
56.2/68.6
90.4/96.2
92.8/96.4
92.2/96.4
β2 (1.00)
13.2/17.6
84.6/90.8
88.4/92.6
88.6/92.8
β3 (1.00)
59.0/69.6
85.2/89.8
86.2/92.2
88.6/93.6
σ2v (1.00)
0.6/2.4
70.2/77.6
89.4/94.4
87.8/92.2
σ2u (1.00)
0.0/0.0
21.2/26.8
84.2/88.6
88.0/93.0
Summary of simulations
The Bayesian approach yields excellent bias and
coverage results.
For the fixed effects, MQL performs badly but the
other 3 methods all do well.
For the random effects, MQL and PQL both
perform badly but MCMC with both priors is
much better.
Note that this is an extreme scenario with small
levels 1 in level 2 yet high level 2 variance and
in other examples MQL/PQL will not be so bad.
Extension 1: Cross-classified models
For example, schools by neighbourhoods. Schools will draw pupils from
many different neighbourhoods and the pupils of a neighbourhood will go
to several schools. No pure hierarchy can be found and pupils are said to
be contained within a cross-classification of schools by neighbourhoods :
nbhd 1
nbhd 2
School 1
xx
x
School 2
x
x
Nbhd 3
School 3
xx
x
School 4
x
xxx
School
Pupil
Nbhd
S1
S2
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
N1
N2
N3
Notation
With hierarchical models we use a subscript notation that has one
subscript per level and nesting is implied reading from the left. For
example, subscript pattern ijk denotes the i’th level 1 unit within the
j’th level 2 unit within the k’th level 3 unit.
If models become cross-classified we use the term classification
instead of level. With notation that has one subscript per
classification, that captures the relationship between classifications,
notation can become very cumbersome. We propose an alternative
notation introduced in Browne et al. (2001) that only has a single
subscript no matter how many classifications are in the model.
Single subscript notation
School
Pupil
S1
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
Nbhd
i
1
2
3
4
5
6
7
8
9
10
11
12
S2
N1
nbhd(i)
1
2
1
2
1
2
2
3
3
2
3
3
sch(i)
1
1
1
2
2
2
3
3
4
4
4
4
N2
N3
We write the model as
( 2)
(3)
yi  0  unbhd
(i )  usch (i )  ei
(1)
Where classification 2 is neighbourhood and classification 3
is school. Classification 1 always corresponds to the
classification at which the response measurements are
made, in this case patients. For pupils 1 and 11 equation (1)
becomes:
y1   0  u1( 2)  u1(3)  e1
y11   0  u3( 2)  u 4(3)  e1
Classification diagrams
In the single subscript notation we lose information about the
relationship(crossed or nested) between classifications. A useful way of
conveying this information is with the classification diagram. Which has one
node per classification and nodes linked by arrows have a nested relationship
and unlinked nodes have a crossed relationship.
Neighbourhood
School
Neighbourhood
School
Pupil
Nested structure where
schools are contained within
neighbourhoods
Pupil
Cross-classified structure where
pupils from a school come from
many neighbourhoods and pupils
from a neighbourhood attend several
schools.
Example : Artificial insemination by donor
1901 women
279 donors
1328 donations
12100 ovulatory cycles
response is whether conception occurs in a given cycle
In terms of a unit diagram:
Women
Cycles
w1
c1 c2 c3 c4…
Or a classification diagram:
w2
w3
c1 c2 c3 c4…
Donor
c1 c2 c3 c4…
Donation
Donations
Donors
d1
m1
d2
d1 d2 d3
d1 d2
m2
m3
Woman
Cycle
Model for artificial insemination data
We can write the model as
yi ~ Binomial (1, i )
Results:
Parameter
Description
Estimate(se)
( 2)
(3)
( 4)
logit(  i )  ( X ) i  u woman
(i )  u donation(i )  u donor(i )
0
intercept
( 2)
2
u woman
(i ) ~ N (0, u ( 2) )
1
azoospermia *
0.22(0.11)
(3)
2
u donation
(i ) ~ N (0, u (3) )
2
semen quality
0.19(0.03)
( 4)
2
u donor
(i ) ~ N (0, u ( 4) )
3
womens age>35
4
sperm count
0.20(0.07)
5
sperm motility
0.02(0.06)
6
insemination to early
-0.72(0.19)
7
insemination to late
-0.27(0.10)
Note cross-classified models can
be fitted in IGLS but are far easier
to fit using MCMC estimation.
-4.04(2.30)
-0.30(0.14)
 u2( 2)
women variance
1.02(0.21)
 u2(3)
donation variance
0.644(0.21)
 u2( 4)
donor variance
0.338(0.07)
Extension 2: Multiple membership models
When level 1 units are members of more than one higher
level unit we describe a model for such data as a multiple
membership model.
For example,
• Pupils change schools/classes and each school/class has
an effect on pupil outcomes.
• Patients are seen by more than one nurse during the
course of their treatment.
Notation
yi  ( XB)i 
w
( 2)
i, j
jnurse ( i )
u
( 2)
j
 ei
u (j2) ~ N (0,  u2( 2) )
(2)
ei ~ N (0,  e2 )
n1(j=1)
n2(j=2)
n3(j=3)
p1(i=1)
0.5
0
0.5
p2(i=2)
1
0
0
p3(i=3)
0
0.5
0.5
p4(i=4)
0.5
0.5
0
y1  ( XB)1  0.5u1( 2)  0.5u3( 2)  e1
y2  ( XB) 2  1u1( 2)  e2
y3  ( XB)3  0.5u2( 2)  0.5u3( 2)  e3
y4  ( XB) 4  0.5u1( 2)  0.5u2( 2)  e4
Note that nurse(i) now indexes the set of
nurses that treat patient i and w(2)i,j is a
weighting factor relating patient i to nurse j.
For example, with four patients and three
nurses, we may have the following weights:
Here patient 1 was seen by nurse 1
and 3 but not nurse 2 and so on. If
we substitute the values of w(2)i,j , i
and j. from the table into (2) we get
the series of equations :
Classification diagrams for multiple
membership relationships
Double arrows indicate a multiple membership relationship between
classifications.
nurse
We can mix multiple membership, crossed and
hierarchical structures in a single model.
hospital
patient
nurse
GP practice
patient
Here patients are multiple members of nurses,
nurses are nested within hospitals and GP
practice is crossed with both nurse and hospital.
Example involving nesting, crossing and
multiple membership – Danish chickens
Production hierarchy
10,127 child flocks
725 houses
304 farms
Breeding hierarchy
10,127 child flocks
200 parent flocks
As a unit diagram:
farm
As a classification diagram:
f1
f2…
Farm
Houses
Child flocks
Parent flock
h1
c1 c2 c3…
p1
h2
h1
c1 c2 c3….
p2
h2
c1 c2 c3…. c1 c2 c3….
p3
p4
p5….
House
Parent flock
Child flock
Model and results
yi ~ Binomial (1,  i )
logit(  i )  ( XB)i 
( 3)
( 4)
 wi(,2j)u (j2)  uhouse
( i )  u farm( i )  ei
j p . flock( i )
( 3)
2
u (j2 ) ~ N (0, u2( 2 ) ) uhouse
( i ) ~ N (0,  u ( 3) )
4)
2
u (farm
( i ) ~ N (0,  u ( 4 ) )
Note multiple membership
models can be fitted in IGLS
and this model/dataset
represents roughly the most
complex model that the
method can handle.
Such models are far easier to
fit using MCMC estimation.
Results:
Parameter
Description
Estimate(se)
0
intercept
-2.322(0.213)
1
1996
-1.239(0.162)
2
1997
-1.165(0.187)
3
hatchery 2
-1.733(0.255)
4
hatchery 3
-0.211(0.252)
5
hatchery 4
-1.062(0.388)
 u2( 2 )
parent flock variance
0.895(0.179)
 u2( 3)
house variance
0.208(0.108)
 u2( 4)
farm variance
0.927(0.197)
Further Extensions / Work in
progress
1.
2.
3.
4.
Multilevel factor models
Response variables at different levels
Missing data and multiple imputation
ESRC grant: Sample size calculations,
MCMC efficiency & Model identifiability
5. Wellcome Fellowship grant for Martin
Green
Multilevel factor analysis modelling
• In sample surveys there are often many responses for
each individual.
• Techniques like factor analysis are often used to identify
underlying latent traits amongst these responses.
• Multilevel factor analysis allows factor analysis modelling
to identify factors at various levels/classifications in the
dataset so we can identify shared latent traits as well as
individual level traits.
• Due to the nature of MCMC algorithms by adding a step
to allow for multilevel factor models in MLwiN, crossclassified models can also be fitted without any
additional programming!
• See Goldstein and Browne (2002,2005) for more detail.
Responses at different levels
• In a medical survey some responses may refer to
patients in a hospital while others may refer to the
hospital itself.
• Models that combine these responses can be fitted using
the IGLS algorithm in MLwiN and shouldn’t pose any
problems to MCMC estimation.
• The Centre for Multilevel modelling in Bristol are
investigating such models as part of their LEMMA node
in the ESRC research methods program. I am a named
collaborator for the Lemma project.
• They are also looking at MCMC algorithms for latent
growth models.
Missing data and multiple
imputation
• Missing data is proliferate in survey research.
• One popular approach to dealing with missing data is
multiple imputation (Rubin 1987) where several imputed
datasets are created and then the model of interest is
fitted to each dataset and the estimates combined.
• Using a multivariate normal response multilevel model to
generate the imputations using MCMC in MLwiN is
described in chapter 17 of Browne (2003)
• James Carpenter (LSHTM) has begun work on macros
in MLwiN that automate the multiple imputation
procedure.
Sample size calculations
• Another issue in data collection is how big a sample do
we need to collect?
• Such sample size calculations have simple formulae if
we can assume that an independent sample can be
generated.
• If however we wish to account for the data structure in
the calculation then things are more complex.
• One possibility is a simulation-based approach similar to
that used in the model comparisons described earlier
where many datasets are simulated to look at the power
for a fixed sample size.
• Mousa Golalizadeh Lehi will be joining me in February
on an ESRC grant looking at such an approach. A 4th
year MMath. student (Lynda Leese) is looking at the
approach for nested models.
Efficient MCMC algorithms
• In MLwiN we have tended to use the simplest, most
generally applicable MCMC algorithms for multilevel
models.
• For particular models there are many approaches that
may improve the performance / mixing of the MCMC
algorithm.
• We will also investigate some of these methods in the
ESRC grant.
• Browne (2004) looked at some reparameterisation
methods for cross-classified models in a bird nesting
dataset.
• A second 4th year MMath. student (Francis Bourchier) is
looking at MCMC methods based around the IGLS
representation of nested models which are interesting.
Model Identifiability
• The final part of the ESRC grant is to look at whether a
model is identifiable/estimable given a particular set of
data.
• Cross-classified datasets where there are few level 1
units per higher level unit can result in each observation
being factored into several random effects with very few
observations being used to estimate each random effect.
• We are interested in establishing whether we can really
estimate all parameters in such models.
• An example where we can’t would be a dataset of
patients who are attended by doctors in wards. Now if
there is only one doctor per ward and likewise one ward
per doctor then we cannot tease out doctor and ward
effects. Again this work was motivated by a bird nesting
dataset.
Wellcome Fellowship
• Martin Green has been successful in obtaining 4 years of
funding from Wellcome to come and work with me.
• The project is entitled ‘Use of Bayesian statistical
methods to investigate farm management strategies,
cow traits and decision-making in the prevention of
clinical and sub-clinical mastitis in dairy cows.’
• Martin is a qualified veterinary and a specialist farm
animal surgeon.
• He completed a PhD in 2004 at the University of
Warwick veterinary epidemiology and used MCMC to fit
binary response multilevel models in both MLwiN and
WinBUGS to look at various factors that affect clinical
mastitis in dairy cows.
Wellcome Fellowship
• In the 4 years we are aiming to analyse a huge dataset
that Martin has been collecting in a Milk Development
Council grant.
• In particular we will look at how farm management
practices, environmental conditions and cow
characteristics influence the risk of mastitis during the
dry period.
• We will hope to get interesting applied results and also
some novel statistical methodology from the grant.
MCMC will again play a big part.
• Martin has been appointed as a professor in the new vet
school and will be working there 1 day a fortnight during
the grant before moving there full time after the four
years.
Conclusions
• In this talk we have attempted to show the flexibility of
MCMC methods in attempting to match the complexity of
data structures found in real problems.
• We have also contrasted the methods with the IGLS
algorithm.
• Although we have used MLwiN, all the models
considered here could also be fitted in WinBUGS.
• WinBUGS offers even more flexibility in model choice for
complex data structures however it is typically slower in
fitting models that can also be fitted in MLwiN.
• Genstat and the GLLAMM package in Stata also
deserve mention as alternatives in the ML/REML world
for the models considered.
• Slides available at
http://www.maths.nottingham.ac.uk/personal/pmzwjb/billt
alk.html
References
• Browne, W.J. (2003). MCMC Estimation in MLwiN.
London: Institute of Education, University of London.
• Browne, W.J. (2004). An illustration of the use of
reparameterisation methods for improving MCMC
efficiency in crossed random effect models. Multilevel
Modelling Newsletter 16 (1): 13-25
• Browne, W.J. and Draper, D. (2004). A Comparison of
Bayesian and Likelihood-based methods for fitting
multilevel models. University of Nottingham Research
Report 04-01
• Browne, W.J., Goldstein, H. and Rasbash, J. (2001).
Multiple membership multiple classification (MMMC)
models. Statistical Modelling 1: 103-124.
References
• Goldstein, H. and Browne, W. J. (2002). Multilevel factor analysis
modelling using Markov Chain Monte Carlo (MCMC) estimation. In
Marcoulides and Moustaki (Eds.), Latent Variable and Latent
Structure Models. p 225-243. Lawrence Erlbaum, New Jersey.
• Goldstein, H. and Browne, W.J. (2005). Multilevel Factor Analysis
Models for Continuous and Discrete Data. In Maydeu-Olivares, A
and McArdle, J.J. (Eds.), Contemporary psychometrics: a festschrift
for Roderick P. McDonald, p 453-475. Lawrence Erlbaum, New
Jersey.
• Goldstein, H. and Rasbash, J. (1996). Improved approximations for
multilevel models with binary responses. Journal of the Royal
Statistical Society, A. 159: 505-13.
• Rodriguez, G. and Goldman, N. (1995). An assessment of
estimation procedures for multilevel models with binary responses.
Journal of the Royal Statistical Society, Series A, 158, 73-89.
• Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys.
New York: J. Wiley and Sons.