Lecture 15
Generalised linear mixed models
in WinBUGS
Lecture Contents
• Differences in WinBUGS methods
• Rejection sampling algorithm
• Adaptive rejection (AR) sampling
• Reunion Island dataset
• Hierarchical centering
• Probit formulation
Logistic regression model (recap)
• A standard Bayesian logistic regression
model (e.g. for the rat tumour example) can
be written as follows:
$$y_i \sim \text{Binomial}(n_i, p_i)$$
$$\text{logit}(p_i) = \beta_0 + \beta_1 x_i$$
$$\beta_0 \sim N(0, m_0), \qquad \beta_1 \sim N(0, m_1)$$
• Both MLwiN and WinBUGS can fit this model, but can we write out the conditional posterior distributions and use Gibbs sampling?
Conditional distribution for β0
$$p(\beta_0 \mid y, \beta_1) \propto p(\beta_0)\, p(y \mid \beta_0, \beta_1)$$
$$\propto \exp\!\left(-\frac{\beta_0^2}{2 m_0}\right) \prod_i \left[\frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{y_i} \left[\frac{1}{1 + \exp(\beta_0 + \beta_1 x_i)}\right]^{n_i - y_i}$$
$$p(\beta_0 \mid y, \beta_1) \sim \;?$$
This is not a standard distribution, so we cannot simply simulate from it with a standard random number generator. However, both WinBUGS and MLwiN can fit this model using MCMC. In this lecture we describe how WinBUGS deals with this problem.
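To see what this conditional looks like concretely, here is a minimal Python sketch of its log density (up to an additive constant), with x, y, n and the prior variance m0 following the notation above:

```python
import numpy as np

def log_cond_beta0(beta0, beta1, x, y, n, m0):
    """Log of p(beta0 | y, beta1), up to an additive constant:
    N(0, m0) prior plus the binomial log-likelihood
    (the binomial coefficients are constants and drop out)."""
    eta = beta0 + beta1 * x
    loglik = np.sum(y * eta - n * np.logaddexp(0.0, eta))  # overflow-safe log(1 + e^eta)
    return -beta0 ** 2 / (2.0 * m0) + loglik
```

We can evaluate this function anywhere, but it is not the log density of any standard family. It is, however, concave in β0, which is what makes the adaptive rejection sampling described below applicable.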
Why not use Gibbs?
To use Gibbs sampling we would need to generate samples from the distribution p(β0 | y, β1).
Although this is not a standard distribution, and so there is no readily available function to generate from it directly, other methods exist.
The method that WinBUGS uses is called adaptive rejection (AR) sampling, which can be used to sample from any log-concave distribution function.
Before going on to AR sampling we will consider rejection sampling in general.
Rejection Sampling
Assume we wish to generate from a density f(x), but it is difficult to generate from this density directly.
However, there exists a density g(x) that is easy to sample from, and a constant M such that f(x) ≤ M g(x) for all x.
Note that finding such an envelope is often easier for distributions with a bounded range, but the method can also be used for unbounded distributions.
We then sample x from g(x) and accept the sampled value with probability f(x)/(M g(x)). If we reject, we sample again until we accept a value. The algorithm works best when g(x) is close to f(x) and M is small, since the overall acceptance probability is 1/M (for normalised f and g).
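In code the algorithm is just a propose-and-accept loop. Here is a minimal Python sketch, assuming a target density f, an envelope density g that is easy to sample from, and a constant M with f(x) ≤ M g(x) for all x:

```python
import numpy as np

def rejection_sample(f, g_sample, g_pdf, M, size=1000, rng=None):
    """Draw `size` values from density f by rejection sampling
    against the envelope M * g."""
    rng = rng or np.random.default_rng()
    out = []
    while len(out) < size:
        x = g_sample(rng)                   # propose from g
        if rng.uniform() <= f(x) / (M * g_pdf(x)):
            out.append(x)                   # accept with probability f(x) / (M g(x))
    return np.array(out)
```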
Truncated Normal example
Let us assume that we wish to generate random variables from a standard normal truncated at -2 and +2.
The standard method would be to generate from a Normal(0,1) and reject values outside the range (-2, 2).
An alternative would be to generate from a (scaled) Uniform distribution on (-2, 2), as shown.
The probability of acceptance is 1/M ≈ 0.9545 × √(2π)/4 ≈ 0.598.
5,000 random draws gave an empirical acceptance rate of 0.606.
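Using the rejection_sample sketch above, this example can be reproduced as follows (SciPy's norm supplies the normal pdf and cdf):

```python
from scipy.stats import norm

Z = norm.cdf(2) - norm.cdf(-2)            # ≈ 0.9545, normalising constant
f = lambda x: norm.pdf(x) / Z             # truncated N(0,1) density on (-2, 2)
g_pdf = lambda x: 0.25                    # Uniform(-2, 2) density
g_sample = lambda rng: rng.uniform(-2, 2)
M = f(0) / g_pdf(0)                       # smallest valid M, ≈ 1.672

draws = rejection_sample(f, g_sample, g_pdf, M, size=5000)
# Expected acceptance rate is 1/M ≈ 0.598, matching the ~0.606 observed above.
```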
Adaptive Rejection Sampling
AR sampling (Gilks and Wild, 1992) takes rejection sampling a step further by clever construction of the envelope function g(x). The algorithm works for all log-concave functions f(x). A few points are chosen and the tangent to log f(x) is constructed at each of them. These tangents are joined to form a piecewise linear function that is an envelope for log f(x); exponentiating it gives the envelope g(x).
When a point is sampled from g(x) it is accepted or rejected as in rejection sampling. In addition, the tangent at the sampled point can be evaluated and the envelope updated, so that it becomes a progressively closer approximation to log f(x).
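A Python sketch of the tangent-based sampler is given below. It assumes the target is supplied as h = log f together with its derivative, and that the initial points bracket the mode (h' positive at the smallest point, negative at the largest); a production implementation would also use the squeeze function of Gilks and Wild (1992) and guard against zero or duplicated tangent slopes:

```python
import numpy as np

def ars(h, h_prime, x_init, size=1000, rng=None):
    """Adaptive rejection sampling from f = exp(h) for concave h.
    The upper hull is built from tangents to h at the current points;
    a rejected draw is added to the points, tightening the envelope."""
    rng = rng or np.random.default_rng()
    xs = sorted(x_init)
    out = []
    while len(out) < size:
        x = np.asarray(xs)
        hx, d = h(x), h_prime(x)
        # Intersections of adjacent tangents, padded with +/- infinity
        z = np.empty(len(x) + 1)
        z[0], z[-1] = -np.inf, np.inf
        z[1:-1] = (hx[1:] - hx[:-1] + x[:-1] * d[:-1] - x[1:] * d[1:]) / (d[:-1] - d[1:])
        # Segment j of the envelope uses the tangent at x[j] on (z[j], z[j+1])
        ca = np.exp(d * (z[:-1] - x))
        cb = np.exp(d * (z[1:] - x))
        mass = np.exp(hx) * (cb - ca) / d      # area under exp(tangent_j)
        # Pick a segment by its mass, then invert the CDF within it
        j = rng.choice(len(x), p=mass / mass.sum())
        xstar = x[j] + np.log(ca[j] + rng.uniform() * (cb[j] - ca[j])) / d[j]
        env = hx[j] + (xstar - x[j]) * d[j]    # envelope value at xstar
        if np.log(rng.uniform()) <= h(xstar) - env:
            out.append(xstar)                  # accept: exp(h - env) test
        else:
            xs = sorted(xs + [xstar])          # reject: refine the envelope
    return np.array(out)

# Example: standard normal via h(x) = -x^2 / 2 (up to a constant)
draws = ars(lambda x: -0.5 * x ** 2, lambda x: -x, x_init=[-1.0, 1.0], size=2000)
```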
An illustration of AR sampling [figure]
WinBUGS 1.4 New Method
• In fact, for the models we consider, WinBUGS 1.4 no longer uses AR sampling for fixed effects.
• Instead it uses a method by Gamerman (1997). This is essentially a Metropolis-Hastings algorithm where, at each iteration, the multivariate normal proposal distribution is formed by performing one iteration of Iterative Weighted Least Squares (IWLS), starting at the current point. This routine is similar to the IGLS estimation method used in MLwiN for MQL/PQL estimates; a sketch of the proposal step is given after this list.
• To see the AR sampler in action you will need to use WinBUGS 1.3, or go to the Options/Blocking options screen and untick the box. (You can try this in the practical if you wish.)
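The sketch below conveys the flavour of the Gamerman proposal for a simple fixed-effects logistic regression with flat priors (matching the dflat() priors used later); it illustrates the idea rather than reproducing the exact WinBUGS implementation:

```python
import numpy as np
from scipy.special import expit
from scipy.stats import multivariate_normal

def iwls_proposal(beta, X, y, n):
    """One IWLS step from beta: the proposal mean and covariance
    (flat prior, so no prior precision term is added)."""
    eta = X @ beta
    p = expit(eta)
    w = n * p * (1 - p)                         # IWLS weights
    z = eta + (y - n * p) / w                   # working response
    C = np.linalg.inv(X.T @ (w[:, None] * X))   # proposal covariance
    return C @ (X.T @ (w * z)), C               # proposal mean, covariance

def loglik(beta, X, y, n):
    """Binomial log-likelihood up to a constant."""
    eta = X @ beta
    return np.sum(y * eta - n * np.logaddexp(0.0, eta))

def gamerman_step(beta, X, y, n, rng):
    """One Metropolis-Hastings update with the IWLS-based proposal."""
    m1, C1 = iwls_proposal(beta, X, y, n)
    beta_star = rng.multivariate_normal(m1, C1)
    m2, C2 = iwls_proposal(beta_star, X, y, n)  # reverse-move proposal
    log_alpha = (loglik(beta_star, X, y, n) - loglik(beta, X, y, n)
                 + multivariate_normal.logpdf(beta, m2, C2)
                 - multivariate_normal.logpdf(beta_star, m1, C1))
    return beta_star if np.log(rng.uniform()) <= log_alpha else beta
```

Because the proposal is rebuilt from the current point at every iteration, the Hastings correction with the reverse-move proposal density is required, unlike for a symmetric random-walk sampler.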
Reunion Island 2-level dataset
We will here consider the 2-level Reunion Island dataset as fitted in MLwiN, and the final model:
WinBUGS code for the model
model
{
# Level 1 definition
for(i in 1:N) {
fscr[i] ~ dbin(p[i],denom[i])
logit(p[i]) <- beta[1] * cons[i]
+ beta[2] * ai[i]
+ u2[herd[i]] * cons[i]
}
# Higher level definitions
for (j in 1:n2) {
u2[j] ~ dnorm(0,tau.u2)
}
# Priors for fixed effects
for (k in 1:2) { beta[k] ~ dflat() }
# Priors for random terms
tau.u2 ~dgamma(0.001000,0.001000)
sigma2.u2 <- 1/tau.u2
}
Here we see the code for a 2-level logistic regression model. Note the use of dbin for the binomial distribution and the logit link function. The file also contains initial values from the 1st-order MQL run of this model, together with the data values. We are using Gamma(ε, ε) priors for the precision.
WinBUGS node info
On the info menu you can select node info.
This will inform you (in a funny language)
what method is being used for each node.
For this model we get:
beta[1] UpdaterGLM.LogitUpdater – means the Gamerman method.
u2[1] UpdaterRejection.Logit – means the AR method. (I think!)
tau.u2 UpdaterGamma.Updater – means conjugate Gibbs sampling.
WinBUGS chains
The model was run for 5,000 iterations after a burn-in of 500, which took just over 2 minutes.
The chains all look reasonable.
[Trace plots of beta[1], beta[2] and sigma2.u2 against iteration number]
Point Estimates
Estimates are similar to MLwiN:

Node        Mean    SD
beta[1]     0.651   0.169
beta[2]    -1.113   0.174
sigma2.u2   0.091   0.056

[MLwiN estimates after 50,000 iterations: figure]
Reunion Island 3-level dataset
We will here consider the 3-level Reunion Island dataset as fitted in MLwiN, and the final model, which gave estimation difficulties:
WinBUGS chains
The model was run for 5,000 iterations after a burn-in of 500, which took 10 minutes!
Although the intercept chain on the left looks good, and better than MH, the cow-level variance has clearly not converged yet. So the WinBUGS method does not fix the problem.
[Trace plots of beta[1] and sigma2.u2 against iteration number]
[Between-cows variance after 50k iterations using WinBUGS: figure]
Hierarchical centering formulation
This is a method for improving the mixing of an MCMC method.
Consider the VC model
$$y_{ij} = \beta_0 + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2).$$
This can be written equivalently as
$$y_{ij} = \beta_{0j} + e_{ij}, \qquad \beta_{0j} \sim N(\beta_0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2),$$
and the Gibbs sampling algorithm written using this parameterisation. This may improve mixing if β_{0j} is less correlated with β_0 than u_j is. See Browne (2004) for models where this works well.
Hierarchical centering in WinBUGS
model
{
  # Level 1 definition
  for (i in 1:N) {
    fscr[i] ~ dbin(p[i], denom[i])
    logit(p[i]) <- beta[2] * ai[i]
                 + u2[cow[i]]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j] ~ dnorm(u3[herd[j]], tau.u2)
  }
  for (j in 1:n3) {
    u3[j] ~ dnorm(beta[1], tau.u3)
  }
  # Priors for fixed effects
  for (k in 1:2) { beta[k] ~ dflat() }
  # Priors for random terms
  tau.u2 ~ dgamma(0.001, 0.001)
  sigma2.u2 <- 1/tau.u2
  tau.u3 ~ dgamma(0.001, 0.001)
  sigma2.u3 <- 1/tau.u3
}
It is easy to modify the WinBUGS code to fit a hierarchically centred model, as shown to the left. Note the main difficulty: for a 3-level model the mapping vector herd must now map from cows to herds rather than from observations to herds. This means changing the data file.
Trace plot for hierarchically centred formulation
Unfortunately this doesn't improve things much! The parameter expansion method, also discussed in Browne (2004), may work better here and may be worth examining.
Estimates comparison
In the following table we compare estimates after 50k iterations (100k for MLwiN) from the 3 methods and see reasonable agreement (standard deviations in parentheses):

Parameter   MLwiN            WinBUGS          H.C.
β0           0.563 (0.125)    0.560 (0.127)    0.570 (0.131)
β1          -1.014 (0.130)   -1.007 (0.130)   -1.021 (0.131)
σ²v          0.094 (0.043)    0.094 (0.043)    0.096 (0.044)
σ²u          0.229 (0.137)    0.202 (0.139)    0.231 (0.127)
Probit Regression
The logistic link is only one possible link function for binomial data.
Any function that maps from (0,1) to the whole real line will do, and another popular choice is the probit link (the inverse of the standard normal cdf).
This link, combined with the normal distribution, can work to our advantage and allow another way of using Gibbs sampling for a binary-data model. This is through the use of latent variables.
Latent variable approach
One source of binary data is the thresholding of a continuous response; for example, in education students often take exams, and rather than a mark we observe only whether the student passes or fails.
The latent variable approach works like the reverse of this: we see the binary response and from it we generate the underlying continuous variable.
Simple example
Consider the random effects probit regression model:
$$y_{ij} \sim \text{Bernoulli}(p_{ij}), \qquad \text{probit}(p_{ij}) = X_{ij}\beta + u_j, \quad u_j \sim N(0, \sigma_u^2).$$
This model is equivalent to the following Normal response model:
$$y_{ij}^* = X_{ij}\beta + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2),$$
where y*_{ij} is unobserved but is restricted to positive values when y_{ij} = 1 and negative values when y_{ij} = 0, and σ²_e is fixed at 1 to identify the probit model.
Gibbs Sampling algorithm
We then have four steps for our latent variable model:
1. Generate y*_{ij} from its truncated Normal conditional distribution for all i and j.
2. Generate β from its (multivariate) Normal conditional distribution.
3. Generate u_j from its Normal conditional distribution for each j.
4. Generate σ²_u from its inverse-Gamma conditional distribution.
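A minimal Python sketch of these four steps for a 2-level random-intercept probit model follows, using a flat prior on β and a Gamma(a, b) prior on the level-2 precision as in the WinBUGS code earlier; the function name and data layout are illustrative:

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(y, X, group, n_iter=5000, a=0.001, b=0.001, rng=None):
    """Gibbs sampler for probit(p_ij) = X_ij beta + u_j via latent y*.
    y: 0/1 responses; X: design matrix; group: 0-based level-2 index per row."""
    rng = rng or np.random.default_rng()
    n, k = X.shape
    J = group.max() + 1
    nj = np.bincount(group, minlength=J)        # level-2 cluster sizes
    XtX_inv = np.linalg.inv(X.T @ X)
    beta, u, sigma2_u = np.zeros(k), np.zeros(J), 1.0
    samples = {"beta": [], "sigma2_u": []}
    for _ in range(n_iter):
        # Step 1: latent y* from truncated normals, sign fixed by y
        mu = X @ beta + u[group]
        lo = np.where(y == 1, -mu, -np.inf)     # y = 1  =>  y* > 0
        hi = np.where(y == 1, np.inf, -mu)      # y = 0  =>  y* < 0
        ystar = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # Step 2: beta from its multivariate normal conditional (flat prior)
        m = XtX_inv @ (X.T @ (ystar - u[group]))
        beta = rng.multivariate_normal(m, XtX_inv)
        # Step 3: each u_j from its normal conditional
        resid = ystar - X @ beta
        prec = nj + 1.0 / sigma2_u
        u = (np.bincount(group, weights=resid, minlength=J) / prec
             + rng.standard_normal(J) / np.sqrt(prec))
        # Step 4: sigma2_u from its inverse-gamma conditional
        tau = rng.gamma(a + J / 2.0, 1.0 / (b + 0.5 * np.sum(u ** 2)))
        sigma2_u = 1.0 / tau
        samples["beta"].append(beta.copy())
        samples["sigma2_u"].append(sigma2_u)
    return samples
```

Every step is a draw from a standard distribution, which is why this formulation allows pure Gibbs sampling where the logit model did not.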
Probit model for 2-level Reunion Island dataset (demo)
Using Gibbs in MLwiN: [trace plots for only 5k iterations]
Probit for 3-level Reunion Island dataset
• Still not great, but the effective sample size (ESS) is up to 206!
Introduction to the Practical
In the next practical you will have the
chance to explore the possible methods
from this lecture on the two datasets from
the last practical:
• The Bangladesh contraceptive use
dataset.
• The pig pneumonia dataset.