What MCMC is about - S-GEM
Download
Report
Transcript What MCMC is about - S-GEM
MCMC for
Stochastic Epidemic Models
Philip D. O’Neill
School of Mathematical Sciences
University of Nottingham
This includes joint work with…
Tom Britton (Stockholm)
Niels Becker (ANU, Canberra)
Gareth Roberts (Lancaster)
Peter Marks (NHS, Derbyshire PCT)
Contents
1. MCMC: overview and basics
2. Example: Vaccine efficacy
3. Data augmentation
4. Example: SIR epidemic model
5. Model choice
6. Example: Norovirus outbreak
7. Other topics
1. Markov chain Monte Carlo (MCMC)
Overview and basics
The key problem is to explore a density
function π known up to proportionality.
The output of an MCMC algorithm is a
sequence of samples from the correctly
normalised π.
These samples can be used to estimate
summaries of π, e.g. its mean, variance.
How MCMC works
Key idea is to construct a discrete time
Markov chain X1, X2, X3, … on state space
S whose stationary distribution is π.
If P(dy,dx) is the transitional kernel of the
chain this means that
(dx ) (dy ) P ( y, dx )
S
How MCMC works (2)
Subject to some technical conditions,
Distribution of Xn → π as n →
Thus to obtain samples from π we simulate
the chain and sample from it after a “long
time”.
Example:
π (x) x e-2x
Example:
X1, X2, …
π (x) x e-2x
XN = 1.2662
Example:
π (x) x e-2x
XN = 1.2662
XN+1 = 1.7840
Example:
π (x) x e-2x
XN = 1.2662
XN+1 = 1.7840
XN+2 = 0. 6629
Example:
π (x) x e-2x
Suppose Markov chain output is
..., XN = 1.2662, XN+1 = 1.7840,
XN+2 = 0.6629, …. ,XN+M = 1.0312
(i.e. discard initial N values, burn-in)
1
E( X )
M
NM
X
j N
j
1.032
How to build the Markov chain
Surprisingly, there are many ways to
construct a Markov chain with stationary
distribution π.
Perhaps the simplest is the MetropolisHastings algorithm.
Metropolis-Hastings algorithm
Set an initial value X1.
If the chain is currently at Xn = x, randomly
propose a new position Xn+1 = y according to
a proposal density q(y | x).
Accept the proposed jump with probability
( y ) q( x | y )
.
min 1,
( x) q ( y | x)
If not accepted, Xn+1 = x.
Why the M-H algorithm works
Let P(dx,dy) denote the transition kernel of the
chain.
Then P(dx,dy) is approximately the probability
that the chain jumps from a region dx to a
region dy.
We can calculate P(dx,dy) as follows:
Why M-H works (2)
(dy)q(dx | dy)
P(dx, dy) q(dy | dx)
1
(dx)q(dy | dx)
(dy)
q(dx | dy) q(dy | dx)
(dx)
(dx) P(dx, dy) (dy)q(dx | dy) (dx)q(dy | dx)
T hus (dx) P(dx, dy) (dy) P(dy, dx)
Why M-H works (3)
(dx) P(dx, dy) (dy) P(dy, dx)
(dx ) P(dx, dy )
dyS
(dy ) P(dy , dx )
(dy ) P(dy , dx )
dyS
(dx )
dyS
This last equation shows that π is a
stationary distribution for the Markov chain.
Comments on M-H algorithm (1)
The choice of proposal q(y|x) is fairly
arbitrary.
Popular choices include
q(y|x) = q(y) (Independence sampler)
q(y|x) ~ N(x, 2) (Gaussian proposal)
q(y|x) = q(|y-x|) (Symmetric proposal)
Comments on M-H (2)
In practice, MCMC is almost always used
for multi-dimensional problems. Given a
target density π(x1, x2, … ,xn) it is possible
to update each component separately, or
even in blocks, using different M-H
schemes.
Comments on M-H (3)
A popular multi-dimensional scheme is the
Gibbs sampler, in which the proposal for a
component xi is its full conditional density
π (xi | (x1,…,xi-1, xi+1, … ,xn) )
The M-H acceptance probability is equal to
one in this case.
General comments on MCMC
How to check convergence? There is no
guaranteed way.
Visual inspection of trace plots; diagnostic
tools (e.g. looking at autocorrelation).
Starting values – try a range
Acceptance rates – not too large/small
Mixing – how fast does the chain move
around?
Contents
1. MCMC: overview and basics
2. Example: Vaccine efficacy
3. Data augmentation
4. Example: SIR epidemic model
5. Model choice
6. Example: Norovirus outbreak
7. Other topics
2. Example: Vaccine Efficacy
Outbreak of Variola Minor, Brazil 1956
Data on cases in households (size 1 to 12)
338 households: 126 had no cases
1542 individuals:
809 vaccinated, 85 cases
733 unvaccinated, 425 cases
Objective: estimate vaccine efficacy
Disease transmission model
Population divided into separate households.
Divide transmission into community-acquired
and within-household.
q = P( individual avoids outside infection )
= P ( one individual fails to infect another
in the same household )
Disease transmission model
q
Vaccine response model
For a vaccinated individual, three responses can
occur: complete protection; vaccine failure; or
partial protection and infectivity reduction.
c = P(complete protection)
f = P(vaccine failure)
a = proportionate susceptibility reduction
b = proportionate infectivity reduction
Vaccine response : (A,B)
A convenient way of summarising the
random response is to suppose that an
individual’s susceptibility and infectivity
reduction is given by a bivariate random
variable (A,B). Thus
P[ (A,B) = (0, -)] = c
P[ (A,B) = (1,1) ] = f
P[ (A,B) = (a,b) ] = 1-c-f
Efficacy Measures
Furthermore it is sensible to define measures of
vaccine efficacy using (A,B).
VES = 1- E[A]
= 1 - f - a(1-f-c) is a protective measure
VEI = 1 - E[AB] / E[A]
= 1 - [f + ab(1-f-c)] / [f + a(1-f-c)]
is a measure of infectivity reduction
Note both are functions of basic model
parameters
Bayesian inference
Object of inference is the posterior density
( | n ) = ( a,b,c,f,q, | n )
where n is the data set. By Bayes’ Theorem
(| n ) (n | ) ( ),
where
and
(| n ) is the likelihood,
()
is the prior density for .
MCMC details
There are six parameters: a,b,c,f,q,
Each parameter has range [0,1]
Update each parameter separately using a
Metropolis-Hastings step with Gaussian
proposal centered on the current value
MCMC pseudocode
Initialise parameters (e.g. a = 0.5, b=0.5,…)
User input burn-in (B), sample size (S),
thinning gap (T)
LOOP: counter from –B to (S x T)
Update a, update b, …, update
IF (counter > 0) AND (counter/T is integer)
THEN store current values
END LOOP
Updating details for a
Propose ã~ N(a, 2)
Accept with probability
~
~
(n | a , b, c, f , q, ) (a )
1
(n | a, b, c, f , q, ) (a)
Note that the (symmetric) proposal cancels out
The other parameters are updated similarly
Trace plot for a
Density estimate for a
Scatterplot of a versus c
Results for VES
Posterior mean: VES = 1 – E(A) = 0.84
Posterior Standard Deviation = 0.03
These results are easily obtained using the raw
Markov chain output for the model parameters.
Contents
1. MCMC: overview and basics
2. Example: Vaccine efficacy
3. Data augmentation
4. Example: SIR epidemic model
5. Model choice
6. Example: Norovirus outbreak
7. Other topics
4. Data augmentation
Suppose we have a model with unknown
parameter vector = (1,2,…,n).
Available data are y = ( y1, y2,…, ym ).
If the likelihood π (y | ) is intractable…
…one solution is to introduce extra
parameters (“missing data”)
x = (x1, x2,…, xp)
such that π (y , x | ) is tractable.
Data augmentation (2)
The extra parameters x = (x1, x2,…, xp) are
simply treated as unknown model
parameters as before.
To obtain samples from π ( y | ), take
samples from π (y , x | ) and ignore x.
Such a scheme is often easy using MCMC.
Data augmentation (3)
Can also add parameters to improve the
mixing of the Markov chain (auxiliary
variables).
Choosing how to augment data is not
always obvious!
Contents
1. MCMC: overview and basics
2. Example: Vaccine efficacy
3. Data augmentation
4. Example: SIR epidemic model
5. Model choice
6. Example: Norovirus outbreak
7. Other topics
4. SIR Epidemic Model
Suppose we observe daily numbers of
cases during an epidemic outbreak in
some fixed population.
Objective is to say something about
infection rates and infectious period
duration of the disease.
Epidemic curve
(SARS in Canada)
Model definition
Population of N individuals
At time t there are:
St susceptibles
It infectives
Rt recovered/immune individuals
Thus St + It + Rt = N for all t.
Initially (S0, I0 ,R0 ) = (N-1,1,0).
Model definition (2)
Each infectious individual remains so for a
length of time TI ~ Exponential().
During this time, infectious contacts occur
with each susceptible according to a
Poisson process of rate /N.
Thus overall infection rate is St It/N.
Two model parameters, and .
Data, likelihood, augmentation
Suppose we observe removals at times
0 ≤ r1 ≤ r2 ≤ … ≤ rn ≤ .
Define r = ( r1, r2 , …, rn ).
The likelihood of the data, π (r | , ), is
practically intractable.
However, given the (unknown) infection
times i = ( i1, i2 , …, in ), π (i ,r | , ) is
tractable.
MCMC algorithm
Specifically,
n
n
(i, r | β , γ) βSijI ij γI rj exp ( βStI t γIt )dt
0
j 2
j 1
It follows that if π() ~ Gamma distribution
then π( | …) ~ Gamma distribution also.
Same is true for .
So can update and using a Gibbs step.
MCMC algorithm – infection times
It remains to update the infection times
i = ( i1, i2 , …, in )
Various ways of doing this.
A simple way is to use a M-H scheme to
randomly move the times.
For example, propose a new ik by picking a
new time uniformly at random in (0,).
Updating infection times
Updating I2 :
I6
I6
I2
I4
I4
I2*
Acceptance prob. π (i*,r | ,) / π (i,r | ,)
Extensions
Epidemic not known to be finished by
Non-exponential infectious periods
Multi-group models (e.g. age-stratified
data)
More sophisticated updates of infection
times
Inclusion of latent periods
Contents
1. MCMC: overview and basics
2. Example: Vaccine efficacy
3. Data augmentation
4. Example: SIR epidemic model
5. Model choice
6. Example: Norovirus outbreak
7. Other topics
5. Model Choice
Bayesian model choice problems can also
be implemented using MCMC.
So-called “transdimensional MCMC” (alias
“Reversible Jump MCMC”) is used.
The basic idea is to construct the Markov
chain on the union of the different sample
spaces and (essentially) use M-H.
Simple example
Model 1 has two parameters: ,
Model 2 has one parameter:
The Markov chain moves between models
and within models
E.g. Xn = (1, , , ) for model 1, ignore
Practical question – how to jump between
models?
Contents
1. MCMC: overview and basics
2. Example: Vaccine efficacy
3. Data augmentation
4. Example: SIR epidemic model
5. Model choice
6. Example: Norovirus outbreak
7. Other topics
6. Example: Norovirus outbreak
Outbreak of gastroenteritis in summer 2001
at school in Derbyshire, England.
A single strain of Norovirus virus found to be
the causative agent.
Believed to be person-to-person spread
Outbreak data
15 classrooms, each child based in one.
Absence records plus questionnaires.
492 children of whom 186 were cases.
Data include age, period of illness, times
of vomiting episodes in classrooms.
Question of interest
Does vomiting play a significant role in
transmission?
Total of 15 vomiting episodes in
classrooms.
Epidemic curve in Classroom 10
Number ill for first day
12
10
8
6
4
2
0
1
3
5
7
9
11
Day of outbreak
13
15
17
Stochastic transmission model
Assumption:
A susceptible on weekday t remains so on day
t+1 if they avoid infection from each infective
child;
per-infective daily avoidance probabilities are
classmate : qc
schoolmate: qs
in class, vomiters : qv
Transmission model (2)
At weekends, a susceptible remains so by
avoiding infection from all infectives in the
community,
per-infective avoidance probability is q.
Two models
M1 : Full model: qc, qv, qs, q
Vomiters treated separately
M2 : Sub-model: qc, qv=qc, qs, q
Vomiters classed as normal infectives
MCMC algorithm
Construct Markov chain on state space
S = { ( qc, qs, qv, q, M) }
where M = 1 or 2 is the current model
Model-switching step to update M
Random walk updates for the q’s
Between-model jumps
Full model sub-model:
Propose qv = qc
Sub-model full model:
Propose qv = qc + N(0, 2 )
Acceptance probabilities straightforward
Results
Uniform(0,1) q. priors;
Mean
qc
0.997
qs
0.998
S. dev
0.0014 0.00018
P(M1) = 0.5
qv
0.936
q
0.999
0.018
0.000066
P(M1 | data) = 0.0001 (Full model)
P(M2 | data) = 0.999 (Sub model)
Contents
1. MCMC: overview and basics
2. Example: Vaccine efficacy
3. Data augmentation
4. Example: SIR epidemic model
5. Model choice
6. Example: Norovirus outbreak
7. Other topics
7. Other topics
1. Improving algorithm performance
2. Perfect simulation
3. Some conclusions
Improving algorithm performance
Choose parameters to reduce correlation
Trade-off between ease of computation
and mixing behaviour of chain
Choice of M-H proposal distributions
Choice of blocking schemes
Perfect simulation
Detecting convergence can be a real
problem in practice.
Perfect simulation is a method for
constructing a chain that is known to have
converged by a certain time.
Unfortunately it is far less applicable than
MCMC.
Some conclusions
MCMC methods are hugely powerful
The methods enable analysis of very
complicated models
Sample-based methods easily permit
exploration of both parameters and
functions of parameters
Implementation is often relatively easy
Software available (e.g. BUGS)