What MCMC is about - S-GEM


MCMC for
Stochastic Epidemic Models
Philip D. O’Neill
School of Mathematical Sciences
University of Nottingham
This includes joint work with…
Tom Britton (Stockholm)
 Niels Becker (ANU, Canberra)
 Gareth Roberts (Lancaster)
 Peter Marks (NHS, Derbyshire PCT)

Contents
1. MCMC: overview and basics
 2. Example: Vaccine efficacy
 3. Data augmentation
 4. Example: SIR epidemic model
 5. Model choice
 6. Example: Norovirus outbreak
 7. Other topics

1. Markov chain Monte Carlo (MCMC)
Overview and basics
The key problem is to explore a density
function π known up to proportionality.
 The output of an MCMC algorithm is a
sequence of samples from the correctly
normalised π.
 These samples can be used to estimate
summaries of π, e.g. its mean, variance.

How MCMC works
Key idea is to construct a discrete time
Markov chain X1, X2, X3, … on state space
S whose stationary distribution is π.
If P(dy, dx) is the transition kernel of the chain, this means that

\[ \pi(dx) = \int_{y \in S} \pi(dy)\, P(dy, dx). \]
How MCMC works (2)

Subject to some technical conditions,
Distribution of Xn → π as n → ∞

Thus to obtain samples from π we simulate
the chain and sample from it after a “long
time”.
Example:
π(x) ∝ x e^{-2x}

Suppose the Markov chain output is
…, XN = 1.2662, XN+1 = 1.7840, XN+2 = 0.6629, …, XN+M = 1.0312
(i.e. discard the initial N values: burn-in).

\[ E(X) \approx \frac{1}{M} \sum_{j=N+1}^{N+M} X_j \approx 1.032 \]
How to build the Markov chain
Surprisingly, there are many ways to
construct a Markov chain with stationary
distribution π.
 Perhaps the simplest is the Metropolis-Hastings algorithm.

Metropolis-Hastings algorithm



Set an initial value X1.
If the chain is currently at Xn = x, randomly propose a new position Xn+1 = y according to a proposal density q(y | x).
Accept the proposed jump with probability

\[ \min\!\left(1, \; \frac{\pi(y)\, q(x \mid y)}{\pi(x)\, q(y \mid x)}\right). \]

If not accepted, Xn+1 = x.
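As an illustration (not from the talk), here is a minimal Python sketch of this algorithm for the earlier example target π(x) ∝ x e^{-2x}, using a symmetric Gaussian proposal; the proposal standard deviation, chain length and burn-in are arbitrary illustrative choices.

import math
import random

def log_target(x):
    # log of the unnormalised target pi(x) = x * exp(-2x), defined for x > 0
    return math.log(x) - 2.0 * x if x > 0 else float("-inf")

def metropolis_hastings(n_iter=50000, x0=1.0, prop_sd=1.0, seed=1):
    random.seed(seed)
    x = x0
    chain = []
    for _ in range(n_iter):
        y = random.gauss(x, prop_sd)                 # symmetric Gaussian proposal
        log_alpha = log_target(y) - log_target(x)    # q(x|y)/q(y|x) cancels here
        if math.log(random.random()) < log_alpha:
            x = y                                    # accept the proposed jump
        chain.append(x)                              # if rejected, stay at x
    return chain

chain = metropolis_hastings()
samples = chain[5000:]                               # discard burn-in
print(sum(samples) / len(samples))                   # close to E(X) = 1 for Gamma(2, 2)

The sample mean after burn-in should be close to E(X) = 1, in line with the estimate of about 1.03 on the example slide.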
Why the M-H algorithm works



Let P(dx,dy) denote the transition kernel of the
chain.
Then P(dx,dy) is approximately the probability
that the chain jumps from a region dx to a
region dy.
We can calculate P(dx,dy) as follows:
Why M-H works (2)

\[ P(dx, dy) = q(dy \mid dx)\, \min\!\left(1, \frac{\pi(dy)\, q(dx \mid dy)}{\pi(dx)\, q(dy \mid dx)}\right) = \min\!\left( q(dy \mid dx), \; \frac{\pi(dy)}{\pi(dx)}\, q(dx \mid dy) \right) \]

\[ \Rightarrow \quad \pi(dx)\, P(dx, dy) = \min\bigl( \pi(dx)\, q(dy \mid dx), \; \pi(dy)\, q(dx \mid dy) \bigr), \]

which is symmetric in dx and dy. Thus

\[ \pi(dx)\, P(dx, dy) = \pi(dy)\, P(dy, dx). \]
Why M-H works (3)

\[ \pi(dx)\, P(dx, dy) = \pi(dy)\, P(dy, dx) \]

\[ \Rightarrow \quad \int_{y \in S} \pi(dx)\, P(dx, dy) = \int_{y \in S} \pi(dy)\, P(dy, dx) \]

\[ \Rightarrow \quad \pi(dx) = \int_{y \in S} \pi(dy)\, P(dy, dx). \]

This last equation shows that π is a stationary distribution for the Markov chain.
Comments on M-H algorithm (1)
The choice of proposal q(y | x) is fairly arbitrary.
Popular choices include
  q(y | x) = q(y)            (independence sampler)
  q(y | x) = N(x, σ²)        (Gaussian proposal)
  q(y | x) = q(|y - x|)      (symmetric proposal)

Comments on M-H (2)

In practice, MCMC is almost always used
for multi-dimensional problems. Given a
target density π(x1, x2, … ,xn) it is possible
to update each component separately, or
even in blocks, using different M-H
schemes.
Comments on M-H (3)
A popular multi-dimensional scheme is the Gibbs sampler, in which the proposal for a component xi is its full conditional density
  π(xi | x1, …, xi-1, xi+1, …, xn).
The M-H acceptance probability is equal to one in this case.
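As an illustration (not from the talk), here is a minimal Gibbs sampler sketch for a bivariate normal target with correlation ρ, where both full conditionals are normal and available in closed form.

import random

def gibbs_bivariate_normal(rho=0.8, n_iter=20000, seed=1):
    random.seed(seed)
    x1, x2 = 0.0, 0.0
    cond_sd = (1.0 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_iter):
        x1 = random.gauss(rho * x2, cond_sd)   # draw x1 | x2 ~ N(rho*x2, 1 - rho^2)
        x2 = random.gauss(rho * x1, cond_sd)   # draw x2 | x1 ~ N(rho*x1, 1 - rho^2)
        samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal()
print(sum(a * b for a, b in samples) / len(samples))   # estimates E[X1*X2] = rho

Because every draw comes directly from a full conditional density, each "proposal" is accepted with probability one and no accept/reject step is needed.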

General comments on MCMC
How to check convergence? There is no
guaranteed way.
Visual inspection of trace plots; diagnostic
tools (e.g. looking at autocorrelation).
 Starting values – try a range
 Acceptance rates – not too large/small
 Mixing – how fast does the chain move
around?


2. Example: Vaccine Efficacy
Outbreak of Variola Minor, Brazil 1956
 Data on cases in households (size 1 to 12)
 338 households: 126 had no cases
 1542 individuals:
809 vaccinated, 85 cases
733 unvaccinated, 425 cases

Objective: estimate vaccine efficacy
Disease transmission model




Population divided into separate households.
Divide transmission into community-acquired
and within-household.
q = P(individual avoids outside infection)
p = P(one individual fails to infect another in the same household)

[Diagram: community-acquired and within-household transmission routes.]

Vaccine response model

For a vaccinated individual, three responses can
occur: complete protection; vaccine failure; or
partial protection and infectivity reduction.

c = P(complete protection)
f = P(vaccine failure)
a = proportionate susceptibility reduction
b = proportionate infectivity reduction



Vaccine response : (A,B)

A convenient way of summarising the
random response is to suppose that an
individual’s susceptibility and infectivity reductions are given by a bivariate random variable (A, B). Thus
P[(A, B) = (0, –)] = c
P[(A, B) = (1, 1)] = f
P[(A, B) = (a, b)] = 1 - c - f
Efficacy Measures




Furthermore it is sensible to define measures of vaccine efficacy using (A, B).

VES = 1 - E[A] = 1 - f - a(1 - f - c) is a protective measure.

VEI = 1 - E[AB] / E[A] = 1 - [f + ab(1 - f - c)] / [f + a(1 - f - c)] is a measure of infectivity reduction.

Note both are functions of the basic model parameters.
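As an illustration (not from the talk), the following sketch shows how posterior samples of VES and VEI can be computed from raw MCMC output for (a, b, c, f); the variable chain is a hypothetical list of such samples.

def efficacy_measures(a, b, c, f):
    e_A = f + a * (1.0 - f - c)        # E[A]
    e_AB = f + a * b * (1.0 - f - c)   # E[AB]
    ves = 1.0 - e_A                    # protective efficacy
    vei = 1.0 - e_AB / e_A             # infectivity-reduction efficacy
    return ves, vei

# 'chain' is assumed to be a list of (a, b, c, f) samples from the MCMC run
def posterior_mean_ves(chain):
    ves_samples = [efficacy_measures(a, b, c, f)[0] for a, b, c, f in chain]
    return sum(ves_samples) / len(ves_samples)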
Bayesian inference
The object of inference is the posterior density
  π(θ | n) = π(a, b, c, f, q, p | n),
where n is the data set. By Bayes’ Theorem

\[ \pi(\theta \mid n) \propto \pi(n \mid \theta)\, \pi(\theta), \]

where π(n | θ) is the likelihood and π(θ) is the prior density for θ.
MCMC details
There are six parameters: a, b, c, f, q, p.
Each parameter has range [0, 1].
Update each parameter separately using a Metropolis-Hastings step with a Gaussian proposal centered on the current value.

MCMC pseudocode
Initialise parameters (e.g. a = 0.5, b = 0.5, …)
User input: burn-in (B), sample size (S), thinning gap (T)

LOOP: counter from -B to (S x T)
  Update a, update b, …, update p
  IF (counter > 0) AND (counter/T is an integer) THEN store current values
END LOOP

Updating details for a


Propose ã ~ N(a, σ²).
Accept with probability

\[ \min\!\left(1, \; \frac{\pi(n \mid \tilde{a}, b, c, f, q, p)\; \pi(\tilde{a})}{\pi(n \mid a, b, c, f, q, p)\; \pi(a)} \right). \]

Note that the (symmetric) proposal cancels out.
The other parameters are updated similarly.
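The following Python sketch (not the authors' code) puts the pseudocode and the update step together. Here log_likelihood(params, data) is a hypothetical user-supplied function returning log π(n | a, b, c, f, q, p), the within-household escape probability is written p, Uniform(0,1) priors are assumed so the prior terms reduce to a range check, and the proposal standard deviation is illustrative.

import math
import random

PARAM_NAMES = ["a", "b", "c", "f", "q", "p"]

def run_mcmc(log_likelihood, data, burn_in=1000, n_samples=5000, thin=10,
             prop_sd=0.05, seed=1):
    random.seed(seed)
    params = {name: 0.5 for name in PARAM_NAMES}       # initial values
    current_ll = log_likelihood(params, data)
    stored = []
    for counter in range(-burn_in, n_samples * thin + 1):
        for name in PARAM_NAMES:                        # update each parameter in turn
            proposal = dict(params)
            proposal[name] = random.gauss(params[name], prop_sd)
            if not 0.0 < proposal[name] < 1.0:          # outside Uniform(0,1) support
                continue                                # reject immediately
            new_ll = log_likelihood(proposal, data)
            if math.log(random.random()) < new_ll - current_ll:
                params, current_ll = proposal, new_ll   # accept (symmetric proposal cancels)
        if counter > 0 and counter % thin == 0:
            stored.append(dict(params))                 # keep a thinned sample
    return stored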
[Figures: trace plot for a; density estimate for a; scatterplot of a versus c.]
Results for VES
Posterior mean: VES = 1 – E(A) = 0.84
Posterior Standard Deviation = 0.03
These results are easily obtained using the raw
Markov chain output for the model parameters.

3. Data augmentation
Suppose we have a model with unknown parameter vector θ = (θ1, θ2, …, θn).
Available data are y = (y1, y2, …, ym).
If the likelihood π(y | θ) is intractable…
…one solution is to introduce extra parameters (“missing data”)
  x = (x1, x2, …, xp)
such that π(y, x | θ) is tractable.

Data augmentation (2)
The extra parameters x = (x1, x2, …, xp) are simply treated as unknown model parameters as before.
To obtain samples from π(y | θ), take samples from π(y, x | θ) and ignore x.
Such a scheme is often easy using MCMC.

Data augmentation (3)
Can also add parameters to improve the
mixing of the Markov chain (auxiliary
variables).
 Choosing how to augment data is not
always obvious!


4. SIR Epidemic Model
Suppose we observe daily numbers of
cases during an epidemic outbreak in
some fixed population.
 Objective is to say something about
infection rates and infectious period
duration of the disease.

[Figure: epidemic curve of the SARS outbreak in Canada.]
Model definition
Population of N individuals.
At time t there are:
  St susceptibles
  It infectives
  Rt recovered/immune individuals
Thus St + It + Rt = N for all t.
Initially (S0, I0, R0) = (N-1, 1, 0).

Model definition (2)
Each infectious individual remains so for a length of time TI ~ Exponential(γ).
During this time, infectious contacts occur with each susceptible according to a Poisson process of rate β/N.
Thus the overall infection rate is β St It / N.
Two model parameters, β and γ.
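As an illustration (not from the talk), here is a Gillespie-style simulation sketch of this model; the parameter values are arbitrary.

import random

def simulate_sir(N=100, beta=2.0, gamma=1.0, seed=1):
    random.seed(seed)
    S, I = N - 1, 1
    t = 0.0
    removal_times = []
    while I > 0:
        infection_rate = beta * S * I / N
        removal_rate = gamma * I
        total_rate = infection_rate + removal_rate
        t += random.expovariate(total_rate)          # time to the next event
        if random.random() < infection_rate / total_rate:
            S, I = S - 1, I + 1                      # infection
        else:
            I -= 1                                   # removal
            removal_times.append(t)
    return removal_times

print(len(simulate_sir()))   # final size of the simulated outbreak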

Data, likelihood, augmentation
Suppose we observe removals at times 0 ≤ r1 ≤ r2 ≤ … ≤ rn ≤ T.
Define r = (r1, r2, …, rn).
The likelihood of the data, π(r | β, γ), is practically intractable.
However, given the (unknown) infection times i = (i1, i2, …, in), π(i, r | β, γ) is tractable.

MCMC algorithm

Specifically,

\[ \pi(\mathbf{i}, \mathbf{r} \mid \beta, \gamma) = \left( \prod_{j=2}^{n} \beta S_{i_j} I_{i_j} \right) \left( \prod_{j=1}^{n} \gamma I_{r_j} \right) \exp\!\left( - \int_0^{T} (\beta S_t I_t + \gamma I_t)\, dt \right). \]

It follows that if π(β) is a Gamma distribution then π(β | …) is a Gamma distribution also.
The same is true for γ.
So we can update β and γ using a Gibbs step.
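As a sketch of why the Gibbs step works (the Gamma prior parameterisation below is an assumption, not taken from the talk): with independent priors β ~ Gamma(νβ, λβ) and γ ~ Gamma(νγ, λγ), the likelihood above gives the full conditionals

\[ \beta \mid \mathbf{i}, \mathbf{r} \sim \mathrm{Gamma}\!\left( \nu_\beta + n - 1, \; \lambda_\beta + \int_0^{T} S_t I_t \, dt \right), \qquad \gamma \mid \mathbf{i}, \mathbf{r} \sim \mathrm{Gamma}\!\left( \nu_\gamma + n, \; \lambda_\gamma + \int_0^{T} I_t \, dt \right). \]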





MCMC algorithm – infection times
It remains to update the infection times i = (i1, i2, …, in).
There are various ways of doing this.
A simple way is to use an M-H scheme to randomly move the times.
For example, propose a new ik by picking a new time uniformly at random in (0, T).

Updating infection times
[Diagram: the current infection time i2 is replaced by a proposed new time i2*, shown on a timeline with the other infection and removal times.]
Acceptance probability: min{1, π(i*, r | β, γ) / π(i, r | β, γ)}.
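As an illustration (not the authors' code), here is a sketch of one such update; log_likelihood(inf_times, rem_times, beta, gamma) is a hypothetical function evaluating log π(i, r | β, γ) and returning -inf for impossible configurations, and T is the end of the observation period.

import math
import random

def update_infection_time(inf_times, rem_times, beta, gamma, T, log_likelihood):
    k = random.randrange(len(inf_times))        # choose one infection time at random
    proposal = list(inf_times)
    proposal[k] = random.uniform(0.0, T)        # propose a new time in (0, T)
    log_ratio = (log_likelihood(proposal, rem_times, beta, gamma)
                 - log_likelihood(inf_times, rem_times, beta, gamma))
    if math.log(random.random()) < log_ratio:   # uniform proposal is symmetric
        return proposal                         # accept the move
    return inf_times                            # reject: keep the current times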
Extensions
Epidemic not known to be finished by time T
Non-exponential infectious periods
Multi-group models (e.g. age-stratified data)
More sophisticated updates of infection times
Inclusion of latent periods


5. Model Choice
Bayesian model choice problems can also
be implemented using MCMC.
 So-called “transdimensional MCMC” (alias
“Reversible Jump MCMC”) is used.
 The basic idea is to construct the Markov
chain on the union of the different sample
spaces and (essentially) use M-H.

Simple example
Model 1 has two parameters: θ1, θ2
Model 2 has one parameter: φ
The Markov chain moves between models and within models
E.g. Xn = (1, θ1, θ2, φ) for model 1, ignore φ
Practical question – how to jump between models?


6. Example: Norovirus outbreak
Outbreak of gastroenteritis in summer 2001 at a school in Derbyshire, England.
A single strain of Norovirus was found to be the causative agent.
Believed to be person-to-person spread.
Outbreak data
15 classrooms, each child based in one.
 Absence records plus questionnaires.
 492 children of whom 186 were cases.
 Data include age, period of illness, times
of vomiting episodes in classrooms.

Question of interest

Does vomiting play a significant role in
transmission?

Total of 15 vomiting episodes in
classrooms.
Epidemic curve in Classroom 10
[Figure: number of children ill (by first day of illness) against day of outbreak, days 1–17.]
Stochastic transmission model
Assumption: a susceptible on weekday t remains so on day t+1 if they avoid infection from each infective child.
Per-infective daily avoidance probabilities are
  classmate: qc
  schoolmate: qs
  in-class vomiters: qv
Transmission model (2)

At weekends, a susceptible remains so by avoiding infection from all infectives in the community;
the per-infective avoidance probability is q.
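A sketch of the implied escape probability, assuming independent avoidance of each infective (the count notation below is illustrative, not from the slides):

\[ P(\text{escape infection on weekday } t) = q_c^{\,k_c}\, q_v^{\,k_v}\, q_s^{\,k_s}, \qquad P(\text{escape at a weekend day}) = q^{\,k}, \]

where k_c, k_v, k_s and k count the relevant infectives on that day (non-vomiting classmates, in-class vomiters, infective schoolmates outside the class, and infectives in the community).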
Two models
M1 : Full model: qc, qv, qs, q
 Vomiters treated separately

M2 : Sub-model: qc, qv=qc, qs, q
 Vomiters classed as normal infectives

MCMC algorithm
Construct Markov chain on state space
S = { ( qc, qs, qv, q, M) }
where M = 1 or 2 is the current model

Model-switching step to update M
 Random walk updates for the q’s

Between-model jumps

Full model → sub-model:
  propose qv = qc

Sub-model → full model:
  propose qv = qc + N(0, σ²)

Acceptance probabilities are straightforward.
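As a sketch of these acceptance probabilities (assuming Uniform(0,1) priors, equal prior model probabilities and equal probabilities of proposing each move type; the notation φσ is introduced here), the sub-model → full-model jump is accepted with probability

\[ \min\!\left\{1, \; \frac{\pi(\text{data} \mid q_c, q_s, q_v, q, M_1)}{\pi(\text{data} \mid q_c, q_s, q, M_2)\; \phi_{\sigma}(q_v - q_c)} \right\}, \]

where φσ is the N(0, σ²) proposal density; proposals with qv outside [0, 1] are rejected, and the full-model → sub-model move uses the reciprocal ratio.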
Results

Uniform(0,1) priors for the q's; P(M1) = 0.5

           qc        qs         qv       q
Mean       0.997     0.998      0.936    0.999
S. dev.    0.0014    0.00018    0.018    0.000066

P(M1 | data) = 0.0001  (full model)
P(M2 | data) = 0.999   (sub-model)


7. Other topics
1. Improving algorithm performance
 2. Perfect simulation
 3. Some conclusions

Improving algorithm performance
Choose parameters to reduce correlation
 Trade-off between ease of computation
and mixing behaviour of chain
 Choice of M-H proposal distributions
 Choice of blocking schemes

Perfect simulation
Detecting convergence can be a real
problem in practice.
 Perfect simulation is a method for
constructing a chain that is known to have
converged by a certain time.
 Unfortunately it is far less applicable than
MCMC.

Some conclusions
MCMC methods are hugely powerful
 The methods enable analysis of very
complicated models
 Sample-based methods easily permit
exploration of both parameters and
functions of parameters
 Implementation is often relatively easy
 Software available (e.g. BUGS)
