Transcript Document

Some Developments of ABC

David Balding, John Molitor, David Welch (Imperial College London)

Canonical ABC (rejection) Algorithm

• Simulate parameter value θ* from the prior.
• Simulate dataset X* under the model with θ = θ*.
• If X* is sufficiently close to the observed data x, retain θ*; otherwise discard.

Iterate as required.

Retained θ* values form an approximate random sample from the posterior P(θ|X=x).

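For concreteness, a minimal R sketch of the rejection algorithm; the prior sampler, data simulator and distance are hypothetical stubs to be supplied per model:

```r
# Canonical rejection ABC: keep theta* whenever the simulated data
# fall within tolerance eps of the observed data.
# prior_r, simulate_data and distance are hypothetical stubs.
abc_reject <- function(n_sims, eps, x_obs, prior_r, simulate_data, distance) {
  theta_acc <- c()
  for (i in 1:n_sims) {
    theta_star <- prior_r()                  # draw from the prior
    x_star <- simulate_data(theta_star)      # simulate under the model
    if (distance(x_star, x_obs) <= eps)      # close enough to x?
      theta_acc <- c(theta_acc, theta_star)  # retain theta*
  }
  theta_acc   # approximate sample from P(theta | X = x)
}
```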

Features of canonical ABC

• We get approximate likelihood inferences without calculating the likelihood.

• Closeness of X* to x is usually assessed via summary statistics, but this isn’t essential.

• Simulation from prior also isn’t essential.

• Different definitions of “close” can be tried for the same set of simulations.

• Simulations can be reused for different observed data.

When to use ABC?

When the likelihood is hard to compute because it requires integration over many nuisance parameters, but the model is easy to simulate from:

Population genetics: nuisance parameters are the branching times and topology of the genealogical tree underlying the observed DNA sequences/genes.

Epidemic models: nuisance parameters are infection times and infectious periods.

Should be regarded as a method of last resort! But it can give reasonable answers when nothing else works.

Population genetics example

Parameters:

N = effective population size; μ = mutation rate per generation; G = genealogical tree (topology + branch lengths)

Summary Statistics:

S1 = number of distinct alleles/sequences; S2 = number of polymorphic/segregating sites

Algorithm:

1. Simulate N and μ from the joint prior.
2. Simulate G from the standard coalescent model.
3. Simulate mutations on G, generate dataset X* and calculate S*.
4. Accept (N, μ, G) if S* ≈ S.

This generates a sample from the approximate joint posterior of (N, μ, G).
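A minimal R sketch of this scheme under simplifying assumptions (a single non-recombining locus, infinite-sites mutation, a 2N coalescent time scale as one common convention, and illustrative uniform priors); only S2, the number of segregating sites, is matched, and the tree G is not retained:

```r
# Rejection ABC for (N, mu) under a simple coalescent model.
abc_coalescent <- function(n_samp, S_obs, n_sims = 1e5, tol = 2) {
  accepted <- matrix(numeric(0), 0, 2, dimnames = list(NULL, c("N", "mu")))
  for (i in 1:n_sims) {
    N  <- runif(1, 100, 10000)    # illustrative prior on population size
    mu <- runif(1, 1e-6, 1e-4)    # illustrative prior on mutation rate
    # Coalescent: accumulate total tree length from exponential
    # waiting times while k lineages remain.
    k <- n_samp; total_len <- 0
    while (k > 1) {
      t_k <- rexp(1, rate = choose(k, 2) / (2 * N))  # time with k lineages
      total_len <- total_len + k * t_k               # all k branches grow
      k <- k - 1
    }
    # Infinite sites: S* ~ Poisson(mu * total branch length)
    S_star <- rpois(1, mu * total_len)
    if (abs(S_star - S_obs) <= tol) accepted <- rbind(accepted, c(N, mu))
  }
  accepted   # approximate joint posterior sample for (N, mu)
}
```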

Model comparison via ABC

Can use ABC for model comparison, as well as for parameter estimation within models. The ratio of acceptance rates,

P(S* ≈ S | M1) / P(S* ≈ S | M2),

approximates the Bayes factor.

Better: fit a (weighted) multinomial regression to predict the model from the observed data. Beaumont (2006) used this to infer the topology of a tree representing the history of 3 Californian fox populations.
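A sketch of the acceptance-ratio estimator in a toy setting (not from the talk): M1 = Poisson counts, M2 = geometric counts, matched on the sample mean, with equal model priors and an illustrative Exp(1) prior on the mean parameter:

```r
# ABC model choice: approximate the Bayes factor by the ratio of
# acceptance rates under each model (equal prior model probabilities).
abc_bayes_factor <- function(x_obs, n_sims = 1e5, tol = 0.1) {
  S_obs <- mean(x_obs)
  n <- length(x_obs)
  acc <- c(M1 = 0, M2 = 0)
  for (i in 1:n_sims) {
    m <- sample(1:2, 1)                         # pick a model uniformly
    lambda <- rexp(1)                           # prior on the mean
    x_star <- if (m == 1) rpois(n, lambda)
              else rgeom(n, 1 / (1 + lambda))   # geometric with mean lambda
    if (abs(mean(x_star) - S_obs) <= tol) acc[m] <- acc[m] + 1
  }
  acc["M1"] / acc["M2"]                         # approximate Bayes factor
}
```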

Problems/limitations

• Rejection-ABC is inefficient: most X* are far from x and must be rejected.

– No learning.

• How to find/assess good summary statistics?

– Adding summary statistics can make matters worse (see later).

• How to choose a metric for (high-dimensional) S?

ABC as a regression problem

Accepted (θ*, X*) pairs can be used to approximate the joint density of parameter and data, p(θ, X).

Goal is to obtain a “slice” at the observed data.

Few simulated points will lie on this slice, so we need to assume smoothness: the required posterior is approximately the same for datasets “close” to that observed.

High-dimensional densities are hard to estimate; instead obtain a regression point estimate of θ given X = x.

[Diagram: summary statistic S and parameter F, with posterior density p(F | S), marginal likelihood p(S), and likelihood p(S | F).]

Beaumont, Zhang, and DJB, Approximate Bayesian Computation in Population Genetics, Genetics 162: 2025-2035, 2002.

Use local-linear regression to adjust for the distance between observed and simulated datasets.

Use a smooth (Epanechnikov) weighting according to distance.

Can now weaken the “close” criterion (i.e. increase the tolerance) and utilize many more points.
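A minimal R sketch of that adjustment for a scalar parameter and a single summary statistic (the published method handles vectors of summaries via weighted multiple regression in the same way):

```r
# Local-linear regression adjustment with Epanechnikov weights,
# sketched for scalar parameter theta and scalar summary s.
# theta_sim, s_sim: simulated pairs; s_obs: observed summary;
# delta: tolerance (kernel bandwidth).
abc_regression_adjust <- function(theta_sim, s_sim, s_obs, delta) {
  d <- abs(s_sim - s_obs)
  keep <- d <= delta
  # Epanechnikov kernel: weight decreases smoothly with distance
  w <- (1 - (d[keep] / delta)^2) * 0.75 / delta
  fit <- lm(theta_sim[keep] ~ s_sim[keep], weights = w)
  beta <- coef(fit)[2]
  # Project each accepted theta* to the observed summary value
  theta_adj <- theta_sim[keep] + beta * (s_obs - s_sim[keep])
  list(theta = theta_adj, weights = w)
}
```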

[Figure: parameter plotted against summary statistic, with the weight function shown on a 0-1 scale.]

Estimation of scaled mutation rate θ = 2Nμ

Full data:

• 445 Y chromosomes, each typed at 8 microsatellite loci, i.e. 3560 numbers

Summary statistics:

• mean variance in length
• mean heterozygosity
• number of haplotypes

i.e. 3 numbers.

[Figure: estimates against tolerance for standard rejection, MCMC, and regression-adjusted ABC.]

Population growth

Population of constant size N_A until t generations ago, then exponential growth at rate r per generation to the current size N_C. There are 4 model parameters, but only 3 are identifiable, so we choose an identifiable set of 3. Data are the same as above, except a smaller sample size n = 200 (because of the time taken for MCMC to converge).

ABC applications in population genetics:

Standard rejection method:

Estoup et al. (2002, Genetics) – Demographic history of invasion of islands by cane toads. 10 microsatellite loci, 22 allozyme loci. 4/3 summary statistics, 6 demographic parameters.

Estoup and Clegg (2003, Molecular Ecology) – Demographic history of colonisation of islands by silvereyes.

With regression adjustment:

Tallmon et al. (2004, Genetics) – Estimating effective population size by the temporal method. One main parameter of interest (Ne), 4 summary statistics.

Estoup et al. (2004, Evolution) – Demographic history of invasion of Australia by cane toads. 75/63 summary statistics, model comparison, up to 5 demographic parameters.

More sophisticated regressions?

Although global linear regression usually gives a poor fit to the joint θ/S density, Calabrese (USC, unpublished) uses projection pursuit regression to fit a large “feature set” of summary statistics, iterating to improve the fit within the vicinity of S. Applied to estimate human recombination hotspots.

Could also consider quantile regression to adapt adjustment to different parts of the distribution.

Do ABC within MCMC

Marjoram et al. (2003). Two accept/reject steps:

1. Simulate a dataset at the current parameter values; if it isn’t close to the observed data, start again.

2. If it is close, accept or reject according to the prior ratio times the Hastings ratio (no likelihood ratio).

Note: now “close” must be defined in advance; also cannot reuse simulations for different observed datasets. Can apply the regression adjustment to MCMC outputs.
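A sketch of these two steps in R; propose, propose_d, simulate_data, distance and prior_d are hypothetical model-specific stubs, with propose_d(a, b) the density of proposing a from b:

```r
# ABC-MCMC (Marjoram et al. 2003 style), sketched for a scalar parameter.
abc_mcmc <- function(theta0, n_iter, eps, prior_d, propose, propose_d,
                     simulate_data, distance, x_obs) {
  theta <- numeric(n_iter)
  theta[1] <- theta0
  for (i in 2:n_iter) {
    theta_new <- propose(theta[i - 1])
    x_star <- simulate_data(theta_new)
    # Step 1: if the simulated data are not close, the chain stays put
    if (distance(x_star, x_obs) > eps) { theta[i] <- theta[i - 1]; next }
    # Step 2: accept with prob = prior ratio times Hastings ratio
    a <- (prior_d(theta_new) / prior_d(theta[i - 1])) *
         (propose_d(theta[i - 1], theta_new) /
          propose_d(theta_new, theta[i - 1]))
    theta[i] <- if (runif(1) < a) theta_new else theta[i - 1]
  }
  theta
}
```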

Problems:

1. proposals in tree space
2. few acceptances in the tail of the target distribution – stickiness

Importance sampling within MCMC

In fact, the Marjoram et al. (2003) MCMC approach can be viewed as a special case of a more general approach developed by Beaumont (2003).

Instead of simulating a new dataset forward-in-time, Beaumont used a backward-in-time importance sampling (IS) approach to approximate the likelihood.

His proof of the validity of the algorithm is readily extended to forward-in-time approaches based on one or multiple datasets (cf. O’Neill et al. 2000). Could also use a regression adjustment.

ABC within Sequential MC (Sisson et al., UNSW, Sydney)

Sample initial generation of θ “particles” from prior.

Sample θ from previous generation, propose (small) update and generate dataset X*; calculate S*.

Repeat until S* ≈ S – BUT the tolerance is reduced each generation.

Calculate prior ratio times Hastings ratio: use as weight W for sampling the next generation.

If variance of W is large, resample with replacement according to W and set all W=1/N.
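A condensed R sketch of the scheme, given an initial particle set drawn from the prior; perturb, perturb_d, prior_d, simulate_data and distance are hypothetical stubs (perturb_d(a, b) is the density of proposing a from b), the weight is the prior ratio times the Hastings ratio as above, and resampling is triggered by a low effective sample size:

```r
# ABC within sequential MC: particles pass through a decreasing
# tolerance schedule eps[1] > eps[2] > ...
abc_smc <- function(theta, eps, x_obs, prior_d, perturb, perturb_d,
                    simulate_data, distance) {
  n <- length(theta)                 # generation 1: theta drawn from prior
  w <- rep(1 / n, n)
  for (g in seq_along(eps)) {
    theta_new <- numeric(n); w_new <- numeric(n)
    for (j in 1:n) {
      repeat {                       # propose until S* is close enough
        parent <- sample(theta, 1, prob = w)
        th <- perturb(parent)
        if (distance(simulate_data(th), x_obs) <= eps[g]) break
      }
      theta_new[j] <- th
      # weight W = prior ratio times Hastings ratio
      w_new[j] <- (prior_d(th) / prior_d(parent)) *
                  (perturb_d(parent, th) / perturb_d(th, parent))
    }
    theta <- theta_new; w <- w_new / sum(w_new)
    if (1 / sum(w^2) < n / 2) {      # high weight variance: resample
      theta <- sample(theta, n, replace = TRUE, prob = w)
      w <- rep(1 / n, n)
    }
  }
  list(theta = theta, weights = w)
}
```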

Application to estimate parameters of TB infection.

Adaptive ABC algorithm: Jean-Marie Cornuet (INRA Montpellier, visiting IC)

• Standard ABC simulation is always from the prior – no learning.
• Any simulation density can be used instead of the prior – we need to weight simulated θ* values by the ratio of the prior density to the actual simulation density (sketched below).
• Adaptive approach – divide the simulation run into batches; in each batch, simulate θ* values from the posterior density obtained from the previous batch.
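A minimal sketch of one batch with importance weighting; q_r/q_d are the proposal sampler and density (e.g. fitted to the previous batch), and all of prior_d, q_r, q_d, simulate_data, distance are hypothetical stubs, with the densities assumed vectorized:

```r
# Adaptive ABC: simulate from a proposal q rather than the prior,
# and correct with importance weights prior / simulation density.
adaptive_abc_batch <- function(B, eps, x_obs, prior_d, q_r, q_d,
                               simulate_data, distance) {
  theta <- replicate(B, q_r())                 # draw batch from proposal
  keep <- sapply(theta, function(th)
    distance(simulate_data(th), x_obs) <= eps)
  theta <- theta[keep]
  w <- prior_d(theta) / q_d(theta)             # importance weights
  list(theta = theta, weights = w / sum(w))    # weighted posterior sample
}
```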

Wiz-Bang adaptive algorithm (John Molitor)

1. Generate a batch of B pairs (θ*, X*), simulating θ* from the prior. Evaluate a “distance” of each X* from x, say d(X*, x).

2. Obtain a “pseudo-posterior” density estimate using θ* values weighted by the corresponding d(X*, x) values. For a multivariate parameter vector, update one component at a time, then calculate d(X*, x).

Wiz-Bang algorithm (continued)

3. Continue to simulate θ*, but now from the density obtained at step 2. Weight each θ* by the ratio of prior to simulation density.

4. Continue to update the simulation density by retaining the “best” B particles, i.e. those minimising d(X*, x).

5. Continue until convergence is reached, indicated by few updates of the “best” particles (see the sketch below).
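One way to read steps 1-5 in code, loosely sketched for a scalar parameter; prior_r/prior_d, simulate_data and distance are hypothetical stubs, 1/d as the distance-to-weight mapping is an illustrative choice not prescribed by the slides, and the densities are assumed vectorized:

```r
# "Wiz-Bang" adaptive ABC, loose sketch.
wizbang_abc <- function(B, n_batches, x_obs, prior_r, prior_d,
                        simulate_data, distance) {
  theta <- replicate(B, prior_r())          # step 1: batch from the prior
  d <- sapply(theta, function(th) distance(simulate_data(th), x_obs))
  w <- rep(1, B)                            # prior/simulation-density ratios
  for (b in 2:n_batches) {
    # step 2: distance-weighted kernel density as "pseudo-posterior"
    wd <- (1 / d) / sum(1 / d)
    bw <- density(theta, weights = wd)$bw
    # step 3: simulate from that density; weight by prior / proposal
    theta_new <- sample(theta, B, replace = TRUE, prob = wd) +
                 rnorm(B, 0, bw)
    q_new <- sapply(theta_new, function(x) sum(wd * dnorm(x, theta, bw)))
    w_new <- prior_d(theta_new) / q_new
    d_new <- sapply(theta_new, function(th)
      distance(simulate_data(th), x_obs))
    # step 4: keep the "best" B particles (smallest distances)
    best <- order(c(d, d_new))[1:B]
    theta <- c(theta, theta_new)[best]
    d <- c(d, d_new)[best]
    w <- c(w, w_new)[best]
    # step 5 (stop when few "best" particles change) omitted here
  }
  list(theta = theta, weights = w)
}
```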

Example – Gaussian mixture: known number of components K, common variance

n = 200, K = 3, σ = 0.2

Mixture density: f(x) = Σ_{k=1}^{K} p_k N(x; μ_k, σ²), with p = (1/3, 1/3, 1/3) and μ = (−1/2, 0, 1/2).
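Test data of this form can be generated directly in R (the seed is chosen here only for reproducibility):

```r
# Simulate n = 200 observations from the 3-component Gaussian mixture
# with equal weights, means (-1/2, 0, 1/2) and common sd 0.2.
set.seed(1)
n <- 200; p <- rep(1/3, 3); mu <- c(-1/2, 0, 1/2); sigma <- 0.2
z <- sample(1:3, n, replace = TRUE, prob = p)   # latent component labels
x <- rnorm(n, mean = mu[z], sd = sigma)         # observed mixture data
```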

Data distance

• Data x consist of real-valued variables.
• Define d(X*, x) to be the sum of distances between the order statistics of X* and x:

d(X*, x) = Σ_i | X*_(i) − x_(i) |
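In R this is a one-liner (for datasets of equal size):

```r
# Sum of absolute differences between order statistics of X* and x
d_order <- function(x_star, x) sum(abs(sort(x_star) - sort(x)))
```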

Results - SD

Variable Dimension Mixture Model

• Now K is unknown: update it from previous values.
• Examine the posterior output (means, sd, etc.) when K = 1, K = 2, etc.

• In the example below the value of K is again 3, but this time it is not specified in the inference:

Variable Dimension Mixture Results

Comparison of methods

Which is better, adaptive or sequential? For which problems might we prefer one over the other?

Are both of these always better than:

– canonical rejection ABC?
– rejection ABC with regression adjustment?

David Welch has written some R code to compare methods in a simple mixtures problem:

Sequential MC

Tolerance for d(X*, x) is updated in 7 stages:

• "effective sample size for population 2 is 25.15"
• "effective sample size for population 3 is 23.14"
• "effective sample size for population 4 is 24.07"
• "effective sample size for population 5 is 22.69"
• "effective sample size for population 6 is 25.63"
• "effective sample size for population 7 is 23.29"

Total number of data generation steps: 103,132

Adaptive algorithm

• "iteration 2: keeping 0.1 of old samples" "effective sample size is 215.61“ "largest distance is 239.71“ • "iteration 5: keeping 0.369 of old samples "effective sample size is 33.30" "largest distance is 112.61" • "iteration 7: keeping 0.633 of old samples "effective sample size is 15.96" "largest distance is 98.97" "total number of data generation steps 70,000"

Generation steps required:

• Rejection ABC: 306K (final sample size 15)
• Adaptive IS ABC: 70K (final ESS = 15)
• Sequential ABC: 103K (final ESS = 23)

Overall comparisons:

Adaptive and sequential approaches perform similarly for this problem; both are superior to standard ABC.

Both are easy to code, but sequential requires more setting up (update proposals, a schedule for decreasing the tolerance).

ABC to “rescue” poor estimators (inspired by DJ Wilson, Lancaster)

• Evaluate an estimator based on a simplistic model at many datasets simulated under a more sophisticated model.

• For the observed dataset, use as the estimator the regression prediction of the true parameter given the value of the simplistic estimator at the observed data (see the sketch below).

• For example, many population genetics estimators assume no recombination and an infinite-sites mutation model – use this estimator and simulations to correct for recombination and finite-sites mutation.
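A short sketch; simulate_rich and simple_estimator are hypothetical stubs (the richer-model simulator and the simplistic estimator), and a global linear regression is one simple choice of predictor:

```r
# "Rescue" a simplistic estimator: simulate (theta, data) under the
# richer model, regress theta on the simplistic estimate, and predict
# at the observed value.
rescue_estimator <- function(n_sims, prior_r, simulate_rich,
                             simple_estimator, x_obs) {
  theta <- replicate(n_sims, prior_r())
  est <- sapply(theta, function(th) simple_estimator(simulate_rich(th)))
  fit <- lm(theta ~ est)            # regression of truth on naive estimate
  predict(fit, newdata = data.frame(est = simple_estimator(x_obs)))
}
```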

Acknowledgments

David Welch is funded by an EPSRC grant to further develop ABC ideas and apply them, particularly in population genomics, in collaboration with Mark Beaumont and Joao Lopes at Reading.