Towards Likelihood Free Inference Tony Pettitt QUT, Brisbane [email protected] Joint work with Rob Reeves CRICOS No.
Download ReportTranscript Towards Likelihood Free Inference Tony Pettitt QUT, Brisbane [email protected] Joint work with Rob Reeves CRICOS No.
Towards Likelihood Free Inference Tony Pettitt QUT, Brisbane [email protected] Joint work with Rob Reeves CRICOS No. 000213J Queensland University of Technology Outline 1. 2. 3. 4. 5. 6. 7. Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo. Approximating Hierarchical model Indirect Inference and likelihood free MCMC Conclusions. a university for the real world R CRICOS No. 000213J Stochastic models (Riley et al, 2003) Macroparasite within a host. Juvenile worm grows to adulthood in a cat. Host fights back with immunity. Number of Juveniles, Adults and amount of Immunity (all integer). J (t ), A(t ), I (t ) evolve through time according to Markov process unknown parameters, eg Juvenile → Adult rate of maturation Immunity changes with time Juveniles die due to Immunity Moment closure approximations for distribution of ( J (t ), A(t ), I (t )) limited to restricted parameter values. a university for the real world R CRICOS No. 000213J Numerical computation of pr ( J , A, I ) limited by small maximum values of J, A, I. Can simulate ( J (t ), A(t ), I (t )) process easily. Data: J at t=0 and A at t (sacrifice of cat), replicated with several cats Source: Riley et al, 2003. a university for the real world R CRICOS No. 000213J Other stochastic process models include spatial stochastic expansion of species (Hamilton et al, 2005; Estoup et al, 2004) birth-death-mutation process for estimating transmission rate from TB genotyping (Tanaka et al, 2006) population genetic models, eg coalescent models (Marjoram et al 2003) Likelihood free Bayesian MCMC methods are often employed with quite precise priors. a university for the real world R CRICOS No. 000213J Normalizing constant/partition function problem. The algebraic form of the distribution for y is known but it is not normalized, eg Ising model p( y | ) exp 0 yi 1 Ind ( yi y j ) i~ j i For yi {1,1} i 1, , n and i ~ j means neighbours (on a lattice, say). The normalizing constant involves in general a sum over 2n terms. Write p ( y | ) a university for the f ( y; ) known z ( ) unknown real world R CRICOS No. 000213J N-S and E-W neighbourhood a university for the real world R CRICOS No. 000213J Outline 1. 2. 3. 4. 5. 6. 7. Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo. Approximating Hierarchical model Indirect Inference and likelihood free MCMC Conclusions. a university for the real world R CRICOS No. 000213J Monte Carlo methods and Inference. Intractable likelihood, instead use easily simulated values of y. Simulated method of moments (McFadden, 1989). Method of estimation: comparing theoretical moments or frequencies with observed moments or frequencies. Can be implemented using a chi-squared goodness-fit-statistic, eg Riley et al, 2003. Data: number of adult worms in cat at sacrifice. a university for the real world R CRICOS No. 000213J Plot of goodness-of-fit statistic versus parameter. Greedy Monte Carlo. Precision of estimate? Source: Riley et al 2003. a university for the real world R CRICOS No. 000213J Outline 1. 2. 3. 4. 5. 6. 7. Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo. Approximating Hierarchical model Indirect Inference and likelihood free MCMC Conclusions. a university for the real world R CRICOS No. 000213J 3. Normalizing constant/partition function and MCMC (half-way to likelihood free inference) Here we assume (Møller, Pettitt, Reeves and Berthelsen, 2006) f ( y; ) z ( ) p( y | ) known unknown z( ) f ( y; )dy and difficult to find. Key idea Importance sample estimate of Sample y ~ p( y | ) then f ( y; ) z( ) . unbiased estimate of f ( y; ) a university for the z ( ) z ( ) given by z( ) real world R CRICOS No. 000213J Used off-line to estimate z( ) / z( ) then carry out standard MetropolisHastings with z ( ) interpolation over a grid of values.( eg Green z ( ) and Richardson, 2002, in a Potts model). Standard Metropolis Hastings: Simulating from target distribution p( | y) p( y | ) p( ). Acceptance ratio for changing , proposal q( | ) p( y | ) p( ) q( | ) A( | ) p( y | ) p( ) q( | ) f ( y; ) p( ) q( | ) z ( ) f ( y; ) p( ) q( | ) z ( ) accepted with probability min{1, A( | )} . Key Question: Can z ( ) be calculated on-line or avoided? z ( ') a university for the real world R CRICOS No. 000213J On-line algorithm – single auxiliary variable method. Introduce auxiliary variable x on same space as y and extend target distribution for the MCMC p( x, | y) p( x | y, ) p( y | ) p( ). Key Question: How to choose distribution of x so that z ( ) removed from A( | ). Now acceptance ratio is A( , x | , x) as a new pair ( , x) proposed. Proposal q( | ) becomes q( , x | , x) . Assume the factorisation q( , x | , x) q( x | ) q( | , x) Choose the proposal so that q( x | ) f ( x; ) z ( ) Then algebra → cancellation of z( )s and A( , x | , x) does not depend on z ( ) a university for the real world R CRICOS No. 000213J Note: Need perfect or exact simulation from Key Question: How to choose distribution? f ( y; ) for the proposal. z ( ) p( x | y, ) , the auxiliary variable The best choice p( x | y, ) q( x, | x, ) f ( x; ) but z( ) needed in M-H!! z ( ) a university for the real world R CRICOS No. 000213J Choice (i) a university for the real world R CRICOS No. 000213J Choice (ii) a university for the real world R CRICOS No. 000213J Choice (i) f ( x;ˆ( y)) ˆ Fix , say at ( y) a good estimate of . Then p( x | y, ) ˆ z ( ( y)) so does not depend on only y and z (ˆ) cancels in A( | ) . Choice (ii) f ( x; ) , z( ) p( x | y, ) approximation to Eg Partially ordered Markov mesh model for Ising data Comment Both choices can suffer from getting stuck because p( x | y, ) f ( x; ) can be very different from the ideal . z ( ) a university for the real world R CRICOS No. 000213J Example: Ising Model 60 by 60 array with 0 =.2,1 =.1 Single auxiliary variable method (Moller et al, 2006) Auxiliary variable is Choice (ii). Approximation to Ising model. Partially ordered Markov mesh model with same neighbourhood as Ising DAG with N, W as parents, S, E as children Run chain 500,000 iterations and thin 1 in 100 a university for the real world R CRICOS No. 000213J Source: Møller et al, 2006 Single auxiliary method tends to get stuck Murray et al (2006) offer suggestions involving multiple auxiliary variables a university for the real world R CRICOS No. 000213J Outline 1. 2. 3. 4. 5. 6. 7. Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo. Approximating Hierarchical model Indirect Inference and likelihood free MCMC Conclusions. a university for the real world R CRICOS No. 000213J 4. Likelihood free MCMC Single Auxiliary Variable Method as almost Approximate Bayesian Computation (ABC) We wish to eliminate f ( y; ) / z( ) or equivalently likelihood from the M-H algorithm. Solution: The distribution of x given y and the observed data, p( x | , y) Ind ( x y) then with A( , x | , x) x ~ f ( x; ) z ( ) p( y | ), the puts all probability on y, p( ) q( | ) Ind ( x y ) p( ) q( | ) the likelihood This might work for discrete data, sample size small, and if the proposal q( | ) q( | , y) were a very good approximation to p( | y). If sufficient statistics s(y) exist then Ind (s( x) s( y)) replaces Ind ( x y). a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J Likelihood free methods, ABC- MCMC Change of notation, observed data yobs(fixed), y is pseudo data or auxiliary data generated from the likelihood p( y | ) . Instead of y yobs, now have y close to yobs in the sense of statistics s( distance d ( y, yobs ) || s( y) s( yobs ) || . ABC allows d ( y, yobs ) e rather than equal to 0 Target distribution for variables ( y, ) p( y, | yobs , e ) p( y | ) p( ) Ind (d ( y, yobs ) e ). Standard M-H with proposals ~ q( | , yobs ) (Marjoram et al 2003; ABC MCMC) y ~ p( y | ) for acceptance of ( y, ) ( y, ) . Ideally e should be small but this leads to very small acceptance probabilities. a university for the real world R CRICOS No. 000213J yobs ), Issues of implementing Metropolis-Hastings ABC (a) Tune for e to get reasonable acceptance probabilities; (b) All ( y, ) satisfying d ( y, yobs ) e (hard) accepted with equal probability rather than smoothly weighted by d ( y, yobs ) (soft). (c) statistics Choose summary statistics carefully if no sufficient a university for the real world R CRICOS No. 000213J Tune for e A solution is to allow e to vary as a parameter (Bortot et al, 2004). The target distribution is p( y, , e | yobs ) p( y | ) p( ) Ind (d ( y, yobs ) e ) p(e ). Run chain and post filter output for small values of e a university for the real world R CRICOS No. 000213J Outline 1. 2. 3. 4. 5. 6. 7. Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo. Approximating Hierarchical model Indirect Inference and likelihood free MCMC Conclusions. a university for the real world R CRICOS No. 000213J Beaumont, Zhang and Balding (2002) use kernel smoothing in ABC-MC a university for the real world R CRICOS No. 000213J Soft Constraint for d ( y, yobs ) (Reeves and Pettitt , 2005) Replace Ind (d ( y, yobs ) e ) by exp( 1 d ( y, yobs )), a soft 2 constraint, with replacing e . Key Idea. Interpret as an approximate likelihood for (yobs | y ) Simple case ( yobs | y ) N ( y, ) with φ known Approximating Hierarchical model with joint probability p ( ) p ( y | ) p ( yobs | y ) a university for the real world R CRICOS No. 000213J Approximating Hierarchical Model a university for the real world R CRICOS No. 000213J Simple case. Normal model with mean , variance 2 ,sample n. Sufficient statistic yobs. d ( y, yobs ) ( y yobs ) 2 and "likelihood" ( yobs | y ) N ( y , ) Integrate out pseudo data y from p ( y, , yobs ) to get marginal p ( , yobs ) to obtain p ( , yobs ) N ( ; yobs , 2 n ) p ( ) Key idea Approximation using pseudo data and match to observed data introduces variance inflation in likelihood approximation. Will affect posterior mean (if prior not improper vague) and variance. a university for the real world R CRICOS No. 000213J General scheme to overcome variance inflation and bias of posterior Implement a tempering scheme with 0 1 k and ˆ j is the posterior mean estimated from chain j . Combine estimates using weighted least squares and bias estimated using a quadratic in , say ˆ j 0 1 j 2 j2 error , j 1, , k 0 gives combined estimate of posterior mean for 0 Similarly for posterior variance and quantile estimates using chain j (compare Liu, 2001, MCMC for "indirect models") a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J Parallel Tempering to improve mixing For chains j -1 and j , propose swaps of ( y, ) values to improve mixing. M-H ratio does not involve intractable likelihood. Soft constraint improves mixing a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J Example: Ising Model (continued) 60 by 60 array with 0 =.2,1 =.1 Compare Exact method, auxiliary variable method ABC with = 502 ,30 2 ,152 with correct sufficient statistics y i and Ind ( y i j i yj) Run chains 500,000 and thin 1 in 100 a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J Outline 1. 2. 3. 4. 5. 6. 7. Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo. Approximating Hierarchical model Indirect Inference and likelihood free MCMC Conclusions. a university for the real world R CRICOS No. 000213J Indirect Inference (Gourieroux et al, 1993) (also Heggland and Frigessi, 2004) Observe data yobs . Suppose True model pT ( y | ) but Approximating model is intractable pA ( x | ) is tractable, x and y with the same support. Can find ˆ ( x ) easily. Put y into x model and obtain ˆ ( y | ). Repeat many times for simulated y from to find an accurate value ˆ ( ). Find so that a university for the pT ( y | ) ˆ ( yobs ) is close to ˆ ( ) giving ˆ( yobs ). real world R CRICOS No. 000213J Hierarchical Model using ideas of Indirect Inference.(Reeves and Pettitt, 2005; P & R, 2006) Consider the True hierarchical model p ( ) pT ( yobs | ) and the Approximating hierarchical model p ( ) pT ( y | ) p A ( | y ) p A ( yobs | ) with y is pseudo data from pT ( y | ) pT ( y | ) being the True intractable likelihood pA ( | y) is an Approximating model posterior distribution p A ( yobs | ) is an Approximating likelihood evaluated at the observed data yobs Key point: Marginilising the Approximating HM over random y and should be close to the True HM a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J A Metropolis Hastings algorithm for Indirect Inference Can be implemented with the proposal using the idea from Moller et al (2006). q ( , y, | , y, ) q( | ) pT ( y | ) p A ( | y) and then the MH acceptance probability is given with A( , y, | , y, ) p A ( yobs | ) q ( | ) p( ) p A ( yobs | ) q ( | ) p( ) which replaces the likelihood ratio for intractable pT and replaces it by a ratio involving tractable p A . a university for the real world R CRICOS No. 000213J Example: Ising Model (continued) 60 by 60 array with 0 =.2,1 =.1 MH Indirect Inference implemented with Approximate Likelihood taken as the POMM with 2 parameters equivalent to Ising model 0 , 1 20,000 iterations with Approximating posterior found from "side MH chain" with 400 iterations. No summary statistics required, implied by Approximating Likelihood a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J a university for the real world R CRICOS No. 000213J Some points • How could approximate posterior be made more precise? – Use more parameters in approximating likelihood, the POMM? (Gouriéroux at al (1993), Heggland and Frigassi (2004) discuss this in the frequentist setting) – More iterations for side chain “exact” calculation of approximate posterior? • How to choose a good approximating likelihood? • Relationship to summary statistics approach? a university for the real world R CRICOS No. 000213J Outline 1. 2. 3. 4. 5. 6. 7. Some problems with intractable likelihoods. Monte Carlo methods and Inference. Normalizing constant/partition function. Likelihood free Markov chain Monte Carlo. Approximating Hierarchical model Indirect Inference and likelihood free MCMC Conclusions. a university for the real world R CRICOS No. 000213J Conclusions 1. 2. 3. For the normalizing constant problem we presented a single online M-H algorithm. We linked these ideas to ABC-MCMC and developed a hierarchical model (HM) to approximate the true posterior – showed variance inflation. We showed that the approximating HM could be tempered swaps made to improve mixing using parallel chains, variance inflation effect corrected by smoothing posterior summaries from the tempered chains. 4. 5. 6. We extended indirect inference to an HM to find a way of implementing the Metropolis Hastings algorithm which is likelihood free. We demonstrated the ideas with the Ising/autologistic model. Application to specific examples is on-going and requires refinement of general approaches. a university for the real world R CRICOS No. 000213J Acknowledgements Support of the Australian Research Council Co-authors Rob Reeves, Jesper Møller, Kasper Berthelsen Discussions with Malcolm Faddy, Gareth Ridall, Chris Glasbey, Grant Hamilton … a university for the real world R CRICOS No. 000213J