Bayesian Analysis of Computer Code Outputs


The Estimation of the Net CO2 Flux for England & Wales and its Uncertainty using Emulation

Marc Kennedy, Tony O’Hagan, Clive Anderson, Mark Lomas, John Paul Gosling and Ian Woodward (University of Sheffield); Andreas Heinemeyer (University of York)

Carbon flux

• Carbon dioxide (CO2) is one of the principal greenhouse gases that drive global warming
• To what extent can vegetation reduce the quantity of CO2 going into the atmosphere?
– Source or sink?
• Kyoto agreement signatories are required each year to account for carbon (C) emissions
• How to estimate this?
– Inventories
– Models

Computer models

• In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes
– For understanding, prediction, control
• Growing realisation of the importance of uncertainty in model predictions
– Can we trust them?
– Without any quantification of output uncertainty, it’s easy to dismiss them

Uncertainty analysis

• Consider just one source of uncertainty
– We have a computer model that produces output y = f(x) when given input x
– But for a particular application we do not know x precisely
– So X is a random variable, and therefore so is Y = f(X)
– We are interested in the uncertainty distribution of Y
– How can we compute it?

Monte Carlo

• The usual approach is Monte Carlo
– Sample values of x from its distribution
– Run the model for all these values to produce sample values y_i = f(x_i)
– These are a sample from the uncertainty distribution of Y
• Neat, but impractical if it takes minutes or hours to run the model
– We can then only make a small number of runs
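To make the procedure concrete, here is a minimal sketch of Monte Carlo uncertainty analysis. This is not the SDGVMd code: the toy model f and the normal input distribution are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Placeholder for an expensive simulator run; in practice each call
    # might take minutes or hours.
    return np.sin(x[0]) + 0.5 * x[1] ** 2

# Assumed input uncertainty: two inputs, each normally distributed.
n_runs = 10_000
X = rng.normal(loc=[1.0, 0.5], scale=[0.2, 0.1], size=(n_runs, 2))

# One model run per sampled input; this is the expensive step.
y = np.array([f(x) for x in X])

# The empirical distribution of y approximates the uncertainty
# distribution of Y = f(X).
print("mean of Y:", y.mean(), "variance of Y:", y.var())
```

With 10,000 runs the loop above is exactly what becomes impractical for a slow model, which is what motivates emulation on the next slide.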

Emulation

• A computer model encodes a function that takes inputs and produces outputs
• An emulator is a statistical approximation of that function
– Estimates what outputs would be obtained from given inputs
– With a statistical measure of estimation error
– Given enough training data, the estimation error variance can be made small

So what?

• So we can do uncertainty analysis etc. fast and efficiently
• A good emulator
– estimates the model output accurately
– with small uncertainty
– and runs “instantly”
• Conceptually, we
– use model runs to train the emulator
– then derive any desired properties of the model

Gaussian process

• We use Gaussian process (GP) emulation
– Nonparametric, so it can fit any function
– Error measures can be validated
– Analytically tractable, so we can often do uncertainty analysis etc. analytically
– Highly efficient for up to 100 inputs
• The method uses Bayesian theory
– Formally, the posterior distribution of the function is a GP
– This posterior distribution is the emulator
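As a rough sketch of the idea (not the authors’ implementation), a GP emulator can be trained on a handful of model runs and then queried for a prediction plus an estimation error. The toy model, the training design and the use of scikit-learn’s GaussianProcessRegressor are all assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

def f(x):
    # Placeholder simulator (the real model is deterministic but expensive).
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

# A small set of training runs (a space-filling design would be used in practice).
X_train = rng.uniform(0.0, 2.0, size=(30, 2))
y_train = f(X_train)

# GP prior with a squared-exponential (RBF) covariance; the posterior GP is the emulator.
kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0])
emulator = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

# The emulator predicts the model output at new inputs, with a measure of estimation error.
X_new = rng.uniform(0.0, 2.0, size=(5, 2))
mean, sd = emulator.predict(X_new, return_std=True)
print(mean, sd)  # posterior mean and standard deviation at each new input
```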

BACCO

• This has led to a wide-ranging body of tools for inference about all kinds of uncertainties in computer models
• All based on building the GP emulator of the model from a set of training runs
• This area is known as BACCO
– Bayesian Analysis of Computer Code Output
• Includes not just uncertainty analysis
– Sensitivity analysis, calibration, data assimilation, validation, optimisation …
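A crude sketch of the uncertainty-analysis half of this toolkit is shown below. In BACCO proper the relevant integrals over the GP can often be computed analytically; here input uncertainty is simply propagated through the fitted emulator by sampling, with the same placeholder model and assumed input distribution as in the earlier sketches.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

def f(x):
    # Placeholder for the expensive simulator.
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

# Train the emulator on a modest number of model runs.
X_train = rng.uniform(0.0, 2.0, size=(30, 2))
emulator = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0]),
    normalize_y=True,
).fit(X_train, f(X_train))

# Propagate input uncertainty through the emulator (cheap), not the model.
X_unc = rng.normal(loc=[1.0, 0.5], scale=[0.2, 0.1], size=(20_000, 2))
mean_pred, sd_pred = emulator.predict(X_unc, return_std=True)

# Law of total variance: input-driven variance plus mean emulator-error variance.
print("mean of Y:", mean_pred.mean())
print("variance of Y:", mean_pred.var() + (sd_pred ** 2).mean())
```

The point of the exercise: the expensive model is run only 30 times, while the 20,000 uncertainty-propagation evaluations hit the emulator.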

CTCD and MUCM

• Centre for Terrestrial Carbon Dynamics (CTCD)
– http://ctcd.group.shef.ac.uk
– Mission: to understand C fluxes from vegetation
• Managing Uncertainty in Complex Models (MUCM)
– http://mucm.group.shef.ac.uk
– To develop robust and widely applicable BACCO methods

The England & Wales carbon flux in 2000

• Recent application of these methods
• Dynamic vegetation model (SDGVMd)
– Predicts carbon sequestration and release from vegetation and soils: NBP (net biosphere production) and GPP (gross primary production)
– Over 700 pixels across E&W
– 4 plant functional types (PFTs) modelled separately: deciduous broadleaf (DcBl), evergreen needleleaf (EvNl), C3 grasses and crops

SDGVMd outputs for 2000

Outline of analysis

1. Build emulators for each PFT at a sample of sites
2. Identify the most important inputs
3. Define distributions to describe uncertainty in the important inputs
– Analysis of soils data
– Elicitation of uncertainty in PFT parameters
– Need to consider correlations
4. Carry out uncertainty analysis at each sampled site
5. Interpolate across all sites
– Mean corrections and standard deviations
6. Aggregate across sites and PFTs
– Allowing for correlations

Sensitivity analysis for one pixel/PFT

Elicitation

• Beliefs of the expert (the developer of SDGVMd) regarding plausible values of the PFT parameters
– Important to allow for uncertainty about the mix of species in a pixel and the role of the parameter in the model
• In the case of leaf life span for evergreens, this was more complex

EvNl leaf life span

Correlations

• A PFT parameter in one pixel may differ from that in another
– Because of variation in species mix
– Common uncertainty about the average over all species induces correlation
• Elicit beliefs about the average over the whole UK
– EvNl joint distributions are mixtures of 25 components, with correlation both between and within years

Mean NBP corrections

NBP standard deviations

Land cover (from LCM2000)

Aggregate across 4 PFTs

Sensitivity analysis

• Map shows the proportion of the overall uncertainty in each pixel that is due to uncertainty in the PFT parameters
– As opposed to the soil parameters
• The contribution of PFT uncertainty is largest in grasslands/moorlands
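As a hedged illustration of how such a proportion might be estimated once an emulator is available, the sketch below uses a stand-in emulator mean with one hypothetical “PFT parameter” and one hypothetical “soil parameter”, and a crude double-loop Monte Carlo estimate of Var(E[Y | PFT parameter]) / Var(Y). This is not the analytic GP-based sensitivity analysis used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)

def emulator_mean(pft, soil):
    # Stand-in for the cheap posterior-mean prediction of an emulator with one
    # PFT-parameter input and one soil-parameter input (both hypothetical).
    return np.sin(pft) + 0.3 * soil

def sample_pft(n):
    return rng.normal(1.0, 0.4, size=n)   # assumed uncertainty in the PFT parameter

def sample_soil(n):
    return rng.normal(0.0, 0.2, size=n)   # assumed uncertainty in the soil parameter

# Total output variance under both sources of uncertainty.
total_var = emulator_mean(sample_pft(50_000), sample_soil(50_000)).var()

# Variance attributable to the PFT parameter: Var over PFT of E over soil of Y,
# estimated by a simple double loop (affordable only because the emulator is fast).
outer = sample_pft(2_000)
cond_means = np.array([emulator_mean(p, sample_soil(500)).mean() for p in outer])
print("proportion of variance due to PFT parameter:", cond_means.var() / total_var)
```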

Aggregate over England & Wales

PFT           Plug-in estimate (Mt C)   Mean (Mt C)   Variance (Mt C²)
Grass                 5.279                4.639           0.269
Crop                  0.853                0.445           0.034
Deciduous             2.132                1.683           0.013
Evergreen             0.798                0.781           0.001
Covariances                                                0.001
Total                 9.061                7.548           0.321
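The Covariances row reflects that the per-PFT totals are correlated, so the variance of the aggregate is not simply the sum of the four PFT variances. The standard identity behind this row (not specific to this study) is:

```latex
\operatorname{Var}\!\left(\sum_i Y_i\right)
  = \sum_i \operatorname{Var}(Y_i) + 2\sum_{i<j} \operatorname{Cov}(Y_i, Y_j)
```

where Y_i denotes the NBP total for PFT i; the second term is what the Covariances line contributes to the Total variance.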

Conclusions

• BACCO methods offer a powerful basis for the computation of uncertainties in model predictions
• Analysis of the E&W aggregate NBP in 2000
– Good case study for uncertainty and sensitivity analyses
– Involved several technical extensions
– Has important implications for our understanding of C fluxes
– Policy implications